Verification of concurrent systems is difficult because of their inherent nondeterminism. Modern verification requires better locality and modularity. Reasoning about shared-memory systems has made considerable progress in both respects. However, modular verification of distributed systems remains an open need. In this paper, we propose a new reasoning system for message-passing programs.
|Published (Last):||8 October 2013|
Data sets are accessible to researchers through a community driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures. SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets.
It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis. As one of the most powerful tools in structural biology, X-ray crystallography allows determination of the structure (atomic coordinates) of proteins, nucleic acids, small-molecule compounds and macromolecular complexes to atomic-level resolution.
Crystallographic data continue to be a primary source of mechanistic understanding of macromolecules, the implications of which extend from basic research to translational studies and the rational design of therapeutics. To support the needs of a growing structural biology community, a global network of synchrotron beamlines 1 has been established and made available to researchers. These facilities remain the predominant source for crystallographic data collection. While the data collection process has become increasingly streamlined, deployment of a data management infrastructure to archive original diffraction images has been slow and uncertain 2.
With the exception of a modest number of data storage systems dedicated to the support of individual synchrotron beamlines 3 , or specific structural genomics projects 4 , storage of diffraction image data sets is typically the responsibility of primary investigators. Access to these original experimental data sets is therefore dependent on the policies of individual laboratories, which vary in storage organization, institutional resources, and researcher turnover. There is no universal archiving system to store X-ray diffraction data sets, and raw data sets are rarely made publicly available.
In the cases where data sets are available, their distribution format can vary significantly. The benefits of easy and public access to experimental data are numerous 5. Access to primary data would support community efforts to continuously improve existing models and identify new features through complete reprocessing of experimental data 6 , 7 , 8 with modern software tools and improved criteria 9.
Further, original data may provide a basis for validating questionable existing structures, while mistakes in structure determination may be identified earlier 10 , 11 . Additionally, access to a diverse volume of raw data can be used to develop improved software that addresses limitations of existing programs.
Finally, access to a collection of varied experimental data will undoubtedly benefit the training and education of practitioners. The Worldwide Protein Data Bank (wwPDB) 13 , 14 has illustrated how these achievements can be realized with the collection of reduced experimental data, in the form of structure factor amplitudes.
Complementing this resource by preserving raw experimental data and making it available to a broad community promises a profound scientific impact in structural biology and other biomedical disciplines that face the challenges of preserving large data sets.
While the primary role of the SBGrid Consortium www. To support the outstanding needs of the global structural community, we have established a publication system for experimental diffraction data sets that support published structural coordinates: the Structural Biology Data Grid (SBDG).
The SBDG project was initiated with a collection of X-ray diffraction image data sets as well as a few additional data set types contributed by many SBGrid Consortium laboratories.
The collection supports a diverse subset of over 68 peer-reviewed publications and represents a sampling of numerous structure determination approaches.
To evaluate the utility of such a data grid, we reprocessed all published diffraction data sets in this initial collection with modern software and compared the derived statistics against those reported in the original publications.
We also demonstrate that by integrating the storage resources of multiple research groups and institutions, the data grid is poised to deliver a novel community driven data preservation system to support various types of structural biology and biomedical data sets. The SBDG is a centralized data publication service—a repository for discovering, downloading and depositing large structural biology data sets. We developed the SBDG to support the need of the SBGrid community to archive and disseminate X-ray diffraction image data sets, that is, images recorded on X-ray detectors, which support published structures.
The SBDG complements the PDB, which archives derived data (merged and post-refined data from diffraction images, and the resulting refined coordinates of macromolecular structural models). For X-ray diffraction data, the primary data consist of the experimental diffraction images that support a derived structural model and journal publication. Release of these primary data by the SBDG coincides with publication of the resulting manuscript and, for structural biology data sets, of the related PDB files.
As of 1 September , the SBDG stores a diverse collection of data sets, including X-ray diffraction data sets and a handful of other data types, including computational decoys and data sets from MicroED, lattice light-sheet microscopy and molecular dynamics (Supplementary Table 1). These published data sets, contributed by 50 laboratories with diffraction data sets collected at 11 synchrotron facilities Fig.
Extrapolating from this initial collection, which is quite diverse and registers at just over 0. World map image courtesy of the U.S. Geological Survey. For structure factor amplitudes and PDB models, file sizes were obtained from a subset of 96 PDB depositions derived from the pilot data sets. On average, SBDG stores 1. Numbers in red indicate the estimated storage requirements to accommodate data sets for , structures.
We estimate that for each primary data set, additional data sets are collected at national facilities. Primary data refers to the original experimental diffraction images supporting the derived structural model, as distinguished from all experimental data (screening images, inferior-quality data sets, and so on).
For crystallographic experiments, reduced data refers to the integrated intensities or amplitudes, which do not materially affect storage requirements. On the home page, deposited data sets are organized into laboratory and institutional collections Fig. The website molecular viewer, PV 22 , offers visitors an option to view structures in a manipulatable cartoon representation Fig. With multiple high-quality viewing options and flexible search functionality, users of the SBDG website can easily identify a small subset of relevant data sets.
Persistent data set pages are an important element of any research data repository because they typically provide a landing URL, which resolves from a given DOI. A Data set Page can also be located by searching the SBDG for a PDB code, although often several related data sets are used to determine a single set of macromolecular coordinates. As the Data Grid is developed, the Data set Pages will include additional functionality, with more information on how to reprocess data sets, extended data statistics, and discussion forums allowing users to annotate data sets after publication.
Taken together, the uniquely defined Data set Pages provide a comprehensive and persistent location for individual data sets. All data sets in the SBDG are readily and freely accessible to the community. Access rights were formalized with adoption of the Creative Commons Zero licence (CC0), which supports dedication of research results to the public domain and is used by many open-data projects.
This licence allows use and redistribution of data for both commercial and non-commercial purposes without requiring additional agreements. The CC0 licence does not affect patents or trademark rights of contributors, and is similar to the licensing terms that are used for macromolecular models released by the wwPDB.
Although data sets can be downloaded individually, their size can make this cumbersome. The Data Access Alliance (DAA) is a voluntary and open organization of research-data-storage providers and is being developed in collaboration with the Globus Project. The DAA has two aims: (1) to minimize the chance of data loss by replicating SBDG data sets, and (2) to facilitate global data access through its members.
Although it is expected that DAA membership and architecture will evolve rapidly, in its current state the DAA framework already provides a global solution for data dissemination.
As a secondary service, DAA centres can provide local, direct access to data sets for their institutional research groups. For example, Harvard Medical School hosts the entire collection and provides direct access to all data from its computing center. The DAA infrastructure is further extended by the DAA satellites, which replicate fractions of SBDG data sets in their local storage for direct access by members of individual institutions. This mode of participation provides an attractive option for research institutions to develop local archives of all primary data generated by the local community.
We expect that, as research storage infrastructure catches up with the capacities required to archive larger collections of diffraction data sets, some DAA satellites will elect to replicate a larger fraction of SBDG archives and make them available to the general community.
End-users can access data sets by downloading from DAA centres and by direct access from Satellites. While the DAA offers a variety of data access options that will support growth of the repository, members of the community can also download individual data sets directly from SBGrid servers at Harvard using an rsync protocol.
The rsync utility, which is native to Linux and OS X systems, is particularly suitable for downloading large data files and can be restarted in case of interruption.
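As a minimal sketch of the restartable download described above, the snippet below assembles an rsync command line in Python. The source location is a hypothetical placeholder, not the actual SBGrid server address, and the function name `build_rsync_command` is introduced here only for illustration. The flags used (`-a`, `-v`, `--partial`) are standard rsync options.

```python
import shlex


def build_rsync_command(source: str, destination: str) -> list[str]:
    """Build an rsync invocation suited to large, interruptible transfers."""
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve timestamps and permissions
        "-v",         # list files as they are transferred
        "--partial",  # keep partially transferred files so a rerun can reuse them
        source,
        destination,
    ]


# Hypothetical source: replace with the location given on the data set page.
command = build_rsync_command(
    "rsync://data.example.org/dataset-0001/", "./dataset-0001/"
)
print(shlex.join(command))
```

Rerunning the same command after an interruption skips files that are already complete, which is what makes the transfer restartable. The list could be passed to `subprocess.run(command, check=True)` to execute it.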
After download, the data integrity of individual data sets can be verified by following instructions on the Data Grid website. With a well-defined and permissive CC0 access licence and multiple channels for accessing data (four DAA sites and the rsync download mechanism), our initial infrastructure is well suited to support expansion of the data collection.
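The exact verification procedure is given on the Data Grid website and is not reproduced here. As an illustration of the general idea only, the following sketch assumes that a data set ships with a manifest of SHA-256 digests keyed by relative file path. The manifest layout and the function names are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large image files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of_file(target) != expected:
            bad.append(relative_path)
    return bad
```

An empty list from `find_corrupt_files` means every file named in the manifest is present and matches its recorded digest.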
For many SBGrid laboratories, interest in data deposition is driven by a desire to better organize research data and comply with institutional, federal, and project-specific data preservation requirements. During the pilot phase, data deposition privileges were limited to SBGrid member laboratories.
With recent funding to further support the project, the Data Grid is now open to the entire structural biology community. Wide adoption of data preservation systems is often hindered by the complexities involved in the data deposition process itself.
To register a data set, the depositor completes a web form with basic information about the sample, data collection facility, related objects (for example, publication, PDB code), and authorship; this information is mapped to the DataCite schema Fig.
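The mapping from form fields to DataCite properties might look roughly like the sketch below. The form field names (`authors`, `title`, `publisher`, `year`, `publication_doi`) and the dictionary layout are hypothetical. Only the property names (creators, titles, publisher, publication year, resource type, related identifiers) and the `IsSupplementTo` relation type are taken from the DataCite Metadata Schema, and this is not a description of the SBDG implementation.

```python
def to_datacite(form: dict) -> dict:
    """Map a deposition form (hypothetical field names) to DataCite-style properties."""
    record = {
        "creators": [{"name": name} for name in form["authors"]],
        "titles": [{"title": form["title"]}],
        "publisher": form["publisher"],
        "publicationYear": form["year"],
        "resourceType": {"resourceTypeGeneral": "Dataset"},
        "relatedIdentifiers": [],
    }
    # Link the data set to the publication it supports, when one is known.
    if form.get("publication_doi"):
        record["relatedIdentifiers"].append(
            {
                "relatedIdentifier": form["publication_doi"],
                "relatedIdentifierType": "DOI",
                "relationType": "IsSupplementTo",
            }
        )
    return record


# Illustrative values only; the DOI below is a placeholder.
example = to_datacite(
    {
        "authors": ["Doe, Jane"],
        "title": "X-ray diffraction data for an example protein",
        "publisher": "Example Data Repository",
        "year": 2015,
        "publication_doi": "10.1234/example",
    }
)
```

Keeping the mapping in one function makes it straightforward to check that the required properties are present before a DOI is registered.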
Many details necessary for data set reprocessing—beam center, distance, wavelength, and so on—are automatically included with most data sets in the form of an image header generated by the data collection software at the time of collection, simplifying the registration process.
A principal investigator is authorized to sponsor depositions as a recognized member of the community and must approve each deposit. This system allows maximum flexibility when accepting data for deposition, facilitating the upload of complex data sets that otherwise could be challenging to validate. Following registration, a DOI is reserved for the data set and the user is provided with data transfer instructions. Upon verification, the primary data are either released in the bi-weekly SBDG release or placed on hold.
As with the PDB, release of data placed on hold will coincide with publication. The two-step publication process is complemented by behind-the-scenes data replication, DOI registrations, and data analysis.
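The deposition steps described above (registration, DOI reservation, data transfer, verification, then release or hold) can be read as a small state machine. The sketch below is one possible encoding. The state names and the `advance` helper are assumptions made for illustration and do not describe the SBDG software.

```python
from enum import Enum


class Status(Enum):
    REGISTERED = "registered"
    DOI_RESERVED = "doi_reserved"
    UPLOADED = "uploaded"
    VERIFIED = "verified"
    ON_HOLD = "on_hold"
    RELEASED = "released"


# Transitions permitted by the workflow described in the text.
ALLOWED = {
    Status.REGISTERED: {Status.DOI_RESERVED},
    Status.DOI_RESERVED: {Status.UPLOADED},
    Status.UPLOADED: {Status.VERIFIED},
    Status.VERIFIED: {Status.RELEASED, Status.ON_HOLD},
    Status.ON_HOLD: {Status.RELEASED},  # released when the manuscript is published
    Status.RELEASED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Move a deposition to the target state, rejecting transitions not listed above."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(Status.VERIFIED, Status.ON_HOLD)` succeeds, whereas `advance(Status.REGISTERED, Status.RELEASED)` raises `ValueError` because a data set cannot be released before it has been transferred and verified.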
All X-ray diffraction images are currently post-processed using data processing pipelines that provide a post-publication data review that will be shared with depositors and the community in the next phase of the SBDG project. We are building additional tools to help increase data deposition rates, including automatic reminders sent to consortium members to encourage them to deposit data for previously published work.
Research data are a legitimate and citable product of research 24 , 25 and, therefore, the SBDG recommends that depositors and data users cite all data deposited with the SBDG in the standard reference section of their manuscripts, following well-established community standards 24 , 26 . Data citation examples are provided on individual data set pages Fig.
Both services are now presented to users in a unified publication support workflow Fig. In step 1, the user deposits research-related data that are put on hold until publication.
A set of DOIs and corresponding data citations are then generated and provided to the end-user. Users can also use AppCiter to generate a list of software citations for all scientific software used in the project. In step 2, all research data and scientific software citations are included in the References section of the manuscript. In step 3 the user, anticipating manuscript publication, contacts relevant databases to request release of the primary and supporting data.
This process should, ideally, take place before manuscript publication and be timed to coincide with the publication date, allowing the community to access the data when the manuscript is released. When preparing future publications that refer to completed structures, scientists should reference the relevant publications and macromolecular models, unless they are referring to a specific data set.
For specific data sets, authors should explicitly reference experimental data using the corresponding data citation Fig. Citation metrics for published data sets will be comparable to those obtained for journal publications.
Data publication with the structural biology data grid supports live analysis
Modular Reasoning for Message-Passing Programs