Molecular Structure Determination on the Grid

2012 ◽  
pp. 862-880
Author(s):  
Russ Miller ◽  
Charles Weeks

Grids represent an emerging technology that allows geographically and organizationally distributed resources (e.g., computer systems, data repositories, sensors, imaging systems, and so forth) to be linked in a fashion that is transparent to the user. The New York State Grid (NYS Grid) is an integrated computational and data grid that provides access to a wide variety of resources to users from around the world. NYS Grid can be accessed via a Web portal, where the users have access to their data sets and applications, but do not need to be made aware of the details of the data storage or computational devices that are specifically employed in solving their problems. Grid-enabled versions of the SnB and BnP programs, which implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination, respectively, have been deployed on NYS Grid. Further, through the Grid Portal, SnB has been run simultaneously on all computational resources on NYS Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.


2007 ◽  
Vol 40 (5) ◽  
pp. 938-944 ◽  
Author(s):  
Russ Miller ◽  
Naimesh Shah ◽  
Mark L. Green ◽  
William Furey ◽  
Charles M. Weeks

Computational and data grids represent an emerging technology that allows geographically and organizationally distributed resources (e.g. computing and storage resources) to be linked and accessed in a fashion that is transparent to the user, presenting an extension of the desktop for users whose computational, data and visualization needs extend beyond their local systems. The New York State Grid is an integrated computational and data grid that provides web-based access for users from around the world to computational, application and data storage resources. This grid is used in a ubiquitous fashion, where the users have virtual access to their data sets and applications, but do not need to be made aware of the details of the data storage or computational devices that are specifically employed. Two of the applications that users worldwide have access to on a variety of grids, including the New York State Grid, are the SnB and BnP programs, which implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination, respectively. In particular, through our grid portal (i.e. logging on to a web site), SnB has been run simultaneously on all computational resources on the New York State Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.


2007 ◽  
Vol 15 (4) ◽  
pp. 249-268 ◽  
Author(s):  
Gurmeet Singh ◽  
Karan Vahi ◽  
Arun Ramakrishnan ◽  
Gaurang Mehta ◽  
Ewa Deelman ◽  
...  

In this paper we examine the issue of optimizing disk usage when scheduling large-scale, data-intensive scientific workflows onto distributed resources with limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce their overall data footprint. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although a 48% reduction in the data footprint of Montage can be achieved with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
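The runtime cleanup idea described in this abstract can be illustrated with a reference-counting sketch: each file is tagged with the number of tasks that still need it, and a cleanup step deletes the file once that count reaches zero. This is a minimal illustration of the general technique, not the authors' actual scheduler; the function and task representation are hypothetical.

```python
from collections import defaultdict

def schedule_with_cleanup(tasks):
    """Run tasks in order, deleting each data file once no remaining
    task needs it. `tasks` is a list of (name, inputs, outputs) tuples
    already in a valid topological order."""
    # Count how many not-yet-run tasks consume each file.
    consumers = defaultdict(int)
    for _, inputs, _ in tasks:
        for f in inputs:
            consumers[f] += 1

    live = set()   # files currently on disk
    peak = 0       # peak number of files held simultaneously
    for name, inputs, outputs in tasks:
        live.update(outputs)          # task materialises its outputs
        peak = max(peak, len(live))
        for f in inputs:
            consumers[f] -= 1
            if consumers[f] == 0:     # no later task reads this file
                live.discard(f)       # cleanup job deletes it
    return peak, live
```

For a diamond-shaped workflow whose intermediate file feeds two branches, the cleanup step frees that file as soon as both branches have run, which is why the peak footprint can drop without changing the workflow's structure; the further restructuring the paper describes changes the task order itself to lower the peak even more.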


2012 ◽  
Vol 33 (6) ◽  
pp. 565-571 ◽  
Author(s):  
Valerie B. Haley ◽  
Carole Van Antwerpen ◽  
Boldtsetseg Tserenpuntsag ◽  
Kathleen A. Gase ◽  
Peggy Hazamy ◽  
...  

Objective. To efficiently validate the accuracy of surgical site infection (SSI) data reported to the National Healthcare Safety Network (NHSN) by New York State (NYS) hospitals.

Design. Validation study.

Setting. 176 NYS hospitals.

Methods. NYS Department of Health staff validated the data reported to NHSN by review of a stratified sample of medical records from each hospital. The four strata were (1) SSIs reported to NHSN; (2) records with an indication of infection from diagnosis codes in administrative data but not reported to NHSN as SSIs; (3) records with discordant procedure codes in the NHSN and state data sets; and (4) records not in the other three strata.

Results. A total of 7,059 surgical charts (6% of the procedures reported by hospitals) were reviewed. In stratum 1, 7% of reported SSIs did not meet the criteria for inclusion in NHSN and were subsequently removed. In stratum 2, 24% of records indicated missed SSIs not reported to NHSN, whereas in strata 3 and 4, only 1% of records indicated missed SSIs; these SSIs were subsequently added to NHSN. Also, in stratum 3, 75% of records were not coded for the correct NHSN procedure. Errors were highest for colon data; the NYS colon SSI rate increased by 7.5% as a result of the hospital audits.

Conclusions. Audits are vital for ensuring the accuracy of hospital-acquired infection (HAI) data so that hospital HAI rates can be fairly compared. Use of administrative data increased the efficiency of identifying problems in hospitals' SSI surveillance that caused SSIs to go unreported and caused errors in denominator data.
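The four sampling strata in the Methods section amount to a simple classification rule over surgical records. A sketch of that rule is below; the field names are hypothetical, chosen only to mirror the criteria stated in the abstract.

```python
def assign_stratum(record):
    """Assign a surgical record to one of the four validation strata
    described in the abstract (field names are hypothetical)."""
    if record["reported_to_nhsn_as_ssi"]:
        return 1  # SSI reported to NHSN
    if record["admin_codes_suggest_infection"]:
        return 2  # possible missed SSI flagged by administrative data
    if record["nhsn_procedure_code"] != record["state_procedure_code"]:
        return 3  # discordant procedure codes between NHSN and state data
    return 4      # all remaining records
```

Sampling within each stratum, rather than uniformly, is what makes the audit efficient: the records most likely to carry errors (strata 1-3) are reviewed at a much higher rate than the large residual stratum 4.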


2006 ◽  
Vol 14 (3-4) ◽  
pp. 195-208
Author(s):  
J.S. Pahwa ◽  
A.C. Jones ◽  
R.J. White ◽  
M. Burgess ◽  
W.A. Gray ◽  
...  

In the Biodiversity World (BDW) project we have created a flexible and extensible Web Services-based Grid environment for biodiversity researchers to solve problems in biodiversity and analyse biodiversity patterns. In this environment, heterogeneous and globally distributed biodiversity-related resources such as data sets and analytical tools are made available to be accessed and assembled by users into workflows to perform complex scientific experiments. One such experiment is bioclimatic modelling of the geographical distribution of individual species using climate variables in order to explain past and future climate-related changes in species distribution. Data sources and analytical tools required for such analysis of species distribution are widely dispersed, available on heterogeneous platforms, present data in different formats and lack inherent interoperability. The present BDW system brings all these disparate units together so that the user can combine tools with little thought as to their original availability, data formats and interoperability. The new prototype BDW system architecture not only brings together heterogeneous resources but also enables utilisation of computational resources and provides secure access to BDW resources via a federated security model. We describe features of the new BDW system and its security model which enable user authentication from a workflow application as part of workflow execution.


2020 ◽  
Vol 245 ◽  
pp. 03005
Author(s):  
Pascal Paschos ◽  
Benedikt Riedel ◽  
Mats Rynge ◽  
Lincoln Bryant ◽  
Judith Stephen ◽  
...  

In this paper we showcase Open Science Grid (OSG) support for midscale collaborations, the region of computing and storage scale at which multi-institutional researchers collaborate to execute their science workflows on the grid without having dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the OSG by facilitating access to submission hosts, supporting the deployment of their applications, and meeting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.


2013 ◽  
Vol 18 (4) ◽  
pp. 334-339 ◽  
Author(s):  
Erika E. Scott ◽  
Nicole L. Krupa ◽  
Julie Sorensen ◽  
Paul L. Jenkins

FACETS ◽  
2018 ◽  
Vol 3 (1) ◽  
pp. 326-337 ◽  
Author(s):  
Dewey W. Dunnington ◽  
Ian S. Spooner

Multiparameter data with both spatial and temporal components are critical to advancing the state of environmental science. These data and data collected in the future are most useful when compared with each other and analyzed together, which is often inhibited by inconsistent data formats and a lack of structured documentation provided by researchers and (or) data repositories. In this paper we describe a linked table-based structure that encodes multiparameter spatiotemporal data and their documentation that is both flexible (able to store a wide variety of data sets) and usable (can easily be viewed, edited, and converted to plottable formats). The format is a collection of five tables (Data, Locations, Params, Data Sets, and Columns), on which restrictions are placed to ensure data are represented consistently from multiple sources. These tables can be stored in a variety of ways including spreadsheet files, comma-separated value (CSV) files, JavaScript object notation (JSON) files, databases, or objects in a software environment such as R or Python. A toolkit for users of R statistical software was also developed to facilitate converting data to and from the data format. We have used this format to combine data from multiple sources with minimal metadata loss and to effectively archive and communicate the results of spatiotemporal studies. We believe that this format and associated discussion of data and data storage will facilitate increased synergies between past, present, and future data sets in the environmental science community.
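The five-table structure described above can be sketched concretely: each table is flat (and therefore serialises directly to CSV or a spreadsheet), and consistency comes from requiring that keys in the measurement table resolve against the lookup tables. The column choices below are hypothetical illustrations, not the paper's exact schema.

```python
import csv
import io

# Illustrative sketch of the five linked tables (Data Sets, Locations,
# Params, Columns, Data); column choices here are hypothetical.
tables = {
    "data_sets": [{"dataset": "lake_survey"}],
    "locations": [{"dataset": "lake_survey", "location": "core_1"}],
    "params":    [{"dataset": "lake_survey", "param": "Pb", "unit": "ppm"}],
    "columns":   [{"dataset": "lake_survey", "table": "data",
                   "column": "value", "type": "double"}],
    "data":      [{"dataset": "lake_survey", "location": "core_1",
                   "param": "Pb", "depth_cm": 5, "value": 12.3}],
}

def row_is_valid(row, tables):
    """A measurement row is consistent only if its (dataset, location)
    and (dataset, param) keys resolve against the lookup tables."""
    locations = {(r["dataset"], r["location"]) for r in tables["locations"]}
    params = {(r["dataset"], r["param"]) for r in tables["params"]}
    return ((row["dataset"], row["location"]) in locations and
            (row["dataset"], row["param"]) in params)

def table_to_csv(rows):
    """Each table is flat, so it serialises directly to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because each table is an ordinary flat table, the same structure maps naturally onto spreadsheet sheets, JSON arrays, or database tables, which is the flexibility the format is designed around.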


2021 ◽  
Vol 6 ◽  
pp. 355
Author(s):  
Helen Buckley Woods ◽  
Stephen Pinfield

Background: Numerous mechanisms exist to incentivise researchers to share their data. This scoping review aims to identify and summarise evidence of the efficacy of different interventions to promote open data practices and provide an overview of current research. Methods: This scoping review is based on data identified from Web of Science and LISTA, limited from 2016 to 2021. A total of 1128 papers were screened, with 38 items being included. Items were selected if they focused on designing or evaluating an intervention or presenting an initiative to incentivise sharing. Items comprised a mixture of research papers, opinion pieces and descriptive articles. Results: Seven major themes in the literature were identified: publisher/journal data sharing policies, metrics, software solutions, research data sharing agreements in general, open science ‘badges’, funder mandates, and initiatives. Conclusions: A number of key messages for data sharing include: the need to build on existing cultures and practices, meeting people where they are and tailoring interventions to support them; the importance of publicising and explaining the policy/service widely; the need to have disciplinary data champions to model good practice and drive cultural change; the requirement to resource interventions properly; and the imperative to provide robust technical infrastructure and protocols, such as labelling of data sets, use of DOIs, data standards and use of data repositories.

