FAIRSCAPE: A Framework for FAIR and Reproducible Biomedical Analytics

2020 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis consists of accessible data and software with runtime parameters, environment, and personnel involved. Evidence graphs, a derivation of argumentation frameworks adapted to biological science, can provide this disclosure as machine-readable metadata, resolvable from persistent identifiers, for computationally generated graphs, images, or tables that can be archived and cited in a publication under a persistent ID. We have built a cloud-based computational research commons for predictive analytics on biomedical time series datasets, with hundreds of algorithms and thousands of computations, using a reusable computational framework we call FAIRSCAPE. FAIRSCAPE computes a complete chain of evidence for every result, including software, computations, and datasets. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves the provenance graph across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software. FAIRSCAPE is a reusable computational framework, enabling simplified access to modern scalable cloud-based components. It fully implements the FAIR data principles and extends them to provide FAIR Evidence, including provenance of datasets, software, and computations, as metadata for all computed results.
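The chain of evidence described in the abstract can be pictured as a small directed graph walk: starting from a result's persistent identifier, traverse back through the computation that produced it to the data and software it used. The sketch below is a minimal illustration of that idea; the class names, PID scheme, and fields are hypothetical, not FAIRSCAPE's actual API.

```python
# Minimal sketch of walking a chain of evidence for one computed result.
# Node kinds and "ark:/..." identifiers are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class Node:
    pid: str                                   # persistent identifier
    kind: str                                  # "Dataset", "Software", or "Computation"
    used: list = field(default_factory=list)   # PIDs this node depends on


def evidence_chain(graph, root_pid):
    """Collect every PID a result depends on, transitively."""
    seen, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.append(pid)
        stack.extend(graph[pid].used)
    return seen


# One computation that produced a result from one dataset and one program:
graph = {
    "ark:/result1": Node("ark:/result1", "Dataset", used=["ark:/comp1"]),
    "ark:/comp1": Node("ark:/comp1", "Computation", used=["ark:/data1", "ark:/sw1"]),
    "ark:/data1": Node("ark:/data1", "Dataset"),
    "ark:/sw1": Node("ark:/sw1", "Software"),
}
chain = evidence_chain(graph, "ark:/result1")
# chain now lists the result, the computation, and both of its inputs.
```

Because every node carries a persistent ID, the same walk works whether the computations ran together or were separated in time, which is the point the abstract makes about multi-step analyses.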

2021 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result’s metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
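The abstract's key mechanism, storing a URI to the root of the evidence graph inside the result's own metadata, can be sketched as a two-hop PID resolution. The registry contents and field names below are invented for illustration; they are not FAIRSCAPE's actual metadata schema.

```python
# Toy PID registry: a result's metadata points at its evidence graph,
# and the graph's metadata points at the software and data used.
# All identifiers and field names here are hypothetical placeholders.
registry = {
    "ark:/result1": {
        "name": "predicted risk scores",
        "evidenceGraph": "ark:/result1/evidence",
    },
    "ark:/result1/evidence": {
        "usedSoftware": "ark:/sw1",
        "usedDataset": "ark:/data1",
    },
}


def resolve(pid):
    """Resolve a persistent ID to its metadata record."""
    return registry[pid]


# A client holding only the result's PID can reach its full evidence:
evidence = resolve(resolve("ark:/result1")["evidenceGraph"])
```

The design choice this illustrates is that the evidence graph is itself an identified, resolvable object, so it can be archived and cited independently of the result it supports.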


2021 ◽  
Author(s):  
Sadnan Al Manir ◽  
Justin Niestroy ◽  
Maxwell Adam Levinson ◽  
Timothy Clark

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them, and it is essential for access to, assessment of, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, one that supports defeasible reasoning, has been absent.

Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software with important concepts from provenance models and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a directed acyclic graph (DAG) of the computations, software, data, and agents that produced it.

Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed.

Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects, and they reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.
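The "challenge and support propagation" discussed above has a simple operational reading: a result stands only as long as nothing in its upstream evidence DAG has been challenged. The toy function below illustrates that semantics; it is a sketch of the idea, not EVI's actual OWL 2 inference rules.

```python
# Toy challenge propagation over an evidence DAG, in the spirit of
# defeasible reasoning: any challenge upstream defeats the result.
def is_undefeated(evidence, challenged, result):
    """Return True if no node in the result's evidence chain is challenged."""
    stack, seen = [result], set()
    while stack:
        node = stack.pop()
        if node in challenged:
            return False          # a challenge anywhere upstream defeats it
        if node in seen:
            continue
        seen.add(node)
        stack.extend(evidence.get(node, []))
    return True


# result <- computation <- {dataset, software}
evidence = {
    "result": ["computation"],
    "computation": ["dataset", "software"],
}
assert is_undefeated(evidence, challenged=set(), result="result")
# Retracting the dataset defeats every result derived from it:
assert not is_undefeated(evidence, {"dataset"}, "result")
```

This is what makes the assertions "defeasible": support is provisional, and a later challenge to any component, software, data, or agent, propagates to everything downstream of it.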


1969 ◽  
Vol 08 (01) ◽  
pp. 07-11 ◽  
Author(s):  
H. B. Newcombe

Methods are described for deriving personal and family histories of birth, marriage, procreation, ill health and death, for large populations, from existing civil registrations of vital events and the routine records of ill health. Computers have been used to group together and »link« the separately derived records pertaining to successive events in the lives of the same individuals and families, rapidly and on a large scale. Most of the records employed are already available as machine-readable punchcards and magnetic tapes, for statistical and administrative purposes, and only minor modifications have been made to the manner in which these are produced.

As applied to the population of the Canadian province of British Columbia (currently about 2 million people) these methods have already yielded substantial information on the risks of disease: a) in the population, b) in relation to various parental characteristics, and c) as correlated with previous occurrences in the family histories.
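The core grouping step described above, linking separately derived event records into one personal history, can be sketched as grouping records that agree on identifying fields. Real record linkage in this tradition uses probabilistic agreement weights across many partially reliable fields; the deterministic key below is only an illustration, and the field names are invented.

```python
# Minimal sketch of linking vital-event records into personal histories.
# A deterministic (surname, birth_year) key stands in for the weighted
# probabilistic matching used in practice.
from collections import defaultdict


def link_records(records, key_fields=("surname", "birth_year")):
    """Group event records that share the same identifying key."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        groups[key].append(rec["event"])
    return dict(groups)


records = [
    {"surname": "SMITH", "birth_year": 1921, "event": "birth"},
    {"surname": "SMITH", "birth_year": 1921, "event": "marriage"},
    {"surname": "JONES", "birth_year": 1930, "event": "birth"},
]
linked = link_records(records)
# The two SMITH/1921 records group into one history of successive events.
```

In practice a key this crude both over-links (different people sharing a name and year) and under-links (spelling variants), which is exactly why probabilistic weighting was needed at population scale.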


Author(s):  
Mark Newman

An introduction to the mathematics of the Poisson random graph, the simplest model of a random network. The chapter starts with a definition of the model, followed by derivations of basic properties like the mean degree, degree distribution, and clustering coefficient. This is followed with a detailed derivation of the large-scale structural properties of random graphs, including the position of the phase transition at which a giant component appears, the size of the giant component, the average size of the small components, and the expected diameter of the network. The chapter ends with a discussion of some of the shortcomings of the random graph model.
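The giant-component result summarized above has a compact numerical form: in the standard treatment, the fraction S of the network occupied by the giant component satisfies the self-consistency condition S = 1 − exp(−cS), where c is the mean degree, with S = 0 the only solution below the phase transition at c = 1. A short fixed-point iteration recovers S:

```python
# Fixed-point iteration for the giant-component fraction S of a
# Poisson random graph with mean degree c, from S = 1 - exp(-c*S).
import math


def giant_component_fraction(c, iterations=200):
    s = 0.5                       # any starting guess in (0, 1]
    for _ in range(iterations):
        s = 1.0 - math.exp(-c * s)
    return s


# Below the phase transition (c < 1) the iteration collapses to S = 0;
# at c = 2 the giant component covers roughly 80% of the network.
s_sub = giant_component_fraction(0.5)
s_super = giant_component_fraction(2.0)
```

The iteration converges because the map S ↦ 1 − exp(−cS) is a contraction toward the relevant fixed point for these values of c.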


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Mohammadreza Yaghoobi ◽  
Krzysztof S. Stopka ◽  
Aaditya Lakshmanan ◽  
Veera Sundararaghavan ◽  
John E. Allison ◽  
...  

Abstract: The PRISMS-Fatigue open-source framework for simulation-based analysis of microstructural influences on fatigue resistance in polycrystalline metals and alloys is presented here. The framework uses the crystal plasticity finite element method as its microstructure analysis tool and provides a highly efficient, scalable, flexible, and easy-to-use ICME community platform. The PRISMS-Fatigue framework is linked to different open-source software to instantiate microstructures, compute the material response, and assess fatigue indicator parameters. The performance of PRISMS-Fatigue is benchmarked against a similar framework implemented using ABAQUS. Results indicate that the multilevel parallelism scheme of PRISMS-Fatigue is more efficient and scalable than ABAQUS for large-scale fatigue simulations. The performance and flexibility of this framework are demonstrated with various examples that assess the driving force for fatigue crack formation in microstructures with different crystallographic textures, grain morphologies, and grain numbers, and under different multiaxial strain states, strain magnitudes, and boundary conditions.


2021 ◽  
Vol 22 (14) ◽  
pp. 7590
Author(s):  
Liza Vinhoven ◽  
Frauke Stanke ◽  
Sylvia Hafkemeyer ◽  
Manuel Manfred Nietert

Different causative therapeutics for cystic fibrosis (CF) patients have been developed. There are still no mutation-specific therapeutics for some patients, especially those with rare CFTR mutations. For this purpose, high-throughput screens have been performed, which yield various candidate compounds with mostly unclear modes of action. In order to elucidate the mechanism of action of promising candidate substances, and to be able to predict possible synergistic effects of substance combinations, we used a systems biology approach to create a model of the CFTR maturation pathway in cells in a standardized, human- and machine-readable format. It is composed of a core map, manually curated from small-scale experiments in human cells, and a coarse map including interactors identified in large-scale efforts. The manually curated core map includes 170 different molecular entities and 156 reactions from 221 publications. The coarse map encompasses 1384 unique proteins from four publications. The overlap between the two data sources amounts to 46 proteins. The CFTR Lifecycle Map can be used to support the identification of potential targets inside the cell and to elucidate the mode of action of candidate substances. It thereby provides a backbone to structure available data, as well as a tool to develop hypotheses regarding novel therapeutics.
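The relationship between the two data sources, a small manually curated core map and a large coarse interactor map with 46 proteins in common, reduces to simple set operations over protein identifiers. The entries below are placeholders chosen for illustration, not the map's actual contents.

```python
# Combining a curated core map and a large-scale coarse map as sets of
# protein identifiers; the entries here are illustrative placeholders.
core_map = {"CFTR", "HSPA8", "CANX", "DNAJB1"}
coarse_map = {"CFTR", "CANX", "SEC24A", "DERL1", "RNF5"}

shared = core_map & coarse_map      # proteins present in both sources
core_only = core_map - coarse_map   # supported only by small-scale evidence
```

Keeping the two provenance classes separable like this is what lets the map distinguish well-characterized interactions from those known only from large-scale screens.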


Author(s):  
Natasha Balac ◽  
Tamara Sipes ◽  
Nicole Wolter ◽  
Kenneth Nunes ◽  
Bob Sinkovits ◽  
...  

2021 ◽  
pp. 004728752110247
Author(s):  
Vinh Bui ◽  
Ali Reza Alaei ◽  
Huy Quan Vu ◽  
Gang Li ◽  
Rob Law

Understanding and being able to measure, analyze, compare, and contrast the image of a tourism destination, also known as tourism destination image (TDI), is critical in tourism management and destination marketing. Although various methodologies have been developed, a consistent, reliable, and scalable method for measuring TDI is still unavailable. This study aims to address the challenge by proposing a framework for a holistic measure of TDI in four dimensions: popularity, sentiment, time, and location. A structural model for TDI measurement that covers various aspects of a tourism destination is developed. TDI is then measured by a comprehensive computational framework that can analyze complex textual and visual data on a large scale. A case study using more than 30,000 images and 10,000 comments relating to three tourism destinations in Australia demonstrates the effectiveness of the proposed framework.
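Two of the four dimensions named above, popularity and sentiment, can be sketched as a per-destination aggregation over social media posts. The scoring below (popularity as mention volume, sentiment as a mean score) is a deliberately simplified stand-in for the paper's structural model, and the data is invented.

```python
# Toy per-destination aggregation of popularity (mention volume) and
# mean sentiment; the posts and scheme are illustrative only.
from collections import defaultdict
from statistics import mean

posts = [
    {"destination": "Sydney", "sentiment": 0.8, "month": "2020-01"},
    {"destination": "Sydney", "sentiment": 0.6, "month": "2020-02"},
    {"destination": "Cairns", "sentiment": 0.9, "month": "2020-01"},
]


def tdi_profile(posts):
    """Aggregate posts into per-destination popularity and mean sentiment."""
    raw = defaultdict(list)
    for p in posts:
        raw[p["destination"]].append(p["sentiment"])
    return {dest: {"popularity": len(scores), "sentiment": mean(scores)}
            for dest, scores in raw.items()}


profile = tdi_profile(posts)
```

Grouping by the `month` field instead of (or in addition to) destination would give the time dimension; the location dimension would likewise come from geotags rather than the single region label used here.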


2019 ◽  
Author(s):  
Anna Danese ◽  
Maria L. Richter ◽  
David S. Fischer ◽  
Fabian J. Theis ◽  
Maria Colomé-Tatché

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction, and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq, and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.
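One common feature-space construction for such data is to aggregate per-base epigenetic signal into fixed-width genomic windows and binarize the result into an open/closed (or methylated/unmethylated) matrix. The toy function below illustrates that idea for a single cell; it is not epiScanpy's actual implementation.

```python
# Toy windowed feature construction for one cell's epigenetic signal:
# count events per fixed-width genomic window, then binarize.
def windowed_features(positions, window=1000, n_windows=3):
    """Binarized per-window presence of signal for one cell."""
    counts = [0] * n_windows
    for pos in positions:
        idx = pos // window
        if idx < n_windows:
            counts[idx] += 1
    return [1 if c > 0 else 0 for c in counts]


# One cell with open-chromatin reads at three genomic positions:
features = windowed_features([120, 980, 2500])
# Windows 0 and 2 carry signal; window 1 does not -> [1, 0, 1]
```

Stacking one such row per cell yields the cell-by-feature matrix on which the usual scanpy-style clustering, dimension reduction, and trajectory workflows can then operate.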

