FAIRSCAPE: A Framework for FAIR and Reproducible Biomedical Analytics

2020 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis consists of accessible data and software with runtime parameters, environment, and personnel involved. Evidence graphs, a derivation of argumentation frameworks adapted to biological science, can provide this disclosure as machine-readable metadata, resolvable from persistent identifiers, for computationally generated graphs, images, or tables that can be archived and cited in a publication under a persistent ID. We have built a cloud-based computational research commons for predictive analytics on biomedical time series datasets, with hundreds of algorithms and thousands of computations, using a reusable computational framework we call FAIRSCAPE. FAIRSCAPE computes a complete chain of evidence for every result, including software, computations, and datasets. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves the provenance graph across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software. FAIRSCAPE is a reusable computational framework, enabling simplified access to modern scalable cloud-based components. It fully implements the FAIR data principles and extends them to provide FAIR Evidence, including provenance of datasets, software, and computations, as metadata for all computed results.
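The chain of evidence described in the abstract can be pictured as a small directed graph walk: starting from a result's persistent identifier, traverse back through the computation that produced it to the data and software it used. The sketch below is a minimal illustration of that idea; the class names, PID scheme, and fields are hypothetical, not FAIRSCAPE's actual API.

```python
# Minimal sketch of walking a chain of evidence for one computed result.
# Node kinds and "ark:/..." identifiers are illustrative placeholders.
from dataclasses import dataclass, field


@dataclass
class Node:
    pid: str                                   # persistent identifier
    kind: str                                  # "Dataset", "Software", or "Computation"
    used: list = field(default_factory=list)   # PIDs this node depends on


def evidence_chain(graph, root_pid):
    """Collect every PID a result depends on, transitively."""
    seen, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.append(pid)
        stack.extend(graph[pid].used)
    return seen


# One computation that produced a result from one dataset and one program:
graph = {
    "ark:/result1": Node("ark:/result1", "Dataset", used=["ark:/comp1"]),
    "ark:/comp1": Node("ark:/comp1", "Computation", used=["ark:/data1", "ark:/sw1"]),
    "ark:/data1": Node("ark:/data1", "Dataset"),
    "ark:/sw1": Node("ark:/sw1", "Software"),
}
chain = evidence_chain(graph, "ark:/result1")
# chain now lists the result, the computation, and both of its inputs.
```

Because every node carries a persistent ID, the same walk works whether the computations ran together or were separated in time, which is the point the abstract makes about multi-step analyses.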

2021 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves often can be very large scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result’s metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
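The abstract's key mechanism, storing a URI to the root of the evidence graph inside the result's own metadata, can be sketched as a two-hop PID resolution. The registry contents and field names below are invented for illustration; they are not FAIRSCAPE's actual metadata schema.

```python
# Toy PID registry: a result's metadata points at its evidence graph,
# and the graph's metadata points at the software and data used.
# All identifiers and field names here are hypothetical placeholders.
registry = {
    "ark:/result1": {
        "name": "predicted risk scores",
        "evidenceGraph": "ark:/result1/evidence",
    },
    "ark:/result1/evidence": {
        "usedSoftware": "ark:/sw1",
        "usedDataset": "ark:/data1",
    },
}


def resolve(pid):
    """Resolve a persistent ID to its metadata record."""
    return registry[pid]


# A client holding only the result's PID can reach its full evidence:
evidence = resolve(resolve("ark:/result1")["evidenceGraph"])
```

The design choice this illustrates is that the evidence graph is itself an identified, resolvable object, so it can be archived and cited independently of the result it supports.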


2021 ◽  
Author(s):  
Sadnan Al Manir ◽  
Justin Niestroy ◽  
Maxwell Adam Levinson ◽  
Timothy Clark

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them, and it is essential for access to, assessment of, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, one that supports defeasible reasoning, has been absent.

Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software with important concepts from provenance models and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a directed acyclic graph (DAG) of the computations, software, data, and agents that produced it.

Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed.

Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects, and they reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.
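The "challenge and support propagation" discussed above has a simple operational reading: a result stands only as long as nothing in its upstream evidence DAG has been challenged. The toy function below illustrates that semantics; it is a sketch of the idea, not EVI's actual OWL 2 inference rules.

```python
# Toy challenge propagation over an evidence DAG, in the spirit of
# defeasible reasoning: any challenge upstream defeats the result.
def is_undefeated(evidence, challenged, result):
    """Return True if no node in the result's evidence chain is challenged."""
    stack, seen = [result], set()
    while stack:
        node = stack.pop()
        if node in challenged:
            return False          # a challenge anywhere upstream defeats it
        if node in seen:
            continue
        seen.add(node)
        stack.extend(evidence.get(node, []))
    return True


# result <- computation <- {dataset, software}
evidence = {
    "result": ["computation"],
    "computation": ["dataset", "software"],
}
assert is_undefeated(evidence, challenged=set(), result="result")
# Retracting the dataset defeats every result derived from it:
assert not is_undefeated(evidence, {"dataset"}, "result")
```

This is what makes the assertions "defeasible": support is provisional, and a later challenge to any component, software, data, or agent, propagates to everything downstream of it.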


1969 ◽  
Vol 08 (01) ◽  
pp. 07-11 ◽  
Author(s):  
H. B. Newcombe

Methods are described for deriving personal and family histories of birth, marriage, procreation, ill health and death, for large populations, from existing civil registrations of vital events and the routine records of ill health. Computers have been used to group together and »link« the separately derived records pertaining to successive events in the lives of the same individuals and families, rapidly and on a large scale. Most of the records employed are already available as machine-readable punchcards and magnetic tapes, for statistical and administrative purposes, and only minor modifications have been made to the manner in which these are produced.

As applied to the population of the Canadian province of British Columbia (currently about 2 million people) these methods have already yielded substantial information on the risks of disease: a) in the population, b) in relation to various parental characteristics, and c) as correlated with previous occurrences in the family histories.
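The core grouping step described above, linking separately derived event records into one personal history, can be sketched as grouping records that agree on identifying fields. Real record linkage in this tradition uses probabilistic agreement weights across many partially reliable fields; the deterministic key below is only an illustration, and the field names are invented.

```python
# Minimal sketch of linking vital-event records into personal histories.
# A deterministic (surname, birth_year) key stands in for the weighted
# probabilistic matching used in practice.
from collections import defaultdict


def link_records(records, key_fields=("surname", "birth_year")):
    """Group event records that share the same identifying key."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        groups[key].append(rec["event"])
    return dict(groups)


records = [
    {"surname": "SMITH", "birth_year": 1921, "event": "birth"},
    {"surname": "SMITH", "birth_year": 1921, "event": "marriage"},
    {"surname": "JONES", "birth_year": 1930, "event": "birth"},
]
linked = link_records(records)
# The two SMITH/1921 records group into one history of successive events.
```

In practice a key this crude both over-links (different people sharing a name and year) and under-links (spelling variants), which is exactly why probabilistic weighting was needed at population scale.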


Author(s):  
Mark Newman

An introduction to the mathematics of the Poisson random graph, the simplest model of a random network. The chapter starts with a definition of the model, followed by derivations of basic properties like the mean degree, degree distribution, and clustering coefficient. This is followed with a detailed derivation of the large-scale structural properties of random graphs, including the position of the phase transition at which a giant component appears, the size of the giant component, the average size of the small components, and the expected diameter of the network. The chapter ends with a discussion of some of the shortcomings of the random graph model.
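The giant-component result summarized above has a compact numerical form: in the standard treatment, the fraction S of the network occupied by the giant component satisfies the self-consistency condition S = 1 − exp(−cS), where c is the mean degree, with S = 0 the only solution below the phase transition at c = 1. A short fixed-point iteration recovers S:

```python
# Fixed-point iteration for the giant-component fraction S of a
# Poisson random graph with mean degree c, from S = 1 - exp(-c*S).
import math


def giant_component_fraction(c, iterations=200):
    s = 0.5                       # any starting guess in (0, 1]
    for _ in range(iterations):
        s = 1.0 - math.exp(-c * s)
    return s


# Below the phase transition (c < 1) the iteration collapses to S = 0;
# at c = 2 the giant component covers roughly 80% of the network.
s_sub = giant_component_fraction(0.5)
s_super = giant_component_fraction(2.0)
```

The iteration converges because the map S ↦ 1 − exp(−cS) is a contraction toward the relevant fixed point for these values of c.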


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Mohammadreza Yaghoobi ◽  
Krzysztof S. Stopka ◽  
Aaditya Lakshmanan ◽  
Veera Sundararaghavan ◽  
John E. Allison ◽  
...  

Abstract: The PRISMS-Fatigue open-source framework for simulation-based analysis of microstructural influences on fatigue resistance in polycrystalline metals and alloys is presented here. The framework uses the crystal plasticity finite element method as its microstructure analysis tool and provides a highly efficient, scalable, flexible, and easy-to-use ICME community platform. The PRISMS-Fatigue framework is linked to different open-source software to instantiate microstructures, compute the material response, and assess fatigue indicator parameters. The performance of PRISMS-Fatigue is benchmarked against a similar framework implemented using ABAQUS. Results indicate that the multilevel parallelism scheme of PRISMS-Fatigue is more efficient and scalable than ABAQUS for large-scale fatigue simulations. The performance and flexibility of this framework are demonstrated with various examples that assess the driving force for fatigue crack formation in microstructures with different crystallographic textures, grain morphologies, and grain numbers, and under different multiaxial strain states, strain magnitudes, and boundary conditions.


2021 ◽  
Vol 22 (14) ◽  
pp. 7590
Author(s):  
Liza Vinhoven ◽  
Frauke Stanke ◽  
Sylvia Hafkemeyer ◽  
Manuel Manfred Nietert

Different causative therapeutics for cystic fibrosis (CF) patients have been developed. There are still no mutation-specific therapeutics for some patients, especially those with rare CFTR mutations. For this purpose, high-throughput screens have been performed, which yield various candidate compounds with mostly unclear modes of action. In order to elucidate the mechanism of action of promising candidate substances, and to be able to predict possible synergistic effects of substance combinations, we used a systems biology approach to create a model of the CFTR maturation pathway in cells in a standardized, human- and machine-readable format. It is composed of a core map, manually curated from small-scale experiments in human cells, and a coarse map including interactors identified in large-scale efforts. The manually curated core map includes 170 different molecular entities and 156 reactions from 221 publications. The coarse map encompasses 1384 unique proteins from four publications. The overlap between the two data sources amounts to 46 proteins. The CFTR Lifecycle Map can be used to support the identification of potential targets inside the cell and to elucidate the mode of action of candidate substances. It thereby provides a backbone to structure available data, as well as a tool to develop hypotheses regarding novel therapeutics.
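The relationship between the two data sources, a small manually curated core map and a large coarse interactor map with 46 proteins in common, reduces to simple set operations over protein identifiers. The entries below are placeholders chosen for illustration, not the map's actual contents.

```python
# Combining a curated core map and a large-scale coarse map as sets of
# protein identifiers; the entries here are illustrative placeholders.
core_map = {"CFTR", "HSPA8", "CANX", "DNAJB1"}
coarse_map = {"CFTR", "CANX", "SEC24A", "DERL1", "RNF5"}

shared = core_map & coarse_map      # proteins present in both sources
core_only = core_map - coarse_map   # supported only by small-scale evidence
```

Keeping the two provenance classes separable like this is what lets the map distinguish well-characterized interactions from those known only from large-scale screens.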


Author(s):  
Natasha Balac ◽  
Tamara Sipes ◽  
Nicole Wolter ◽  
Kenneth Nunes ◽  
Bob Sinkovits ◽  
...  

2021 ◽  
pp. 004728752110247
Author(s):  
Vinh Bui ◽  
Ali Reza Alaei ◽  
Huy Quan Vu ◽  
Gang Li ◽  
Rob Law

Understanding and being able to measure, analyze, compare, and contrast the image of a tourism destination, also known as tourism destination image (TDI), is critical in tourism management and destination marketing. Although various methodologies have been developed, a consistent, reliable, and scalable method for measuring TDI is still unavailable. This study aims to address the challenge by proposing a framework for a holistic measure of TDI in four dimensions: popularity, sentiment, time, and location. A structural model for TDI measurement that covers various aspects of a tourism destination is developed. TDI is then measured by a comprehensive computational framework that can analyze complex textual and visual data on a large scale. A case study using more than 30,000 images and 10,000 comments relating to three tourism destinations in Australia demonstrates the effectiveness of the proposed framework.
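Two of the four dimensions named above, popularity and sentiment, can be sketched as a per-destination aggregation over social media posts. The scoring below (popularity as mention volume, sentiment as a mean score) is a deliberately simplified stand-in for the paper's structural model, and the data is invented.

```python
# Toy per-destination aggregation of popularity (mention volume) and
# mean sentiment; the posts and scheme are illustrative only.
from collections import defaultdict
from statistics import mean

posts = [
    {"destination": "Sydney", "sentiment": 0.8, "month": "2020-01"},
    {"destination": "Sydney", "sentiment": 0.6, "month": "2020-02"},
    {"destination": "Cairns", "sentiment": 0.9, "month": "2020-01"},
]


def tdi_profile(posts):
    """Aggregate posts into per-destination popularity and mean sentiment."""
    raw = defaultdict(list)
    for p in posts:
        raw[p["destination"]].append(p["sentiment"])
    return {dest: {"popularity": len(scores), "sentiment": mean(scores)}
            for dest, scores in raw.items()}


profile = tdi_profile(posts)
```

Grouping by the `month` field instead of (or in addition to) destination would give the time dimension; the location dimension would likewise come from geotags rather than the single region label used here.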


2019 ◽  
Author(s):  
Anna Danese ◽  
Maria L. Richter ◽  
David S. Fischer ◽  
Fabian J. Theis ◽  
Maria Colomé-Tatché

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction, and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq, and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.
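One common feature-space construction for such data is to aggregate per-base epigenetic signal into fixed-width genomic windows and binarize the result into an open/closed (or methylated/unmethylated) matrix. The toy function below illustrates that idea for a single cell; it is not epiScanpy's actual implementation.

```python
# Toy windowed feature construction for one cell's epigenetic signal:
# count events per fixed-width genomic window, then binarize.
def windowed_features(positions, window=1000, n_windows=3):
    """Binarized per-window presence of signal for one cell."""
    counts = [0] * n_windows
    for pos in positions:
        idx = pos // window
        if idx < n_windows:
            counts[idx] += 1
    return [1 if c > 0 else 0 for c in counts]


# One cell with open-chromatin reads at three genomic positions:
features = windowed_features([120, 980, 2500])
# Windows 0 and 2 carry signal; window 1 does not -> [1, 0, 1]
```

Stacking one such row per cell yields the cell-by-feature matrix on which the usual scanpy-style clustering, dimension reduction, and trajectory workflows can then operate.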

