FAIRSCAPE: a Framework for FAIR and Reproducible Biomedical Analytics

2021 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can often be very large in scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description, but also a formal record of the computations which produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework enabling simplified access to modern scalable cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata, resolvable to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result’s metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software.
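
As a rough illustration of the evidence-graph metadata described above, the sketch below builds a single result record that points, via persistent identifiers, to the computation, software, and dataset behind it. The identifiers, key names, and layout are hypothetical and are not FAIRSCAPE's actual schema.

```python
# Minimal sketch of an evidence-graph metadata record (hypothetical schema;
# FAIRSCAPE's actual JSON-LD layout and identifier scheme may differ).
result_metadata = {
    "@id": "ark:99999/example-result",           # hypothetical persistent ID
    "@type": "Dataset",
    "name": "Risk scores for cohort X",
    "evi:wasGeneratedBy": {                      # hypothetical EVI-style terms
        "@id": "ark:99999/example-computation",
        "@type": "Computation",
        "usedSoftware": {"@id": "ark:99999/example-software", "@type": "Software"},
        "usedDataset":  {"@id": "ark:99999/example-input",    "@type": "Dataset"},
    },
    "evidenceGraph": "ark:99999/example-result/evidence-graph",  # URI to graph root
}

def upstream_ids(record):
    """Collect the persistent IDs of the computation, software, and dataset
    that support this result (one level of the evidence chain)."""
    comp = record["evi:wasGeneratedBy"]
    return [comp["@id"], comp["usedSoftware"]["@id"], comp["usedDataset"]["@id"]]

print(upstream_ids(result_metadata))
```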

2020 ◽  
Author(s):  
Maxwell Adam Levinson ◽  
Justin Niestroy ◽  
Sadnan Al Manir ◽  
Karen Fairchild ◽  
Douglas E. Lake ◽  
...  

Abstract: Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can often be very large in scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis consists of accessible data and software with runtime parameters, environment, and personnel involved. Evidence graphs, a derivation of argumentation frameworks adapted to biological science, can provide this disclosure as machine-readable metadata resolvable from persistent identifiers for computationally generated graphs, images, or tables, which can be archived and cited in a publication along with a persistent ID. We have built a cloud-based computational research commons for predictive analytics on biomedical time series datasets, with hundreds of algorithms and thousands of computations, using a reusable computational framework we call FAIRSCAPE. FAIRSCAPE computes a complete chain of evidence for every result, including software, computations, and datasets. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves the provenance graph across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects are assigned persistent IDs, including software. All results are annotated with FAIR metadata using the evidence graph model for access, validation, reproducibility, and re-use of archived data and software. FAIRSCAPE is a reusable computational framework enabling simplified access to modern scalable cloud-based components. It fully implements the FAIR data principles and extends them to provide FAIR Evidence, including provenance of datasets, software, and computations, as metadata for all computed results.
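
The "complete chain of evidence" can be pictured as a walk over such identifier links. The toy store and field names below are invented for illustration only; they show how a result's full upstream evidence might be collected, not how FAIRSCAPE itself implements it.

```python
# Toy illustration of assembling a chain of evidence by walking identifier
# links upstream. The record store and field names are invented; they are
# not FAIRSCAPE's actual API or schema.
records = {
    "result:1":      {"type": "Dataset",     "generatedBy": ["computation:1"]},
    "computation:1": {"type": "Computation", "generatedBy": ["software:1", "dataset:0"]},
    "software:1":    {"type": "Software",    "generatedBy": []},
    "dataset:0":     {"type": "Dataset",     "generatedBy": ["computation:0"]},
    "computation:0": {"type": "Computation", "generatedBy": ["software:0"]},
    "software:0":    {"type": "Software",    "generatedBy": []},
}

def evidence_chain(node, store, seen=None):
    """Depth-first walk over upstream links, returning every identifier
    that supports `node` (the evidence graph rooted at the result)."""
    seen = set() if seen is None else seen
    for parent in store[node]["generatedBy"]:
        if parent not in seen:
            seen.add(parent)
            evidence_chain(parent, store, seen)
    return seen

print(sorted(evidence_chain("result:1", records)))
```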


Author(s):  
Mark Newman

An introduction to the mathematics of the Poisson random graph, the simplest model of a random network. The chapter starts with a definition of the model, followed by derivations of basic properties like the mean degree, degree distribution, and clustering coefficient. This is followed with a detailed derivation of the large-scale structural properties of random graphs, including the position of the phase transition at which a giant component appears, the size of the giant component, the average size of the small components, and the expected diameter of the network. The chapter ends with a discussion of some of the shortcomings of the random graph model.
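
For readers who want to see these quantities numerically, the sketch below (using networkx, with illustrative parameters) generates a G(n, p) random graph and compares its mean degree, clustering coefficient, and giant-component size with the standard theoretical values, including the fixed-point equation S = 1 - e^{-cS} for the giant component.

```python
# Numerical illustration of the Poisson (Erdos-Renyi) random graph G(n, p):
# mean degree, clustering coefficient, and giant-component size, compared
# with the standard theoretical predictions. Parameters are illustrative.
import math
import networkx as nx

n, c = 10_000, 2.5          # number of nodes and target mean degree
p = c / (n - 1)             # edge probability giving mean degree c

G = nx.gnp_random_graph(n, p, seed=1)

mean_degree = 2 * G.number_of_edges() / n
clustering = nx.average_clustering(G)          # expected to be ~ c/n, i.e. vanishing
giant = max(nx.connected_components(G), key=len)

# Theory: the giant-component fraction S solves S = 1 - exp(-c*S) (fixed point).
S = 0.5
for _ in range(100):
    S = 1 - math.exp(-c * S)

print(f"mean degree: {mean_degree:.3f} (theory {c})")
print(f"clustering:  {clustering:.5f} (theory ~ {c/n:.5f})")
print(f"giant component: {len(giant)/n:.3f} of nodes (theory {S:.3f})")
```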


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Mohammadreza Yaghoobi ◽  
Krzysztof S. Stopka ◽  
Aaditya Lakshmanan ◽  
Veera Sundararaghavan ◽  
John E. Allison ◽  
...  

Abstract: The PRISMS-Fatigue open-source framework for simulation-based analysis of microstructural influences on fatigue resistance in polycrystalline metals and alloys is presented here. The framework uses the crystal plasticity finite element method as its microstructure analysis tool and provides a highly efficient, scalable, flexible, and easy-to-use ICME community platform. The PRISMS-Fatigue framework is linked to different open-source software to instantiate microstructures, compute the material response, and assess fatigue indicator parameters. The performance of PRISMS-Fatigue is benchmarked against a similar framework implemented using ABAQUS. Results indicate that the multilevel parallelism scheme of PRISMS-Fatigue is more efficient and scalable than ABAQUS for large-scale fatigue simulations. The performance and flexibility of this framework are demonstrated with various examples that assess the driving force for fatigue crack formation in microstructures with different crystallographic textures, grain morphologies, and grain numbers, and under different multiaxial strain states, strain magnitudes, and boundary conditions.
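
As a concrete, hedged example of a fatigue indicator parameter, the sketch below evaluates a Fatemi-Socie-type FIP from synthetic per-element plastic shear strain ranges and peak normal stresses; the constants and the exact FIP formulation used inside PRISMS-Fatigue may differ.

```python
# Illustrative computation of a Fatemi-Socie-type fatigue indicator parameter
# (FIP) from per-element cyclic plastic shear strain range and peak normal
# stress. The constants and input arrays are synthetic; the exact FIP
# formulation used inside PRISMS-Fatigue may differ.
import numpy as np

k, sigma_y = 0.5, 500.0e6              # material constant and yield strength (Pa)

delta_gamma_p = np.array([1.2e-3, 0.8e-3, 2.1e-3])   # max plastic shear strain range
sigma_n_max   = np.array([310e6, 150e6, 420e6])      # peak stress normal to that plane

fip = (delta_gamma_p / 2.0) * (1.0 + k * sigma_n_max / sigma_y)

print("per-element FIP:", fip)
print("critical element:", int(np.argmax(fip)))      # highest driving force for crack formation
```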


2021 ◽  
pp. 004728752110247
Author(s):  
Vinh Bui ◽  
Ali Reza Alaei ◽  
Huy Quan Vu ◽  
Gang Li ◽  
Rob Law

Understanding and being able to measure, analyze, compare, and contrast the image of a tourism destination, also known as tourism destination image (TDI), is critical in tourism management and destination marketing. Although various methodologies have been developed, a consistent, reliable, and scalable method for measuring TDI is still unavailable. This study aims to address the challenge by proposing a framework for a holistic measure of TDI along four dimensions: popularity, sentiment, time, and location. A structural model for TDI measurement that covers various aspects of a tourism destination is developed. TDI is then measured by a comprehensive computational framework that can analyze complex textual and visual data on a large scale. A case study using more than 30,000 images and 10,000 comments relating to three tourism destinations in Australia demonstrates the effectiveness of the proposed framework.
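
A toy sketch of how the four dimensions might be aggregated from raw comments is shown below; the field names and scoring are invented for illustration and are far simpler than the structural model proposed in the study.

```python
# Toy aggregation of the four TDI dimensions (popularity, sentiment, time,
# location) from per-comment records. Field names and the scoring scheme are
# invented for illustration; the paper's structural model is richer.
from collections import defaultdict

comments = [
    {"destination": "Gold Coast", "month": "2020-01", "sentiment": 0.8},
    {"destination": "Gold Coast", "month": "2020-01", "sentiment": 0.4},
    {"destination": "Cairns",     "month": "2020-02", "sentiment": -0.2},
]

tdi = defaultdict(lambda: {"popularity": 0, "sentiment_sum": 0.0})
for c in comments:
    key = (c["destination"], c["month"])          # location and time dimensions
    tdi[key]["popularity"] += 1                   # popularity = volume of mentions
    tdi[key]["sentiment_sum"] += c["sentiment"]

for (dest, month), v in tdi.items():
    mean_sent = v["sentiment_sum"] / v["popularity"]
    print(f"{dest} {month}: popularity={v['popularity']}, sentiment={mean_sent:+.2f}")
```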


2021 ◽  
Author(s):  
Sadnan Al Manir ◽  
Justin Niestroy ◽  
Maxwell Adam Levinson ◽  
Timothy Clark

Introduction: Transparency of computation is a requirement for assessing the validity of computed results and research claims based upon them; and it is essential for access to, assessment, and reuse of computational components. These components may be subject to methodological or other challenges over time. While reference to archived software and/or data is increasingly common in publications, a single machine-interpretable, integrative representation of how results were derived, that supports defeasible reasoning, has been absent. Methods: We developed the Evidence Graph Ontology, EVI, in OWL 2, with a set of inference rules, to provide deep representations of supporting and challenging evidence for computations, services, software, data, and results, across arbitrarily deep networks of computations, in connected or fully distinct processes. EVI integrates FAIR practices on data and software with important concepts from provenance models and argumentation theory. It extends PROV for additional expressiveness, with support for defeasible reasoning. EVI treats any computational result or component of evidence as a defeasible assertion, supported by a DAG of the computations, software, data, and agents that produced it. Results: We have successfully deployed EVI for very-large-scale predictive analytics on clinical time-series data. Every result may reference its own evidence graph as metadata, which can be extended when subsequent computations are executed. Discussion: Evidence graphs support transparency and defeasible reasoning on results. They are first-class computational objects, and reference the datasets and software from which they are derived. They support fully transparent computation, with challenge and support propagation. The EVI approach may be extended to include instruments, animal models, and critical experimental reagents.
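
A minimal sketch of such an evidence graph in RDF is shown below. It relies on standard PROV-O terms (prov:wasGeneratedBy, prov:used); the EVI namespace URL is the one cited above, but the specific property evi:supports is an assumed, illustrative term rather than a confirmed part of the ontology.

```python
# Minimal sketch of an evidence graph in RDF using rdflib. PROV-O terms are
# standard; the `evi:supports` property is an assumption for illustration.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")   # W3C PROV-O
EVI = Namespace("https://w3id.org/EVI#")
EX = Namespace("https://example.org/")           # hypothetical identifiers

g = Graph()
g.bind("prov", PROV)
g.bind("evi", EVI)

result, computation = EX["result1"], EX["computation1"]
software, dataset = EX["softwareA"], EX["datasetB"]

g.add((result, RDF.type, PROV.Entity))
g.add((result, PROV.wasGeneratedBy, computation))   # result derived by the computation
g.add((computation, RDF.type, PROV.Activity))
g.add((computation, PROV.used, software))
g.add((computation, PROV.used, dataset))
g.add((dataset, EVI.supports, result))              # assumed EVI-style support relation

print(g.serialize(format="turtle"))
```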


2019 ◽  
Author(s):  
Anna Danese ◽  
Maria L. Richter ◽  
David S. Fischer ◽  
Fabian J. Theis ◽  
Maria Colomé-Tatché

Abstract: Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction, and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq, and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.
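
Since epiScanpy builds on scanpy, the kind of workflow described above can be sketched with plain scanpy calls on a synthetic binarized cell-by-peak matrix, as below; epiScanpy's own function names are not assumed here.

```python
# Sketch of the scanpy-style workflow the abstract says epiScanpy makes
# available for epigenomic data: clustering and embedding a synthetic
# binarized cell-by-peak ATAC matrix. Only standard scanpy/AnnData calls
# are used; epiScanpy's own API is not assumed.
import numpy as np
import anndata as ad
import scanpy as sc

rng = np.random.default_rng(0)
X = (rng.random((500, 2000)) < 0.05).astype(np.float32)   # synthetic binary peak matrix
adata = ad.AnnData(X)

sc.pp.pca(adata, n_comps=30)          # dimension reduction on the feature space
sc.pp.neighbors(adata)                # k-NN graph used for clustering and UMAP
sc.tl.leiden(adata)                   # clustering (requires the leidenalg package)
sc.tl.umap(adata)                     # 2-D embedding for visualization

print(adata.obs["leiden"].value_counts())
```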


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Agrim Gupta ◽  
Silvio Savarese ◽  
Surya Ganguli ◽  
Li Fei-Fei

Abstract: The intertwined processes of learning and evolution in complex environmental niches have resulted in a remarkable diversity of morphological forms. Moreover, many aspects of animal intelligence are deeply embodied in these evolved morphologies. However, the principles governing the relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, because performing large-scale in silico experiments on evolution and learning is challenging. Here, we introduce Deep Evolutionary Reinforcement Learning (DERL): a computational framework which can evolve diverse agent morphologies to learn challenging locomotion and manipulation tasks in complex environments. Leveraging DERL, we demonstrate several relations between environmental complexity, morphological intelligence, and the learnability of control. First, environmental complexity fosters the evolution of morphological intelligence, as quantified by the ability of a morphology to facilitate the learning of novel tasks. Second, we demonstrate a morphological Baldwin effect, i.e., in our simulations evolution rapidly selects morphologies that learn faster, thereby enabling behaviors learned late in the lifetime of early ancestors to be expressed early in the descendants' lifetime. Third, we suggest a mechanistic basis for the above relationships through the evolution of morphologies that are more physically stable and energy efficient, and can therefore facilitate learning and control.
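
Purely as a caricature of the two-loop structure (and not the paper's actual algorithm), the toy sketch below runs an outer evolutionary loop that selects "morphologies" by the reward an inner, budget-limited learning loop reaches, so faster learners are favored, as in the morphological Baldwin effect.

```python
# Toy caricature of DERL's two-loop structure: an outer evolutionary loop that
# selects "morphologies" by the reward an inner learning loop reaches within a
# fixed budget. Everything here (fitness model, mutation, parameters) is
# invented for illustration and is not the paper's actual algorithm.
import random

def inner_learning_reward(learning_rate, steps=50):
    """Stand-in for lifetime learning: reward rises faster with a higher rate."""
    reward = 0.0
    for _ in range(steps):
        reward += learning_rate * (1.0 - reward)      # saturating learning curve
    return reward

def evolve(generations=20, pop_size=16):
    population = [random.uniform(0.01, 0.2) for _ in range(pop_size)]  # "morphology" = learnability
    for _ in range(generations):
        scored = sorted(population, key=inner_learning_reward, reverse=True)
        parents = scored[: pop_size // 2]                              # truncation selection
        population = [max(1e-3, p + random.gauss(0, 0.01)) for p in parents for _ in (0, 1)]
    return max(population, key=inner_learning_reward)

print(f"best evolved learnability parameter: {evolve():.3f}")
```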


2021 ◽  
Vol 48 (1) ◽  
pp. 55-71
Author(s):  
Xiao-Bo Tang ◽  
Wei-Gang Fu ◽  
Yan Liu

The scale of knowledge is growing rapidly in the big data environment, and traditional knowledge organization and services have faced the dilemma of semantic inaccuracy and untimeliness. From a knowledge fusion perspective, combining the precise semantic superiority of traditional ontology with the large-scale graph processing power and the predicate attribute expression ability of the property graph, this paper presents an ontology and property graph fusion framework (OPGFF). The fusion process is divided into content layer fusion and constraint layer fusion. The result of the fusion, that is, the knowledge representation model, is called the knowledge big graph. In addition, this paper applies the knowledge big graph model to the ownership network in China’s financial field and builds a financial ownership knowledge big graph. Furthermore, this paper designs and implements six consistency inference algorithms for finding contradictory data and filling in missing data in the financial ownership knowledge big graph, five of which are completely domain agnostic. The correctness and validity of the algorithms have been experimentally verified with actual data. The OPGFF fusion framework and the implementation method of the knowledge big graph could provide a technical reference for big data knowledge organization and services.
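
As a hedged illustration of what one such consistency rule might look like, the sketch below flags companies whose incoming ownership stakes sum to more than 100%; the graph, attributes, and rule are invented and much simpler than the paper's six algorithms.

```python
# Illustrative consistency check on a small financial-ownership property graph:
# flag companies whose incoming ownership stakes sum to more than 100%. The
# graph, edge attributes, and rule are invented for illustration; the paper's
# inference algorithms operate on a much richer knowledge big graph.
from collections import defaultdict

# (owner, company, percentage) edges of a tiny ownership network
edges = [
    ("HoldingA", "BankX", 60.0),
    ("FundB",    "BankX", 55.0),    # deliberately contradictory: total > 100%
    ("HoldingA", "InsurerY", 40.0),
]

incoming = defaultdict(float)
for owner, company, pct in edges:
    incoming[company] += pct

for company, total in incoming.items():
    if total > 100.0:
        print(f"contradictory data: {company} is {total:.1f}% owned in total")
```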


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S208-S208
Author(s):  
Samuel Beck ◽  
Junyeong Lee

Abstract: Aging causes the global disorganization of nuclear chromatin architecture. In a normal young nucleus, silent heterochromatin is associated with the nuclear lamina layer underlying the nuclear envelope, and is thus spatially separated from euchromatin at the nuclear center. Notably, aging causes the disruption of the nuclear lamina and the decondensation of the associated heterochromatin. However, it is not clearly understood how these changes in chromatin architecture contribute to age-related diseases. Through large-scale computational analyses, we show that CpG islands (CGIs) give important clues to answering this question. CGIs are DNA elements with high cytosine-phosphate-guanine dinucleotide frequencies. In humans, about 60% of total genes contain CGIs at their promoters (CGI+ genes) and are broadly expressed throughout the body. The other 40% of genes, which do not have CGIs (CGI- genes), exhibit tissue-restricted expression patterns. Our results demonstrate that, in normal young nuclei, only CGI- genes can reside within lamina-associated heterochromatin when transcriptionally inactive, while CGI+ genes associate with nuclear central euchromatin even when they are repressed. In parallel, we show that age-associated heterochromatin decondensation can specifically de-repress tissue-specific CGI- genes, leading to their uncontrolled expression. Our results further demonstrate that global misregulation of CGI- genes increases the noise in gene transcription, which in turn causes the loss of cellular identities during aging. Taken together, our study establishes the critical implications of CGI-mediated chromatin architecture for age-associated degenerative changes and the loss of tissue homeostasis.
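
For readers unfamiliar with how promoters are typically partitioned into CGI+ and CGI- classes, the sketch below applies the standard Gardiner-Garden and Frommer-style criteria (GC content above 50%, observed/expected CpG ratio above 0.6, window of at least 200 bp) to a synthetic sequence; the study's own annotation pipeline may differ.

```python
# Sketch of the standard way a promoter window is called CGI+ or CGI-:
# GC content and the observed/expected CpG ratio, with Gardiner-Garden &
# Frommer-style thresholds. The sequence is synthetic.
def cpg_obs_exp(seq):
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")                      # observed CpG dinucleotides
    expected = (c * g) / n if n else 0.0       # expected CpGs from base composition
    gc_content = (c + g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp

seq = "CG" * 120 + "AT" * 30                   # synthetic CpG-rich promoter window
gc, ratio = cpg_obs_exp(seq)
is_cgi = len(seq) >= 200 and gc > 0.5 and ratio > 0.6
print(f"GC={gc:.2f}, Obs/Exp CpG={ratio:.2f}, CGI+ promoter: {is_cgi}")
```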

