scalable analysis Latest Research Papers

Identification of periodic attractors in Boolean networks using a priori information

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1009702 ◽ 2022 ◽ Vol 18 (1) ◽ pp. e1009702

Author(s): Ulrike Münzner ◽ Tomoya Mori ◽ Marcus Krantz ◽ Edda Klipp ◽ Tatsuya Akutsu

Keyword(s): Cell Cycle Control ◽ A Priori ◽ Boolean Model ◽ Boolean Networks ◽ Stable States ◽ Naive Method ◽ A Cell ◽ Priori Information ◽ Scalable Analysis ◽ Modeling Formalism

Boolean networks (BNs) have been developed to describe various biological processes, which requires analysis of attractors, the long-term stable states. While many methods have been proposed for the detection and enumeration of attractors, no method has been demonstrated to be theoretically better than the naive method while also being practical for large biological BNs. Here, we present a novel method to calculate attractors based on a priori information, which is much, and verifiably, faster than the naive method. We apply the method to two BNs which differ in size, modeling formalism, and biological scope. Despite these differences, the method presented here provides a powerful tool for the analysis of both networks. First, our analysis of a BN studying the effect of the microenvironment during angiogenesis shows that the previously defined microenvironments inducing the specialized phalanx behavior in endothelial cells (ECs) additionally induce stalk behavior. We obtain this result from an extended network version which was previously not analyzed. Second, we were able to heuristically detect attractors in a cell cycle control network formalized as a bipartite Boolean model (bBM) with 3158 nodes. These attractors are directly interpretable in terms of genotype-to-phenotype relationships, allowing network validation equivalent to an in silico mutagenesis screen. Our approach contributes to the development of scalable analysis methods required for whole-cell modeling efforts.
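To make the term "attractor" concrete, the following is a minimal sketch of exhaustive attractor enumeration for a tiny Boolean network. It assumes synchronous updates and takes "naive" to mean visiting all 2^n states, which the abstract does not spell out; the three-node network and the names `step` and `find_attractors` are hypothetical and are not taken from the paper.

```python
from itertools import product


def step(state, rules):
    """Apply every update rule synchronously to a state tuple."""
    return tuple(rule(state) for rule in rules)


def find_attractors(rules):
    """Follow each of the 2**n start states until a state repeats.

    The repeated part of the trajectory is an attractor: a fixed point
    (length 1) or a periodic attractor (length > 1).
    """
    n = len(rules)
    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}          # state -> position in the trajectory
        trajectory = []
        state = start
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = step(state, rules)
        cycle = trajectory[seen[state]:]
        # Rotate the cycle so its smallest state comes first, so the same
        # attractor reached from different starts is stored only once.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


# Hypothetical toy network with three nodes (x0, x1, x2).
rules = [
    lambda s: s[1],           # x0 <- x1
    lambda s: s[0],           # x1 <- x0
    lambda s: s[0] and s[1],  # x2 <- x0 AND x1
]

for attractor in sorted(find_attractors(rules)):
    print(len(attractor), attractor)
```

The loop visits every state, so its cost grows as 2^n; that exponential growth is why exhaustive enumeration is out of reach for a network with thousands of nodes.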

ClusTRace, a bioinformatic pipeline for analyzing clusters in virus phylogenies

10.1101/2021.12.09.471941 ◽ 2021

Author(s): Ilya Plyusnin ◽ Phuoc Thien Truong Nguyen ◽ Tarja Sironen ◽ Olli Vapalahti ◽ Teemu Smura ◽ ...

Keyword(s): Variant Calling ◽ Etiologic Agent ◽ Tree Reconstruction ◽ Bioinformatic Pipeline ◽ Transmission Chain ◽ Main Emphasis ◽ Depth Analysis ◽ Public Health Challenge ◽ High Level ◽ Scalable Analysis

Summary: SARS-CoV-2 is the highly transmissible etiologic agent of coronavirus disease 2019 (COVID-19) and has become a global scientific and public health challenge since December 2019. Several new variants of SARS-CoV-2 have emerged globally, raising concern about prevention and treatment of COVID-19. Early detection and in-depth analysis of the emerging variants, allowing pre-emptive alert and mitigation efforts, are thus of paramount importance. Here we present ClusTRace, a novel bioinformatic pipeline for fast and scalable analysis of sequence clusters or clades in large viral phylogenies. ClusTRace offers several high-level functionalities including outlier filtering, aligning, phylogenetic tree reconstruction, cluster or clade extraction, variant calling, visualization and reporting. ClusTRace was developed as an aid for COVID-19 transmission chain tracing in Finland, and the main emphasis has been on fast and unsupervised screening of phylogenies for markers of super-spreading events and other features of concern, such as high rates of cluster growth and/or accumulation of novel mutations.

Availability: All code is freely available from https://bitbucket.org/plyusnin/clustrace/

Scalable analysis of multi-modal biomedical data

GigaScience ◽ 10.1093/gigascience/giab058 ◽ 2021 ◽ Vol 10 (9) ◽ Cited By ~ 1

Author(s): Jaclyn Smith ◽ Yao Shi ◽ Michael Benedikt ◽ Milos Nikolic

Keyword(s): Data Integration ◽ Large Scale ◽ Treatment Options ◽ Complex Data ◽ Biomedical Data ◽ Data Types ◽ Large Scale Data Processing ◽ Scalable Analysis ◽ Targeted Medicine ◽ The Impact

Background: Targeted diagnosis and treatment options are dependent on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough look at the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes.

Solution: To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficulties of designing distributed analyses with complex biomedical data types.

Performance: We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system is capable of outperforming the common alternative, based on “flattening” complex data structures, and runs efficiently when alternative approaches are unable to perform at all.
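As an illustration of what "flattening" a complex data structure means, here is a small sketch in plain Python. The record layout (`sample_id`, `tissue`, `mutations`, `gene`, `impact`) and the function name `flatten` are hypothetical and chosen only for this example; the sketch does not use or reproduce TraNCE.

```python
def flatten(samples):
    """Turn nested sample records into one flat row per (sample, mutation)."""
    rows = []
    for sample in samples:
        for mutation in sample["mutations"]:
            rows.append({
                # Sample-level fields are copied into every row.
                "sample_id": sample["sample_id"],
                "tissue": sample["tissue"],
                "gene": mutation["gene"],
                "impact": mutation["impact"],
            })
    return rows


# Hypothetical nested input: each sample carries a list of mutations.
samples = [
    {"sample_id": "S1", "tissue": "lung",
     "mutations": [{"gene": "TP53", "impact": "high"},
                   {"gene": "KRAS", "impact": "moderate"}]},
    {"sample_id": "S2", "tissue": "breast",
     "mutations": [{"gene": "BRCA1", "impact": "high"}]},
    {"sample_id": "S3", "tissue": "colon", "mutations": []},
]

for row in flatten(samples):
    print(row)
```

Two costs of flattening are visible even at this size: the sample-level fields are repeated once per mutation, and a sample with no mutations (S3) disappears from the flat output unless it is handled separately.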

SAMPRA: Scalable Analysis, Management, Protection of Research Artifacts

10.1109/escience51609.2021.00028 ◽ 2021

Author(s): Patrick G. Bridges ◽ Zeinab Akhavan ◽ Jonathan Wheeler ◽ Hussein Al-Azzawi ◽ Orlando Albillar ◽ ...

Keyword(s): Scalable Analysis

Distributed temporal graph analytics with GRADOOP

The VLDB Journal ◽ 10.1007/s00778-021-00667-4 ◽ 2021

Author(s): Christopher Rost ◽ Kevin Gomez ◽ Matthias Täschner ◽ Philip Fritzsche ◽ Lucas Schons ◽ ...

Keyword(s): Graph Model ◽ Lessons Learned ◽ Temporal Property ◽ Graph Analytics ◽ Matching Graph ◽ Temporal Graph ◽ Scalable Analysis ◽ Temporal Graphs ◽ Distributed Analytics ◽ Difference Pattern

Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large due to stored historical information, asking for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs which has been continuously developed since 2005. Its graph model TPGM allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators such as snapshot, difference, pattern matching, and graph grouping, and several implementation details. We evaluate the performance and scalability of selected operators and a composed workflow for synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.
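The snapshot and difference operators named above can be pictured with a small single-machine sketch. It assumes each edge carries one half-open validity interval; the class `TemporalEdge` and the functions `snapshot` and `difference` are hypothetical names for illustration and are not Gradoop's API, which is distributed and bitemporal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalEdge:
    source: str
    target: str
    valid_from: int  # inclusive start of validity
    valid_to: int    # exclusive end of validity


def snapshot(edges, t):
    """Return the edges that are valid at time t."""
    return [e for e in edges if e.valid_from <= t < e.valid_to]


def difference(edges, t1, t2):
    """Compare two snapshots: which edges appeared or vanished between t1 and t2."""
    first = set(snapshot(edges, t1))
    second = set(snapshot(edges, t2))
    return {"added": second - first, "removed": first - second}


# Hypothetical toy graph with integer timestamps.
edges = [
    TemporalEdge("a", "b", 0, 10),
    TemporalEdge("b", "c", 5, 20),
    TemporalEdge("a", "c", 12, 30),
]

print(snapshot(edges, 7))
print(difference(edges, 7, 15))
```

Because both operators take and return plain edge collections, they compose: the result of one can be fed to the next, which is the property the abstract refers to as composable operators.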

satuRn: Scalable analysis of differential transcript usage for bulk and single-cell RNA-sequencing applications

F1000Research ◽ 10.12688/f1000research.51749.1 ◽ 2021 ◽ Vol 10 ◽ pp. 374

Author(s): Jeroen Gilis ◽ Kristoffer Vitting-Seerup ◽ Koen Van den Berge ◽ Lieven Clement

Keyword(s): Rate Control ◽ Single Gene ◽ Experimental Designs ◽ Rna Seq ◽ Modelling Framework ◽ Gene Dysregulation ◽ False Discovery ◽ Generalized Linear Modelling ◽ Linear Modelling ◽ Scalable Analysis

Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and is a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either lack performance, cannot account for complex experimental designs, or do not scale to massive scRNA-seq data. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that is on par with the best performing DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, addressing complex experimental designs and scaling to scRNA-seq applications.

VolPy: Automated and scalable analysis pipelines for voltage imaging datasets

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1008806 ◽ 2021 ◽ Vol 17 (4) ◽ pp. e1008806 ◽ Cited By ~ 1

Author(s): Changjia Cai ◽ Johannes Friedrich ◽ Amrita Singh ◽ M. Hossein Eybposh ◽ Eftychios A. Pnevmatikakis ◽ ...

Keyword(s): Motion Correction ◽ State Of The Art ◽ Ground Truth ◽ Automated Segmentation ◽ Voltage Imaging ◽ High Data ◽ Memory Mapping ◽ Data Rates ◽ Spatio Temporal ◽ Scalable Analysis

Voltage imaging enables monitoring neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its performance with existing algorithms in detecting spikes. Our results indicate that VolPy’s performance in spike extraction and scalability are state-of-the-art.

Independent component analysis recovers consistent regulatory signals from disparate datasets

PLoS Computational Biology ◽ 10.1371/journal.pcbi.1008647 ◽ 2021 ◽ Vol 17 (2) ◽ pp. e1008647 ◽ Cited By ~ 1

Author(s): Anand V. Sastry ◽ Alyssa Hu ◽ David Heckmann ◽ Saugat Poudel ◽ Erol Kavvas ◽ ...

Keyword(s): Independent Component Analysis ◽ Regulatory Networks ◽ Expression Profiles ◽ Component Analysis ◽ Independent Component ◽ Underlying Structure ◽ Rna Seq ◽ The Individual ◽ Scalable Analysis ◽ Microarray Datasets

The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could result in detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and discovered that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.

Fluid Approximation–based Analysis for Mode-switching Population Dynamics

ACM Transactions on Modeling and Computer Simulation ◽ 10.1145/3441680 ◽ 2021 ◽ Vol 31 (2) ◽ pp. 1-26

Author(s): Paul Piho ◽ Jane Hillston

Keyword(s): Population Dynamics ◽ Mode Switching ◽ Fluid Limit ◽ Fluid Approximation ◽ Large Numbers ◽ Piecewise Deterministic Markov Processes ◽ Computer Based ◽ The Mean ◽ Scalable Analysis ◽ Hybrid Fluid

Fluid approximation results provide powerful methods for scalable analysis of models of population dynamics with large numbers of discrete states and have seen wide-ranging applications in modelling biological and computer-based systems and model checking. However, the applicability of these methods relies on assumptions that are not easily met in a number of modelling scenarios. This article focuses on one particular class of scenarios in which rapid information propagation in the system is considered. In particular, we study the case where changes in population dynamics are induced by information about the environment being communicated between components of the population via broadcast communication. We see how existing hybrid fluid limit results, resulting in piecewise deterministic Markov processes, can be adapted to such models. Finally, we propose heuristic constructions for extracting the mean behaviour from the resulting approximations without the need to simulate individual trajectories.
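For readers unfamiliar with fluid approximation, the sketch below shows the basic idea on the simplest possible case, not the mode-switching models studied in the article. It assumes a large population of agents that each switch from state A to state B at rate `a` and back at rate `b`; in the fluid limit the fraction x of agents in A follows the ODE dx/dt = -a·x + b·(1 - x). The rates, the function name `fluid_trajectory`, and the use of forward Euler integration are all choices made for this illustration.

```python
def fluid_trajectory(x0, a, b, dt=0.01, t_end=50.0):
    """Integrate dx/dt = -a*x + b*(1 - x) with forward Euler steps.

    x is the fraction of the population in state A, so the state space
    has one dimension regardless of how many agents there are.
    """
    steps = int(round(t_end / dt))
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x + dt * (-a * x + b * (1.0 - x))
        trajectory.append(x)
    return trajectory


# Hypothetical rates: leave A at rate 2, return to A at rate 1.
a, b = 2.0, 1.0
trajectory = fluid_trajectory(x0=1.0, a=a, b=b)

print("final fraction in A:", trajectory[-1])
print("analytic equilibrium b / (a + b):", b / (a + b))
```

The point of the approximation is that the cost of this computation does not depend on the population size, whereas the underlying Markov chain has a number of states that grows with it.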

CloudReg: Automatic Terabyte-Scale Cross-Modal Brain Volume Registration

10.1101/2021.01.26.428355 ◽ 2021

Author(s): Vikram Chandrashekhar ◽ Daniel J Tward ◽ Devin Crowley ◽ Ailey K Crow ◽ Matthew A Wright ◽ ...

Keyword(s): Ex Vivo ◽ Brain Magnetic Resonance Imaging ◽ Light Sheet ◽ Imaging Data ◽ Micro Computed Tomography ◽ Light Sheet Microscopy ◽ Volume Registration ◽ Scalable Analysis ◽ Spatially Varying

Quantifying terabyte-scale multi-modal human and animal imaging data requires scalable analysis tools. We developed CloudReg, an open-source, automatic, terabyte-scale, cloud-based image analysis pipeline that pre-processes and registers cross-modal volumetric datasets with artifacts via a spatially varying polynomial intensity transform. CloudReg accurately registers the following datasets to their respective atlases: in vivo human and ex vivo macaque brain magnetic resonance imaging, ex vivo mouse brain micro-computed tomography, and cleared murine brain light-sheet microscopy.
