scalable analysis
Recently Published Documents


TOTAL DOCUMENTS: 86 (FIVE YEARS: 33)
H-INDEX: 16 (FIVE YEARS: 3)

2022, Vol 18 (1), pp. e1009702
Author(s): Ulrike Münzner, Tomoya Mori, Marcus Krantz, Edda Klipp, Tatsuya Akutsu

Boolean networks (BNs) have been developed to describe various biological processes, which requires analysis of attractors, the long-term stable states. While many methods have been proposed for the detection and enumeration of attractors, none has been shown both to be theoretically better than the naive method and to be practical for large biological BNs. Here, we present a novel method to calculate attractors based on a priori information, which is verifiably much faster than the naive method. We apply the method to two BNs that differ in size, modeling formalism, and biological scope. Despite these differences, the method provides a powerful tool for the analysis of both networks. First, our analysis of a BN studying the effect of the microenvironment during angiogenesis shows that the previously defined microenvironments inducing the specialized phalanx behavior in endothelial cells (ECs) additionally induce stalk behavior. We obtain this result from an extended network version that had not previously been analyzed. Second, we were able to heuristically detect attractors in a cell cycle control network formalized as a bipartite Boolean model (bBM) with 3158 nodes. These attractors are directly interpretable in terms of genotype-to-phenotype relationships, allowing network validation equivalent to an in silico mutagenesis screen. Our approach contributes to the development of the scalable analysis methods required for whole-cell modeling efforts.
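
For context, the "naive method" used as the baseline above can be sketched as exhaustive simulation: follow the trajectory from each of the 2^n states until it revisits a state, and record the cycle it falls into. This sketch assumes synchronous updating and encodes each node's Boolean function as a Python callable (both assumptions for illustration); it is not the authors' a-priori-information method. Its cost grows as 2^n, which is why this approach cannot reach networks with thousands of nodes.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]
Rule = Callable[[State], int]

def synchronous_step(state: State, rules: List[Rule]) -> State:
    """Update every node at once from the current state."""
    return tuple(rule(state) for rule in rules)

def naive_attractors(rules: List[Rule]) -> List[List[State]]:
    """Enumerate all attractors by simulating from each of the 2**n states."""
    attractor_of: Dict[State, int] = {}  # state -> index of the attractor it reaches
    attractors: List[List[State]] = []
    for start in product((0, 1), repeat=len(rules)):
        position: Dict[State, int] = {}  # state -> index on the current trajectory
        trajectory: List[State] = []
        state = start
        # Walk until a state repeats or we reach a state whose attractor is known.
        while state not in position and state not in attractor_of:
            position[state] = len(trajectory)
            trajectory.append(state)
            state = synchronous_step(state, rules)
        if state in attractor_of:
            idx = attractor_of[state]
        else:
            # The trajectory closed on itself: the part from the repeated state is a new cycle.
            idx = len(attractors)
            attractors.append(trajectory[position[state]:])
        for s in trajectory:
            attractor_of[s] = idx
    return attractors

# Toy 3-node network: x0' = x1, x1' = x0, x2' = x0 AND x1
rules: List[Rule] = [lambda s: s[1], lambda s: s[0], lambda s: s[0] & s[1]]
found = naive_attractors(rules)
print(found)  # [[(0, 0, 0)], [(0, 1, 0), (1, 0, 0)], [(1, 1, 1)]]
```

The toy network has two fixed points and one 2-cycle; every one of the 8 states is visited once, which is exactly the exhaustive cost the abstract's method is designed to avoid.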


2021
Author(s): Ilya Plyusnin, Phuoc Thien Truong Nguyen, Tarja Sironen, Olli Vapalahti, Teemu Smura, ...

Summary: SARS-CoV-2, the highly transmissible etiologic agent of coronavirus disease 2019 (COVID-19), has been a global scientific and public health challenge since December 2019. Several new variants of SARS-CoV-2 have emerged globally, raising concern about the prevention and treatment of COVID-19. Early detection and in-depth analysis of emerging variants, allowing pre-emptive alerts and mitigation efforts, are thus of paramount importance. Here we present ClusTRace, a novel bioinformatic pipeline for fast and scalable analysis of sequence clusters or clades in large viral phylogenies. ClusTRace offers several high-level functionalities, including outlier filtering, alignment, phylogenetic tree reconstruction, cluster or clade extraction, variant calling, visualization and reporting. ClusTRace was developed as an aid for COVID-19 transmission chain tracing in Finland, and the main emphasis has been on fast, unsupervised screening of phylogenies for markers of super-spreading events and other features of concern, such as high rates of cluster growth and/or accumulation of novel mutations. Availability: All code is freely available from https://bitbucket.org/plyusnin/clustrace/
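
Clade extraction can be illustrated with a simple, hypothetical rule: cut a rooted tree with branch lengths into the largest clades whose height (maximum distance from the clade root to any tip) stays under a threshold. The tree encoding, the `max_height` and `min_size` parameters and the toy tree are assumptions for this sketch; the summary above does not state which extraction criterion ClusTRace uses.

```python
from typing import Dict, List, Tuple

# node -> list of (child, branch length); tips have no entry
Tree = Dict[str, List[Tuple[str, float]]]

def height(tree: Tree, node: str, memo: Dict[str, float]) -> float:
    """Maximum distance from `node` down to any tip in its subtree."""
    if node not in memo:
        memo[node] = max(
            (length + height(tree, child, memo) for child, length in tree.get(node, [])),
            default=0.0,
        )
    return memo[node]

def tips(tree: Tree, node: str) -> List[str]:
    """All tip labels below `node`."""
    children = tree.get(node, [])
    if not children:
        return [node]
    return [tip for child, _ in children for tip in tips(tree, child)]

def extract_clades(tree: Tree, root: str, max_height: float, min_size: int = 2) -> List[List[str]]:
    """Largest clades with height <= max_height and at least min_size tips (hypothetical rule)."""
    memo: Dict[str, float] = {}
    clades: List[List[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if height(tree, node, memo) <= max_height:
            members = tips(tree, node)
            if len(members) >= min_size:
                clades.append(sorted(members))
            continue  # do not descend: any sub-clade would be smaller
        stack.extend(child for child, _ in tree.get(node, []))
    return clades

toy_tree: Tree = {
    "root": [("A", 0.5), ("B", 0.1)],
    "A": [("a1", 0.01), ("a2", 0.02)],
    "B": [("b1", 0.3), ("B2", 0.01)],
    "B2": [("b2", 0.01), ("b3", 0.02)],
}
clades = extract_clades(toy_tree, "root", max_height=0.05)
print(sorted(clades))  # [['a1', 'a2'], ['b2', 'b3']]
```

The long branch to `b1` keeps clade `B` above the threshold, so only the tight groups `{a1, a2}` and `{b2, b3}` are reported; the singleton `b1` is dropped by `min_size`.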


GigaScience, 2021, Vol 10 (9)
Author(s): Jaclyn Smith, Yao Shi, Michael Benedikt, Milos Nikolic

Background: Targeted diagnosis and treatment options depend on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomic sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that give a more thorough view of the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance in many domains, they do not support scalable processing of complex data types. Solution: To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates the difficult aspects of designing distributed analyses over complex biomedical data types. Performance: We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system can outperform the common alternative, based on “flattening” complex data structures, and runs efficiently in cases where alternative approaches fail to run at all.
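
The "flattening" baseline can be contrasted with nested processing on a toy task: building a per-sample mutation-count feature vector either by flattening nested records into one row per mutation and regrouping, or by aggregating inside each record. The record layout, gene list and function names are hypothetical, and this plain-Python sketch is not TraNCE's API. In a distributed engine the flattened route also materializes a large intermediate table, needs a regrouping shuffle, and needs an outer-join step so that samples without mutations are not lost.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical nested records: each sample carries a nested list of mutations.
samples = [
    {"sample": "S1", "mutations": [{"gene": "TP53"}, {"gene": "KRAS"}, {"gene": "TP53"}]},
    {"sample": "S2", "mutations": [{"gene": "KRAS"}]},
    {"sample": "S3", "mutations": []},
]
GENES = ["KRAS", "TP53"]  # feature order

def features_flattened(records) -> Dict[str, List[int]]:
    # 1) Flatten: one row per (sample, gene); samples with no mutations vanish here.
    rows = [(r["sample"], m["gene"]) for r in records for m in r["mutations"]]
    # 2) Regroup the flat rows by sample.
    counts: Dict[str, Counter] = {}
    for sample, gene in rows:
        counts.setdefault(sample, Counter())[gene] += 1
    # 3) Re-attach samples lost in step 1 (the outer-join step).
    return {r["sample"]: [counts.get(r["sample"], Counter())[g] for g in GENES] for r in records}

def features_nested(records) -> Dict[str, List[int]]:
    # Aggregate inside each record; no intermediate flat table is built.
    out: Dict[str, List[int]] = {}
    for r in records:
        c = Counter(m["gene"] for m in r["mutations"])
        out[r["sample"]] = [c[g] for g in GENES]
    return out

flat = features_flattened(samples)
nested = features_nested(samples)
print(nested)  # {'S1': [1, 2], 'S2': [1, 0], 'S3': [0, 0]}
```

Both routes give the same features; the difference that matters at scale is the size of the intermediate data and the extra grouping and join work the flattened route needs.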


Author(s): Patrick G. Bridges, Zeinab Akhavan, Jonathan Wheeler, Hussein Al-Azzawi, Orlando Albillar, ...

2021
Author(s): Christopher Rost, Kevin Gomez, Matthias Täschner, Philip Fritzsche, Lucas Schons, ...

Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large because they store historical information, which calls for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs, which has been continuously developed since 2015. Its graph model TPGM allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Because Gradoop is built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators such as snapshot, difference, pattern matching and graph grouping, and several implementation details. We evaluate the performance and scalability of selected operators and a composed workflow for synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.
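
The snapshot and difference operators can be sketched for valid time only: a snapshot keeps the edges whose half-open interval [valid_from, valid_to) contains a time point, and a difference compares two snapshots. This is a simplified, hypothetical Python model (vertices and the transaction-time dimension of TPGM's bitemporal model are omitted, and the class and function names are invented), not Gradoop's API.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

@dataclass(frozen=True)
class TemporalEdge:
    source: str
    target: str
    valid_from: float  # inclusive start of validity
    valid_to: float    # exclusive end; math.inf means "still valid"

def snapshot(edges: Iterable[TemporalEdge], t: float) -> Set[Edge]:
    """Edges whose validity interval [valid_from, valid_to) contains t."""
    return {(e.source, e.target) for e in edges if e.valid_from <= t < e.valid_to}

def difference(edges: Iterable[TemporalEdge], t1: float, t2: float) -> Dict[str, Set[Edge]]:
    """Compare the snapshots at t1 and t2."""
    edges = list(edges)  # iterated twice below
    s1, s2 = snapshot(edges, t1), snapshot(edges, t2)
    return {"added": s2 - s1, "removed": s1 - s2, "persisted": s1 & s2}

edges = [
    TemporalEdge("a", "b", 2010, math.inf),
    TemporalEdge("b", "c", 2012, 2015),
    TemporalEdge("c", "a", 2016, math.inf),
]
print(snapshot(edges, 2013))
print(difference(edges, 2013, 2017))
```

In a dataflow system each of these is a filter over the edge set followed by set operations, which is what makes such operators straightforward to parallelize across a cluster.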


F1000Research, 2021, Vol 10, pp. 374
Author(s): Jeroen Gilis, Kristoffer Vitting-Seerup, Koen Van den Berge, Lieven Clement

Alternative splicing produces multiple functional transcripts from a single gene. Dysregulation of splicing is known to be associated with disease and is a hallmark of cancer. Existing tools for differential transcript usage (DTU) analysis either perform poorly, cannot account for complex experimental designs, or do not scale to massive scRNA-seq data. We introduce satuRn, a fast and flexible quasi-binomial generalized linear modelling framework that performs on par with the best DTU methods from the bulk RNA-seq realm, while providing good false discovery rate control, handling complex experimental designs and scaling to scRNA-seq applications.
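
A minimal sketch of the quasi-binomial idea for one transcript in a two-group design: the transcript's counts are modeled as binomial out of its gene's total counts, the log odds ratio of usage between groups is estimated, and its variance is scaled by a dispersion factor estimated from Pearson residuals. This is a textbook quasi-binomial Wald test under simplifying assumptions (one two-level factor, normal approximation for the p-value), not satuRn's implementation, which handles general experimental designs; the function name and the counts are hypothetical.

```python
import math
from typing import Dict, Sequence

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def quasi_binomial_two_group(y: Sequence[int], n: Sequence[int], group: Sequence[int]) -> Dict[str, float]:
    """Two-group quasi-binomial test for a change in one transcript's usage.

    y[i]: counts of the transcript in sample i
    n[i]: total counts of all transcripts of the same gene in sample i
    group[i]: 0 or 1; both groups must have usage strictly between 0 and 1
    """
    # With one two-level factor, the binomial MLE is the pooled proportion per group.
    tot_y, tot_n = [0, 0], [0, 0]
    for yi, ni, gi in zip(y, n, group):
        tot_y[gi] += yi
        tot_n[gi] += ni
    pi = [tot_y[g] / tot_n[g] for g in (0, 1)]
    # Quasi-binomial dispersion: Pearson chi-square over residual df (samples - 2 parameters).
    pearson = sum((yi - ni * pi[gi]) ** 2 / (ni * pi[gi] * (1.0 - pi[gi]))
                  for yi, ni, gi in zip(y, n, group))
    phi = pearson / (len(y) - 2)
    beta1 = logit(pi[1]) - logit(pi[0])  # log odds ratio of usage, group 1 vs group 0
    var_beta1 = phi * sum(1.0 / (tot_n[g] * pi[g] * (1.0 - pi[g])) for g in (0, 1))
    t = beta1 / math.sqrt(var_beta1)
    p_normal = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided, normal approximation
    return {"beta1": beta1, "phi": phi, "t": t, "p_normal": p_normal}

# Hypothetical counts for one transcript (y) out of its gene's total (n) in six samples
y = [30, 35, 28, 60, 55, 65]
n = [100, 100, 100, 100, 100, 100]
group = [0, 0, 0, 1, 1, 1]
result = quasi_binomial_two_group(y, n, group)
print(result)
```

The dispersion `phi` is what distinguishes the quasi-binomial model from a plain binomial one: values above 1 widen the standard error for overdispersed data, values below 1 narrow it.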


2021, Vol 17 (4), pp. e1008806
Author(s): Changjia Cai, Johannes Friedrich, Amrita Singh, M. Hossein Eybposh, Eftychios A. Pnevmatikakis, ...

Voltage imaging enables monitoring of neural activity at sub-millisecond and sub-cellular scale, unlocking the study of subthreshold activity, synchrony, and network dynamics with unprecedented spatio-temporal resolution. However, high data rates (>800 MB/s) and low signal-to-noise ratios create bottlenecks for analyzing such datasets. Here we present VolPy, an automated and scalable pipeline to pre-process voltage imaging datasets. VolPy features motion correction, memory mapping, automated segmentation, denoising and spike extraction, all built on a highly parallelizable, modular, and extensible framework optimized for memory and speed. To aid automated segmentation, we introduce a corpus of 24 manually annotated datasets from different preparations, brain areas and voltage indicators. We benchmark VolPy against ground truth segmentation, simulations and electrophysiology recordings, and we compare its spike-detection performance with that of existing algorithms. Our results indicate that VolPy’s spike-extraction performance and scalability are state-of-the-art.
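
Spike extraction from a single neuron's trace can be illustrated with a generic robust-threshold detector: remove slow baseline drift with a running median, estimate the noise level with the median absolute deviation (MAD), and keep local maxima above a multiple of it. This is a deliberately simple stand-in, not VolPy's spike-extraction algorithm; the window length, the factor `k` and the synthetic trace are assumptions for illustration.

```python
import statistics
from typing import List, Sequence

def subtract_running_median(trace: Sequence[float], window: int) -> List[float]:
    """High-pass step: subtract a centered running median to remove slow drift."""
    half = window // 2
    out = []
    for i in range(len(trace)):
        lo, hi = max(0, i - half), min(len(trace), i + half + 1)
        out.append(trace[i] - statistics.median(trace[lo:hi]))
    return out

def detect_spikes(trace: Sequence[float], window: int = 21, k: float = 5.0) -> List[int]:
    """Indices of local maxima exceeding median + k * robust noise estimate."""
    x = subtract_running_median(trace, window)
    med = statistics.median(x)
    mad = statistics.median(abs(v - med) for v in x)
    threshold = med + k * 1.4826 * mad  # 1.4826 * MAD estimates the std of Gaussian noise
    return [i for i in range(1, len(x) - 1)
            if x[i] > threshold and x[i] >= x[i - 1] and x[i] > x[i + 1]]

# Synthetic trace: small periodic "noise" plus two spikes at samples 50 and 120
trace = [0.1 * ((i * 7) % 5 - 2) for i in range(200)]
for idx in (50, 120):
    trace[idx] += 5.0
spikes = detect_spikes(trace)
print(spikes)  # [50, 120]
```

Median-based statistics are used here because a few large spikes barely move them, whereas a mean and standard deviation would be inflated by the very events being detected.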


2021, Vol 17 (2), pp. e1008647
Author(s): Anand V. Sastry, Alyssa Hu, David Heckmann, Saugat Poudel, Erol Kavvas, ...

The availability of bacterial transcriptomes has dramatically increased in recent years. This data deluge could enable detailed inference of underlying regulatory networks, but the diversity of experimental platforms and protocols introduces critical biases that could hinder scalable analysis of existing data. Here, we show that the underlying structure of the E. coli transcriptome, as determined by Independent Component Analysis (ICA), is conserved across multiple independent datasets, including both RNA-seq and microarray datasets. We subsequently combined five transcriptomics datasets into a large compendium containing over 800 expression profiles and found that its underlying ICA-based structure was still comparable to that of the individual datasets. With this understanding, we expanded our analysis to over 3,000 E. coli expression profiles and predicted three high-impact regulons that respond to oxidative stress, anaerobiosis, and antibiotic treatment. ICA thus enables deep analysis of disparate data to uncover new insights that were not visible in the individual datasets.
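
One common way to check whether ICA structure is conserved between datasets is to match components by the correlation of their gene weights. The sketch below pairs components greedily by absolute Pearson correlation (the absolute value is used because ICA components have arbitrary sign); the threshold `min_abs_r`, the greedy matching and the toy weight vectors are assumptions, not the procedure or cutoffs used in the study.

```python
import math
from typing import List, Sequence, Tuple

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def match_components(comps_a: List[Sequence[float]], comps_b: List[Sequence[float]],
                     min_abs_r: float = 0.7) -> List[Tuple[int, int, float]]:
    """Greedily pair components (gene-weight vectors) from two datasets by |Pearson r|."""
    candidates = sorted(
        ((abs(pearson(a, b)), i, j) for i, a in enumerate(comps_a) for j, b in enumerate(comps_b)),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for r, i, j in candidates:
        if r < min_abs_r:
            break  # remaining candidates are weaker still
        if i in used_a or j in used_b:
            continue  # each component is matched at most once
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j, r))
    return pairs

# Toy gene weights over 5 genes; dataset B's first component is a sign-flipped copy of A's second
comps_a = [[1, 0, 0, 2, 0], [0, 3, 0, 0, 1]]
comps_b = [[0, -3, 0, 0, -1.1], [1.1, 0, 0, 2, 0.1]]
pairs = match_components(comps_a, comps_b)
print(pairs)
```

Components that find a partner above the threshold can be read as conserved structure; unmatched components would point to dataset-specific signals or technical effects.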


2021, Vol 31 (2), pp. 1-26
Author(s): Paul Piho, Jane Hillston

Fluid approximation results provide powerful methods for scalable analysis of models of population dynamics with large numbers of discrete states and have seen wide-ranging applications in modelling biological and computer-based systems and in model checking. However, the applicability of these methods relies on assumptions that are not easily met in a number of modelling scenarios. This article focuses on one particular class of scenarios, in which rapid information propagation in the system is considered. In particular, we study the case where changes in population dynamics are induced by information about the environment being communicated between components of the population via broadcast communication. We show how existing hybrid fluid limit results, which yield piecewise deterministic Markov processes, can be adapted to such models. Finally, we propose heuristic constructions for extracting the mean behaviour from the resulting approximations without the need to simulate individual trajectories.
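
As a baseline for the setting studied here, the basic fluid approximation can be shown on a hypothetical SIS population model: in a continuous-time Markov chain with N agents, infection at total rate βI(N − I)/N and recovery at rate γI, the infected fraction x approaches, as N grows, the solution of dx/dt = βx(1 − x) − γx. The sketch integrates this ODE with Euler steps and compares it with one exact stochastic (Gillespie) run. It does not model broadcast communication or the piecewise deterministic limits the article develops; the model and parameters are illustrative only.

```python
import random

def sis_fluid(x0: float, beta: float, gamma: float, t_end: float, dt: float = 1e-3) -> float:
    """Euler integration of the fluid limit dx/dt = beta*x*(1 - x) - gamma*x."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (beta * x * (1.0 - x) - gamma * x)
    return x

def sis_gillespie(n: int, i0: int, beta: float, gamma: float, t_end: float, seed: int = 0) -> float:
    """One exact stochastic run of the SIS CTMC; returns the infected fraction at t_end."""
    rng = random.Random(seed)
    i, t = i0, 0.0
    while i > 0:
        infect = beta * i * (n - i) / n  # total rate of S -> I transitions
        recover = gamma * i              # total rate of I -> S transitions
        total = infect + recover
        t += rng.expovariate(total)      # exponential waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < infect:
            i += 1
        else:
            i -= 1
    return i / n

x_fluid = sis_fluid(0.05, beta=2.0, gamma=1.0, t_end=20.0)
x_sim = sis_gillespie(2000, 100, beta=2.0, gamma=1.0, t_end=20.0)
print(x_fluid, x_sim)  # both should be near the equilibrium 1 - gamma/beta = 0.5
```

The ODE costs a fixed number of steps regardless of N, while the stochastic simulation's cost grows with the number of agents; this independence from population size is what makes fluid approximations scalable.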


2021
Author(s): Vikram Chandrashekhar, Daniel J Tward, Devin Crowley, Ailey K Crow, Matthew A Wright, ...

Quantifying terabyte-scale multi-modal human and animal imaging data requires scalable analysis tools. We developed CloudReg, an open-source, automatic, terabyte-scale, cloud-based image analysis pipeline that pre-processes and registers cross-modal volumetric datasets with artifacts via a spatially varying polynomial intensity transform. CloudReg accurately registers the following datasets to their respective atlases: in vivo human and ex vivo macaque brain magnetic resonance imaging, ex vivo mouse brain micro-computed tomography, and cleared murine brain light-sheet microscopy.
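
A spatially varying polynomial intensity transform can be sketched by fitting a separate low-degree polynomial, mapping source intensities to target intensities, within each spatial block. The 1-D toy profile, the block partition and the polynomial degree are assumptions, and independent per-block least-squares fits (with NumPy's `polyfit`) are a simplification; how CloudReg estimates its transform is not described in the abstract, and this sketch does not reproduce it. For volumes, the same idea applies to 3-D blocks.

```python
import numpy as np

def fit_blockwise_polynomial(source, target, n_blocks: int, degree: int = 2):
    """Fit target ~ p_b(source) separately in each of n_blocks contiguous blocks."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    edges = np.linspace(0, source.size, n_blocks + 1).astype(int)
    coeffs = []
    mapped = np.empty_like(source)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Least-squares polynomial fit in this block; coefficients are highest power first.
        p = np.polyfit(source[lo:hi], target[lo:hi], degree)
        coeffs.append(p)
        mapped[lo:hi] = np.polyval(p, source[lo:hi])
    return coeffs, mapped

# Toy 1-D "image" whose intensity response differs between the two halves (e.g. an artifact)
pos = np.linspace(0.0, 1.0, 300)
source = 0.5 + 0.4 * np.sin(6.0 * pos)
target = np.where(pos < 0.5, 2.0 * source**2 + 0.1, 0.5 * source + 1.0)
coeffs, mapped = fit_blockwise_polynomial(source, target, n_blocks=2)
print(np.max(np.abs(mapped - target)))
```

A single global polynomial could not fit both halves here, because the same source intensity maps to different target intensities in different places; letting the coefficients vary by location is what absorbs such artifacts.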

