Learning dynamics from large biological data sets: Machine learning meets systems biology

2020 ◽  
Vol 22 ◽  
pp. 1-7
Author(s):  
William Gilpin ◽  
Yitong Huang ◽  
Daniel B. Forger
Materials ◽  
2020 ◽  
Vol 13 (14) ◽  
pp. 3083
Author(s):  
Maciej E. Marchwiany ◽  
Magdalena Birowska ◽  
Mariusz Popielski ◽  
Jacek A. Majewski ◽  
Agnieszka M. Jastrzębska

To speed up the implementation of two-dimensional (2D) materials in the development of potential biomedical applications, the toxicological aspects toward human health need to be addressed. Because the analysis is time-consuming and expensive, only part of the continuously expanding family of 2D materials can be tested in vitro. Machine learning methods can be used to extract new insights from available biological data sets and to provide further guidance for experimental studies. This study identifies the most relevant highly surface-specific features that might be responsible for the cytotoxic behavior of 2D materials, especially MXenes. In particular, two factors, namely the presence of transition metal oxides and lithium atoms on the surface, are identified as cytotoxicity-generating features. The developed machine learning model succeeds in predicting toxicity for other 2D MXenes not previously tested in vitro and hence is able to complement the existing knowledge coming from in vitro studies. Thus, we claim that it might be one of the solutions for reducing the number of toxicological studies needed and for minimizing failures in future biological applications.


2021 ◽  
Vol 3 (Supplement_2) ◽  
pp. ii1-ii1
Author(s):  
Niven Narain ◽  
Michael Kiebish ◽  
Vivek Vishnudas ◽  
Vladimir Tolstikov ◽  
Gregory Miller ◽  
...  

Abstract The past decade has witnessed an explosive proliferation of data analytics modalities, all seeking to gain insight into large-scale data sets. Machine learning and AI methodologies now occupy a central role in analyses of data sets that range in nature from genomics, “omics”, clinical, real-world evidence, and demographic data. Despite advances in data analytics/machine learning and access to complex population-level clinical and related datasets, translating information into actionable guidance in human health and disease remains a challenge. Interrogative Biology, a systems biology/AI platform, generates an unbiased, data-informed network for identifying targets (disease drivers) and biomarkers for disease interception at the point of transition to dysregulation, preceding clinical phenotype. The data topology is enabled by a systematic acquisition and interrogation of longitudinal bio-samples of clinically annotated human matrices (e.g. blood, urine, saliva, tissues) subjected to comprehensive multi-omic (genomics, proteomics, lipidomics and metabolomics) profiling over time. The molecular profiles are integrated with clinical health information using Bayesian artificial intelligence analytics, bAIcis, to generate causal network maps of overall health. Differentials between “health” and “disease” network maps identify drivers (targets and biomarkers) of disease, which are rapidly validated in orthogonal wet-lab disease-specific perturbed model systems. Target information imputed into the bAIcis framework can define therapeutic strategies, including identification of existing drugs and bio-actives for corrective response. Using a combination of clinic-based sampling and dried blood spot analysis for longitudinal dynamic monitoring of markers of health-disease status provides an opportunity for proactive clinical management and intervention for corrective response in advance of major deterioration of health status. Taken together, the approach herein allows for health surveillance based on in-depth biological profiling of alterations in the patient narrative to guide treatment modalities and strategies in a longitudinal and dynamic manner to identify, track, intercept, and arrest human disease.


2019 ◽  
Vol 32 (1) ◽  
pp. 45-55 ◽  
Author(s):  
Bharat Mishra ◽  
Nilesh Kumar ◽  
M. Shahid Mukhtar

Systems biology is an inclusive approach to study the static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high-throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in the comprehensive understanding of biological systems. Network topological features reveal the most important nodes within a network and prioritize significant molecular components for diverse biological networks, including coexpression, protein–protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because it does not require a domain expert for feature extraction. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant–microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with the potential to be applicable to both model systems and agronomically important crop plants.


2017 ◽  
Author(s):  
Sofia Triantafillou ◽  
Vincenzo Lagani ◽  
Christina Heinze-Deml ◽  
Angelika Schmidt ◽  
Jesper Tegner ◽  
...  

ABSTRACT Learning the causal relationships that define a molecular system allows us to predict how the system will respond to different interventions. Distinguishing causality from mere association typically requires randomized experiments. Methods for automated causal discovery from limited experiments exist, but have so far rarely been tested in systems biology applications. In this work, we apply state-of-the-art causal discovery methods to a large collection of public mass cytometry data sets, measuring intra-cellular signaling proteins of the human immune system and their response to several perturbations. We show how different experimental conditions can be used to facilitate causal discovery, and apply two fundamental methods that produce context-specific causal predictions. Causal predictions were reproducible across independent data sets from two different studies, but often disagreed with the KEGG pathway databases. Within this context, we discuss the caveats we need to overcome for automated causal discovery to become a part of routine data analysis in systems biology.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Simon Dirmeier ◽  
Mario Emmenlauer ◽  
Christoph Dehio ◽  
Niko Beerenwinkel

Abstract Background: Analysing large and high-dimensional biological data sets poses significant computational difficulties for bioinformaticians due to the lack of accessible tools that scale to hundreds of millions of data points. Results: We developed a novel machine learning command line tool called PyBDA for automated, distributed analysis of big biological data sets. By using Apache Spark in the backend, PyBDA scales to data sets beyond the size of current applications. It uses Snakemake to automatically schedule jobs to a high-performance computing cluster. We demonstrate the utility of the software by analyzing image-based RNA interference data of 150 million single cells. Conclusion: PyBDA allows automated, easy-to-use data analysis using common statistical methods and machine learning algorithms. It can be used entirely with simple command line calls, making it accessible to a broad user base. PyBDA is available at https://pybda.rtfd.io.


2020 ◽  
Author(s):  
Maciej Marchwiany ◽  
Magdalena Birowska ◽  
Mariusz Popielski ◽  
Jacek A. Majewski ◽  
Agnieszka M. Jastrzębska

Abstract Background: Prediction of compound cytotoxicity is a crucial issue in the development of new drugs and potential biomedical applications. Experimental studies are time-consuming and expensive. Machine learning models can quickly predict the cytotoxicity of compounds by extracting new insights from large materials and biological data sets, and can provide further guidance for experimental studies. Results: Here, we identify the most relevant features that are responsible for the cytotoxic behavior of layered MXene materials. The most important result of our work is the identification of 2D MXene-specific surface parameters as responsible for the potential cytotoxicity of these materials, in particular the presence of transition metal oxides and lithium atoms on the surface. After successfully verifying the predictions of our model, we have also succeeded in predicting toxicity for 2D MXenes not tested in vitro. Hence, we have been able to complement the existing knowledge coming from in vitro studies. Conclusions: Our results allow for the future selection of synthesis methods that prevent surface oxidation, which should allow production of non-toxic 2D MXenes. Such materials might find application in many fields of science and technology, especially in biotechnology and nanomedicine.


2021 ◽  
Vol 34 (2) ◽  
pp. 541-549 ◽  
Author(s):  
Leihong Wu ◽  
Ruili Huang ◽  
Igor V. Tetko ◽  
Zhonghua Xia ◽  
Joshua Xu ◽  
...  

2021 ◽  
Vol 13 (13) ◽  
pp. 2433
Author(s):  
Shu Yang ◽  
Fengchao Peng ◽  
Sibylle von Löwis ◽  
Guðrún Nína Petersen ◽  
David Christian Finger

Doppler lidars are used worldwide for wind monitoring and recently also for the detection of aerosols. Automatic algorithms that classify the signals retrieved from lidar measurements are very useful for users. In this study, we explore the value of machine learning to classify backscattered signals from Doppler lidars using data from Iceland. We combined supervised and unsupervised machine learning algorithms with conventional lidar data processing methods and trained two models to filter noise signals and classify Doppler lidar observations into different classes, including clouds, aerosols and rain. The results reveal a high accuracy for noise identification and for aerosol and cloud classification. However, precipitation detection is underestimated. The method was tested on data sets from two instruments during different weather conditions, including three dust storms during the summer of 2019. Our results reveal that this method can provide an efficient, accurate and real-time classification of lidar measurements. Accordingly, we conclude that machine learning can open new opportunities for lidar data end-users, such as aviation safety operators, to monitor dust in the vicinity of airports.


2021 ◽  
Author(s):  
Austė Kanapeckaitė ◽  
Neringa Burokienė

Abstract At present, heart failure (HF) treatment only targets the symptoms based on the severity of left ventricle dysfunction; however, the lack of systemic ‘omics’ studies and available biological data to uncover the heterogeneous underlying mechanisms signifies the need to shift the analytical paradigm towards network-centric and data mining approaches. This study, for the first time, aimed to investigate how bulk and single cell RNA-sequencing as well as proteomics analysis of human heart tissue can be integrated to uncover HF-specific networks and potential therapeutic targets or biomarkers. We also aimed to address the issue of dealing with a limited number of samples and to show how appropriate statistical models, enrichment with other datasets, and machine learning-guided analysis can aid in such cases. Furthermore, we elucidated specific gene expression profiles using transcriptomic and mined data from public databases. This was achieved using a two-step machine learning algorithm to predict the likelihood of therapeutic target or biomarker tractability based on a novel scoring system, which is also introduced in this study. The described methodology could be very useful for target or biomarker selection and evaluation during the pre-clinical therapeutics development stage as well as for disease progression monitoring. In addition, the present study sheds new light on the complex aetiology of HF, differentiating between subtle changes in dilated cardiomyopathies (DCs) and ischemic cardiomyopathies (ICs) at the single cell, proteome and whole transcriptome levels, demonstrating that HF might depend on the involvement not only of cardiomyocytes but also of other cell populations. Identified tissue remodelling and inflammatory processes can be beneficial when selecting targeted pharmacological management for DCs or ICs, respectively.

