Fully interpretable deep learning model of transcriptional control

2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i499-i507 ◽  
Author(s):  
Yi Liu ◽  
Kenneth Barr ◽  
John Reinitz

Abstract Motivation The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent work in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Typically, such investigations have taken a ‘black box’ approach in which the internal structure of the model is set purely by machine learning considerations, with little attention to representing the internal structure of the biological system by the mathematical structure of the DNN. DNNs have not yet been applied to the detailed modeling of transcriptional control, in which mRNA production is controlled by the binding of specific transcription factors to DNA, in part because such models are formulated in terms of specific chemical equations that appear different in form from those used in neural networks. Results In this paper, we give an example of a DNN which can model the detailed control of transcription in a precise and predictive manner. Its internal structure is fully interpretable and is faithful to the underlying chemistry of transcription factor binding to DNA. We derive our DNN from a systems biology model that was not previously recognized as having a DNN structure. Although we apply our DNN to data from the early embryo of the fruit fly Drosophila, this system serves as a test bed for analysis of much larger datasets obtained by systems biology studies on a genomic scale. Availability and implementation The implementation and data for the models used in this paper are in a zip file in the supplementary material. Supplementary information Supplementary data are available at Bioinformatics online.
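Why a chemical binding model can have DNN structure can be illustrated with the simplest possible case. The following is a minimal sketch under stated assumptions, not the authors' model (which is given in the paper): it assumes one binding site per transcription factor at equilibrium. The fractional occupancy Kc/(1 + Kc) of a site with binding constant K at concentration c is exactly a logistic "neuron" applied to ln K + ln c, because Kc/(1 + Kc) = 1/(1 + e^−(ln K + ln c)). Summing weighted occupancies and passing the result through a second sigmoid then gives a two-layer network in which the bias of each binding unit is ln K. The function names and the parameters `weights`, `bias` and `r_max` are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def occupancy_chemical(k_binding: float, concentration: float) -> float:
    """Equilibrium fractional occupancy of a single site: Kc / (1 + Kc)."""
    kc = k_binding * concentration
    return kc / (1.0 + kc)


def occupancy_neuron(k_binding: float, concentration: float) -> float:
    """The same quantity written as a neuron: sigmoid(ln K + ln c).

    The input is ln c and the bias is ln K, so the 'weight' has a direct
    chemical meaning.
    """
    return sigmoid(math.log(k_binding) + math.log(concentration))


def synthesis_rate(occupancies: Sequence[float], weights: Sequence[float],
                   bias: float, r_max: float = 1.0) -> float:
    """Hypothetical output layer: weighted sum of site occupancies
    (positive weights act as activators, negative as repressors) passed
    through a second sigmoid and scaled by a maximal rate r_max."""
    z = bias + sum(w * f for w, f in zip(weights, occupancies))
    return r_max * sigmoid(z)
```

For example, `synthesis_rate([occupancy_neuron(2.0, 3.0)], [4.0], -2.0)` treats a single activator site as the only input. A real model of transcriptional control would add further terms (for instance interactions between bound factors), which is where the paper's own derivation is needed.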

2019 ◽  
Author(s):  
Yi Liu ◽  
Kenneth Barr ◽  
John Reinitz

Abstract The universal expressibility assumption of Deep Neural Networks (DNNs) is the key motivation behind recent work in the systems biology community to employ DNNs to solve important problems in functional genomics and molecular genetics. Because of the black box nature of DNNs, such assumptions, while useful in practice, are unsatisfactory for scientific analysis. In this paper, we give an example of a DNN in which every layer is interpretable. Moreover, this DNN is biologically validated and predictive. We derive our DNN from a systems biology model that was not previously recognized as having a DNN structure. This DNN is concerned with a key unsolved biological problem, which is to understand the DNA regulatory code that controls how genes in multicellular organisms are turned on and off. Although we apply our DNN to data from the early embryo of the fruit fly Drosophila, this system serves as a testbed for analysis of much larger data sets obtained by systems biology studies on a genomic scale.


2020 ◽  
Vol 36 (16) ◽  
pp. 4527-4529
Author(s):  
Ales Saska ◽  
David Tichy ◽  
Robert Moore ◽  
Achilles Rasquinha ◽  
Caner Akdas ◽  
...  

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Michelle Przedborski ◽  
Munisha Smalley ◽  
Saravanan Thiyagarajan ◽  
Aaron Goldman ◽  
Mohammad Kohandel

AbstractAnti-PD-1 immunotherapy has recently shown tremendous success for the treatment of several aggressive cancers. However, variability and unpredictability in treatment outcome have been observed, and are thought to be driven by patient-specific biology and interactions of the patient’s immune system with the tumor. Here we develop an integrative systems biology and machine learning approach, built around clinical data, to predict patient response to anti-PD-1 immunotherapy and to improve the response rate. Using this approach, we determine biomarkers of patient response and identify potential mechanisms of drug resistance. We develop systems biology informed neural networks (SBINN) to calculate patient-specific kinetic parameter values and to predict clinical outcome. We show how transfer learning can be leveraged with simulated clinical data to significantly improve the response prediction accuracy of the SBINN. Further, we identify novel drug combinations and optimize the treatment protocol for triple combination therapy consisting of IL-6 inhibition, recombinant IL-12, and anti-PD-1 immunotherapy in order to maximize patient response. We also find unexpected differences in protein expression levels between response phenotypes which complement recent clinical findings. Our approach has the potential to aid in the development of targeted experiments for patient drug screening as well as identify novel therapeutic targets.


2019 ◽  
Vol 36 (1) ◽  
pp. 272-279 ◽  
Author(s):  
Hannah F Löchel ◽  
Dominic Eger ◽  
Theodor Sperlea ◽  
Dominik Heider

Abstract Motivation Classification of protein sequences is a major task in bioinformatics and has many applications. Different machine learning methods exist and are applied to these problems, such as support vector machines (SVM), random forests (RF) and neural networks (NN). All of these methods have in common that protein sequences must first be made machine-readable and comparable, for which different encodings exist. These encodings are typically based on physical or chemical properties of the sequence. However, motivated by the outstanding performance of deep neural networks (DNN) on image recognition, we used frequency matrix chaos game representation (FCGR) to encode protein sequences as images. In this study, we compare the performance of SVMs, RFs and DNNs trained on FCGR-encoded protein sequences. While the original chaos game representation (CGR) has been used mainly for genome sequence encoding and classification, we modified it to work also for protein sequences, resulting in the n-flakes representation, an image with several icosagons. Results We show that all applied machine learning techniques (RF, SVM and DNN) achieve promising results compared to state-of-the-art methods on our benchmark datasets, with DNNs outperforming the other methods, and that FCGR is a promising new encoding method for protein sequences. Availability and implementation https://cran.r-project.org/. Supplementary information Supplementary data are available at Bioinformatics online.
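The encoding step can be made concrete for the original DNA version of CGR that the abstract refers to. This is a minimal sketch, not the authors' n-flakes variant for proteins: it assumes the common corner assignment A, C, G, T at the corners of the unit square (other orders are also in use). In CGR each new point is the midpoint between the previous point and the corner of the current nucleotide, so at grid resolution 2^k the cell a point falls into is fixed by the last k nucleotides, and the FCGR image is a matrix of k-mer counts. The function name `fcgr` and the `CORNERS` table are illustrative.

```python
from typing import List

# Assumed corner convention for DNA CGR: (x_bit, y_bit) of each corner.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int) -> List[List[int]]:
    """Frequency CGR: count CGR points in a 2^k x 2^k grid (grid[y][x]).

    Instead of iterating floating-point midpoints, integer cell indices are
    tracked directly: halving a coordinate drops its oldest (least
    significant) bit, and the new corner becomes the most significant bit.
    This is exact and avoids rounding at cell boundaries.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    x_idx = y_idx = 0
    for i, base in enumerate(sequence.upper()):
        if base not in CORNERS:
            raise ValueError(f"unsupported symbol {base!r} at position {i}")
        cx, cy = CORNERS[base]
        # Midpoint step toward the corner of the current nucleotide.
        x_idx = (x_idx >> 1) | (cx << (k - 1))
        y_idx = (y_idx >> 1) | (cy << (k - 1))
        # Count only once k nucleotides have been read, so every counted
        # point corresponds to a complete k-mer.
        if i >= k - 1:
            grid[y_idx][x_idx] += 1
    return grid
```

The resulting 2^k x 2^k matrix, which sums to len(sequence) - k + 1, is the kind of image that can be passed to an SVM, RF or convolutional DNN. For proteins the paper replaces the square with an icosagon-based n-flake, which this sketch does not attempt to reproduce.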


2019 ◽  
Vol 36 (6) ◽  
pp. 1757-1764
Author(s):  
Saida Saad Mohamed Mahmoud ◽  
Gennaro Esposito ◽  
Giuseppe Serra ◽  
Federico Fogolari

Abstract Motivation Implicit solvent models play an important role in describing the thermodynamics and the dynamics of biomolecular systems. Key to an efficient use of these models is the computation of generalized Born (GB) radii, which is accomplished by algorithms based on the electrostatics of inhomogeneous dielectric media. The speed and accuracy of such computations are still an issue, especially for their intensive use in classical molecular dynamics. Here, we propose an alternative approach that encodes the physics of the phenomena and the chemical structure of the molecules in model parameters which are learned from examples. Results GB radii have been computed using (i) a linear model and (ii) a neural network. The input consists of the atom's element and a histogram of counts of neighbouring atoms within 16 Å, split by element. Linear models are ca. 8 times faster than the most widely used reference method and are also more accurate: their correlation coefficient with the inverse of ‘perfect’ GB radii is 0.94, versus 0.80 for the reference method. Neural networks further improve the accuracy of the predictions, with a correlation coefficient with ‘perfect’ GB radii of 0.97 and a ca. 20% smaller root mean square error. Availability and implementation We provide a C program implementing the computation using the linear model, including the coefficients appropriate for the set of Bondi radii, as Supplementary Material. We also provide a Python implementation of the neural network model, with parameter and example files, in the Supplementary Material. Supplementary information Supplementary data are available at Bioinformatics online.
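The input encoding and the linear model can be sketched as follows. This is a minimal sketch under loudly stated assumptions: the 16 Å cutoff comes from the abstract, but the 1 Å shell width, the element vocabulary and the choice to predict the inverse radius 1/R directly are hypothetical (the last is suggested by the reported correlation with inverse ‘perfect’ radii). The intercepts and weights here are placeholders; in the paper they are learned from examples.

```python
import math
from typing import Dict, List, Sequence, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (element, (x, y, z)) in Å

CUTOFF = 16.0                          # Å, neighbourhood radius from the abstract
BIN_WIDTH = 1.0                        # Å, hypothetical distance-shell width
ELEMENTS = ("C", "N", "O", "S", "H")   # hypothetical element vocabulary


def neighbour_histogram(atoms: Sequence[Atom], i: int) -> List[int]:
    """Counts of neighbours of atom i within CUTOFF, split by element and shell.

    Layout: one block of distance shells per element, in ELEMENTS order.
    Raises ValueError for an element that is not in ELEMENTS.
    """
    n_bins = int(CUTOFF / BIN_WIDTH)
    hist = [0] * (len(ELEMENTS) * n_bins)
    _, ri = atoms[i]
    for j, (elem, rj) in enumerate(atoms):
        if j == i:
            continue
        d = math.dist(ri, rj)
        if d < CUTOFF:
            shell = int(d / BIN_WIDTH)
            hist[ELEMENTS.index(elem) * n_bins + shell] += 1
    return hist


def predict_inverse_radius(element: str, hist: Sequence[int],
                           intercepts: Dict[str, float],
                           weights: Dict[str, Sequence[float]]) -> float:
    """Per-element linear model: 1/R = a[element] + sum_k w[element][k] * h_k."""
    return intercepts[element] + sum(w * h for w, h in zip(weights[element], hist))
```

A per-element set of coefficients keeps the model linear in the histogram while letting, say, carbon and oxygen respond differently to the same neighbourhood. The cost per atom is one pass over its neighbours, which is consistent in spirit with the speed advantage the abstract reports, although a production version would use a neighbour list rather than this O(N) scan per atom.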


2020 ◽  
Vol 36 (8) ◽  
pp. 2620-2622 ◽  
Author(s):  
Irina Balaur ◽  
Ludovic Roy ◽  
Alexander Mazein ◽  
S Gökberk Karaca ◽  
Ugur Dogrusoz ◽  
...  

Abstract Motivation CellDesigner is a well-established biological map editor used in many large-scale scientific efforts. However, interoperability between the Systems Biology Graphical Notation (SBGN) Markup Language (SBGN-ML) and CellDesigner’s proprietary Systems Biology Markup Language (SBML) extension formats remains a challenge due to the proprietary extensions used in CellDesigner files. Results We introduce a library named cd2sbgnml and an associated web service for bidirectional conversion between CellDesigner’s proprietary SBML extension and SBGN-ML formats. We discuss the functionality of the cd2sbgnml converter, which was successfully used to translate comprehensive large-scale diagrams, such as the RECON Human Metabolic network and the complete Atlas of Cancer Signalling Network, from the CellDesigner file format into SBGN-ML. Availability and implementation The cd2sbgnml conversion library and the web service were developed in Java and are distributed under the GNU Lesser General Public License v3.0. The sources, along with a set of examples, are available on GitHub (https://github.com/sbgn/cd2sbgnml and https://github.com/sbgn/cd2sbgnml-webservice, respectively). Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (17) ◽  
pp. 3208-3210 ◽  
Author(s):  
Yangzhen Wang ◽  
Feng Su ◽  
Shanshan Wang ◽  
Chaojuan Yang ◽  
Yonglu Tian ◽  
...  

Abstract Motivation Functional imaging at single-neuron resolution offers a highly efficient tool for studying functional connectomics in the brain. However, mainstream neuron-detection methods focus on either the morphologies or the activities of neurons, which may lead to the extraction of incomplete information and may rely heavily on the experience of the experimenters. Results We developed a toolbox based on convolutional neural networks and a fluctuation method (ImageCN) to improve the processing of calcium imaging data. To evaluate the performance of ImageCN, nine different imaging datasets were recorded from awake mouse brains. ImageCN demonstrated superior neuron-detection performance compared with other algorithms. Furthermore, ImageCN does not require sophisticated training for users. Availability and implementation ImageCN is implemented in MATLAB. The source code and documentation are available at https://github.com/ZhangChenLab/ImageCN. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (20) ◽  
pp. 5021-5026 ◽  
Author(s):  
Gang Xu ◽  
Qinghua Wang ◽  
Jianpeng Ma

Abstract Motivation Predictions of protein backbone torsion angles (ϕ and ψ) and secondary structure from sequence are crucial subproblems in protein structure prediction. With the development of deep learning approaches, their accuracies have been significantly improved. To capture long-range interactions, most studies integrate bidirectional recurrent neural networks into their models. In this study, we introduce and modify a recently proposed architecture named Transformer to capture the interactions between any two residues, in theory at arbitrary distance. Moreover, we take advantage of multitask learning to improve the generalization of the neural network by introducing related tasks into the training process. Similar to many previous studies, OPUS-TASS uses an ensemble of models and achieves better results. Results OPUS-TASS uses the same training and validation sets as SPOT-1D. We compare the performance of OPUS-TASS and SPOT-1D on TEST2016 (1213 proteins) and TEST2018 (250 proteins) proposed in the SPOT-1D paper, CASP12 (55 proteins), CASP13 (32 proteins) and CASP-FM (56 proteins) proposed in the SAINT paper, and a recently released collection of PDB structures from CAMEO (93 proteins) named CAMEO93. On these six test sets, OPUS-TASS achieves consistent improvements in both backbone torsion angle prediction and secondary structure prediction. On CAMEO93, SPOT-1D achieves mean absolute errors of 16.89 and 23.02 for ϕ and ψ predictions, respectively, and accuracies of 87.72 and 77.15% for 3- and 8-state secondary structure predictions, respectively. In comparison, OPUS-TASS achieves 16.56 and 22.56 for ϕ and ψ predictions, and 89.06 and 78.87% for 3- and 8-state secondary structure predictions, respectively. In particular, after applying our torsion angle refinement method OPUS-Refine as a post-processing procedure for OPUS-TASS, the mean absolute errors of the final ϕ and ψ predictions are further decreased to 16.28 and 21.98, respectively.
Availability and implementation The training and inference code of OPUS-TASS and its data are available at https://github.com/thuxugang/opus_tass. Supplementary information Supplementary data are available at Bioinformatics online.
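The two kinds of metric quoted above can be sketched as follows. This is a minimal sketch, not the evaluation code of OPUS-TASS or SPOT-1D: it assumes angles in degrees and that torsion-angle errors are measured on the circle (so 179° versus −179° counts as 2°, not 358°). It also assumes the widely used reduction of the eight DSSP states to three (H, G, I → helix; E, B → strand; everything else → coil); the papers' exact mapping may differ, and the `"C"` key is included only to tolerate coil labels written that way.

```python
from typing import Sequence


def angular_mae(pred_deg: Sequence[float], true_deg: Sequence[float]) -> float:
    """Mean absolute error between angles in degrees, respecting periodicity."""
    if len(pred_deg) != len(true_deg) or len(pred_deg) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for p, t in zip(pred_deg, true_deg):
        diff = abs(p - t) % 360.0
        # The shorter way around the circle is the error.
        total += min(diff, 360.0 - diff)
    return total / len(pred_deg)


# One common reduction of the 8 DSSP states to 3 states (assumed here).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H",
                  "E": "E", "B": "E",
                  "T": "C", "S": "C", "-": "C", "C": "C"}


def q3_from_q8(pred8: str, true8: str) -> float:
    """Percentage of residues whose reduced 3-state label is correct."""
    if len(pred8) != len(true8) or len(pred8) == 0:
        raise ValueError("need two non-empty strings of equal length")
    hits = sum(EIGHT_TO_THREE[p] == EIGHT_TO_THREE[t]
               for p, t in zip(pred8, true8))
    return 100.0 * hits / len(pred8)
```

Without the wrap-around, a method whose predictions sit near ±180° would be penalised for errors that are physically tiny, which matters especially for ψ.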


2020 ◽  
Vol 36 (11) ◽  
pp. 3537-3548
Author(s):  
Nova F Smedley ◽  
Suzie El-Saden ◽  
William Hsu

Abstract Motivation Cancer heterogeneity is observed at multiple biological levels. To improve our understanding of these differences and their relevance in medicine, approaches are needed that link organ- and tissue-level information from diagnostic images with cellular-level information from genomics. However, these ‘radiogenomic’ studies often use linear or shallow models, depend on feature selection, or consider one gene at a time to map images to genes. Moreover, no study has systematically attempted to understand the molecular basis of imaging traits by interpreting what the neural network has learned. These studies are thus limited in their ability to understand the transcriptomic drivers of imaging traits, which could provide additional context for determining clinical outcomes. Results We present a neural network-based approach that takes high-dimensional gene expression data as input and performs a non-linear mapping to an imaging trait. To interpret the models, we propose gene masking and gene saliency to extract learned relationships from radiogenomic neural networks. In glioblastoma patients, our models outperformed comparable classifiers by more than 0.10 AUC, and our interpretation methods were validated by using a similar model to identify known relationships between genes and molecular subtypes. We found that tumor imaging traits had specific transcription patterns, e.g. linking edema to genes related to cellular invasion, and 10 radiogenomic traits were significantly predictive of survival. We demonstrate that neural networks can model transcriptomic heterogeneity to reflect differences in imaging and can be used to derive radiogenomic traits with clinical value. Availability and implementation https://github.com/novasmedley/deepRadiogenomics. Supplementary information Supplementary data are available at Bioinformatics online.
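The two interpretation ideas can be illustrated on a toy model. This is a minimal sketch under stated assumptions: the abstract does not give the authors' exact definitions, so here a logistic regression over gene expression is a hypothetical stand-in for a trained radiogenomic network. "Masking" is assumed to mean setting one gene's input to a baseline value (0, as for mean-centred data) and recording the change in the predicted probability. "Saliency" is assumed to mean the gradient of the output with respect to each gene's expression, which for a real network would come from automatic differentiation rather than the closed form used here.

```python
import math
from typing import List, Sequence


def predict(expr: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical stand-in for a trained network: logistic regression."""
    z = bias + sum(w * x for w, x in zip(weights, expr))
    return 1.0 / (1.0 + math.exp(-z))


def gene_masking(expr: Sequence[float], weights: Sequence[float], bias: float,
                 baseline: float = 0.0) -> List[float]:
    """Drop in predicted probability when each gene is set to a baseline."""
    p = predict(expr, weights, bias)
    effects = []
    for i in range(len(expr)):
        masked = list(expr)
        masked[i] = baseline
        effects.append(p - predict(masked, weights, bias))
    return effects


def gene_saliency(expr: Sequence[float], weights: Sequence[float],
                  bias: float) -> List[float]:
    """Gradient of the output with respect to each gene's expression.

    For the logistic model this is dp/dx_i = p * (1 - p) * w_i.
    """
    p = predict(expr, weights, bias)
    return [p * (1.0 - p) * w for w in weights]
```

Masking answers "how much does the prediction depend on this gene's actual value for this patient", while saliency answers "how sensitive is the prediction to a small change in this gene"; in a non-linear network the two can rank genes differently, which is one reason to compute both.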

