Deep learning–based cell composition analysis from tissue expression profiles

2020 ◽  
Vol 6 (30) ◽  
pp. eaba2619 ◽  
Author(s):  
Kevin Menden ◽  
Mohamed Marouf ◽  
Sergio Oller ◽  
Anupriya Dalmia ◽  
Daniel Sumner Magruder ◽  
...  

We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single-cell RNA sequencing (RNA-seq) data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness. A single trained network reliably deconvolves bulk RNA-seq and microarray, human and mouse tissue expression data and leverages the combined information of multiple datasets. Because of this stability and flexibility, we surmise that deep learning will become an algorithmic mainstay for cell deconvolution of various data types. Scaden’s software package and web application are easy to use on new as well as diverse existing expression datasets available in public resources, deepening the molecular and cellular understanding of developmental and disease processes.

2019 ◽  
Author(s):  
Kevin Menden ◽  
Mohamed Marouf ◽  
Sergio Oller ◽  
Anupriya Dalmia ◽  
Karin Kloiber ◽  
...  

We present Scaden, a deep neural network for cell deconvolution that uses gene expression information to infer the cellular composition of tissues. Scaden is trained on single cell RNA-seq data to engineer discriminative features that confer robustness to bias and noise, making complex data preprocessing and feature selection unnecessary. We demonstrate that Scaden outperforms existing deconvolution algorithms in both precision and robustness. A single trained network reliably deconvolves bulk RNA-seq and microarray, human and mouse tissue expression data and leverages the combined information of multiple data sets. Due to this stability and flexibility, we surmise that deep learning will become an algorithmic mainstay for cell deconvolution of various data types. Scaden’s comprehensive software package is easy to use on novel as well as diverse existing expression datasets available in public resources, deepening the molecular and cellular understanding of developmental and disease processes.


2001 ◽  
Vol 4 (3) ◽  
pp. 183-188 ◽  
Author(s):  
KOJI KADOTA ◽  
RIKA MIKI ◽  
HIDEMASA BONO ◽  
KENTARO SHIMIZU ◽  
YASUSHI OKAZAKI ◽  
...  

cDNA microarray technology is useful for systematically analyzing the expression profiles of thousands of genes at once. Although many useful results inferred by using this technology and a hierarchical clustering method for statistical analysis have been confirmed using other methods, there are still questions about the reproducibility of the data. We have therefore developed a data processing method that very efficiently extracts reproducible data from the result of duplicate experiments. It is designed to automatically filter the raw results obtained from cDNA microarray image-analysis software. We optimize the threshold value for filtering the data by using the product of N and R, where N is the ratio of the number of spots that passed the filtering vs. the total number of spots, and R is the correlation coefficient for results obtained in the duplicate experiments. Using this method to process mouse tissue expression profile data that contain 1,881,600 points of analysis, we obtained clustered results more reasonable than those obtained using previously reported filtering methods.
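
The threshold selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which quantity is filtered, so the sketch assumes a spot passes when its intensity in both duplicate experiments is at least the threshold, and it takes R to be the Pearson correlation of the passing spots. The function names `pearson`, `score_threshold`, and `best_threshold` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def score_threshold(rep1, rep2, threshold):
    """Return N * R for one candidate threshold.

    N: fraction of spots that pass the filter.
    R: correlation between the duplicate experiments on the passing spots.
    Assumed filter rule: both replicate intensities must be >= threshold.
    """
    kept = [(a, b) for a, b in zip(rep1, rep2) if min(a, b) >= threshold]
    n_ratio = len(kept) / len(rep1)
    r = pearson([a for a, _ in kept], [b for _, b in kept])
    return n_ratio * r


def best_threshold(rep1, rep2, candidates):
    """Pick the candidate threshold that maximizes N * R."""
    return max(candidates, key=lambda t: score_threshold(rep1, rep2, t))
```

The product rewards thresholds that keep many spots (high N) while the surviving duplicate measurements still agree (high R), so neither an overly strict filter nor an overly permissive one wins by default.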


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Melvyn Yap ◽  
Rebecca L. Johnston ◽  
Helena Foley ◽  
Samual MacDonald ◽  
Olga Kondrashova ◽  
...  

For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.


2018 ◽  
Author(s):  
Yue Deng ◽  
Feng Bao ◽  
Qionghai Dai ◽  
Lani F. Wu ◽  
Steven J. Altschuler

Recent advances in large-scale single cell RNA-seq enable fine-grained characterization of phenotypically distinct cellular states within heterogeneous tissues. We present scScope, a scalable deep-learning based approach that can accurately and rapidly identify cell-type composition from millions of noisy single-cell gene-expression profiles.


2019 ◽  
Author(s):  
Jorge L. Del-Aguila ◽  
Zeran Li ◽  
Umber Dube ◽  
Kathie A. Mihindukulasuriya ◽  
John P Budde ◽  
...  

Alzheimer Disease (AD) is the most common form of dementia. This neurodegenerative disorder is associated with neuronal death and gliosis heavily impacting the cerebral cortex. AD has a substantial but heterogeneous genetic component, presenting both Mendelian and complex genetic architectures. Using bulk RNA-seq from parietal lobes and deconvolution methods, we previously reported that brains exhibiting different AD genetic architecture exhibit different cellular proportions. Here, we sought to directly investigate AD brain changes in cell proportion and gene expression using single cell resolution. To do so, we generated unsorted single-nuclei RNA-sequencing data from brain tissue. We leveraged tissue donated from a carrier of a Mendelian genetic mutation and two family members who suffer from AD, but do not have the same mutation. We evaluated alternative alignment approaches to maximize the titer of reads, genes and cells with high quality. In addition, we employed distinct clustering strategies to determine the best approach to identify cell clusters that reveal neuronal and glial cell types and avoid artifacts such as sample and batch effects. We propose an approach to cluster cells that reduces biases and enables further analyses. We identified distinct types of neurons, both excitatory and inhibitory, and glial cells, including astrocytes, oligodendrocytes, and microglia among others. In particular, we identified a reduced proportion of excitatory neurons in the Mendelian mutation carrier, but a similar distribution of inhibitory neurons. Furthermore, we investigated whether single-nuclei RNA-seq from human brains recapitulates the expression profile of Disease Associated Microglia (DAM) discovered in mouse models. We also determined that when analyzing human single-nuclei data it is critical to control for biases introduced by donor-specific expression profiles.
In conclusion, we propose a collection of best practices to generate a highly-detailed molecular cell atlas of highly informative frozen tissue stored in brain banks. Importantly, we have developed a new web application to make this unique single-nuclei molecular atlas publicly available.


2020 ◽  
Author(s):  
Ye Yuan ◽  
Ziv Bar-Joseph

Motivation: Time-course gene expression data has been widely used to infer regulatory and signaling relationships between genes. Most of the widely used methods for such analysis were developed for bulk expression data. Single cell RNA-Seq (scRNA-Seq) data offers several advantages including the large number of expression profiles available and the ability to focus on individual cells rather than averages. However, this data also raises new computational challenges. Results: Using a novel encoding for scRNA-Seq expression data we develop deep learning methods for interaction prediction from time-course data. Our methods use a supervised framework which represents the data as a 3D tensor and trains convolutional and recurrent neural networks (CNN and RNN) for predicting interactions. We tested our Time-course Deep Learning (TDL) models on five different time series scRNA-Seq datasets. As we show, TDL can accurately identify causal and regulatory gene-gene interactions and can also be used to assign new function to genes. TDL improves on prior methods for the above tasks and can be generally applied to new time series scRNA-Seq data. Availability and Implementation: Freely available at https://github.com/xiaoyeye/[email protected] Supplementary information: Supplementary data are available at XXX online.


2020 ◽  
Author(s):  
Tal Koffler-Brill ◽  
Shahar Taiber ◽  
Alejandro Anaya ◽  
Mor Bordeynik-Cohen ◽  
Einat Rosen ◽  
...  

The auditory system is a complex sensory network with an orchestrated multilayer regulatory program governing its development and maintenance. Accumulating evidence has implicated long non-coding RNAs (lncRNAs) as important regulators in numerous systems, as well as in pathological pathways. However, their function in the auditory system has yet to be explored. Using a set of specific criteria, we selected four lncRNAs expressed in the mouse cochlea, which are conserved in the human transcriptome and are relevant for inner ear function. Bioinformatic characterization demonstrated a lack of coding potential and an absence of evolutionary conservation that represent properties commonly shared by their class members. RNAscope analysis of the spatial and temporal expression profiles revealed specific localization to inner ear cells. Sub-cellular localization analysis presented a distinct pattern for each lncRNA and mouse tissue expression evaluation displayed a large variability in terms of level and location. Our findings establish the expression of specific lncRNAs in different cell types of the auditory system and present a potential pathway by which the lncRNA Gas5 acts in the inner ear. Studying lncRNAs and deciphering their functions may deepen our knowledge of inner ear physiology and morphology and may reveal the basis of as yet unresolved genetic hearing loss-related pathologies. Moreover, our experimental design may be employed as a reference for studying other inner ear-related lncRNAs, as well as lncRNAs expressed in other sensory systems.


2020 ◽  
Vol 27 (5) ◽  
pp. 359-369 ◽  
Author(s):  
Cheng Shi ◽  
Jiaxing Chen ◽  
Xinyue Kang ◽  
Guiling Zhao ◽  
Xingzhen Lao ◽  
...  

Protein-related interaction prediction is critical to understanding life processes, biological functions, and mechanisms of drug action. Experimental methods used to determine protein-related interactions have always been costly and inefficient. In recent years, advances in biological and medical technology have produced an explosion of biological and physiological data, and deep learning-based algorithms have shown great promise in extracting features and learning patterns from complex data. Deep learning is now emerging in protein research. In this review, we provide an introductory overview of deep neural network theory and its unique properties. We focus mainly on the application of this technology to protein-related interaction prediction over the past five years, including protein-protein, protein-RNA/DNA, and protein-drug interaction prediction, among others. Finally, we discuss some of the challenges that deep learning currently faces.


2020 ◽  
Vol 15 ◽  
Author(s):  
Deeksha Saxena ◽  
Mohammed Haris Siddiqui ◽  
Rajnish Kumar

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is raised from a lower level to a much more abstract one. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are used individually as well. DL seems to be a better platform than machine learning, as it does not require an intermediate feature extraction step and works well with larger datasets. DL is one of the most discussed fields among scientists and researchers these days for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We have analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights about DL methods, data types, and the selection of DL models for disease diagnosis.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4736
Author(s):  
Sk. Tanzir Mehedi ◽  
Adnan Anwar ◽  
Ziaur Rahman ◽  
Kawsar Ahmed

The Controller Area Network (CAN) bus is an important protocol in real-time In-Vehicle Network (IVN) systems because of its simple, suitable, and robust architecture. IVN devices nevertheless remain insecure and vulnerable because their complex, data-intensive architectures greatly increase exposure to unauthorized networks and the possibility of various types of cyberattacks. The detection of cyberattacks in IVN devices has therefore become a growing interest. With the rapid development of IVNs and evolving threat types, traditional machine learning-based intrusion detection systems (IDS) must be updated to cope with the security requirements of the current environment. The progress of deep learning and deep transfer learning, and their impact in several areas, has made them an effective solution for network intrusion detection. This manuscript proposes a deep transfer learning-based IDS model for IVN that improves on the performance of several existing models. The unique contributions include effective attribute selection best suited to identifying malicious CAN messages and accurately detecting normal and abnormal activities, the design of a deep transfer learning-based LeNet model, and evaluation on real-world data. To this end, an extensive experimental performance evaluation has been conducted. The architecture, along with empirical analyses, shows that the proposed IDS greatly improves detection accuracy over mainstream machine learning, deep learning, and benchmark deep transfer learning models and demonstrates better performance for real-time IVN security.

