DeepImpute: an accurate, fast and scalable deep neural network method to impute single-cell RNA-Seq data

Background Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. However, a significant problem with current scRNA-seq data is the large fraction of missing values, or “dropouts”, in gene counts. Incorrect handling of dropouts may affect downstream bioinformatics analysis. As the number of scRNA-seq datasets grows drastically, it is crucial to have accurate and efficient imputation methods to handle these dropouts. Methods We present DeepImpute, a deep neural network-based imputation algorithm. The architecture of DeepImpute efficiently uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Results Overall, DeepImpute yields better accuracy than other publicly available scRNA-seq imputation methods on experimental data, as measured by mean squared error or Pearson’s correlation coefficient. Moreover, its efficient implementation provides significantly higher performance than the other methods as dataset size increases. Additionally, as a machine learning method, DeepImpute can be trained on a subset of the data to save even more computing time, without much sacrifice in prediction accuracy. Conclusions DeepImpute is an accurate, fast and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data. The package is freely available at https://github.com/lanagarmire/DeepImpute
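
Accuracy here is measured by mean squared error and Pearson's correlation between imputed and true values. The sketch below shows both metrics in plain Python, assuming they are computed only over entries that were deliberately hidden before imputation; the masking scheme, the toy matrices and the helper names are illustrative assumptions, not DeepImpute's code or API.

```python
import math
from typing import Sequence


def masked_values(matrix: Sequence[Sequence[float]],
                  mask: Sequence[Sequence[bool]]) -> list[float]:
    """Flatten the entries of `matrix` where `mask` is True (row by row)."""
    return [v for row, mrow in zip(matrix, mask)
            for v, m in zip(row, mrow) if m]


def mse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy cells x genes matrices (made-up numbers for illustration only).
truth = [[5.0, 0.0, 2.0], [3.0, 1.0, 4.0]]
imputed = [[4.0, 0.5, 2.0], [3.0, 1.5, 3.0]]
# True marks the entries that were hidden from the imputation method.
mask = [[True, False, True], [False, True, True]]

t = masked_values(truth, mask)    # [5.0, 2.0, 1.0, 4.0]
p = masked_values(imputed, mask)  # [4.0, 2.0, 1.5, 3.0]
print(mse(t, p), pearson(t, p))
```

Scoring only the hidden entries keeps the metrics from being inflated by values the method could simply copy through unchanged.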

DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data

Genome Biology ◽

10.1186/s13059-019-1837-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 31

Author(s):

Cédric Arisdakessian ◽

Olivier Poirion ◽

Breck Yunits ◽

Xun Zhu ◽

Lana X. Garmire

Keyword(s):

Neural Network ◽

Single Cell ◽

Deep Neural Network ◽

Mean Squared Error ◽

Single Cells ◽

Rna Seq ◽

Squared Error ◽

Study Gene Expression ◽

Network Method ◽

The Mean

Abstract Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. We present DeepImpute, a deep neural network-based imputation algorithm that uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Overall, DeepImpute yields better accuracy than six other publicly available scRNA-seq imputation methods on experimental data, as measured by the mean squared error or Pearson’s correlation coefficient. DeepImpute is an accurate, fast, and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data, and is freely available at https://github.com/lanagarmire/DeepImpute.

Optimised deep neural network model to predict asthma exacerbation based on personalised weather triggers

F1000Research ◽

10.12688/f1000research.73026.1 ◽

2021 ◽

Vol 10 ◽

pp. 911

Author(s):

Radiah Haque ◽

Sin-Ban Ho ◽

Ian Chai ◽

Adina Abdullah

Keyword(s):

Neural Network ◽

Asthma Exacerbation ◽

Deep Neural Network ◽

Mean Squared Error ◽

Computing Time ◽

Error Rates ◽

Self Management ◽

Prediction Errors ◽

Asthma Control Test ◽

Uv Index

Background – Recently, there have been attempts to develop mHealth applications for asthma self-management. However, there is a lack of applications that can accurately predict asthma exacerbation from weather triggers and demographic characteristics to give users a tailored response. This paper proposes an optimised Deep Neural Network Regression (DNNR) model to predict asthma exacerbation based on personalised weather triggers. Methods – With the aim of integrating weather, demography, and asthma tracking, an mHealth application was developed in which users take the Asthma Control Test (ACT) to identify the chances of their asthma exacerbation. The asthma dataset consists of panel data from 10 users and includes 1010 ACT scores as the target output. The dataset also contains 10 input features: five weather features (temperature, humidity, air pressure, UV index, wind speed) and five demographic features (age, gender, outdoor job, outdoor activities, location). Results – Using the DNNR model on the asthma dataset, a score of 0.83 was achieved, with Mean Absolute Error (MAE) = 1.44 and Mean Squared Error (MSE) = 3.62. For effective asthma self-management, the prediction errors must be within the acceptable loss range (error < 0.5). Therefore, an optimisation process was proposed to reduce the error rates and increase the accuracy by applying standardisation and fragmented grid search. Consequently, the optimised DNNR model (with 2 hidden layers and 50 hidden nodes) using the Adam optimiser achieved 94% accuracy, with MAE = 0.20 and MSE = 0.09. Conclusions – This study is the first of its kind to recognise the potential of DNNR to identify correlation patterns among asthma, weather, and demographic variables. The optimised DNNR model provides predictions with a significantly higher accuracy rate than existing predictive models while using less computing time. Thus, the optimisation process is useful for building an enhanced model that can be integrated into mHealth applications for asthma self-management.
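
The optimisation step relies on standardisation and is judged by MAE and MSE. As a minimal sketch, assuming z-score standardisation with statistics estimated on the training rows (the abstract does not say which standardisation was used), the scaling and the two error metrics could be computed as follows; the helper names, feature values, ACT scores and predictions are all made up for illustration.

```python
import math
from typing import Sequence


def fit_standardiser(rows: Sequence[Sequence[float]]) -> tuple[list[float], list[float]]:
    """Return per-feature means and population standard deviations."""
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return means, stds


def standardise(rows, means, stds):
    """Apply z = (x - mean) / std feature by feature."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical weather features: temperature (°C), humidity (%), UV index.
train = [[30.0, 80.0, 9.0], [26.0, 70.0, 5.0], [28.0, 75.0, 7.0]]
means, stds = fit_standardiser(train)
z_train = standardise(train, means, stds)

# Hypothetical ACT scores and model predictions.
act_true = [20.0, 15.0, 22.0]
act_pred = [19.5, 15.5, 21.0]
print(mae(act_true, act_pred), mse(act_true, act_pred))
```

The same `means` and `stds` would then be reused to transform held-out rows, so that no information from the test data leaks into the scaling.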

Single-cell conventional pap smear image classification using pre-trained deep neural network architectures

BMC Biomedical Engineering ◽

10.1186/s42490-021-00056-6 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Mohammed Aliy Mohammed ◽

Fetulhak Abdurahman ◽

Yodit Abebe Ayalew

Keyword(s):

Neural Network ◽

Cervical Cancer ◽

Computer Vision ◽

Single Cell ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Pap Smear ◽

Experimental Result ◽

Network Architectures ◽

Average Accuracy

Abstract Background Automating cytology-based cervical cancer screening could alleviate the shortage of skilled pathologists in developing countries. Computer vision experts have so far attempted numerous semi- and fully automated approaches to address this need. More recently, leveraging the high accuracy and reproducibility of deep neural networks has become common practice among computer vision experts. In this regard, the purpose of this study is to classify single-cell Pap smear (cytology) images using pre-trained deep convolutional neural network (DCNN) image classifiers. We fine-tuned the top ten pre-trained DCNN image classifiers and evaluated them on five-class single-cell Pap smear images from the SIPaKMeD dataset. The pre-trained DCNN image classifiers were selected from Keras Applications based on their top-1 accuracy. Results Our experimental results demonstrated that, of the ten selected pre-trained DCNN image classifiers, DenseNet169 performed best, with an average accuracy, precision, recall, and F1-score of 0.990, 0.974, 0.974, and 0.974, respectively. Moreover, it exceeded the benchmark accuracy reported by the creators of the dataset by 3.70%. Conclusions Although DenseNet169 is small compared with the other pre-trained DCNN image classifiers tested, it is still not suitable for mobile or edge devices. Further experimentation with mobile or small-size DCNN image classifiers is required to extend the applicability of the models to real-world demands. In addition, since all experiments used the SIPaKMeD dataset, additional experiments on new datasets will be needed to improve the generalizability of the models.
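
The results report average accuracy, precision, recall and F1-score over five classes. The abstract does not state how the averages were taken; the sketch below assumes macro-averaging (each class scored one-vs-rest, then the per-class values averaged). Integer labels 0 to 4 stand in for the five cell categories, and the label lists are made up.

```python
from typing import Sequence


def per_class_counts(y_true, y_pred, label):
    """One-vs-rest true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int],
                 labels: Sequence[int]):
    """Macro-averaged precision, recall and F1 across `labels`."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for ten test images.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 0]
print(accuracy(y_true, y_pred), macro_scores(y_true, y_pred, range(5)))
```

Under macro-averaging every class weighs the same regardless of how many images it has, which matters when class sizes differ.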

Development of a Deep Neural Network Model for Estimating Joint Location of Occupant Indoor Activities for Providing Thermal Comfort

Energies ◽

10.3390/en14030696 ◽

2021 ◽

Vol 14 (3) ◽

pp. 696

Author(s):

Eun Ji Choi ◽

Jin Woo Moon ◽

Ji-hoon Han ◽

Yongseok Yoo

Keyword(s):

Neural Network ◽

Thermal Comfort ◽

Deep Neural Network ◽

Mean Squared Error ◽

Thermal Environment ◽

The Body ◽

Estimation Accuracy ◽

Accurate Estimation ◽

Body Parts ◽

Joint Location

The type of occupant activity is an important factor in determining indoor thermal comfort; thus, an accurate method to estimate occupant activity needs to be developed. The purpose of this study was to develop a deep neural network (DNN) model for estimating the joint locations of diverse human activities, which will be used to provide a comfortable thermal environment. The DNN model was trained with images to estimate 14 joints of a person performing 10 common indoor activities. The DNN contained numerous shortcut connections for efficient training and had two stages of sequential and parallel layers for accurate joint localization. Estimation accuracy was quantified using the mean squared error (MSE) for the estimated joints and the percentage of correct parts (PCP) for the body parts. The results show that the joint MSEs were lowest for the head and neck, and the PCP was highest for the torso. The PCP for individual activities ranged from 0.71 to 0.92, with typing and standing in a relaxed manner having the highest PCP. Estimation accuracy was higher for relatively still activities and lower for activities involving wide-ranging arm or leg motion. This study thus highlights the potential for accurately estimating occupant indoor activities with a novel DNN model. This approach holds significant promise for identifying the actual type of occupant activity and for use in target indoor applications related to thermal comfort in buildings.
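
Body-part accuracy is reported as the percentage of correct parts (PCP). The abstract does not give the PCP criterion; the sketch below assumes the commonly used one, in which a part (a segment between two joints) counts as correct when both predicted endpoints lie within half the ground-truth segment length of their true positions. The joint names, part list and pixel coordinates are hypothetical.

```python
import math

Point = tuple[float, float]


def pcp(true_pts: dict[str, Point], pred_pts: dict[str, Point],
        parts: list[tuple[str, str]], alpha: float = 0.5) -> float:
    """Fraction of body parts whose two predicted endpoints both lie within
    alpha * (true part length) of the corresponding true joints."""
    correct = 0
    for a, b in parts:
        length = math.dist(true_pts[a], true_pts[b])
        ok_a = math.dist(true_pts[a], pred_pts[a]) <= alpha * length
        ok_b = math.dist(true_pts[b], pred_pts[b]) <= alpha * length
        correct += ok_a and ok_b  # True counts as 1
    return correct / len(parts)


# Hypothetical 2-D joint coordinates (pixels) for one image.
true_pts = {"neck": (50.0, 20.0), "pelvis": (50.0, 80.0),
            "r_shoulder": (35.0, 25.0), "r_elbow": (30.0, 50.0)}
pred_pts = {"neck": (52.0, 22.0), "pelvis": (48.0, 85.0),
            "r_shoulder": (36.0, 26.0), "r_elbow": (45.0, 60.0)}
parts = [("neck", "pelvis"),          # torso
         ("r_shoulder", "r_elbow")]   # right upper arm
print(pcp(true_pts, pred_pts, parts))
```

Because the tolerance scales with the true part length, a long part such as the torso tolerates larger pixel errors than a short limb segment, which is one reason PCP can differ across body parts.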

SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz801 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1779-1784 ◽

Cited By ~ 1

Author(s):

Chuanqi Wang ◽

Jun Li

Keyword(s):

Neural Network ◽

Single Cell ◽

Count Data ◽

Deep Neural Network ◽

Sequencing Depth ◽

Supplementary Information ◽

Neural Network Classifier ◽

Rna Seq ◽

Scale Invariant ◽

Downstream Analysis

Abstract Motivation Scaling by sequencing depth is usually the first step in the analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, which risks the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network-based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where the proper sequencing depth is hard to determine. Availability and implementation The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. Supplementary information Supplementary data are available at Bioinformatics online.
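
The abstract defines a method as scale-invariant if its result does not change under different estimates of sequencing depth. The sketch below illustrates that property only, not SINC's architecture: a toy linear rule that sees each sample's within-sample proportions returns the same class however the raw counts are rescaled. The classifier, weights and counts are hypothetical.

```python
from typing import Sequence


def proportions(counts: Sequence[float]) -> list[float]:
    """Divide each count by the sample total; any per-sample scale cancels."""
    total = sum(counts)
    return [c / total for c in counts]


def toy_classifier(counts: Sequence[float], weights: Sequence[float],
                   bias: float = 0.0) -> str:
    """Hypothetical linear rule applied to proportions, so f(s * x) == f(x)."""
    score = sum(w * p for w, p in zip(weights, proportions(counts))) + bias
    return "cancer" if score > 0 else "normal"


weights = [2.0, -1.0, -1.0]   # made-up gene weights
sample = [30.0, 10.0, 10.0]   # raw counts for three genes
for depth_factor in (0.1, 1.0, 7.5):
    # Rescaling by any depth estimate leaves the predicted class unchanged.
    print(depth_factor,
          toy_classifier([depth_factor * c for c in sample], weights))
```

Working on proportions is only one way to obtain this invariance; the point is that the output cannot depend on how the sequencing depth was estimated.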

DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-Seq Data

10.1101/2021.02.03.429484 ◽

2021 ◽

Author(s):

Jiaxing Chen ◽

Chinwang Cheong ◽

Liang Lan ◽

Xin Zhou ◽

Jiming Liu ◽

...

Keyword(s):

Neural Network ◽

Single Cell ◽

Regulatory Networks ◽

Deep Neural Network ◽

Neighborhood Context ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Rna Seq ◽

Cell Type Specific ◽

Gene Regulatory

Abstract Single-cell RNA sequencing is used to capture cell-specific gene expression, thus allowing reconstruction of gene regulatory networks. Existing algorithms struggle to deal with dropouts and cellular heterogeneity, and commonly require pseudotime-ordered cells. Here, we describe DeepDRIM, a supervised deep neural network that represents gene pair joint expression as images and considers the neighborhood context to eliminate transitive interactions. DeepDRIM yields significantly better performance than the other nine algorithms on the eight cell lines tested, and can be used to successfully discriminate key functional modules between patients with mild and severe symptoms of coronavirus disease 2019 (COVID-19).
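
DeepDRIM represents the joint expression of a gene pair as an image. The abstract does not describe the construction; the sketch below assumes the image is a normalised 2-D histogram of log-transformed expression of the two genes across the same cells. The bin count, the `log1p` transform, the function name and the counts are all assumptions for illustration.

```python
import math
from typing import Sequence


def joint_expression_image(gene_a: Sequence[float], gene_b: Sequence[float],
                           bins: int = 4) -> list[list[float]]:
    """2-D histogram of log1p(expression) for two genes across the same cells,
    normalised so that all pixels sum to 1."""
    xs = [math.log1p(v) for v in gene_a]
    ys = [math.log1p(v) for v in gene_b]
    x_max = max(xs) or 1.0  # avoid division by zero for all-zero genes
    y_max = max(ys) or 1.0
    image = [[0.0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        # Map each value to a bin index in [0, bins - 1].
        i = min(int(x / x_max * bins), bins - 1)
        j = min(int(y / y_max * bins), bins - 1)
        image[i][j] += 1.0
    n = len(xs)
    return [[count / n for count in row] for row in image]


# Hypothetical counts for two genes in six cells (zeros mimic dropouts).
gene_a = [0, 3, 7, 15, 0, 31]
gene_b = [0, 2, 8, 20, 1, 40]
for row in joint_expression_image(gene_a, gene_b):
    print(row)
```

An image of this kind can be fed to a convolutional network, which is the general idea behind treating gene-pair relationships as pictures.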

Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders

10.1101/740415 ◽

2019 ◽

Author(s):

Soeren Lukassen ◽

Foo Wei Ten ◽

Roland Eils ◽

Christian Conrad

Keyword(s):

Neural Network ◽

Single Cell ◽

Network Model ◽

Neural Network Model ◽

Matrix Factorization ◽

Latent Variable ◽

Single Cells ◽

Sequencing Data ◽

Gene Set ◽

Gene Sets

Abstract Recent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of 1,000s of genes in 1,000s of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogeneous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which uses its weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we demonstrate as a proof of concept on three single-cell RNA-Seq datasets and suggest for future single-cell gene analytics.
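
The model reads gene sets from weights that act as a matrix factorization. One way to picture this, assuming a linear decoder whose weight matrix W (latent factors × genes) reconstructs expression as X ≈ Z·W, is to take the highest-weighted genes of each factor as its gene set. The weights, gene names and top-k rule below are hypothetical, not the authors' procedure, and the class-specific (conditional) inputs are omitted.

```python
from typing import Sequence


def reconstruct(z: Sequence[float], w: Sequence[Sequence[float]]) -> list[float]:
    """Linear decoder: one cell's expression as z @ W."""
    n_genes = len(w[0])
    return [sum(z[k] * w[k][g] for k in range(len(z))) for g in range(n_genes)]


def gene_sets(w: Sequence[Sequence[float]], genes: Sequence[str],
              top_k: int = 2) -> list[list[str]]:
    """For each latent factor, the genes with the largest decoder weights."""
    sets = []
    for factor_weights in w:
        ranked = sorted(range(len(genes)),
                        key=lambda g: factor_weights[g], reverse=True)
        sets.append([genes[g] for g in ranked[:top_k]])
    return sets


genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
# Hypothetical decoder weights: 2 latent factors x 4 genes.
w = [[0.9, 0.8, 0.0, 0.1],
     [0.0, 0.1, 0.7, 0.95]]
print(gene_sets(w, genes))         # top genes per factor
print(reconstruct([1.0, 0.5], w))  # one cell's reconstructed expression
```

Because the decoder is linear, each gene's reconstructed value is a weighted sum of factor activities, which is what makes the weight rows interpretable as gene sets.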

Spike-in normalization for single-cell RNA-seq reveals dynamic global transcriptional activity mediating anticancer drug response

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab054 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Xin Wang ◽

Jane Frederick ◽

Hongbin Wang ◽

Sheng Hui ◽

Vadim Backman ◽

...

Keyword(s):

Single Cell ◽

Cancer Cells ◽

Transcriptional Repression ◽

Drug Response ◽

Transcriptional Activity ◽

Gene Expression Regulation ◽

Single Cells ◽

Rna Seq ◽

Study Gene Expression

Abstract The transcriptional plasticity of cancer cells promotes intercellular heterogeneity in response to anticancer drugs and facilitates the emergence of surviving cell subpopulations. Characterizing single-cell transcriptional heterogeneity after drug treatment can provide mechanistic insights into drug efficacy. Here, we used single-cell RNA-seq to examine the transcriptomic profiles of cancer cells treated with paclitaxel, celecoxib and the combination of the two drugs. By normalizing the expression of endogenous genes to spike-in molecules, we found that cellular mRNA abundance is dynamically regulated after drug treatment. Using a random forest model, we identified gene signatures classifying single cells into three states: transcriptional repression, amplification and control-like. Treatment with paclitaxel or celecoxib alone generally repressed gene transcription across single cells. Interestingly, the drug combination resulted in transcriptional amplification and hyperactivation of the mitochondrial oxidative phosphorylation pathway, linked to enhanced cell-killing efficiency. Finally, we identified a regulatory module enriched with metabolism- and inflammation-related genes that was activated in a subpopulation of paclitaxel-treated cells, and whose expression predicted paclitaxel efficacy across cancer cell lines and in vivo patient samples. Our study highlights the dynamic global transcriptional activity driving single-cell heterogeneity during drug response and emphasizes the importance of adding spike-in molecules to study gene expression regulation using single-cell RNA-seq.
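
The key step is normalizing endogenous gene counts to spike-in molecules rather than to each cell's total counts. The sketch below shows why this matters, assuming an equal, known amount of spike-in RNA is added to every cell: if a cell doubles all of its mRNA, library-size normalization hides the change, while spike-in normalization reveals it. The function names and all counts are made up.

```python
def library_size_normalise(endogenous: list[float]) -> list[float]:
    """Scale counts so the cell's endogenous total is 1 (relative abundance)."""
    total = sum(endogenous)
    return [c / total for c in endogenous]


def spike_in_normalise(endogenous: list[float],
                       spike_in: list[float]) -> list[float]:
    """Scale counts by the cell's total spike-in counts, which track capture
    and sequencing efficiency rather than the cell's mRNA content."""
    spike_total = sum(spike_in)
    return [c / spike_total for c in endogenous]


# Hypothetical cells: 'amplified' has twice the mRNA of 'control' but received
# the same spike-in amount, so its spike-in counts are unchanged.
control = {"genes": [10.0, 30.0, 60.0], "spikes": [50.0, 50.0]}
amplified = {"genes": [20.0, 60.0, 120.0], "spikes": [50.0, 50.0]}

for name, cell in (("control", control), ("amplified", amplified)):
    print(name,
          library_size_normalise(cell["genes"]),
          spike_in_normalise(cell["genes"], cell["spikes"]))
```

With library-size normalization both cells look identical; with spike-in normalization the amplified cell shows twice the expression of every gene, the kind of global shift described above.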

Monitoring Population Phenology of Asian Citrus Psyllid Using Deep Learning

Complexity ◽

10.1155/2021/4644213 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Maria Bibi ◽

Muhammad Kashif Hanif ◽

Muhammad Umer Sarwar ◽

Muhammad Irfan Khan ◽

Shouket Zaman Khan ◽

...

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Prediction Models ◽

Mean Squared Error ◽

Maximum Temperature ◽

Economic Losses ◽

Asian Citrus Psyllid ◽

Average Maximum ◽

Average Minimum Temperature ◽

Network Approaches

Asian citrus psyllid, Diaphorina citri Kuwayama (Liviidae: Hemiptera), is a menacing and notorious pest of citrus plants. It vectors a phloem vessel-dwelling bacterium, Candidatus Liberibacter asiaticus, which is the causative pathogen of the serious citrus disease known as Huanglongbing. Huanglongbing is a major bottleneck in the export of citrus fruits from Pakistan and is responsible for huge economic losses in citrus globally. In the current study, several prediction models were developed based on machine learning regression algorithms to monitor the different phenological stages of Asian citrus psyllid and to predict its population in relation to different abiotic variables (average maximum temperature, average minimum temperature, average weekly temperature, average weekly relative humidity, and average weekly rainfall) and a biotic variable (host plant phenological patterns) in citrus-growing regions of Pakistan. The pest prediction models can guide pesticide application so that pesticides are applied only when needed, reducing their environmental and cost impacts. Pearson’s correlation analysis was performed to find the relationship between the predictor (abiotic and biotic) variables and the pest infestation rate on citrus plants. Multiple linear regression, random forest regressor, and deep neural network approaches were compared for predicting the population dynamics of Asian citrus psyllid. Compared with the other regression techniques, the deep neural network-based prediction model yielded the lowest root mean squared error values when predicting egg, nymph, and adult populations.
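
The models are compared by root mean squared error. The sketch below covers the simplest of the three, ordinary least-squares multiple linear regression, together with RMSE. It solves the normal equations by Gaussian elimination to stay dependency-free; the two predictors and the noise-free "population" values are synthetic, not data from the study.

```python
import math


def fit_ols(x_rows: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1, ..., bp] via normal equations."""
    design = [[1.0] + row for row in x_rows]  # add intercept column
    p = len(design[0])
    # Augmented system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in design) for j in range(p)] +
         [sum(r[i] * yi for r, yi in zip(design, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * p
    for i in reversed(range(p)):
        rest = sum(a[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (a[i][p] - rest) / a[i][i]
    return coef


def predict(coef: list[float], x_rows: list[list[float]]) -> list[float]:
    return [coef[0] + sum(b * v for b, v in zip(coef[1:], row)) for row in x_rows]


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


# Synthetic weekly records: [average maximum temperature (°C), relative humidity (%)].
x = [[30.0, 60.0], [32.0, 55.0], [28.0, 70.0], [35.0, 50.0], [25.0, 80.0]]
# Synthetic counts generated as 2 + 1.5 * temp - 0.2 * humidity (no noise).
y = [2 + 1.5 * t - 0.2 * h for t, h in x]
coef = fit_ols(x, y)
print([round(c, 3) for c in coef], rmse(y, predict(coef, x)))
```

On real field data the fit would not be exact, and RMSE computed on held-out weeks is what allows the regression, random forest and neural network models to be ranked against each other.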

A framework for testing different imputation methods for tabular datasets

10.1101/773762 ◽

2019 ◽

Author(s):

Tabea Kossen ◽

Michelle Livne ◽

Vince I Madai ◽

Ivana Galinovic ◽

Dietmar Frey ◽

...

Keyword(s):

Linear Model ◽

Missing Values ◽

Mean Squared Error ◽

Missing At Random ◽

Imputation Method ◽

Similar Data ◽

Missing Value ◽

Imputation Methods ◽

Listwise Deletion ◽

Clinical Dataset

Abstract Background and purpose Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands efficient use of the available data. Methods to impute missing values are thus crucial. Here, we developed a publicly available framework to test different imputation methods and compared their impact on a typical stroke clinical dataset as a use case. Methods A clinical dataset based on the 1000Plus stroke study with 380 patients with complete entries was used. Thirteen common clinical parameters, including numerical and categorical values, were selected. Missing values were simulated in a missing-at-random (MAR) and missing-completely-at-random (MCAR) fashion at rates from 0% to 60% and subsequently imputed using the mean, hot-deck, multiple imputation by chained equations, and the expectation maximization method, or handled by listwise deletion. Performance was assessed by the root mean squared error, the absolute bias, and the performance of a linear model for discharge mRS prediction. Results Listwise deletion was the worst performing method and became significantly worse than every imputation method from 2% (MAR) and 3% (MCAR) missing values onward. The underlying missing value mechanism seemed to have a crucial influence on which imputation method performed best; consequently, no single imputation method outperformed all others. A significant performance drop of the linear model started from 11% (MAR+MCAR) and 18% (MCAR) missing values. Conclusions In the presented case study of a typical clinical stroke dataset, we confirmed that listwise deletion should be avoided for dealing with missing values. Our findings indicate that the underlying missing value mechanism and other dataset characteristics strongly influence the best choice of imputation method. For future studies with a similar data structure, we thus suggest using the framework developed in this study to select the most suitable imputation method for a given dataset prior to analysis.
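
The simulation masks values at a chosen rate, imputes them, and scores the result by RMSE against the hidden originals. The sketch below shows that loop for MCAR masking and the simplest method listed, mean imputation, and counts how many rows listwise deletion would keep. The seed, rates, helper names and the synthetic 200 x 3 table are assumptions for illustration; MAR masking, which depends on other observed variables, is not shown.

```python
import math
import random


def mask_mcar(data, rate, rng):
    """Copy `data`, setting each cell to None independently with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in data]


def mean_impute(masked):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(masked[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse_on_missing(original, masked, imputed):
    """RMSE computed only over the cells that were masked."""
    errors = [(o - i) ** 2
              for o_row, m_row, i_row in zip(original, masked, imputed)
              for o, m, i in zip(o_row, m_row, i_row) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


def complete_rows(masked):
    """Rows that listwise deletion would keep (no missing entries)."""
    return [row for row in masked if all(v is not None for v in row)]


rng = random.Random(0)
# Synthetic table: 200 patients x 3 numeric clinical parameters.
data = [[rng.gauss(70, 10), rng.gauss(140, 20), rng.gauss(5, 2)]
        for _ in range(200)]
for rate in (0.0, 0.1, 0.3, 0.6):
    masked = mask_mcar(data, rate, rng)
    imputed = mean_impute(masked)
    print(rate, round(rmse_on_missing(data, masked, imputed), 2),
          len(complete_rows(masked)))
```

The shrinking count of complete rows as the rate grows shows why listwise deletion loses so much data even at modest missingness.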
