SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data

Chuanqi Wang; Jun Li

doi:10.1093/bioinformatics/btz801

Bioinformatics ◽

10.1093/bioinformatics/btz801 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1779-1784 ◽

Cited By ~ 1

Author(s):

Chuanqi Wang ◽

Jun Li

Keyword(s):

Neural Network ◽

Single Cell ◽

Count Data ◽

Deep Neural Network ◽

Sequencing Depth ◽

Supplementary Information ◽

Neural Network Classifier ◽

Rna Seq ◽

Scale Invariant ◽

Downstream Analysis

Abstract Motivation Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. Results We call an analysis method ‘scale-invariant’ (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network-based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. Availability and implementation The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. Supplementary information Supplementary data are available at Bioinformatics online.
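
The scale-invariance property defined in the abstract can be illustrated with a small numerical check. The sketch below is not SINC itself: the helper names `to_proportions` and `is_scale_invariant` are hypothetical, and "a different estimate of sequencing depth" is modelled, as an assumption, by multiplying each sample (row) by an arbitrary positive factor.

```python
import numpy as np

def to_proportions(counts):
    """Divide each sample (row) by its own total count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def is_scale_invariant(method, counts, factors, tol=1e-9):
    """Return True if method(counts) is unchanged after each row is
    multiplied by its own positive factor (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    scaled = counts * np.asarray(factors, dtype=float)[:, None]
    return bool(np.allclose(method(counts), method(scaled), atol=tol))

# Two toy samples (rows) with four genes (columns).
counts = np.array([[10, 0, 5, 85],
                   [3, 1, 0, 16]])
factors = [0.5, 7.0]  # stand-ins for two different depth estimates

print(is_scale_invariant(to_proportions, counts, factors))  # True
print(is_scale_invariant(np.log1p, counts, factors))        # False
```

Per-sample proportions pass the check because the factor cancels in the ratio, whereas a plain log(1 + x) transform does not, so it would need a depth estimate first.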

RgCop-A regularized copula based method for gene selection in single cell rna-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009464 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009464

Author(s):

Snehalika Lall ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Real Life ◽

Classification Performance ◽

Rna Seq ◽

Scale Invariant ◽

Dependence Measure ◽

Highly Expressed Genes ◽

The Stability ◽

Downstream Analysis

Gene selection in large, unannotated single-cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. Existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set because of the technical noise present in the data. Here, we propose RgCop, a novel regularized copula-based method for gene selection from large single-cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust, equitable dependence measure that captures multivariate dependency among a set of genes in single-cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective set of features/genes. Results show a significant improvement in the clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data owing to the scale-invariant property of the copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate unknown cells with high accuracy.
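
The scale-invariant property of copulas mentioned above follows from the fact that a copula depends only on the ranks of the data, so strictly increasing transformations of each variable leave it unchanged. The sketch below illustrates this with a Spearman-style rank correlation, which is a simple stand-in and not the Ccor measure used by RgCop. The function names and the simulated data are assumptions made for illustration.

```python
import numpy as np

def ranks(x):
    """0-based ranks of x (assumes no ties, for simplicity)."""
    return np.argsort(np.argsort(x))

def rank_correlation(x, y):
    """Spearman-style dependence: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

rng = np.random.default_rng(0)
gene_a = rng.gamma(shape=2.0, scale=3.0, size=200)  # toy expression values
gene_b = gene_a + rng.normal(scale=2.0, size=200)   # noisy, dependent gene

before = rank_correlation(gene_a, gene_b)
# Rescale one gene and apply a strictly increasing transform to the other.
after = rank_correlation(1000.0 * gene_a, np.exp(gene_b / 10.0))

print(np.isclose(before, after))
```

A plain Pearson correlation on the raw values would generally change under the same transformations, which is why rank-based (copula-based) measures are attractive for data with uncertain scaling.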

DeepDRIM: a deep neural network to reconstruct cell-type-specific gene regulatory network using single-cell RNA-Seq Data

10.1101/2021.02.03.429484 ◽

2021 ◽

Author(s):

Jiaxing Chen ◽

Chinwang Cheong ◽

Liang Lan ◽

Xin Zhou ◽

Jiming Liu ◽

...

Keyword(s):

Neural Network ◽

Single Cell ◽

Regulatory Networks ◽

Deep Neural Network ◽

Neighborhood Context ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Rna Seq ◽

Cell Type Specific ◽

Gene Regulatory

Abstract Single-cell RNA sequencing is used to capture cell-specific gene expression, thus allowing reconstruction of gene regulatory networks. The existing algorithms struggle to deal with dropouts and cellular heterogeneity, and commonly require pseudotime-ordered cells. Here, we describe DeepDRIM, a supervised deep neural network that represents gene pair joint expression as images and considers the neighborhood context to eliminate the transitive interactions. DeepDRIM yields significantly better performance than the other nine algorithms used on the eight cell lines tested, and can be used to successfully discriminate key functional modules between patients with mild and severe symptoms of coronavirus disease 2019 (COVID-19).
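
One plausible way to represent the joint expression of a gene pair as an image is a two-dimensional histogram over cells. The sketch below shows that idea only. The abstract does not describe how DeepDRIM builds its images, so the bin count, the log transform, the normalization and the function name `gene_pair_image` are all assumptions.

```python
import numpy as np

def gene_pair_image(expr_a, expr_b, bins=32):
    """Build a bins x bins image from the joint expression of two genes.

    Each cell contributes one point (log1p(a), log1p(b)); the image is the
    2D histogram of those points, normalized to sum to 1.
    """
    a = np.log1p(np.asarray(expr_a, dtype=float))
    b = np.log1p(np.asarray(expr_b, dtype=float))
    hist, _, _ = np.histogram2d(a, b, bins=bins)
    return hist / hist.sum()

rng = np.random.default_rng(1)
regulator = rng.poisson(5.0, size=500)       # toy counts across 500 cells
target = rng.poisson(1.0 + regulator)        # target depends on regulator

image = gene_pair_image(regulator, target)
print(image.shape)
```

An image of this kind could then be passed to a convolutional network, with images of neighboring gene pairs supplied as additional input to provide context.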

SPARSim single cell: a count data simulator for scRNA-seq data

Bioinformatics ◽

10.1093/bioinformatics/btz752 ◽

2019 ◽

Cited By ~ 2

Author(s):

Giacomo Baruzzo ◽

Ilaria Patuzzi ◽

Barbara Di Camillo

Keyword(s):

Single Cell ◽

Count Data ◽

Simulated Data ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Rna Seq ◽

Distribution Of Zeros ◽

New Methods ◽

Research Fields

Abstract Motivation Single-cell RNA-seq (scRNA-seq) count data show many differences compared with bulk RNA-seq count data, making the application of many RNA-seq pre-processing/analysis methods not straightforward or even inappropriate. For this reason, the development of new methods for handling scRNA-seq count data is currently one of the most active research fields in bioinformatics. To help the development of such new methods, the availability of simulated data could play a pivotal role. However, only a few scRNA-seq count data simulators are available, and they often show poor or undemonstrated similarity with real data. Results In this article we present SPARSim, a scRNA-seq count data simulator based on a Gamma-Multivariate Hypergeometric model. We demonstrate that SPARSim can generate count data that resemble real data in terms of count intensity, variability and sparsity, performing comparably to or better than one of the most used scRNA-seq simulators, Splat. In particular, SPARSim simulated count matrices closely resemble the distribution of zeros across different expression intensities observed in real count data. Availability and implementation SPARSim R package is freely available at http://sysbiobig.dei.unipd.it/?q=SPARSim and at https://gitlab.com/sysbiobig/sparsim. Supplementary information Supplementary data are available at Bioinformatics online.
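
A Gamma-Multivariate Hypergeometric model can be read as a two-step process: a Gamma draw models biological variability in expression levels, and a multivariate hypergeometric draw models sequencing as sampling molecules without replacement. The sketch below follows that reading only. It is not SPARSim's implementation, and the function name `simulate_cell`, the parameterization and the `pool_size` argument are assumptions.

```python
import numpy as np

def simulate_cell(mean_intensity, variability, library_size, rng,
                  pool_size=1_000_000):
    """Simulate counts for one cell (illustrative parameterization)."""
    mean_intensity = np.asarray(mean_intensity, dtype=float)
    # Gamma step: per-gene expression level whose mean is mean_intensity;
    # larger variability gives a smaller shape and a wider spread.
    shape = 1.0 / variability
    levels = rng.gamma(shape=shape, scale=mean_intensity / shape)
    # Convert levels to an integer pool of molecules per gene.
    pool = np.floor(levels / levels.sum() * pool_size).astype(np.int64)
    # Hypergeometric step: sequence library_size molecules from the pool
    # without replacement.
    return rng.multivariate_hypergeometric(pool, library_size)

rng = np.random.default_rng(2)
mean_intensity = rng.lognormal(mean=0.0, sigma=1.5, size=1000)  # toy genes
counts = simulate_cell(mean_intensity, variability=0.5,
                       library_size=2000, rng=rng)
print(counts.sum(), (counts == 0).mean())
```

With a library size that is small relative to the number of genes, many low-intensity genes receive zero counts, which gives the kind of sparsity the abstract refers to.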

DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data

Genome Biology ◽

10.1186/s13059-019-1837-6 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 31

Author(s):

Cédric Arisdakessian ◽

Olivier Poirion ◽

Breck Yunits ◽

Xun Zhu ◽

Lana X. Garmire

Keyword(s):

Neural Network ◽

Single Cell ◽

Deep Neural Network ◽

Mean Squared Error ◽

Single Cells ◽

Rna Seq ◽

Squared Error ◽

Study Gene Expression ◽

Network Method ◽

The Mean

Abstract Single-cell RNA sequencing (scRNA-seq) offers new opportunities to study gene expression of tens of thousands of single cells simultaneously. We present DeepImpute, a deep neural network-based imputation algorithm that uses dropout layers and loss functions to learn patterns in the data, allowing for accurate imputation. Overall, DeepImpute yields better accuracy than six other publicly available scRNA-seq imputation methods on experimental data, as measured by the mean squared error or Pearson’s correlation coefficient. DeepImpute is an accurate, fast, and scalable imputation tool that is suited to handle the ever-increasing volume of scRNA-seq data, and is freely available at https://github.com/lanagarmire/DeepImpute.
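
The two accuracy measures named in the abstract capture different things, which a short example can make concrete. This is a generic sketch of the metrics, not DeepImpute's evaluation code, and `imputation_scores` is a hypothetical helper name.

```python
import numpy as np

def imputation_scores(true_values, imputed_values):
    """Return (mean squared error, Pearson correlation) over all entries."""
    t = np.asarray(true_values, dtype=float).ravel()
    p = np.asarray(imputed_values, dtype=float).ravel()
    mse = float(np.mean((t - p) ** 2))
    pearson = float(np.corrcoef(t, p)[0, 1])
    return mse, pearson

truth = np.array([[0.0, 2.0],
                  [4.0, 6.0]])
imputed = truth + 1.0  # every value is off by exactly one unit

mse, pearson = imputation_scores(truth, imputed)
print(mse, pearson)
```

Here the imputed values are perfectly correlated with the truth, yet the mean squared error is 1 because of the constant offset, so reporting both measures gives a fuller picture than either alone.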

snakePipes: facilitating flexible, scalable and integrative epigenomic analysis

Bioinformatics ◽

10.1093/bioinformatics/btz436 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4757-4759 ◽

Cited By ~ 18

Author(s):

Vivek Bhardwaj ◽

Steffen Heyne ◽

Katarzyna Sikora ◽

Leily Rabbani ◽

Michael Rauer ◽

...

Keyword(s):

Single Cell ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Rna Seq ◽

Downstream Analysis ◽

Scalable Analysis

Abstract Summary Due to the rapidly increasing scale and diversity of epigenomic data, modular and scalable analysis workflows are of wide interest. Here we present snakePipes, a workflow package for processing and downstream analysis of data from common epigenomic assays: ChIP-seq, RNA-seq, Bisulfite-seq, ATAC-seq, Hi-C and single-cell RNA-seq. snakePipes enables users to assemble variants of each workflow and to easily install and upgrade the underlying tools, via its simple command-line wrappers and yaml files. Availability and implementation snakePipes can be installed via conda: 'conda install -c mpi-ie -c bioconda -c conda-forge snakePipes'. Source code (https://github.com/maxplanck-ie/snakepipes) and documentation (https://snakepipes.readthedocs.io/en/latest/) are available online. Supplementary information Supplementary data are available at Bioinformatics online.

A Universal Deep Neural Network for In-Depth Cleaning of Single-Cell RNA-Seq Data

10.1101/2020.12.04.412247 ◽

2020 ◽

Author(s):

Hui Li ◽

Cory R. Brouwer ◽

Weijun Luo

Keyword(s):

Neural Network ◽

Single Cell ◽

Clustering Analysis ◽

Deep Neural Network ◽

Differential Expression Analysis ◽

Noise Removal ◽

Data Recovery ◽

Single Type ◽

Rna Seq ◽

Wide Range

Abstract Single-cell RNA sequencing (scRNA-Seq) has been widely used in biomedical research and has generated an enormous volume and diversity of data. The raw data contain multiple types of noise and technical artifacts and need thorough cleaning. The existing denoising and imputation methods largely focus on a single type of noise (i.e. dropouts) and have strong distribution assumptions, which greatly limit their performance and application. We designed and developed the AutoClass model, integrating two deep neural network components, an autoencoder and a classifier, so as to maximize both noise removal and signal retention. AutoClass is free of distribution assumptions and hence can effectively clean a wide range of noise and artifacts. AutoClass outperforms the state-of-the-art methods in multiple types of scRNA-Seq data analyses, including data recovery, differential expression analysis, clustering analysis and batch effect removal. Importantly, AutoClass is robust to key hyperparameter settings, including bottleneck layer size, pre-clustering number and classifier weight. We have made AutoClass open source at: https://github.com/datapplab/AutoClass.

scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R

10.1101/069633 ◽

2016 ◽

Cited By ~ 10

Author(s):

Davis J. McCarthy ◽

Kieran R. Campbell ◽

Aaron T. L. Lun ◽

Quin F. Wills

Keyword(s):

Quality Control ◽

Single Cell ◽

Sequence Data ◽

Supplementary Information ◽

Processing Quality ◽

Rna Seq ◽

Study Gene Expression ◽

Supplementary Material ◽

Downstream Analysis ◽

Cell Data

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) is increasingly used to study gene expression at the level of individual cells. However, preparing raw sequence data for further analysis is not a straightforward process. Biases, artifacts, and other sources of unwanted variation are present in the data, requiring substantial time and effort to be spent on pre-processing, quality control (QC) and normalisation. Results We have developed the R/Bioconductor package scater to facilitate rigorous pre-processing, quality control, normalisation and visualisation of scRNA-seq data. The package provides a convenient, flexible workflow to process raw sequencing reads into a high-quality expression dataset ready for downstream analysis. scater provides a rich suite of plotting tools for single-cell data and a flexible data structure that is compatible with existing tools and can be used as infrastructure for future software development. Availability The open-source code, along with installation instructions, vignettes and case studies, is available through Bioconductor at http://bioconductor.org/packages/scater. Supplementary information Supplementary material is available online at bioRxiv accompanying this manuscript, and all materials required to reproduce the results presented in this paper are available at dx.doi.org/10.5281/zenodo.60139.

Withdrawn: Automated Diagnosis of Glaucoma Using Deep Neural Network Classifier

Current Signal Transduction Therapy ◽

10.2174/1574362414666190906112910 ◽

2019 ◽

Vol 14 ◽

Author(s):

M. Madhumalini ◽

T. Meera Devi

Keyword(s):

Neural Network ◽

Signal Transduction ◽

Deep Neural Network ◽

Neural Network Classifier ◽

Current Signal ◽

Automated Diagnosis ◽

Signal Transduction Therapy ◽

Legal Right

The article has been withdrawn on the request of the authors and the editor of the journal Current Signal Transduction Therapy. Bentham Science apologizes to the readers of the journal for any inconvenience this may have caused. BENTHAM SCIENCE DISCLAIMER: It is a condition of publication that manuscripts submitted to this journal have not been published and will not be simultaneously submitted or published elsewhere. Furthermore, any data, illustration, structure or table that has been published elsewhere must be reported, and copyright permission for reproduction must be obtained. Plagiarism is strictly forbidden, and by submitting the article for publication the authors agree that the publishers have the legal right to take appropriate action against the authors, if plagiarism or fabricated information is discovered. By submitting a manuscript the authors agree that the copyright of their article is transferred to the publishers, if and when the article is accepted for publication.

ExperimentSubset: an R package to manage subsets of Bioconductor Experiment objects

Bioinformatics ◽

10.1093/bioinformatics/btab179 ◽

2021 ◽

Author(s):

Irzam Sarfraz ◽

Muhammad Asif ◽

Joshua D Campbell

Keyword(s):

Single Cell ◽

R Package ◽

Poor Quality ◽

Data Matrix ◽

Supplementary Information ◽

Data Provenance ◽

Rna Seq ◽

Efficient Management ◽

The Matrix ◽

The Relationship

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
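
The storage idea described above, keeping subsets as row and column indices linked to a parent assay instead of as copied matrices, can be sketched in a few lines. The example is written in Python for illustration and does not reproduce the ExperimentSubset R API. The class `SubsetStore` and its methods are hypothetical.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class SubsetStore:
    """One parent matrix plus named subsets stored as indices only."""
    parent: np.ndarray
    subsets: dict = field(default_factory=dict)

    def get(self, name=None):
        """Materialize a subset on demand by walking back to the parent."""
        if name is None:
            return self.parent
        parent_name, rows, cols = self.subsets[name]
        return self.get(parent_name)[np.ix_(rows, cols)]

    def add_subset(self, name, rows=None, cols=None, parent_name=None):
        """Record a subset of the parent matrix or of an earlier subset."""
        n_rows, n_cols = self.get(parent_name).shape
        rows = np.arange(n_rows) if rows is None else np.asarray(rows)
        cols = np.arange(n_cols) if cols is None else np.asarray(cols)
        self.subsets[name] = (parent_name, rows, cols)

    def lineage(self, name):
        """Provenance: the chain of subsets leading back to the parent."""
        chain = [name]
        while self.subsets[chain[-1]][0] is not None:
            chain.append(self.subsets[chain[-1]][0])
        return chain + ["parent"]

counts = np.arange(20).reshape(4, 5)  # toy matrix: 4 genes x 5 cells
store = SubsetStore(counts)
store.add_subset("good_cells", cols=[0, 2, 4])
store.add_subset("top_genes", rows=[1, 3], parent_name="good_cells")

print(store.get("top_genes"))      # [[ 5  7  9] [15 17 19]]
print(store.lineage("top_genes"))  # ['top_genes', 'good_cells', 'parent']
```

Only the index arrays are stored for each subset, so memory use grows with the number of selected rows and columns, not with the size of the copied data, and the recorded parent names supply the provenance.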

Single-cell conventional pap smear image classification using pre-trained deep neural network architectures

BMC Biomedical Engineering ◽

10.1186/s42490-021-00056-6 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Mohammed Aliy Mohammed ◽

Fetulhak Abdurahman ◽

Yodit Abebe Ayalew

Keyword(s):

Neural Network ◽

Cervical Cancer ◽

Computer Vision ◽

Single Cell ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Pap Smear ◽

Experimental Result ◽

Network Architectures ◽

Average Accuracy

Abstract Background Automating cytology-based cervical cancer screening could alleviate the shortage of skilled pathologists in developing countries. Up until now, computer vision experts have attempted numerous semi- and fully automated approaches to address the need. These days, leveraging the astonishing accuracy and reproducibility of deep neural networks has become common among computer vision experts. In this regard, the purpose of this study is to classify single-cell Pap smear (cytology) images using pre-trained deep convolutional neural network (DCNN) image classifiers. We have fine-tuned the top ten pre-trained DCNN image classifiers and evaluated them using five-class single-cell Pap smear images from the SIPaKMeD dataset. The pre-trained DCNN image classifiers were selected from Keras Applications based on their top-1 accuracy. Results Our experimental results demonstrated that, among the selected top ten pre-trained DCNN image classifiers, DenseNet169 performed best, with an average accuracy, precision, recall, and F1-score of 0.990, 0.974, 0.974, and 0.974, respectively. Moreover, it surpassed the benchmark accuracy reported by the creators of the dataset by 3.70%. Conclusions Even though DenseNet169 is small compared with the other pre-trained DCNN image classifiers tested, it is still not suitable for mobile or edge devices. Further experimentation with mobile or small-size DCNN image classifiers is required to extend the applicability of the models to real-world demands. In addition, since all experiments used the SIPaKMeD dataset, additional experiments will be needed using new datasets to enhance the generalizability of the models.
