Genotyping Copy Number Alterations from single-cell RNA sequencing

Cancers are constituted by heterogeneous populations of cells that show complex genotypes and phenotypes which we can read out by sequencing. Many attempts at deciphering the clonal process that drives these populations focus on single-cell technologies to resolve genetic and phenotypic intra-tumour heterogeneity. While the ideal technologies for these investigations are multi-omics assays, these types of data are still too expensive and have limited scalability. We can resort to single-molecule assays, which are cheaper and scalable, and statistically emulate a joint assay, but only if we can integrate measurements collected from independent cells of the same sample. In this work we follow this intuition and construct a new Bayesian method to genotype copy number alterations on single-cell RNA sequencing data, therefore integrating DNA and RNA measurements. Our method is unsupervised, and leverages a segmentation of the input DNA to determine the sample subclonal composition at the copy number level, together with clone-specific phenotypes defined from RNA counts. By design our probabilistic method works without a reference RNA expression profile, and therefore can be applied in cases where this information may not be accessible. We implement the method on a probabilistic backend that allows easy running on both CPUs and GPUs, and test it on both simulated and real data. Our analysis shows its ability to determine copy number associated clones and their RNA phenotypes in tumour data from 10x and Smart-Seq assays, as well as in data from the Human Cell Atlas project.

Lung transplantation for patients with severe COVID-19

Science Translational Medicine ◽ 10.1126/scitranslmed.abe4282 ◽ 2020 ◽ Vol 12 (574) ◽ pp. eabe4282 ◽ Cited By ~ 1

Author(s): Ankit Bharat ◽ Melissa Querrey ◽ Nikolay S. Markov ◽ Samuel Kim ◽ Chitaru Kurihara ◽ ...

Keyword(s): Respiratory Failure ◽ Pulmonary Fibrosis ◽ Lung Transplantation ◽ Single Cell ◽ RNA Sequencing ◽ Lung Tissue ◽ Single Molecule ◽ Sequencing Data ◽ Native Lung ◽ Single Cell RNA Sequencing

Lung transplantation can potentially be a life-saving treatment for patients with nonresolving COVID-19–associated respiratory failure. Concerns limiting lung transplantation include recurrence of SARS-CoV-2 infection in the allograft, technical challenges imposed by viral-mediated injury to the native lung, and the potential risk for allograft infection by pathogens causing ventilator-associated pneumonia in the native lung. Additionally, the native lung might recover, resulting in long-term outcomes preferable to those of transplant. Here, we report the results of lung transplantation in three patients with nonresolving COVID-19–associated respiratory failure. We performed single-molecule fluorescence in situ hybridization (smFISH) to detect both positive and negative strands of SARS-CoV-2 RNA in explanted lung tissue from the three patients and in additional control lung tissue samples. We conducted extracellular matrix imaging and single-cell RNA sequencing on explanted lung tissue from the three patients who underwent transplantation and on warm postmortem lung biopsies from two patients who had died from COVID-19–associated pneumonia. Lungs from these five patients with prolonged COVID-19 disease were free of SARS-CoV-2 as detected by smFISH, but pathology showed extensive evidence of injury and fibrosis that resembled end-stage pulmonary fibrosis. Using machine learning, we compared single-cell RNA sequencing data from the lungs of patients with late-stage COVID-19 to that from the lungs of patients with pulmonary fibrosis and identified similarities in gene expression across cell lineages. Our findings suggest that some patients with severe COVID-19 develop fibrotic lung disease for which lung transplantation is their only option for survival.

Splatter: simulation of single-cell RNA sequencing data

10.1101/133173 ◽ 2017 ◽ Cited By ~ 8

Author(s): Luke Zappia ◽ Belinda Phipson ◽ Alicia Oshlack

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Real Data ◽ Cell Types ◽ RNA Seq ◽ Sequencing Data ◽ Sequencing Technologies ◽ Simulation Based ◽ Single Cell RNA Sequencing ◽ Multiple Cell

As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
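To make the gamma-Poisson idea concrete, here is a minimal sketch in Python. It is not the Splat implementation (Splatter is an R/Bioconductor package, and Splat models further effects that are not reproduced here). The function names, the `shape` and `mean_expr` parameters, and their default values are all assumptions chosen for illustration: each gene gets a mean drawn from a gamma distribution, and each cell's count for that gene is then drawn from a Poisson distribution with that mean.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(n_genes, n_cells, shape=2.0, mean_expr=5.0, seed=0):
    """Simulate a genes x cells count matrix from a gamma-Poisson hierarchy.

    shape and mean_expr are illustrative values, not estimates from real data.
    """
    rng = random.Random(seed)
    scale = mean_expr / shape  # gamma mean = shape * scale
    counts = []
    for _ in range(n_genes):
        gene_mean = rng.gammavariate(shape, scale)  # gene-level mean expression
        counts.append([sample_poisson(rng, gene_mean) for _ in range(n_cells)])
    return counts


if __name__ == "__main__":
    for row in simulate_counts(n_genes=3, n_cells=8):
        print(row)
```

Fixing the seed makes the simulated matrix reproducible, which is the property the abstract stresses for simulation studies.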

Distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data

10.1101/234872 ◽ 2018 ◽ Cited By ~ 7

Author(s): Aaron T. L. Lun ◽ Samantha Riesenfeld ◽ Tallulah Andrews ◽ Tomas Gomes ◽ John C. Marioni ◽ ...

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Real Data ◽ Cell Types ◽ Sequencing Data ◽ Minimum Threshold ◽ False Discovery ◽ Distinct Cell ◽ Single Cell RNA Sequencing ◽ Unique Molecular Identifier

Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput and efficiency of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Existing methods for cell calling set a minimum threshold on the total unique molecular identifier (UMI) count for each library, which indiscriminately discards cell libraries with low UMI counts. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that our method has greater power than existing approaches for detecting cell libraries with low UMI counts, while controlling the false discovery rate among detected cells. We also apply our method to real data, where we show that the use of our method results in the retention of distinct cell types that would otherwise have been discarded.
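The threshold-based baseline that the abstract criticises can be sketched in a few lines of Python. This illustrates only the existing approach, not the authors' method. The barcodes, UMI totals, the function name `call_cells_by_threshold` and the cutoff of 500 are all hypothetical.

```python
def call_cells_by_threshold(umi_totals, min_umis):
    """Keep barcodes whose total UMI count is at least min_umis."""
    return [barcode for barcode, total in umi_totals.items() if total >= min_umis]


# Hypothetical total UMI counts per droplet barcode.
umi_totals = {"AAAC": 5200, "AAAG": 35, "AACT": 480, "AAGT": 12}

called = call_cells_by_threshold(umi_totals, min_umis=500)
print(called)  # ['AAAC']
```

In this toy input the barcode with 480 UMIs is dropped even though it might be a small real cell. This is the kind of low-count library that a test against the ambient expression profile aims to retain.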

bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data

10.1101/384586 ◽ 2018 ◽ Cited By ~ 7

Author(s): Wenhao Tang ◽ François Bertaux ◽ Philipp Thomas ◽ Claire Stefanelli ◽ Malika Saint ◽ ...

Keyword(s): Gene Expression ◽ Single Cell ◽ RNA Sequencing ◽ Single Molecule ◽ Empirical Bayes ◽ Missing Values ◽ Likelihood Function ◽ Differential Expression Analysis ◽ Sequencing Data ◽ Single Cell RNA Sequencing

Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability and high amounts of missing observations typical of scRNA-seq datasets make this task particularly challenging. Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data.
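A binomial capture model of this kind can be illustrated with a small Python sketch. It is not the bayNorm implementation, and all names and numbers are assumptions for illustration. The observed count is treated as a binomial draw from an unknown true count with capture probability `capture_prob`, and a prior over the true count (here supplied by hand rather than estimated by empirical Bayes) is updated on a finite grid.

```python
import math


def binomial_likelihood(observed, true_count, capture_prob):
    """P(observed | true_count) when each transcript is captured independently."""
    if true_count < observed:
        return 0.0
    return (
        math.comb(true_count, observed)
        * capture_prob ** observed
        * (1.0 - capture_prob) ** (true_count - observed)
    )


def posterior_true_count(observed, capture_prob, prior):
    """Posterior over true counts 0..len(prior)-1, where prior[n] = P(true count = n)."""
    weights = [
        prior[n] * binomial_likelihood(observed, n, capture_prob)
        for n in range(len(prior))
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("prior assigns no mass to counts compatible with the data")
    return [w / total for w in weights]


if __name__ == "__main__":
    flat_prior = [1.0 / 21] * 21  # illustrative uniform prior on 0..20
    post = posterior_true_count(observed=2, capture_prob=0.5, prior=flat_prior)
    print([round(p, 3) for p in post[:8]])
```

The posterior puts zero mass on true counts below the observed count and shifts mass towards larger values as the capture probability falls. That shift is the intuition behind recovering true counts from sparsely captured data.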

Comparison of computational methods for imputing single-cell RNA-sequencing data

10.1101/241190 ◽ 2017 ◽ Cited By ~ 10

Author(s): Lihua Zhang ◽ Shihua Zhang

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Large Scale ◽ Real Data ◽ Cell Types ◽ Biological Functions ◽ Sequencing Data ◽ Imputation Methods ◽ Future Studies ◽ Single Cell RNA Sequencing

Single-cell RNA-sequencing (scRNA-seq) is a recent breakthrough technology, which paves the way for measuring RNA levels at single-cell resolution to study precise biological functions. One of the main challenges when analyzing scRNA-seq data is the presence of zeros or dropout events, which may mislead downstream analyses. To compensate for the dropout effect, several methods have been developed to impute gene expression since the first Bayesian method was proposed in 2016. However, these methods have shown very diverse characteristics in terms of model hypothesis and imputation performance. Thus, a large-scale comparison and evaluation of these methods is urgently needed. To this end, we compared eight imputation methods, evaluated their power in recovering original real data, and performed broad analyses to explore their effects on clustering cell types, detecting differentially expressed genes, and reconstructing lineage trajectories in the context of both simulated and real data. Simulated datasets and case studies highlight that no single method performs best in all situations. Some defects of these methods, such as scalability, robustness and unavailability in some situations, need to be addressed in future studies.

SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data

Bioinformatics ◽ 10.1093/bioinformatics/btaa105 ◽ 2020 ◽ Vol 36 (10) ◽ pp. 3276-3278 ◽ Cited By ~ 2

Author(s): Alemu Takele Assefa ◽ Jo Vandesompele ◽ Olivier Thas

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Real Data ◽ Simulation Method ◽ R Package ◽ Supplementary Information ◽ Expression Data ◽ Sequencing Data ◽ Wide Range ◽ Single Cell RNA Sequencing

Summary: SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at Bioinformatics online.

PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data

10.1101/2020.11.17.387779 ◽ 2020 ◽ Cited By ~ 1

Author(s): Dongyuan Song ◽ Jingyi Jessica Li

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Molecular Mechanisms ◽ Real Data ◽ Sequencing Data ◽ Single Cell RNA Sequencing ◽ Ill Posed ◽ Differential Gene ◽ Cell Trajectory ◽ Downstream Analysis

In the investigation of molecular mechanisms underlying cell state changes, a crucial analysis is to identify differentially expressed (DE) genes along a continuous cell trajectory, which can be estimated by pseudotime inference from single-cell RNA-sequencing (scRNA-seq) data. However, existing methods that identify DE genes based on inferred pseudotime do not account for the uncertainty in pseudotime inference. Also, they either have ill-posed p-values that hinder the control of false discovery rate (FDR) or have restrictive models that reduce the power of DE gene identification. To overcome these drawbacks, we propose PseudotimeDE, a robust method that accounts for the uncertainty in pseudotime inference and thus identifies DE genes along cell pseudotime with well-calibrated p-values. PseudotimeDE is flexible in allowing users to specify the pseudotime inference method and to choose the appropriate model for scRNA-seq data. Comprehensive simulations and real-data applications verify that PseudotimeDE provides well-calibrated p-values essential for controlling FDR and downstream analysis and that PseudotimeDE is more powerful than existing methods to identify DE genes.

DrImpute: Imputing dropout events in single cell RNA sequencing data

10.1101/181479 ◽ 2017 ◽ Cited By ~ 8

Author(s): Il-Youp Kwak ◽ Wuming Gong ◽ Naoko Koyano-Nakagawa ◽ Daniel J. Garry

Keyword(s): Gene Expression ◽ Single Cell ◽ RNA Sequencing ◽ Gene Expression Pattern ◽ Real Data ◽ Sequencing Data ◽ New Era ◽ Numerical Studies ◽ Single Cell RNA Sequencing ◽ High Chance

The single cell RNA sequencing (scRNA-seq) technique began a new era by allowing the observation of gene expression at the single cell level. However, there is also a large amount of technical and biological noise. Because of the low number of RNA transcriptomes and the stochastic nature of the gene expression pattern, there is a high chance that nonzero entries are recorded as zero; these are called dropout events. However, many statistical methods used for analyzing scRNA-seq data in cell type identification, visualization, and lineage reconstruction do not model dropout events. We have developed DrImpute to impute dropout events, and it improves many of the statistical tools used for scRNA-seq analysis that do not account for dropout events. Our numerical studies with real data demonstrate the promising performance of the proposed method, which has been implemented in R.

PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data

Genome Biology ◽ 10.1186/s13059-021-02341-y ◽ 2021 ◽ Vol 22 (1)

Author(s): Dongyuan Song ◽ Jingyi Jessica Li

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Molecular Mechanisms ◽ Real Data ◽ Sequencing Data ◽ False Discovery ◽ Single Cell RNA Sequencing ◽ Ill Posed ◽ Differential Gene ◽ Inference Methods

To investigate molecular mechanisms underlying cell state changes, a crucial analysis is to identify differentially expressed (DE) genes along the pseudotime inferred from single-cell RNA-sequencing data. However, existing methods do not account for pseudotime inference uncertainty, and they have either ill-posed p-values or restrictive models. Here we propose PseudotimeDE, a DE gene identification method that adapts to various pseudotime inference methods, accounts for pseudotime inference uncertainty, and outputs well-calibrated p-values. Comprehensive simulations and real-data applications verify that PseudotimeDE outperforms existing methods in false discovery rate control and power.

Mixed Distribution Models Based on Single-Cell RNA Sequencing Data

Interdisciplinary Sciences Computational Life Sciences ◽ 10.1007/s12539-021-00427-6 ◽ 2021

Author(s): Min Wu ◽ Junhua Xu ◽ Tao Ding ◽ Jie Gao

Keyword(s): Single Cell ◽ RNA Sequencing ◽ Sequencing Data ◽ Distribution Models ◽ Mixed Distribution ◽ Single Cell RNA Sequencing
