Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

Abstract Background Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allow the analysis of miRNA regulation in a group of cells, not the miRNA regulation unique to individual cells. Recent advances in single-cell miRNA-mRNA co-sequencing technology have opened a way to investigate miRNA regulation at the single-cell level. However, because single-cell miRNA-mRNA co-sequencing data are just emerging and only available at small scale, there is a strong need for novel methods that exploit existing single-cell data to study cell-specific miRNA regulation. Results In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation), which combines single-cell miRNA-mRNA co-sequencing data with putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to miRNA-mRNA co-sequencing data from 19 K562 single cells to identify cell-specific miRNA-mRNA regulatory networks and thereby understand miRNA regulation in each cell. By analyzing the resulting networks, we observe that the miRNA regulation in each K562 single cell is unique. Moreover, we conduct a detailed analysis of the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, by exploring the cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single cells and helps to understand cell–cell crosstalk. Conclusions To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at single-cell resolution, and we believe that it can be a useful method for enhancing the understanding of cell-specific miRNA regulation.
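
As a rough illustration of how a cell–cell similarity matrix could be derived from cell-specific networks, the sketch below represents each cell's network as a set of (miRNA, mRNA) edges and scores each pair of cells by Jaccard overlap. The Jaccard measure, the function names, the data layout, and the toy edges are all assumptions made for illustration; the abstract does not state which similarity measure CSmiR uses.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two edge sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(networks):
    """Build a symmetric cell-by-cell similarity table.

    networks: dict mapping cell id -> set of (miRNA, mRNA) edges
    (a hypothetical layout, not CSmiR's own data structure).
    """
    cells = sorted(networks)
    sim = {c: {c: 1.0} for c in cells}
    for c1, c2 in combinations(cells, 2):
        s = jaccard(networks[c1], networks[c2])
        sim[c1][c2] = s
        sim[c2][c1] = s
    return cells, sim


# Toy networks, invented for illustration only (gene names are placeholders).
toy = {
    "cell_1": {("miR-17", "GENE_A"), ("miR-92a", "GENE_B")},
    "cell_2": {("miR-17", "GENE_A"), ("miR-20a", "GENE_C")},
    "cell_3": {("miR-19b", "GENE_D")},
}
cells, sim = similarity_matrix(toy)
```

A matrix of this kind can then be passed to any standard clustering routine; which routine suits 19 cells is a separate design choice that this sketch does not address.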

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

10.1101/2020.10.14.340299 ◽

2020 ◽

Author(s):

Junpeng Zhang ◽

Lin Liu ◽

Taosheng Xu ◽

Wu Zhang ◽

Chunwen Zhao ◽

...

Keyword(s):

Single Cell ◽

Regulatory Networks ◽

Single Cells ◽

Small Scale ◽

miRNA Regulation ◽

Sequencing Data ◽

Resolution Level ◽

Novel Strategy ◽

Cell Cell

Abstract Background Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allow the analysis of miRNA regulation in a group of cells, not the miRNA regulation unique to individual cells. Recent advances in single-cell miRNA-mRNA co-sequencing technology have opened a way to investigate miRNA regulation at the single-cell level. However, because single-cell miRNA-mRNA co-sequencing data are just emerging and only available at small scale, there is a strong need for novel methods that exploit existing single-cell data to study cell-specific miRNA regulation. Results In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation), which uses single-cell miRNA-mRNA co-sequencing data to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to miRNA-mRNA co-sequencing data from 19 K562 single cells to identify cell-specific miRNA-mRNA regulatory networks and thereby understand miRNA regulation in each cell. By analyzing the resulting networks, we observe that the miRNA regulation in each K562 single cell is unique. Moreover, we conduct a detailed analysis of the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. Finally, by exploring the cell-cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single cells to help understand cell-cell crosstalk. Conclusions To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at single-cell resolution, and we believe that it can be a useful method for enhancing the understanding of cell-specific miRNA regulation.

Leveraging high-powered RNA-Seq datasets to improve inference of regulatory activity in single-cell RNA-Seq data

10.1101/553040 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ning Wang ◽

Andrew E. Teschendorff

Keyword(s):

Transcription Factors ◽

Single Cell ◽

Cell Fate ◽

Regulatory Networks ◽

Large Scale ◽

Single Cells ◽

Differential Expression Analysis ◽

Dropout Rate ◽

RNA-Seq ◽

Regulatory Activity

Abstract Inferring the activity of transcription factors in single cells is a key task for improving our understanding of development and complex genetic diseases. This task is, however, challenging because of the relatively high dropout rate and noisy nature of single-cell RNA-Seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-Seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can subsequently be obtained. We show that SCIRA can correctly infer the regulatory activity of transcription factors affected by high technical dropout. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell fate in tissue development, even for cell types that make up only 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single-cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map the activity patterns of key transcription factors during development and how these are altered in disease.

Quality assessment of single-cell RNA sequencing data by coverage skewness analysis

10.1101/2019.12.31.890269 ◽

2019 ◽

Author(s):

Imad Abugessaisa ◽

Shuhei Noguchi ◽

Melissa Cardon ◽

Akira Hasegawa ◽

Kazuhide Watanabe ◽

...

Keyword(s):

Quality Assessment ◽

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Assessment Method ◽

Poor Quality ◽

Sequencing Data ◽

Single Cell RNA Sequencing ◽

Gene Coverage ◽

The Impact

Abstract Analysis and interpretation of single-cell RNA-sequencing (scRNA-seq) experiments are compromised by the presence of poor-quality cells. For meaningful analyses, such cells should be excluded to avoid biases and large variation. However, no clear guidelines exist. We introduce SkewC, a novel quality-assessment method for identifying poor-quality single cells in scRNA-seq experiments. The method assesses the gene coverage of each single cell and uses the skewness of that coverage as a quality measure. To validate the method, we investigated the impact of poor-quality cells on downstream analyses and compared biological differences between typical and poor-quality cells. Moreover, we measured the ratio of intergenic expression, which suggests genomic contamination, as well as foreign-organism contamination of single-cell samples. SkewC was tested on 37,993 single cells generated by 15 scRNA-seq protocols. We envision SkewC as an indispensable QC method to be incorporated into scRNA-seq experiments to preclude the possibility of scRNA-seq data misinterpretation.
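
To make the idea of coverage skewness concrete, the sketch below treats a cell's gene-body coverage profile (binned from 5' to 3') as a weight distribution over bin positions and computes its third standardized moment. This is a generic skewness calculation under that assumption, not SkewC's actual procedure; the function name and the toy profiles are hypothetical.

```python
def coverage_skewness(profile):
    """Weighted skewness of bin position, using coverage per bin as weights.

    profile: non-negative coverage values for gene-body bins ordered 5' -> 3'.
    Returns 0.0 for a symmetric profile, a negative value when coverage piles
    up at the 3' end, and a positive value when it piles up at the 5' end.
    """
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile has no coverage")
    weights = [v / total for v in profile]
    positions = range(1, len(profile) + 1)
    mean = sum(w * x for w, x in zip(weights, positions))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, positions))
    if var == 0:
        return 0.0  # all coverage falls in a single bin
    third = sum(w * (x - mean) ** 3 for w, x in zip(weights, positions))
    return third / var ** 1.5


# Toy profiles over 100 bins, invented for illustration only.
uniform = [1.0] * 100                  # even coverage along the gene body
three_prime = list(range(1, 101))      # coverage rising towards the 3' end
five_prime = list(range(100, 0, -1))   # coverage rising towards the 5' end

scores = {
    "uniform": coverage_skewness(uniform),
    "three_prime": coverage_skewness(three_prime),
    "five_prime": coverage_skewness(five_prime),
}
```

Any cutoff for flagging a cell as poor quality would have to be chosen empirically; none is implied here.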

SCELLECTOR: ranking amplification bias in single cells using shallow sequencing

BMC Bioinformatics ◽

10.1186/s12859-020-03858-y ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Vivekananda Sarangi ◽

Alexandre Jourdon ◽

Taejeong Bae ◽

Arijit Panda ◽

Flora Vaccarino ◽

...

Keyword(s):

Single Cell ◽

Multiple Displacement Amplification ◽

Single Cells ◽

Sequencing Data ◽

High Coverage ◽

Amplification Bias ◽

Single Cell Profiling ◽

High Concordance ◽

Human Neuronal Cells

Abstract Background The study of mosaic mutations is important because they have been linked to cancer and various disorders. Single-cell sequencing has become a powerful tool for studying the genome of individual cells to detect mosaic mutations. The DNA in a single cell needs to be amplified before sequencing, and multiple displacement amplification (MDA) is widely used owing to its low error rate and the long fragment length of the amplified DNA. However, the phi29 polymerase used in MDA is sensitive to template fragmentation and to the presence of sites with DNA damage, which can lead to biases such as allelic imbalance, uneven coverage and over-representation of C to T mutations. It is therefore important to select cells with uniform amplification to decrease false positives and increase sensitivity for mosaic mutation detection. Results We propose a method, Scellector (single cell selector), which uses haplotype information to detect amplification quality in shallow-coverage sequencing data. We tested Scellector on single human neuronal cells, obtained in vitro and amplified by MDA. Qualities were estimated from shallow sequencing with coverage as low as 0.3× per cell and then confirmed using 30× deep-coverage sequencing. The high concordance between shallow and high-coverage data validated the method. Conclusion Scellector can potentially be used to rank amplifications obtained from single-cell platforms that rely on an MDA-like amplification step, such as the Chromium Single Cell profiling solution.
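
One way to picture haplotype-based quality scoring: with even amplification, reads at phased heterozygous sites should come from both haplotypes in roughly equal numbers, so departure from a 0.5 fraction signals allelic imbalance. The sketch below pools reads per phased window and ranks cells by a read-weighted imbalance score. The scoring formula, the data layout, and the toy counts are assumptions for illustration only; the abstract does not describe Scellector's actual statistic.

```python
def imbalance_score(blocks):
    """Read-weighted mean of |haplotype-1 fraction - 0.5| over phased windows.

    blocks: list of (reads_hap1, reads_hap2) per window (hypothetical layout).
    0.0 means perfectly balanced; 0.5 means only one haplotype was observed.
    """
    total = sum(h1 + h2 for h1, h2 in blocks)
    if total == 0:
        raise ValueError("no informative reads")
    weighted = sum(
        abs(h1 / (h1 + h2) - 0.5) * (h1 + h2)
        for h1, h2 in blocks
        if h1 + h2 > 0  # windows without reads carry no information
    )
    return weighted / total


# Toy per-cell read counts, invented for illustration only.
cells = {
    "cell_A": [(10, 10), (8, 12), (11, 9)],
    "cell_B": [(20, 0), (1, 19), (15, 5)],
}

# Lower score = more even amplification, so the best cells come first.
ranking = sorted(cells, key=lambda c: imbalance_score(cells[c]))
```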

Estimating the Allele-Specific Expression of SNVs From 10× Genomics Single-Cell RNA-Sequencing Data

Genes ◽

10.3390/genes11030240 ◽

2020 ◽

Vol 11 (3) ◽

pp. 240 ◽

Cited By ~ 2

Author(s):

Prashant N. M. ◽

Hongyu Liu ◽

Pavlos Bousounis ◽

Liam Spurr ◽

Nawaf Alomran ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Sequencing Data ◽

Specific Expression ◽

Single Nucleotide ◽

Healthy Donors ◽

Allele Expression ◽

Single Cell RNA Sequencing ◽

Allele Specific

With the recent advances in single-cell RNA-sequencing (scRNA-seq) technologies, the estimation of allele expression from single cells is becoming increasingly reliable. Allele expression is both quantitative and dynamic and is an essential component of the genomic interactome. Here, we systematically estimate the allele expression from heterozygous single nucleotide variant (SNV) loci using scRNA-seq data generated on the 10× Genomics Chromium platform. We analyzed 26,640 human adipose-derived mesenchymal stem cells (from three healthy donors), sequenced to an average of 150K sequencing reads per cell (more than 4 billion scRNA-seq reads in total). High-quality SNV calls assessed in our study contained approximately 15% exonic and >50% intronic loci. To analyze the allele expression, we estimated the expressed variant allele fraction (VAF_RNA) from SNV-aware alignments and analyzed its variance and distribution (mono- and bi-allelic) at different minimum sequencing read thresholds. Our analysis shows that when assessing positions covered by a minimum of three unique sequencing reads, over 50% of the heterozygous SNVs show bi-allelic expression, while at a threshold of 10 reads, nearly 90% of the SNVs are bi-allelic. In addition, our analysis demonstrates the feasibility of scVAF_RNA estimation from current scRNA-seq datasets and shows that the 3′-based library generation protocol of 10× Genomics scRNA-seq data can be informative in SNV-based studies, including analyses of transcriptional kinetics.
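
The read-threshold logic in this abstract can be illustrated with a short sketch: the expressed variant allele fraction at a locus is the share of reads carrying the variant allele, and it is only computed when the locus meets a minimum read count. The bi-allelic cutoffs (0.1 and 0.9 here), the function names, and the toy counts are assumptions for illustration, not values or code from the study.

```python
def vaf_rna(n_var, n_ref, min_reads=3):
    """Variant reads / total reads, or None if coverage is below min_reads."""
    total = n_var + n_ref
    if total < min_reads:
        return None
    return n_var / total


def classify(vaf, low=0.1, high=0.9):
    """Label a locus; low and high are illustrative cutoffs, not from the paper."""
    if vaf is None:
        return "not assessed"
    return "bi-allelic" if low <= vaf <= high else "mono-allelic"


# Toy (variant, reference) read counts per locus, invented for illustration.
loci = {"snv_1": (4, 6), "snv_2": (0, 12), "snv_3": (1, 1)}
calls = {name: classify(vaf_rna(v, r)) for name, (v, r) in loci.items()}
```

Raising `min_reads` trades the number of assessable loci for more stable fractions, which is the kind of trade-off the threshold comparison in the abstract (3 versus 10 reads) examines.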

Scirpy: A Scanpy extension for analyzing single-cell T-cell receptor sequencing data

10.1101/2020.04.10.035865 ◽

2020 ◽

Author(s):

Gregor Sturm ◽

Tamas Szabo ◽

Georgios Fotakis ◽

Marlene Haider ◽

Dietmar Rieder ◽

...

Keyword(s):

T Cell ◽

Single Cell ◽

Large Scale ◽

Single Cells ◽

Cell Receptor ◽

Sequencing Data ◽

Seamless Integration ◽

T Cell Phenotypes ◽

Cell Phenotypes

Abstract Summary Advances in single-cell technologies have enabled the investigation of T cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses in cancer, but also in infectious diseases like COVID-19. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T cell receptors. Here we propose Scirpy, a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy.

Random forest based similarity learning for single cell RNA sequencing data

10.1101/258699 ◽

2018 ◽

Author(s):

Maziyar Baran Pouyan ◽

Dennis Kostka

Keyword(s):

Data Analysis ◽

Random Forest ◽

Single Cells ◽

R Package ◽

Similarity Learning ◽

Sequencing Data ◽

Genome Wide ◽

Step Procedure ◽

Exploratory Data ◽

Cell Cell

Abstract Motivation Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around the discovery or characterization of types or sub-types of cells, and therefore obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that applying generic methods, or methods developed for bulk RNA-seq data, is likely suboptimal. Results Here we present RAFSIL, a random forest based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization, and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. Availability and Implementation The RAFSIL R package is available online at www.kostkalab.net/software.html

mDia2 formin selectively interacts with catenins and not E-cadherin to regulate Adherens Junction formation

10.1101/721530 ◽

2019 ◽

Author(s):

Yuqi Zhang ◽

Krista M. Pettee ◽

Kathryn N. Becker ◽

Kathryn M. Eisenmann

Keyword(s):

Single Cell ◽

Actin Polymerization ◽

Plasma Membranes ◽

Adherens Junctions ◽

Single Cells ◽

Critical Role ◽

Cell Junctions ◽

HEK 293 ◽

E Cadherin ◽

Cell Cell

Abstract Background Epithelial ovarian cancer (EOC) cells disseminate within the peritoneal cavity, in part, via the peritoneal fluid as single cells, clusters, or spheroids. Initial single-cell egress from a tumor can involve disruption of cell-cell adhesions as cells are shed from the primary tumor into the peritoneum. In epithelial cells, Adherens Junctions (AJs) are characterized by homotypic linkage of E-cadherins on the plasma membranes of adjacent cells. AJs are anchored to the intracellular actin cytoskeletal network through a complex involving E-cadherin, p120 catenin, β-catenin, and αE-catenin. However, the specific players involved in the interaction between the junctional E-cadherin complex and the underlying F-actin network remain unclear. Recent evidence indicates that mammalian Diaphanous-related (mDia) formins play a key role in epithelial cell AJ formation and maintenance through the generation of linear actin filaments. Binding of αE-catenin to linear F-actin inhibits association of the branched-actin nucleator Arp2/3, while favoring linear F-actin bundling. We previously demonstrated that loss of mDia2 was associated with invasive single-cell egress from EOC spheroids through disruption of junctional F-actin. Results In the current study, we show that mDia2 has a role at AJs in EOC OVCA429 cells and human embryonic kidney (HEK) 293 cells through its association with αE-catenin and β-catenin. mDia2 depletion in EOC cells leads to a reduction in actin polymerization and disruption of cell-cell junctions, with decreased interaction between β-catenin and E-cadherin. Conclusions Our results support a necessary role for mDia2 in AJ stability in EOC cell monolayers and indicate a critical role for mDia formins in regulating EOC AJs during invasive transitions.

Gene set inference from single-cell sequencing data using a hybrid of matrix factorization and variational autoencoders

10.1101/740415 ◽

2019 ◽

Author(s):

Soeren Lukassen ◽

Foo Wei Ten ◽

Roland Eils ◽

Christian Conrad

Keyword(s):

Neural Network ◽

Single Cell ◽

Network Model ◽

Neural Network Model ◽

Matrix Factorization ◽

Latent Variable ◽

Single Cells ◽

Sequencing Data ◽

Gene Set ◽

Gene Sets

Abstract Recent advances in single-cell RNA sequencing (scRNA-Seq) have driven the simultaneous measurement of the expression of 1,000s of genes in 1,000s of single cells. These growing data sets allow us to model gene sets in biological networks at an unprecedented level of detail, in spite of heterogeneous cell populations. Here, we propose an unsupervised deep neural network model that is a hybrid of matrix factorization and conditional variational autoencoders (CVA), which utilizes weights as matrix factorizations to obtain gene sets, while class-specific inputs to the latent variable space facilitate a plausible identification of cell types. This artificial neural network model seamlessly integrates functional gene set inference, experimental batch effect correction, and static gene identification, which we conceptually prove here for three single-cell RNA-Seq datasets and suggest for future single-cell-gene analytics.

MQuad enables clonal substructure discovery using single cell mitochondrial variants

10.1101/2021.03.27.437331 ◽

2021 ◽

Author(s):

Aaron Wing Cheung Kwok ◽

Chen Qiao ◽

Rongting Huang ◽

Mai-Har Sham ◽

Joshua W. K. Ho ◽

...

Keyword(s):

DNA Sequencing ◽

Single Cell ◽

Single Cells ◽

High Sensitivity ◽

Copy Number Variations ◽

Sequencing Data ◽

Single Nucleotide ◽

Single Cell Sequencing ◽

mtDNA Variants ◽

Python Package

Abstract Mitochondrial mutations are increasingly recognised as informative endogenous genetic markers that can be used to reconstruct cellular clonal structure from single-cell RNA or DNA sequencing data. However, there is a lack of effective computational methods for identifying informative mtDNA variants in noisy and sparse single-cell sequencing data. Here we present MQuad, an open source computational tool that accurately calls clonally informative mtDNA variants in a population of single cells, together with an analysis suite for complete clonality inference based on single-cell RNA or DNA sequencing data. Using a variety of simulated and experimental single-cell sequencing data, we showed that MQuad can identify mitochondrial variants with both high sensitivity and high specificity, outperforming existing methods by a large margin. Furthermore, we demonstrated its wide applicability across different single-cell sequencing protocols, particularly in complementing single-nucleotide and copy-number variations to extract finer clonal resolution. MQuad is a Python package available via https://github.com/single-cell-genetics/MQuad.
