Correcting Chimeric Crosstalk in Single Cell RNA-seq Experiments

2016 ◽  
Author(s):  
Atray Dixit

Abstract: As part of the process of preparing scRNA-seq libraries, a diverse template is typically amplified by PCR. During amplification, spurious chimeric molecules can be formed between molecules originating in different cells. While several computational and experimental strategies have been suggested to mitigate the impact of chimeric molecules, they have not been addressed in the context of scRNA-seq experiments. We demonstrate that chimeras become increasingly problematic as samples are sequenced deeply and propose two computational solutions. The first is unsupervised and relies only on cell barcode and UMI information. The second is a supervised approach built on labeled data and a set of molecule-specific features. The classifier can accurately identify most of the contaminating molecules in a deeply sequenced species mixing dataset. Code is publicly available at https://github.com/asncd/schimera.
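To make the idea of an unsupervised filter based only on cell barcode and UMI information concrete, here is a minimal sketch. It assumes one plausible heuristic, which is not necessarily the rule used in the schimera code: when the same UMI and gene appear under several cell barcodes, the barcode holding only a small share of the reads is treated as a likely chimera. The function name, the read tuples, and the 0.8 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_likely_chimeras(reads, min_fraction=0.8):
    """Flag molecules that hold a minority of the reads for their (UMI, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per sequenced read.
    min_fraction: hypothetical cutoff on a molecule's share of reads.
    Returns a set of (cell_barcode, umi, gene) molecules considered suspect.
    """
    reads_per_molecule = Counter(reads)

    # Total reads observed for each (UMI, gene) pair across all cell barcodes.
    reads_per_group = defaultdict(int)
    for (cell, umi, gene), n in reads_per_molecule.items():
        reads_per_group[(umi, gene)] += n

    flagged = set()
    for (cell, umi, gene), n in reads_per_molecule.items():
        if n / reads_per_group[(umi, gene)] < min_fraction:
            flagged.add((cell, umi, gene))
    return flagged


# Toy input (made up): UMI "AACGT" for GeneX is seen 9 times in CELL_A
# and once in CELL_B; UMI "GGTCA" for GeneY is seen only in CELL_B.
reads = (
    [("CELL_A", "AACGT", "GeneX")] * 9
    + [("CELL_B", "AACGT", "GeneX")]
    + [("CELL_B", "GGTCA", "GeneY")] * 3
)
flagged = flag_likely_chimeras(reads)
```

In this toy input only the single CELL_B read sharing a UMI and gene with CELL_A falls below the cutoff; molecules with a unique UMI and gene are never flagged.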

2020 ◽  
Author(s):  
Davide Risso ◽  
Stefano M. Pagnotta

Abstract
Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformations on the outcome of unsupervised clustering procedures is still unclear.
Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.
Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.


2020 ◽  
Author(s):  
Silvia Llonch ◽  
Montserrat Barragán ◽  
Paula Nieto ◽  
Anna Mallol ◽  
Marc Elosua-Bayes ◽  
...  

Abstract
Study question: To what degree does maternal age affect the transcriptome of human oocytes at the germinal vesicle (GV) stage or at metaphase II after maturation in vitro (IVM-MII)?
Summary answer: While the oocytes’ transcriptome is predominantly determined by maturation stage, transcript levels of genes related to chromosome segregation, mitochondria and RNA processing are affected by age after in vitro maturation of denuded oocytes.
What is known already: Female fertility is inversely correlated with maternal age due to both a depletion of the oocyte pool and a reduction in oocyte developmental competence. Few studies have addressed the effect of maternal age on the human mature oocyte (MII) transcriptome, which is established during oocyte growth and maturation, and the pathways involved remain unclear. Here, we characterize and compare the transcriptomes of a large cohort of fully grown GV and IVM-MII oocytes from women of varying reproductive age.
Study design, size, duration: In this prospective molecular study, 37 women were recruited from May 2018 to June 2019. The mean age was 28.8 years (SD=7.7, range 18-43). A total of 72 oocytes were included in the study at GV stage after ovarian stimulation, and analyzed as GV (n=40) and in vitro matured oocytes (IVM-MII; n=32).
Participants/materials, setting, methods: Denuded oocytes were included either as GV at the time of ovum pick-up or as IVM-MII after in vitro maturation for 30 hours in G2™ medium, and processed for transcriptomic analysis by single-cell RNA-seq using the Smart-seq2 technology. Cluster and maturation stage marker analysis were performed using the Seurat R package. Genes with an average fold change greater than 2 and a p-value < 0.01 were considered maturation stage markers. A Pearson correlation test was used to identify genes whose expression levels changed progressively with age. Genes with an absolute correlation value |R| >= 0.3 and a p-value < 0.05 were considered significant.
Main results and the role of chance: First, by exploration of the RNA-seq data using tSNE dimensionality reduction, we identified two clusters of cells reflecting the oocyte maturation stage (GV and IVM-MII) with 4,445 and 324 putative marker genes, respectively. Next, we identified genes for which RNA levels either progressively increased or decreased with age. This analysis was performed independently for GV and IVM-MII oocytes. Our results indicate that the transcriptome is more affected by age in IVM-MII oocytes (1,219 genes) than in GV oocytes (596 genes). In particular, we found that genes involved in chromosome segregation and RNA splicing significantly increase in transcript levels with age, while genes related to mitochondrial activity present lower transcript levels with age. Gene regulatory network analysis revealed potential upstream master regulator functions for genes whose transcript levels present positive (GPBP1, RLF, SON, TTF1) or negative (BNC1, THRB) correlation with age.
Limitations, reasons for caution: IVM-MII oocytes used in this study were obtained after in vitro maturation of denuded GV oocytes; therefore, their transcriptome might not be fully representative of in vivo matured MII oocytes. The Smart-seq2 methodology used in this study detects polyadenylated transcripts only and we could therefore not assess non-polyadenylated transcripts.
Wider implications of the findings: Our analysis suggests that advanced maternal age does not globally affect the oocyte transcriptome at GV or IVM-MII stages. Nonetheless, hundreds of genes displayed altered transcript levels with age, particularly in IVM-MII oocytes. Especially affected by age were genes related to chromosome segregation and mitochondrial function, pathways known to be involved in oocyte ageing. Our study thereby suggests that misregulation of chromosome segregation and mitochondrial pathways also at the RNA level might contribute to the age-related quality decline in human oocytes.
Study funding/competing interest(s): This study was funded by the AXA research fund, the European commission, intramural funding of Clinica EUGIN, the Spanish Ministry of Science, Innovation and Universities, the Catalan Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) and by contributions of the Spanish Ministry of Economy, Industry and Competitiveness (MEIC) to the EMBL partnership and to the “Centro de Excelencia Severo Ochoa”. The authors have no conflict of interest to declare.
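The age-association criterion stated in the methods of this abstract (an absolute Pearson correlation of at least 0.3 together with a p-value below 0.05) can be illustrated with a short sketch. The gene names, ages, expression values and p-values below are invented placeholders, not data from the study, and the p-values are assumed to come from a separate correlation test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_age_associated(stats, r_min=0.3, alpha=0.05):
    """Keep genes with |R| >= r_min and p < alpha.

    stats: dict mapping gene name -> (R, p_value).
    """
    return sorted(
        gene for gene, (r, p) in stats.items()
        if abs(r) >= r_min and p < alpha
    )


# Placeholder donor ages and expression levels for one hypothetical gene.
ages = [22, 27, 31, 36, 41]
levels = [1.0, 1.4, 2.1, 2.3, 3.0]
r_gene_a = pearson_r(ages, levels)

# Placeholder (R, p) pairs; the p-values are assumed, not computed here.
stats = {
    "gene_a": (r_gene_a, 0.01),  # strong positive correlation, significant
    "gene_b": (-0.45, 0.03),     # negative correlation, significant
    "gene_c": (0.10, 0.60),      # weak correlation
    "gene_d": (0.50, 0.20),      # strong correlation but not significant
}
selected = select_age_associated(stats)
```

Taking the absolute value of R keeps both genes that rise and genes that fall with age, which matches the abstract's separate reporting of positive and negative correlations.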


2020 ◽  
Author(s):  
Ruben Chazarra-Gil ◽  
Stijn van Dongen ◽  
Vladimir Yu Kiselev ◽  
Martin Hemberg

Abstract: As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.


Author(s):  
Davide Risso ◽  
Stefano Maria Pagnotta

Abstract
Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear.
Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications.
Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.
Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Massimo Andreatta ◽  
Santiago J. Carmona

Abstract: Computational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories.
Code availability:
R package: https://github.com/carmonalab/STACAS
Docker image: https://hub.docker.com/repository/docker/mandrea1/stacas_demo


2017 ◽  
Author(s):  
Luke Zappia ◽  
Belinda Phipson ◽  
Alicia Oshlack

Abstract: As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread, the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.
Author summary: In recent years single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities, it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets, but the field is moving at such a rapid pace that it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website (www.scRNA-tools.org). Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Jun Xu ◽  
Caitlin Falconer ◽  
Quan Nguyen ◽  
Joanna Crawford ◽  
Brett D. McKinnon ◽  
...  

Abstract: A variety of methods have been developed to demultiplex pooled samples in a single cell RNA sequencing (scRNA-seq) experiment, which require either hashtag barcodes or sample genotypes prior to pooling. We introduce scSplit, which utilizes genetic differences inferred from scRNA-seq data alone to demultiplex pooled samples. scSplit also enables mapping clusters to original samples. Using simulated, merged, and pooled multi-individual datasets, we show that scSplit prediction is highly concordant with demuxlet predictions and is highly consistent with the known truth in a cell-hashing dataset. scSplit is ideally suited to samples without external genotype information and is available at: https://github.com/jon-xu/scSplit


2021 ◽  
Author(s):  
Arda Durmaz ◽  
Jacob G. Scott

Abstract: Transcriptional dynamics of evolutionary processes through time are highly complex and require single-cell resolution datasets. This is especially important in cancer during the evolution of resistance, where stochasticity can lead to selection for divergent transcriptional mechanisms. Statistical methods developed to address various questions in single-cell datasets are prone to variability and require careful adjustment of multiple parameters. To assess the impact of this variation, we applied commonly used single-cell RNA-Seq analysis tools in a combinatorial fashion to evaluate how repeatable the results are when different methods are combined. In the context of clustering and trajectory estimation, we benchmark the combinatorial space and highlight areas and methods that are sensitive to parameter changes. We observed that utilizing temporal information in a supervised framework, or regularization in latent modeling, reduces variability, leading to improved overlap when different parameters/methods are used. We hope that future studies can benefit from the results presented here, as out-of-the-box use of scRNA-Seq analysis tools is becoming a standard approach in cancer research.


2018 ◽  
Author(s):  
Jesse M. Zhang ◽  
Govinda M. Kamath ◽  
David N. Tse

Summary: Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
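The claim that clustering forces separation and therefore yields artificially low p-values can be reproduced with a small simulation. The sketch below illustrates only the problem, not the authors' correction. It assumes a single homogeneous Gaussian population, uses a split at the median as a stand-in for clustering, and uses a normal-approximation two-sample test as a stand-in for a differential expression test; all names are hypothetical.

```python
import math
import random
import statistics


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


rng = random.Random(0)
# One homogeneous population: there is no true group structure.
values = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Stand-in for clustering: split at the median, which forces two
# separated groups even though the data come from one distribution.
cut = statistics.median(values)
low = [v for v in values if v < cut]
high = [v for v in values if v >= cut]
p_double_dip = two_sample_z_pvalue(low, high)

# Control: groups chosen independently of the values being tested.
shuffled = values[:]
rng.shuffle(shuffled)
p_random = two_sample_z_pvalue(shuffled[:100], shuffled[100:])
```

Testing the groups that were defined from the same values gives a vanishingly small `p_double_dip` despite the absence of any real difference, whereas `p_random` behaves like an ordinary p-value under the null hypothesis.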


Author(s):  
Lili Blumenberg ◽  
Kelly V. Ruggles

Abstract: Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. To streamline this process, we present hypercluster, a python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Hypercluster is available on bioconda; installation, documentation and example workflows can be found at: https://github.com/ruggleslab/hypercluster.
Author summary: Unsupervised clustering is a technique for grouping similar samples within a dataset. It is extremely common when analyzing big data from patient samples, or high throughput techniques like single cell RNA-seq. When researchers use unsupervised clustering, they have to select parameters that affect the final result, for instance how many groups they expect to find or what the smallest group is allowed to be. Some methods require setting even less intuitive parameters. For most applications, it is extremely challenging to guess what the values of these parameters should be; therefore, to prevent introducing bias into the final results, researchers should test many different parameters and methods to find the best groups. This process is cumbersome, slow and challenging to perform in a reproducible way. We developed hypercluster, a tool that automates this process, makes it much faster, and presents the results in a reproducible and helpful manner.
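The workflow described in this abstract, running a clustering method over a range of hyperparameters and comparing the results, can be sketched in a few lines. This is a toy illustration that does not use the hypercluster API: it assumes a hand-written one-dimensional k-means, made-up data, and within-cluster sum of squares (inertia) as the comparison metric.

```python
def kmeans_1d(values, k, n_iter=20):
    """Toy 1-D k-means; returns (centers, inertia)."""
    data = sorted(values)
    # Deterministic initialisation from evenly spaced order statistics.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Empty clusters keep their previous center.
        centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
    inertia = sum(min((v - c) ** 2 for c in centers) for v in data)
    return centers, inertia


# Made-up data with three well-separated groups.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]

# Sweep the hyperparameter k and record one evaluation score per setting.
inertia_by_k = {k: kmeans_1d(data, k)[1] for k in (1, 2, 3, 4)}
```

Because inertia can only shrink as k grows, the useful signal is where the improvement levels off: here the drop from k=2 to k=3 is large and the drop from k=3 to k=4 is negligible. A real comparison would also vary the algorithm and use additional metrics.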

