Per-sample standardization and asymmetric winsorization lead to accurate clustering of RNA-seq expression profiles

2020 ◽  
Author(s):  
Davide Risso ◽  
Stefano M. Pagnotta

Abstract Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformations on the outcome of unsupervised clustering procedures is still unclear. Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis.

Author(s):  
Davide Risso ◽  
Stefano Maria Pagnotta

Abstract Motivation: Data transformations are an important step in the analysis of RNA-seq data. Nonetheless, the impact of transformation on the outcome of unsupervised clustering procedures is still unclear. Results: Here, we present an Asymmetric Winsorization per Sample Transformation (AWST), which is robust to data perturbations and removes the need for selecting the most informative genes prior to sample clustering. Our procedure leads to robust and biologically meaningful clusters both in bulk and in single-cell applications. Availability: The AWST method is available at https://github.com/drisso/awst. The code to reproduce the analyses is available at https://github.com/drisso/awst_analysis. Supplementary information: Supplementary data are available at Bioinformatics online.
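To make the two ideas named in the title concrete, here is a minimal sketch of per-sample standardization followed by asymmetric clipping. It is a generic illustration, not the published AWST procedure: the function names, the log2(count + 1) step, and the `lower`/`upper` bounds are all assumptions chosen for the example.

```python
import math
import statistics


def standardize_sample(log_values):
    """Convert one sample's log-expression values to z-scores."""
    mean = statistics.fmean(log_values)
    sd = statistics.pstdev(log_values)
    if sd == 0:
        # A constant sample carries no spread; map everything to zero.
        return [0.0 for _ in log_values]
    return [(v - mean) / sd for v in log_values]


def asymmetric_winsorize(z_scores, lower=-2.0, upper=4.0):
    """Clip z-scores to [lower, upper]; the bounds need not be symmetric.

    The default bounds are hypothetical values for illustration only.
    """
    return [min(max(z, lower), upper) for z in z_scores]


def transform_sample(counts, lower=-2.0, upper=4.0):
    """Log-transform, standardize, then winsorize a single sample."""
    log_values = [math.log2(c + 1) for c in counts]
    return asymmetric_winsorize(standardize_sample(log_values), lower, upper)
```

Because each sample is transformed on its own, the result for one sample does not depend on which other samples are in the dataset. For example, `transform_sample([0, 10, 100, 1000000], lower=-0.5, upper=1.0)` clips both the lowest and the highest gene to the chosen bounds.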


Author(s):  
Lili Blumenberg ◽  
Kelly V. Ruggles

Abstract Unsupervised clustering is a common and exceptionally useful tool for large biological datasets. However, clustering requires upfront algorithm and hyperparameter selection, which can introduce bias into the final clustering labels. It is therefore advisable to obtain a range of clustering results from multiple models and hyperparameters, which can be cumbersome and slow. To streamline this process, we present hypercluster, a Python package and SnakeMake pipeline for flexible and parallelized clustering evaluation and selection. Hypercluster is available on bioconda; installation, documentation and example workflows can be found at: https://github.com/ruggleslab/hypercluster. Author summary: Unsupervised clustering is a technique for grouping similar samples within a dataset. It is extremely common when analyzing big data from patient samples, or high throughput techniques like single cell RNA-seq. When researchers use unsupervised clustering, they have to select parameters that affect the final result, for instance how many groups they expect to find or what the smallest group is allowed to be. Some methods require setting even less intuitive parameters. For most applications, it is extremely challenging to guess what the values of these parameters should be; therefore, to prevent introducing bias into the final results, researchers should test many different parameters and methods to find the best groups. This process is cumbersome, slow and challenging to perform in a reproducible way. We developed hypercluster, a tool that automates this process, makes it much faster, and presents the results in a reproducible and helpful manner.
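The idea of testing many hyperparameters and then selecting a result by an evaluation metric can be sketched in a few lines. This sketch does not use the hypercluster API; `kmeans`, `silhouette` and `best_k` are hypothetical helpers written for the example, and a plain k-means with a deterministic start stands in for the range of algorithms a real comparison would cover.

```python
import math


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic start (evenly spaced input points)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette score; singleton clusters contribute a score of 0."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(cluster):
            d = [
                math.dist(p, q)
                for j, q in enumerate(points)
                if labels[j] == cluster and j != i
            ]
            return sum(d) / len(d) if d else None

        a = mean_dist(labels[i])
        others = [mean_dist(c) for c in clusters if c != labels[i]]
        others = [d for d in others if d is not None]
        if a is None or not others or max(a, min(others)) == 0:
            scores.append(0.0)
            continue
        b = min(others)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Score every candidate k and return the best one with all scores."""
    results = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(results, key=results.get), results
```

Calling `best_k(points, [2, 3, 4])` on two well-separated groups of points returns the candidate with the highest mean silhouette together with the full score table, so the selection can be inspected rather than taken on trust.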


2017 ◽  
Author(s):  
Vladimir Yu Kiselev ◽  
Andrew Yiu ◽  
Martin Hemberg

Abstract Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues [1–9] since the technology allows researchers to define cell-types using unsupervised clustering of the transcriptome [8,10]. However, due to differences in experimental methods and computational analyses, it is often challenging to directly compare the cells identified in two different experiments. Here, we present scmap (http://bioconductor.org/packages/scmap), a method for projecting cells from a scRNA-seq experiment onto the cell-types or individual cells identified in other experiments (the application can be run for free, without restrictions, from http://www.hemberg-lab.cloud/scmap).


2016 ◽  
Author(s):  
Atray Dixit

Abstract As part of the process of preparing scRNA-seq libraries, a diverse template is typically amplified by PCR. During amplification, spurious chimeric molecules can be formed between molecules originating in different cells. While several computational and experimental strategies have been suggested to mitigate the impact of chimeric molecules, they have not been addressed in the context of scRNA-seq experiments. We demonstrate that chimeras become increasingly problematic as samples are sequenced deeply and propose two computational solutions. The first is unsupervised and relies only on cell barcode and UMI information. The second is a supervised approach built on labeled data and a set of molecule specific features. The classifier can accurately identify most of the contaminating molecules in a deeply sequenced species mixing dataset. Code is publicly available at https://github.com/asncd/schimera.


2019 ◽  
Author(s):  
Elaine Y. Cao ◽  
John F. Ouyang ◽  
Owen J.L. Rackham

Abstract Summary: Emerging single-cell RNA-seq technologies have made it possible to capture and assess the gene expression of individual cells. Based on the similarity of gene expression profiles, many tools have been developed to generate an in silico ordering of cells in the form of pseudo-time trajectories. However, these tools do not provide a means to find the ordering of critical gene expression changes over pseudo-time. We present GeneSwitches, a tool that takes any single-cell pseudo-time trajectory and determines the precise order of gene-expression and functional-event changes over time. GeneSwitches uses a statistical framework based on logistic regression to identify the order in which genes are either switched on or off along pseudo-time. With this information, users can identify the order in which surface markers appear, investigate how functional ontologies are gained or lost over time, and compare the ordering of switching genes from two related pseudo-temporal processes. Availability: GeneSwitches is available at https://geneswitches.ddnetbio.com. Contact: owen.rackham@duke-nus.edu.sg. Supplementary Information: available at http://www.ddnetbio.com/files/GeneSwitches_SI.pdf
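As an illustration of how logistic regression can place a gene's switch along pseudo-time, the sketch below fits a one-variable logistic model to a gene's on/off states and reports the pseudo-time at which the fitted probability crosses 0.5. It is not the GeneSwitches implementation: `fit_switch`, the gradient-descent fit, the learning rate and the step count are all assumptions made for the example, and the on/off states are taken as already binarized.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_switch(pseudotime, is_on, lr=0.1, steps=2000):
    """Fit P(on) = sigmoid(b0 + b1 * (t - mean_t)) by gradient descent.

    Returns (switch_time, slope). switch_time is where P(on) = 0.5;
    slope > 0 suggests the gene switches on, slope < 0 that it switches off.
    """
    n = len(pseudotime)
    center = sum(pseudotime) / n
    xs = [t - center for t in pseudotime]  # centering keeps the fit stable
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, is_on):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    if b1 == 0:
        return None, b1  # no trend along pseudo-time, so no switch point
    return center - b0 / b1, b1
```

Running `fit_switch` for every gene and sorting by the returned switch time would give one possible ordering of switching events; comparing two such orderings is then a matter of comparing the sorted gene lists.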


Aquaculture ◽  
2021 ◽  
pp. 737194
Author(s):  
Lingzhan Xue ◽  
Dan Jia ◽  
Luohao Xu ◽  
Zhen Huang ◽  
Haiping Fan ◽  
...  

2020 ◽  
Author(s):  
Silvia Llonch ◽  
Montserrat Barragán ◽  
Paula Nieto ◽  
Anna Mallol ◽  
Marc Elosua-Bayes ◽  
...  

Abstract Study question: To which degree does maternal age affect the transcriptome of human oocytes at the germinal vesicle (GV) stage or at metaphase II after maturation in vitro (IVM-MII)? Summary answer: While the oocytes’ transcriptome is predominantly determined by maturation stage, transcript levels of genes related to chromosome segregation, mitochondria and RNA processing are affected by age after in vitro maturation of denuded oocytes. What is known already: Female fertility is inversely correlated with maternal age due to both a depletion of the oocyte pool and a reduction in oocyte developmental competence. Few studies have addressed the effect of maternal age on the human mature oocyte (MII) transcriptome, which is established during oocyte growth and maturation, and the pathways involved remain unclear. Here, we characterize and compare the transcriptomes of a large cohort of fully grown GV and IVM-MII oocytes from women of varying reproductive age. Study design, size, duration: In this prospective molecular study, 37 women were recruited from May 2018 to June 2019. The mean age was 28.8 years (SD=7.7, range 18-43). A total of 72 oocytes were included in the study at GV stage after ovarian stimulation, and analyzed as GV (n=40) and in vitro matured oocytes (IVM-MII; n=32). Participants/materials, setting, methods: Denuded oocytes were included either as GV at the time of ovum pick-up or as IVM-MII after in vitro maturation for 30 hours in G2™ medium, and processed for transcriptomic analysis by single-cell RNA-seq using the Smart-seq2 technology. Cluster and maturation stage marker analysis were performed using the Seurat R package. Genes with an average fold change greater than 2 and a p-value < 0.01 were considered maturation stage markers. A Pearson correlation test was used to identify genes whose expression levels changed progressively with age. Those genes presenting a correlation value |R| >= 0.3 and a p-value < 0.05 were considered significant. Main results and the role of chance: First, by exploration of the RNA-seq data using tSNE dimensionality reduction, we identified two clusters of cells reflecting the oocyte maturation stage (GV and IVM-MII) with 4,445 and 324 putative marker genes, respectively. Next, we identified genes for which RNA levels either progressively increased or decreased with age. This analysis was performed independently for GV and IVM-MII oocytes. Our results indicate that the transcriptome is more affected by age in IVM-MII oocytes (1,219 genes) than in GV oocytes (596 genes). In particular, we found that genes involved in chromosome segregation and RNA splicing significantly increase in transcript levels with age, while genes related to mitochondrial activity present lower transcript levels with age. Gene regulatory network analysis revealed potential upstream master regulator functions for genes whose transcript levels present positive (GPBP1, RLF, SON, TTF1) or negative (BNC1, THRB) correlation with age. Limitations, reasons for caution: IVM-MII oocytes used in this study were obtained after in vitro maturation of denuded GV oocytes; therefore, their transcriptome might not be fully representative of in vivo matured MII oocytes. The Smart-seq2 methodology used in this study detects polyadenylated transcripts only and we could therefore not assess non-polyadenylated transcripts. Wider implications of the findings: Our analysis suggests that advanced maternal age does not globally affect the oocyte transcriptome at GV or IVM-MII stages. Nonetheless, hundreds of genes displayed altered transcript levels with age, particularly in IVM-MII oocytes. Especially affected by age were genes related to chromosome segregation and mitochondrial function, pathways known to be involved in oocyte ageing. Our study thereby suggests that misregulation of chromosome segregation and mitochondrial pathways also at the RNA-level might contribute to the age-related quality decline in human oocytes. Study funding/competing interest(s): This study was funded by the AXA research fund, the European commission, intramural funding of Clinica EUGIN, the Spanish Ministry of Science, Innovation and Universities, the Catalan Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) and by contributions of the Spanish Ministry of Economy, Industry and Competitiveness (MEIC) to the EMBL partnership and to the “Centro de Excelencia Severo Ochoa”. The authors have no conflict of interest to declare.
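The two selection rules stated in the methods (fold change greater than 2 with p < 0.01 for stage markers, and an absolute Pearson correlation of at least 0.3 with p < 0.05 for age association) can be written as simple filters. The sketch below is illustrative only: the function names are hypothetical, the study itself used the Seurat R package, and p-values are assumed to come precomputed from a statistics package rather than being derived here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def is_stage_marker(fold_change, p_value, min_fold=2.0, max_p=0.01):
    """Marker rule from the abstract: fold change > 2 and p < 0.01."""
    return fold_change > min_fold and p_value < max_p


def is_age_associated(r, p_value, min_abs_r=0.3, max_p=0.05):
    """Age rule from the abstract: |R| >= 0.3 and p < 0.05.

    The absolute value keeps both positive and negative correlations.
    """
    return abs(r) >= min_abs_r and p_value < max_p
```

For one gene, `pearson_r(ages, expression)` gives R, and `is_age_associated(R, p)` applies the threshold once a p-value for that correlation is available.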


2020 ◽  
Author(s):  
Ruben Chazarra-Gil ◽  
Stijn van Dongen ◽  
Vladimir Yu Kiselev ◽  
Martin Hemberg

Abstract As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although there are several computational methods available that can remove batch effects, evaluating which method performs best is not straightforward. Here we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.


2019 ◽  
Vol 20 (5) ◽  
pp. 310-310 ◽  
Author(s):  
Vladimir Yu Kiselev ◽  
Tallulah S. Andrews ◽  
Martin Hemberg
