geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq.

2021 ◽  
Author(s):  
Alsu Missarova ◽  
Jaison Jain ◽  
Andrew Butler ◽  
Shila Ghazanfar ◽  
Tim Stuart ◽  
...  

The problem of selecting targeted gene panels that capture maximum variability encoded in scRNA-sequencing data has become of great practical importance. scRNA-seq datasets are increasingly being used to identify gene panels that can be probed using alternative molecular technologies, such as spatial transcriptomics. In this context, the number of genes that can be probed is an important limiting factor, so choosing the best subset of genes is vital. Existing methods for this task are limited by either a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cell types. We resolve this by introducing an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. We demonstrate, using a variety of metrics and diverse datasets, that our approach outperforms existing strategies, and can not only resolve cell types but also more subtle cell state differences. Our approach is available as an open source, easy-to-use, documented R package (https://github.com/MarioniLab/geneBasisR).
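
The iterative idea in this abstract can be illustrated with a toy sketch. It is a heavy simplification under stated assumptions and not the geneBasisR implementation: the "true manifold" is stood in for by a k-nearest-neighbour graph built on all genes, the discrepancy score is a summed absolute difference of neighbour-averaged expression, and the first gene is supplied by the caller. The names `knn`, `neighbour_mean` and `select_panel` are hypothetical.

```python
import math


def knn(cells, genes, k):
    """For each cell, return the indices of its k nearest cells using only `genes`."""
    neighbours = []
    for i, ci in enumerate(cells):
        dists = []
        for j, cj in enumerate(cells):
            if i != j:
                d = math.sqrt(sum((ci[g] - cj[g]) ** 2 for g in genes))
                dists.append((d, j))
        dists.sort()  # ties are broken by cell index
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def neighbour_mean(cells, neighbours, gene):
    """Average expression of `gene` over each cell's neighbours."""
    return [sum(cells[j][gene] for j in nb) / len(nb) for nb in neighbours]


def select_panel(cells, n_genes, k=2, seed_gene=0):
    """Greedily grow a gene panel (toy version; the seed gene is an assumption)."""
    all_genes = list(range(len(cells[0])))
    true_nbrs = knn(cells, all_genes, k)  # graph built from every gene
    panel = [seed_gene]
    while len(panel) < n_genes:
        panel_nbrs = knn(cells, panel, k)  # graph built from the current panel
        best_gene, best_score = None, -1.0
        for g in all_genes:
            if g in panel:
                continue
            true_avg = neighbour_mean(cells, true_nbrs, g)
            panel_avg = neighbour_mean(cells, panel_nbrs, g)
            # How badly does the current panel's graph predict this gene?
            score = sum(abs(a - b) for a, b in zip(true_avg, panel_avg))
            if score > best_score:
                best_gene, best_score = g, score
        panel.append(best_gene)
    return panel
```

Each round adds the gene whose neighbourhood-averaged expression is reproduced worst by the graph built from the current panel, which is one simple way to read "captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel".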

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alsu Missarova ◽  
Jaison Jain ◽  
Andrew Butler ◽  
Shila Ghazanfar ◽  
Tim Stuart ◽  
...  

Abstract scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.


2021 ◽  
Author(s):  
Yang Young Lu ◽  
Timothy C. Yu ◽  
Giancarlo Bonora ◽  
William Stafford Noble

Abstract A common workflow in single-cell RNA-seq analysis is to project the data to a latent space, cluster the cells in that space, and identify sets of marker genes that explain the differences among the discovered clusters. A primary drawback to this three-step procedure is that each step is carried out independently, thereby neglecting the effects of the nonlinear embedding and inter-gene dependencies on the selection of marker genes. Here we propose an integrated deep learning framework, Adversarial Clustering Explanation (ACE), that bundles all three steps into a single workflow. The method thus moves away from the notion of “marker genes” to instead identify a panel of explanatory genes. This panel may include genes that are not only enriched but also depleted relative to other cell types, as well as genes that exhibit differences between closely related cell types. Empirically, we demonstrate that ACE is able to identify gene panels that are both highly discriminative and nonredundant, and we demonstrate the applicability of ACE to an image recognition task.


2021 ◽  
Author(s):  
Daniel Osorio ◽  
Marieke Lydia Kuijjer ◽  
James J. Cai

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample-preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution for this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available from CRAN at https://CRAN.R-project.org/package=rPanglaoDB.


2020 ◽  
Author(s):  
Yun Zhang ◽  
Brian D. Aevermann ◽  
Trygve E. Bakken ◽  
Jeremy A. Miller ◽  
Rebecca D. Hodge ◽  
...  

Abstract Single cell/nucleus RNA sequencing (scRNAseq) is emerging as an essential tool to unravel the phenotypic heterogeneity of cells in complex biological systems. While computational methods for scRNAseq cell type clustering have advanced, the ability to integrate datasets to identify common and novel cell types across experiments remains a challenge. Here, we introduce a cluster-to-cluster cell type matching method – FR-Match – that utilizes supervised feature selection for dimensionality reduction and incorporates shared information among cells to determine whether two cell type clusters share the same underlying multivariate gene expression distribution. FR-Match is benchmarked with existing cell-to-cell and cell-to-cluster cell type matching methods using both simulated and real scRNAseq data. FR-Match proved to be a stringent method that produced fewer erroneous matches of distinct cell subtypes and had the unique ability to identify novel cell phenotypes in new datasets. In silico validation demonstrated that the proposed workflow is the only self-contained algorithm that was robust to increasing numbers of true negatives (i.e. non-represented cell types). FR-Match was applied to two human brain scRNAseq datasets sampled from cortical layer 1 and full thickness middle temporal gyrus. When mapping cell types identified in specimens isolated from these overlapping human brain regions, FR-Match precisely recapitulated the laminar characteristics of matched cell type clusters, reflecting their distinct neuroanatomical distributions. An R package and Shiny application are provided at https://github.com/JCVenterInstitute/FRmatch for users to interactively explore and match scRNAseq cell type clusters with complementary visualization tools.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Clémentine Decamps ◽  
Florian Privé ◽  
Raphael Bacher ◽  
Daniel Jost ◽  
...  

Abstract Background: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. Results: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has a similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are achieved, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. Conclusion: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
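
Cattell's rule, mentioned in this abstract, reads the number of components off the "elbow" of the scree plot. One common way to automate that visual judgment is to look for the sharpest bend in the sorted eigenvalues. The sketch below uses the largest second difference as the bend measure; that choice, and the function name `scree_elbow`, are assumptions made for illustration and are not taken from medepir. How the retained component count maps to a number of cell types is left to the pipeline.

```python
def scree_elbow(eigenvalues):
    """Count the leading components that sit before the sharpest bend of the scree curve."""
    ev = sorted(eigenvalues, reverse=True)
    if len(ev) < 3:
        return len(ev)  # too few points to define a bend
    # The second difference at position i measures how sharply the curve bends at ev[i].
    bend = [ev[i - 1] - 2 * ev[i] + ev[i + 1] for i in range(1, len(ev) - 1)]
    # Position of the elbow in ev; components before it are retained.
    elbow = 1 + max(range(len(bend)), key=bend.__getitem__)
    return elbow
```

The heuristic is sensitive to noise in the tail of the spectrum, so inspecting the scree plot by eye remains a sensible check.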


2014 ◽  
Author(s):  
Felix A. Klein ◽  
Tibor Pakozdi ◽  
Simon Anders ◽  
Yad Ghavi-Helm ◽  
Eileen E. M. Furlong ◽  
...  

Abstract Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the “viewpoint” with the rest of the genome, both in a single condition or comparing different experimental conditions or cell types. Observed ligation frequencies show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task to find these specific interactions and to detect changes between interaction profiles of different conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonously decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, with high z-scores being interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples, and call for differential contact frequencies using the statistical method DESeq2 adapted from RNA-Seq analysis. Availability and Implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org.
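
The fit-then-residual idea in this abstract can be sketched in a few lines. This is not the FourCSeq implementation: the smooth monotone fit is replaced here by plain isotonic regression (pool-adjacent-violators), the transform is assumed to be a square root, and residuals are scaled by their standard deviation. Input counts are assumed to be ordered by increasing distance from the viewpoint, and the names `decreasing_fit` and `z_scores` are hypothetical.

```python
import math
import statistics


def decreasing_fit(y):
    """Least-squares non-increasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge while an earlier block has a lower mean than the one after it.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def z_scores(counts):
    """z-scores of residuals from the decreasing trend (sqrt transform is an assumption)."""
    transformed = [math.sqrt(c) for c in counts]
    fit = decreasing_fit(transformed)
    residuals = [t - f for t, f in zip(transformed, fit)]
    sd = statistics.pstdev(residuals)
    if sd == 0:
        return [0.0] * len(residuals)  # perfectly monotone profile, no peaks
    return [r / sd for r in residuals]
```

Fragments whose z-score exceeds a chosen cut-off would then be treated as candidate interaction peaks; the cut-off itself is a user decision and is not specified here.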


Genes ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 1427
Author(s):  
Beryl Royer-Bertrand ◽  
Katarina Cisarova ◽  
Florence Niel-Butschi ◽  
Laureane Mittaz-Crettol ◽  
Heidi Fodstad ◽  
...  

To assess the potential of detecting copy number variations (CNVs) directly from exome sequencing (ES) data in diagnostic settings, we developed a CNV-detection pipeline based on ExomeDepth software and applied it to ES data of 450 individuals. Initially, only CNVs affecting genes in the requested diagnostic gene panels were scored and tested against arrayCGH results. Pathogenic CNVs were detected in 18 individuals. Most detected CNVs were larger than 400 kb (11/18), but three individuals had small CNVs impacting one or a few exons only and were thus not detectable by arrayCGH. Conversely, two pathogenic CNVs were initially missed, as they impacted genes not included in the original gene panel analysed, and a third one was missed as it was in a poorly covered region. The overall combined diagnostic rate (SNVs + CNVs) in our cohort was 36%, with wide differences between clinical domains. We conclude that (1) the ES-based CNV pipeline detects efficiently large and small pathogenic CNVs, (2) the detection of CNV relies on uniformity of sequencing and good coverage, and (3) in patients who remain unsolved by the gene panel analysis, CNV analysis should be extended to all captured genes, as diagnostically relevant CNVs may occur everywhere in the genome.


2020 ◽  
Author(s):  
Vy Nguyen ◽  
Johannes Griss

Abstract Motivation: Automatic cell type identification in scRNA-seq datasets is an essential method to alleviate a key bottleneck in scRNA-seq data analysis. While most existing tools show good sensitivity and specificity in classifying cell types, they often fail to adequately not-classify cells that are not present in the used reference. Results: scClassifR is a novel R package that provides a complete framework to automatically classify cells in scRNA-seq datasets. It supports both Seurat and Bioconductor’s SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scClassifR uses hierarchically organised SVMs to distinguish a specific cell type versus all others. It shows comparable or even superior sensitivity and specificity compared to existing tools while being robust in not-classifying unknown cell types. As a unique feature, it reports ambiguous cell assignments, including the respective probabilities. Finally, scClassifR provides dedicated functions to train and evaluate classifiers for additional cell types. Availability and Implementation: scClassifR is freely available on GitHub (https://github.com/grisslab/scClassifR).
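
The decision logic described in this abstract (one classifier per cell type, with cells left unclassified or flagged as ambiguous) can be sketched independently of the SVMs themselves. The example assumes each per-type classifier has already produced a probability for the cell. The 0.5 threshold and the function name `assign_cell` are illustrative assumptions rather than scClassifR's API, and the hierarchical organisation of the classifiers is not modelled.

```python
def assign_cell(probabilities, threshold=0.5):
    """Turn per-type "this type vs all others" probabilities into a label.

    `probabilities` maps a cell type name to the probability reported by that
    type's binary classifier. Returns (label, supporting probabilities).
    """
    hits = {t: p for t, p in probabilities.items() if p >= threshold}
    if not hits:
        return "unknown", hits  # no classifier claims the cell
    if len(hits) == 1:
        return next(iter(hits)), hits  # exactly one classifier claims it
    return "ambiguous", hits  # several claim it; report all probabilities
```

Because each classifier is evaluated on its own, a cell from a type that is absent from the reference can fall below every threshold and stay unlabelled instead of being forced into the nearest known type.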


2020 ◽  
Vol 57 (3) ◽  
pp. 181-189
Author(s):  
Asma Majid ◽  
GA Parray ◽  
NR Sofi ◽  
Gazala H Khan ◽  
Showkat A Waza ◽  
...  

Rice being a staple food crop of Kashmir valley, the focus is on enhancement of yield in order to meet the needs of an ever-growing population. Identification of new parental lines is crucial for developing ecology-specific hybrids with ideal agronomic performance. Exploitation of heterosis in the form of hybrid rice technology can be one of the approaches to increase productivity in this crop; in particular, exploiting diversity among japonica lines can serve as an excellent route. A number of CMS lines suitable for mountainous areas of Kashmir have been developed; however, the availability of promising restorer lines remains the major limitation for utilization of these lines. Identification of potential restorers acts as the main limiting factor for hybrid development in the Kashmir valley. Marker-based screening for Rf3 and Rf4 fertility restorer genes can be helpful in rapid selection of restorer lines while dealing with a large quantity of genetic material. In the present study, 100 rice germplasm lines were screened with the help of SSR markers RM3148 and RM6100, linked to the Rf3 and Rf4 genes on chromosomes 1 and 10, respectively. In total, 19 lines revealed the presence of both Rf3 and Rf4 genes. These lines amplified fertility restorer specific alleles for both the genes and may serve as potential restorers for obtaining heterotic rice hybrids. Further, the germplasm lines were also evaluated for yield and quality traits. The present results would help in selection of suitable restorers along with preferred grain shape/size.

