geneBasis: an iterative approach for unsupervised selection of targeted gene panels from scRNA-seq

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Alsu Missarova ◽  
Jaison Jain ◽  
Andrew Butler ◽  
Shila Ghazanfar ◽  
Tim Stuart ◽  
...  

Abstract: scRNA-seq datasets are increasingly used to identify gene panels that can be probed using alternative technologies, such as spatial transcriptomics, where choosing the best subset of genes is vital. Existing methods are limited by a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cells. We introduce an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. Our approach outperforms existing strategies and can resolve cell types and subtle cell state differences.

2021 ◽  
Author(s):  
Alsu Missarova ◽  
Jaison Jain ◽  
Andrew Butler ◽  
Shila Ghazanfar ◽  
Tim Stuart ◽  
...  

The problem of selecting targeted gene panels that capture maximum variability encoded in scRNA-sequencing data has become of great practical importance. scRNA-seq datasets are increasingly being used to identify gene panels that can be probed using alternative molecular technologies, such as spatial transcriptomics. In this context, the number of genes that can be probed is an important limiting factor, so choosing the best subset of genes is vital. Existing methods for this task are limited by either a reliance on pre-existing cell type labels or by difficulties in identifying markers of rare cell types. We resolve this by introducing an iterative approach, geneBasis, for selecting an optimal gene panel, where each newly added gene captures the maximum distance between the true manifold and the manifold constructed using the currently selected gene panel. We demonstrate, using a variety of metrics and diverse datasets, that our approach outperforms existing strategies, and can not only resolve cell types but also more subtle cell state differences. Our approach is available as an open source, easy-to-use, documented R package (https://github.com/MarioniLab/geneBasisR).
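The iterative idea in the abstract above can be illustrated with a minimal sketch. This is not the geneBasisR implementation: the function names (`knn_indices`, `neighbour_mean`, `greedy_panel`), the Euclidean k-nearest-neighbour graph, the absolute-difference score, and the choice of the highest-variance gene as the starting point are all assumptions made for illustration. The sketch treats the neighbour graph built from all genes as the "true" manifold and, at each step, adds the gene whose neighbourhood-averaged expression is worst reproduced by the graph built from the current panel.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row of X (Euclidean, self excluded)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                       # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def neighbour_mean(X, nbrs):
    """Mean expression of every gene over each cell's neighbours."""
    return X[nbrs].mean(axis=1)  # X[nbrs] has shape (cells, k, genes)

def greedy_panel(X, n_genes, k=5):
    """Greedily pick n_genes columns of X (cells x genes); hypothetical scoring rule."""
    X = np.asarray(X, dtype=float)
    # Neighbourhood averages in the graph built from all genes ("true" manifold).
    true_avg = neighbour_mean(X, knn_indices(X, k))
    # Assumed starting point: the single most variable gene.
    panel = [int(np.argmax(X.var(axis=0)))]
    while len(panel) < n_genes:
        # Neighbourhood averages in the graph built from the current panel only.
        sel_avg = neighbour_mean(X, knn_indices(X[:, panel], k))
        # Per-gene discrepancy between the two graphs, summed over cells.
        score = np.abs(true_avg - sel_avg).sum(axis=0)
        score[panel] = -np.inf  # never re-select a gene already in the panel
        panel.append(int(np.argmax(score)))
    return panel

# Toy data: 60 cells, 10 genes, with structure planted in genes 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
X[:30, 0] += 5.0
X[:15, 3] += 5.0
print(greedy_panel(X, n_genes=3))
```

The dense distance matrix keeps the sketch short but scales quadratically with the number of cells; a real dataset would need an approximate nearest-neighbour search.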


2021 ◽  
Author(s):  
Yang Young Lu ◽  
Timothy C. Yu ◽  
Giancarlo Bonora ◽  
William Stafford Noble

Abstract: A common workflow in single-cell RNA-seq analysis is to project the data to a latent space, cluster the cells in that space, and identify sets of marker genes that explain the differences among the discovered clusters. A primary drawback to this three-step procedure is that each step is carried out independently, thereby neglecting the effects of the nonlinear embedding and inter-gene dependencies on the selection of marker genes. Here we propose an integrated deep learning framework, Adversarial Clustering Explanation (ACE), that bundles all three steps into a single workflow. The method thus moves away from the notion of “marker genes” to instead identify a panel of explanatory genes. This panel may include genes that are not only enriched but also depleted relative to other cell types, as well as genes that exhibit differences between closely related cell types. Empirically, we demonstrate that ACE is able to identify gene panels that are both highly discriminative and nonredundant, and we demonstrate the applicability of ACE to an image recognition task.


1986 ◽  
Vol 84 (1) ◽  
pp. 69-92
Author(s):  
T.D. Oberley ◽  
A.H. Yang ◽  
J. Gould-Kostka

Adult guinea pig glomeruli were grown in vitro either in serum or in a chemically defined medium. Glomeruli were plated either directly into plastic flasks or into plastic flasks that had been coated with the extracellular matrix produced by the PF-HR-9 mouse teratocarcinoma endodermal cell line. Both the composition of the medium and the nature of the culture substrate affected whole glomerular attachment and the type of cells produced in culture. Quantitative studies demonstrated selection of cell types by different culture conditions. Three colony types, each composed of distinctive cell types, could be identified by morphological features. The cells constituting two of these colony types were epithelial in nature, but they were identified as different epithelial types by both histochemical and ultrastructural criteria. Previous studies suggested that one epithelial cell type was derived from the glomerular visceral epithelial cell. This study demonstrates that this cell type could be selectively grown in defined medium on plastic. A second cell type showed several features of renal tubular epithelial cells, including histochemical staining for catalase, cell surface microvilli and cilia, and formation of hemicysts and structures that resembled tubules after prolonged periods in culture. To demonstrate that the ‘glomerulus-derived’ tubular cells were indeed tubular epithelium, we isolated purified renal cortical tubules (greater than 99% pure) and cultured them on the HR-9 matrix in a serum-free chemically defined medium. The resultant outgrowths had morphological properties identical to those of the glomerulus-derived tubular cells. It seems likely that small tubular fragments attached to a minority of the glomeruli are the source of these glomerulus-derived tubular cells. Neither epithelial cell type could be subcultured on plastic, but both could be passaged on the HR-9 matrix. 
A third cell type, the spindle-shaped cell, was easily propagated on both plastic and the HR-9 matrix. The origin of this cell type is not clear. Our results demonstrate the important effect of culture conditions on the selection, growth and differentiation of kidney cell types in vitro.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Clémentine Decamps ◽  
Florian Privé ◽  
Raphael Bacher ◽  
Daniel Jost ◽  
...  

Abstract: Background: Cell-type heterogeneity of tumors is a key factor in tumor progression and response to chemotherapy. Tumor cell-type heterogeneity, defined as the proportion of the various cell-types in a tumor, can be inferred from DNA methylation of surgical specimens. However, confounding factors known to associate with methylation values, such as age and sex, complicate accurate inference of cell-type proportions. While reference-free algorithms have been developed to infer cell-type proportions from DNA methylation, a comparative evaluation of the performance of these methods is still lacking. Results: Here we use simulations to evaluate several computational pipelines based on the software packages MeDeCom, EDec, and RefFreeEWAS. We identify that accounting for confounders, feature selection, and the choice of the number of estimated cell types are critical steps for inferring cell-type proportions. We find that removal of methylation probes which are correlated with confounder variables reduces the error of inference by 30–35%, and that selection of cell-type informative probes has a similar effect. We show that Cattell’s rule based on the scree plot is a powerful tool to determine the number of cell-types. Once the pre-processing steps are completed, the three deconvolution methods provide comparable results. We observe that all the algorithms’ performance improves when inter-sample variation of cell-type proportions is large or when the number of available samples is large. We find that under specific circumstances the methods are sensitive to the initialization method, suggesting that averaging different solutions or optimizing initialization is an avenue for future research. Conclusion: Based on the lessons learned, to facilitate pipeline validation and catalyze further pipeline improvement by the community, we develop a benchmark pipeline for inference of cell-type proportions and implement it in the R package medepir.
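Cattell's rule, mentioned in the abstract above, reads the scree plot (eigenvalues of the data covariance in decreasing order) and keeps the components that come before the curve flattens out. The sketch below is only one numerical stand-in for that visual reading and is not taken from the medepir package: the function names, the "largest relative drop" heuristic, and the `max_k` cap are assumptions for illustration. How the retained component count maps to a number of cell types is a modelling choice that the abstract does not specify.

```python
import numpy as np

def scree_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (samples x features), largest first."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, sorted descending
    return s ** 2 / (X.shape[0] - 1)

def components_before_elbow(eigvals, max_k=20):
    """Hypothetical elbow heuristic: keep the components before the largest relative drop."""
    ev = np.asarray(eigvals, dtype=float)[: max_k + 1]
    # Ratio of each eigenvalue to the next; the floor avoids division by zero.
    ratios = ev[:-1] / np.maximum(ev[1:], 1e-12)
    return int(np.argmax(ratios)) + 1

# Toy data: 100 samples, 50 features, three latent sources plus weak noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 50))
X = signal + 0.01 * rng.normal(size=(100, 50))
print(components_before_elbow(scree_eigenvalues(X)))
```

On real methylation data the drop is rarely this sharp, so plotting the eigenvalues and checking the suggested elbow by eye remains advisable.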


1990 ◽  
Vol 10 (2) ◽  
pp. 662-671 ◽  
Author(s):  
R P Hershberger ◽  
L A Culp

Fibronectin polypeptide diversity is generated to a large extent by alternative splicing of the fibronectin primary transcript at three sites: two extra domain exons encoding extra structural repeats and a region of nonhomologous sequence termed the type-III connecting segment (IIICS). A novel double primer extension assay was developed to identify and quantify simultaneously each of the five human IIICS mRNA splicing variants. Expression of the five IIICS variants was analyzed in a variety of human normal and tumor cell types as well as in human liver. Differences in IIICS expression patterns were observed among different cell types, among fibroblasts of different tissue origins, and between comparable normal and transformed cells. The most predominant cell-type-specific differences were in the abundance of the one IIICS- mRNA variant relative to the four IIICS+ variants. The percentage of O variant (IIICS-) mRNAs within the total fibronectin mRNA pool varied between 3 and 17% among tumor cells and between 7 and 46% among normal cells. The O variant composed 57% of the fibronectin mRNA in liver tissue, correlating with the previously described increased abundance of IIICS- polypeptide subunits in plasma fibronectin, compared with those in cellular fibronectins. Additional cell-type-specific changes among the expression levels of the four IIICS+ mRNA variants are consistent with a proposed model in which regulation of an alternative selection of a 3' splice site predominates over regulation of the selection of a 5' splice site in generating specific patterns of IIICS mRNA expression.


2019 ◽  
Author(s):  
Aleksandr Ianevski ◽  
Anil K Giri ◽  
Tero Aittokallio

Abstract: Single-cell transcriptomics enables systematic charting of the cellular composition of complex tissues. Identification of cell populations often relies on unsupervised clustering of cells based on the similarity of the scRNA-seq profiles, followed by manual annotation of cell clusters using established marker genes. However, manual selection of marker genes for cell-type annotation is a laborious and error-prone task since the selected markers must be specific both to the individual cell clusters and various cell types. Here, we developed a computational method, termed ScType, which enables data-driven selection of marker genes based solely on given scRNA-seq data. Using a compendium of 7 scRNA-seq datasets from various human and mouse tissues, we demonstrate how ScType enables unbiased, accurate and fully-automated single-cell type annotation by guaranteeing the specificity of marker genes both across cell clusters and cell types. The widely-applicable method is implemented as an interactive web-tool (https://sctype.fimm.fi), connected with a comprehensive database of specific markers.


2021 ◽  
Author(s):  
Fabio Sacher ◽  
Christian Feregrino ◽  
Patrick Tschopp ◽  
Collin Y. Ewald

Abstract: Transcriptomic signatures based on cellular mRNA expression profiles can be used to categorize cell types and states. Yet whether different functional groups of genes perform better or worse in this process remains largely unexplored. Here we test the core matrisome - that is, all genes coding for structural proteins of the extracellular matrix - for its ability to delineate distinct cell types in embryonic single-cell RNA-sequencing (scRNA-seq) data. We show that even though expressed core matrisome genes correspond to less than 2% of an entire cellular transcriptome, their RNA expression levels suffice to recapitulate important aspects of cell type-specific clustering. Notably, using scRNA-seq data from the embryonic limb, we demonstrate that core matrisome gene expression outperforms random gene subsets of similar sizes and can match and exceed the predictive power of transcription factors. While transcription factor signatures generally perform better in predicting cell types at early stages of chicken and mouse limb development, i.e., when cells are less differentiated, the information content of the core matrisome signature increases in more differentiated cells. Our findings suggest that each cell type produces its own unique extracellular matrix, or matreotype, which becomes progressively more refined and cell type-specific as embryonic tissues mature.
Highlights: Cell types produce unique extracellular matrix compositions. Dynamic extracellular matrix gene expression profiles hold predictive power for cell type and cell state identification.


2020 ◽  
Author(s):  
Abhinav Kaushik ◽  
Diane Dunham ◽  
Ziyuan He ◽  
Monali Manohar ◽  
Manisha Desai ◽  
...  

Abstract: For immune system monitoring in large-scale studies at the single-cell resolution using CyTOF, (semi-)automated computational methods are applied for annotating live cells of mixed cell types. Here, we show that the live cell pool can be highly enriched with undefined heterogeneous cells, i.e. ‘ungated’ cells, and that current (semi-)automated approaches do not model them, resulting in misclassified annotations. Therefore, we introduce ‘CyAnno’, a novel semi-automated approach for deconvoluting the unlabeled cytometry dataset based on a machine learning framework utilizing manually gated training data that allows the integrative modeling of ‘gated’ cell types and the ‘ungated’ cells. By applying this framework to several CyTOF datasets, we demonstrated that including the ‘ungated’ cells can lead to a significant increase in the prediction accuracy of the ‘gated’ cell types. CyAnno can be used to identify even a single cell type, including rare cells, with higher efficacy than current state-of-the-art semi-automated approaches.


Author(s):  
G. Rowden ◽  
M. G. Lewis ◽  
T. M. Phillips

Langerhans cells of mammalian stratified squamous epithelia have proven to be an enigma since their discovery in 1868. These dendritic suprabasal cells have been considered as related to melanocytes either as effete cells, or as post divisional products. Although grafting experiments seemed to demonstrate the independence of the cell types, much confusion still exists. The presence in the epidermis of a cell type with morphological features seemingly shared by melanocytes and Langerhans cells has been especially troublesome. This so called "indeterminate", or " -dendritic cell" lacks both Langerhans cell granules and melanosomes, yet it is clearly not a keratinocyte. Suggestions have been made that it is related to either Langerhans cells or melanocytes. Recent studies have unequivocally demonstrated that Langerhans cells are independent cells with immune function. They display Fc and C3 receptors on their surface as well as Ia (immune region associated) antigens.

