SurfaceGenie: a web-based application for prioritizing cell-type-specific marker candidates

2020 ◽  
Vol 36 (11) ◽  
pp. 3447-3456 ◽  
Author(s):  
Matthew Waas ◽  
Shana T Snarrenberg ◽  
Jack Littrell ◽  
Rachel A Jones Lipinski ◽  
Polly A Hansen ◽  
...  

Abstract Motivation Cell-type-specific surface proteins can be exploited as valuable markers for a range of applications including immunophenotyping live cells, targeted drug delivery and in vivo imaging. Despite their utility and relevance, the unique combination of molecules present at the cell surface is not yet described for most cell types. A significant challenge in analyzing ‘omic’ discovery datasets is the selection of candidate markers that are most applicable for downstream applications. Results Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input data to rank-order candidate cell-type-specific surface markers. In this report, we demonstrate the utility of GenieScore for analyzing human and rodent data from proteomic and transcriptomic experiments in the areas of cancer, stem cell and islet biology. We also demonstrate that permutations of GenieScore, termed IsoGenieScore and OmniGenieScore, can efficiently prioritize co-expressed and intracellular cell-type-specific markers, respectively. Availability and implementation Calculation of GenieScores and lookup of SPC scores are made freely accessible via the SurfaceGenie web application: www.cellsurfer.net/surfacegenie. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Author(s):  
Matthew Waas ◽  
Shana T. Snarrenberg ◽  
Jack Littrell ◽  
Rachel A. Jones Lipinski ◽  
Polly A. Hansen ◽  
...  

Abstract Motivation Cell-type specific surface proteins can be exploited as valuable markers for a range of applications including immunophenotyping live cells, targeted drug delivery, and in vivo imaging. Despite their utility and relevance, the unique combination of molecules present at the cell surface is not yet described for most cell types. A significant challenge in analyzing ‘omic’ discovery datasets is the selection of candidate markers that are most applicable for downstream applications. Results Here, we developed GenieScore, a prioritization metric that integrates a consensus-based prediction of cell surface localization with user-input data to rank-order candidate cell-type specific surface markers. In this report, we demonstrate the utility of GenieScore for analyzing human and rodent data from proteomic and transcriptomic experiments in the areas of cancer, stem cell, and islet biology. We also demonstrate that permutations of GenieScore, termed IsoGenieScore and OmniGenieScore, can efficiently prioritize co-expressed and intracellular cell-type specific markers, respectively. Availability and Implementation Calculation of GenieScores and lookup of SPC scores are made freely accessible via the SurfaceGenie web-application: www.cellsurfer.net/surfacegenie.


2016 ◽  
Vol 83 (5) ◽  
Author(s):  
Rachana Gyawali ◽  
Srijana Upadhyay ◽  
Joshua Way ◽  
Xiaorong Lin

ABSTRACT Cryptococcus neoformans, an opportunistic human fungal pathogen, can undergo a yeast-to-hypha transition in response to environmental cues. This morphological transition is associated with changes in the expression of cell surface proteins. The Cryptococcus cell surface and secreted protein Cfl1 was the first identified adhesin in the Basidiomycota. Cfl1 has been shown to regulate morphology, biofilm formation, and intercellular communication. Four additional homologs of CFL1 are harbored by the Cryptococcus genome: DHA1, DHA2, CPL1, and CFL105. The common features of this gene family are the conserved C-terminal SIGC domain and the presence of an N-terminal signal peptide. We found that all these Cfl1 homolog proteins are indeed secreted extracellularly. Interestingly, some of these secretory proteins display cell type-specific expression patterns: Cfl1 is hypha specific, Dha2 is yeast specific, and Dha1 (delayed hypersensitivity antigen 1) is expressed in all cell types but is particularly enriched at basidia. Interestingly, Dha1 is induced by copper limitation and suppressed by excessive copper in the medium. This study further attests to the physiological heterogeneity of the Cryptococcus mating colony, which is composed of cells with heterogeneous morphotypes. The differential expression of these secretory proteins contributes to heterogeneity, which is beneficial for the fungus to adapt to changing environments. IMPORTANCE Heterogeneity in physiology and morphology is an important bet-hedging strategy for nonmobile microbes such as fungi to adapt to unpredictable environmental changes. Cryptococcus neoformans, a ubiquitous basidiomycetous fungus, is known to switch from the yeast form to the hypha form during sexual development. However, in a mating colony, only a subset of yeast cells switch to hyphae, and only a fraction of the hyphal subpopulation will develop into fruiting bodies, where meiosis and sporulation occur. 
Here, we investigated a basidiomycete-specific secretory protein family. We found that some of these proteins are cell type specific, thus contributing to the heterogeneity of a mating colony. Our study also demonstrates the importance of examining the protein expression pattern at the individual-cell level in addition to population gene expression profiling for the investigation of a heterogeneous community.


Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

Abstract Motivation Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information Supplementary data are available at Bioinformatics online.
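The reasoning in this abstract, that marker genes of one cell type are correlated in bulk data because their expression tracks that cell type's proportion, can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the markerpen algorithm: the proportions, expression levels, noise model, and the names `simulate_bulk` and `pearson` are all hypothetical choices made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_bulk(n_samples=200, seed=0):
    """Simulate three genes across bulk samples (all numbers are assumptions)."""
    rng = random.Random(seed)
    # Hypothetical proportion of one cell type in each bulk sample.
    prop = [rng.uniform(0.1, 0.6) for _ in range(n_samples)]
    # Two marker genes: bulk expression scales with the cell-type proportion.
    marker_a = [10.0 * p + rng.gauss(0, 0.3) for p in prop]
    marker_b = [6.0 * p + rng.gauss(0, 0.3) for p in prop]
    # A non-marker gene: expression does not depend on the proportion.
    other = [3.0 + rng.gauss(0, 0.3) for _ in range(n_samples)]
    return marker_a, marker_b, other


marker_a, marker_b, other = simulate_bulk()
r_markers = pearson(marker_a, marker_b)
r_mixed = pearson(marker_a, other)
print(f"marker vs marker:     {r_markers:.2f}")
print(f"marker vs non-marker: {r_mixed:.2f}")
```

In this toy setting the two marker genes are strongly correlated across samples while the marker and non-marker genes are not, which is the kind of correlation pattern the abstract describes exploiting to add or remove genes from a published marker list.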


1989 ◽  
Vol 92 (2) ◽  
pp. 231-239
Author(s):  
P.I. Francz ◽  
K. Bayreuther ◽  
H.P. Rodemann

Methods for the selective enrichment of various subpopulations of the human skin fibroblast cell line HH-8 have been developed. These methods permit the selection of homogeneous populations of the three mitotic fibroblast cell types MF I, II and III, and the four postmitotic cell types PMF IV, V, VI and VII. These seven cell types exhibit differentiation-dependent and cell-type-specific patterns of [35S]methionine-labelled polypeptides in total soluble cytoplasmic and nuclear proteins, in membrane-bound proteins, and in secreted proteins. In the differentiation sequence MF II-MF III-PMF IV-PMF V-PMF VI, 14 cell-type-specific marker proteins have been found in the cytoplasmic and nuclear fraction, 24 in the membrane-bound protein fraction, and 11 in the secreted protein fraction. Markers in spontaneously arising and experimentally selected or induced populations of a single fibroblast cell type were found to be identical.


1986 ◽  
Vol 6 (9) ◽  
pp. 3240-3245
Author(s):  
G A Bannon ◽  
R Perkins-Dameron ◽  
A Allen-Nash

The presence of specific proteins (known as immobilization antigens) on the surface of the ciliated protozoan Tetrahymena thermophila is under environmental regulation. There are five different classes (serotypes) of surface proteins which appear on the cell surface when T. thermophila is cultured under different conditions of temperature or incubation medium; three of these are temperature dependent. The appearance of these proteins on the cell surface is mutually exclusive. We used polyclonal antibodies raised against 30°C (designated SerH3)- and 40°C (designated SerT)-specific surface antigens to study their structure and expression. We showed that these surface proteins contain at least one disulfide bridge. On sodium dodecyl sulfate-denaturing polyacrylamide gels, the nonreduced 30°C- and 40°C-specific surface proteins migrated with molecular sizes of 69 and 36 kilodaltons, respectively. The reduced forms of the proteins migrated with molecular sizes of 58 and 30 kilodaltons, respectively. The synthesis of the surface proteins responded rapidly and with a time course similar to that of the incubation temperature. The synthesis of each surface protein was greatly reduced within 1 h and undetectable by 2 h after a shift to the temperature at which the protein is not expressed. Surface protein synthesis resumed by the end of 1 h after a shift to the temperature at which the protein is expressed. The temperature-dependent induction of these surface proteins appears to be dependent on the synthesis of new mRNA, as indicated by a sensitivity to actinomycin D. Surface protein syntheses were mutually exclusive except at a transition temperature. At 35°C both surface proteins were synthesized by a cell population. These data support the potential of this system as a model for the study of the effects of environmental factors on the genetic regulation of cell surface proteins.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
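The rejection step mentioned in this abstract, cell-type-specific confidence thresholds below which a cell is left unclassified, can be sketched generically as follows. This is not JIND's implementation or API: the function `classify_with_rejection`, the label `"Unassigned"`, the cell-type names, and the threshold values are hypothetical, and the thresholds here are set by hand, whereas the abstract states that JIND learns them.

```python
def classify_with_rejection(probs, thresholds, reject_label="Unassigned"):
    """Return the most probable cell type, or reject_label if it is not confident.

    probs:      dict mapping cell type -> predicted probability for one cell
    thresholds: dict mapping cell type -> minimum probability needed to accept
    """
    best = max(probs, key=probs.get)
    if probs[best] >= thresholds[best]:
        return best
    return reject_label


# Hand-picked, per-cell-type thresholds (an assumption for this sketch).
thresholds = {"T cell": 0.6, "B cell": 0.6, "Monocyte": 0.8}

# Hypothetical classifier outputs for three cells.
cells = [
    {"T cell": 0.90, "B cell": 0.05, "Monocyte": 0.05},
    {"T cell": 0.45, "B cell": 0.40, "Monocyte": 0.15},
    {"T cell": 0.10, "B cell": 0.15, "Monocyte": 0.75},
]

labels = [classify_with_rejection(p, thresholds) for p in cells]
print(labels)
```

Using a separate threshold per cell type lets a stricter cutoff be applied to types that are easily confused with others, at the cost of rejecting more cells of that type.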


2019 ◽  
Author(s):  
Matthew N. Bernstein ◽  
Zhongjie Ma ◽  
Michael Gleicher ◽  
Colin N. Dewey

Summary Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieves superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology. Highlights We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology. CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive. CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set. CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology.


2019 ◽  
Vol 36 (3) ◽  
pp. 782-788 ◽  
Author(s):  
Jiebiao Wang ◽  
Bernie Devlin ◽  
Kathryn Roeder

Abstract Motivation Patterns of gene expression, quantified at the level of tissue or cells, can inform on etiology of disease. There are now rich resources for tissue-level (bulk) gene expression data, which have been collected from thousands of subjects, and resources involving single-cell RNA-sequencing (scRNA-seq) data are expanding rapidly. The latter yields cell type information, although the data can be noisy and typically are derived from a small number of subjects. Results Complementing these approaches, we develop a method to estimate subject- and cell-type-specific (CTS) gene expression from tissue using an empirical Bayes method that borrows information across multiple measurements of the same tissue per subject (e.g. multiple regions of the brain). Analyzing expression data from multiple brain regions from the Genotype-Tissue Expression project (GTEx) reveals CTS expression, which then permits downstream analyses, such as identification of CTS expression Quantitative Trait Loci (eQTL). Availability and implementation We implement this method as an R package MIND, hosted on https://github.com/randel/MIND. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (22) ◽  
pp. 4767-4769 ◽  
Author(s):  
Charles E Breeze ◽  
Alex P Reynolds ◽  
Jenny van Dongen ◽  
Ian Dunham ◽  
John Lazar ◽  
...  

Abstract Summary The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing. Availability and implementation eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/. Supplementary information Supplementary data are available at Bioinformatics online.

