Big data and single cell transcriptomics: implications for ontological representation

2018 ◽  
Author(s):  
Brian D. Aevermann ◽  
Mark Novotny ◽  
Trygve Bakken ◽  
Jeremy A. Miller ◽  
Alexander D. Diehl ◽  
...  

Abstract
Cells are fundamental functional units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single cell transcriptional profiling using RNA sequencing is producing “big data”, enabling the identification of novel human cell types at an unprecedented rate. In this review, we summarize recent work characterizing cell types in the human central nervous and immune systems using single cell and single nuclei RNA sequencing, and discuss the implications that these discoveries are having on the representation of cell types in the reference Cell Ontology (CL). We propose a method based on random forest machine learning for identifying sets of necessary and sufficient marker genes that can be used to assemble consistent and reproducible cell type definitions for incorporation into the CL. The representation of defined cell type classes and their relationships in the CL using this strategy will make the cell type classes findable, accessible, interoperable, and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the role that distinct cellular phenotypes play in human health and disease.
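
The notion of a "necessary and sufficient" marker set can be made concrete with a small check. The sketch below is a hypothetical illustration, not the authors' random forest method: it assumes expression has already been binarized into a set of "expressed" genes per cell, treats a definition as "a cell is of type T if and only if it expresses every gene in M", and tests both directions of that rule against labelled cells. The function name, cell type labels, and gene names are placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: each cell is (cell_type_label, set_of_expressed_genes),
# where "expressed" is assumed to come from an upstream binarization step.
Cell = Tuple[str, Set[str]]


def check_marker_definition(cells: List[Cell], cell_type: str, markers: Set[str]) -> Dict[str, bool]:
    """Check whether expressing all `markers` is necessary and sufficient for `cell_type`.

    necessary:  every cell of `cell_type` expresses all markers
    sufficient: every cell that expresses all markers is of `cell_type`
    """
    in_type = [genes for label, genes in cells if label == cell_type]
    matching_labels = [label for label, genes in cells if markers <= genes]
    necessary = all(markers <= genes for genes in in_type)
    sufficient = all(label == cell_type for label in matching_labels)
    return {"necessary": necessary, "sufficient": sufficient}


cells = [
    ("type_A", {"GENE1", "GENE2", "GENE5"}),
    ("type_A", {"GENE1", "GENE2"}),
    ("type_B", {"GENE1", "GENE3"}),
    ("type_B", {"GENE2", "GENE3"}),
]
print(check_marker_definition(cells, "type_A", {"GENE1", "GENE2"}))
# {'necessary': True, 'sufficient': True}
print(check_marker_definition(cells, "type_A", {"GENE1"}))
# {'necessary': True, 'sufficient': False}
```

A marker set that passes both checks on reference data is the kind of barcode-like definition the abstract proposes for the CL; noisy real data would likely call for relaxing the two checks to precision and recall thresholds.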


2019 ◽  
Author(s):  
Feiyang Ma ◽  
Matteo Pellegrini

Abstract
Cell type identification is one of the major goals in single cell RNA sequencing (scRNA-seq). Current methods for assigning cell types typically involve unsupervised clustering, identification of signature genes in each cluster, and then a manual lookup of these genes in the literature and databases to assign cell types. However, these approaches have several limitations, such as unwanted sources of variation that influence clustering and a lack of canonical markers for certain cell types. Here, we present ACTINN (Automated Cell Type Identification using Neural Networks), which employs a neural network with 3 hidden layers, trains on datasets with predefined cell types, and predicts cell types for other datasets based on the trained parameters. We trained the neural network on a mouse cell type atlas (Tabula Muris Atlas) and a human immune cell dataset, and used it to predict cell types for mouse leukocytes, human PBMCs and human T cell subtypes. The results showed that our neural network is fast and accurate, and should therefore be a useful tool to complement existing scRNA-seq pipelines.
Author Summary
Single cell RNA sequencing (scRNA-seq) provides high resolution profiling of the transcriptomes of individual cells, which inevitably results in high volumes of data that require complex data processing pipelines. Usually, one of the first steps in the analysis of scRNA-seq data is to assign individual cells to known cell types. To accomplish this, traditional methods first group the cells into different clusters, then find marker genes, and finally use these to manually assign cell types to each cluster. These methods therefore require prior knowledge of canonical cell type markers and involve some subjectivity in making the cell type assignments. As a result, the process is often laborious and requires domain-specific expertise, which is a barrier for inexperienced users.
By contrast, our neural network ACTINN automatically learns the features of each predefined cell type and uses these features to predict cell types for individual cells. This approach is computationally efficient and requires no domain expertise about the tissues being studied. We believe ACTINN allows users to rapidly identify cell types in their datasets, making the analysis of their scRNA-seq datasets more efficient.
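
To illustrate the kind of architecture described above, here is a minimal forward pass through a feed-forward network with three hidden layers and a softmax output over cell types. It is a sketch under stated assumptions, not ACTINN's code: the layer sizes, ReLU activations, and random (untrained) weights are illustrative, and the training step that ACTINN performs on a labelled reference dataset is omitted.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: y_j = sum_i x_i * W[i][j] + b_j (W has one row per input)."""
    return [sum(xi * w_row[j] for xi, w_row in zip(x, weights)) + b for j, b in enumerate(bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Random (untrained) weights scaled by 1/sqrt(n_in), zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(x: Vector, layers) -> Vector:
    """Three ReLU hidden layers followed by a softmax output over cell types."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(h, weights, bias))


rng = random.Random(0)
sizes = [6, 8, 8, 4, 3]  # toy sizes: 6 genes -> hidden 8, 8, 4 -> 3 cell types
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
probs = forward([1.0, 0.0, 2.5, 0.3, 0.0, 1.2], layers)
predicted = max(range(len(probs)), key=probs.__getitem__)  # index of most likely cell type
print(len(probs), round(sum(probs), 6), predicted)
```

With trained weights, the argmax of the output probabilities would be the predicted cell type for each cell.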



2020 ◽  
Author(s):  
Brian Aevermann ◽  
Yun Zhang ◽  
Mark Novotny ◽  
Trygve Bakken ◽  
Jeremy Miller ◽  
...  

Abstract
Single cell genomics is rapidly advancing our knowledge of cell phenotypic types and states. Driven by single cell/nucleus RNA sequencing (scRNA-seq) data, comprehensive atlas projects covering a wide range of organisms and tissues are currently underway. As a result, it is critical that the cell transcriptional phenotypes discovered are defined and disseminated in a consistent and concise manner. Molecular biomarkers have historically played an important role in biological research, from defining immune cell types by surface protein expression to defining diseases by molecular drivers. Here we describe a machine learning-based marker gene selection algorithm, NS-Forest version 2.0, which leverages the non-linear attributes of random forest feature selection and a binary expression scoring approach to discover the minimal marker gene expression combinations that precisely capture the cell type identity represented in the complete scRNA-seq transcriptional profiles. The marker genes selected provide a barcode of the necessary and sufficient characteristics for semantic cell type definition and serve as useful tools for downstream biological investigation. The use of NS-Forest to identify marker genes for human brain middle temporal gyrus cell types reveals the importance of cell signaling and non-coding RNAs in neuronal cell type identity.
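
NS-Forest pairs random forest feature selection with a binary expression score that favours genes switched on in the target cluster and off elsewhere. The sketch below computes one such score, assuming the form "mean over other clusters j of max(0, 1 - median_j / median_target)"; treat that formula as an assumption rather than the published definition, and note that the random forest step is not shown. Gene and cluster names are hypothetical.

```python
from statistics import median
from typing import Dict, List

# Hypothetical input: expression[gene][cluster] -> per-cell expression values.
Expr = Dict[str, Dict[str, List[float]]]


def binary_score(expression: Expr, gene: str, target: str) -> float:
    """Score in [0, 1]; 1.0 when the gene's median is zero in every non-target cluster.

    Assumed form: mean over other clusters j of max(0, 1 - median_j / median_target).
    """
    medians = {cluster: median(values) for cluster, values in expression[gene].items()}
    m_target = medians[target]
    others = [m for cluster, m in medians.items() if cluster != target]
    if m_target <= 0 or not others:
        return 0.0  # a gene that is off in the target cluster cannot be a positive marker
    return sum(max(0.0, 1.0 - m / m_target) for m in others) / len(others)


def rank_markers(expression: Expr, target: str) -> List[str]:
    """Genes sorted from most to least 'on in target, off elsewhere'."""
    return sorted(expression, key=lambda g: binary_score(expression, g, target), reverse=True)


expression = {
    "GENE_X": {"c1": [5.0, 6.0, 7.0], "c2": [0.0, 0.0, 1.0], "c3": [0.0, 0.0, 0.0]},
    "GENE_Y": {"c1": [4.0, 4.0, 4.0], "c2": [2.0, 2.0, 2.0], "c3": [4.0, 5.0, 4.0]},
}
print(rank_markers(expression, "c1"))  # ['GENE_X', 'GENE_Y']
```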



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ann J. Ligocki ◽  
Wen Fury ◽  
Christian Gutierrez ◽  
Christina Adler ◽  
Tao Yang ◽  
...  

Abstract
Bulk RNA sequencing of a tissue captures the gene expression profile of all cell types combined. Single-cell RNA sequencing identifies discrete cell signatures based on transcriptomic identities. Six adult human corneas were processed for single-cell RNA-seq, and 16 cell clusters were identified bioinformatically. Based on their transcriptomic signatures and RNAscope results using representative cluster marker genes on human cornea cross-sections, these clusters were confirmed to be stromal keratocytes, endothelium, several subtypes of corneal epithelium, conjunctival epithelium, and supportive cells in the limbal stem cell niche. The complexity of the epithelial cell layer was captured by eight distinct corneal clusters and three conjunctival clusters. These were further characterized by enriched biological pathways and molecular characteristics, which revealed novel groupings related to development, function, and location within the epithelial layer. Moreover, epithelial subtypes were found to reflect their initial generation in the limbal region, differentiation, and migration through to mature epithelial cells. The single-cell map of the human cornea deepens knowledge of the cellular subsets of the cornea at a whole-genome transcriptional level. This information can be applied to better understand normal corneal biology, serve as a reference for understanding corneal disease pathology, and provide potential insights into therapeutic approaches.



2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract
Single-cell RNA-seq is a powerful tool for studying the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation based on supervised learning have been proposed. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
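
JIND's rejection step can be pictured as per-class confidence thresholds applied to a classifier's output probabilities. The sketch below is a hypothetical stand-in, not JIND's procedure: it sets each class's threshold to a low quantile of the top-class probability among correctly classified validation cells, then rejects a new cell (returns None) when its top probability falls below its predicted class's threshold. The quantile rule and the function names are assumptions.

```python
from typing import Dict, List, Optional, Sequence


def fit_thresholds(probs: List[Sequence[float]], labels: List[int], quantile: float = 0.05) -> Dict[int, float]:
    """Per-class threshold: a low quantile of confidence among correctly predicted validation cells."""
    by_class: Dict[int, List[float]] = {}
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        if pred == y:
            by_class.setdefault(pred, []).append(p[pred])
    thresholds = {}
    for c, confs in by_class.items():
        confs.sort()
        thresholds[c] = confs[int(quantile * (len(confs) - 1))]
    return thresholds


def predict_with_rejection(p: Sequence[float], thresholds: Dict[int, float]) -> Optional[int]:
    """Return the argmax class, or None if its probability is below that class's threshold."""
    pred = max(range(len(p)), key=p.__getitem__)
    # Classes never correctly predicted during validation get an infinite threshold (always rejected).
    return pred if p[pred] >= thresholds.get(pred, float("inf")) else None


val_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.45, 0.55]]
val_labels = [0, 0, 0, 1, 1, 1]
thresholds = fit_thresholds(val_probs, val_labels, quantile=0.0)
print(thresholds)  # {0: 0.6, 1: 0.55}
print(predict_with_rejection([0.55, 0.45], thresholds))  # None (0.55 < 0.6)
```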



PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254194
Author(s):  
Hong-Tae Park ◽  
Woo Bin Park ◽  
Suji Kim ◽  
Jong-Sung Lim ◽  
Gyoungju Nah ◽  
...  

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, a chronic and debilitating disease in ruminants. MAP is also considered a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells to MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses to MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, we discovered the Mreg cell type, a novel cell type differentiated from THP-1 cells. These results suggest that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows more detailed observation of each cell type’s response to infection.



2018 ◽  
Author(s):  
Douglas Abrams ◽  
Parveen Kumar ◽  
R. Krishna Murthy Karuturi ◽  
Joshy George

Abstract
Background: The advent of single cell RNA sequencing (scRNA-seq) has enabled researchers to study transcriptomic activity within individual cells and to identify the inherent cell types in a sample. Although numerous computational tools have been developed to analyze single cell transcriptomes, there are no published studies or analytical packages available to guide experimental design and to devise a suitable analysis procedure for cell type identification.
Results: We have developed an empirical methodology to address this important gap in single cell experimental design and analysis, and implemented it in an easy-to-use tool called SCEED (Single Cell Empirical Experimental Design and analysis). With SCEED, users can choose from a variety of tool combinations for analysis, conduct performance analysis of analytical procedures and choose the best procedure, and estimate the sample size (number of cells to be profiled) required for a given analytical procedure at varying levels of cell type rarity and other experimental parameters. Using SCEED, we examined 3 single cell algorithms using 48 simulated single cell datasets generated with varying numbers of cell types and their proportions, numbers of genes expressed per cell, numbers of marker genes and their fold changes, and numbers of single cells successfully profiled in the experiment.
Conclusions: Based on our study, we found that when marker genes are expressed at a fold change of 4 or more relative to the rest of the genes, either the Seurat or SIMLR algorithm can be used to analyze a single cell dataset for any number of isolated single cells (the smallest number tested was 1,000). However, when marker genes are expected to be upregulated by a fold change of only up to 2, the choice of single cell algorithm depends on the number of single cells isolated and the proportion of the rare cell type to be identified. In conclusion, our work allows the assessment of various single cell methods and also aids in examining single cell experimental design.
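
SCEED estimates sample size empirically from simulations. As a simpler, purely analytical point of comparison (not SCEED's method), one can ask how many cells must be profiled so that a rare type with frequency p is captured at least k times with a given probability, modelling the captured count as Binomial(n, p). This ignores dropout, doublets, and whether the clustering algorithm can then resolve the captured cells; the parameter values are illustrative.

```python
from math import comb


def prob_at_least(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of capturing at least k cells of the rare type."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_cells(p: float, k: int, confidence: float = 0.95, max_n: int = 1_000_000) -> int:
    """Smallest number of profiled cells n such that P(X >= k) >= confidence."""
    for n in range(k, max_n + 1):
        if prob_at_least(n, p, k) >= confidence:
            return n
    raise ValueError("confidence not reachable within max_n cells")


print(min_cells(0.01, 1))  # 299: a 1% population is seen at least once with 95% probability
print(min_cells(0.01, 10))  # cells needed to see at least 10 cells of a 1% population
```

Because P(X >= k) only grows with n, a linear scan is enough here; a binary search would be the natural change for very rare types.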



2018 ◽  
Author(s):  
Wennan Chang ◽  
Changlin Wan ◽  
Xiaoyu Lu ◽  
Szu-wei Tu ◽  
Yifan Sun ◽  
...  

Abstract
We developed a novel deconvolution method, Inference of Cell Types and Deconvolution (ICTD), that addresses the fundamental issues of identifiability and robustness in current tissue data deconvolution. ICTD provides substantially new capabilities for omics data-based characterization of a tissue microenvironment, including (1) maximizing the resolution in identifying the resident cell types and subtypes that truly exist in a tissue, (2) identifying the most reliable marker genes for each cell type, which are tissue- and data set-specific, (3) handling the stability problem with co-linear cell types, (4) co-deconvoluting with available matched multi-omics data, and (5) inferring functional variations specific to one or several cell types. ICTD is empowered by (i) rigorously derived mathematical conditions of identifiable cell types and cell type-specific functions in tissue transcriptomics data, (ii) a semi-supervised approach to maximize the knowledge transfer of cell type and functional marker genes identified in single cell or bulk cell data to the analysis of tissue data, and (iii) a novel unsupervised approach to minimize the bias brought by training data. Application of ICTD to real and single cell-simulated tissue data confirmed that the method performs consistently well on tissue data from different species, tissue microenvironments, and experimental platforms. Beyond these new capabilities, ICTD outperformed other state-of-the-art deconvolution methods in prediction accuracy, the resolution of identifiable cell types, detection of unknown cell subtypes, and assessment of cell type-specific functions. ICTD also shows promise for characterizing cell-cell interactions and discovering cell types and prognostic markers that are predictive of clinical outcomes.
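
Tissue deconvolution methods generally start from a linear mixing model in which bulk expression b is approximated by a signature matrix S (genes by cell types) times a vector x of non-negative cell type abundances. ICTD's own procedure (identifiability conditions, semi-supervised marker selection) is considerably more elaborate; the sketch below shows only this baseline model, solved as non-negative least squares by projected gradient descent. The signature values and mixing proportions are made up.

```python
from typing import List

Matrix = List[List[float]]


def nnls_projected_gradient(S: Matrix, b: List[float], iters: int = 5000) -> List[float]:
    """Minimise 0.5 * ||S x - b||^2 subject to x >= 0 by projected gradient descent.

    S: genes x cell types signature matrix; b: bulk expression per gene.
    """
    n_types = len(S[0])
    # 1 / ||S||_F^2 is a safe step size: ||S||_F^2 bounds the largest eigenvalue of S^T S.
    step = 1.0 / sum(v * v for row in S for v in row)
    x = [0.0] * n_types
    for _ in range(iters):
        residual = [sum(row[j] * x[j] for j in range(n_types)) - bi for row, bi in zip(S, b)]
        grad = [sum(S[g][j] * residual[g] for g in range(len(S))) for j in range(n_types)]  # S^T r
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, grad)]  # gradient step, then clip at 0
    return x


def proportions(x: List[float]) -> List[float]:
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Hypothetical signatures: gene 1 marks type A, gene 2 marks type B, gene 3 is shared.
S = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
true_mix = [0.3, 0.7]
b = [sum(row[j] * true_mix[j] for j in range(2)) for row in S]  # noise-free bulk profile
print([round(v, 4) for v in proportions(nnls_projected_gradient(S, b))])  # [0.3, 0.7]
```

Here the unconstrained least-squares solution is already non-negative, so the clipping never activates; with noisy or co-linear signatures, the problem ICTD targets, the constraint and the choice of marker genes matter much more.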



Cephalalgia ◽  
2018 ◽  
Vol 38 (13) ◽  
pp. 1976-1983 ◽  
Author(s):  
William Renthal

Background: Migraine is a debilitating disorder characterized by severe headaches and associated neurological symptoms. A key challenge to understanding migraine has been the cellular complexity of the human brain and the multiple cell types implicated in its pathophysiology. The present study leverages recent advances in single-cell transcriptomics to localize the specific human brain cell types in which putative migraine susceptibility genes are expressed.
Methods: The cell-type-specific expression of both familial and common migraine-associated genes was determined bioinformatically using data from 2,039 individual human brain cells across two published single-cell RNA sequencing datasets. Enrichment of migraine-associated genes was determined for each brain cell type.
Results: Analysis of single brain cell RNA sequencing data from five major subtypes of cells in the human cortex (neurons, oligodendrocytes, astrocytes, microglia, and endothelial cells) indicates that over 40% of known migraine-associated genes are enriched in the expression profile of a specific brain cell type. Further analysis of neuronal migraine-associated genes demonstrated that approximately 70% were significantly enriched in inhibitory neurons and 30% in excitatory neurons.
Conclusions: This study takes the next step in understanding the human brain cell types in which putative migraine susceptibility genes are expressed. Both familial and common migraine may arise from dysfunction of discrete cell types within the neurovascular unit, and localization of the affected cell type(s) in an individual patient may provide insight into their susceptibility to migraine.
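
The abstract does not state which enrichment statistic was used. One common choice for asking whether disease-associated genes are over-represented among a cell type's enriched genes is a one-sided hypergeometric test, sketched below with hypothetical gene counts.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, celltype_genes: int, query_genes: int, overlap: int) -> float:
    """One-sided P(overlap >= observed) when `query_genes` genes are drawn at random
    without replacement from `total_genes`, of which `celltype_genes` are cell-type-enriched."""
    upper = min(celltype_genes, query_genes)
    numerator = sum(
        comb(celltype_genes, k) * comb(total_genes - celltype_genes, query_genes - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total_genes, query_genes)


# Hypothetical: 20 genes in total, 5 enriched in a cell type, 4 disease genes, 3 of them overlap.
p = hypergeom_enrichment_p(total_genes=20, celltype_genes=5, query_genes=4, overlap=3)
print(p)  # equals 155 / 4845
```

Integer arithmetic keeps the tail sum exact until the final division; testing many cell types at once would also call for a multiple-testing correction.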



Author(s):  
Jun Cheng ◽  
Wenduo Gu ◽  
Ting Lan ◽  
Jiacheng Deng ◽  
Zhichao Ni ◽  
...  

Abstract
Aims: Hypertension is a major risk factor for cardiovascular diseases. However, vascular remodelling, a hallmark of hypertension, has not yet been systematically characterized. We described systematic vascular remodelling, especially artery type- and cell type-specific changes, in hypertension using spontaneously hypertensive rats (SHRs).
Methods and results: Single-cell RNA sequencing was used to depict the cell atlas of the mesenteric artery (MA) and aortic artery (AA) from SHRs. More than 20 000 cells were included in the analysis. The number of immune cells more than doubled in the aorta of SHRs compared with Wistar Kyoto controls, whereas an expansion of MA mesenchymal stromal cells (MSCs) was observed in SHRs. Comparison of corresponding artery types and cell types identified in integrated datasets revealed dysregulated genes specific to artery types and cell types. Intersection of dysregulated genes with curated gene sets, including cytokines, growth factors, extracellular matrix (ECM), and receptors, revealed vascular remodelling events involving cell–cell interaction and ECM reorganization. In particular, AA remodelling encompasses upregulated cytokine genes in smooth muscle cells, endothelial cells, and especially MSCs, whereas in MA, changes in genes involving the contractile machinery and downregulation of ECM-related genes were more prominent. Macrophages and T cells within the aorta demonstrated significant dysregulation of cellular interaction with vascular cells.
Conclusion: Our findings provide the first cell landscape of resistance and conductive arteries in hypertensive animal models. Moreover, they offer a systematic characterization of the dysregulated gene profiles in an unbiased, artery type-specific and cell type-specific manner during hypertensive vascular remodelling.
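
The intersection step described above (dysregulated genes per cell type against curated gene sets such as cytokines, ECM, and receptors) amounts to set intersection. The sketch uses placeholder gene identifiers and a hypothetical function name; it does not reproduce the authors' curated lists.

```python
from typing import Dict, Set


def annotate_dysregulated(
    dysregulated: Dict[str, Set[str]], curated: Dict[str, Set[str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """For each cell type, keep only the curated categories that overlap its dysregulated genes."""
    return {
        cell_type: {category: genes & members for category, members in curated.items() if genes & members}
        for cell_type, genes in dysregulated.items()
    }


# Placeholder gene identifiers; real analyses would use per-artery, per-cell-type results.
dysregulated = {"MSC": {"G1", "G2", "G3"}, "endothelial": {"G4"}}
curated = {"cytokine": {"G1", "G5"}, "ECM": {"G2", "G3"}, "receptor": {"G6"}}
print(annotate_dysregulated(dysregulated, curated))
```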



2019 ◽  
Vol 21 (5) ◽  
pp. 1581-1595 ◽  
Author(s):  
Xinlei Zhao ◽  
Shuang Wu ◽  
Nan Fang ◽  
Xiao Sun ◽  
Jue Fan

Abstract
Single-cell RNA sequencing (scRNA-seq) has been developing rapidly and is widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigation of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes does not scale to the increasingly large number of scRNA-seq data sets because of its complicated procedures and manual annotation. Therefore, a number of tools have recently been developed to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than the others. A simple ensemble vote across all tools can improve predictive accuracy. Under nonideal conditions, such as small and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the option of assigning ‘unassigned’ labels, it remains challenging to detect novel cell types using any single-cell classifier alone. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
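
The abstract reports that a simple ensemble vote across tools improves accuracy but does not specify the voting rule. The sketch below assumes plain per-cell majority voting, ignores "unassigned" (None) predictions, and returns None on ties; those choices and the cell type labels are illustrative.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(predictions: List[Optional[str]]) -> Optional[str]:
    """Most common label across tools; None ('unassigned') votes are ignored; ties give None."""
    votes = Counter(label for label in predictions if label is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # no clear winner
    return top_label


def ensemble(per_tool: List[List[Optional[str]]]) -> List[Optional[str]]:
    """per_tool[t][c] is tool t's label for cell c; returns one consensus label per cell."""
    return [majority_vote(list(cell_votes)) for cell_votes in zip(*per_tool)]


tool_a = ["T cell", "B cell", "NK cell"]
tool_b = ["T cell", "B cell", "T cell"]
tool_c = ["T cell", None, "B cell"]
print(ensemble([tool_a, tool_b, tool_c]))  # ['T cell', 'B cell', None]
```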


