Deciphering cell lineage specification of human lung adenocarcinoma with single-cell RNA sequencing

Abstract Lung adenocarcinomas (LUAD) start as precancerous lesions such as atypical adenomatous hyperplasia (AAH), develop stepwise into adenocarcinoma in situ (AIS) and minimally invasive adenocarcinoma (MIA), then eventually progress toward invasive adenocarcinoma (IA). To date, the cellular heterogeneity across these distinct clinical stages and the underlying molecular events driving tumor progression remain largely unclear. In this study, we performed single-cell RNA sequencing on 52 specimens from 25 patients spanning the four clinical stages. By assessing the expression pattern of marker genes among 268,471 cells, we identified 16 major cell types. We demonstrated that AT2 feature cell types (AT2-like cells) were associated with malignant composition. An AT2-like subcluster emerged first in AAH and partially lost AT2 cell transcriptional identity, accompanied by a gain of stemness during cell transition. In addition, genes related to energy metabolism and ribosome synthesis were upregulated in the early stage of LUAD, leading us to identify new markers including miRNA10 and β-hydroxybutyric acid to diagnose early-stage LUAD noninvasively in the blood. We also identified MDK and TIMP1 as potential biomarkers to facilitate our understanding of LUAD pathogenesis. Taken together, our data identified a new mechanism in LUAD evolution, and provided a robust basis for diagnosis and treatment of LUAD.


Deciphering cell lineage specification of human lung adenocarcinoma with single-cell RNA sequencing

Nature Communications ◽

10.1038/s41467-021-26770-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Zhoufeng Wang ◽

Zhe Li ◽

Kun Zhou ◽

Chengdi Wang ◽

Lili Jiang ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Precancerous Lesions ◽

Cellular Heterogeneity ◽

Alveolar Type ◽

Adenocarcinoma In Situ ◽

Atypical Adenomatous Hyperplasia ◽

Adenomatous Hyperplasia ◽

Invasive Adenocarcinoma ◽

Single Cell RNA Sequencing

Abstract Lung adenocarcinomas (LUAD) arise from precancerous lesions such as atypical adenomatous hyperplasia, which progress into adenocarcinoma in situ and minimally invasive adenocarcinoma, then finally into invasive adenocarcinoma. The cellular heterogeneity and molecular events underlying this stepwise progression remain unclear. In this study, we perform single-cell RNA sequencing of 268,471 cells collected from 25 patients in four histologic stages of LUAD and compare them to normal cell types. We detect a group of cells closely resembling alveolar type 2 cells (AT2) that emerged during atypical adenomatous hyperplasia and whose transcriptional profile began to diverge from that of AT2 cells as LUAD progressed, taking on features characteristic of stem-like cells. We identify genes related to energy metabolism and ribosome synthesis that are upregulated in early stages of LUAD and may promote progression. MDK and TIMP1 could be potential biomarkers for understanding LUAD pathogenesis. Our work sheds light on the underlying transcriptional signatures of distinct histologic stages of LUAD progression and our findings may facilitate early diagnosis.


Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell RNA Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
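The "simple ensemble voting" mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tool names, and the rule that ties become "unassigned" are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions, unassigned="unassigned"):
    """Combine the labels that several tools gave to ONE cell.

    Ties between the top labels are returned as `unassigned`
    (an illustrative choice, not taken from the paper).
    """
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return unassigned
    return counts[0][0]


def ensemble_labels(per_tool_labels):
    """per_tool_labels: dict of tool name -> list of labels, one per cell."""
    tools = list(per_tool_labels)
    n_cells = len(per_tool_labels[tools[0]])
    return [
        majority_vote([per_tool_labels[tool][i] for tool in tools])
        for i in range(n_cells)
    ]


# Hypothetical predictions from three placeholder tools for three cells.
calls = {
    "tool_a": ["T cell", "B cell", "NK cell"],
    "tool_b": ["T cell", "B cell", "T cell"],
    "tool_c": ["T cell", "monocyte", "B cell"],
}
consensus = ensemble_labels(calls)
print(consensus)
```

In this toy input the third cell receives three different labels, so the vote falls back to "unassigned" rather than picking one arbitrarily.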


Information Theoretic Feature Selection Methods for Single Cell RNA-Sequencing

10.1101/646919 ◽

2019 ◽

Author(s):

Umang Varma ◽

Justin Colacino ◽

Anna Gilbert

Keyword(s):

Feature Selection ◽

Single Cell ◽

RNA Sequencing ◽

Complex Mixture ◽

Cell Types ◽

Marker Genes ◽

Selection Methods ◽

Information Theoretic ◽

Single Cell RNA Sequencing ◽

Information Theoretic Methods

Abstract Single cell RNA-sequencing (scRNA-seq) technologies have generated an expansive amount of new biological information, revealing new cellular populations and hierarchical relationships. A number of technologies complementary to scRNA-seq rely on the selection of a smaller number of marker genes (or features) to accurately differentiate cell types within a complex mixture of cells. In this paper, we benchmark differential expression methods against information-theoretic feature selection methods to evaluate the ability of these algorithms to identify small and efficient sets of genes that are informative about cell types. Unlike differential expression methods, which are strictly binary and univariate, information-theoretic methods can be used in any combination of binary or multiclass and univariate or multivariate. We show that, for some datasets, information-theoretic methods can reveal genes that are both distinct from those selected by traditional algorithms and as informative about the class labels, if not more so. We also present detailed and principled theoretical analyses of these algorithms. All information-theoretic methods in this paper are implemented in our PicturedRocks Python package, which is compatible with the widely used scanpy package.
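To make the idea of information-theoretic feature selection concrete, here is a minimal sketch that ranks genes by their empirical mutual information with the cell-type labels. It is a univariate toy and does not use or reproduce the PicturedRocks API. The function names, the expressed/not-expressed binarization, and the example data are all assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_genes(expression, labels, threshold=0.0):
    """Rank genes by mutual information between 'expressed' status and labels.

    expression: dict of gene name -> list of counts, one per cell.
    Binarizing at `threshold` is an illustrative simplification.
    """
    scores = {
        gene: mutual_information([v > threshold for v in values], labels)
        for gene, values in expression.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked, scores


# Hypothetical toy data: four cells, two classes, two genes.
labels = ["A", "A", "B", "B"]
expression = {
    "gene_uninformative": [1, 0, 1, 0],
    "gene_marker": [5, 3, 0, 0],
}
ranked, scores = rank_genes(expression, labels)
print(ranked, scores)
```

Because mutual information is defined for any discrete label set, the same function works unchanged for more than two classes, which is the multiclass property the abstract contrasts with binary differential expression tests.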


Single-cell RNA sequencing reveals heterogeneous tumor and immune cell populations in early-stage lung adenocarcinomas harboring EGFR mutations

Oncogene ◽

10.1038/s41388-020-01528-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Di He ◽

Di Wang ◽

Ping Lu ◽

Nan Yang ◽

Zhigang Xue ◽

...

Keyword(s):

Gene Expression ◽

T Cells ◽

Single Cell ◽

Tumor Cells ◽

RNA Sequencing ◽

Early Stage ◽

Cell Types ◽

EGFR Mutations ◽

Advanced Tumor ◽

Single Cell RNA Sequencing

Abstract Lung adenocarcinoma (LUAD) harboring EGFR mutations is prevalent in Asian populations. However, the inter-patient and intra-tumor heterogeneity has not been addressed at single-cell resolution. Here we performed single-cell RNA sequencing (scRNA-seq) of a total of 125,674 cells from seven stage-I/II LUAD samples harboring EGFR mutations and five tumor-adjacent lung tissues. We identified diverse cell types within the tumor microenvironment (TME) in which myeloid cells and T cells were the most abundant stromal cell types in tumors and adjacent lung tissues. Within tumors, accompanied by an increase in CD1C+ dendritic cells, the tumor-associated macrophages (TAMs) showed pro-tumoral functions without signature gene expression of defined M1 or M2 polarization. Tumor-infiltrating T cells mainly displayed exhausted and regulatory T-cell features. The adenocarcinoma cells can be categorized into different subtypes based on their gene expression signatures in distinct pathways such as hypoxia, glycolysis, cell metabolism, translation initiation, cell cycle, and antigen presentation. By performing pseudotime trajectory analysis, we found that ELF3 was among the most upregulated genes in more advanced tumor cells. In response to secretion of inflammatory cytokines (e.g., IL1B) from immune infiltrates, ELF3 in tumor cells was upregulated to trigger activation of the PI3K/Akt/NF-κB pathway and elevated expression of proliferation and anti-apoptosis genes such as BCL2L1 and CCND1. Taken together, our study revealed substantial heterogeneity within early-stage LUAD harboring EGFR mutations, implicating complex interactions among tumor cells, stromal cells and immune infiltrates in the TME.


Identification of silkworm hemocyte subsets and analysis of their response to BmNPV infection based on single-cell RNA sequencing

10.1101/2020.10.18.344127 ◽

2020 ◽

Author(s):

Min Feng ◽

Junming Xia ◽

Shigang Fei ◽

Xiong Wang ◽

Yaohong Zhou ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

RNA Sequencing ◽

Current Knowledge ◽

Early Stage ◽

Expression Profiles ◽

Marker Genes ◽

Potential Marker ◽

Wide Range ◽

Single Cell RNA Sequencing

Abstract A wide range of hemocyte types exist in insects but a full definition of the different subclasses is not yet established. The current knowledge of the classification of silkworm hemocytes mainly comes from morphology rather than specific markers, so our understanding of the detailed classification, hemocyte lineage and functions of silkworm hemocytes is very incomplete. Bombyx mori nucleopolyhedrovirus (BmNPV) is a representative member of the baculoviruses, which are major pathogens that specifically infect silkworms and cause serious losses in the sericulture industry. Here, we performed single-cell RNA sequencing (scRNA-seq) of silkworm hemocytes in BmNPV-infected and mock-infected larvae to comprehensively identify silkworm hemocyte subsets and determined specific molecular and cellular characteristics in each hemocyte subset before and after viral infection. A total of 19 cell clusters and their potential marker genes were identified in silkworm hemocytes. Among these hemocyte clusters, clusters 0, 1, 2, 5 and 9 might be granulocytes (GR); clusters 14 and 17 were predicted as plasmatocytes (PL); cluster 18 was tentatively identified as spherulocytes (SP); and clusters 7 and 11 could possibly correspond to oenocytoids (OE). In addition, all of the hemocyte clusters were infected by BmNPV and some infected cells carried a high viral load in silkworm larvae at 3 days post infection (dpi). Interestingly, BmNPV infection can cause severe and diverse changes in gene expression in hemocytes. Cells belonging to the infection group were mainly located at the early stage of the pseudotime trajectories. Furthermore, we found that BmNPV infection suppresses the immune response in the major hemocyte types. In summary, our scRNA-seq analysis revealed the diversity of silkworm hemocytes and provided a rich resource of gene expression profiles for a systems-level understanding of their functions in the uninfected condition and as a response to BmNPV.


Automated identification of Cell Types in Single Cell RNA Sequencing

10.1101/532093 ◽

2019 ◽

Cited By ~ 3

Author(s):

Feiyang Ma ◽

Matteo Pellegrini

Keyword(s):

Neural Network ◽

Single Cell ◽

RNA Sequencing ◽

Immune Cell ◽

Cell Types ◽

Marker Genes ◽

Complex Data ◽

Cell Type ◽

Human T Cell ◽

Single Cell RNA Sequencing

Abstract Cell type identification is one of the major goals in single cell RNA sequencing (scRNA-seq). Current methods for assigning cell types typically involve the use of unsupervised clustering, the identification of signature genes in each cluster, followed by a manual lookup of these genes in the literature and databases to assign cell types. However, there are several limitations associated with these approaches, such as unwanted sources of variation that influence clustering and a lack of canonical markers for certain cell types. Here, we present ACTINN (Automated Cell Type Identification using Neural Networks), which employs a neural network with 3 hidden layers, trains on datasets with predefined cell types, and predicts cell types for other datasets based on the trained parameters. We trained the neural network on a mouse cell type atlas (Tabula Muris Atlas) and a human immune cell dataset, and used it to predict cell types for mouse leukocytes, human PBMCs and human T cell sub types. The results showed that our neural network is fast and accurate, and should therefore be a useful tool to complement existing scRNA-seq pipelines.

Author Summary Single cell RNA sequencing (scRNA-seq) provides high resolution profiling of the transcriptomes of individual cells, which inevitably results in high volumes of data that require complex data processing pipelines. Usually, one of the first steps in the analysis of scRNA-seq is to assign individual cells to known cell types. To accomplish this, traditional methods first group the cells into different clusters, then find marker genes, and finally use these to manually assign cell types for each cluster. Thus these methods require prior knowledge of cell type canonical markers, and some level of subjectivity to make the cell type assignments. As a result, the process is often laborious and requires domain specific expertise, which is a barrier for inexperienced users. By contrast, our neural network ACTINN automatically learns the features for each predefined cell type and uses these features to predict cell types for individual cells. This approach is computationally efficient and requires no domain expertise of the tissues being studied. We believe ACTINN allows users to rapidly identify cell types in their datasets, thus rendering the analysis of their scRNA-seq datasets more efficient.


Defining the variety of cell types in developing and adult human kidneys by single-cell RNA sequencing

npj Regenerative Medicine ◽

10.1038/s41536-021-00156-w ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

A. Schumacher ◽

M. B. Rookmaaker ◽

J. A. Joles ◽

R. Kramann ◽

T. Q. Nguyen ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Renal Cell ◽

Early Stage ◽

Kidney Tissue ◽

Cell Types ◽

Intermediate Cell ◽

Adult Human ◽

Single Cell RNA Sequencing

Abstract The kidney is among the most complex organs in terms of the variety of cell types. The cellular complexity of human kidneys is not fully unraveled and this challenge is further complicated by the existence of multiple progenitor pools and differentiation pathways. Researchers disagree on the variety of renal cell types due to a lack of research providing a comprehensive picture and the challenge to translate findings between species. To find an answer to the number of human renal cell types, we discuss research that used single-cell RNA sequencing on developing and adult human kidney tissue and compare these findings to the literature of the pre-single-cell RNA sequencing era. We find that these publications show major steps towards the discovery of novel cell types and intermediate cell stages as well as complex molecular signatures and lineage pathways throughout development. The variety of cell types remains variable in the single-cell literature, which is due to the limitations of the technique. Nevertheless, our analysis approaches an accumulated number of 41 identified cell populations of renal lineage and 32 of non-renal lineage in the adult kidney, and there is certainly much more to discover. There is still a need for a consensus on a variety of definitions and standards in single-cell RNA sequencing research, such as the definition of what is a cell type. Nevertheless, this early-stage research already proves to be of significant impact for both clinical and regenerative medicine, and shows potential to enhance the generation of sophisticated in vitro kidney tissue.


Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genomes|Genetics ◽

10.1093/g3journal/jkab098 ◽

2021 ◽

Author(s):

Yinlei Hu ◽

Bin Li ◽

Falai Chen ◽

Kun Qu

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Matrix Factorization ◽

Data Clustering ◽

Cell Types ◽

Low Rank ◽

Sequencing Data ◽

Rank Matrix ◽

Single Cell RNA Sequencing ◽

Low Rank Matrix

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.
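As a rough illustration of how a low-rank factorization can drive clustering, the sketch below computes a rank-1 approximation of a centered cells-by-genes matrix by power iteration and splits the cells by the sign of their score on the leading component. This is not the scSO algorithm: it has no sparse optimization step and does not estimate the number of clusters. The function names and the toy matrix are assumptions.

```python
import math


def leading_scores(matrix, n_iter=50):
    """Score each row (cell) on the leading component of the column-centered matrix.

    Uses power iteration on X^T X, i.e. a rank-1 factorization of X.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in matrix]

    # Deterministic, non-uniform starting vector (illustrative choice).
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(n_iter):
        scores = [sum(row[j] * v[j] for j in range(n_cols)) for row in centered]
        v = [
            sum(centered[i][j] * scores[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return [sum(row[j] * v[j] for j in range(n_cols)) for row in centered]


def two_way_split(matrix):
    """Assign each cell to cluster 1 or 0 by the sign of its leading score."""
    return [1 if score > 0 else 0 for score in leading_scores(matrix)]


# Hypothetical toy data: rows are cells, columns are two genes.
cells = [[5, 0], [4, 1], [0, 5], [1, 4]]
labels = two_way_split(cells)
print(labels)
```

The cluster numbering depends on the arbitrary sign of the leading component, so only the grouping of cells is meaningful, not which group is called 0 or 1.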


Molecular characteristics and spatial distribution of adult human corneal cell subtypes

Scientific Reports ◽

10.1038/s41598-021-94933-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ann J. Ligocki ◽

Wen Fury ◽

Christian Gutierrez ◽

Christina Adler ◽

Tao Yang ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Cross Sections ◽

Cell Types ◽

Marker Genes ◽

Molecular Characteristics ◽

Transcriptional Level ◽

Human Cornea ◽

Adult Human ◽

And Migration

Abstract Bulk RNA sequencing of a tissue captures the gene expression profile from all cell types combined. Single-cell RNA sequencing identifies discrete cell-signatures based on transcriptomic identities. Six adult human corneas were processed for single-cell RNAseq and 16 cell clusters were bioinformatically identified. Based on their transcriptomic signatures and RNAscope results using representative cluster marker genes on human cornea cross-sections, these clusters were confirmed to be stromal keratocytes, endothelium, several subtypes of corneal epithelium, conjunctival epithelium, and supportive cells in the limbal stem cell niche. The complexity of the epithelial cell layer was captured by eight distinct corneal clusters and three conjunctival clusters. These were further characterized by enriched biological pathways and molecular characteristics which revealed novel groupings related to development, function, and location within the epithelial layer. Moreover, epithelial subtypes were found to reflect their initial generation in the limbal region, differentiation, and migration through to mature epithelial cells. The single-cell map of the human cornea deepens the knowledge of the cellular subsets of the cornea on a whole-genome transcriptional level. This information can be applied to better understand normal corneal biology, serve as a reference to understand corneal disease pathology, and provide potential insights into therapeutic approaches.
