Artificial Neural Networks for classification of single cell gene expression

Background: Single-cell transcriptome (SCT) sequencing has become a high-throughput technology in which gene expression can be measured concurrently in large numbers of cells. The results of gene expression studies are highly reproducible when strict protocols and standard operating procedures (SOP) are followed. However, differences in sample processing conditions result in significant changes in gene expression profiles, making direct comparison of different studies difficult. Unsupervised machine learning (ML) uses clustering algorithms combined with semi-automated cell labeling and manual annotation of individual cells. These methods do not scale well, and a workflow used on a specific dataset will not perform well with other studies. Supervised ML classification shows superior classification accuracy and generalization properties compared with unsupervised ML methods. We describe a supervised ML method that deploys artificial neural networks (ANN) for 5-class classification of healthy peripheral blood mononuclear cells (PBMC) from multiple diverse studies. Results: We used 58 data sets to train the ANN incrementally, over ten cycles of training and testing. The sample processing involved four protocols: separation of PBMC, separation of PBMC + enrichment (by negative selection), separation of PBMC + FACS, and separation of PBMC + MACS. The training data set included between 85 and 110 thousand cells, and the test set had approximately 13 thousand cells. Training and testing were done with various combinations of data sets from four principal data sources. The overall 5-class classification accuracy on independent data sets reached 94%. Classification accuracy for B cells, monocytes, and T cells exceeded 95%. Classification accuracy for natural killer (NK) cells was 75% because of the similarity between NK cells and T cell subsets. Classification accuracy for dendritic cells (DC) was low due to the very low numbers of DC in the training sets. Conclusions: The incremental learning ANN model can accurately classify the main types of PBMC. With the inclusion of more DC and the resolution of ambiguities between T cell and NK cell gene expression profiles, we will enable high-accuracy supervised ML classification of PBMC. We assembled a reference data set for healthy PBMC and demonstrated a proof of concept for a supervised ANN method in classifying previously unseen SCT data. The classification shows high accuracy that is consistent across different studies and sample processing methods.
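The abstract reports one overall accuracy alongside separate accuracies per cell type. A minimal Python sketch of how the two can be computed from true and predicted labels is given here. It is an illustration only, not the authors' code: the function name `per_class_accuracy` and the toy labels are hypothetical, and "per-class accuracy" is assumed to mean the fraction of cells of a given true type that received the correct label.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Return (overall accuracy, {class: fraction of that class labeled correctly})."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(y_true)  # number of cells per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct calls per class
    per_class = {c: correct[c] / totals[c] for c in totals}
    overall = sum(correct.values()) / len(y_true)
    return overall, per_class

# Hypothetical toy labels (not data from the study): one NK cell is called a T cell.
y_true = ["T"] * 6 + ["B"] * 2 + ["NK"] * 2
y_pred = ["T"] * 6 + ["B"] * 2 + ["T", "NK"]
overall, per_class = per_class_accuracy(y_true, y_pred)
print(overall, per_class)
```

In this toy case the overall accuracy stays high while the small NK class scores much lower, which is the pattern the abstract describes for the rarer or more ambiguous cell types.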

The t(14;18) defines a unique subset of diffuse large B-cell lymphoma with a germinal center B-cell gene expression profile

Blood, 2002, Vol 99 (7), pp. 2285-2290. DOI: 10.1182/blood.v99.7.2285. Cited By ~ 202

Author(s): James Z. Huang, Warren G. Sanger, Timothy C. Greiner, Louis M. Staudt, Dennis D. Weisenburger, ...

Keyword(s): Gene Expression, B Cell, Gene Expression Profile, Expression Profile, Germinal Center, Expression Profiles, Gene Expression Profiles, Germinal Center B Cell, Cell Gene Expression, Cell Gene

Recently we have identified subgroups of de novo primary diffuse large B-cell lymphoma (DLBCL) based on complementary DNA microarray-generated gene expression profiles. To correlate the gene expression profiles with cytogenetic abnormalities in these DLBCLs, we examined the occurrence of the t(14;18)(q32;q21) in the 2 distinctive subgroups of DLBCL: one with the germinal center B-cell gene expression signature and the other with the activated B cell–like gene expression signature. The t(14;18) was detected in 7 of 35 cases (20%). All 7 t(14;18)-positive cases had a germinal center B-cell gene expression profile, representing 35% of the cases in this subgroup, and 6 of these 7 cases had very similar gene expression profiles. The expression of bcl-2 and bcl-6 proteins was not significantly different between the t(14;18)-positive and -negative cases, whereas CD10 was detected only in the group with the germinal center B-cell expression profile, and CD10 was most frequently expressed in the t(14;18)-positive cases. This study supports the validity of subdividing DLBCL into 2 major subgroups by gene expression profiling, with the t(14;18) being an important event in the pathogenesis of a subset of DLBCL arising from germinal center B cells. CD10 protein expression is useful in identifying cases of DLBCL with a germinal center B-cell gene expression profile and is often expressed in cases with the t(14;18).

The single-cell transcriptional landscape of lung carcinoid tumors

2021. DOI: 10.1101/2021.12.07.471416

Author(s): Philip Bischoff, Alexandra Trinks, Jennifer Wiederspahn, Benedikt Obermayer, Jan Patrick Pett, ...

Keyword(s): Gene Expression, Single Cell, Carcinoid Tumor, Expression Profiles, Gene Expression Profiles, Carcinoid Tumors, Cellular Composition, Cell Gene Expression, Cell Gene, Lung Carcinoid

Lung carcinoid tumors, also referred to as pulmonary neuroendocrine tumors or lung carcinoids, are rare neoplasms of the lung with a more favorable prognosis than other subtypes of lung cancer. Still, some patients suffer from relapsed disease and metastatic spread, and no consensus treatment exists for metastasized carcinoids. Several recent single-cell studies have provided detailed insights into the cellular heterogeneity of more common lung cancers, such as adeno- and squamous cell carcinoma. However, the characteristics of lung carcinoids at the single-cell level are still completely unknown. To study the cellular composition and single-cell gene expression profiles in lung carcinoids, we applied single-cell RNA sequencing to three lung carcinoid tumor samples and normal lung tissue. The single-cell transcriptomes of carcinoid tumor cells reflected intertumoral heterogeneity associated with clinicopathological features, such as tumor necrosis and proliferation index. The immune microenvironment was specifically enriched in noninflammatory monocyte-derived myeloid cells. Tumor-associated endothelial cells were characterized by distinct gene expression profiles. A spectrum of vascular smooth muscle cells and pericytes dominated the stromal microenvironment. We found a small proportion of myofibroblasts exhibiting features reminiscent of cancer-associated fibroblasts. Stromal and immune cells exhibited potential paracrine interactions which may shape the microenvironment via NOTCH, VEGF, TGFβ and JAK/STAT signaling. Moreover, single-cell gene signatures of pericytes and myofibroblasts demonstrated prognostic value in bulk gene expression data. Here, we provide the first comprehensive insights into the cellular composition and single-cell gene expression profiles in lung carcinoids, demonstrating the non-inflammatory and vessel-rich nature of their tumor microenvironment, and outlining relevant intercellular interactions which could serve as future therapeutic targets.

The effects of a globin blocker on the resolution of 3’mRNA sequencing data in porcine blood

BMC Genomics, 2019, Vol 20 (1). DOI: 10.1186/s12864-019-6122-2. Cited By ~ 1

Author(s): Kyu-Sang Lim, Qian Dong, Pamela Moll, Jana Vitkovska, Gregor Wiktorin, ...

Keyword(s): Gene Expression, Expression Profiles, Gene Expression Profiles, Data Sets, Globin Genes, Sequencing Data, Globin mRNA, Data Set, mRNA Sequencing, Porcine Blood

Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results: In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration (C2), which showed a very similar globin depletion rate (12%) to C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001). Conclusions: The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has the advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.
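The central quantity in this abstract is the proportion of globin reads in a library. A minimal Python sketch of that calculation from a per-gene read-count table follows. It is an assumption-laden illustration, not the authors' pipeline: the function name `globin_read_fraction`, the gene set, and all counts are hypothetical (only HBB is named in the abstract), and the toy counts are chosen only to give round proportions.

```python
def globin_read_fraction(gene_counts, globin_genes):
    """Fraction of all reads in a sample that map to the given globin genes."""
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    globin = sum(n for gene, n in gene_counts.items() if gene in globin_genes)
    return globin / total

# HBB appears in the abstract; "HBA" and the GENE* names are placeholders.
GLOBIN_GENES = {"HBB", "HBA"}

# Hypothetical read counts for one sample without (NGB) and with (GB) the blocker.
ngb = {"HBB": 400, "HBA": 160, "GENE1": 240, "GENE2": 200}
gb = {"HBB": 60, "HBA": 40, "GENE1": 500, "GENE2": 400}

ngb_fraction = globin_read_fraction(ngb, GLOBIN_GENES)
gb_fraction = globin_read_fraction(gb, GLOBIN_GENES)
print(f"NGB: {ngb_fraction:.1%}, GB: {gb_fraction:.1%}")
```

Because the proportion is taken over all reads in the library, a lower globin fraction at the same sequencing depth leaves more reads for non-globin genes, which is consistent with the increase in detectable genes that the abstract reports.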

Effect of GnRHa ovulation trigger dose on follicular fluid characteristics and granulosa cell gene expression profiles

Journal of Assisted Reproduction and Genetics, 2017, Vol 34 (4), pp. 471-478. DOI: 10.1007/s10815-017-0891-9. Cited By ~ 1

Author(s): Thi Ngoc Lan Vuong, M. T. Ho, T. Q. Ha, M. Brehm Jensen, C. Yding Andersen, ...

Keyword(s): Gene Expression, Granulosa Cell, Follicular Fluid, Expression Profiles, Gene Expression Profiles, Cell Gene Expression, Cell Gene, Fluid Characteristics

Comparative analyses of Purkinje cell gene expression profiles reveal shared molecular abnormalities in models of different polyglutamine diseases

Brain Research, 2012, Vol 1481, pp. 37-48. DOI: 10.1016/j.brainres.2012.08.005. Cited By ~ 13

Author(s): Bernd Friedrich, Philipp Euler, Ruhtraut Ziegler, Alexandre Kuhn, Bernhard G. Landwehrmeyer, ...

Keyword(s): Gene Expression, Purkinje Cell, Expression Profiles, Gene Expression Profiles, Polyglutamine Diseases, Comparative Analyses, Molecular Abnormalities, Cell Gene Expression, Cell Gene

Correlation of Murine Embryonic Stem Cell Gene Expression Profiles with Functional Measures of Pluripotency

Stem Cells, 2005, Vol 23 (5), pp. 663-680. DOI: 10.1634/stemcells.2004-0157. Cited By ~ 106

Author(s): Lars Palmqvist, Clive H. Glover, Lien Hsu, Min Lu, Bolette Bossen, ...

Keyword(s): Gene Expression, Stem Cell, Embryonic Stem Cell, Expression Profiles, Embryonic Stem, Gene Expression Profiles, Murine Embryonic Stem Cell, Cell Gene Expression, Cell Gene, Stem Cell Gene

Single-nucleus RNA-seq identifies Huntington disease astrocyte states

2019. DOI: 10.1101/799973

Author(s): Osama Al-Dalahmah, Alexander A Sosunov, A Shaik, Kenneth Ofori, Yang Liu, ...

Keyword(s): Gene Expression, Single Cell, Huntington Disease, Expression Profiles, Lipid Synthesis, Gene Expression Profiles, CAG Repeats, Cell Gene Expression, Single Nucleus, Cell Gene

Huntington Disease (HD) is an inherited movement disorder caused by expanded CAG repeats in the Huntingtin gene. We have used single nucleus RNASeq (snRNASeq) to uncover cellular phenotypes that change in the disease, investigating single cell gene expression in the cingulate cortex of patients with HD and comparing the gene expression to that of patients with no neurological disease. In this study, we focused on astrocytes, although we found significant gene expression differences in neurons, oligodendrocytes, and microglia as well. In particular, the gene expression profiles of astrocytes in HD showed multiple signatures, varying in phenotype from cells that had markedly upregulated metallothionein and heat shock genes, but had not completely lost the expression of genes associated with normal protoplasmic astrocytes, to astrocytes that had substantially upregulated GFAP and had lost expression of many normal protoplasmic astrocyte genes as well as metallothionein genes. When compared to astrocytes in control samples, astrocyte signatures in HD also showed downregulated expression of a number of genes, including several associated with protoplasmic astrocyte function and lipid synthesis. Thus, HD astrocytes appeared in variable transcriptional phenotypes, and could be divided into several different “states”, defined by patterns of gene expression. Ultimately, this study begins to fill the knowledge gap of single cell gene expression in HD and provides a more detailed understanding of the variation in changes in gene expression during astrocyte “reactions” to the disease.
