Cancer classification of single-cell gene expression data by neural network

Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Abstract Motivation Cancer classification based on gene expression profiles has provided insight into the causes of cancer and into cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA, based on the 300 most significant genes expressed in each cancer. We then compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than the other methods. We further applied our approach to scRNA-seq data transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation Cancer classification by neural network. Supplementary information Supplementary data are available at Bioinformatics online.
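
The abstract above mentions transforming scRNA-seq data by kNN smoothing before classification. The following is a minimal sketch of the general idea only, not the authors' pipeline: each cell's raw counts are pooled with those of its k most similar cells to reduce sparsity. The function name `knn_smooth`, the log-normalisation used for the distance, and the single-pass design are assumptions made for illustration; published kNN-smoothing procedures are more elaborate.

```python
import math


def knn_smooth(counts, k):
    """Simplified one-pass kNN smoothing (illustrative assumption, not the paper's code).

    counts: list of cells, each a list of raw gene counts (same gene order).
    Returns, for each cell, the gene-wise sum of its own counts and the
    counts of its k nearest neighbours.
    """
    def normalise(cell):
        # Scale to a common library size, then log-transform, so that
        # distances are not dominated by sequencing depth.
        total = sum(cell) or 1
        return [math.log1p(1e4 * c / total) for c in cell]

    norm = [normalise(cell) for cell in counts]
    n_genes = len(counts[0])
    smoothed = []
    for i, ref in enumerate(norm):
        # Rank all other cells by Euclidean distance to cell i.
        others = sorted(
            (j for j in range(len(norm)) if j != i),
            key=lambda j: math.dist(ref, norm[j]),
        )
        group = [i] + others[:k]
        # Pool raw counts (not normalised values) across the neighbourhood.
        smoothed.append([sum(counts[j][g] for j in group) for g in range(n_genes)])
    return smoothed


# Toy data: two cells expressing genes 0 and 2, two cells expressing gene 1.
toy = [[5, 0, 1], [4, 0, 2], [0, 6, 0], [1, 5, 0]]
print(knn_smooth(toy, k=1))
```

Pooling raw counts rather than normalised values keeps the output on a count scale, which is one reason a smoothed single-cell profile can resemble a bulk profile more closely than the original sparse one.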

2021 ◽  
Vol 12 ◽  
Author(s):  
Dongfang Jia ◽  
Cheng Chen ◽  
Chen Chen ◽  
Fangfang Chen ◽  
Ningrui Zhang ◽  
...  

Mastering the molecular mechanism of breast cancer (BC) can provide an in-depth understanding of BC pathology. This study reviewed existing technologies for diagnosing BC, such as mammography, ultrasound, magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET), and summarized their disadvantages. The purpose of this article is to use gene expression profiles from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) to classify BC samples and normal samples. The proposed method overcomes some of the shortcomings of traditional diagnostic methods and can diagnose BC more rapidly, with high sensitivity and without radiation exposure. The study first selected the genes most relevant to cancer through weighted gene co-expression network analysis (WGCNA) and differential expression analysis (DEA). It then used the protein–protein interaction (PPI) network to screen 23 hub genes. Finally, it used a support vector machine (SVM), decision tree (DT), Bayesian network (BN), artificial neural network (ANN), and the convolutional neural networks CNN-LeNet and CNN-AlexNet to process the expression levels of the 23 hub genes. For gene expression profiles, the ANN model performed best in classifying cancer samples. Averaged over ten runs, the accuracy is 97.36% (±0.34%), the F1 value is 0.8535 (±0.0260), the sensitivity is 98.32% (±0.32%), the specificity is 89.59% (±3.53%) and the AUC is 0.99. In summary, this method effectively classifies cancer samples and normal samples and provides reasonable new ideas for the early diagnosis of cancer in the future.
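
The abstract above reports accuracy, F1, sensitivity and specificity. As a reminder of how these quantities relate, here is a small sketch that derives them from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are invented for illustration and are not taken from the study, which also does not state which class it treats as positive.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: positive samples classified correctly/incorrectly.
    tn/fp: negative samples classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
for name, value in binary_metrics(tp=90, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Because F1 depends on precision as well as sensitivity, it can sit well below both sensitivity and accuracy when the classes are imbalanced.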


2020 ◽  
Vol 36 (10) ◽  
pp. 3273-3275
Author(s):  
Elaine Y Cao ◽  
John F Ouyang ◽  
Owen J L Rackham

Abstract Summary Emerging single-cell RNA-sequencing technologies have made it possible to capture and assess the gene expression of individual cells. Based on the similarity of gene expression profiles, many tools have been developed to generate an in silico ordering of cells in the form of pseudo-time trajectories. However, these tools do not provide a means to find the ordering of critical gene expression changes over pseudo-time. We present GeneSwitches, a tool that takes any single-cell pseudo-time trajectory and determines the precise order of gene expression and functional-event changes over time. GeneSwitches uses a statistical framework based on logistic regression to identify the order in which genes are either switched on or off along pseudo-time. With this information, users can identify the order in which surface markers appear, investigate how functional ontologies are gained or lost over time and compare the ordering of switching genes from two related pseudo-temporal processes. Availability GeneSwitches is available at https://geneswitches.ddnetbio.com. Supplementary information Supplementary data are available at Bioinformatics online.
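
The logistic-regression idea described above can be illustrated with a short sketch. This is not the GeneSwitches implementation (the tool itself is the one linked in the abstract). It assumes that a gene's expression has already been binarised into on/off states per cell, fits P(on | t) = sigmoid(b0 + b1 * t) by plain gradient ascent, and takes the pseudo-time at which the fitted probability crosses 0.5 as the switching point. The function names, the binarisation, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def fit_logistic(times, states, lr=1.0, steps=20000):
    """Fit P(on | t) = sigmoid(b0 + b1 * t) by gradient ascent on the mean log-likelihood.

    times:  pseudo-time of each cell (here scaled to [0, 1]).
    states: 1 if the gene is 'on' in that cell, else 0 (assumed binarisation).
    """
    b0, b1 = 0.0, 0.0
    n = len(times)
    for _ in range(steps):
        g0 = g1 = 0.0
        for t, y in zip(times, states):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            g0 += y - p            # gradient with respect to the intercept
            g1 += (y - p) * t      # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def switch_time(b0, b1):
    """Pseudo-time at which the fitted probability equals 0.5."""
    return -b0 / b1


# Toy gene that switches on around the middle of the trajectory.
times = [i / 10 for i in range(10)]
states = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(times, states)
direction = "on" if b1 > 0 else "off"
print(f"switches {direction} at pseudo-time {switch_time(b0, b1):.2f}")
```

Under these assumptions, the sign of the slope distinguishes genes that switch on from genes that switch off, and sorting genes by their switching points gives an ordering along pseudo-time.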


2019 ◽  
Vol 35 (22) ◽  
pp. 4688-4695 ◽  
Author(s):  
Rui Hou ◽  
Elena Denisenko ◽  
Alistair R R Forrest

Abstract Motivation Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process. Results We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with accuracy comparable to that of another recent cell annotation tool (SingleR), but that it is quicker and can handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision. Availability and implementation scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch. Supplementary information Supplementary data are available at Bioinformatics online.
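
The closest-match strategy described above can be sketched in a few lines. This is not the scMatch code (available at the repository linked in the abstract). It assumes that each reference cell type is represented by one expression profile over the same genes as the query cell, and it uses Spearman rank correlation as the similarity metric, which is only one possible choice. All function names and the toy reference labels are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def annotate(cell, reference):
    """Return (label, score) of the reference profile most similar to `cell`.

    reference: dict mapping a cell-type label to an expression profile
    over the same genes, in the same order, as `cell`.
    """
    scores = {label: spearman(cell, profile) for label, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical two-entry reference over five genes.
reference = {"T cell": [9, 8, 0, 1, 0], "hepatocyte": [0, 1, 9, 7, 8]}
print(annotate([5, 3, 0, 0, 1], reference))
```

A rank-based metric is a natural fit for low-coverage single-cell data because it depends only on the ordering of genes, not on the absolute scale of the counts.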


2019 ◽  
Author(s):  
Elaine Y. Cao ◽  
John F. Ouyang ◽  
Owen J.L. Rackham

Abstract Summary Emerging single-cell RNA-seq technologies have made it possible to capture and assess the gene expression of individual cells. Based on the similarity of gene expression profiles, many tools have been developed to generate an in silico ordering of cells in the form of pseudo-time trajectories. However, these tools do not provide a means to find the ordering of critical gene expression changes over pseudo-time. We present GeneSwitches, a tool that takes any single-cell pseudo-time trajectory and determines the precise order of gene-expression and functional-event changes over time. GeneSwitches uses a statistical framework based on logistic regression to identify the order in which genes are either switched on or off along pseudo-time. With this information, users can identify the order in which surface markers appear, investigate how functional ontologies are gained or lost over time, and compare the ordering of switching genes from two related pseudo-temporal processes. Availability GeneSwitches is available at https://geneswitches.ddnetbio.com Contact owen.rackham@duke-nus.edu.sg Supplementary Information is available at http://www.ddnetbio.com/files/GeneSwitches_SI.pdf


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A4-A4
Author(s):  
Anushka Dikshit ◽  
Dan Zollinger ◽  
Karen Nguyen ◽  
Jill McKay-Fleisch ◽  
Kit Fuhrman ◽  
...  

Background The canonical WNT-β-catenin signaling pathway is vital for development and tissue homeostasis but becomes strongly tumorigenic when dysregulated and can alter the transcriptional signature of a cell to promote malignant transformation. However, thorough characterization of these transcriptomic signatures has been challenging because traditional methods lack either spatial information, multiplexing, or sensitivity/specificity. To overcome these challenges, we developed a novel workflow combining the single molecule and single cell visualization capabilities of the RNAscope in situ hybridization (ISH) assay with the highly multiplexed spatial profiling capabilities of the GeoMx™ Digital Spatial Profiler (DSP) RNA assays. Using these methods, we sought to spatially profile and compare gene expression signatures of tumor niches with high and low CTNNB1 expression. Methods After screening 120 tumor cores from multiple tumors for CTNNB1 expression by the RNAscope assay, we identified melanoma as the tumor type with the highest CTNNB1 expression, while prostate tumors had the lowest expression. Using the RNAscope Multiplex Fluorescence assay, we selected regions of high CTNNB1 expression within 3 melanoma tumors as well as regions with low CTNNB1 expression within 3 prostate tumors. These selected regions of interest (ROIs) were then transcriptionally profiled using the GeoMx DSP RNA assay for a set of 78 genes relevant in immuno-oncology. Target genes that were differentially expressed were further visualized and spatially assessed using the RNAscope Multiplex Fluorescence assay to confirm GeoMx DSP data with single cell resolution. Results The GeoMx DSP analysis comparing the melanoma and prostate tumors revealed that they had significantly different gene expression profiles, and many of these genes showed concordance with CTNNB1 expression. Furthermore, immunoregulatory targets such as ICOSLG, CTLA4, PDCD1 and ARG1 also demonstrated significant correlation with CTNNB1 expression. On validating selected targets using the RNAscope assay, we could distinctly visualize that they were not only highly expressed in melanoma compared to the prostate tumor, but that their expression levels changed proportionally to that of CTNNB1 within the same tumors, suggesting that these differentially expressed genes may be regulated by the WNT-β-catenin pathway. Conclusions In summary, by combining the RNAscope ISH assay and the GeoMx DSP RNA assay into one joint workflow, we transcriptionally profiled regions of high and low CTNNB1 expression within melanoma and prostate tumors and identified genes potentially regulated by the WNT-β-catenin pathway. This novel workflow can be fully automated and is well suited for interrogating the tumor and stroma and their interactions. GeoMx Assays are for RESEARCH ONLY, not for diagnostics.


2021 ◽  
Vol 9 (Suppl 1) ◽  
pp. A12.1-A12
Author(s):  
Y Arjmand Abbassi ◽  
N Fang ◽  
W Zhu ◽  
Y Zhou ◽  
Y Chen ◽  
...  

Recent advances in high-throughput single cell sequencing technologies have greatly improved our understanding of complex biological systems. Heterogeneous samples such as tumor tissues commonly harbor cancer cell-specific genetic variants and gene expression profiles, both of which have been shown to be related to the mechanisms of disease development, progression, and responses to treatment. Furthermore, stromal and immune cells within the tumor microenvironment interact with cancer cells and play important roles in tumor responses to systemic therapy such as immunotherapy or cell therapy. However, most current high-throughput single cell sequencing methods detect only gene expression levels or epigenetic events such as chromatin conformation. Information on important genetic variants, including mutations or fusions, is not captured. To better understand the mechanisms of tumor responses to systemic therapy, it is essential to decipher the connection between genotype and gene expression patterns of both tumor cells and cells in the tumor microenvironment. We developed FocuSCOPE, a high-throughput multi-omics sequencing solution that can detect both genetic variants and the transcriptome from the same single cells. FocuSCOPE has been used to successfully perform single cell analysis of both gene expression profiles and point mutations, fusion genes, or intracellular viral sequences from thousands of cells simultaneously, delivering comprehensive insights into tumor and immune cells in the tumor microenvironment at single cell resolution. Disclosure Information Y. Arjmand Abbassi: None. N. Fang: None. W. Zhu: None. Y. Zhou: None. Y. Chen: None. U. Deutsch: None.


2021 ◽  
Vol 288 (1945) ◽  
pp. 20202793
Author(s):  
Alexander Yermanos ◽  
Daniel Neumeier ◽  
Ioana Sandu ◽  
Mariana Borsa ◽  
Ann Cathrin Waindok ◽  
...  

Neuroinflammation plays a crucial role during ageing and various neurological conditions, including Alzheimer's disease, multiple sclerosis and infection. Technical limitations, however, have prevented an integrative analysis of how lymphocyte immune receptor repertoires and their accompanying transcriptional states change with age in the central nervous system. Here, we leveraged single-cell sequencing to simultaneously profile B cell receptor and T cell receptor repertoires and accompanying gene expression profiles in young and old mouse brains. We observed the presence of clonally expanded B and T cells in the central nervous system of aged male mice. Furthermore, many of these B cells were of the IgM and IgD isotypes, and had low levels of somatic hypermutation. Integrating gene expression information additionally revealed distinct transcriptional profiles of these clonally expanded lymphocytes. Our findings suggest that clonally related T and B cells in the CNS of elderly mice may contribute to neuroinflammation accompanying homeostatic ageing.

