Artificial Image Objects for Classification of Breast Cancer Biomarkers with Transcriptome Sequencing Data and Convolutional Neural Network Algorithms

Author(s):  
Xiangning Chen ◽  
Daniel G Chen ◽  
Zhongming Zhao ◽  
Justin M Balko ◽  
Jingchun Chen

Abstract Background: Transcriptome sequencing has been broadly available in clinical studies. However, it remains a challenge to utilize these data effectively due to the high dimension of the data and the high correlation of gene expression. Methods: We propose a novel method that transforms RNA sequencing data into artificial image objects (AIOs) and apply a convolutional neural network (CNN) algorithm to classify these AIOs. The AIO technique considers each gene as a pixel in a digital image, and standardizes and rescales gene expression levels into a range suitable for image display. Using the GSE81538 (n = 405) and GSE96058 (n = 3,373) datasets, we create AIOs for the subjects and design CNN models to classify biomarker Ki67 and Nottingham histologic grade (NHG). Results: With 5-fold cross validation, we accomplish a classification accuracy and AUC of 0.797 ± 0.034 and 0.820 ± 0.064 for Ki67 status. For NHG, the weighted average of categorical accuracy is 0.726 ± 0.018, and the weighted average of AUC is 0.848 ± 0.019. With GSE81538 as training data and GSE96058 as testing data, the accuracy and AUC for Ki67 are 0.772 ± 0.014 and 0.820 ± 0.006, and those for NHG are 0.682 ± 0.013 and 0.808 ± 0.003, respectively. These results are comparable to or better than the results reported in the original study. For both Ki67 and NHG, the calls from our models have similar predictive power for survival as the calls from trained pathologists in survival analyses. Comparing the calls from our models and the pathologists, we find that the discordant subjects for Ki67 are a group of patients for whom estrogen receptor, progesterone receptor, PAM50 and NHG could not predict their survival rate, and their responses to chemotherapy and endocrine therapy are also different from those of the concordant subjects. Conclusions: RNA sequencing data can be transformed into AIOs and be used to classify the status of Ki67 and NHG by a CNN algorithm. 
The AIO method can handle high dimension data with highly correlated variables with no requirement for variable selection, leading to a data-driven, consistent and automation-ready approach to model RNA sequencing data.
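
The gene-as-pixel idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the z-score clipping range, the 8-bit output scale, the row-major gene order, the zero padding, and the function name `make_aios` are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def make_aios(expr, clip=3.0):
    """Turn expression profiles into square 8-bit "images" (illustrative only).

    expr: list of subjects, each a list of expression values in the same gene order.
    clip: assumed z-score range [-clip, clip] that is mapped onto 0..255.
    Returns one 2D list of integers (rows of pixels) per subject.
    """
    n_genes = len(expr[0])
    columns = list(zip(*expr))                   # one tuple of values per gene
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]    # constant genes: avoid division by zero
    side = math.isqrt(n_genes - 1) + 1           # smallest square that holds every gene

    images = []
    for profile in expr:
        pixels = []
        for x, mu, sd in zip(profile, mus, sds):
            z = (x - mu) / sd                    # standardize per gene
            z = max(-clip, min(clip, z))         # clip extreme values
            pixels.append(round((z + clip) / (2 * clip) * 255))  # rescale to 0..255
        pixels += [0] * (side * side - n_genes)  # assumed zero padding to fill the square
        images.append([pixels[i * side:(i + 1) * side] for i in range(side)])
    return images


# Toy input: three subjects, three genes (values are made up).
toy = [[1.0, 5.0, 2.0], [3.0, 5.0, 4.0], [5.0, 5.0, 6.0]]
aios = make_aios(toy)
```

Each resulting grid could then be passed to an image classifier such as a CNN; how genes are ordered within the grid is a design choice that this sketch does not address.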

2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Xiangning Chen ◽  
Daniel G. Chen ◽  
Zhongming Zhao ◽  
Justin M. Balko ◽  
Jingchun Chen

Abstract Background Transcriptome sequencing has been broadly available in clinical studies. However, it remains a challenge to utilize these data effectively for clinical applications due to the high dimension of the data and the highly correlated expression between individual genes. Methods We proposed a method to transform RNA sequencing data into artificial image objects (AIOs) and applied convolutional neural network (CNN) algorithms to classify these AIOs. With the AIO technique, we considered each gene as a pixel in an image and its expression level as pixel intensity. Using the GSE96058 (n = 2976), GSE81538 (n = 405), and GSE163882 (n = 222) datasets, we created AIOs for the subjects and designed CNN models to classify biomarker Ki67 and Nottingham histologic grade (NHG). Results With fivefold cross-validation, we accomplished a classification accuracy and AUC of 0.821 ± 0.023 and 0.891 ± 0.021 for Ki67 status. For NHG, the weighted average of categorical accuracy was 0.820 ± 0.012, and the weighted average of AUC was 0.931 ± 0.006. With GSE96058 as training data and GSE81538 as testing data, the accuracy and AUC for Ki67 were 0.826 ± 0.037 and 0.883 ± 0.016, and that for NHG were 0.764 ± 0.052 and 0.882 ± 0.012, respectively. These results were 10% better than the results reported in the original studies. For Ki67, the calls generated from our models had a better power for prediction of survival as compared to the calls from trained pathologists in survival analyses. Conclusions We demonstrated that RNA sequencing data could be transformed into AIOs and be used to classify Ki67 status and NHG with CNN algorithms. The AIO method could handle high-dimensional data with highly correlated variables, and there was no need for variable selection. With the AIO technique, a data-driven, consistent, and automation-ready model could be developed to classify biomarkers with RNA sequencing data and provide more efficient care for cancer patients.


Patterns ◽  
2021 ◽  
pp. 100303
Author(s):  
Xiangning Chen ◽  
Daniel G. Chen ◽  
Zhongming Zhao ◽  
Justin Zhan ◽  
Changrong Ji ◽  
...  

Author(s):  
Anju Karki ◽  
Noah E Berlow ◽  
Jin-Ah Kim ◽  
Esther Hulleman ◽  
Qianqian Liu ◽  
...  

Abstract Background Diffuse intrinsic pontine glioma (DIPG) is a devastating pediatric cancer with unmet clinical need. DIPG is invasive in nature: tumor cells interweave into the nerve fiber tracts of the pons, making the tumor unresectable. Accordingly, novel approaches to combating the disease are of utmost importance, and receptor-driven cell invasion in the context of DIPG is an under-researched area. Here we investigated the impact on cell invasion mediated by PLEXINB1, PLEXINB2, platelet-derived growth factor receptor (PDGFR)α, PDGFRβ, epidermal growth factor receptor (EGFR), activin receptor 1 (ACVR1), chemokine receptor 4 (CXCR4) and NOTCH1. Methods We used previously published RNA-sequencing data to measure gene expression of selected receptors in DIPG tumor tissue versus matched normal tissue controls (n=18). We assessed protein expression of the corresponding genes using DIPG cell culture models. Then, we performed cell viability and cell invasion assays of DIPG cells stimulated with chemoattractants/ligands. Results RNA-sequencing data showed increased expression of receptor genes such as PLEXINB2, PDGFRα, EGFR, ACVR1, CXCR4 and NOTCH1 in DIPG tumors compared to the control tissues. Representative DIPG cell lines demonstrated correspondingly increased protein expression levels of these genes. Cell viability assays showed minimal effects of growth factors/chemokines on tumor cell growth in most instances. Stimulation with recombinant SEMA4C, SEMA4D, PDGF-AA, PDGF-BB, ACVA, CXCL12 and DLL4 ligands altered invasion in DIPG cells. Conclusions We show that no single growth factor-ligand pair universally induces DIPG cell invasion. However, our results reveal a potential to create a composite of cytokines or anti-cytokines to modulate DIPG cell invasion.


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
Floranne Boulogne ◽  
Laura Claus ◽  
Henry Wiersma ◽  
Roy Oelen ◽  
Floor Schukking ◽  
...  

Abstract Background and Aims Genetic testing in patients with suspected hereditary kidney disease does not always reveal the genetic cause for the patient's disorder. Potentially pathogenic variants can reside in genes that are not known to be involved in kidney disease, which makes it difficult to prioritize and interpret the relevance of these variants. As such, there is a clear need for methods that predict the phenotypic consequences of gene expression in a way that is as unbiased as possible. To help identify candidate genes we have developed KidneyNetwork, in which tissue-specific expression is utilized to predict kidney-specific gene functions. Method We combined gene co-expression in 878 publicly available kidney RNA-sequencing samples with the co-expression of a multi-tissue RNA-sequencing dataset of 31,499 samples to build KidneyNetwork. The expression patterns were used to predict which genes have a kidney-related function, and which (disease) phenotypes might be caused when these genes are mutated. By integrating the information from the HPO database, in which known phenotypic consequences of disease genes are annotated, with the gene co-expression network we obtained prediction scores for each gene per HPO term. As proof of principle, we applied KidneyNetwork to prioritize variants in exome-sequencing data from 13 kidney disease patients without a genetic diagnosis. Results We assessed the prediction performance of KidneyNetwork by comparing it to GeneNetwork, a multi-tissue co-expression network we previously developed. In KidneyNetwork, we observe a significantly improved prediction accuracy of kidney-related HPO-terms, as well as an increase in the total number of significantly predicted kidney-related HPO-terms (figure 1). To examine its clinical utility, we applied KidneyNetwork to 13 patients with a suspected hereditary kidney disease without a genetic diagnosis. 
Based on the HPO terms “Renal cyst” and “Hepatic cysts”, combined with a list of potentially damaging variants in one of the undiagnosed patients with mild ADPKD/PCLD, we identified ALG6 as a new candidate gene. ALG6 bears a high resemblance to other genes implicated in this phenotype in recent years. Through the 100,000 Genomes Project and collaborators we identified three additional patients with kidney and/or liver cysts carrying a suspected deleterious variant in ALG6. Conclusion We present KidneyNetwork, a kidney specific co-expression network that accurately predicts what genes have kidney-specific functions and may result in kidney disease. Gene-phenotype associations of genes unknown for kidney-related phenotypes can be predicted by KidneyNetwork. We show the added value of KidneyNetwork by applying it to exome sequencing data of kidney disease patients without a molecular diagnosis and consequently we propose ALG6 as a promising candidate gene. KidneyNetwork can be applied to clinically unsolved kidney disease cases, but it can also be used by researchers to gain insight into individual genes to better understand kidney physiology and pathophysiology. Acknowledgments This research was made possible through access to the data and findings generated by the 100,000 Genomes Project; http://www.genomicsengland.co.uk.
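
The general idea of scoring genes for a phenotype term from co-expression can be illustrated with a simple guilt-by-association sketch. This is a deliberately simplified stand-in and not KidneyNetwork's actual algorithm: the scoring rule (mean Pearson correlation with genes already annotated to the term), the function names, and the gene labels are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def term_scores(expression, annotated):
    """Score each unannotated gene for one phenotype term.

    expression: dict mapping gene -> list of expression values across samples.
    annotated: set of genes already annotated to the term (e.g. one HPO term).
    Returns dict mapping candidate gene -> mean correlation with the annotated genes.
    """
    known = [g for g in annotated if g in expression]
    scores = {}
    for gene, profile in expression.items():
        if gene in annotated:
            continue  # only score genes not yet linked to the term
        corrs = [pearson(profile, expression[k]) for k in known]
        scores[gene] = sum(corrs) / len(corrs) if corrs else 0.0
    return scores


# Toy data with placeholder gene names (not real measurements).
toy_expression = {
    "KNOWN1": [1.0, 2.0, 3.0, 4.0],
    "KNOWN2": [2.0, 4.0, 6.0, 9.0],
    "CAND_A": [1.1, 2.0, 3.2, 3.9],
    "CAND_B": [4.0, 3.0, 2.0, 1.0],
}
toy_scores = term_scores(toy_expression, {"KNOWN1", "KNOWN2"})
```

In this toy setup a candidate whose expression tracks the annotated genes receives a higher score than one that runs opposite to them; a real pipeline would also need significance testing and a tissue-specific sample selection, which are omitted here.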


2020 ◽  
Author(s):  
Benedict Hew ◽  
Qiao Wen Tan ◽  
William Goh ◽  
Jonathan Wei Xiong Ng ◽  
Kenny Koh ◽  
...  

Abstract Bacterial resistance to antibiotics is a growing problem that is projected to cause more deaths than cancer in 2050. Consequently, novel antibiotics are urgently needed. Since more than half of the available antibiotics target the bacterial ribosomes, proteins that are involved in protein synthesis are prime targets for the development of novel antibiotics. However, experimental identification of these potential antibiotic target proteins can be labor-intensive and challenging, as these proteins are likely to be poorly characterized and specific to few bacteria. In order to identify these novel proteins, we established a Large-Scale Transcriptomic Analysis Pipeline in Crowd (LSTrAP-Crowd), where 285 individuals processed 26 terabytes of RNA-sequencing data of the 17 most notorious bacterial pathogens. In total, the crowd processed 26,269 RNA-seq experiments and used the data to construct gene co-expression networks, which were used to identify more than a hundred uncharacterized genes that were transcriptionally associated with protein synthesis. We provide the identity of these genes together with the processed gene expression data. The data can be used to identify other vulnerabilities of bacteria, while our approach demonstrates how the processing of gene expression data can be easily crowdsourced.


Circulation ◽  
2020 ◽  
Vol 142 (14) ◽  
pp. 1374-1388
Author(s):  
Yanming Li ◽  
Pingping Ren ◽  
Ashley Dawson ◽  
Hernan G. Vasquez ◽  
Waleed Ageedi ◽  
...  

Background: Ascending thoracic aortic aneurysm (ATAA) is caused by the progressive weakening and dilatation of the aortic wall and can lead to aortic dissection, rupture, and other life-threatening complications. To improve our understanding of ATAA pathogenesis, we aimed to comprehensively characterize the cellular composition of the ascending aortic wall and to identify molecular alterations in each cell population of human ATAA tissues. Methods: We performed single-cell RNA sequencing analysis of ascending aortic tissues from 11 study participants, including 8 patients with ATAA (4 women and 4 men) and 3 control subjects (2 women and 1 man). Cells extracted from aortic tissue were analyzed and categorized with single-cell RNA sequencing data to perform cluster identification. ATAA-related changes were then examined by comparing the proportions of each cell type and the gene expression profiles between ATAA and control tissues. We also examined which genes may be critical for ATAA by performing the integrative analysis of our single-cell RNA sequencing data with publicly available data from genome-wide association studies. Results: We identified 11 major cell types in human ascending aortic tissue; the high-resolution reclustering of these cells further divided them into 40 subtypes. Multiple subtypes were observed for smooth muscle cells, macrophages, and T lymphocytes, suggesting that these cells have multiple functional populations in the aortic wall. In general, ATAA tissues had fewer nonimmune cells and more immune cells, especially T lymphocytes, than control tissues did. Differential gene expression data suggested the presence of extensive mitochondrial dysfunction in ATAA tissues. 
In addition, integrative analysis of our single-cell RNA sequencing data with public genome-wide association study data and promoter capture Hi-C data suggested that the erythroblast transformation-specific related gene (ERG) exerts an important role in maintaining normal aortic wall function. Conclusions: Our study provides a comprehensive evaluation of the cellular composition of the ascending aortic wall and reveals how the gene expression landscape is altered in human ATAA tissue. The information from this study makes important contributions to our understanding of ATAA formation and progression.

