CNAPE: A Machine Learning Method for Copy Number Alteration Prediction from Gene Expression

2019 ◽  
Author(s):  
Quanhua Mu ◽  
Jiguang Wang

Abstract Copy number alteration (CNA), the abnormal number of copies of genomic regions, plays a key role in cancer initiation and progression. Current high-throughput CNA detection methods, including DNA arrays and genomic sequencing, are relatively expensive and require DNA samples at the microgram level, which are not attainable in certain settings such as clinical biopsies or single-cell genomes. Here we propose an alternative method, CNAPE, to computationally infer CNA from gene expression data. A prior knowledge-aided machine learning model was trained and tested on the transcriptomic profiles with matched CNA data of 9,740 cancers from The Cancer Genome Atlas. Using brain tumors as a proof-of-concept study, CNAPE achieved over 90% accuracy in the prediction of arm-level CNAs. Prediction performance for 12 gene-level CNAs (commonly altered genes in glioma) was also evaluated, and CNAPE achieved reasonable accuracy. CNAPE is available as an easy-to-use tool at http://wang-lab.ust.hk/software/Software.html.

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Gaojianyong Wang ◽  
Dimitris Anastassiou

Abstract Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome, based on a comprehensive analysis combining gene expression and CNA data from The Cancer Genome Atlas.


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Linda G. Shapiro ◽  
Eric C. Holland ◽  
Patrick J. Cimino

Abstract Knowledge of 1p/19q-codeletion and IDH1/2 mutational status is necessary to interpret any investigational study of diffuse gliomas in the modern era. While DNA sequencing is the gold standard for determining IDH mutational status, genome-wide methylation arrays and gene expression profiling have been used for surrogate mutational determination. Previous studies by our group suggest that 1p/19q-codeletion and IDH mutational status can be predicted by genome-wide somatic copy number alteration (SCNA) data alone; however, a rigorous model to accomplish this task has yet to be established. In this study, we used SCNA data from 786 adult diffuse gliomas in The Cancer Genome Atlas (TCGA) to develop a two-stage classification system that identifies 1p/19q-codeleted oligodendrogliomas and predicts the IDH mutational status of astrocytic tumors using a machine-learning model. Cross-validation on TCGA SCNA data showed near-perfect classification. Furthermore, our astrocytic IDH mutation model validated well on four additional datasets (AUC = 0.97, AUC = 0.99, AUC = 0.95, AUC = 0.96), as did our 1p/19q-codeleted oligodendroglioma screen on the two datasets that contained oligodendrogliomas (MCC = 0.97, MCC = 0.97). We then retrained our system using data from these validation sets and applied it to a cohort of REMBRANDT study subjects for whom SCNA data, but not IDH mutational status, are available. Overall, using genome-wide SCNAs, we successfully developed a system to robustly predict 1p/19q-codeletion and IDH mutational status in diffuse gliomas. This system can assign molecular subtype labels to tumor samples of retrospective diffuse glioma cohorts that lack 1p/19q-codeletion and IDH mutational status, such as the REMBRANDT study, recasting these datasets as validation cohorts for diffuse glioma research.
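
The abstract above reports its oligodendroglioma screen using the Matthews correlation coefficient (MCC). As a reading aid, here is a minimal sketch of how MCC is computed for a binary classifier from the confusion-matrix counts. The function name `mcc` and the toy labels are hypothetical and are not taken from the paper; returning 0.0 when the denominator is zero is an assumed convention.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative).

    Hypothetical helper for illustration only; not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal count is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy example: 1 = 1p/19q-codeleted, 0 = not codeleted (made-up labels).
    truth = [1, 1, 0, 0]
    preds = [1, 0, 0, 0]
    print(round(mcc(truth, preds), 3))
```

MCC ranges from -1 (total disagreement) to 1 (perfect prediction) and uses all four confusion-matrix cells, which makes it informative when one class, such as codeleted tumors, is much rarer than the other.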


2015 ◽  
Vol 14 ◽  
pp. CIN.S30565 ◽  
Author(s):  
Pichai Raman ◽  
Timothy Purwin ◽  
Richard Pestell ◽  
Aydin Tozeren

Ovarian cancer (OC) is a leading cause of cancer mortality, but aside from a few well-studied mutations, very little is known about its underlying causes. As such, we performed survival analysis on ovarian copy number amplifications and gene expression datasets presented by The Cancer Genome Atlas in order to identify potential drivers and markers of aggressive OC. Additionally, two independent datasets from the Gene Expression Omnibus web platform were used to validate the identified markers. Based on our analysis, we identified FXYD5, a glycoprotein known to reduce cell adhesion, as a potential driver of metastasis and a significant predictor of mortality in OC. As a marker of poor outcome, the protein has effective antibodies against it for use in tissue arrays. FXYD5 bridges together a wide variety of cancers, including ovarian, breast cancer stage II, thyroid, colorectal, pancreatic, and head and neck cancers for metastasis studies.


PeerJ ◽  
2019 ◽  
Vol 8 ◽  
pp. e8347 ◽  
Author(s):  
Hui Zhong ◽  
Huiyu Chen ◽  
Huahong Qiu ◽  
Chen Huang ◽  
Zhihui Wu

Background: Endometrial carcinoma (EC) and serous ovarian carcinoma (OvCa) are both among the common cancer types in women. EC can be divided into two subtypes, endometrioid EC and serous-like EC, with distinct histological characterizations and molecular phenotypes. There is an increasing awareness that serous-like EC resembles serous OvCa in genetic landscape, but a clear relationship between them is still lacking. Methods: Here, we took advantage of the large-scale molecular profiling of The Cancer Genome Atlas (TCGA) to compare the two EC subtypes and serous OvCa. We used bioinformatics data analytic methods to systematically examine the somatic mutation (SM) and copy number alteration (SCNA), gene expression, pathway activities, survival gene signatures and immune infiltration. Based on these quantifiable molecular characterizations, we asked whether serous-like EC should be grouped more closely with serous OvCa, on the basis of being serous-like, or whether it should be grouped more closely with endometrioid EC, on the basis of sharing the same organ of origin. Results: We found that although serous-like EC and serous OvCa share some common genotypes, including mutation and copy number alteration, they differ in molecular phenotypes such as gene expression and signaling pathway activity. Moreover, no shared prognostic gene signature was found, indicating that they use unique genes governing tumor progression. Finally, although endometrioid EC and serous OvCa are both highly immune infiltrated, the immune cell composition in serous OvCa is mostly immune suppressive, whereas endometrioid EC has a higher level of cytotoxic immune cells. Overall, our genetic aberration and molecular phenotype characterizations indicated that serous-like EC and serous OvCa cannot be treated as a simple “serous” cancer type. In particular, additional attention should be paid to their unique gene activities and tumor microenvironments for novel targeted therapy development.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ewe Seng Ch’ng

Abstract Distinguishing bladder urothelial carcinomas from prostate adenocarcinomas for poorly differentiated carcinomas derived from the bladder neck entails the use of a panel of lineage markers. Publicly available The Cancer Genome Atlas (TCGA) gene expression data provide an avenue to examine the utility of these markers. This study aimed to verify the expression of urothelial and prostate lineage markers in the respective carcinomas and to determine the relative importance of these markers in making this distinction. Gene expression data for these markers were downloaded from the TCGA Pan-Cancer database for bladder and prostate carcinomas. Differential gene expression of these markers was analyzed. Standard linear discriminant analyses were applied to establish the relative importance of these markers in lineage determination and to construct the model that best makes the distinction. This study shows that all urothelial lineage genes except for the gene for uroplakin III were significantly expressed in bladder urothelial carcinomas (p < 0.001). In descending order of importance to distinguish from prostate adenocarcinomas, genes for uroplakin II, S100P, GATA3 and thrombomodulin had high discriminant loadings (> 0.3). All prostate lineage genes were significantly expressed in prostate adenocarcinomas (p < 0.001). In descending order of importance to distinguish from bladder urothelial carcinomas, genes for NKX3.1, prostate specific antigen (PSA), prostate-specific acid phosphatase, prostein, and prostate-specific membrane antigen had high discriminant loadings (> 0.3). The combination of gene expression for uroplakin II, S100P, NKX3.1 and PSA approached 100% accuracy in tumor classification in both the training and validation sets. Mining of gene expression data shows that a combination of four lineage markers helps distinguish between bladder urothelial carcinomas and prostate adenocarcinomas.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Tiago Azevedo ◽  
Giovanna Maria Dimitri ◽  
Pietro Lió ◽  
Eric R. Gamazon

Abstract Here, we performed a comprehensive intra-tissue and inter-tissue multilayer network analysis of the human transcriptome. We generated an atlas of communities in gene co-expression networks in 49 tissues (GTEx v8), evaluated their tissue specificity, and investigated their methodological implications. UMAP embeddings of gene expression from the communities (representing nearly 18% of all genes) robustly identified biologically-meaningful clusters. Notably, new gene expression data can be embedded into our algorithmically derived models to accelerate discoveries in high-dimensional molecular datasets and downstream diagnostic or prognostic applications. We demonstrate the generalisability of our approach through systematic testing in external genomic and transcriptomic datasets. Methodologically, prioritisation of the communities in a transcriptome-wide association study of the biomarker C-reactive protein (CRP) in 361,194 individuals in the UK Biobank identified genetically-determined expression changes associated with CRP and led to considerably improved performance. Furthermore, a deep learning framework applied to the communities in nearly 11,000 tumors profiled by The Cancer Genome Atlas across 33 different cancer types learned biologically-meaningful latent spaces, representing metastasis (p < 2.2 × 10⁻¹⁶) and stemness (p < 2.2 × 10⁻¹⁶). Our study provides a rich genomic resource to catalyse research into inter-tissue regulatory mechanisms, and their downstream consequences on human disease.


2021 ◽  
Vol 11 (13) ◽  
pp. 6006
Author(s):  
Huy Le ◽  
Minh Nguyen ◽  
Wei Qi Yan ◽  
Hoa Nguyen

Augmented reality is one of the fastest growing fields, receiving increased funding over the last few years as people realise the potential benefits of rendering virtual information in the real world. Most of today’s marker-based augmented reality applications use local feature detection and tracking techniques. The disadvantage of applying these techniques is that the markers must be modified to match the unique classified algorithms or they suffer from low detection accuracy. Machine learning is an ideal solution to overcome the current drawbacks of image processing in augmented reality applications. However, traditional data annotation requires extensive time and labour, as it is usually done manually. This study incorporates machine learning to detect and track augmented reality marker targets in an application using deep neural networks. We first implement an auto-generated dataset tool, which is used for the machine learning dataset preparation. The final iOS prototype application incorporates object detection, object tracking and augmented reality. The machine learning model is trained to recognise the differences between targets using YOLO, one of the most well-known object detection methods. The final product makes use of ARKit, a valuable toolkit for developing augmented reality applications.


2021 ◽  
Vol 11 ◽  
Author(s):  
Yi Zhang ◽  
Lei Xia ◽  
Dawei Ma ◽  
Jing Wu ◽  
Xinyu Xu ◽  
...  

Cancer of unknown primary (CUP), in which metastatic diseases exist without an identifiable primary location, accounts for about 3–5% of all cancer diagnoses. Successful diagnosis and treatment of such patients are difficult. This study aimed to assess the expression characteristics of 90 genes as a method of identifying the primary site from CUP samples. We validated a 90-gene expression assay and explored its potential diagnostic utility in 44 patients at Jiangsu Cancer Hospital. For each specimen, the expression of 90 tumor-specific genes in malignant tumors was analyzed, and similarity scores were obtained. The types of malignant tumors predicted were compared with the reference diagnosis to calculate the accuracy. In addition, we verified the consistency of the expression profiles of the 90 genes in CUP secondary malignancies and metastatic malignancies in The Cancer Genome Atlas. We also reported a detailed description of the next-generation coding sequences for CUP patients. For each clinical medical specimen collected, the type of malignant tumor predicted and analyzed by the 90-gene expression assay was compared with its reference diagnosis, and the overall accuracy was 95.4%. In addition, the 90-gene expression profile generally accurately classified CUP into the cluster of its primary tumor. Sequencing of the exome transcriptome containing 556 high-frequency gene mutation oncogenes was not significantly related to the 90 genes analysis. Our results demonstrate that the expression characteristics of these 90 genes can be used as a powerful tool to accurately identify the primary sites of CUP. In the future, the inclusion of the 90-gene expression assay in pathological diagnosis will help oncologists use precise treatments, thereby improving the care and outcomes of CUP patients.

