mClass: Cancer Type Classification with Somatic Point Mutation Data

Author(s):  
Md Abid Hasan ◽  
Stefano Lonardi

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kanggeun Lee ◽  
Hyoung-oh Jeong ◽  
Semin Lee ◽  
Won-Ki Jeong

Abstract: With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. In cancer studies in particular, there is an increasing need to classify cancer type based on somatic alterations detected in sequencing analyses. However, the ever-increasing size and complexity of the data make this classification task extremely challenging. In this study, we evaluate the contributions of various input features that can be derived from genomic data, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations, and further use them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive an ensemble-based cancer type prediction model that achieves up to 84% classification accuracy in nested 10-fold cross-validation. Finally, we narrowed the target cancers down to the six most common types and achieved up to 94% accuracy.


2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Erick E. Montelongo González ◽  
José A. Reyes Ortiz ◽  
Beatriz A. González Beltrán

PLoS ONE ◽  
2013 ◽  
Vol 8 (11) ◽  
pp. e74380 ◽  
Author(s):  
Karin S. Kassahn ◽  
Oliver Holmes ◽  
Katia Nones ◽  
Ann-Marie Patch ◽  
David K. Miller ◽  
...  

2020 ◽  
Author(s):  
Rodrigo Ramos ◽  
Jorge Cutigi ◽  
Cynthia Ferreira ◽  
Adriane Evangelista ◽  
Adenilso Simão

With the advancement of next-generation sequencing (NGS) technologies, a massive volume of genetic data has been generated. This makes it possible to study complex diseases with computational approaches. In the context of cancer, public databases hold a wide variety of mutation data. However, it is not feasible to use all available data in every analysis; thus, a subset of the data must be selected. This work aims to investigate and understand the mutational characteristics of different mutation data sets for the same type of cancer. To achieve this goal, we explored and visualized cancer mutation data. Several analyses are presented for three common types of cancer: 1) Breast Invasive Carcinoma (BRCA); 2) Lung Adenocarcinoma (LUAD); and 3) Prostate Adenocarcinoma (PRAD). For each cancer type, three distinct data sets were analyzed to determine whether there are significant differences or similarities among them. The analyses show evidence of similarity among the BRCA data sets and among the LUAD data sets, whereas the PRAD data sets are likely heterogeneous.


2021 ◽  
Author(s):  
Shanwen Chen ◽  
Yunfan Jin ◽  
Siqi Wang ◽  
Shaozhen Xing ◽  
Yingchao Wu ◽  
...  

Abstract: Background: The utility of cell-free nucleic acids in monitoring cancer has been recognized by both scientists and clinicians. In addition to human transcripts, a fraction of the cell-free nucleic acids in human plasma has been shown to derive from microbes and has been reported to have some relevance to cancer. Methods: To better understand plasma cell-free RNAs (cfRNAs) in cancer patients, we used RNA-seq to profile cfRNAs in ~300 plasma samples from patients with five cancer types (colorectal, stomach, liver, lung, and esophageal cancer) and from healthy donors. Results: Microbe-derived cfRNAs were consistently detected by different computational methods when potential contaminants were carefully filtered. Clinically relevant signals can be identified from both human and microbial reads. Alterations in human cfRNA expression and in virus abundance both suggest that some cancer patients were immunosuppressed, as indicated by enriched KEGG pathways among downregulated human genes and a higher prevalence of torque teno virus. Our data support the diagnostic value of human- and microbe-derived plasma cfRNAs for cancer detection: using both human and microbial features, an area under the receiver operating characteristic (ROC) curve of 0.931 was achieved on the validation set for distinguishing cancer patients from healthy donors. Moreover, both human and microbial cfRNAs show some cancer-type specificity and can distinguish tumors of different primary locations. Compared with using human features alone, combining human and microbial features improves the average validation accuracy of between-cancer-type classification by 11.5%. Conclusions: In summary, this work provides evidence for the clinical relevance of human- and microbe-derived plasma cfRNAs and for their potential utility in cancer detection and in determining tumor sites.


2009 ◽  
Vol 81 (20) ◽  
pp. 8596-8602 ◽  
Author(s):  
Vaya Tsiakalou ◽  
Margarita Petropoulou ◽  
Penelope C. Ioannou ◽  
Theodore K. Christopoulos ◽  
Emmanuel Kanavakis ◽  
...  

2016 ◽  
Vol 33 ◽  
pp. S10
Author(s):  
Alexandra Iliadi ◽  
Theodore Christopoulos ◽  
Penelope Ioannou