Finding Transcripts Associated with Prostate Cancer Gleason Stages Using Next Generation Sequencing and Machine Learning Techniques

Author(s):  
Osama Hamzeh ◽  
Abedalrhman Alkhateeb ◽  
Iman Rezaeian ◽  
Aram Karkar ◽  
Luis Rueda
2019 ◽  
Vol 18 ◽  
pp. 117693511983552 ◽  
Author(s):  
Abedalrhman Alkhateeb ◽  
Iman Rezaeian ◽  
Siva Singireddy ◽  
Dora Cavallo-Medved ◽  
Lisa A Porter ◽  
...  

Prostate cancer is one of the most common types of cancer among Canadian men. Next-generation sequencing using RNA-Seq provides large amounts of data that may reveal novel and informative biomarkers. We introduce a method that uses machine learning techniques to identify transcripts that correlate with prostate cancer development and progression. We have isolated transcripts that have the potential to serve as prognostic indicators and may have tremendous value in guiding treatment decisions. Analysis of normal versus malignant prostate data sets indicates that the genes HEATR5B, DDC, and GABPB1-AS1 are differentially expressed, which makes them potential prostate cancer biomarkers. Our study also supports PTGFR, NREP, SCARNA22, DOCK9, FLVCR2, IKZF3, USP13, and CLASP1 as potential biomarkers to predict prostate cancer progression, especially between stage II and subsequent stages of the disease.


2016 ◽  
Author(s):  
Heini M L Kallio ◽  
Matti Annala ◽  
Anniina Brofeldt ◽  
Reija Hieta ◽  
Kati Kivinummi ◽  
...  

Oncogene ◽  
2014 ◽  
Vol 34 (5) ◽  
pp. 568-577 ◽  
Author(s):  
I Teles Alves ◽  
T Hartjes ◽  
E McClellan ◽  
S Hiltemann ◽  
R Böttcher ◽  
...  

2018 ◽  
Vol 13 (4) ◽  
pp. 495-500 ◽  
Author(s):  
Pedro C. Barata ◽  
Prateek Mendiratta ◽  
Brandie Heald ◽  
Stefan Klek ◽  
Petros Grivas ◽  
...  

2019 ◽  
Vol 66 (1) ◽  
pp. 239-246 ◽  
Author(s):  
Chao Wu ◽  
Xiaonan Zhao ◽  
Mark Welsh ◽  
Kellianne Costello ◽  
Kajia Cao ◽  
...  

Abstract

BACKGROUND: Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning–based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens.

METHODS: A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label “uncertain” variants.

RESULTS: The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as “uncertain,” with zero misclassification between the true positives and artifacts in the test set.

CONCLUSIONS: We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
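The three-way labeling described in this abstract can be illustrated with a minimal sketch. It is not the authors' classifier: the abstract mentions prediction intervals, whereas this sketch substitutes two fixed probability cutoffs (`lower` and `upper`, both arbitrary assumptions), and the names `triage_label` and `summarize` are hypothetical. The last lines only re-check the arithmetic of the test-set counts reported in the abstract.

```python
from collections import Counter


def triage_label(p_real: float, lower: float = 0.1, upper: float = 0.9) -> str:
    """Map a predicted probability that a call is real to one of three labels.

    The cutoffs are illustrative assumptions, not values from the study.
    """
    if not 0.0 <= p_real <= 1.0:
        raise ValueError("p_real must lie in [0, 1]")
    if p_real >= upper:
        return "real"
    if p_real <= lower:
        return "artifact"
    # Anything between the cutoffs is left for manual review.
    return "uncertain"


def summarize(probs, lower: float = 0.1, upper: float = 0.9) -> dict:
    """Count labels and report the share of calls that need no manual review."""
    counts = Counter(triage_label(p, lower, upper) for p in probs)
    total = len(probs)
    definitive = counts["real"] + counts["artifact"]
    return {
        "counts": dict(counts),
        "definitive_fraction": definitive / total if total else 0.0,
    }


# Test-set counts as reported in the abstract.
reported = {"real": 1252, "artifact": 4143, "uncertain": 192}
total_reported = sum(reported.values())
uncertain_share = reported["uncertain"] / total_reported
definitive_share = 1.0 - uncertain_share
```

With the reported counts, `total_reported` matches the 5587 test-set SNVs, and `uncertain_share` and `definitive_share` round to the 3.4% and 96.6% quoted in the abstract. In practice the cutoffs would be tuned on a validation set so that no confidently labeled call is wrong, at the cost of a larger "uncertain" group.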

