Machine learning approach yields epigenetic biomarkers of food allergy: A novel 13-gene signature to diagnose clinical reactivity v2 (protocols.io.x7pfrmn)

Abstract Background Glioblastoma (GBM), the most common and aggressive primary brain tumour in adults, has been classified into three subtypes: classical, mesenchymal and proneural. While the original classification relied on an 840 gene-set, further clarification on true GBM subtypes uses a 150-gene signature to accurately classify GBM into the three subtypes. We hypothesized whether a machine learning approach could be used to identify a smaller gene-set to accurately predict GBM subtype. Methods Using a supervised machine learning approach, extreme gradient boosting (XGBoost), we developed a classifier to predict the three subtypes of glioblastoma (GBM): classical, mesenchymal and proneural. We tested the classifier on in-house GBM tissue, cell lines and xenograft samples to predict their subtype. Results We identified the five most important genes for characterizing the three subtypes based on genes that often exhibited high Importance Scores in our XGBoost analyses. On average, this approach achieved 80.12% accuracy in predicting these three subtypes of GBM. Furthermore, we applied our five-gene classifier to successfully predict the subtype of GBM samples at our centre. Conclusion Our 5-gene set classifier is the smallest classifier to date that can predict GBM subtypes with high accuracy, which could facilitate the future development of a five-gene subtype diagnostic biomarker for routine assays in GBM samples.

Download Full-text

Identification of a Transcriptomic Prognostic Signature by Machine Learning Using a Combination of Small Cohorts of Prostate Cancer

Frontiers in Genetics ◽

10.3389/fgene.2020.550894 ◽

2020 ◽

Vol 11 ◽

Author(s):

Benjamin Vittrant ◽

Mickael Leclercq ◽

Marie-Laure Martin-Magniette ◽

Colin Collins ◽

Alain Bergeron ◽

...

Keyword(s):

Prostate Cancer ◽

Machine Learning ◽

Specific Antigen ◽

Gene Signature ◽

Learning Approach ◽

Gleason Grade ◽

Rna Seq ◽

Number Of Patients ◽

Heterogeneous Datasets ◽

Machine Learning Approach

Determining which treatment to provide to men with prostate cancer (PCa) is a major challenge for clinicians. Currently, the clinical risk-stratification for PCa is based on clinico-pathological variables such as Gleason grade, stage and prostate specific antigen (PSA) levels. But transcriptomic data have the potential to enable the development of more precise approaches to predict evolution of the disease. However, high quality RNA sequencing (RNA-seq) datasets along with clinical data with long follow-up allowing discovery of biochemical recurrence (BCR) biomarkers are small and rare. In this study, we propose a machine learning approach that is robust to batch effect and enables the discovery of highly predictive signatures despite using small datasets. Gene expression data were extracted from three RNA-Seq datasets cumulating a total of 171 PCa patients. Data were re-analyzed using a unique pipeline to ensure uniformity. Using a machine learning approach, a total of 14 classifiers were tested with various parameters to identify the best model and gene signature to predict BCR. Using a random forest model, we have identified a signature composed of only three genes (JUN, HES4, PPDPF) predicting BCR with better accuracy [74.2%, balanced error rate (BER) = 27%] than the clinico-pathological variables (69.2%, BER = 32%) currently in use to predict PCa evolution. This score is in the range of the studies that predicted BCR in single-cohort with a higher number of patients. We showed that it is possible to merge and analyze different small and heterogeneous datasets altogether to obtain a better signature than if they were analyzed individually, thus reducing the need for very large cohorts. This study demonstrates the feasibility to regroup different small datasets in one larger to identify a predictive genomic signature that would benefit PCa patients.

Download Full-text