Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches

2021 ◽  
Vol 129 ◽  
pp. 104171
Author(s):  
Fadoua Ben Azzouz ◽  
Bertrand Michel ◽  
Hamza Lasla ◽  
Wilfried Gouraud ◽  
Anne-Flore François ◽  
...  
2020 ◽  
Author(s):  
Fadoua Ben Azzouz ◽  
Bertrand Michel ◽  
Hamza Lasla ◽  
Wilfried Gouraud ◽  
Anne-Flore François ◽  
...  

Triple-negative breast cancer (TNBC) heterogeneity is one of the main impediments to precision medicine for this disease. Recent concordant transcriptomic studies have shown that TNBC can be split into at least three subtypes with potential therapeutic implications. Although a few studies have attempted to predict TNBC subtypes from transcriptomic data, subtyping was only partially sensitive and was limited by batch effects and dependence on a given dataset, which may hinder the transition to routine diagnostic testing. We therefore sought to build an absolute predictor (i.e. intra-patient diagnosis) based on a machine learning algorithm with a limited number of probes. To this end, we first introduced binary comparisons between probes within each patient (indicators) and based the predictive analysis on these transformed data. Probe selection was performed by combining filter and wrapper methods for variable selection using cross-validation. We then tested three prediction models (random forest, gradient boosting [GB] and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation allowed us to choose the best model consistently. The 50 selected indicators highlighted biological characteristics associated with each TNBC subtype, and the GB model based on this subset of indicators performed better than the other models.
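The probe binary comparisons described above can be sketched as follows. This is a minimal illustration assuming each indicator records whether one probe's expression exceeds another's in the same patient; the helper `pairwise_indicators`, the probe IDs and the values are hypothetical, and the abstract does not specify how probe pairs are chosen or how ties are handled. Because each indicator depends only on values measured within one sample, any strictly increasing transformation applied to a whole sample leaves the indicators unchanged, which is the property that makes an intra-patient predictor less sensitive to batch effects.

```python
from itertools import combinations


def pairwise_indicators(expression, probes):
    """Hypothetical helper: binary probe-vs-probe comparisons within one sample.

    expression: dict mapping probe ID -> expression value for a single patient.
    probes: probe IDs to compare; each unordered pair is evaluated once.
    Returns a dict mapping (probe_a, probe_b) -> 1 if probe_a > probe_b, else 0.
    Ties are scored 0 here; this is an assumption, not the paper's rule.
    """
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(probes, 2)
    }


# Toy values for three hypothetical probes in one patient.
sample = {"P1": 7.2, "P2": 5.1, "P3": 9.8}
print(pairwise_indicators(sample, ["P1", "P2", "P3"]))
# {('P1', 'P2'): 1, ('P1', 'P3'): 0, ('P2', 'P3'): 0}

# A strictly increasing, sample-wide distortion (e.g. a batch-level shift
# and rescaling) leaves every within-sample comparison unchanged.
distorted = {probe: 2.0 * value + 3.0 for probe, value in sample.items()}
print(pairwise_indicators(distorted, ["P1", "P2", "P3"])
      == pairwise_indicators(sample, ["P1", "P2", "P3"]))
# True
```

With p probes, evaluating all pairs yields p(p-1)/2 indicators, so a selection step such as the filter and wrapper methods mentioned above is needed to reduce them to a small panel.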


2021 ◽  
Vol 11 (2) ◽  
pp. 61
Author(s):  
Jiande Wu ◽  
Chindo Hicks

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple-negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple-negative breast cancer. Here we propose the use of a machine learning (ML) approach to classify triple-negative and non-triple-negative breast cancer patients using gene expression data. Methods: We analyzed RNA sequencing data from 110 triple-negative and 992 non-triple-negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used to develop and validate the classification models. We evaluated four classification models (Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree), training them on features selected at different threshold levels to classify the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified tumors into triple-negative and non-triple-negative breast cancer most accurately and had fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used to classify breast cancer into triple-negative and non-triple-negative types.
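Selecting features "at different threshold levels" can be illustrated with a simple univariate filter. The abstract does not name the statistic used; the sketch below assumes Welch's t-statistic between the triple-negative and non-triple-negative groups, and the function names, gene names and toy values are hypothetical. Each threshold yields a nested gene set on which a classifier (for example, one of the four models listed above) could then be trained and compared.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two independent samples (each needs >= 2 values)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def select_features(expr_tnbc, expr_non_tnbc, thresholds):
    """Hypothetical filter: keep genes whose |t| meets each threshold.

    expr_tnbc / expr_non_tnbc: dict gene -> list of per-sample expression values.
    Returns dict threshold -> sorted list of selected genes.
    """
    scores = {g: welch_t(expr_tnbc[g], expr_non_tnbc[g]) for g in expr_tnbc}
    return {
        thr: sorted(g for g, t in scores.items() if abs(t) >= thr)
        for thr in thresholds
    }


# Toy data for two hypothetical genes (not TCGA values).
tnbc = {"GENE_A": [1.0, 1.2, 0.8], "GENE_B": [3.0, 3.5, 2.5]}
non_tnbc = {"GENE_A": [5.0, 5.2, 4.8], "GENE_B": [3.1, 2.6, 3.6]}
print(select_features(tnbc, non_tnbc, thresholds=[2.0, 30.0]))
# {2.0: ['GENE_A'], 30.0: []}
```

Raising the threshold can only shrink the selected set, so sweeping it trades model size against how strongly each retained gene separates the two groups.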


2018 ◽  
Vol 17 (3) ◽  
pp. 251-259 ◽  
Author(s):  
Arjun P. Athreya ◽  
Alan J. Gaglio ◽  
Junmei Cairns ◽  
Krishna R. Kalari ◽  
Richard M. Weinshilboum ◽  
...  

2019 ◽  
Author(s):  
Yiqing Zhang ◽  
William Nock ◽  
Meghan Wyse ◽  
Zachary Weber ◽  
Elizabeth Adams ◽  
...  

Purpose: Metastatic relapse of triple-negative breast cancer (TNBC) within 2 years of diagnosis is associated with particularly aggressive disease and a distinct clinical course relative to TNBCs that relapse beyond 2 years. We hypothesized that rapid-relapse TNBCs (rrTNBC; metastatic relapse or death <2 years) reflect unique genomic features relative to late-relapse TNBCs (lrTNBC; >2 years). Patients and Methods: We identified 453 primary TNBCs from three publicly available datasets and characterized each as rrTNBC, lrTNBC, or ‘no relapse’ (nrTNBC: no relapse/death with at least 5 years of follow-up). We compiled primary tumor clinical and multi-omic data, including transcriptome (n=453), copy number alterations (CNAs; n=317), and mutations in 171 cancer-related genes (n=317), then calculated published gene expression and immune signatures. Results: Patients with rrTNBC had higher stage at diagnosis (chi-square p<0.0001), while lrTNBC were more likely to be of non-basal PAM50 subtype (chi-square p=0.03). Among 125 expression signatures, five immune signatures were significantly higher in nrTNBC, while lrTNBC were enriched for eight estrogen/luminal signatures (all FDR p<0.05). There was no significant difference in tumor mutation burden or percent genome altered across the groups. Among mutations, only TP53 mutations were significantly more frequent in rrTNBC than in lrTNBC (Fisher exact FDR p=0.009). To develop an optimal classifier, we used 77 significant clinical and ‘omic features to evaluate six modeling approaches encompassing simple, machine learning, and artificial neural network (ANN) models. Support vector machine outperformed the other models, with an average receiver operating characteristic area under the curve >0.75. Conclusions: We provide a new approach to define TNBCs based on timing of relapse. We identify distinct clinical and genomic features that can be incorporated into machine learning models to predict rapid relapse of TNBC.
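The relapse groups defined above, and the area under the ROC curve used to compare models, can be sketched as follows. The sketch assumes that death counts as an event for both the rapid- and late-relapse groups, and it leaves unassigned any case with an event at exactly 2 years or with less than 5 years of event-free follow-up. AUC is computed with the rank-based (Mann-Whitney) formulation. The function names and toy values are hypothetical and are not taken from the study.

```python
def relapse_group(event, years):
    """Assign a TNBC case to a relapse category (hypothetical helper).

    event: True if metastatic relapse or death was observed, False if censored.
    years: time to the event, or total follow-up when event is False.
    Returns "rrTNBC", "lrTNBC", "nrTNBC", or None when the case fits no group.
    """
    if event:
        if years < 2:
            return "rrTNBC"
        if years > 2:
            return "lrTNBC"
        return None  # exactly 2 years is not covered by the stated cut-offs
    # No event: only cases with at least 5 years of follow-up count as nrTNBC.
    return "nrTNBC" if years >= 5 else None


def roc_auc(scores, labels):
    """Rank-based ROC AUC: the probability that a random positive case
    scores higher than a random negative one (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(relapse_group(True, 1.4), relapse_group(True, 3.0),
      relapse_group(False, 6.5), relapse_group(False, 3.0))
# rrTNBC lrTNBC nrTNBC None

# Toy classifier scores; label 1 marks rapid relapse.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
# 0.75
```

An AUC of 0.5 corresponds to random ranking, so the reported average above 0.75 means a rapid-relapse case outranks a comparison case in roughly three of four random pairs.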


2021 ◽  
Vol 11 (9) ◽  
pp. 881
Author(s):  
Rassanee Bissanum ◽  
Sitthichok Chaichulee ◽  
Rawikant Kamolphiwong ◽  
Raphatphorn Navakanitworakul ◽  
Kanyanatt Kanokwiroon

Triple-negative breast cancer (TNBC) lacks well-defined molecular targets and is highly heterogeneous, making treatment challenging. Using gene expression analysis, TNBC has been classified into four subtypes: basal-like immune-activated (BLIA), basal-like immune-suppressed (BLIS), mesenchymal (MES), and luminal androgen receptor (LAR). However, there is currently no standardized method for classifying TNBC subtypes. We attempted to define a gene signature for each subtype and to develop a machine learning (ML)-based classification method for TNBC subtyping. Gene expression microarray data for TNBC patients were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) unique to each subtype were identified in 198 TNBC cases of known subtype and used as a training gene set for seven different classification models. The resulting training set consisted of 719 DEGs selected from the uniquely expressed genes of all four subtypes. The support vector machine (SVM) algorithm achieved the highest average classification accuracy for the BLIA, BLIS, MES, and LAR subtypes (accuracy 95–98.8%; AUC 0.99–1.00). For model validation, we used 334 samples of unknown TNBC subtype, of which 97 (29.04%), 73 (21.86%), 39 (11.68%) and 59 (17.66%) were predicted to be BLIA, BLIS, MES, and LAR, respectively. The remaining 66 TNBC samples (19.76%) could not be assigned to any subtype; these samples showed only three upregulated genes (EN1, PROM1, and CCL2). Each TNBC subtype had a unique gene expression pattern, which was confirmed by DEG identification and pathway analysis. These results indicate that our training gene set was suitable for developing classification models and that the SVM algorithm could classify TNBC into four unique subtypes. Accurate and consistent classification of TNBC subtypes is essential for personalized treatment and prognosis of TNBC.
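The validation percentages can be checked directly from the reported counts: the four predicted subtypes and the unassigned group together account for all 334 samples, and each percentage is the group count divided by 334. The short check below uses only the numbers stated in the abstract; the group label "unassigned" is introduced here for convenience.

```python
# Validation-set counts reported in the abstract (334 samples of unknown subtype).
counts = {"BLIA": 97, "BLIS": 73, "MES": 39, "LAR": 59, "unassigned": 66}

total = sum(counts.values())
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

print(total)   # 334
print(shares)  # {'BLIA': 29.04, 'BLIS': 21.86, 'MES': 11.68, 'LAR': 17.66, 'unassigned': 19.76}
```

The rounded shares sum to 100%, confirming that the reported subtype proportions and the unassigned fraction are mutually consistent.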


2018 ◽  
Vol 173 (2) ◽  
pp. 365-373 ◽  
Author(s):  
Tong Wu ◽  
Laith R. Sultan ◽  
Jiawei Tian ◽  
Theodore W. Cary ◽  
Chandra M. Sehgal
