Development of an absolute assignment predictor for triple-negative breast cancer subtyping using machine learning approaches

2020 ◽  
Author(s):  
Fadoua Ben Azzouz ◽  
Bertrand Michel ◽  
Hamza Lasla ◽  
Wilfried Gouraud ◽  
Anne-Flore François ◽  
...  

Abstract Triple-negative breast cancer (TNBC) heterogeneity is one of the main impediments to precision medicine for this disease. Recent concordant transcriptomic studies have shown that TNBC can be split into at least three subtypes with potential therapeutic implications. Although a few studies have attempted to predict TNBC subtype from transcriptomic data, subtyping was only partially sensitive and was limited by batch effects and dependence on a given dataset, which may hinder the transition to routine diagnostic testing. We therefore sought to build an absolute predictor (i.e. intra-patient diagnosis) based on a machine learning algorithm with a limited number of probes. To this end, we first introduced binary comparisons between probes within each patient (indicators), and based the predictive analysis on these transformed data. Probe selection was first performed by combining filter and wrapper variable-selection methods with cross-validation. We then tested three prediction models (random forest, gradient boosting [GB] and extreme gradient boosting) using this optimal subset of indicators as inputs. Nested cross-validation allowed us to choose the best model consistently. Results showed that the 50 selected indicators highlighted biological characteristics associated with each TNBC subtype. The GB model based on this subset of indicators performed better than the other models.
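
The abstract does not spell out how the per-patient indicators are built. A minimal sketch, assuming each indicator is simply 1 when one probe is more highly expressed than another within the same patient and 0 otherwise (probe IDs and values below are hypothetical), shows why such features depend only on the within-sample ordering and are therefore unaffected by a per-sample shift or positive rescaling:

```python
from itertools import combinations


def pairwise_indicators(expression: dict[str, float]) -> dict[tuple[str, str], int]:
    """Binary probe-pair indicators computed within a single patient.

    For every unordered pair of probes (a, b), with a before b in sorted
    order, the indicator is 1 if probe a is more highly expressed than
    probe b in this sample, and 0 otherwise. Only the within-sample
    ordering matters, so any strictly increasing transform of the profile
    (e.g. adding a constant or multiplying by a positive factor) leaves the
    indicators unchanged.
    """
    probes = sorted(expression)
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(probes, 2)
    }


# Hypothetical probe IDs and expression values, for illustration only.
sample = {"p1": 5.2, "p2": 7.9, "p3": 3.1}
print(pairwise_indicators(sample))
# {('p1', 'p2'): 0, ('p1', 'p3'): 1, ('p2', 'p3'): 1}
```

With n probes this produces n(n − 1)/2 indicators, which is why a selection step (here, down to 50 indicators) is needed before fitting the models.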

2021 ◽  
Vol 129 ◽  
pp. 104171
Author(s):  
Fadoua Ben Azzouz ◽  
Bertrand Michel ◽  
Hamza Lasla ◽  
Wilfried Gouraud ◽  
Anne-Flore François ◽  
...  

2021 ◽  
Author(s):  
Yafei Wu ◽  
Zhongquan Jiang ◽  
Shaowu Lin ◽  
Ya Fang

BACKGROUND Prediction of stroke based on individuals’ risk factors, especially for a first stroke event, is of great significance for primary prevention in high-risk populations. OBJECTIVE This study aimed to investigate the applicability of machine learning, compared with a statistical model, for predicting stroke onset in older adults. METHODS A total of 5960 participants consecutively surveyed from 2011 to 2013 in the China Health and Retirement Longitudinal Study were included in the analysis. We constructed a traditional logistic regression (LR) model and two machine learning models, namely random forest (RF) and extreme gradient boosting (XGBoost), to identify stroke onset using epidemiological and clinical variables. Grid search and 10-fold cross-validation were used to tune hyperparameters. Model performance was assessed by discrimination, calibration, decision curve and predictiveness curve analysis. RESULTS Of the 5960 participants, 131 (2.20%) developed stroke after an average follow-up of 2 years. Our prediction models distinguished stroke from non-stroke with excellent performance. The AUCs of the machine learning models (RF, 0.823 [95% CI, 0.759-0.886]; XGBoost, 0.808 [95% CI, 0.730-0.886]) were significantly higher than that of LR (0.718 [95% CI, 0.649-0.787], p<0.05). No significant difference was observed between RF and XGBoost (p>0.05). All prediction models were well calibrated, with Brier scores of approximately 0.020. In the decision curve and predictiveness curve analyses, XGBoost had much higher net benefits over a wider threshold range and was more capable of recognizing high-risk individuals. Biomarker information was more useful for stroke prediction than epidemiological data. CONCLUSIONS Machine learning, especially XGBoost, has the potential to predict stroke onset among the elderly in this population-based study.
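
Two of the evaluation tools named above reduce to short formulas. A minimal sketch (not the authors' code; the toy outcomes and risks are made up) of the Brier score and of the standard decision-curve net benefit at a threshold probability p_t, NB = TP/n − FP/n × p_t/(1 − p_t):

```python
def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: list[int], y_prob: list[float], threshold: float) -> float:
    """Decision-curve net benefit of treating everyone with risk >= threshold.

    NB = TP/n - FP/n * p_t / (1 - p_t), where p_t is the threshold probability.
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy outcomes and predicted risks (not data from the study).
y = [1, 0, 0, 0]
p = [0.8, 0.3, 0.1, 0.05]
print(round(brier_score(y, p), 6))       # 0.035625
print(round(net_benefit(y, p, 0.2), 6))  # 0.1875

# Predicting the observed base rate for everyone gives Brier = prev * (1 - prev).
prev = 131 / 5960
print(round(brier_score([1] * 131 + [0] * 5829, [prev] * 5960), 4))  # 0.0215
```

The last line is a useful reference point for a rare outcome: an uninformative model that predicts the 2.20% incidence for every participant already scores about 0.0215, so a Brier score near 0.020 is best read alongside the discrimination and decision-curve results.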


2020 ◽  
Author(s):  
Si-Qiao Liang ◽  
Jian-Xiong Long ◽  
Jingmin Deng ◽  
Xuan Wei ◽  
Mei-Ling Yang ◽  
...  

Abstract Asthma is a serious immune-mediated respiratory airway disease. Its pathological processes involve genetics and the environment, but its pathogenesis remains unclear. To understand the risk factors for asthma, we combined genome-wide association study (GWAS) risk loci and clinical data to predict asthma using machine learning approaches. A case–control study with 123 asthma patients and 100 healthy controls was conducted in the Zhuang population in Guangxi. GWAS risk loci were detected using polymerase chain reaction, and clinical data were collected. Machine learning approaches (e.g., extreme gradient boosting [XGBoost], decision tree, support vector machine, and random forest algorithms) were used to identify the major factors contributing to asthma. A total of 14 GWAS risk loci, together with clinical data, were analyzed using 10 repetitions of 10-fold cross-validation for all machine learning models. Using GWAS risk loci or clinical data alone, the best performances were area under the curve (AUC) values of 64.3% and 71.4%, respectively. Combining GWAS risk loci and clinical data, XGBoost established the best model, with an AUC of 79.7%, indicating that combining genetic and clinical data can improve performance. We then ranked feature importance and found that the top six risk factors for predicting asthma were rs3117098, rs7775228, family history, rs2305480, rs4833095, and body mass index. Asthma prediction models based on GWAS risk loci and clinical data can accurately predict asthma and thus provide insights into its pathogenesis. Further research is required to evaluate more genetic markers and clinical data for predicting asthma risk.
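
"10 times of 10-fold cross-validation" means repeating a shuffled 10-fold split 10 times, giving 100 train/test evaluations. The abstract does not say whether folds were stratified or how they were seeded, so the following is a minimal sketch of plain (unstratified) repeated k-fold splitting, assuming a fixed random seed; `repeated_kfold` is a hypothetical helper name:

```python
import random
from typing import Iterator


def repeated_kfold(
    n_samples: int, k: int = 10, repeats: int = 10, seed: int = 0
) -> Iterator[tuple[int, int, list[int], list[int]]]:
    """Yield (repeat, fold, train_idx, test_idx) for `repeats` shuffled k-fold splits.

    Within each repeat the sample indices are shuffled once and dealt into k
    disjoint folds, so every sample is held out exactly once per repeat.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for f in range(k):
            test = sorted(idx[f::k])
            held_out = set(test)
            train = [i for i in range(n_samples) if i not in held_out]
            yield r, f, train, test


# 123 cases + 100 controls = 223 subjects, as in the study design.
splits = list(repeated_kfold(223))
print(len(splits))  # 100
```

Averaging a metric such as AUC over the 100 held-out folds reduces the variance that a single random 10-fold partition would introduce. In a case–control design it is common to stratify the folds by class so that each keeps roughly the 123:100 ratio; scikit-learn's `RepeatedStratifiedKFold` provides that variant.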


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

Abstract Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as those of classical methods, they are often disregarded because they lack transparency and offer little to no explainability, both of which are key to their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry on 36,658 non-metastatic breast cancer patients to compare the performance of CPH with that of ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival, using the c-index. We demonstrated that in our dataset, ML-based models can perform at least as well as classical CPH regression (c-index ∼ 0.63), and in the case of XGB even better (c-index ∼ 0.73). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions, as well as the corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial for increasing trust in, and adoption of, innovative ML techniques in oncology and healthcare overall.
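
The comparison above is expressed as a c-index. As a reminder of what that number measures, here is a minimal sketch of Harrell's concordance index for right-censored data; the paper's exact variant and implementation are not stated in the abstract, and this simplified version skips pairs with tied survival times:

```python
def harrell_c_index(
    times: list[float], events: list[int], risk_scores: list[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter time than subject j. It is concordant when subject i also has
    the higher predicted risk; ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the registry): higher risk should mean earlier events.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # the subject at time 6 is censored
print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.1]))  # 1.0
print(harrell_c_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

A c-index of 0.5 corresponds to ordering patients at random and 1.0 to perfect ordering, which is the scale on which the reported values of about 0.63 and 0.73 should be read.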


2021 ◽  
Vol 11 (2) ◽  
pp. 61
Author(s):  
Jiande Wu ◽  
Chindo Hicks

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple-negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple-negative breast cancer. Here we propose the use of a machine learning (ML) approach to classify triple-negative and non-triple-negative breast cancer patients using gene expression data. Methods: We analyzed RNA-sequencing data from 110 triple-negative and 992 non-triple-negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used to develop and validate the classification models. We evaluated four classification models, Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree, using features selected at different threshold levels to train the models to classify the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple-negative and non-triple-negative types more accurately and had fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used to classify breast cancer into triple-negative and non-triple-negative types.


2018 ◽  
Vol 17 (3) ◽  
pp. 251-259 ◽  
Author(s):  
Arjun P. Athreya ◽  
Alan J. Gaglio ◽  
Junmei Cairns ◽  
Krishna R. Kalari ◽  
Richard M. Weinshilboum ◽  
...  

Biomolecules ◽  
2020 ◽  
Vol 10 (9) ◽  
pp. 1295
Author(s):  
Archana P. Thankamony ◽  
Reshma Murali ◽  
Nitheesh Karthikeyan ◽  
Binitha Anu Varghese ◽  
Wee S. Teo ◽  
...  

The basic helix-loop-helix (bHLH) transcription factors inhibitor of differentiation 1 (Id1) and inhibitor of differentiation 3 (Id3) (referred to as Id) have an important role in maintaining the cancer stem cell (CSC) phenotype in the triple-negative breast cancer (TNBC) subtype. In this study, we aimed to understand the molecular mechanism underlying Id control of the CSC phenotype and to exploit it for therapeutic purposes. We used two different TNBC tumor models, marked by either Id depletion or Id1 expression, to identify Id targets through a combined analysis of RNA sequencing and microarray data. Phenotypically, Id protein depletion leads to cell cycle arrest in the G0/G1 phase, which we demonstrate is reversible. To understand the molecular basis of the effect of Id proteins on the cell cycle, we carried out a large-scale small interfering RNA (siRNA) screen of 61 putative targets identified by genomic analysis of the two Id TNBC tumor models. Kinesin Family Member 11 (Kif11) and Aurora Kinase A (Aurka), which are critical cell cycle regulators, were further validated as Id targets. Interestingly, unlike Id depletion, Kif11 and Aurka knockdown leads to a G2/M arrest, suggesting a novel Id cell cycle mechanism, which we will explore in further studies. Therapeutic targeting of Kif11 to block the Id1–Kif11 axis was carried out using the small-molecule inhibitor ispinesib. Finally, we leveraged our findings to target the Id/Kif11 pathway in Id+ CSCs using ispinesib in combination with chemotherapy, to achieve a better response in TNBC subtypes. This work opens up exciting new possibilities for targeting Id targets such as Kif11 in the TNBC subtype, which is currently refractory to chemotherapy. Targeting the Id1–Kif11 molecular pathway in Id1+ CSCs with a small-molecule inhibitor in combination with chemotherapy results in more effective debulking of TNBC.


2020 ◽  
Vol 5 (8) ◽  
pp. 62
Author(s):  
Clint Morris ◽  
Jidong J. Yang

Generating meaningful inferences from crash data is vital to improving highway safety. Classic statistical methods are fundamental to crash data analysis and are often valued for their interpretability. However, given the complexity of crash mechanisms and the associated heterogeneity, classic statistical methods, which lack versatility, may not be sufficient for granular crash analysis because of the high-dimensional features involved in crash-related data. In contrast, machine learning approaches, which are more flexible in structure and capable of harnessing the richer data sources available today, emerge as a suitable alternative. With the aid of new methods for model interpretation, complex machine learning models, previously considered enigmatic, can be properly interpreted. In this study, two modern machine learning techniques, Linear Discriminant Analysis and eXtreme Gradient Boosting, were explored to classify three major types of multi-vehicle crashes (i.e., rear-end, same-direction sideswipe, and angle) that occurred on Interstate 285 in Georgia. The study demonstrated the utility and versatility of modern machine learning methods in the context of crash analysis, particularly for understanding the potential features underlying different crash patterns on freeways.

