Modeling the Factors Associated with Mortality in Patients with Breast Cancer: A Machine Learning Approach

2020 ◽  
Author(s):  
Mohammad Asghari Jafarabadi ◽  
Zeynab Iraji ◽  
Roya Dolatkhah ◽  
Tohid Jafari Koshki

Abstract Background: Breast cancer (BC) was the fifth leading cause of death worldwide in 2015 and the second leading cause of death in Iran in 2012. This study aimed to model the factors associated with mortality in patients with BC using a machine learning approach. Methods: We used data on patients with primary BC during 2007-2016 in Tabriz, Iran. The data were analyzed using decision tree (DT), boosted tree (BT), random forest (RF), k-nearest neighbors (KNN) and generalized additive model (GAM) methods, combined with the inverse probability of censoring weighting (IPCW) technique, to assess the risk factors for mortality. The models were compared using diagnostic accuracy measures. Results: Accuracy of the models ranged from 76.0 to 93.0%, with sensitivity of 82.5-98.8% and specificity of 72.2-99.4%. The GAM fit the data best, with accuracy of 93.0% (95% CI: [90.5, 95.0]), sensitivity of 98.8% (95% CI: [96.9, 99.7]) and specificity of 84.3% (95% CI: [78.8, 88.9]); in this model the non-linear effects of age (p-value = 0.006), grade (p-value = 0.024) and time to event (p-value < 0.001) on mortality were significant. Conclusion: The GAM appears to be an optimal model for classifying mortality in patients with BC. Taking into account time to event, age and grade, the prognostic factors identified by the GAM, may allow more accurate prevention planning.
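
The diagnostic accuracy measures used to compare the models can be illustrated with a minimal sketch. It assumes a binary outcome (death vs. survival) and a plain confusion matrix; the class name `ConfusionCounts`, the function `diagnostic_measures` and the example counts are hypothetical and are not taken from the study. The sketch ignores the IPCW weighting the authors applied.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cell counts of a 2x2 confusion matrix (hypothetical container)."""
    tp: int  # deaths correctly predicted as deaths
    fn: int  # deaths predicted as survivals
    tn: int  # survivals correctly predicted as survivals
    fp: int  # survivals predicted as deaths


def diagnostic_measures(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity and accuracy as proportions."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "accuracy": (c.tp + c.tn) / total,    # share of correct predictions
    }


# Illustrative counts only; these are NOT the study's data.
example = ConfusionCounts(tp=90, fn=10, tn=80, fp=20)
measures = diagnostic_measures(example)
print(measures)
```

Sensitivity and specificity are reported separately because a model can reach high accuracy on imbalanced data while still missing most events in the smaller class.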


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Pratyusha Rakshit ◽  
Onintze Zaballa ◽  
Aritz Pérez ◽  
Elisa Gómez-Inhiesto ◽  
Maria T. Acaiturri-Ayesta ◽  
...  

Abstract This paper presents a novel machine learning approach for early prediction of the healthcare cost of breast cancer patients. The learning phase of our prediction method has two steps: (1) the patients are first clustered according to their sequences of actions, so that each group contains patients undergoing similar clinical activities and incurring similar healthcare costs, and (2) a Markov chain is then learned for each group to describe the action sequences of the patients in the cluster. The prediction phase also has two steps: (1) the healthcare cost of a new patient's treatment is first estimated from the average healthcare cost of the patient's k-nearest neighbors in each group, and (2) an aggregate of the costs estimated by each group is then used as the final predicted cost. Experiments show a mean absolute percentage error as small as 6%, even when only half of a patient's clinical records are available, substantiating the early prediction capability of the proposed method. Comparative analysis substantiates the superiority of the proposed algorithm over state-of-the-art techniques.
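
Two ingredients of this abstract can be sketched briefly: learning a Markov chain from the action sequences of one group, and the mean absolute percentage error (MAPE) used for evaluation. The sketch assumes a first-order chain fitted by simple transition counting, which the abstract does not specify. The function names, action labels and cost values are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import Counter, defaultdict


def fit_markov_chain(sequences):
    """Estimate first-order transition probabilities by counting
    consecutive action pairs (assumed estimator, for illustration)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values())
                for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; assumes every actual value is non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Made-up action sequences for one hypothetical patient group.
group_sequences = [
    ["visit", "test", "surgery"],
    ["visit", "test", "chemo"],
    ["visit", "surgery"],
]
chain = fit_markov_chain(group_sequences)
print(chain["visit"])

# Made-up actual and predicted costs.
mape = mean_absolute_percentage_error([100.0, 200.0], [110.0, 180.0])
print(mape)
```

A full reproduction would additionally need the clustering step and a distance between partial action sequences for the k-nearest-neighbor estimate, neither of which is described in the abstract.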


Cancers ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 431 ◽  
Author(s):  
Oneeb Rehman ◽  
Hanqi Zhuang ◽  
Ali Muhamed Ali ◽  
Ali Ibrahim ◽  
Zhongwei Li

Certain small noncoding microRNAs (miRNAs) are differentially expressed in normal tissues and cancers, which makes them strong candidate biomarkers for cancer. Previously, a selected subset of miRNAs has been experimentally verified to be linked to breast cancer. In this paper, we validated the importance of these miRNAs using a machine learning approach on miRNA expression data. We performed feature selection, using Information Gain (IG), Chi-Squared (CHI2) and Least Absolute Shrinkage and Selection Operator (LASSO), on the set of these relevant miRNAs to rank them by importance. We then performed cancer classification with these miRNAs as features, using Random Forest (RF) and Support Vector Machine (SVM) classifiers. Our results demonstrated that the miRNAs ranked higher by our analysis yielded higher classifier performance. Performance declined as the rank of the miRNA decreased, confirming that these miRNAs had different degrees of importance as biomarkers. Furthermore, we discovered that using a minimum of three miRNAs as biomarkers for breast cancers can be as effective as using the entire set of 1800 miRNAs. This work suggests that machine learning is a useful tool for functional studies of miRNAs for cancer detection and diagnosis.
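
Ranking features by Information Gain, one of the three selection methods named above, can be sketched as follows. The sketch assumes expression levels have already been discretized into categories, which the abstract does not state. The feature names `mir_a` and `mir_b`, the labels and the helper functions are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return feature names sorted from most to least informative."""
    return sorted(features,
                  key=lambda name: information_gain(features[name], labels),
                  reverse=True)


# Made-up data: 1 = tumor, 0 = normal; expression levels pre-discretized.
labels = [1, 1, 0, 0]
features = {
    "mir_b": ["high", "low", "high", "low"],   # unrelated to the label
    "mir_a": ["high", "high", "low", "low"],   # separates the classes
}
ranking = rank_features(features, labels)
print(ranking)
```

In this toy case `mir_a` splits the samples perfectly and so ranks first, while `mir_b` carries no information about the label.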

