Comparison of Machine Learning Algorithms in Breast Cancer Prediction Using the Coimbra Dataset

Author(s): Yolanda D Austria, Marie Luvett Goh, Lorenzo Sta. Maria Jr., Jay-Ar Lalata, Joselito Eduard Goh, ...

Author(s): Akshya Yadav, Imlikumla Jamir, Raj Rajeshwari Jain, Mayank Sohani

Cancer is one of the leading causes of death in humans. Breast cancer, a subtype of cancer, causes death in one out of every eight women worldwide. One way to counter this is early and accurate diagnosis, which enables faster treatment, but achieving such accuracy in a short span of time is difficult with existing techniques. In this paper, several machine learning algorithms that physicians can use as tools for the early and effective detection and prediction of cancerous cells are studied and introduced: artificial neural network (ANN), decision tree (DT), random forest (RF), naive Bayes classifier (NBC), support vector machine (SVM), and k-nearest neighbors (KNN). These algorithms are trained on a dataset containing parameters that describe the tumor of a person with breast cancer, and are then used to classify and predict whether a cell is cancerous.
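Of the algorithms listed, KNN is the simplest to write out in full. The following is a minimal sketch of a k-nearest-neighbors classifier in plain Python, assuming numeric tumor features that have already been scaled to comparable ranges. The function name `knn_predict`, the default `k=3`, and the toy data are hypothetical illustrations. They are not the authors' implementation or their dataset.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points.

    Hypothetical helper for illustration; labels here are 0 = benign, 1 = malignant.
    Assumes features are already scaled, since Euclidean distance is scale-sensitive.
    """
    # Pair each training point's Euclidean distance to x with its label, nearest first
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data (not from any real breast cancer dataset)
train_X = [[1, 1], [1, 2], [5, 5], [6, 5]]
train_y = [0, 0, 1, 1]
print(knn_predict(train_X, train_y, [5.5, 5]))
```

Odd values of `k` avoid voting ties in the binary case. In practice `k` would be tuned by cross-validation rather than fixed.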


2021, Vol 191, pp. 487-492
Author(s): Mohammed Amine Naji, Sanaa El Filali, Kawtar Aarika, EL Habib Benlahmar, Rachida Ait Abdelouhahid, ...

DOI: 10.2196/17364, 2020, Vol 8 (6), e17364
Author(s): Can Hou, Xiaorong Zhong, Ping He, Bin Xu, Sha Diao, ...

Background: Risk-based breast cancer screening is a cost-effective intervention for controlling breast cancer in China, but its successful implementation requires an accurate breast cancer prediction model for Chinese women.
Objective: This study aimed to evaluate and compare the performance of four machine learning algorithms in predicting breast cancer among Chinese women using 10 breast cancer risk factors.
Methods: A dataset of 7127 breast cancer cases and 7127 matched healthy controls was used for model training and testing. We used repeated 5-fold cross-validation and calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy as measures of model performance.
Results: The three novel machine learning algorithms (XGBoost, random forest, and deep neural network) all achieved significantly higher AUC, sensitivity, and accuracy than logistic regression. Among these three, XGBoost (AUC 0.742) outperformed the deep neural network (AUC 0.728) and random forest (AUC 0.728). Main residence, number of live births, menopause status, age, and age at first birth were the top-ranked variables in all three novel algorithms.
Conclusions: The novel machine learning algorithms, especially XGBoost, can be used to develop breast cancer prediction models that help identify women at high risk of breast cancer in developing countries.
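The evaluation protocol described in the Methods can be sketched in plain Python. This sketch assumes binary labels (1 = case, 0 = control) and a model that outputs one risk score per participant. The helper names, the number of repeats (`repeats=10`), and the unstratified fold assignment are hypothetical choices, because the abstract does not specify them. AUC is computed through its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This quantity equals the area under the ROC curve. The pairwise loop is quadratic, so it suits small examples better than full-scale runs.

```python
import random


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate among cases
        "specificity": tn / (tn + fp),  # true negative rate among controls
        "accuracy": (tp + tn) / len(y_true),
    }


def auc(y_true, scores):
    """AUC as P(score of a random case > score of a random control); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold_indices(n, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent shuffles split into k folds.

    repeats=10 is an assumed default; the abstract does not state the number of repeats.
    Folds are not stratified by label here (a simplifying assumption).
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


print(confusion_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.4]))
```

In a full evaluation, each model would be fit on `train`, scored on `test`, and the per-fold metrics would be averaged over all folds and repeats. For matched case-control data, stratifying or grouping folds by matched pair would be a natural refinement.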


2019
Author(s): Can Hou, Xiaorong Zhong, Ping He, Bin Xu, Sha Diao, ...


