Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services

In recent years, the rising incidence of different cancers has made various data sources available in this field. Consequently, many researchers have become interested in discovering useful knowledge from the available data to support faster decision-making by doctors and reduce the negative consequences of such diseases. Data mining includes a set of techniques for discovering knowledge from data: detecting hidden patterns and finding unknown relations. However, these techniques face several challenges with real-world data. In particular, dealing with inconsistencies, errors, noise, and missing values requires appropriate preprocessing and data preparation procedures. In this article, we investigate the impact of preprocessing on providing high-quality data for classification techniques. A wide range of preprocessing and data preparation methods is studied, and a set of preprocessing steps is applied to obtain appropriate classification results. The preprocessing is done on a real-world breast cancer dataset from the Reza Radiation Oncology Center in Mashhad, which has various features and a high percentage of null values, and the results are reported in this article. To evaluate the impact of the preprocessing steps on the results of classification algorithms, this case study was divided into the following 3 experiments: (1) breast cancer recurrence prediction without data preprocessing; (2) breast cancer recurrence prediction by error removal; and (3) breast cancer recurrence prediction by error removal and filling null values. Then, in each experiment, dimensionality reduction techniques are used to select a suitable subset of features for the problem at hand. Breast cancer recurrence prediction models are constructed using 3 widely used classification algorithms, namely, naïve Bayes, k-nearest neighbor, and sequential minimal optimization. The experiments are evaluated in terms of accuracy, sensitivity, F-measure, precision, and G-mean. Our results show that recurrence prediction is significantly improved after data preprocessing, especially in terms of sensitivity, F-measure, precision, and G-mean.
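The five evaluation measures named above can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' implementation: the function names and the `positive=1` label for recurrence are hypothetical, and G-mean is assumed to be the common definition, the geometric mean of sensitivity and specificity, which the abstract does not spell out.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0

def recurrence_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, precision, F-measure, and G-mean.

    Assumption: G-mean = sqrt(sensitivity * specificity).
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = safe_div(tp, tp + fn)   # recall on the recurrence class
    specificity = safe_div(tn, tn + fp)   # recall on the non-recurrence class
    precision = safe_div(tp, tp + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": safe_div(2 * precision * sensitivity,
                              precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

On an imbalanced label set, a classifier can score high accuracy while missing most recurrences, which is one reason sensitivity and G-mean are typically reported alongside accuracy in studies like this one.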

A Hybrid multi-stage Learning technique based on Brain Storming Optimization algorithm for Breast Cancer Recurrence Prediction

Journal of King Saud University - Computer and Information Sciences; doi:10.1016/j.jksuci.2021.05.004; 2021

Author(s): Maram Alwohaibi, Malek Alzaqebah, Noura M. Alotaibi, Abeer M. Alzahrani, Mariem Zouch

Keyword(s): Breast Cancer, Optimization Algorithm, Cancer Recurrence, Breast Cancer Recurrence, Multi Stage, Recurrence Prediction, Learning Technique

Breast Cancer Recurrence Prediction Model Using Machine Learning Technique: State of the Art, Challenges and Future Direction

doi:10.1109/icrito51393.2021.9596179; 2021

Author(s): Mohan Kumar, Sunil Kumar Khatri, Masoud Mohammadian

Keyword(s): Breast Cancer, Machine Learning, Prediction Model, State Of The Art, Cancer Recurrence, Breast Cancer Recurrence, Machine Learning Technique, Recurrence Prediction, Learning Technique, Future Direction

The Impact of Bariatric Surgery on Breast Cancer Recurrence: Case Series and Review of Literature

Obesity Surgery; doi:10.1007/s11695-019-04099-6; 2019; Vol 30 (2); pp. 780-785; Cited By ~ 2

Author(s): Shijia Zhang, Sayeed Ikramuddin, Heather C. Beckwith, Adam C. Sheka, Keith M. Wirth, ...

Keyword(s): Breast Cancer, Bariatric Surgery, Case Series, Cancer Recurrence, Breast Cancer Recurrence, Review Of Literature, The Impact

Predicting breast cancer recurrence using principal component analysis as feature extraction: an unbiased comparative analysis

International Journal of Advances in Intelligent Informatics; doi:10.26555/ijain.v6i3.462; 2020; Vol 6 (3); pp. 313

Author(s): Zuhaira Muhammad Zain, Mona Alshenaifi, Abeer Aljaloud, Tamadhur Albednah, Reham Alghanim, ...

Keyword(s): Breast Cancer, Data Mining, Principal Component Analysis, Feature Extraction, Medical Information, Cancer Recurrence, Principal Component, Component Analysis, Breast Cancer Recurrence, F Measure

Breast cancer recurrence is among the greatest fears faced by women. With modern advances in data mining technology, however, early recurrence prediction can help relieve these fears. Although medical information is typically complicated, and narrowing searches to the most relevant input is challenging, sophisticated data mining techniques promise accurate predictions from high-dimensional data. This study compares the performance of three established data mining algorithms, naïve Bayes (NB), k-nearest neighbor (KNN), and fast decision tree (REPTree), for predicting breast cancer recurrence, with principal component analysis (PCA) as the feature extraction algorithm. Models built with and without PCA were compared. KNN produced better predictions without PCA (F-measure = 72.1%), whereas the other two techniques, NB and REPTree, improved when used with PCA (F-measure = 76.1% and 72.8%, respectively). This study can benefit the healthcare industry by assisting physicians in predicting breast cancer recurrence precisely.
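To illustrate what PCA contributes as a feature extraction step before a classifier, here is a minimal sketch that extracts only the first principal component using power iteration on the covariance matrix. It is an illustration under stated assumptions, not the study's pipeline: the abstract does not describe the tooling used, the function names are hypothetical, and a practical analysis would keep several components and rely on a numerical library.

```python
import math

def center(rows):
    """Subtract the per-feature mean; return centered rows and the means."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return centered, means

def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def first_component(cov, iters=200):
    """Approximate the leading eigenvector of cov by power iteration.

    Caveat: the uniform start vector fails if it happens to be exactly
    orthogonal to the leading eigenvector; a random start avoids that.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    return v

def project(centered, v):
    """Score each centered row along direction v (one extracted feature)."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in centered]
```

The projected scores would replace the original features as classifier input. Whether that helps depends on the classifier, which is consistent with the mixed result reported above: KNN did better without PCA, while NB and REPTree did better with it.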
