THE EFFECT OF DATASETS ON BREAST CANCER DETECTION MODELS

2021 ◽  
Vol 4 (4) ◽  
pp. 309-315
Author(s):  
Kumawuese Jennifer Kurugh ◽  
Muhammad Aminu Ahmad ◽  
Awwal Ahmad Babajo

Datasets are a major requirement in the development of breast cancer classification/detection models using machine learning algorithms. These models can provide an effective, accurate and less expensive diagnosis method and reduce life losses. However, using the same machine learning algorithms on different datasets yields different results. This research developed several machine learning models for breast cancer classification/detection using Random forest, support vector machine, K Nearest Neighbors, Gaussian Naïve Bayes, Perceptron and Logistic regression. Three widely used test data sets were used; Wisconsin Breast Cancer (WBC) Original, Wisconsin Diagnostic Breast Cancer (WDBC) and Wisconsin Prognostic Breast Cancer (WPBC). The results show that datasets affect the performance of machine learning classifiers. Also, the machine learning classifiers have different performances with a given breast cancer dataset

2021 ◽  
Vol 23 (11) ◽  
pp. 749-758
Author(s):  
Saranya N ◽  
◽  
Kavi Priya S ◽  

Breast Cancer is one of the chronic diseases occurred to human beings throughout the world. Early detection of this disease is the most promising way to improve patients’ chances of survival. The strategy employed in this paper is to select the best features from various breast cancer datasets using a genetic algorithm and machine learning algorithm is applied to predict the outcomes. Two machine learning algorithms such as Support Vector Machines and Decision Tree are used along with Genetic Algorithm. The proposed work is experimented on five datasets such as Wisconsin Breast Cancer-Diagnosis Dataset, Wisconsin Breast Cancer-Original Dataset, Wisconsin Breast Cancer-Prognosis Dataset, ISPY1 Clinical trial Dataset, and Breast Cancer Dataset. The results exploit that SVM-GA achieves higher accuracy of 98.16% than DT-GA of 97.44%.


2020 ◽  
Vol 4 (2) ◽  
pp. 535-544
Author(s):  
Djihane HOUFANI ◽  
◽  
Sihem SLATNIA ◽  
Okba KAZAR ◽  
Noureddine ZERHOUNI ◽  
...  

Background: The second leading deadliest disease affecting women worldwide, after lung cancer, is breast cancer. Traditional approaches for breast cancer diagnosis suffer from time consumption and some human errors in classification. To deal with this problems, many research works based on machine learning techniques are proposed. These approaches show their effectiveness in data classification in many fields, especially in healthcare. Methods: In this cross sectional study, we conducted a practical comparison between the most used machine learning algorithms in the literature. We applied kernel and linear support vector machines, random forest, decision tree, multi-layer perceptron, logistic regression, and k-nearest neighbors for breast cancer tumors classification. The used dataset is Wisconsin diagnosis Breast Cancer. Results: After comparing the machine learning algorithms efficiency, we noticed that multilayer perceptron and logistic regression gave the best results with an accuracy of 98% for breast cancer classification. Conclusion: Machine learning approaches are extensively used in medical prediction and decision support systems. This study showed that multilayer perceptron and logistic regression algorithms are performant ( good accuracy specificity and sensitivity) compared to the other evaluated algorithms.


Author(s):  
Peter T. Habib ◽  
Alsamman M. Alsamman ◽  
Sameh E. Hassnein ◽  
Ghada A. Shereif ◽  
Aladdin Hamwieh

Abstractin 2019, estimated New Cases 268.600, Breast cancer has one of the most common cancers and is one of the world’s leading causes of death for women. Classification and data mining is an efficient way to classify information. Particularly in the medical field where prediction techniques are commonly used for early detection and effective treatment in diagnosis and research.These paper tests models for the mammogram analysis of breast cancer information from 23 of the more widely used machine learning algorithms such as Decision Tree, Random forest, K-nearest neighbors and support vector machine. The spontaneously splits results are distributed from a replicated 10-fold cross-validation method. The accuracy calculated by Regression Metrics such as Mean Absolute Error, Mean Squared Error, R2 Score and Clustering Metrics such as Adjusted Rand Index, Homogeneity, V-measure.accuracy has been checked F-Measure, AUC, and Cross-Validation. Thus, proper identification of patients with breast cancer would create care opportunities, for example, the supervision and the implementation of intervention plans could benefit the quality of long-term care. Experimental results reveal that the maximum precision 100%with the lowest error rate is obtained with Ada-boost Classifier.


2021 ◽  
Vol 7 ◽  
pp. e390
Author(s):  
Shafaq Abbas ◽  
Zunera Jalil ◽  
Abdul Rehman Javed ◽  
Iqra Batool ◽  
Mohammad Zubair Khan ◽  
...  

Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for a patient as they have to go through expensive and painful treatments to fight this cancer. One in eight women all over the world is affected by this disease. Almost half a million women annually do not survive this fight and die from this disease. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on the previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all with the highest accuracy rate of 99.30% followed by SVM achieving 98.60% accuracy. Experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.


Sign in / Sign up

Export Citation Format

Share Document