Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast Cancer (BC) is among the most common and leading causes of death in women throughout the world. Classification and data analysis tools are now widely used in the medical field for diagnosis, prognosis and decision making, to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods have given hope to patients, as they have helped doctors detect potentially fatal diseases like Breast Cancer early and with accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which together determine the strength of the machine learning model. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated in terms of accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy of 99.26% on this dataset and feature set, while SVM and KNN achieve 97.78% and 97.04% accuracy respectively. MLP shows the lowest accuracy, 94.07%. All experiments are conducted using RStudio as the data mining platform.
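
The four evaluation terms named in this abstract (accuracy, precision, recall and F-measure) can be illustrated with a minimal Python sketch. It is not the paper's RStudio code. The function name `classification_metrics`, the choice of "malignant" as the positive class, and the toy labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive="malignant"):
    """Return accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy labels invented for this example; they are not taken from the dataset.
y_true = ["malignant", "malignant", "benign", "benign",
          "benign", "malignant", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign",
          "malignant", "malignant", "benign", "benign"]
metrics = classification_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters here because a missed malignant case (a false negative) lowers recall even when overall accuracy stays high.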

RFSVM: A Novel Classification Technique for Breast Cancer Diagnosis

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l2808.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 3295-3305

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Random Forest ◽

Early Stage ◽

Breast Cancer Diagnosis ◽

Support Vector ◽

Breast Cancer Dataset ◽

Limited Data ◽

Cancer Dataset ◽

Cancerous Cell

Cancer is a disease that develops in the human body due to gene mutation. Due to various factors, cells turn cancerous and grow rapidly while damaging normal cells. Many women are affected by breast cancer, which can cause death if not treated at an early stage. Early detection of breast cancer is highly important for increasing the survival rate. Machine learning methods and technologies make it possible to classify and detect the class accurately. Among other classifiers, random forest and support vector machine are two that have good classification power. In this research, a combination of these two classifiers, Random Forest and Support Vector Machine (RFSVM), is proposed for early diagnosis of breast cancer cells using the Wisconsin Breast Cancer Dataset (WBCD). Experiments are performed using different train-test data ratios, and an average accuracy of more than 98 percent is achieved using this hybrid classifier. This paper overcomes the over-fitting problem of random forest and the need to tune the parameters of the Support Vector Machine. Even with limited data available, the classifier tunes its parameters well enough to give a highly accurate result.

Breast Cancer Prediction Using Machine Learning

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206457 ◽

2020 ◽

pp. 278-284

Author(s):

Gaurav Singh

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbor ◽

Cancer Dataset ◽

Implementation Phase ◽

Machine Learning Classification

Breast cancer is a prevalent cause of death, and it is one of the most widespread types of cancer among women worldwide. The prime objective of this paper is to create a model for predicting breast cancer using various machine learning classification algorithms such as k Nearest Neighbor (kNN), Support Vector Machine (SVM), Logistic Regression (LR), and Gaussian Naive Bayes (NB), and furthermore to assess and compare the performance of the various classifiers in terms of accuracy, precision, recall, F1-score, and Jaccard index. The breast cancer dataset is publicly available from the UCI Machine Learning Repository. In the implementation phase, the dataset is partitioned into 80% for training and 20% for testing, and the machine learning algorithms are then applied. k Nearest Neighbors achieved significant performance with respect to all of these metrics.
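
The 80%/20% partition and the Jaccard index mentioned in this abstract can be sketched in plain Python as follows. This is a hedged illustration only: the helper names `train_test_split_80_20` and `jaccard_index`, the fixed shuffle seed, and the toy data are assumptions, and the abstract does not say whether the authors shuffled or stratified the split.

```python
import random


def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and cut it into 80% train / 20% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    cut = len(rows) * 4 // 5           # integer arithmetic avoids float rounding
    return rows[:cut], rows[cut:]


def jaccard_index(y_true, y_pred, positive=1):
    """Binary Jaccard index: TP / (TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Toy data invented for this example.
rows = list(range(100))
train_rows, test_rows = train_test_split_80_20(rows)
score = jaccard_index([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Unlike accuracy, the Jaccard index ignores true negatives, so it is not inflated when the negative class dominates.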

Prediction of Breast Cancer Using Machine Learning

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190617160834 ◽

2020 ◽

Vol 13 (5) ◽

pp. 901-908

Author(s):

Somil Jain ◽

Puneet Kumar

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Prediction Accuracy ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

Breast Cancer Dataset

Background: Breast cancer is one of the diseases that cause a large number of deaths every year across the globe. Early detection and diagnosis of this type of disease is a challenging task, but it is needed to reduce the number of deaths. Nowadays, various machine learning and data mining techniques are used for medical diagnosis and have proven their mettle in predicting chronic diseases like cancer, which can save the lives of patients suffering from such diseases. The major concern of this study is to find the prediction accuracy of classification algorithms such as Support Vector Machine, J48, Naïve Bayes and Random Forest, and to suggest the best algorithm. Objective: The objective of this study is to assess the prediction accuracy of the classification algorithms in terms of efficiency and effectiveness. Methods: This paper provides a detailed analysis of the classification algorithms Support Vector Machine, J48, Naïve Bayes and Random Forest in terms of their prediction accuracy, applying the 10-fold cross validation technique to the Wisconsin Diagnostic Breast Cancer dataset using the WEKA open source tool. Results: The study finds that Support Vector Machine achieved the highest prediction accuracy of 97.89% with a low error rate of 0.14%. Conclusion: This paper provides a clear view of the performance of the classification algorithms in terms of their predictive ability, which offers a helping hand to medical practitioners in diagnosing chronic diseases like breast cancer effectively.
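
The 10-fold cross validation procedure named in the Methods can be sketched in plain Python. This is not WEKA's implementation. The names `k_fold_indices`, `cross_validated_accuracy` and `majority_baseline` are hypothetical, the toy data are invented, and shuffling and stratification are left out for brevity.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validated_accuracy(fit_predict, X, y, k=10):
    """Average test accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        correct = sum(1 for p, i in zip(preds, test_idx) if p == y[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / k


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: always predict the majority training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


# Toy data invented for this example: 14 samples of class 0, then 6 of class 1.
X = list(range(20))
y = [0] * 14 + [1] * 6
baseline_accuracy = cross_validated_accuracy(majority_baseline, X, y, k=10)
```

Any real classifier can be substituted for `majority_baseline` as long as it follows the same three-argument convention assumed here.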

Optimized Breast Cancer Classification using Feature Selection and Outliers Detection

Journal of the Nigerian Society of Physical Sciences ◽

10.46481/jnsps.2021.331 ◽

2021 ◽

pp. 298-307

Author(s):

A. B Yusuf ◽

R. M Dima ◽

S. K Aina

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Cancer Diagnosis ◽

Breast Cancer Diagnosis ◽

Feature Reduction ◽

Machine Learning Algorithms ◽

Breast Cancer Dataset ◽

The Impact

Breast cancer is the second most commonly diagnosed cancer in women throughout the world. It is on the rise, especially in developing countries, where the majority of cases are discovered late. Breast cancer develops when cancerous tumors form on the surface of the breast cells. The absence of accurate prognostic models to help physicians recognize symptoms early makes it difficult to develop a treatment plan that would help patients live longer. However, machine learning techniques have recently been used to improve the accuracy and speed of breast cancer diagnosis. The higher the accuracy, the more efficient the model and the better the solution to breast cancer diagnosis. Nevertheless, the primary difficulty for systems developed to detect breast cancer using machine learning models is attaining the greatest classification accuracy and picking the most predictive features for increasing accuracy. As a result, breast cancer prognosis remains a challenge. This research seeks to address a flaw in an existing technique that is unable to enhance the classification of continuous-valued data, particularly its accuracy and the selection of optimal features for breast cancer prediction. To address these issues, this study examines the impact of outliers and feature reduction on the Wisconsin Diagnostic Breast Cancer Dataset, tested using seven different machine learning algorithms. The results show that Logistic Regression, Random Forest, and AdaBoost classifiers achieved the greatest accuracy, 99.12%, after removal of outliers from the dataset. With feature selection applied to this filtered dataset, Random Forest and Gradient Boost classifiers achieved accuracies of 100% and 99.12%, respectively. When compared to other state-of-the-art approaches, the two suggested strategies outperformed the unfiltered data in terms of accuracy. The suggested architecture might be a useful tool for radiologists to reduce the number of false negatives and false positives. As a result, the efficiency of breast cancer diagnosis analysis will be increased.

Using Support Vector Machine Detection of Breast Cancer in Early stage

International Journal for Research in Engineering Application & Management ◽

10.35291/2454-9150.2020.0465 ◽

2020 ◽

pp. 213-216

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Early Stage ◽

Breast Cancer Diagnosis ◽

Support Vector ◽

Svm Classifier ◽

K Nearest Neighbors ◽

Data Set ◽

Sensitivity Specificity

Breast cancer is a disease whose incidence in women has increased tremendously in recent years. Mammography is a low-powered X-ray technique for detecting and diagnosing cancer at an early stage. The proposed system addresses two problems: first, how to detect tumors as suspicious regions with weak contrast to their background, and second, how to extract features that categorize tumors. This classification can be done with SVM, a powerful statistical learning method that has achieved significant results in various fields. SVM was introduced in the early 1990s and led to increased interest in machine learning. Here, images are classified as Benign, Malignant, or Normal using the SVM classifier. The technique shows how easily the region where a tumor is present can be detected in mammogram images, with accuracy rates of more than 80% for linear classification using SVM. The proposed system uses 10-fold cross validation to obtain an accurate outcome. The Wisconsin breast cancer diagnosis data set is taken from the UCI Machine Learning Repository. Accuracy, sensitivity, specificity, false discovery rate, false omission rate and the Matthews correlation coefficient are evaluated in the proposed system. It provides good results for both the training and testing phases. The technique also shows accuracies of 98.57% and 97.14% using Support Vector Machine and K-Nearest Neighbors, respectively.
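
The evaluation measures listed in this abstract all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions, not the paper's code. The function name `confusion_metrics` and the example counts are assumptions for illustration.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute the measures named in the abstract from confusion-matrix counts."""
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),            # true positive rate
        "specificity": ratio(tn, tn + fp),            # true negative rate
        "false_discovery_rate": ratio(fp, fp + tp),   # 1 - precision
        "false_omission_rate": ratio(fn, fn + tn),    # 1 - negative predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_den),     # Matthews correlation coefficient
    }


# Counts invented for this example; they are not results from the paper.
report = confusion_metrics(tp=45, fp=5, tn=40, fn=10)
```

The Matthews correlation coefficient ranges from -1 to 1 and uses all four cells, which makes it less sensitive to class imbalance than accuracy alone.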

Breast Tumor Classification Using an Ensemble Machine Learning Method

Journal of Imaging ◽

10.3390/jimaging6060039 ◽

2020 ◽

Vol 6 (6) ◽

pp. 39 ◽

Cited By ~ 1

Author(s):

Adel S. Assiri ◽

Saima Nazir ◽

Sergio A. Velastin

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

State Of The Art ◽

Majority Voting ◽

Ensemble Classification ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Breast Cancer Dataset ◽

Machine Learning Classification ◽

Voting Mechanism

Breast cancer is the most common cause of death for women worldwide. Thus, the ability of artificial intelligence systems to detect possible breast cancer is very important. In this paper, an ensemble classification mechanism is proposed based on a majority voting mechanism. First, the performance of different state-of-the-art machine learning classification algorithms was evaluated on the Wisconsin Breast Cancer Dataset (WBCD). The three best classifiers were then selected based on their F3 score. The F3 score is used to emphasize the importance of false negatives (recall) in breast cancer classification. Then, these three classifiers, simple logistic regression learning, support vector machine learning with stochastic gradient descent optimization, and a multilayer perceptron network, are used for ensemble classification using a voting mechanism. We also evaluated the performance of hard and soft voting mechanisms. For hard voting, a majority-based voting mechanism was used, and for soft voting we used voting methods based on the average, product, maximum and minimum of probabilities. The hard voting (majority-based voting) mechanism shows better performance with 99.42%, as compared to the state-of-the-art algorithm for WBCD.
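
The F3 score and the hard and soft voting rules described in this abstract can be sketched in plain Python. This is a minimal illustration and not the authors' implementation. The function names `f_beta`, `hard_vote` and `soft_vote`, and the example probabilities, are assumptions.

```python
import math
from collections import Counter


def f_beta(precision, recall, beta=3.0):
    """F-beta score; beta=3 weights recall far more heavily than precision."""
    b2 = beta * beta
    den = b2 * precision + recall
    return (1 + b2) * precision * recall / den if den else 0.0


def hard_vote(labels):
    """Majority vote over the class labels predicted by each classifier."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, rule="average"):
    """Combine per-classifier class probabilities and return the winning class index.

    probabilities holds one list of per-class probabilities for each classifier.
    """
    n_classes = len(probabilities[0])
    columns = [[p[c] for p in probabilities] for c in range(n_classes)]
    if rule == "average":
        scores = [sum(col) / len(col) for col in columns]
    elif rule == "product":
        scores = [math.prod(col) for col in columns]
    elif rule == "maximum":
        scores = [max(col) for col in columns]
    elif rule == "minimum":
        scores = [min(col) for col in columns]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.index(max(scores))


# Invented outputs of three classifiers for one sample (class 0, class 1).
probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
hard_result = hard_vote([p.index(max(p)) for p in probs])
soft_results = {rule: soft_vote(probs, rule)
                for rule in ("average", "product", "maximum", "minimum")}
```

In this toy case the hard vote and the soft votes disagree: two of three classifiers narrowly favor class 1, while one classifier is very confident in class 0, which is why the two mechanisms are worth comparing separately.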

BREAST CANCER DETECTION USING MAMMOGRAM FEATURES USING RANDOM FOREST ALGORITHM

INTERNATIONAL JOURNAL FOR ADVANCED RESEARCH IN SCIENCE & TECHNOLOGY ◽

10.48047/ijarst/v10/i11/02 ◽

2020 ◽

pp. 12-15

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Cancer Detection ◽

Learning Algorithm ◽

Breast Cancer Dataset ◽

Random Forest Algorithm ◽

Training Set ◽

Cancer Dataset ◽

Breast Cells

Breast Cancer is one of the most dangerous diseases for women. This cancer occurs when some breast cells begin to grow abnormally. Machine learning is the subfield of computer science that studies programs that generalize from past experience. This project looks at classification, where an algorithm tries to predict the label for a sample. The machine learning algorithm takes many of these samples, called the training set, and builds an internal model. The resulting model is used to classify and predict the data. There are two classes, benign and malignant. A Random Forest classifier is used to predict whether the cancer is benign or malignant. Training and testing of the model are done on the Wisconsin Diagnosis Breast Cancer dataset.

A Novel Approach for Improving Breast Cancer Risk Prediction using Machine Learning Algorithms : A Survey

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset196634 ◽

2019 ◽

pp. 113-118 ◽

Cited By ~ 1

Author(s):

Madhuri Maru ◽

Saket Swarndeep

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Predictive Analytics ◽

Learning Algorithms ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Predictive Analysis ◽

Support Vector ◽

K Nearest Neighbors

Breast cancer is one of the diseases that cause a high number of deaths every year. It is the most common type of all cancers and the main cause of women's deaths worldwide. Classification and data mining methods are an effective way to classify data, especially in the medical field, where they are widely used in diagnosis and analysis to support decisions. A common misconception is that predictive analytics and machine learning are the same thing: predictive analysis is a form of statistical learning, whereas machine learning is pattern recognition and explores the notion that algorithms can learn from and make predictions on data. In this paper, we address the problem of predictive analysis by adding machine learning techniques for better prediction of breast cancer. A performance comparison is conducted between different machine learning algorithms, Support Vector Machine (SVM), Decision Tree (C4.5), Naive Bayes (NB) and k Nearest Neighbors (k-NN), on the Wisconsin Breast Cancer (original) dataset. The main objective is to assess the correctness in classifying data with respect to the efficiency and effectiveness of the hybrid algorithm in terms of accuracy, precision, sensitivity and specificity.

A Comparative Analysis and Predicting for Breast Cancer Detection Based on Data Mining Models

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v8i430209 ◽

2021 ◽

pp. 45-59

Author(s):

Shler Farhad Khorshid ◽

Adnan Mohsin Abdulazeez ◽

Amira Bibo Sallow

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Data Mining ◽

Nearest Neighbors ◽

Performance Comparison ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Set ◽

Wide Range

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the method of finding or extracting information from massive databases or datasets, and it is a field of computer science with a lot of potential. It covers a wide range of areas, one of which is classification. Classification can be accomplished using a variety of methods or algorithms. With the aid of MATLAB, five classification algorithms were compared. This paper presents a performance comparison among the classifiers Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer patients by applying machine learning algorithms, and to compare the algorithms based on their accuracy. The results reveal that Weighted K-NN (96.7%) has the highest accuracy among all the classifiers.
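
To show how a weighted K-NN differs from plain K-NN, here is a minimal Python sketch. It assumes inverse-distance weighting and Euclidean distance. The abstract does not state which weighting scheme the MATLAB experiments used, so both choices, the function name `weighted_knn_predict` and the toy points are assumptions.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Predict a label by letting the k nearest neighbors vote with weight 1/distance."""
    # Sort training points by Euclidean distance to the query.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        # eps avoids division by zero when the query coincides with a training point.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


# Toy points invented for this example: one benign sample, two malignant ones.
train_X = [(0.0, 0.0), (2.0, 0.0), (2.1, 0.0)]
train_y = ["benign", "malignant", "malignant"]
near_benign = weighted_knn_predict(train_X, train_y, (0.2, 0.0), k=3)
near_malignant = weighted_knn_predict(train_X, train_y, (2.05, 0.0), k=3)
```

With k=3 an unweighted majority vote would label the first query malignant (two neighbors against one), whereas the distance weighting lets the single very close benign neighbor win.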

Research of Machine Learning algorithms using K-fold cross validation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1043.0886s19 ◽

2019 ◽

Vol 8 (6S) ◽

pp. 215-218

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Research Area ◽

Machine Learning Algorithms ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Validation Data ◽

Machine Learning Classification ◽

Fold Cross Validation

In machine learning, classification is one of the most important research areas. Classification assigns a given input to a known category. In this paper, different machine learning algorithms, namely Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and K Nearest Neighbors (KNN), were implemented on the UCI breast cancer dataset with preprocessing. The models were trained and tested using k-fold cross validation. The evaluation of each classifier's accuracy and run time is implemented in Python.
