Breast Cancer Prediction using Machine Learning

Breast cancer is one of the most dreaded diseases and a leading cause of death among women. Every year, the death rate due to breast cancer increases drastically. Classification, a data mining technique, is an effective way to categorize data, and it is especially useful in the medical field, where diagnosis and analysis rely on such techniques. The Wisconsin Breast Cancer dataset is used to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of the algorithms by evaluating how correctly they classify the data, in terms of accuracy and time consumption. Based on the experimental results, the Random Forest algorithm shows the highest accuracy (99.76%) with the lowest error rate. All experiments are executed in a simulated environment on the ANACONDA Data Science Platform.

BREAST CANCER DETECTION USING MAMMOGRAM FEATURES USING RANDOM FOREST ALGORITHM

INTERNATIONAL JOURNAL FOR ADVANCED RESEARCH IN SCIENCE & TECHNOLOGY ◽

10.48047/ijarst/v10/i11/02 ◽

2020 ◽

pp. 12-15

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Cancer Detection ◽

Learning Algorithm ◽

Breast Cancer Dataset ◽

Random Forest Algorithm ◽

Training Set ◽

Cancer Dataset ◽

Breast Cells

Breast cancer is one of the most dangerous diseases for women. It occurs when some breast cells begin to grow abnormally. Machine learning is the subfield of computer science that studies programs that generalize from past experience. This project looks at classification, where an algorithm tries to predict the label of a sample. The machine learning algorithm takes many labeled samples, called the training set, and builds an internal model, which is then used to classify and predict new data. There are two classes, benign and malignant. A Random Forest classifier is used to predict whether the cancer is benign or malignant. The model is trained and tested on the Wisconsin Diagnosis Breast Cancer dataset.
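The train-then-predict workflow described in this abstract can be sketched in a few lines. The code below is not the paper's Random Forest: it is a deliberately simplified stand-in that keeps two of the ideas behind random forests (bootstrap sampling and a random feature choice per tree) but uses one-split decision stumps instead of full trees. The toy data, the function names, and the encoding 0 = benign, 1 = malignant are assumptions made for illustration only.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Find the threshold on one feature that misclassifies the fewest samples."""
    best = None
    for threshold in sorted({row[feature] for row in X}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if row[feature] <= threshold else right) != label
                for row, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return feature, threshold, left, right

def predict_stump(stump, row):
    feature, threshold, left, right = stump
    return left if row[feature] <= threshold else right

def fit_ensemble(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        feature = rng.randrange(len(X[0]))
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return ensemble

def predict_ensemble(ensemble, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature training set: 0 = benign, 1 = malignant.
train_X = [[1, 1], [2, 1], [1, 2], [2, 2], [8, 9], [9, 8], [9, 9], [8, 8]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_ensemble(train_X, train_y)
print(predict_ensemble(model, [1, 1]), predict_ensemble(model, [9, 9]))
```

A real study would replace the stumps with full decision trees and evaluate on held-out data rather than on points resembling the training set.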

Classifications of Breast Cancer Diagnosis using Machine Learning

International Journal of Computers ◽

10.46300/9108.2020.14.13 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Random Forest ◽

Breast Cancer Diagnosis ◽

Performance Comparison ◽

Support Vector ◽

Breast Cancer Dataset ◽

K Nearest Neighbors ◽

Cancer Dataset ◽

Machine Learning Classification

Breast Cancer (BC) is among the most common cancers and a leading cause of death in women throughout the world. Recently, classification and data analysis tools have been widely used in the medical field for diagnosis, prognosis and decision making, to help lower the risk of people dying or suffering from diseases. Advanced machine learning methods give hope to patients, as they help doctors detect potentially fatal diseases such as Breast Cancer early and provide accurate outcomes. However, the results depend heavily on the techniques used for feature selection and classification, which together determine how strong the machine learning model is. In this paper, a performance comparison is conducted using four classifiers, Multilayer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) and Random Forest, on the Wisconsin Breast Cancer dataset to identify the most effective predictors. The main goal is to apply the best machine learning classification methods to predict Breast Cancer as benign or malignant, evaluated in terms of accuracy, F-measure, precision and recall. Experimental results show that Random Forest achieves the highest accuracy, 99.26%, on this dataset and feature set, while SVM and KNN reach 97.78% and 97.04% respectively. MLP shows the lowest accuracy, 94.07%. All experiments are conducted using RStudio as the data mining platform.
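For reference, the four evaluation terms named in this abstract (accuracy, precision, recall, F-measure) can all be derived from the counts of a binary confusion matrix. The sketch below assumes malignant is treated as the positive class; the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=80, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy alone can look good on a dataset dominated by one class, which is presumably why such comparisons report precision and recall alongside it.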

Breast Cancer Prediction and Classification Using Supervised Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8924 ◽

2020 ◽

Vol 17 (6) ◽

pp. 2519-2522

Author(s):

Kalpna Guleria ◽

Avinash Sharma ◽

Umesh Kumar Lilhore ◽

Devendra Prasad

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Supervised Learning ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

World Health ◽

Breast Cancer Dataset ◽

Cancer Awareness ◽

Cancer Dataset ◽

Cancer Prediction

Approximately 2.1 million women are affected by breast cancer every year, and it has become one of the major causes of cancer-related deaths among women. The World Health Organization’s (WHO) 2018 report reveals that around 15% of cancer deaths among women are due to breast cancer. Lack of awareness is one of the major reasons breast cancer is detected at a later stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of utmost importance for increasing patients’ survival rate. The WHO cancer awareness guidelines recommend that women aged 40–49 years or 70–75 years undergo mammographic screening, which allows timely detection of the problem, if present. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree are used for breast cancer prediction. The performance of these classification algorithms is evaluated using several measures: accuracy, sensitivity, specificity and F-measure.

An Efficient Machine Learning Approach for Breast Cancer Prediction (Pendekatan Machine Learning yang Efisien untuk Prediksi Kanker Payudara)

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v3i3.1347 ◽

2019 ◽

Vol 3 (3) ◽

pp. 458-469

Author(s):

Azminuddin I. S. Azis ◽

Irma Surya Kumala Idris ◽

Budy Santoso ◽

Yasin Aril Mustofa

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Particle Swarm Optimization ◽

Nearest Neighbor ◽

Breast Cancer Dataset ◽

Z Score ◽

Cancer Dataset ◽

Swarm Optimization ◽

Cancer Prediction ◽

Machine Learning Methods

Breast Cancer is the most common cancer found in women, and its death rate ranks second among cancers. Related studies often report high accuracy for their proposed machine learning approaches. However, without efficient pre-processing, the proposed Breast Cancer prediction models remain open to question. Therefore, this research aims to improve the accuracy of machine learning methods through more efficient pre-processing for Breast Cancer prediction: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction. The study proposes several approaches: C4.5 - Z-Score - Genetic Algorithm for the Breast Cancer Dataset with 77.27% accuracy, 7-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Original with 97.85% accuracy, Artificial Neural Network - Z-Score - Forward Selection for the Wisconsin Breast Cancer Dataset - Diagnostics with 98.24% accuracy, and 11-Nearest Neighbor - Min-Max Normalization - Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Prognostic with 83.33% accuracy. These approaches perform better than standard machine learning methods and than the best methods proposed in previous related studies.
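Two of the data transformations named in this abstract, Z-Score and Min-Max Normalization, and the k-Nearest Neighbor classifier they feed into can be sketched briefly. This is only an illustration under stated assumptions: the function names and toy feature values are hypothetical, the z-score uses the population standard deviation, and the feature selection and Particle Swarm Optimization steps from the paper are not reproduced.

```python
import math
from collections import Counter

def min_max_normalize(column):
    """Rescale a list of values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def z_score_normalize(column):
    """Center values on the mean and scale by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]

def knn_predict(train_X, train_y, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized two-feature samples.
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
           [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]
train_y = ["benign"] * 3 + ["malignant"] * 3

print(min_max_normalize([2, 4, 6]))
print(z_score_normalize([2, 4, 6]))
print(knn_predict(train_X, train_y, [0.2, 0.2], k=3))
```

Normalization matters for k-NN because Euclidean distance is otherwise dominated by whichever feature has the largest numeric range. In practice the scaling parameters would be computed on the training data and reused for each query.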

A Hybrid of Random Over Sample Examples and Boosted C5.0 Algorithms for Breast Cancer Diagnosis on Imbalanced Data

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.3201 ◽

2020 ◽

Vol 10 (11) ◽

pp. 2686-2692

Author(s):

Jianxue Tian ◽

Jue Zhang ◽

Xiaofen Tang ◽

Ting Dong

Keyword(s):

Breast Cancer ◽

Cancer Diagnosis ◽

Imbalanced Data ◽

Breast Cancer Diagnosis ◽

Clinical Decision ◽

Ensemble Classifier ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Cancer Prediction ◽

Imbalanced Data Classification

To overcome the two-class imbalance problem in breast cancer diagnosis, a hybrid method combining the ROSE sampling approach with a Boosted C5.0 ensemble classifier (R-Boosted C5.0) is proposed. ROSE is used as the sampling method to balance the class distribution, and Boosted C5.0 is then used as the classifier. The method is studied on the Wisconsin Breast Cancer Dataset (WBCD), the Wisconsin Diagnosis Breast Cancer (WDBC) dataset and three imbalanced datasets. Assessed by the Matthews Correlation Coefficient (MCC), the performance of the proposed method on the WBCD and WDBC datasets was 98.5% and 93.0%, respectively. The experimental results show that the proposed method outperforms the other classifiers. It can be used as a clinical decision support system to assist breast cancer prediction. In practice, the proposed methodology can also be applied to other class-imbalanced data classification problems.
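The Matthews Correlation Coefficient used for assessment here is computed from the four confusion-matrix counts as MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows why it suits imbalanced data; the counts are hypothetical, and returning 0.0 when the denominator is zero is a common convention adopted here as an assumption. The abstract reports MCC as a percentage, which corresponds to the coefficient multiplied by 100.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from binary confusion counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when any marginal total is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical imbalanced test set: 5 malignant, 95 benign samples.
# A classifier that always predicts "benign" is 95% accurate but has MCC 0.
always_benign = matthews_corrcoef(tp=0, fp=0, tn=95, fn=5)

# A classifier that labels every sample correctly has MCC 1.
perfect = matthews_corrcoef(tp=5, fp=0, tn=95, fn=0)

print(always_benign, perfect)
```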

Predicting Breast Cancer using Modern Data Science Methodology

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j1077.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 4444-4446

Keyword(s):

Breast Cancer ◽

Data Science ◽

Hormone Replacement ◽

World Health ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Learning Technique ◽

Health Organization ◽

Science Methodology ◽

Late Childbearing

According to the World Health Organization (WHO), breast cancer is the most commonly occurring cancer in women, but early prediction helps those affected to recover. Reported causes of breast cancer include hormone replacement therapy, exposure to harmful radioactive rays, and late childbearing. The aim is to diagnose cancer accurately using a machine learning technique, Random Forest. The dataset used is the Wisconsin Breast Cancer dataset. The resulting error rate was only about 0.0177.

RFSVM: A Novel Classification Technique for Breast Cancer Diagnosis

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l2808.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 3295-3305

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Random Forest ◽

Early Stage ◽

Breast Cancer Diagnosis ◽

Support Vector ◽

Breast Cancer Dataset ◽

Limited Data ◽

Cancer Dataset ◽

Cancerous Cell

Cancer is a disease that develops in the human body due to gene mutation. Due to various factors, cells turn cancerous and grow rapidly while damaging normal cells. Many women are affected by breast cancer, which can even cause death if not treated at an early stage. Early detection of breast cancer is highly important to increase the survival rate. Machine learning methods and technologies make it possible to classify and detect the class accurately. Among classifiers, random forest and support vector machine are two that have good classification power. In this research, a combination of these two classifiers, Random Forest and Support Vector Machine (RFSVM), is proposed for early diagnosis of breast cancer using the Wisconsin Breast Cancer Dataset (WBCD). Experiments are performed using different train-test data ratios, and an average accuracy of more than 98 percent is achieved with this hybrid classifier. This paper overcomes the over-fitting problem of random forest and the need to tune the parameters of the Support Vector Machine. Even with limited data available, the classifier tunes its parameters well enough to give a highly accurate result.

Prediction of Breast Cancer using Decision tree and Random Forest Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i2.226229 ◽

2018 ◽

Vol 6 (2) ◽

pp. 226-229

Author(s):

N. Sridevi ◽

S. Anitha

Keyword(s):

Breast Cancer ◽

Random Forest ◽

Decision Tree ◽

Random Forest Algorithm

Prediction of benign and malignant breast cancer using data mining techniques

Journal of Algorithms & Computational Technology ◽

10.1177/1748301818756225 ◽

2018 ◽

Vol 12 (2) ◽

pp. 119-126 ◽

Cited By ~ 43

Author(s):

Vikas Chaurasia ◽

Saurabh Pal ◽

BB Tiwari

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Low Income ◽

Prediction Models ◽

Naïve Bayes ◽

Low Income Countries ◽

Breast Cancer Dataset ◽

Cancer Dataset ◽

Rbf Network

Breast cancer is the second most common cancer in women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this research paper is to present a report on breast cancer in which we took advantage of available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three prediction models for performance comparison purposes. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature); RBF Network came second with 96.77% accuracy, and J48 came third with 93.41% accuracy.
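The 10-fold cross-validation mentioned in this abstract can be illustrated with a short sketch: the samples are shuffled, split into ten folds, and each fold serves once as the test set while the other nine are used for training. The helper names and the `fit`/`predict` callables are hypothetical placeholders, and the shuffling scheme is an assumption; the paper's own tooling may stratify or partition differently.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average test accuracy over k folds.

    fit(train_X, train_y) must return a model;
    predict(model, row) must return a predicted label.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# With 683 cases, every sample lands in exactly one test fold.
splits = list(k_fold_indices(683, k=10))
print(sorted(len(test) for _, test in splits))
```

Because every sample is tested exactly once by a model that never saw it during training, the averaged accuracy is a less optimistic estimate than accuracy measured on the training data.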

PERFORMANCE ANALYSIS OF BREAST CANCER CLASSIFICATION USING DECISION TREE CLASSIFIERS

International Journal of Current Pharmaceutical Research ◽

10.22159/ijcpr.2017v9i2.17383 ◽

2017 ◽

Vol 9 (2) ◽

pp. 19 ◽

Cited By ~ 6

Author(s):

P. Hamsagayathri ◽

P. Sampath

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Ductal Carcinoma ◽

Research Work ◽

The United States ◽

Breast Cancer Dataset ◽

Decision Tree Classifier ◽

Cancer Dataset ◽

Term Survival ◽

Tree Classifier

Breast cancer is one of the most dangerous cancers among women above 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin milk ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs either in the lobules or in the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues. According to the medical survey, about 125.0 new cases of breast cancer per 100,000 women are diagnosed each year in the United States, and 21.5 per 100,000 women die of the disease. Also, 246,660 new cases of women with cancer are estimated for the year 2016. Early diagnosis of breast cancer is a key factor for the long-term survival of cancer patients. Classification plays an important role in breast cancer detection and is used by researchers to analyse and classify medical data. In this research work, a priority-based decision tree classifier algorithm has been implemented for the Wisconsin Breast Cancer dataset. This paper analyzes different decision tree classifier algorithms on the Wisconsin original, diagnostic and prognostic datasets using WEKA software. The performance of the classifiers is evaluated using measures such as accuracy, Kappa statistic, Entropy, RMSE, TP Rate, FP Rate, Precision, Recall, F-Measure, ROC, Specificity and Sensitivity.
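Two of the less familiar measures in this list, the Kappa statistic and RMSE, can be sketched as follows. Cohen's kappa compares observed agreement with the agreement expected by chance, and RMSE is computed here between 0/1 class labels and predicted probabilities of the positive class. The function names and sample numbers are hypothetical, and the exact definitions WEKA applies may differ in detail.

```python
import math

def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a binary confusion matrix.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1:
        return 0.0
    return (observed - expected) / (1 - expected)

def rmse(y_true, y_prob):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_prob)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical balanced test set with 90% observed agreement.
kappa = cohen_kappa(tp=45, fp=5, tn=45, fn=5)

# Hypothetical labels and predicted probabilities of the positive class.
error = rmse([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])

print(round(kappa, 3), round(error, 3))
```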
