Breast cancer prediction model with decision tree and adaptive boosting

In this study, a breast cancer prediction model is proposed using a decision tree and adaptive boosting (AdaBoost), and an extensive experimental evaluation of its predictive performance is conducted. The study uses a breast cancer dataset collected from the Kaggle data repository. The dataset consists of 569 observations, of which 212 (37.25%) are benign, or breast cancer negative, and 62.74% are malignant, or breast cancer positive. The class distribution shows that the dataset is highly imbalanced, so a learning algorithm such as a decision tree is biased toward the benign observations and performs poorly at predicting the malignant observations. To improve the performance of the decision tree on the malignant observations, a boosting algorithm, namely adaptive boosting, is employed. Finally, the predictive performance of the decision tree and adaptive boosting is analyzed. The analysis on the Kaggle breast cancer dataset shows that adaptive boosting achieves 92.53% accuracy, while the decision tree achieves 88.80%. Overall, the AdaBoost algorithm performed better than the decision tree.
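
To make the boosting idea concrete, here is a minimal sketch of AdaBoost over depth-1 decision trees (decision stumps). It is not the study's implementation: the ten-point toy data set, the function names and the stump search are assumptions chosen only so the reweighting step is easy to follow.

```python
import math

def stump_predict(x, threshold, polarity):
    # polarity=+1 predicts +1 left of the threshold; polarity=-1 is the reverse
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    # Exhaustively pick the threshold/polarity with the lowest weighted error
    values = sorted(set(xs))
    thresholds = [values[0] - 0.5] + [v + 0.5 for v in values]
    best = None
    for t in thresholds:
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, pol) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best

def adaboost_fit(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, pol = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Misclassified points gain weight, correctly classified ones lose it
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    hits = sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)

# Toy data (an assumption, not the Kaggle data set): no single stump fits it
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
print(accuracy(adaboost_fit(xs, ys, 1), xs, ys))  # single stump
print(accuracy(adaboost_fit(xs, ys, 3), xs, ys))  # three boosted stumps
```

On this toy data a single stump gets 7 of 10 training points right, while three boosted stumps fit all 10. That illustrates the mechanism only and says nothing about the accuracies reported above.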

Breast Cancer Prediction using Decision Tree

Journal of Physics Conference Series ◽

10.1088/1742-6596/1916/1/012069 ◽

2021 ◽

Vol 1916 (1) ◽

pp. 012069

Author(s):

J S Ravi Shankar ◽

S Nithish ◽

M Nithish Babu ◽

R Karthik ◽

A Shahid Afridi

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Cancer Prediction

Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering ◽

10.1177/1063293x211016046 ◽

2021 ◽

pp. 1063293X2110160

Author(s):

Dinesh Morkonda Gunasekaran ◽

Prabha Dhandayudam

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Care Center ◽

Feature Selection Method ◽

Selection Method ◽

Cancer Center ◽

Breast Cancer Dataset ◽

Data Set ◽

Health Care Center ◽

Cancer Data

Breast cancer is now commonly diagnosed in women. Feature selection is an important step in constructing a classification framework. We propose a multi filter union (MFU) feature selection method for breast cancer data sets. A union model based on the random forest algorithm and the logistic regression (LG) algorithm is used to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset of the selected dataset. The experiments are carried out first on the Wisconsin diagnostic breast cancer data set and then on a real data set from a women's health care center. The results show that the proposed approach achieves high performance and is efficient compared with existing feature selection algorithms.
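
The union step can be sketched as follows, assuming each filter yields one score per feature and contributes its top k features. The abstract does not spell out the actual MFU procedure, so the function names, the feature names and the scores below are all hypothetical.

```python
def top_k(scores, k):
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def multi_filter_union(score_tables, k):
    """Union of the top-k features chosen by each filter."""
    selected = set()
    for scores in score_tables:
        selected |= top_k(scores, k)
    return sorted(selected)

# Hypothetical scores: random forest importances and absolute
# logistic regression coefficients for four illustrative features
rf_importance = {"radius_mean": 0.30, "texture_mean": 0.05,
                 "area_mean": 0.25, "smoothness_mean": 0.10}
lr_abs_coef = {"radius_mean": 0.9, "texture_mean": 0.7,
               "area_mean": 0.2, "smoothness_mean": 0.1}

print(multi_filter_union([rf_importance, lr_abs_coef], k=2))
# → ['area_mean', 'radius_mean', 'texture_mean']
```

A union keeps any feature that at least one filter ranks highly, so it is more permissive than an intersection of the same rankings.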

PERFORMANCE ANALYSIS OF BREAST CANCER CLASSIFICATION USING DECISION TREE CLASSIFIERS

International Journal of Current Pharmaceutical Research ◽

10.22159/ijcpr.2017v9i2.17383 ◽

2017 ◽

Vol 9 (2) ◽

pp. 19 ◽

Cited By ~ 6

Author(s):

P. Hamsagayathri ◽

P. Sampath

Keyword(s):

Breast Cancer ◽

Decision Tree ◽

Ductal Carcinoma ◽

Research Work ◽

The United States ◽

Breast Cancer Dataset ◽

Decision Tree Classifier ◽

Cancer Dataset ◽

Term Survival ◽

Tree Classifier

Breast cancer is one of the most dangerous cancers among women above 35 years of age worldwide. The breast is made up of lobules that secrete milk and thin milk ducts that carry milk from the lobules to the nipple. Breast cancer mostly occurs either in the lobules or in the milk ducts. The most common type of breast cancer is ductal carcinoma, which starts in the ducts and spreads across the lobules and surrounding tissues. According to the medical survey, about 125.0 new cases of breast cancer per 100,000 women are diagnosed each year, and 21.5 per 100,000 women die of this disease, in the United States. Also, 246,660 new cases of women with cancer are estimated for the year 2016. Early diagnosis of breast cancer is a key factor in the long-term survival of cancer patients. Classification plays an important role in breast cancer detection and is used by researchers to analyse and classify medical data. In this research work, a priority-based decision tree classifier algorithm has been implemented for the Wisconsin breast cancer dataset. This paper analyzes different decision tree classifier algorithms for the Wisconsin original, diagnostic and prognostic datasets using WEKA software. The performance of the classifiers is evaluated against parameters such as accuracy, Kappa statistic, entropy, RMSE, TP rate, FP rate, precision, recall, F-measure, ROC, specificity and sensitivity.
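
Several of the listed measures follow directly from a binary confusion matrix. The sketch below uses the standard definitions. The function name and the example counts are assumptions, not results from the paper, and WEKA's own report layout differs.

```python
def binary_metrics(tp, fp, fn, tn):
    """Common evaluation measures from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # TP rate, also called sensitivity
    specificity = tn / (tn + fp)
    fp_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_pos + p_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy, "tp_rate": recall, "fp_rate": fp_rate,
        "precision": precision, "recall": recall, "f_measure": f_measure,
        "specificity": specificity, "kappa": kappa,
    }

# Hypothetical counts for illustration only
for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.8 and the chance agreement is 0.5, so kappa is 0.6. Kappa falls below accuracy whenever some agreement is expected by chance.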

A Comparison Study of Goodness of Fit Tests of Logistic Regression in R: Simulation and Application to Breast Cancer Data

Academic Journal of Applied Mathematical Sciences ◽

10.32861/ajams.71.50.59 ◽

2020 ◽

pp. 50-59

Author(s):

El-Housainy A. Rady ◽

Mohamed R. Abonazel ◽

Mariam H. Metawe’e

Keyword(s):

Breast Cancer ◽

Logistic Regression ◽

Sample Size ◽

Null Hypothesis ◽

Goodness Of Fit ◽

Quadratic Term ◽

Breast Cancer Dataset ◽

Cancer Data ◽

Interaction Term ◽

Test Package

Goodness of fit (GOF) tests of logistic regression assess how well the model suits the data. The null hypothesis of all GOF tests is that the model fits. R, as a free software package, offers many GOF tests in different packages. A Monte Carlo simulation was conducted to study two situations: first, the ability of each test, under its default settings, to accept the null hypothesis when the model truly fits; second, the power of these tests when the assumption of a sufficient linear combination of the explanatory variables is violated (by omitting a linear covariate term, a quadratic term, or an interaction term). We also checked whether the same test in different R packages gave the same results. Because sample size is expected to affect the simulation results, the pattern of change in GOF test results under different sample sizes and different model settings was estimated. All tests accepted the null hypothesis (in more than 95% of simulation trials) when the model truly fitted, except the modified Hosmer-Lemeshow test in the "LogisticDx" package under all model settings and Osius and Rojek's (OsRo) test when the true model had an interaction term between binary and categorical covariates. In addition, the le Cessie-van Houwelingen-Copas-Hosmer unweighted sum of squares (CHCH) test unexpectedly gave different results under different packages. Concerning the power study, all tests had very low power when a covariate was missing. Generally, Stukel's test (package "LogisticDx") and the CHCH test (package "rms") reached a power greater than 80% in detecting a missing quadratic term at lower sample sizes, while the OsRo test (package "LogisticDx") was better at detecting a missing interaction term. Besides the simulation study, we evaluated the performance of GOF tests using the breast cancer dataset.
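
As an illustration of what such a test computes, the sketch below evaluates the classic (unmodified) Hosmer-Lemeshow statistic from predicted probabilities and observed 0/1 outcomes. It is written in Python rather than R, the equal-size grouping rule is an assumption, and the R packages named above may group observations differently.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic: sum over groups of
    (observed - expected)^2 / (expected * (1 - expected / group_size))."""
    pairs = sorted(zip(probs, outcomes), key=lambda po: po[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        variance = expected * (1 - expected / size)
        if variance > 0:                      # skip degenerate groups
            stat += (observed - expected) ** 2 / variance
    return stat

# Hypothetical predictions: five cases at 0.2 and five at 0.8
probs = [0.2] * 5 + [0.8] * 5
well_calibrated = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
poorly_calibrated = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(hosmer_lemeshow(probs, well_calibrated, groups=2))
print(hosmer_lemeshow(probs, poorly_calibrated, groups=2))
```

Under the null hypothesis the statistic is usually compared with a chi-square distribution with `groups - 2` degrees of freedom. The p-value step is omitted here to keep the sketch free of third-party dependencies.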

Breast Cancer Prediction using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8292.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 4879-4881

Keyword(s):

Breast Cancer ◽

Random Forest ◽

Data Science ◽

Breast Cancer Dataset ◽

Random Forest Algorithm ◽

Medical Field ◽

Cancer Dataset ◽

Cancer Prediction ◽

Time Consumption ◽

Simulated Environment

Breast cancer is one of the most dreadful diseases and a potential cause of death in women. Every year, the death rate due to breast cancer increases drastically. Classification, or data mining, is an effective way to categorize data. It is especially handy in the medical field, where diagnosis and analysis are done through these techniques. The Wisconsin breast cancer dataset is used to compare SVM, Logistic Regression, Naïve Bayes and Random Forest. The main objective is to determine the efficiency of the algorithms by evaluating how correctly they classify the data, based on accuracy and time consumption. Based on the results of the experiments, the Random Forest algorithm shows the highest accuracy (99.76%) with the least error rate. The ANACONDA Data Science Platform is used to execute all the experiments in a simulated environment.
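
The evaluation protocol, accuracy together with time consumption, can be sketched as a small harness that works with any object exposing `fit` and `predict`. The `NearestCentroid` class is a hypothetical stand-in so the example runs on the standard library alone; it is not one of the four algorithms compared in the paper, and the data are invented.

```python
import time

class NearestCentroid:
    """Hypothetical stand-in classifier: assign the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {label: [s / counts[label] for s in acc]
                          for label, acc in sums.items()}
        return self

    def predict(self, X):
        def nearest(row):
            return min(self.centroids,
                       key=lambda label: sum((a - b) ** 2
                                             for a, b in zip(row, self.centroids[label])))
        return [nearest(row) for row in X]

def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, seconds spent fitting and predicting)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return accuracy, elapsed

# Invented toy data, for illustration only
X_train = [[0, 0], [1, 1], [9, 9], [10, 10]]
y_train = [0, 0, 1, 1]
X_test = [[0.5, 0.5], [9.5, 9.5]]
y_test = [0, 1]
acc, seconds = evaluate(NearestCentroid(), X_train, y_train, X_test, y_test)
print(f"accuracy={acc:.2f} time={seconds:.6f}s")
```

Timing both `fit` and `predict` inside one window is a design choice; reporting them separately would show whether a model is slow to train or slow to apply.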

Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2018.v09.i03.p08 ◽

2018 ◽

pp. 192 ◽

Cited By ~ 2

Author(s):

Ade Jamal ◽

Annisa Handayani ◽

Ali Akbar Septiandri ◽

Endang Ripmiatin ◽

Yunus Effendi

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Prediction ◽

Extreme Gradient Boosting

Breast cancer is the most important cause of death among women. Predicting breast cancer at an early stage provides a greater possibility of cure. This requires a breast cancer prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced from the raw data by extracting features using Principal Component Analysis. A clustering method, namely K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison among four models, formed by combining the two dimensionality reduction methods with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity and specificity metrics evaluated from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
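
The abstract does not say how K-Means is turned into a dimensionality reduction step. One common approach, assumed here, is to replace each sample by its distances to the k cluster centroids, so d original attributes become k features. The sketch uses plain Lloyd iterations with a deterministic initialisation and invented toy data.

```python
import math

def kmeans(data, k, iterations=20):
    """Plain Lloyd's algorithm; initial centroids are evenly spaced samples."""
    n = len(data)
    centroids = [list(data[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in data:
            nearest = min(range(k), key=lambda c: math.dist(point, centroids[c]))
            clusters[nearest].append(point)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def to_distance_features(data, centroids):
    """Map each sample to its distances from the centroids (d -> k features)."""
    return [[math.dist(point, c) for c in centroids] for point in data]

# Toy 3-attribute data (an assumption) reduced to 2 features
data = [[0, 0, 0], [0, 1, 0], [10, 10, 10], [10, 11, 10]]
centroids = kmeans(data, k=2)
features = to_distance_features(data, centroids)
print(centroids)
print(features[0])
```

The reduced features can then be passed to any classifier in place of the original attributes.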

Predicting Breast Cancer Using Logistic Regression and Multi-Class Classifiers

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.20.22115 ◽

2018 ◽

Vol 7 (4.20) ◽

pp. 22 ◽

Cited By ~ 4

Author(s):

Jabeen Sultana ◽

Abdul Khader Jilani

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Logistic Regression ◽

Regression Method ◽

Breast Cancer Dataset ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data ◽

Logistic Regression Method ◽

Simple Logistic

Early identification and prediction of the type of cancer has become a necessity in cancer research, in order to assist and supervise patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study and analyze the application of machine learning (ML) approaches. A Logistic Regression method and multi-classifiers are proposed to predict breast cancer and to produce deep predictions in a new environment on the breast cancer data. This paper explores the different data mining approaches using classification that can be applied to breast cancer data to build deep predictions. In addition, this study identifies the best-performing model by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI machine learning repository, has 569 instances with 31 attributes. The data set is first pre-processed and then fed to various classifiers: Simple Logistic regression, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifiers and REP Tree. 10-fold cross validation is applied, and training is performed so that new models are developed and tested. The results are evaluated on various parameters: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, Kappa statistic and the time taken to build the model. Result analysis reveals that, among all the classifiers, Simple Logistic Regression yields the deep predictions and obtains the best model with high and accurate results, followed by IBK (a nearest neighbor classifier), K-Star (an instance-based classifier) and MLP (a neural network). The other methods obtained lower accuracy than the Logistic Regression method.
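
The 10-fold cross validation step can be sketched as an index splitter. The sample count of 569 comes from the abstract; the shuffling seed and the function name are assumptions, and this is not the paper's own tooling.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split shuffled sample indices into k (train, test) pairs.

    Every sample lands in exactly one test fold; the first
    n_samples % k folds hold one extra sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(569, k=10)
print([len(test) for _, test in folds])
# → [57, 57, 57, 57, 57, 57, 57, 57, 57, 56]
```

Each model is trained on the `train` indices and scored on the `test` indices of a fold, and the ten scores are then averaged.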

Ensemble Comparative Study for Diagnosis of Breast Cancer Datasets

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.15.23007 ◽

2018 ◽

Vol 7 (4.15) ◽

pp. 281

Author(s):

Bibhuprasad Sahu ◽

Sujata Dash ◽

Sachi Nandan Mohanty ◽

Saroj Kumar Rout

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Early Stage ◽

Breast Cancer Dataset ◽

Classification Rate ◽

Cancer Dataset ◽

Cancer Data ◽

Cad System ◽

Result Analysis ◽

Sensitivity Specificity

Every disease is curable if a little human effort is applied to early diagnosis. The death rate in the world increases day by day as patients fail to detect disease before it becomes chronic. Breast cancer is curable if it is detected at an early stage, before it spreads to all parts of the body. Nowadays, computer-aided diagnosis (CAD) gives doctors automated assistance in producing accurate predictions about the stage of a disease. This study provides a CAD system for the diagnosis of breast cancer. The method uses a Neural Network (NN) as the classifier model and PCA/LDA for dimension reduction to attain a higher classification rate. Multiple layers of the neural network are applied to classify the breast cancer data. The experiment is carried out on the Wisconsin breast cancer dataset (WBCD) from the UCI repository. The dataset is divided into two parts, train and test. Performance is measured by accuracy, sensitivity, specificity, precision and recall. The result obtained in this study is 97% using ANN and PCA-ANN, which is better than other state-of-the-art methods. According to the result analysis, this system outperforms the existing system.
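
The PCA step can be illustrated by finding only the first principal component with power iteration on the sample covariance matrix. This is a simplified sketch: a full PCA keeps several components, the neural network stage is not shown, and the function names and toy data are assumptions.

```python
import math

def first_principal_component(data, iterations=100):
    """Return (column means, unit vector of the leading principal component)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return means, v

def project(data, means, component):
    """Reduce each sample to one coordinate along the component."""
    return [sum((row[j] - means[j]) * component[j] for j in range(len(means)))
            for row in data]

# Toy data lying on the line y = 2x, so the component is (1, 2) / sqrt(5)
data = [[1, 2], [2, 4], [3, 6], [4, 8]]
means, component = first_principal_component(data)
print(component)
print(project(data, means, component))
```

Power iteration converges to the leading component as long as the starting vector is not orthogonal to it and the largest eigenvalue is strictly dominant.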

Breast Cancer Prediction and Classification Using Supervised Learning Techniques

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8924 ◽

2020 ◽

Vol 17 (6) ◽

pp. 2519-2522

Author(s):

Kalpna Guleria ◽

Avinash Sharma ◽

Umesh Kumar Lilhore ◽

Devendra Prasad

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Supervised Learning ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

World Health ◽

Breast Cancer Dataset ◽

Cancer Awareness ◽

Cancer Dataset ◽

Cancer Prediction

Approximately 2.1 million women are affected by breast cancer every year, and it has become one of the major causes of cancer-related deaths among women. The World Health Organization's (WHO) 2018 report reveals that around 15% of deaths among women are due to breast cancer. Lack of awareness is one of the major reasons that breast cancer is detected at a later stage. Another major reason is limited access to health resources, which makes the problem worse. Early or timely detection of breast cancer is of utmost importance for increasing the survival rate of patients. The WHO cancer awareness guidelines recommend that women aged 40–49 years or 70–75 years undergo mammographic screening, which provides timely detection of the problem if it exists. This article uses the Breast Cancer dataset from the UCI machine learning repository to predict and diagnose the class of breast cancer, benign or malignant, using supervised learning. The supervised machine learning algorithms K-Nearest Neighbor (K-NN), Naive Bayes, logistic regression and decision tree are used for breast cancer prediction. The performance of these classification algorithms is evaluated using several measures: accuracy, sensitivity, specificity and F-measure.
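
Of the four algorithms named, K-NN is the simplest to show end to end. The sketch below classifies a query by majority vote among its k nearest training samples under Euclidean distance. The two-feature measurements and the value of k are hypothetical, not taken from the UCI data or the article.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature measurements, for illustration only
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["benign"] * 3 + ["malignant"] * 3

print(knn_predict(train_X, train_y, [1.1, 1.0]))  # → benign
print(knn_predict(train_X, train_y, [5.0, 5.1]))  # → malignant
```

An odd k avoids tied votes in a two-class problem. On real data the features would usually be scaled first, since Euclidean distance is sensitive to units.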

Breast Cancer Diagnosis Using an Ensemble Transfer Support Vector Machine

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3260 ◽

2021 ◽

Vol 11 (2) ◽

pp. 332-336

Author(s):

Lifang Peng ◽

Kefu Chen ◽

Bin Huang ◽

Leyuan Zhou

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Support Vector ◽

Breast Cancer Data ◽

Generalization Performance ◽

Source Domain ◽

Cancer Data ◽

Adaboost Algorithm ◽

Svm Algorithm ◽

Ensemble Strategy

As the number of breast cancer patients increases and the age of onset falls, early detection and prevention have become the key to the prevention and treatment of breast cancer. At present, many classification or clustering algorithms are used to diagnose breast cancer data. However, these algorithms directly lose the minimum source domain information, resulting in a significant improvement in the recognition rate. Based on this, this paper proposes an ensemble transfer support vector machine (ET-SVM) algorithm based on the classic support vector machine (SVM). The algorithm can effectively use the knowledge in the source domain to guide the learning of the target task. The result of a single SVM is usually a local optimal solution, its performance is unstable, and its generalization performance is poor. Therefore, this article introduces an ensemble strategy based on the AdaBoost algorithm. Experiments on the Wisconsin breast cancer data set show that the proposed ET-SVM algorithm achieves better classification results and good generalization performance.
