Exploring the performance of feature selection method using breast cancer dataset

Breast cancer is the most common type of cancer and occurs mostly in females. In recent years, many researchers have devoted their efforts to automating the diagnosis of breast cancer by developing machine learning models. However, the quality and quantity of features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of the predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of predictive models. Its purpose is to determine the features required for training the model and to remove irrelevant and duplicate features, where a duplicate feature is one that is highly correlated with another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded, and chi-square feature selection are implemented using a breast cancer diagnostic dataset, and their performance is compared on the test set. The experimental results show that sequential feature selection outperforms both chi-square (χ²) statistics and embedded feature selection, achieving the best accuracy of 98.3%.
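The three families compared above (wrapper, embedded, filter) can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: it assumes scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data (`load_breast_cancer`), a logistic-regression classifier, an 80/20 stratified split, and k = 5 selected features, none of which come from the abstract. The embedded selector here is an L1-penalized logistic regression; the study's actual embedded model is not stated.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    SequentialFeatureSelector,
    chi2,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def compare_selectors(k=5, seed=0):
    """Test-set accuracy of logistic regression after three kinds of feature selection."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    selectors = {
        # Wrapper: greedily add the feature that most improves cross-validated accuracy.
        "sequential": SequentialFeatureSelector(
            LogisticRegression(max_iter=5000), n_features_to_select=k, cv=5
        ),
        # Embedded: keep the k largest |coefficients| of an L1-penalized model.
        "embedded": SelectFromModel(
            LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
            threshold=-np.inf,  # rank by importance only, keep exactly k
            max_features=k,
        ),
        # Filter: keep the k features with the highest chi-square score.
        "chi2": SelectKBest(chi2, k=k),
    }
    scores = {}
    for name, selector in selectors.items():
        # MinMaxScaler maps training features to [0, 1], which also satisfies
        # chi2's requirement of non-negative inputs.
        model = make_pipeline(MinMaxScaler(), selector, LogisticRegression(max_iter=5000))
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, acc in compare_selectors().items():
        print(f"{name:>10}: {acc:.3f}")
```

Because the selector sits inside the pipeline, it is fit only on the training split, so the test-set score is not inflated by selection leakage. The printed accuracies will not match the reported 98.3%, since the split, classifier, and k all differ from the study.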

A Kernel Based Feature Selection Method Used in the Diagnosis of Wisconsin Breast Cancer Dataset

Advances in Computing and Communications - Communications in Computer and Information Science

10.1007/978-3-642-22709-7_66

2011, pp. 683-690

Cited By ~ 2

Author(s): P. Jaganathan, N. Rajkumar, R. Nagalakshmi

Keyword(s): Breast Cancer, Feature Selection, Feature Selection Method, Selection Method, Breast Cancer Dataset, Cancer Dataset

Comparison of microarray breast cancer classification using support vector machine and logistic regression with LASSO and boruta feature selection

Indonesian Journal of Electrical Engineering and Computer Science

10.11591/ijeecs.v20.i2.pp712-719

2020, Vol 20 (2), pp. 712

Author(s): Nursabillilah Mohd Ali, Nor Azlina Ab Aziz, Rosli Besar

Keyword(s): Breast Cancer, Feature Selection, Feature Selection Method, Diagnostic System, Cancer Classification, Support Vector, Breast Cancer Dataset, Breast Cancer Patients, Cancer Dataset, Breast Cancer Classification

Breast cancer is the most frequent cancer diagnosis among women worldwide. Despite advances in medical diagnostic and prognostic tools for the early detection and treatment of breast cancer patients, research on developing better and more reliable tools is still actively conducted globally. Breast cancer classification is essential to a reliable diagnostic system. This paper presents preliminary research on the use of machine learning classifiers and feature selection methods for breast cancer classification. Two feature selection methods, Boruta and LASSO, and two classifiers, support vector machine (SVM) and logistic regression (LR), are studied. A breast cancer dataset from the Gene Expression Omnibus (GEO) is adopted in this study. The findings show that LASSO with LR gives the best accuracy on this dataset.
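The LASSO-plus-classifier setup can be approximated as below. Several assumptions apply: the GEO microarray data is not bundled with any library, so scikit-learn's Wisconsin data stands in for it; "LASSO" is taken to mean an L1-penalized logistic regression used as a selector (the classification analogue of the L1-penalized linear model); Boruta is left out because it is not part of scikit-learn; and `C_lasso = 0.1` and 5-fold cross-validation are arbitrary illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def lasso_then_classify(C_lasso=0.1, seed=0):
    """Mean 5-fold accuracy of LR and SVM trained on L1-selected features."""
    X, y = load_breast_cancer(return_X_y=True)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    # The L1 penalty drives weak coefficients to zero; for L1 models SelectFromModel
    # keeps features whose |coefficient| reaches its default threshold of 1e-5.
    l1_selector = SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=C_lasso)
    )
    classifiers = {
        "LR": LogisticRegression(max_iter=5000),
        "SVM": SVC(kernel="linear"),
    }
    results = {}
    for name, clf in classifiers.items():
        # cross_val_score clones the pipeline, so scaling and selection are
        # refit inside every training fold (no leakage into the test fold).
        pipe = make_pipeline(StandardScaler(), l1_selector, clf)
        results[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    return results


if __name__ == "__main__":
    print(lasso_then_classify())
```

Smaller values of `C_lasso` apply a stronger penalty and therefore keep fewer features, which is the knob that trades sparsity against accuracy in this kind of pipeline.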

Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering

10.1177/1063293x211016046

2021, pp. 1063293X2110160

Author(s): Dinesh Morkonda Gunasekaran, Prabha Dhandayudam

Keyword(s): Breast Cancer, Feature Selection, Care Center, Feature Selection Method, Selection Method, Cancer Center, Breast Cancer Dataset, Data Set, Health Care Center, Cancer Data

Breast cancer is now commonly diagnosed in women. Feature selection is an important step when constructing a classification framework. We propose a multi filter union (MFU) feature selection method for breast cancer datasets, in which a union model based on the random forest algorithm and the logistic regression (LG) algorithm selects the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset from the selected dataset. Experiments are conducted first on the Wisconsin Diagnostic Breast Cancer dataset and then on a real dataset from a women's health care center. The results show that the proposed approach achieves high performance and efficiency compared with existing feature selection algorithms.
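A union of two model-based selectors, in the spirit of the MFU description, might look like the sketch below. The abstract does not give the selection rule, so taking the top k features from each model (k = 8 here) and OR-ing the two masks is an assumption, as are the forest size, the standardized inputs, and the use of scikit-learn's Wisconsin data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def union_feature_mask(X, y, k=8, seed=0):
    """Boolean mask of features picked by a random forest OR by logistic regression."""
    # Standardize so logistic-regression coefficient magnitudes are comparable.
    X_std = StandardScaler().fit_transform(X)
    # Random forest ranks features by impurity-based importance.
    rf_selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=seed),
        threshold=-np.inf,  # rank purely by importance and keep the top k
        max_features=k,
    ).fit(X_std, y)
    # Logistic regression ranks features by absolute coefficient.
    lr_selector = SelectFromModel(
        LogisticRegression(max_iter=5000), threshold=-np.inf, max_features=k
    ).fit(X_std, y)
    # Union: a feature survives if either model selected it.
    return rf_selector.get_support() | lr_selector.get_support()


if __name__ == "__main__":
    data = load_breast_cancer()
    mask = union_feature_mask(data.data, data.target)
    print(f"{mask.sum()} features kept:")
    print(list(data.feature_names[mask]))
```

The union always keeps between k and 2k features, trading some compactness for not discarding a feature that either model finds useful. Selecting on the full dataset is fine for inspection, but an accuracy estimate should repeat the selection inside each training fold.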

Binary Duck Travel Optimization Algorithm for Feature Selection in Breast Cancer Dataset Problem

IOT with Smart Systems - Smart Innovation, Systems and Technologies

10.1007/978-981-16-3945-6_17

2022, pp. 157-167

Author(s): Krishnaveni Arumugam, Shankar Ramasamy, Duraisamy Subramani

Keyword(s): Breast Cancer, Feature Selection, Optimization Algorithm, Breast Cancer Dataset, Cancer Dataset

A Comparative Analysis of Feature Selection Methods and Associated Machine Learning Algorithms on Wisconsin Breast Cancer Dataset (WBCD)

Advances in Intelligent Systems and Computing - Proceedings of International Conference on ICT for Sustainable Development

10.1007/978-981-10-0129-1_23

2016, pp. 215-224

Cited By ~ 3

Author(s): Nileshkumar Modi, Kaushar Ghanchi

Keyword(s): Breast Cancer, Machine Learning, Feature Selection, Comparative Analysis, Learning Algorithms, Machine Learning Algorithms, Breast Cancer Dataset, Selection Methods, Cancer Dataset

Feature selection using Linear Discriminant Analysis for breast cancer dataset

2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC)

10.1109/iccic.2018.8782399

2018

Author(s): B.M. Gayathri, C.P. Sumathi

Keyword(s): Breast Cancer, Feature Selection, Discriminant Analysis, Linear Discriminant Analysis, Breast Cancer Dataset, Cancer Dataset, Linear Discriminant

Predictive Modeling for Classification of Breast Cancer Dataset Using Feature Selection Techniques

Handbook of Research on Innovations and Applications of AI, IoT, and Cognitive Technologies - Advances in Computational Intelligence and Robotics

10.4018/978-1-7998-6870-5.ch015

2021, pp. 204-215

Author(s): Leena Nesamani S., S. Nirmala Sigirtha Rajini

Keyword(s): Breast Cancer, Machine Learning, Feature Selection, Mutual Information, Predictive Modeling, Breast Cancer Dataset, Cancer Dataset, Chi Squared, Feature Selection Techniques

Predictive modeling, or predictive analysis, is the process of predicting an outcome from data using machine learning models. The quality of the output depends predominantly on the quality of the data provided to the model. Selecting the best inputs to a machine learning model depends on a variety of criteria and is referred to as feature engineering. This work classifies breast cancer patients into either the recurrence or the non-recurrence category. A categorical breast cancer dataset is used, from which the best set of features is selected to make accurate predictions. Two feature selection techniques, the chi-squared technique and the mutual information technique, are applied, and the selected features are then used by a logistic regression model to make the final prediction. The mutual information technique proved more efficient and produced higher prediction accuracy.
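The chi-squared versus mutual-information pipeline described above can be sketched as follows. The categorical recurrence dataset is not reproduced here: `make_categorical_data` is a hypothetical generator of integer-coded categories with two informative columns, and k = 2 and the 75/25 split are arbitrary choices. On this toy data both filters recover the same two columns, so it illustrates the mechanics only and says nothing about which technique is better on the real data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def make_categorical_data(n=600, n_noise=6, seed=0):
    """Hypothetical integer-coded categorical data: 2 informative columns, n_noise random ones."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)  # 1 = recurrence, 0 = no recurrence (illustrative)
    cols = []
    for p in (0.8, 0.6):
        # With probability p the column encodes the label (category 0 or 2);
        # otherwise it takes a random category in {0, 1, 2}.
        keep = rng.random(n) < p
        cols.append(np.where(keep, 2 * y, rng.integers(0, 3, size=n)))
    for _ in range(n_noise):
        cols.append(rng.integers(0, 3, size=n))  # pure noise columns
    return np.column_stack(cols), y


def mi_scores(X, y):
    # discrete_features=True uses exact contingency counts instead of a k-NN estimate.
    return mutual_info_classif(X, y, discrete_features=True)


def select_and_score(X, y, k=2, seed=0):
    """For each filter: (selected column indices, test accuracy of logistic regression)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    results = {}
    for name, score_func in {"chi2": chi2, "mutual_info": mi_scores}.items():
        selector = SelectKBest(score_func, k=k).fit(X_train, y_train)
        clf = LogisticRegression().fit(selector.transform(X_train), y_train)
        acc = clf.score(selector.transform(X_test), y_test)
        results[name] = (sorted(selector.get_support(indices=True).tolist()), acc)
    return results


if __name__ == "__main__":
    X, y = make_categorical_data()
    print(select_and_score(X, y))
```

Note that `chi2` treats integer codes as non-negative counts, which is a common shortcut for ordinal-coded categories; one-hot encoding before the test is the stricter alternative.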

Deep Learning Hybrid with Binary Dragonfly Feature Selection for the Wisconsin Breast Cancer Dataset

International Journal of Advanced Computer Science and Applications

10.14569/ijacsa.2021.0120314

2021, Vol 12 (3)

Author(s): Marian Mamdouh Ibrahim, Dina Ahmed, Rania Ahmed

Keyword(s): Breast Cancer, Feature Selection, Deep Learning, Breast Cancer Dataset, Cancer Dataset, Selection For

A Robust Gene selection Method for Microarray-based Cancer Classification

Cancer Informatics

10.4137/cin.s3794

2010, Vol 9, pp. CIN.S3794

Cited By ~ 21

Author(s): Xiaosheng Wang, Osamu Gotoh

Keyword(s): Gene Expression, Feature Selection, Gene Selection, Information Gain, Expression Profiles, Feature Selection Method, Gene Expression Profiles, Molecular Classification, Selection Method, Chi Square

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F, and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended-degree-based method in robustness and applicability, and comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, reflecting the inherent biology of specific cancers.
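For reference, the canonical depended degree that this work generalizes is the rough-set dependency γ = |POS| / |U|: samples sharing a feature value form an equivalence class, and the positive region POS holds the samples whose class contains a single label. The sketch below scores single, already-discretized genes by γ and by information gain (one of the comparison methods) and ranks them. The authors' generalized criterion is not reproduced because the abstract does not define it, and the discretization step is assumed to have happened beforehand.

```python
from collections import defaultdict

import numpy as np


def depended_degree(feature, labels):
    """Rough-set dependency gamma = |POS| / |U| for one discretized feature."""
    labels_by_value = defaultdict(set)
    size_by_value = defaultdict(int)
    for value, label in zip(feature, labels):
        labels_by_value[value].add(label)
        size_by_value[value] += 1
    # A value's equivalence class joins the positive region only if it is pure.
    positive = sum(size_by_value[v] for v, ls in labels_by_value.items() if len(ls) == 1)
    return positive / len(labels)


def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain(feature, labels):
    """H(labels) - H(labels | feature), in bits."""
    feature, labels = np.asarray(feature), np.asarray(labels)
    conditional = sum(
        np.mean(feature == v) * entropy(labels[feature == v]) for v in np.unique(feature)
    )
    return entropy(labels) - conditional


def rank_features(X_discrete, labels, score):
    """Column indices of X_discrete ordered from highest to lowest score."""
    scores = [score(X_discrete[:, j], labels) for j in range(X_discrete.shape[1])]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    gene = [0, 0, 1, 1, 2, 2]
    cls = [0, 0, 1, 1, 0, 1]
    print(depended_degree(gene, cls), information_gain(gene, cls))
```

In the example, values 0 and 1 each map to a single class while value 2 mixes both, so γ = 4/6. Because γ credits only perfectly pure classes, one noisy sample can remove a whole equivalence class from the positive region, which is one way to see why a more tolerant generalization can be useful on noisy expression data.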

Prediction of benign and malignant breast cancer using data mining techniques

Journal of Algorithms & Computational Technology

10.1177/1748301818756225

2018, Vol 12 (2), pp. 119-126

Cited By ~ 43

Author(s): Vikas Chaurasia, Saurabh Pal, BB Tiwari

Keyword(s): Breast Cancer, Data Mining, Low Income, Prediction Models, Naive Bayes, Naïve Bayes, Low Income Countries, Breast Cancer Dataset, Cancer Dataset, Rbf Network

Breast cancer is the second leading cancer in women. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization, and also with the availability of early-detection facilities. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries, including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of cancer death in women, accounting for 16% of all cancer deaths worldwide. The objective of this research paper is to present a report on breast cancer in which we took advantage of available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, and J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation to obtain an unbiased estimate of the three prediction models for performance comparison. The results (based on average accuracy on the Breast Cancer dataset) indicated that Naïve Bayes is the best predictor, with 97.36% accuracy on the holdout sample (better than any prediction accuracy reported in the literature), followed by RBF Network with 96.77% and J48 with 93.41%.
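The evaluation protocol (10-fold cross-validation, average accuracy) can be sketched as follows. This is not a reproduction: scikit-learn's `load_breast_cancer` is the 569-case Wisconsin Diagnostic data rather than the 683-case set used here, `DecisionTreeClassifier(criterion="entropy")` is a CART tree standing in for Weka's J48 (C4.5), and the RBF Network is omitted because scikit-learn has no direct equivalent.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def ten_fold_accuracy(seed=0):
    """Mean 10-fold cross-validated accuracy for each model."""
    X, y = load_breast_cancer(return_X_y=True)
    # Stratified folds keep the benign/malignant ratio roughly equal in every fold.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    models = {
        "naive_bayes": GaussianNB(),
        # Entropy-based splits approximate, but do not replicate, C4.5/J48.
        "decision_tree": DecisionTreeClassifier(criterion="entropy", random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, acc in ten_fold_accuracy().items():
        print(f"{name}: {acc:.4f}")
```

Each of the ten folds serves once as held-out data, so every case contributes to exactly one test score, and the mean over folds is the reported estimate.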
