scholarly journals Exploring the performance of feature selection method using breast cancer dataset

Author(s):  
Tsehay Admassu Assegie ◽  
Ravulapalli Lakshmi Tulasi ◽  
Vadivel Elanangai ◽  
Napa Komal Kumar

Breast cancer is the most common type of cancer occurring mostly in females. In recent years, many researchers have devoted to automate diagnosis of breast cancer by developing different machine learning model. However, the quality and quantity of feature in breast cancer diagnostic dataset have significant effect on the accuracy and efficiency of predictive model. Feature selection is effective method for reducing the dimensionality and improving the accuracy of predictive model. The use of feature selection is to determine feature required for training model and to remove irrelevant and duplicate feature. Duplicate feature is a feature that is highly correlated to another feature. The objective of this study is to conduct experimental research on three different feature selection methods for breast cancer prediction. Sequential, embedded and chi-square feature selection are implemented using breast cancer diagnostic dataset. The study compares the performance of sequential embedded and chi-square feature selection on test set. The experimental result evidently shows that sequential feature selection outperforms as compared to chi-square (X<sup>2</sup>) statistics and embedded feature selection. Overall, sequential feature selection achieves better accuracy of 98.3% as compared to chi-square (X<sup>2</sup>) statistics and embedded feature selection.

Author(s):  
Nursabillilah Mohd Ali ◽  
Nor Azlina Ab Aziz ◽  
Rosli Besar

<p>Breast cancer is the most frequent cancer diagnosis amongst women worldwide. Despite the advancement of medical diagnostic and prognostic tools for early detection and treatment of breast cancer patients, research on development of better and more reliable tools is still actively conducted globally. The breast cancer classification is significantly important in ensuring reliable diagnostic system. Preliminary research on the usage of machine learning classifier and feature selection method for breast cancer classification is conducted here. Two feature selection methods namely Boruta and LASSO and SVM and LR classifier are studied. A breast cancer dataset from GEO web is adopted in this study. The findings show that LASSO with LR gives the best accuracy using this dataset.</p>


2021 ◽  
pp. 1063293X2110160
Author(s):  
Dinesh Morkonda Gunasekaran ◽  
Prabha Dhandayudam

Nowadays women are commonly diagnosed with breast cancer. Feature based Selection method plays an important step while constructing a classification based framework. We have proposed Multi filter union (MFU) feature selection method for breast cancer data set. The feature selection process based on random forest algorithm and Logistic regression (LG) algorithm based union model is used for selecting important features in the dataset. The performance of the data analysis is evaluated using optimal features subset from selected dataset. The experiments are computed with data set of Wisconsin diagnostic breast cancer center and next the real data set from women health care center. The result of the proposed approach shows high performance and efficient when comparing with existing feature selection algorithms.


Author(s):  
Leena Nesamani S. ◽  
S. Nirmala Sigirtha Rajini

Predictive modeling or predict analysis is the process of trying to predict the outcome from data using machine learning models. The quality of the output predominantly depends on the quality of the data that is provided to the model. The process of selecting the best choice of input to a machine learning model depends on a variety of criteria and is referred to as feature engineering. The work is conducted to classify the breast cancer patients into either the recurrence or non-recurrence category. A categorical breast cancer dataset is used in this work from which the best set of features is selected to make accurate predictions. Two feature selection techniques, namely the chi-squared technique and the mutual information technique, have been used. The selected features were then used by the logistic regression model to make the final prediction. It was identified that the mutual information technique proved to be more efficient and produced higher accuracy in the predictions.


2010 ◽  
Vol 9 ◽  
pp. CIN.S3794 ◽  
Author(s):  
Xiaosheng Wang ◽  
Osamu Gotoh

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is extremely crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods: the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty, and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can exhibit the inherent classification difficulty with respect to different gene expression datasets, indicating the inherent biology of specific cancers.


2018 ◽  
Vol 12 (2) ◽  
pp. 119-126 ◽  
Author(s):  
Vikas Chaurasia ◽  
Saurabh Pal ◽  
BB Tiwari

Breast cancer is the second most leading cancer occurring in women compared to all other cancers. Around 1.1 million cases were recorded in 2004. Observed rates of this cancer increase with industrialization and urbanization and also with facilities for early detection. It remains much more common in high-income countries but is now increasing rapidly in middle- and low-income countries including within Africa, much of Asia, and Latin America. Breast cancer is fatal in under half of all cases and is the leading cause of death from cancer in women, accounting for 16% of all cancer deaths worldwide. The objective of this research paper is to present a report on breast cancer where we took advantage of those available technological advancements to develop prediction models for breast cancer survivability. We used three popular data mining algorithms (Naïve Bayes, RBF Network, J48) to develop the prediction models using a large dataset (683 breast cancer cases). We also used 10-fold cross-validation methods to measure the unbiased estimate of the three prediction models for performance comparison purposes. The results (based on average accuracy Breast Cancer dataset) indicated that the Naïve Bayes is the best predictor with 97.36% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature), RBF Network came out to be the second with 96.77% accuracy, J48 came out third with 93.41% accuracy.


Sign in / Sign up

Export Citation Format

Share Document