Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering ◽

10.1177/1063293x211016046 ◽

2021 ◽

pp. 1063293X2110160

Author(s):

Dinesh Morkonda Gunasekaran ◽

Prabha Dhandayudam

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Care Center ◽

Feature Selection Method ◽

Selection Method ◽

Cancer Center ◽

Breast Cancer Dataset ◽

Data Set ◽

Health Care Center ◽

Cancer Data

Breast cancer is now commonly diagnosed in women. Feature selection is an important step in constructing a classification framework. We propose a multi filter union (MFU) feature selection method for breast cancer data sets. A union model that combines feature selection based on the random forest algorithm with feature selection based on the logistic regression (LG) algorithm is used to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset of the selected dataset. The experiments use the Wisconsin Diagnostic Breast Cancer data set and then a real data set from a women's health care center. The results show that the proposed approach performs well and is efficient compared with existing feature selection algorithms.
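The union step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each filter produces one score per feature and keeps its top k, and the feature names, scores, and the value of k are hypothetical.

```python
def top_k(scores, k):
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def multi_filter_union(score_sets, k):
    """Union of the top-k features chosen independently by each filter."""
    selected = set()
    for scores in score_sets:
        selected |= top_k(scores, k)
    return sorted(selected)


# Hypothetical scores for illustration only (not taken from the paper).
# In practice these might be random forest importances and the absolute
# values of logistic regression coefficients.
rf_importance = {"radius": 0.40, "texture": 0.10, "area": 0.35, "smoothness": 0.15}
lr_weight = {"radius": 0.30, "texture": 0.45, "area": 0.05, "smoothness": 0.20}

selected = multi_filter_union([rf_importance, lr_weight], k=2)
print(selected)
```

A union keeps any feature that at least one filter ranks highly, so the selected subset can be larger than k but never larger than k times the number of filters.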


A Kernel Based Feature Selection Method Used in the Diagnosis of Wisconsin Breast Cancer Dataset

Advances in Computing and Communications - Communications in Computer and Information Science ◽

10.1007/978-3-642-22709-7_66 ◽

2011 ◽

pp. 683-690 ◽

Cited By ~ 2

Author(s):

P. Jaganathan ◽

N. Rajkumar ◽

R. Nagalakshmi

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Breast Cancer Dataset ◽

Cancer Dataset


NONLINEAR PROBIT GENE CLASSIFICATION USING MUTUAL INFORMATION AND WAVELET-BASED FEATURE SELECTION

Journal of Biological Systems ◽

10.1142/s0218339004001178 ◽

2004 ◽

Vol 12 (03) ◽

pp. 371-386 ◽

Cited By ~ 30

Author(s):

XIAOBO ZHOU ◽

XIAODONG WANG ◽

EDWARD R. DOUGHERTY

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Mutual Information ◽

Feature Selection Method ◽

Accurate Estimate ◽

Selection Method ◽

Expression Data ◽

Cancer Data ◽

New Methods ◽

Leukemia Data

We consider the problem of cancer classification from gene expression data. We propose using a mutual information-based gene or feature selection method where features are wavelet-based. The bootstrap technique is employed to obtain an accurate estimate of the mutual information. We then develop a nonlinear probit Bayesian classifier consisting of a linear term plus a nonlinear term, the parameters of which are estimated using the Gibbs sampler. These new methods are applied to analyze breast-cancer data and leukemia data. The results indicate that the proposed gene and feature selection method is very accurate in breast-cancer and leukemia classifications.
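A bootstrap estimate of mutual information, as mentioned in this abstract, might look like the sketch below. It makes simplifying assumptions that are not from the paper: the feature is already discretized, the plug-in (count-based) estimator is used, and the bootstrap estimate is the plain average over resamples. The function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def bootstrap_mi(xs, ys, n_resamples=200, seed=0):
    """Average the plug-in estimate over bootstrap resamples of the pairs."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        estimates.append(
            mutual_information([xs[i] for i in idx], [ys[i] for i in idx])
        )
    return sum(estimates) / len(estimates)


# Toy data: a binary feature that matches the class label exactly.
feature = [0, 0, 0, 0, 1, 1, 1, 1]
label = [0, 0, 0, 0, 1, 1, 1, 1]
print(mutual_information(feature, label))
print(bootstrap_mi(feature, label))
```

Features would then be ranked by their estimated mutual information with the class label, and the highest-ranked ones kept.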


Multi-Objective Feature Selection Method by Using ACO with PSO Algorithm for Breast Cancer Detection

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2021.1031.32 ◽

2021 ◽

Vol 14 (5) ◽

pp. 359-368

Author(s):

Rajesh Saturi ◽

Parvataneni Premchand

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Cancer Detection ◽

Feature Selection Method ◽

Pso Algorithm ◽

Selection Method ◽

Breast Cancer Detection ◽

Multi Objective


Predicting Breast Cancer Using Logistic Regression and Multi-Class Classifiers

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.20.22115 ◽

2018 ◽

Vol 7 (4.20) ◽

pp. 22 ◽

Cited By ~ 4

Author(s):

Jabeen Sultana ◽

Abdul Khader Jilani ◽

. .

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Logistic Regression ◽

Regression Method ◽

Breast Cancer Dataset ◽

Breast Cancer Data ◽

Data Set ◽

Cancer Data ◽

Logistic Regression Method ◽

Simple Logistic

Early identification and prediction of the type of cancer has become a necessity in cancer research, in order to assist and monitor patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) approaches. A logistic regression method and multi-class classifiers are proposed to predict breast cancer and to produce deep predictions in a new environment on the breast cancer data. This paper explores different data mining approaches based on classification that can be applied to breast cancer data to build deep predictions. In addition, the study identifies the best-performing model by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI machine learning repository, has 569 instances with 31 attributes. The data set is first pre-processed and then fed to various classifiers: Simple Logistic regression, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision Table, Decision Trees (DT), PART, Multi-Class Classifiers and REP Tree. 10-fold cross-validation is applied, and training is performed so that new models are developed and tested. The results are evaluated on accuracy, RMSE, sensitivity, specificity, F-measure, area under the ROC curve, Kappa statistic and the time taken to build the model. The analysis shows that, among all the classifiers, Simple Logistic Regression yields the deep predictions and the best model, with high and accurate results, followed by IBK (nearest neighbor classifier), K-Star (instance-based classifier) and MLP (neural network). The other methods obtained lower accuracy than the logistic regression method.
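Several of the evaluation measures listed in this abstract follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not results from the paper, and the function assumes no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (nonzero denominators assumed)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the marginals alone would produce
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under 10-fold cross-validation, such counts would typically be accumulated over the ten held-out folds before the metrics are computed.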


Comparison of radiomic pre-processing steps in the reproducible prediction of disease free survival across multi-scanners/centers

10.21203/rs.3.rs-875843/v1 ◽

2021 ◽

Author(s):

Marta Ferreira ◽

Pierre Lovinfosse ◽

Johanne Hermesse ◽

Marjolein Decuypere ◽

Caroline Rousseau ◽

...

Keyword(s):

Feature Selection ◽

Locally Advanced ◽

Feature Selection Method ◽

Disease Free Survival ◽

Selection Method ◽

Prediction Performance ◽

Data Set ◽

Free Survival ◽

Processing Steps ◽

Disease Free

Background: Feature reproducibility and model generalizability are currently among the most important limitations to integrating radiomics into the clinic. Radiomic features are sensitive to imaging acquisition protocols, reconstruction algorithms and parameters, as well as to the different steps of the usual radiomics workflow. We propose a framework for comparing the reproducibility of different pre-processing steps in PET/CT radiomic analysis in the prediction of disease free survival (DFS) across multiple scanners/centers. Results: We evaluated and compared the prediction performance of several models that differ in i) the type of intensity discretization, ii) the feature selection method, and iii) the feature type, i.e., original or tumour to liver ratio radiomic features (OR or TLR). We trained our models using data from one scanner/center and tested them on two external scanners/centers. Our results show low reproducibility in predictions across scanners and discretization methods. Despite this, TLR based models were generally more robust than OR models. Maximum relevance minimum redundancy (MRMR) forward feature selection with Pearson correlation had the best mean area under the precision recall curve when applied to the combination of features from all discretization bin numbers (D_All_FBN) with TLR features, for two of the four classifiers. Conclusion: We evaluated and compared the prediction performance of several models in a data set of 158 patients with locally advanced cervical cancer (LACC) from three distinct scanners. In our cohort of LACC patients, the pre-processing of radiomic features in [18F]FDG PET affects DFS prediction performance across scanners, and combining the D_All_FBN TLR approach with the MRMR forward Pearson feature selection method might help increase the robustness of radiomic studies.
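MRMR forward selection with Pearson correlation can be sketched as a greedy loop. The abstract does not give the exact criterion, so this example assumes the common difference form: relevance is the absolute Pearson correlation with the target, and redundancy is the mean absolute correlation with the features already chosen. The feature names and values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def mrmr_forward(features, target, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    selected = []
    remaining = list(features)

    def score(name):
        relevance = abs(pearson(features[name], target))
        if not selected:
            return relevance
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: f1_noisy_copy is almost a rescaled copy of f1, f2 is uncorrelated with f1.
features = {
    "f1": [1, 1, -1, -1],
    "f1_noisy_copy": [2, 2, -2, -1],
    "f2": [1, -1, 1, -1],
}
target = [5, 1, -1, -5]

chosen = mrmr_forward(features, target, k=2)
print(chosen)
```

In this toy case the near-duplicate feature is passed over for the second slot even though it correlates more strongly with the target than f2 does, which is the behaviour the redundancy term is meant to produce.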


Deep Recurrent Network Based Feature Selection using Single Matrix Normalization and Eigen Vectors for Analyzing Sentiments

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8913.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 804-809

Keyword(s):

Social Media ◽

Feature Selection ◽

Feature Selection Method ◽

Recurrent Network ◽

Classification Performance ◽

Selection Method ◽

Data Set ◽

Improve Accuracy ◽

N Gram ◽

Single Matrix

Sentiment analysis plays a major role in e-commerce and social media these days. With the growth of social media, a huge number of people post their reviews through the Internet and several other sources, and analyzing this data is challenging. In this paper a new normalization based feature selection method is proposed; the aim is to select the relevant features, classify the data and measure the accuracy. Stability of the data is considered the most important challenge in analyzing sentiments. Investigating the sentiments and selecting the relevant features from the data set plays a major role in this paper. The aim is to work with vector-based feature selection and check the classification performance using recurrent networks. Text mining here depends on feature retrieval methods to improve accuracy, and a single matrix normalization method is proposed to reduce the dimensions. The proposed method performs data preprocessing or sentiment classification and feature reduction to improve accuracy. It achieves better accuracy than the N-gram feature selection method. The experimental results show that the proposed method has better accuracy than other traditional feature selection approaches and can decrease the implementation time.


English Text Classification Using Improved Recursive Feature Elimination (IRFE) Algorithm

Journal of engineering sciences and information technology ◽

10.26389/ajsrp.r080420 ◽

2020 ◽

Vol 4 (2) ◽

Author(s):

Esraa H. Abd Al-Ameer, Ahmed H. Aliwy

Keyword(s):

Feature Selection ◽

Language Processing ◽

Text Classification ◽

Feature Selection Method ◽

Selection Method ◽

English Text ◽

Recursive Feature Elimination ◽

Chi Square ◽

Data Set ◽

New Feature

Document classification is among the most important fields of natural language processing and text mining. Many algorithms can be used for this task. This paper focuses on improving text classification through feature selection, that is, selecting a subset of the original features without affecting the accuracy of the work. A new feature selection method is suggested, which can be seen as a general formulation and mathematical model of Recursive Feature Elimination (RFE). The method was compared with two other well-known feature selection methods: Chi-square and threshold. The results showed that the new method is comparable with the other methods. The best results were 83% when 60% of the features were used, 82% when 40% were used, and 82% when 20% were used. The tests were done with the Naïve Bayes (NB) and decision tree (DT) classification algorithms on the well-known English data set “20 newsgroups text”, which consists of approximately 18846 files. The results showed that the suggested feature selection method is comparable with standard methods such as Chi-square.
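The Chi-square baseline mentioned in this abstract scores each term by how strongly its presence depends on the class. The sketch below uses the standard 2x2 contingency formula for a single term and class; it illustrates the baseline only, not the proposed IRFE method, and the document counts are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # a constant row or column carries no information
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts for illustration only.
dependent = chi_square(a=30, b=10, c=20, d=40)
independent = chi_square(a=10, b=10, c=10, d=10)
print(dependent, independent)
```

Terms would then be ranked by this score and only the top fraction kept, for example the 20%, 40% or 60% of features reported in the abstract.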
