Design of novel multi filter union feature selection framework for breast cancer dataset

2021 ◽  
pp. 1063293X2110160
Author(s):  
Dinesh Morkonda Gunasekaran ◽  
Prabha Dhandayudam

Breast cancer is now commonly diagnosed in women. Feature selection is an important step in constructing a classification framework. We propose a Multi filter union (MFU) feature selection method for breast cancer data sets. A union model based on the random forest algorithm and the Logistic regression (LG) algorithm is used to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset from the selected dataset. The experiments are run first on the Wisconsin diagnostic breast cancer data set and then on a real data set from a women's health care center. The results show that the proposed approach achieves high performance and is efficient compared with existing feature selection algorithms.
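The abstract does not say how the two filters are combined beyond calling it a union, so the following is only a minimal sketch under one plausible reading: each filter ranks the features, the top k of each ranking are kept, and the final subset is their set union. The function names, the feature names and the scores are all hypothetical; in practice the scores might come from random forest importances and absolute logistic regression coefficients.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k feature names with the largest scores."""
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return set(ranked[:k])


def multi_filter_union(filters: List[Dict[str, float]], k: int) -> List[str]:
    """Union of the top-k features chosen by each filter, sorted by name."""
    selected: Set[str] = set()
    for scores in filters:
        selected |= top_k(scores, k)
    return sorted(selected)


# Hypothetical scores (assumption): stand-ins for random forest importances
# and absolute logistic regression coefficients on four made-up features.
rf_scores = {"radius": 0.40, "texture": 0.10, "perimeter": 0.35, "smoothness": 0.15}
lr_scores = {"radius": 1.20, "texture": 0.90, "perimeter": 0.20, "smoothness": 0.10}

union_features = multi_filter_union([rf_scores, lr_scores], k=2)
print(union_features)
```

A union keeps any feature that at least one filter considers important, so the subset is never smaller than k and never larger than k times the number of filters.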

2004 ◽  
Vol 12 (03) ◽  
pp. 371-386 ◽  
Author(s):  
XIAOBO ZHOU ◽  
XIAODONG WANG ◽  
EDWARD R. DOUGHERTY

We consider the problem of cancer classification from gene expression data. We propose using a mutual information-based gene or feature selection method where features are wavelet-based. The bootstrap technique is employed to obtain an accurate estimate of the mutual information. We then develop a nonlinear probit Bayesian classifier consisting of a linear term plus a nonlinear term, the parameters of which are estimated using the Gibbs sampler. These new methods are applied to analyze breast-cancer data and leukemia data. The results indicate that the proposed gene and feature selection method is very accurate in breast-cancer and leukemia classifications.
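To illustrate the idea of a bootstrap estimate of mutual information, here is a small sketch. It assumes the features have already been discretized and uses a simple plug-in estimator averaged over resamples; the paper's wavelet-based features and its exact estimator are not reproduced here, and the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def bootstrap_mi(x: Sequence[int], y: Sequence[int],
                 n_resamples: int = 200, seed: int = 0) -> float:
    """Average the plug-in estimate over bootstrap resamples of the pairs."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        estimates.append(mutual_information([x[i] for i in idx],
                                            [y[i] for i in idx]))
    return sum(estimates) / len(estimates)


# Toy data (assumption): one feature that matches the label, one unrelated.
label = [0, 1] * 20
informative = [0, 1] * 20
unrelated = [0, 0, 1, 1] * 10

print(bootstrap_mi(informative, label), bootstrap_mi(unrelated, label))
```

Features would then be ranked by their bootstrap estimate, with the highest-scoring ones kept for the classifier.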


2018 ◽  
Vol 7 (4.20) ◽  
pp. 22 ◽  
Author(s):  
Jabeen Sultana ◽  
Abdul Khader Jilani

Early identification and prediction of cancer type has become a necessity in cancer research, in order to assist and manage patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams, from the biomedical and bioinformatics fields, to study the application of machine learning (ML) approaches. Logistic Regression and Multi-class classifiers have been proposed to predict breast cancer and to produce deep predictions in a new environment on the breast cancer data. This paper explores different data mining approaches using classification that can be applied to breast cancer data to build deep predictions. In addition, this study identifies the best model yielding high performance by evaluating the dataset on various classifiers. The breast cancer dataset, collected from the UCI machine learning repository, has 569 instances with 31 attributes. The data set is first pre-processed and then fed to various classifiers: Simple Logistic Regression, IBK, K-star, Multi-Layer Perceptron (MLP), Random Forest, Decision table, Decision Trees (DT), PART, Multi-Class Classifiers and REP Tree. 10-fold cross validation is applied, and training is performed so that new models are developed and tested. The results are evaluated on various measures: accuracy, RMSE, sensitivity, specificity, F-measure, ROC curve area, Kappa statistic and the time taken to build the model. Result analysis reveals that, among all the classifiers, Simple Logistic Regression yields the deep predictions and obtains the best model, with high and accurate results, followed by IBK (nearest neighbor classifier), K-Star (instance-based classifier) and MLP (neural network). The other methods obtained lower accuracy than logistic regression.
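As a rough illustration of the evaluation protocol, the sketch below splits 569 instances into 10 folds and computes several of the listed measures from a binary confusion matrix. It is not the toolkit the authors used: the helper names and the toy labels are hypothetical, the folds are contiguous rather than shuffled or stratified, and the metric function assumes both classes occur and at least one positive prediction is made.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, F-measure and Cohen's kappa."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f_measure": f_measure, "kappa": kappa}


folds = list(k_fold_indices(569, k=10))

# Toy labels (assumption): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a full run, each classifier would be trained on the `train` indices of a fold and scored on its `test` indices, with the metrics averaged over the ten folds.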


2021 ◽  
Author(s):  
Marta Ferreira ◽  
Pierre Lovinfosse ◽  
Johanne Hermesse ◽  
Marjolein Decuypere ◽  
Caroline Rousseau ◽  
...  

Background: Feature reproducibility and model generalizability are currently among the most important limitations when integrating radiomics into the clinic. Radiomic features are sensitive to imaging acquisition protocols, reconstruction algorithms and parameters, as well as to the different steps of the usual radiomics workflow. We propose a framework for comparing the reproducibility of different pre-processing steps in PET/CT radiomic analysis for the prediction of disease free survival (DFS) across multiple scanners/centers. Results: We evaluated and compared the prediction performance of several models that differ in i) the type of intensity discretization, ii) the feature selection method, and iii) the feature type, i.e., original or tumour to liver ratio radiomic features (OR or TLR). We trained our models using data from one scanner/center and tested them on two external scanners/centers. Our results show low reproducibility in predictions across scanners and discretization methods. Despite this, TLR based models were generally more robust than OR based models. Maximum relevance minimum redundancy (MRMR) forward feature selection with Pearson correlation had the best mean area under the precision recall curve for two of the four classifiers when combining the features from all discretization bin numbers (D_All_FBN) with TLR features. Conclusion: We evaluated and compared the prediction performance of several models in a data set containing one hundred fifty-eight patients with locally advanced cervical cancer (LACC) from three distinct scanners. In our cohort of LACC patients, pre-processing of radiomic features in [18F]FDG PET affects DFS prediction performance across scanners, and combining the D_All_FBN TLR approach with the MRMR forward Pearson feature selection method might help increase the robustness of radiomic studies.
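The MRMR forward selection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: relevance is taken as the absolute Pearson correlation with the target, redundancy as the mean absolute Pearson correlation with the already selected features, and the two are combined by subtraction. The feature names and values are made up, and no feature is assumed to be constant.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mrmr_forward(features: Dict[str, Sequence[float]],
                 target: Sequence[float], n_select: int) -> List[str]:
    """Greedily add the feature with the best relevance minus redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumption): f_a_noisy nearly duplicates f_a, f_b is independent of it.
features = {
    "f_a": [1, -1, 1, -1, 1, -1, 1, -1],
    "f_a_noisy": [1.5, -0.5, 1.5, -0.5, 0.5, -1.5, 0.5, -1.5],
    "f_b": [1, 1, -1, -1, 1, 1, -1, -1],
}
target = [3, -1, 1, -3, 3, -1, 1, -3]

print(mrmr_forward(features, target, n_select=2))
```

Ranking by relevance alone would pick `f_a` and then `f_a_noisy`; the redundancy penalty makes the second pick `f_b`, which adds information the first feature does not carry.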


Sentiment analysis plays a major role in e-commerce and social media these days. With the growth of social media, a huge number of users post their reviews through the Internet and several other sources. Analyzing this data is challenging. In this paper a new normalization based feature selection method is proposed; the aim is to select the relevant features, classify the data and measure the accuracy. Stability of the data is considered the most important challenge in analyzing sentiments. Investigating the sentiments and selecting the relevant features from the data set plays a major role in this paper. The aim is to work with vector-based feature selection and check the classification performance using recurrent networks. Text mining here depends on feature retrieval methods to improve accuracy, and a single matrix normalization method is proposed to reduce the dimensions. The proposed method performs data preprocessing or sentiment classification and feature reduction to improve accuracy. It achieves better accuracy than the N-gram feature selection method. The experimental results show that the proposed method has better accuracy than other traditional feature selection approaches and that it can decrease the implementation time.


Author(s):  
Esraa H. Abd Al-Ameer, Ahmed H. Aliwy

Document classification is one of the most important fields in natural language processing and text mining. Many algorithms can be used for this task. This paper focuses on improving text classification by feature selection, which means selecting a subset of the original features without affecting the accuracy of the work. A new feature selection method is suggested, which can be seen as a general formulation and mathematical model of Recursive Feature Elimination (RFE). The method was compared with two other well-known feature selection methods: Chi-square and threshold. The results showed that the new method is comparable with the other methods. The best results were 83% when 60% of features were used, 82% when 40% of features were used, and 82% when 20% of features were used. The tests were done with the Naïve Bayes (NB) and decision tree (DT) classification algorithms, and the dataset used is the well-known English data set “20 newsgroups text”, which consists of approximately 18846 files. The results showed that the suggested feature selection method is comparable with standard methods such as Chi-square.
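For reference, the Chi-square baseline mentioned above can be sketched as below. This shows only the standard 2x2 chi-square statistic for term presence against class membership, followed by keeping a fixed fraction of the terms; it is not the RFE-based method the paper proposes. The helper names and the four-document corpus are hypothetical.

```python
from typing import Dict, List, Sequence


def chi_square(term_present: Sequence[bool], in_class: Sequence[bool]) -> float:
    """Chi-square statistic of a 2x2 table: term presence vs. class membership."""
    a = sum(1 for t, c in zip(term_present, in_class) if t and c)
    b = sum(1 for t, c in zip(term_present, in_class) if t and not c)
    c_ = sum(1 for t, c in zip(term_present, in_class) if not t and c)
    d = sum(1 for t, c in zip(term_present, in_class) if not t and not c)
    n = a + b + c_ + d
    denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
    # A term that occurs in every document (or in none) carries no signal.
    return n * (a * d - b * c_) ** 2 / denom if denom else 0.0


def select_top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Keep the given fraction of terms with the highest scores (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Toy corpus (assumption): two "hockey" and two "space" documents.
docs = ["the puck hit the ice", "goalie stopped the puck",
        "the rocket reached orbit", "orbit of the moon"]
is_hockey = [True, True, False, False]

vocabulary = ["the", "puck", "ice", "orbit"]
scores = {term: chi_square([term in doc.split() for doc in docs], is_hockey)
          for term in vocabulary}
kept = select_top_fraction(scores, fraction=0.5)
print(scores, kept)
```

Here `fraction` plays the role of the 20%, 40% and 60% settings reported in the abstract.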

