A Fast Feature Selection Method Based on Coefficient of Variation for Diabetics Prediction Using Machine Learning

Author(s):  
Tengyue Li ◽  
Simon Fong

Diabetes has become a prevalent metabolic disease, affecting patients of all age groups and large populations around the world. Early detection facilitates early treatment, which improves the prognosis. In the computational intelligence and medical care literature, different techniques have been proposed for predicting diabetes from historical records of related symptoms, sharing a common goal of improving the accuracy of diabetes prediction models. In addition to the model induction algorithm, feature selection is a significant step: it retains only the relevant attributes for building a quality prediction model. In this article, a novel and simple feature selection criterion called the Coefficient of Variation (CV) is proposed as a filter-based feature selection scheme. Under the CV method, attributes whose data dispersion is too low are disqualified from the model construction process, thereby discarding attributes that would lead to poor model accuracy. The computation of CV is simple, enabling an efficient feature selection process. Computer simulation experiments on the Pima Indian diabetes dataset are used to compare the performance of CV with traditional feature selection methods, and superior results by CV are observed.
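The CV criterion described above can be sketched in a few lines. This is a minimal illustration, assuming CV is computed per attribute as standard deviation over mean; the cutoff value is an assumption, not taken from the paper.

```python
import numpy as np

def cv_filter(X, threshold=0.1):
    """Coefficient-of-Variation filter: keep only columns whose
    CV = std / |mean| exceeds `threshold`; low-dispersion attributes
    are disqualified. The threshold is illustrative."""
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    stds = X.std(axis=0)
    # Guard against zero means (CV undefined): treat as zero dispersion.
    cv = np.where(means != 0, stds / np.abs(means), 0.0)
    keep = cv > threshold
    return X[:, keep], keep
```

For example, a column hovering tightly around 10 is dropped while a widely spread column survives; the filter is a single pass over the data, which is what makes the scheme fast.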

2013 ◽  
Vol 427-429 ◽  
pp. 2045-2049
Author(s):  
Chun Mei Yu ◽  
Sheng Bo Yang

To increase fault classification performance and reduce computational complexity, feature selection is used for fault diagnosis. In this paper, we propose a sparse-representation-based feature selection method and give a detailed procedure for the algorithm. The proposed method is compared with traditional selection methods based on wavelet packet decomposition and Bhattacharyya distance, and with sparse methods including the sparse representation classifier, sparsity preserving projection, and sparse principal component analysis. Simulations show that the proposed selection method gives better fault-diagnosis performance on Tennessee Eastman Process data.


2021 ◽  
Author(s):  
Chunyuan Wang ◽  
Yatao Zhang ◽  
Xinge Jiang ◽  
Feifei Liu ◽  
Zhimin Zhang ◽  
...  

Abstract This paper proposes a feature selection method that combines multi-time-scale analysis with heart rate variability (HRV) analysis for the early diagnosis of congestive heart failure (CHF). In previous studies on CHF diagnosis, researchers have tended to increase the variety of HRV features by searching for new ones, or to use different machine learning algorithms to optimize the classification of CHF and normal sinus rhythm (NSR) subjects. In fact, fuller utilization of traditional HRV features can also improve classification accuracy. The proposed method constructs a multi-time-scale feature matrix from traditional HRV features that exhibit good stability within each time scale and clear differences across time scales. The multi-scale features yield better performance than traditional single-time-scale features when fed into a support vector machine (SVM) classifier: the SVM achieves a sensitivity, specificity, and accuracy of 99.52%, 100.00%, and 99.83%, respectively. These results indicate that the proposed feature selection method can effectively reduce redundant features and computational load in the automatic diagnosis of CHF.
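The multi-time-scale feature matrix can be sketched as follows: the same traditional HRV statistics are computed over windows of several lengths and aggregated per scale. The window lengths and the particular feature set (mean RR, SDNN, RMSSD) are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def multiscale_hrv_features(rr, scales=(50, 100, 200)):
    """Build a multi-time-scale feature matrix from an RR-interval
    series: one row per scale, one column per HRV feature.
    Features here: mean RR, SDNN (std of RR), RMSSD (root mean
    square of successive differences)."""
    rows = []
    for w in scales:
        feats = []
        for start in range(0, len(rr) - w + 1, w):  # non-overlapping windows
            seg = rr[start:start + w]
            diff = np.diff(seg)
            feats.append([seg.mean(),                    # mean RR
                          seg.std(),                     # SDNN
                          np.sqrt(np.mean(diff ** 2))])  # RMSSD
        rows.append(np.mean(feats, axis=0))  # average over windows per scale
    return np.array(rows)  # shape: (n_scales, n_features)
```

The resulting matrix could then be flattened and fed to an SVM, so that stability within a scale and variation across scales are both visible to the classifier.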


Author(s):  
RONG LIU ◽  
ROBERT RALLO ◽  
YORAM COHEN

An unsupervised feature selection method is proposed for the analysis of high-dimensional datasets. The least-square error (LSE) of approximating the complete dataset via a reduced feature subset is proposed as the quality measure for feature selection. Guided by minimization of the LSE, a kernel least-squares forward selection algorithm (KLS-FS) is developed that is capable of both linear and non-linear feature selection. An incremental LSE computation is designed to accelerate the selection process, enhancing the scalability of KLS-FS to high-dimensional datasets. The superiority of the proposed algorithm, in terms of preserving principal data structures, learning performance in classification and clustering applications, and robustness, is demonstrated on various real-life datasets of different sizes and dimensions.
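The core idea, greedy forward selection guided by the least-square reconstruction error of the full dataset, can be sketched in its linear form. The kernel variant and the incremental LSE update that make the paper's method scalable are omitted here for brevity; this naive version recomputes the least-squares fit at every step.

```python
import numpy as np

def lse_forward_select(X, k):
    """Greedy forward selection: at each step, add the feature (column)
    that most reduces the least-square error of reconstructing the
    complete dataset X from the selected column subset."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(d):
            if j in selected:
                continue
            S = X[:, selected + [j]]
            # Least-squares reconstruction of all columns from subset S.
            B, *_ = np.linalg.lstsq(S, X, rcond=None)
            err = np.linalg.norm(X - S @ B) ** 2
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected
```

If one column is a linear combination of two others, any two independent columns suffice to reconstruct the dataset exactly, which is exactly the redundancy the LSE criterion exploits.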


Author(s):  
Alhayat Ali Mekonnen ◽  
Frédéric Lerasle ◽  
Ariane Herbulot ◽  
Cyril Briand

In this paper, we investigate incorporating feature computation time (CT) measures into feature selection for a boosted-cascade people detector that uses a heterogeneous pool of features. We present and comparatively evaluate approaches based on Pareto-front analysis, CT-weighted AdaBoost, and Binary Integer Programming (BIP). The novel BIP-based feature selection method, the main contribution, mines heterogeneous features while taking both detection performance and CT explicitly into consideration. The results demonstrate that a detector using this feature selection scheme exhibits low miss rates (MRs) with a significant boost in frame rate. For example, it achieves a [Formula: see text] lower MR at [Formula: see text] FPPW compared to the Dalal and Triggs HOG detector, with a [Formula: see text]x speed improvement. The extensive experimental results clearly highlight the improvements the proposed framework brings.
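The trade-off the BIP formulation captures, maximize detection benefit subject to a computation-time budget, can be illustrated with a toy 0/1 selection problem. This brute-force sketch is a stand-in for the paper's actual integer program (which a real BIP solver would handle at scale); the gains and costs below are hypothetical numbers.

```python
from itertools import combinations

def select_under_budget(gains, costs, budget):
    """Pick the feature subset maximizing total detection gain while
    keeping total computation time within `budget`. Exhaustive search
    over 2^n subsets; illustrative only."""
    n = len(gains)
    best_set, best_gain = (), 0.0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            cost = sum(costs[i] for i in subset)
            if cost > budget:
                continue  # violates the CT budget constraint
            gain = sum(gains[i] for i in subset)
            if gain > best_gain:
                best_set, best_gain = subset, gain
    return list(best_set), best_gain
```

Note how the optimum can skip the single strongest feature when two cheaper features jointly fit the budget and deliver more total gain, which is why CT must enter the selection explicitly rather than as a post-hoc filter.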


Metals ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 747
Author(s):  
Yuchun Wu ◽  
Yifan Yan ◽  
Zhimin Lv

Traditional mechanical-properties prediction models are mostly based on experience and mechanism, and neglect the linear and nonlinear relationships between process parameters. Aiming at the high-dimensional data collected in the complex industrial process of steel production, a new prediction model is proposed: a multidimensional support vector regression (MSVR) model combined with a feature selection method based on maximum information coefficient (MIC) correlation characterization and complex-network clustering. First, MIC is used to measure the correlation between process parameters and mechanical properties; a complex network is constructed from these correlations and hierarchical clustering is performed. Second, all parameters are evaluated and a representative one is selected from each partition, based on centrality and influence indicators, as input to the subsequent model. Finally, an actual steel production case is used to train the MSVR prediction model. The prediction results show that the proposed framework captures effective features from the full parameter set, achieving higher prediction accuracy and lower run time than the Pearson-based subset, full-parameter subset, and empirical subset inputs. The MIC-based feature selection method can uncover nonlinear relationships that the Pearson coefficient cannot find.
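The cluster-then-pick-a-representative pipeline can be sketched as follows. Two simplifying assumptions are made: absolute Pearson correlation stands in for MIC, and graph connected components stand in for hierarchical clustering; the "centrality" used to pick a representative is the mean within-cluster correlation.

```python
import numpy as np

def representative_features(X, threshold=0.8):
    """Measure pairwise dependence between feature columns, build a
    graph (edge when dependence > threshold), cluster by connected
    components, and keep the most central member of each cluster."""
    d = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))
    adj = corr > threshold
    seen, reps = set(), []
    for i in range(d):
        if i in seen:
            continue
        # Collect the connected component containing feature i (BFS).
        comp, queue = [], [i]
        while queue:
            j = queue.pop()
            if j in seen:
                continue
            seen.add(j)
            comp.append(j)
            queue.extend(k for k in range(d) if adj[j, k] and k not in seen)
        # Representative: highest mean correlation with its own cluster.
        rep = max(comp, key=lambda j: corr[j, comp].mean())
        reps.append(rep)
    return reps
```

Swapping the Pearson proxy for a MIC implementation would recover the nonlinear sensitivity the abstract attributes to MIC; the clustering and representative-selection logic stays the same.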


2021 ◽  
Vol 15 (6) ◽  
pp. 1-24
Author(s):  
Dipanjyoti Paul ◽  
Rahul Kumar ◽  
Sriparna Saha ◽  
Jimson Mathew

Feature selection is the process of retaining only relevant features by removing irrelevant or redundant ones from the large number of features used to represent data. Nowadays, many application domains, especially social media networks, generate new features continuously at different time stamps. When features arrive in an online fashion, the selection task must also be a continuous process: a streaming feature selection approach has to be adopted, i.e., the feature selection process is invoked every time a new feature or group of features arrives. In recent years, many application domains also generate data whose samples may belong to more than one class, called multi-label data. The multiple labels associated with an instance may have dependencies amongst themselves, and finding the correlation amongst the class labels helps to select discriminative features across multiple labels. In this article, we develop streaming feature selection methods for multi-label data in which the multiple labels are reduced to a lower-dimensional space. Similar labels are grouped together before performing the selection to improve selection quality and make the model time-efficient. A multi-objective version of the cuckoo-search-based approach is used to select the optimal feature set. The proposed method has two versions of streaming feature selection: (i) when features arrive individually and (ii) when features arrive in batches. Various multi-label datasets from domains such as text, biology, and audio have been used to test the developed methods. The proposed methods are compared with many previous feature selection methods, and the comparison establishes the superiority of using multiple objectives and label correlation in the feature selection process.
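The streaming decision loop, accept or reject each feature as it arrives, can be sketched with a minimal relevance/redundancy gate. Plain correlation stands in for the paper's multi-objective cuckoo-search criterion, and the single-label case stands in for the multi-label one; both thresholds are illustrative assumptions.

```python
import numpy as np

class StreamingSelector:
    """Online streaming feature selection skeleton: a feature is kept
    only if it is relevant to the target and not redundant with the
    features already selected."""
    def __init__(self, y, relevance=0.3, redundancy=0.9):
        self.y = y
        self.relevance = relevance
        self.redundancy = redundancy
        self.selected = []  # list of (name, values) pairs

    def offer(self, name, values):
        """Called each time a new feature arrives; returns True if kept."""
        rel = abs(np.corrcoef(values, self.y)[0, 1])
        if rel < self.relevance:
            return False  # irrelevant: discard
        for _, old in self.selected:
            if abs(np.corrcoef(values, old)[0, 1]) > self.redundancy:
                return False  # redundant with an already-kept feature
        self.selected.append((name, values))
        return True
```

The batch variant of the paper would call the same gate on a group of features at once, optionally re-ranking the group internally before offering each member.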


High-dimensional data are found in the medical domain and need to be processed for improved data analysis. To deal with the curse of dimensionality, a feature selection process is employed in almost all data mining applications. In this research work, a Density-based Feature Selection (DFS) method, which ranks features by finding the Probability Density Function (PDF) of each feature, is applied to medical datasets that suffer from the curse of dimensionality. The DFS method is a filter-based approach that selects the most discriminatory features from the given feature set, evaluating the importance of each feature with regard to the target class using a density function. A major advantage of DFS over other methods is that it ranks features to select the most discriminatory ones from the whole feature set. This work finds the best feature subset for prediction and classification of high-dimensional medical datasets. The PDF-based DFS method is applied to three medical datasets: the Chronic Kidney Disease (CKD) dataset, the Breast Cancer Wisconsin dataset, and the Parkinson's dataset. The proposed feature selection method evaluates the merit of each feature, assigns weights to the features, and ranks them based on their feature density. The reduced feature subset is then validated by applying three classification algorithms: Support Vector Machine (SVM), Gradient Boosting, and Convolutional Neural Network (CNN). The performance of the classification algorithms is evaluated using the metrics accuracy, sensitivity, and specificity. Experimental results indicate that the performance of SVM, Gradient Boosting, and CNN improves after the feature selection process.
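A simplified sketch of density-based ranking: estimate each feature's class-conditional densities and score the feature by how little those densities overlap. A shared-bin histogram stands in for the PDF estimate here, and the separation score is an illustrative weighting; the paper's exact DFS scheme may differ.

```python
import numpy as np

def density_rank(X, y, bins=20):
    """Rank features by class-conditional density separation:
    score = 1 - (summed overlap of the per-class histograms).
    Higher score = more discriminatory feature."""
    classes = np.unique(y)
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        edges = np.histogram_bin_edges(col, bins=bins)  # shared bins
        dens = [np.histogram(col[y == c], bins=edges)[0].astype(float)
                for c in classes]
        dens = [d / d.sum() for d in dens]   # normalise to probabilities
        overlap = np.minimum.reduce(dens).sum()
        scores.append(1.0 - overlap)         # separation score
    order = np.argsort(scores)[::-1]         # best feature first
    return order, scores
```

A feature whose two class-conditional histograms occupy disjoint bins scores 1.0; a feature distributed identically across classes scores 0, so the ranking directly reflects discriminatory power.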


Author(s):  
Md. Monirul Kabir ◽  
Md. Shahjahan ◽  
Kazuyuki Murase ◽  
...  

In this paper we propose a new backward feature selection method that generates a compact classifier from a three-layered feed-forward artificial neural network (ANN). The algorithm, which is based on the wrapper model, integrates two techniques, coherence and pruning, to find relevant features with a network containing minimal numbers of hidden units and connections. First, coherence learning and a pruning technique are applied during training to remove unnecessary hidden units from the network. After that, attribute distances are measured by a straightforward, computationally inexpensive calculation, and an attribute is removed based on an error-based criterion. The network is retrained after each removal, and this attribute selection process continues until a stopping criterion is satisfied. We applied this method to several standard benchmark classification problems such as breast cancer, diabetes, glass identification, and thyroid problems. Experimental results confirmed that the proposed method generates compact network structures that select relevant features with good classification accuracy.
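The remove-retrain loop of error-based backward elimination can be sketched with a lightweight surrogate model. A nearest-centroid classifier stands in for the paper's pruned ANN (retraining a network at each step would follow the same structure), and the stopping tolerance is an illustrative assumption.

```python
import numpy as np

def nearest_centroid_error(X, y):
    """Training error of a nearest-centroid classifier; a cheap
    stand-in for retraining the paper's pruned neural network."""
    classes = np.unique(y)
    cents = np.array([X[y == c].mean(axis=0) for c in classes])
    d = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred != y).mean()

def backward_select(X, y, tol=0.02):
    """Error-based backward elimination: repeatedly drop the attribute
    whose removal degrades error the least, then 'retrain'; stop once
    every candidate removal worsens error by more than `tol`."""
    features = list(range(X.shape[1]))
    err = nearest_centroid_error(X[:, features], y)
    while len(features) > 1:
        trials = [(nearest_centroid_error(
                       X[:, [f for f in features if f != j]], y), j)
                  for j in features]
        best_err, worst = min(trials)
        if best_err > err + tol:
            break  # every removal hurts too much: stopping criterion met
        features.remove(worst)
        err = best_err
    return features
```

On data where one attribute carries the class signal and another is pure noise, the loop discards the noise attribute and halts, mirroring the paper's "remove, retrain, repeat until the stopping criterion" cycle.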

