A Novel End-To-End Feature Selection and Diagnosis Method for Rotating Machinery

Sensors, 2021, Vol 21 (6), pp. 2056
Author(s): Gang Wang, Yang Zhao, Jiasi Zhang, Yongjie Ning

Feature selection, the task of obtaining effective features from data, is also known as feature engineering. Traditionally, feature selection and predictive model learning are performed separately, which leads to an inconsistency of criteria between the two steps. This paper presents an end-to-end feature selection and diagnosis method that organically unifies feature expression learning and machine prediction learning in one model. The algorithm first combines the prediction model to calculate the mean impact value (MIV) of each feature and realizes a primary, model-oriented feature selection by retaining the features with larger MIVs. To also account for the intrinsic performance of the features themselves, a within-class and between-class discriminant analysis (WBDA) method is proposed and, combined with a feature diversity strategy, realizes a secondary, feature-oriented selection. Finally, the feature vectors obtained by the two selections are classified using a multi-class support vector machine (SVM). Compared with the MIV network variable selection algorithm, principal component analysis (PCA) dimensionality reduction, variable selection based on compensative distance evaluation technology (CDET), and other algorithms, the proposed MIVs-WBDA method exhibits excellent classification accuracy owing to the fusion of feature selection and predictive model learning. In tests of classification accuracy after dimensionality reduction on rotating machinery status data, MIVs-WBDA achieves a 3% improvement in classification accuracy on the low-dimensional feature set. The typical running time of this classification learning algorithm is less than 10 s, whereas a deep learning approach would run for several hours.
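The MIV step above amounts to a perturbation analysis: each feature is nudged up and down, and the average change in the model's output is recorded. The sketch below is a minimal illustration, not the paper's implementation; the linear toy model and the 10% perturbation size are assumptions.

```python
import numpy as np

def mean_impact_values(model_predict, X, delta=0.1):
    """Estimate each feature's Mean Impact Value (MIV): perturb the
    feature by +/-delta (relative) and average the change in output."""
    n, d = X.shape
    mivs = np.zeros(d)
    for j in range(d):
        X_up, X_dn = X.copy(), X.copy()
        X_up[:, j] *= (1 + delta)
        X_dn[:, j] *= (1 - delta)
        mivs[j] = np.mean(model_predict(X_up) - model_predict(X_dn))
    return mivs

# Toy model: output depends strongly on feature 0, weakly on 1, not on 2.
rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.2, size=(200, 3))
w = np.array([5.0, 0.5, 0.0])
predict = lambda X: X @ w

miv = mean_impact_values(predict, X)
ranked = np.argsort(-np.abs(miv))
print(ranked)  # feature 0 ranked first
```

Features are then ranked by |MIV|, and those above a cutoff survive the primary selection.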

Author(s): Edilson Delgado-Trejos, Germán Castellanos, Luis G. Sánchez, Julio F. Suárez

Dimensionality reduction procedures perform well on sets of correlated features, while variable selection methods perform poorly on them: the scores they assign to correlated features are too similar, so none of the variables is strongly preferred over another. Feature selection and dimensionality reduction algorithms therefore have complementary advantages and disadvantages: dimensionality reduction thrives on correlation between variables but fails to select informative features from a set of more complex features, whereas variable selection fails when all the features are correlated but succeeds with informative variables (Wolf & Bileschi, 2005). In this work, we propose a feature selection algorithm with heuristic search that uses Multivariate Analysis of Variance (MANOVA) as the cost function. The technique is put to the test by discriminating hypernasal from normal voices in CLP (cleft lip and/or palate) patients. Classification performance, computational time, and reduction ratio are compared against an alternative feature selection method based on unfolding the multivariate analysis into univariate and bivariate analyses. The methodology is effective because it takes into account the statistical and geometrical relevance present in the features; it does not merely summarize the analysis of separability among classes but also searches for a quality level in the signal representation.
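A heuristic search driven by a MANOVA cost can be sketched with Wilks' lambda (the classic MANOVA statistic) as the objective and a greedy forward search. The statistic, the search strategy, and the toy data below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda for a feature subset: det(within-class scatter) /
    det(total scatter). Lower values mean better class separation."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    T = (X - mu).T @ (X - mu)  # total scatter
    W = sum((X[y == c] - X[y == c].mean(axis=0)).T
            @ (X[y == c] - X[y == c].mean(axis=0)) for c in classes)
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, k):
    """Greedy forward search minimising Wilks' lambda."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = min(remaining,
                   key=lambda j: wilks_lambda(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 3.0 * y  # only feature 0 separates the classes
print(forward_select(X, y, 1))  # [0]
```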


Author(s): Maria Mohammad Yousef

Medical dataset classification has become one of the biggest problems in data mining research. Every database has a given number of features, but some of them can be redundant or even harmful, disrupting the classification process; this is known as the high dimensionality problem. Dimensionality reduction during data preprocessing is therefore critical for increasing the performance of machine learning algorithms, and feature subset selection in particular can yield a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach (a GA assisted by KNN) to deal with high dimensionality in biomedical data classification. The proposed method first combines a genetic algorithm (GA) with the k-Nearest Neighbor (kNN) classifier to find the optimal subset of features, using kNN classification accuracy as the fitness function for the GA. After the best subset of features has been selected, a Support Vector Machine (SVM) is used as the classifier. The proposed method was evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs admirably on these databases, achieving higher classification accuracy while using fewer features.
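The GA-with-kNN-fitness loop can be sketched as follows. The population size, truncation selection, one-point crossover, mutation rate, and the leave-one-out 1-NN fitness are all assumptions chosen to keep the sketch small, not the paper's settings.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy, used here as the GA fitness."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    return np.mean(y[D.argmin(axis=1)] == y)

def ga_select(X, y, pop=20, gens=30, seed=0):
    """Toy genetic algorithm over binary feature masks."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    P = rng.integers(0, 2, size=(pop, d))
    P[P.sum(axis=1) == 0, 0] = 1                  # forbid empty masks
    def fit(m): return loo_1nn_accuracy(X[:, m.astype(bool)], y)
    for _ in range(gens):
        f = np.array([fit(m) for m in P])
        elite = P[np.argsort(-f)[: pop // 2]]     # truncation selection
        kids = elite.copy()
        cut = rng.integers(1, d, size=len(kids))  # one-point crossover
        for i, c in enumerate(cut):
            kids[i, c:] = elite[(i + 1) % len(elite), c:]
        flip = rng.random(kids.shape) < 0.1       # bit-flip mutation
        kids = np.where(flip, 1 - kids, kids)
        kids[kids.sum(axis=1) == 0, 0] = 1
        P = np.vstack([elite, kids])
    f = np.array([fit(m) for m in P])
    return P[f.argmax()].astype(bool), f.max()

# Toy data: feature 0 is informative, the other five are noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 6))
X[:, 0] += 2.5 * y
mask, acc = ga_select(X, y)
print(mask.sum(), round(acc, 3))
```

The final mask would then be handed to an SVM for the actual classification stage.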


2019, Vol 25 (1)
Author(s): André Moiane, Alvaro Muriel Lima Machado

Abstract This paper investigates an alternative classification method that integrates the class-based affinity propagation (CAP) clustering algorithm and the maximum likelihood classifier (MLC), with the purpose of overcoming the limitations of MLC on high-dimensional data and thus improving its accuracy. The new classifier, named CAP-MLC, comprises two stages: spectral feature selection and image classification. The CAP clustering algorithm performs the image dimensionality reduction and feature selection, while the MLC carries out the image classification. The performance of MLC, in terms of classification accuracy and processing time, is determined as a function of the selection rate achieved in the CAP clustering stage. CAP-MLC has been evaluated and validated on two hyperspectral scenes from the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) and the Hyperspectral Digital Imagery Collection Experiment (HYDICE). Classification results show that CAP-MLC yields an enormous improvement in accuracy, reaching 94.15% and 96.47% for AVIRIS and HYDICE respectively, compared with 85.42% and 81.50% for MLC alone, an improvement of 8.73 and 14.97 percentage points on these images. The results also show that CAP-MLC performs well even for classes with limited training samples, surpassing the limitations of MLC.
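The MLC half of the pipeline is a per-class Gaussian log-likelihood rule, and its high-dimensional limitation is visible in the code: each class covariance matrix must be inverted, which breaks down when spectral bands outnumber training samples, hence the need for the CAP selection stage. A minimal sketch (the class API and toy data are assumptions):

```python
import numpy as np

class GaussianMLC:
    """Minimal Gaussian maximum likelihood classifier: fit a mean and
    covariance per class, assign each sample to the class with the
    highest log-likelihood."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mu_ = [X[y == c].mean(axis=0) for c in self.classes_]
        self.cov_ = [np.cov(X[y == c], rowvar=False) for c in self.classes_]
        return self

    def predict(self, X):
        ll = []
        for mu, S in zip(self.mu_, self.cov_):
            d = X - mu          # requires S invertible: samples > bands
            ll.append(-0.5 * np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)
                      - 0.5 * np.log(np.linalg.det(S)))
        return self.classes_[np.argmax(ll, axis=0)]

# Two well-separated Gaussian classes in three "bands".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(4.0, 1.0, size=(100, 3))])
y = np.repeat([0, 1], 100)
acc = np.mean(GaussianMLC().fit(X, y).predict(X) == y)
print(round(acc, 3))  # near-perfect on this easy toy problem
```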


2019, Vol 6 (339), pp. 7-16
Author(s): Mariusz Kubus

Random forests are currently one of the most popular supervised learning methods among practitioners. Their popularity owes much to the fact that the method can be applied without a time-consuming pre-processing step: random forests handle mixed types of features irrespective of their distributions, the method is robust to outliers, and feature selection is built into the learning algorithm. However, a decrease in classification accuracy can be observed in the presence of redundant variables. In this paper, we discuss two approaches to the problem of redundant variables: we consider two strategies for searching for the best feature subset as well as two formulas for aggregating the features within clusters. In the empirical experiment, we generate collinear predictors and add them to real datasets. Dimensionality reduction methods usually improve the accuracy of random forests, but none of them clearly outperforms the others.
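Aggregating features within clusters of collinear predictors can be sketched with correlation-based hierarchical clustering, replacing each cluster by its mean. The correlation threshold, the average linkage, and the averaging formula here are illustrative assumptions, not the paper's two aggregation formulas.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_collinear(X, thresh=0.9):
    """Group features whose absolute pairwise correlation exceeds
    `thresh`, then replace each group by its column-wise mean."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = 1.0 - corr                              # correlation distance
    condensed = dist[np.triu_indices_from(dist, 1)]
    labels = fcluster(linkage(condensed, method='average'),
                      t=1.0 - thresh, criterion='distance')
    Xr = np.column_stack([X[:, labels == c].mean(axis=1)
                          for c in np.unique(labels)])
    return Xr, labels

# Features 0 and 1 are (near-)collinear copies; feature 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=100)
X = np.column_stack([x0,
                     x0 + 0.01 * rng.normal(size=100),
                     rng.normal(size=100)])
Xr, labels = cluster_collinear(X)
print(Xr.shape)  # (100, 2): the collinear pair collapsed to one column
```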


2017, Vol 10 (2), pp. 282-290
Author(s): Samir Singha, Syed Hassan

The performance of data mining and machine learning tasks can be significantly degraded by noisy, irrelevant, and high-dimensional data containing a large number of features. A large amount of real-world data contains noise or missing values, and many irrelevant features may be collected into the storage repositories along with the data. These redundant and irrelevant feature values distort the classification principle, increase the computational overhead, and decrease the prediction ability of the classifier. The high dimensionality of such datasets poses a major bottleneck in data mining, statistics, and machine learning. Among the several approaches to dimensionality reduction, attribute or feature selection techniques are often used. Because the k-NN algorithm is sensitive to irrelevant attributes, its performance degrades significantly when a dataset contains missing values or noisy data; this weakness can, however, be mitigated by combining k-NN with feature selection techniques. In this research we combine Correlation-based Feature Selection (CFS) with the k-Nearest Neighbour (k-NN) classification algorithm to obtain better classification results when the dataset contains missing values or noisy data. The reduced attribute set also decreases the time required for classification. The research shows that when dimensionality reduction is done with CFS and classification with k-NN, datasets with little or no noise may see a negative impact on classification accuracy compared with the k-NN algorithm alone. When additional noise is introduced to these datasets, the performance of k-NN degrades significantly; when the noisy datasets are classified using CFS and k-NN together, classification accuracy improves.
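CFS scores a candidate subset by its average feature-class correlation, discounted by the average correlation among the features themselves, so relevant but non-redundant subsets win. A minimal sketch of the merit function (the toy data and the use of plain Pearson correlation are assumptions; CFS proper typically uses symmetrical uncertainty on discretized features):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit: k*avg(feature-class corr) /
    sqrt(k + k*(k-1)*avg(feature-feature corr))."""
    k = len(subset)
    rcf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return rcf
    rff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                   for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

# f0 predicts the class, f1 is a redundant copy of f0, f2 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 50)
f0 = y + 0.3 * rng.normal(size=100)
f1 = f0 + 0.01 * rng.normal(size=100)
f2 = rng.normal(size=100)
X = np.column_stack([f0, f1, f2])
print(round(cfs_merit(X, y, [0]), 2),
      round(cfs_merit(X, y, [0, 1]), 2),
      round(cfs_merit(X, y, [0, 2]), 2))
```

Adding the noisy feature lowers the merit, while adding the redundant copy brings no gain; a search over subsets then keeps only {f0}, and the reduced set is passed to k-NN.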


2019, Vol 21 (9), pp. 631-645
Author(s): Saeed Ahmed, Muhammad Kabir, Zakir Ali, Muhammad Arif, Farman Ali, ...

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosis of this deadly disease at an early stage is a relatively new clinical application of microarray data. In DNA microarray technology, gene expression data are high-dimensional with small sample sizes, so the development of efficient and robust feature selection methods that identify a small set of genes to achieve better classification performance is indispensable. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and a Multi-Objective Evolutionary Algorithm (MOEA) to select the most informative genes. The hybrid model, with a radial basis function neural network (RBFNN) classifier, has been evaluated on 11 benchmark gene expression datasets using a 10-fold cross-validation test. Results: The experimental results are compared with seven conventional feature selection methods and other methods from the literature, showing that our approach has clear merits in classification accuracy and in the genes it selects. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy on six out of eleven datasets with a minimally sized predictive gene subset.


Author(s): B. Venkatesh, J. Anuradha

In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality, irrelevant features, and noise; such data also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal subset of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method with two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using accuracy, recall, precision, and F1-score as performance metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
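The filter-phase rank aggregation can be sketched with a simple average of per-filter ranks; this is a plain stand-in for the fuzzy Gaussian membership ordering, and the three rank lists below are invented for illustration.

```python
import numpy as np

def aggregate_ranks(rank_lists):
    """Combine per-filter feature rankings into one consensus ordering
    by average rank (lower rank = better feature)."""
    R = np.array(rank_lists, dtype=float)  # one row per filter method
    return np.argsort(R.mean(axis=0))      # features, best to worst

# Hypothetical ranks from three filters over 4 features (0 = best).
relief = [0, 2, 1, 3]
mrmr   = [1, 0, 2, 3]
fc     = [0, 1, 3, 2]
print(aggregate_ranks([relief, mrmr, fc]))  # [0 1 2 3]
```

The top of the consensus ordering then seeds the wrapper phase, where the swarm search and the SVM evaluator refine the final subset.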

