Transmission Line Fault-Cause Identification Based on Hierarchical Multiview Feature Selection

Fault-cause identification plays a significant role in transmission line maintenance and fault disposal. As the types of monitoring data grow to include, for example, micrometeorology and geographic information, multiview learning can be used to fuse this information for better fault-cause identification. To reduce the redundant information across different types of monitoring data, this paper proposes a hierarchical multiview feature selection (HMVFS) method that addresses the challenge of combining waveform and contextual fault features. To enhance the discriminant ability of the model, an ε-dragging technique is introduced to enlarge the boundary between different classes. To select a useful feature subset, two regularization terms, namely the l2,1-norm and the Frobenius norm penalty, are adopted to conduct hierarchical feature selection on multiview data. An iterative optimization algorithm is then developed to solve the proposed method, and its convergence is theoretically proven. Waveform and contextual features are extracted from field data and used to evaluate the proposed HMVFS. The experimental results demonstrate the effectiveness of the combined use of fault features and reveal the superior performance and application potential of HMVFS.
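The two regularization terms named in the abstract above can be illustrated with a minimal sketch. It assumes the standard definitions of the l2,1-norm (sum of row-wise Euclidean norms) and the Frobenius norm; the weight matrix W is a made-up example, and how HMVFS combines the two penalties across views is not stated in the abstract, so that part is not shown.

```python
import math

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def frobenius_norm(W):
    """Frobenius norm: square root of the sum of squared entries of W."""
    return math.sqrt(sum(w * w for row in W for w in row))

# Hypothetical weight matrix: 3 features (rows) by 2 classes (columns).
# The all-zero second row is the row-sparse pattern an l2,1 penalty encourages.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, 5.0]]

# Features whose weight row is non-zero are the ones kept.
row_norms = [math.sqrt(sum(w * w for w in row)) for row in W]
selected = [i for i, norm in enumerate(row_norms) if norm > 0]

print(l21_norm(W), frobenius_norm(W), selected)
```

Because the l2,1-norm sums whole-row norms, shrinking it tends to zero out entire rows, which is why it is commonly used to discard features rather than individual weights.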


Radiomics side experiments and DAFIT approach in identifying pulmonary hypertension using Cardiac MRI derived radiomics based machine learning models

Scientific Reports ◽

10.1038/s41598-021-92155-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sarv Priya ◽

Tanya Aggarwal ◽

Caitlin Ward ◽

Girish Bathla ◽

Mathews Jacob ◽

...

Keyword(s):

Machine Learning ◽

Pulmonary Hypertension ◽

Feature Selection ◽

Subgroup Analysis ◽

Cardiac Mri ◽

Intraclass Correlation ◽

Poor Performance ◽

Superior Performance ◽

Feature Filtering ◽

The Impact

Side experiments are performed on radiomics models to improve their reproducibility. We measure the impact of myocardial masks, radiomic side experiments, and the data augmentation for information transfer (DAFIT) approach on differentiating patients with and without pulmonary hypertension (PH) using cardiac MRI (CMRI) derived radiomics. Feature extraction was performed from the left ventricle (LV) and right ventricle (RV) myocardial masks using CMRI in 82 patients (42 PH and 40 controls). Various side study experiments were evaluated: original data without and with intraclass correlation (ICC) feature-filtering, and the DAFIT approach without and with ICC feature-filtering. Multiple machine learning and feature selection strategies were evaluated. The primary analysis included all PH patients, and the subgroup analysis included PH patients with preserved LVEF (≥ 50%). For both primary and subgroup analysis, the DAFIT approach without feature-filtering was the highest performer (AUC 0.957–0.958). ICC approaches showed poor performance compared to the DAFIT approach. The performance of combined LV and RV masks was superior to that of individual masks alone. There was variation in top performing models across all approaches (AUC 0.862–0.958). The DAFIT approach with features from combined LV and RV masks provides superior performance, whereas feature-filtering approaches perform poorly. Model performance varies with the combination of feature selection method and model.


Outburst prediction and influencing factors analysis based on Boruta-Apriori and BO-SVM algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210466 ◽

2021 ◽

pp. 1-18

Author(s):

Zhang Zixian ◽

Liu Xuning ◽

Li Zhixiang ◽

Hu Hongqiang

Keyword(s):

Feature Selection ◽

Prediction Model ◽

Influencing Factors ◽

Association Rules ◽

Coal And Gas Outburst ◽

Feature Subset ◽

Gas Outburst ◽

Proposed Model ◽

Optimal Feature Subset ◽

Feature Dimension

The factors influencing coal and gas outburst are complex, and the accuracy and efficiency of current outburst prediction are not high. To obtain effective features from the influencing factors and to achieve accurate and fast dynamic prediction of coal and gas outburst, this article proposes an outburst prediction model that couples feature selection with an intelligently optimized classifier. First, in view of the redundancy and irrelevance among the influencing factors, the Boruta feature selection method is used to obtain the optimal feature subset. Second, the Apriori association rule mining method is used to mine the internal associations between the influencing factors, and the strong association rules in the influencing factors and samples that affect the classification of coal and gas outburst are extracted. Finally, an SVM classifies coal and gas outbursts based on the optimal feature subset and sample data obtained above, with a Bayesian optimization algorithm tuning the SVM kernel parameters; the resulting coal and gas outburst pattern recognition prediction model is compared with existing coal and gas outburst prediction models in the literature. Compared with feature selection or association rule mining alone, the proposed model achieves the highest prediction accuracy of 93% when the feature dimension is 3, which is higher than that of Apriori association rules and Boruta feature selection: the classification accuracy is significantly improved while the feature dimension is significantly reduced. The results show that the proposed model is better than other prediction models, which further verifies the accuracy and applicability of the coupling prediction model, and that it has high stability and robustness.
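The notion of a "strong association rule" in the Apriori step above rests on two quantities, support and confidence. The sketch below computes both for one rule under the usual definitions; the discretized factor names and the four samples are invented for illustration and are not taken from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical discretized samples (one set of items per sample).
transactions = [
    {"gas_pressure=high", "coal_strength=low", "outburst=yes"},
    {"gas_pressure=high", "coal_strength=low", "outburst=yes"},
    {"gas_pressure=high", "coal_strength=high", "outburst=no"},
    {"gas_pressure=low", "coal_strength=low", "outburst=no"},
]

rule_support = support(transactions, {"gas_pressure=high", "outburst=yes"})
rule_confidence = confidence(transactions, {"gas_pressure=high"}, {"outburst=yes"})
print(rule_support, rule_confidence)
```

A rule is usually called strong when both values exceed user-chosen thresholds; the thresholds used in the paper are not given in the abstract.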


Study on Portrait Tracking Technology of Deep Feature Learning in Monitoring Image Acquisition

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2021.65.4.040502 ◽

2021 ◽

Author(s):

Senlin Yang ◽

Xin Chong

Keyword(s):

Feature Selection ◽

Image Acquisition ◽

Recognition Rate ◽

Particle Swarm ◽

Feature Subset ◽

Particle Structure ◽

Recognition Time ◽

Traditional Methods ◽

Deep Feature ◽

Global Optimal

In a network information society, there are many occasions where people’s behaviors need to be tracked, photographed, and recognized. Biometric recognition technologies are considered to be one of the most effective solutions. Traditional methods mostly use a graph structure and a deformed component model to design two-dimensional (2D) human body component detectors, and apply graph models to establish the connectivity of each component. The recognition design process is simple, but the recognition accuracy and tracking effect achieved in monitoring image acquisition are not high. An improved particle swarm optimization algorithm is used to determine the particle structure, which is represented by a binary bit string. The support vector machine (SVM) parameters of discrete particles are optimized, and feature selection and SVM parameters are designed to be optimized synchronously, so that the portrait feature subset and the SVM parameters are optimized together in discrete space. The extracted feature subsets can thus be effectively optimized and selected while the parameters of the SVM model are optimized synchronously. The discrete particle structure is associated with the SVM parameters to achieve synchronized optimization of feature selection and SVM parameters. The method is not only superior to traditional algorithms in terms of recognition rate, but also reduces the feature dimension and shortens the recognition time. The deep feature recognition built on the learning machine is not prone to divergence and can effectively adjust the particle speed toward the global optimum, which makes it more effective than the particle swarm algorithm at searching for the global optimal solution and gives it better robustness. In the experiments, the proposed method is compared with the traditional methods for testing and analysis. The results show that the method optimizes the selection of the feature subset and eliminates a large number of invalid features. The method not only reduces space complexity and shortens recognition time, but also improves the recognition rate. The dimension of the selected feature subset is superior to that of the subsets extracted by other algorithms.
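One way to picture a binary bit string that carries both a feature subset and an SVM parameter is sketched below. The layout is an assumption made for illustration (the abstract does not specify the encoding): the first n_features bits form a feature mask, and the remaining bits encode a single SVM parameter, here called C, scaled linearly into a chosen range.

```python
def decode_particle(bits, n_features, c_min, c_max):
    """Split a bit string into a feature mask and one SVM parameter value.

    Assumed layout (hypothetical): bits[:n_features] is the feature mask,
    bits[n_features:] is an unsigned integer mapped linearly to [c_min, c_max].
    """
    mask = bits[:n_features]
    param_bits = bits[n_features:]
    if not param_bits:
        raise ValueError("no bits left to encode the SVM parameter")
    selected = [i for i, b in enumerate(mask) if b == 1]
    as_int = int("".join(str(b) for b in param_bits), 2)
    max_int = 2 ** len(param_bits) - 1
    c_value = c_min + (c_max - c_min) * as_int / max_int
    return selected, c_value

# Hypothetical particle: 5 mask bits followed by 4 parameter bits.
particle = [1, 0, 1, 1, 0, 1, 0, 0, 0]
features, c_value = decode_particle(particle, n_features=5, c_min=1.0, c_max=16.0)
print(features, c_value)
```

With such an encoding, a single particle update changes the feature subset and the classifier parameter at the same time, which is the sense in which the two are optimized synchronously.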


Simultaneous Channel and Feature Selection of Fused EEG Features Based on Sparse Group Lasso

BioMed Research International ◽

10.1155/2015/703768 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 11

Author(s):

Jin-Jia Wang ◽

Fang Xue ◽

Hui Li

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Group Lasso ◽

High Dimensional ◽

Test Accuracy ◽

Gradient Descent Method ◽

Feature Subset ◽

Eeg Signals ◽

Sparse Group Lasso ◽

Selection Of

Feature extraction and classification of EEG signals are core parts of brain computer interfaces (BCIs). Due to the high dimension of the EEG feature vector, an effective feature selection algorithm has become an integral part of research studies. In this paper, we present a new method based on a wrapped Sparse Group Lasso for channel and feature selection of fused EEG signals. The high-dimensional fused features are obtained first; they include the power spectrum, time-domain statistics, AR model, and wavelet coefficient features extracted from the preprocessed EEG signals. The wrapped channel and feature selection method is then applied, which uses a logistic regression model with a Sparse Group Lasso penalty function. The model is fitted on the training data, and the parameters are estimated by modified blockwise coordinate descent and coordinate gradient descent methods. The best parameters and feature subset are selected using 10-fold cross-validation. Finally, the test data are classified using the trained model. Compared with existing channel and feature selection methods, results show that the proposed method is more suitable, more stable, and faster for high-dimensional feature fusion. It can simultaneously achieve channel and feature selection with a lower error rate. The test accuracy on the data from the international BCI Competition IV reached 84.72%.
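The penalty that makes simultaneous channel and feature selection possible can be sketched as follows. This assumes one common form of the Sparse Group Lasso penalty, a group term weighted by the square root of the group size plus an l1 term, with each EEG channel treated as one group; the exact weighting used in the paper, the coefficients, and the two lambda values are assumptions for illustration.

```python
import math

def sparse_group_lasso_penalty(groups, lam_group, lam_l1):
    """lam_group * sum_g sqrt(p_g) * ||beta_g||_2  +  lam_l1 * ||beta||_1."""
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(b * b for b in g)) for g in groups
    )
    l1_term = sum(abs(b) for g in groups for b in g)
    return lam_group * group_term + lam_l1 * l1_term

# Hypothetical coefficients: one list per channel, one entry per fused feature.
# Channel 1 is entirely zero (channel dropped); channel 0 keeps 2 of 4 features.
beta = [[3.0, 0.0, 4.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]

penalty = sparse_group_lasso_penalty(beta, lam_group=0.5, lam_l1=0.1)
active_channels = [i for i, g in enumerate(beta) if any(b != 0 for b in g)]
print(penalty, active_channels)
```

The group term can remove a whole channel at once, while the l1 term can remove individual features inside a channel that is kept.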


A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. MKMDIGWO therefore achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
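The two measures behind the filter model above can be computed directly. The sketch below implements the Kendall rank coefficient in its simplest form (tau-a, which ignores ties in the denominator) and the Euclidean distance between two feature vectors; how MKMDIGWO combines them into a single filter score is not described in the abstract, so only the two measures are shown, on made-up vectors.

```python
import math

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical feature column and class labels encoded as numbers.
feature = [1, 3, 2, 4]
labels = [1, 2, 3, 4]

tau = kendall_tau(feature, labels)
dist = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(tau, dist)
```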


A Novel Unit-Based Personalized Fingerprint Feature Selection Strategy for Dynamic Functional Connectivity Networks

Frontiers in Neuroscience ◽

10.3389/fnins.2021.651574 ◽

2021 ◽

Vol 15 ◽

Author(s):

Feng Zhao ◽

Zhiyuan Chen ◽

Islem Rekik ◽

Peiqiang Liu ◽

Ning Mao ◽

...

Keyword(s):

Feature Selection ◽

Functional Connectivity ◽

Evaluation Method ◽

Brain Regions ◽

Autism Spectrum ◽

Superior Performance ◽

Selection Strategy ◽

Dynamic Functional Connectivity ◽

Discriminative Feature ◽

Fingerprint Feature

The sliding-window-based dynamic functional connectivity network (SW-D-FCN) derived from resting-state functional Magnetic Resonance Imaging has become an increasingly useful tool in the diagnosis of various neurodegenerative diseases. However, it is still challenging to extract and select the most discriminative features from SW-D-FCN. Conventionally, existing methods opt to select a single discriminative feature set or concatenate a few more from the SW-D-FCN. However, such reductionist strategies may fail to fully capture the personalized discriminative characteristics contained in each functional connectivity (FC) sequence of the SW-D-FCN. To address this issue, we propose a unit-based personalized fingerprint feature selection (UPFFS) strategy to better capture the most discriminative feature associated with a target disease for each unit. Specifically, we regard the FC sequence between any pair of brain regions of interest (ROIs) as a unit. For each unit, the most discriminative feature is identified by a specific feature evaluation method, and all the most discriminative features are then concatenated into a feature set for the subsequent classification task. In this way, the personalized fingerprint feature derived from each FC sequence can be fully mined and utilized in the classification decision. To illustrate the effectiveness of the proposed strategy, we conduct experiments to distinguish subjects diagnosed with autism spectrum disorder from normal controls. Experimental results show that the proposed strategy can select relevant discriminative features and achieve superior performance to benchmark methods.


MRFGRO: a hybrid meta-heuristic feature selection method for screening COVID-19 using deep features

Scientific Reports ◽

10.1038/s41598-021-02731-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arijit Dey ◽

Soham Chattopadhyay ◽

Pawan Kumar Singh ◽

Ali Ahmadian ◽

Massimiliano Ferrara ◽

...

Keyword(s):

Feature Selection ◽

Golden Ratio ◽

Medical Image Analysis ◽

Feature Selection Method ◽

World Health ◽

Significant Feature ◽

Feature Subset ◽

Upper Respiratory Tract ◽

Global Pandemic ◽

Health Organization

COVID-19 is a respiratory disease that causes infection in both the lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common method for COVID-19 diagnosis is real-time reverse transcription-polymerase chain reaction (RT-PCR), which takes a significant amount of time to return a result. Computer-based medical image analysis is more beneficial for the diagnosis of such disease as it can give better results in less time. Computed Tomography (CT) scans are used to monitor lung diseases including COVID-19. In this work, a hybrid model for COVID-19 detection has been developed which has two key stages. In the first stage, we have fine-tuned the parameters of pre-trained convolutional neural networks (CNNs) to extract features from the COVID-19 affected lungs. As pre-trained CNNs, we have used two standard CNNs, namely GoogleNet and ResNet18. Then, we have proposed a hybrid meta-heuristic feature selection (FS) algorithm, named Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO), to select the most significant feature subset. The proposed model is implemented over three publicly available datasets, namely the COVID-CT dataset, the SARS-COV-2 dataset, and the MOSMED dataset, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57% respectively. The obtained results confirm that the proposed approach is quite efficient when compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.


Feature selection method based on Menger curvature and LDA theory for a P300 brain-computer interface

Journal of Neural Engineering ◽

10.1088/1741-2552/ac42b4 ◽

2021 ◽

Author(s):

ShuRui Li ◽

Jing Jin ◽

Ian Daly ◽

Chang Liu ◽

Andrzej Cichocki

Keyword(s):

Feature Selection ◽

Brain Computer Interface ◽

Feature Selection Method ◽

Event Related Potentials ◽

Selection Method ◽

Computer Interface ◽

Feature Subset ◽

Linear Discriminant ◽

Related Potentials ◽

Menger Curvature

Brain–computer interface (BCI) systems decode electroencephalogram signals to establish a channel for direct interaction between the human brain and the external world without the need for muscle or nerve control. The P300 speller, one of the most widely used BCI applications, presents a selection of characters to the user and performs character recognition by identifying P300 event-related potentials from the EEG. Such P300-based BCI systems can reach good levels of accuracy but are difficult to use in day-to-day life due to redundant and noisy signals, so there is room for improvement. We propose a novel hybrid feature selection method for the P300-based BCI system to address the problem of feature redundancy, which combines the Menger curvature and linear discriminant analysis. First, the selected strategies are applied separately to a given dataset to estimate the gain of each feature. Then, each generated value set is ranked in descending order and judged by a predefined criterion for suitability in classification models. The intersection of the two approaches is then evaluated to identify an optimal feature subset. The proposed method is evaluated using three public datasets, i.e., BCI Competition III dataset II, BNCI Horizon dataset, and EPFL dataset. Experimental results indicate that compared with other typical feature selection and classification methods, our proposed method has better or comparable performance. Additionally, our proposed method can achieve the best classification accuracy after all epochs in three datasets. In summary, our proposed method provides a new way to enhance the performance of the P300-based BCI speller.
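The rank-then-intersect step described above can be sketched in a few lines. The abstract says each ranked list is judged by "a predefined criterion" without naming it, so this sketch substitutes a simple top-k cut as a stand-in; the two score lists are invented and do not come from Menger curvature or LDA computations.

```python
def top_k(scores, k):
    """Indices of the k highest scores, in descending score order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def intersect_rankings(scores_a, scores_b, k):
    """Features that pass the top-k cut under both scoring strategies."""
    keep_b = set(top_k(scores_b, k))
    return [i for i in top_k(scores_a, k) if i in keep_b]

# Hypothetical per-feature gains from two separate strategies.
curvature_scores = [0.9, 0.1, 0.8, 0.4, 0.7]
lda_scores = [0.2, 0.3, 0.9, 0.8, 0.6]

common = intersect_rankings(curvature_scores, lda_scores, k=3)
print(common)
```

Taking the intersection keeps only features that both strategies rate highly, which is one plausible way to reduce redundancy without trusting either ranking alone.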


Performance Analysis of Classifiers on Filter-Based Feature Selection Approaches on Microarray Data

Bio-Inspired Computing for Information Retrieval Applications - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-2375-8.ch002 ◽

2017 ◽

pp. 41-70 ◽

Cited By ~ 5

Author(s):

Arunkumar Chinnaswamy ◽

Ramakrishnan Srinivasan

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Information Gain ◽

Feature Subset ◽

Classification Problems ◽

Raw Data ◽

Correlation Based Feature Selection ◽

Feature Selection Approach ◽

Gene Expression Levels

Feature selection in machine learning reduces the number of features (genes) while retaining an acceptable level of classification accuracy. This paper discusses filter-based feature selection methods such as Information Gain and Correlation coefficient. After feature selection is performed, the selected genes are passed to five classifiers: Naïve Bayes, Bagging, Random Forest, J48 and Decision Stump. The same experiment is performed on the raw data as well. Experimental results show that the filter-based approaches reduce the number of gene expression levels effectively, yielding a reduced feature subset that produces higher classification accuracy than the same experiment performed on the raw data. In addition, Correlation Based Feature Selection uses far fewer genes and produces higher accuracy than the Information Gain based Feature Selection approach.
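Information Gain, one of the filter criteria named above, scores a feature by how much knowing its value reduces the entropy of the class labels. The sketch below uses the standard definition on a toy example; it assumes gene expression levels have already been discretized into categories, and the gene values and labels are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus their expected entropy given the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical discretized expression levels for two genes over four samples.
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]   # separates the classes perfectly
gene_b = ["high", "low", "high", "low"]   # carries no class information

gain_a = information_gain(gene_a, labels)
gain_b = information_gain(gene_b, labels)
print(gain_a, gain_b)
```

A filter method would rank all genes by this score and keep the highest-ranked ones before any classifier is trained.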


Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand

Information ◽

10.3390/info10060187 ◽

2019 ◽

Vol 10 (6) ◽

pp. 187

Author(s):

Rattanawadee Panthong ◽

Anongnart Srivihok

Keyword(s):

Feature Selection ◽

Liver Cancer ◽

Predictive Model ◽

Information Gain ◽

Classification Performance ◽

Cancer Classification ◽

Feature Subset Selection ◽

Classification Model ◽

Feature Subset ◽

Cancer Data

Liver cancer data always consist of a large number of multidimensional datasets. In a dataset with a huge number of features and multiple classes, many features may be irrelevant to pattern classification in machine learning. Hence, feature selection is used to improve the performance of the classification model and achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve the performance of classification. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance compared to the class-independent method. Furthermore, the best feature subset selection could help reduce the complexity of the predictive model.
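Sequential forward selection, the second half of the hybrid approach above, grows a feature subset greedily and stops when no addition improves the score. The sketch below shows only that generic loop; the feature names, the GAINS table, and toy_score are hypothetical stand-ins (in practice the score would be a classifier's validation accuracy), and the class-dependent part of IGSFS-CD is not modeled.

```python
def sequential_forward_selection(candidates, score, max_features):
    """Greedily add the feature that most improves score(subset)."""
    selected, best = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        # Try each remaining feature and keep the best-scoring extension.
        trial_score, trial_feature = max(
            (score(selected + [f]), f) for f in remaining
        )
        if trial_score <= best:
            break  # no candidate improves the current subset
        selected.append(trial_feature)
        remaining.remove(trial_feature)
        best = trial_score
    return selected, best

# Hypothetical per-feature usefulness and a toy score with a size penalty.
GAINS = {"age": 0.6, "afp_level": 0.5, "region": 0.1}

def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 0.2 * len(subset)

subset, subset_score = sequential_forward_selection(GAINS, toy_score, 3)
print(subset, subset_score)
```

In a hybrid scheme, a filter such as information gain would typically shortlist the candidates first, so that this more expensive wrapper loop only searches a reduced set.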
