Multi-view feature selection via sparse tensor regression

Author(s):  
Haoliang Yuan ◽  
Sio-Long Lo ◽  
Ming Yin ◽  
Yong Liang

In this paper, we propose a sparse tensor regression model for multi-view feature selection. Unlike most existing methods, our model adopts a tensor structure to represent multi-view data, with the aim of exploring their underlying high-order correlations. Based on this tensor structure, our model can effectively select a meaningful feature subset for each view. We also develop an iterative optimization algorithm to solve the model, together with an analysis of its convergence and computational complexity. Experimental results on several popular multi-view data sets confirm the effectiveness of our model.

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255307
Author(s):  
Fujun Wang ◽  
Xing Wang

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of a candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been modified to find better solutions. Third, the filter model and the wrapper model are dynamically adjusted by damping oscillation theory in order to find an optimal feature subset. MKMDIGWO therefore combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that MKMDIGWO achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than that of the other algorithms on 10 data sets.
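The filter stage described above scores a candidate subset by its correlation with the class and by its internal redundancy. The abstract does not give the exact formula, so the following Python sketch is an assumption throughout: relevance is taken as the mean absolute Kendall tau-a between each selected feature and the label, redundancy is measured through the mean pairwise Euclidean distance between standardized feature columns (a larger distance meaning less redundancy), and the hypothetical weight `alpha` blends the two. It is not the MKMDIGWO formula.

```python
import numpy as np


def kendall_tau(a, b):
    """Kendall's tau-a between two 1-D sequences (O(n^2) pair count)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    # Signs of all pairwise differences; their product is +1 for concordant
    # pairs, -1 for discordant pairs and 0 for ties.
    da = np.sign(a[:, None] - a[None, :])
    db = np.sign(b[:, None] - b[None, :])
    s = (da * db)[np.triu_indices(n, k=1)].sum()
    return s / (n * (n - 1) / 2)


def filter_score(X, y, subset, alpha=0.5):
    """Hypothetical relevance-plus-diversity score for a feature subset.

    Relevance: mean |Kendall tau| between each selected feature and y.
    Diversity: mean Euclidean distance between pairs of z-scored selected
    columns, divided by sqrt(n) so it equals sqrt(2 * (1 - Pearson r)).
    """
    cols = list(subset)
    Xs = X[:, cols]
    Z = (Xs - Xs.mean(axis=0)) / (Xs.std(axis=0) + 1e-12)
    relevance = np.mean([abs(kendall_tau(X[:, j], y)) for j in cols])
    if len(cols) < 2:
        diversity = 0.0
    else:
        dists = [np.linalg.norm(Z[:, i] - Z[:, j]) / np.sqrt(len(y))
                 for i in range(len(cols)) for j in range(i + 1, len(cols))]
        diversity = float(np.mean(dists))
    return alpha * relevance + (1 - alpha) * diversity
```

With this kind of score, a subset containing two near-duplicate informative features scores lower than one pairing an informative feature with an unrelated one, because the near-duplicates contribute almost no diversity. How such a filter score is balanced against the wrapper's accuracy (the damping oscillation schedule) is not reproduced here.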


Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is inherently challenging. It becomes even more demanding when there are few samples but a massive number of features (high dimensionality). High-dimensional, imbalanced data sets pose severe challenges in many real-world applications, such as those involving biomedical data. Numerous researchers have investigated either class imbalance or high dimensionality and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the high-dimensional and imbalanced class problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA), which employs an ensemble of multiple filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to formulate feature selection as an optimisation problem and to select the best (near-optimal) combination of features from the majority and minority classes. The results, supported by appropriate statistical analysis, indicate that rCBR-BGOA can improve classification performance on high-dimensional, imbalanced data sets in terms of the G-mean and Area Under the Curve (AUC) performance metrics.
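G-mean, one of the two metrics named above, is commonly defined as the geometric mean of sensitivity (recall on the positive class) and specificity (recall on the negative class). A minimal Python sketch of that standard definition from confusion-matrix counts, assuming the minority class is treated as the positive class:

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity.

    Assumes the minority class is the positive class.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (minority recall)
    specificity = tn / (tn + fp)  # true negative rate (majority recall)
    return math.sqrt(sensitivity * specificity)
```

For example, with 10 minority and 90 majority samples, a classifier that predicts the majority class for every sample reaches 90% accuracy, yet `g_mean(0, 10, 90, 0)` is 0.0. This insensitivity to the majority class dominating the count is why G-mean is preferred over plain accuracy for imbalanced data.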


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Bingsheng Chen ◽  
Huijie Chen ◽  
Mengshan Li

Feature selection can remove irrelevant features from data and improve classification accuracy in pattern classification. Back propagation (BP) neural networks and the particle swarm optimization (PSO) algorithm can be combined well with feature selection. On this basis, this paper adds interference factors to the BP neural network and particle swarm optimization algorithm to improve the accuracy and practicability of feature selection. This paper summarizes the basic methods and requirements of feature selection and combines the benefits of global optimization with the feedback mechanism of BP neural networks to propose a feature selection model based on backpropagation and particle swarm optimization (BP-PSO). First, a chaotic model is introduced to increase the diversity of particles during the initialization of particle swarm optimization, and an adaptive factor is introduced to enhance the global search ability of the algorithm. Then, the number of features is optimized, reducing it while maintaining the accuracy of feature selection. Finally, different data sets are used to test the accuracy of feature selection, and the evaluation mechanisms of the wrapper mode and filter mode are used to verify the practicability of the model. The results show that the average accuracy of BP-PSO is 8.65% higher than that of the second-best model, NDFs, across the data sets, and that the performance of BP-PSO is 2.31% to 18.62% higher than that of the benchmark methods on all data sets. This shows that BP-PSO can select more discriminative feature subsets, which verifies the accuracy and practicability of the model.
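The two PSO modifications described above, chaotic initialization and an adaptive factor, can be illustrated in isolation. The sketch below is a generic continuous PSO in Python, not BP-PSO: it assumes the chaotic model is the logistic map x <- 4x(1 - x) and that the adaptive factor is an inertia weight decreasing linearly from 0.9 to 0.4. The BP network, the interference factors, and the feature-count optimization are omitted, and all parameter values are illustrative.

```python
import numpy as np


def logistic_map_init(n_particles, dim, lower, upper, x0=0.37, mu=4.0):
    """Chaotic initialization: iterate the logistic map x <- mu*x*(1-x)
    and scale the resulting sequence from (0, 1) into [lower, upper]."""
    seq = np.empty(n_particles * dim)
    x = x0
    for k in range(seq.size):
        x = mu * x * (1.0 - x)
        seq[k] = x
    return lower + (upper - lower) * seq.reshape(n_particles, dim)


def pso(f, dim, lower, upper, n_particles=30, iters=100,
        w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lower, upper]^dim."""
    rng = np.random.default_rng(seed)
    v_max = 0.2 * (upper - lower)  # velocity clamp keeps the swarm stable
    pos = logistic_map_init(n_particles, dim, lower, upper)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for t in range(iters):
        # Assumed adaptive factor: inertia weight shrinking linearly,
        # favoring exploration early and exploitation late.
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

For feature selection, each particle position would typically be decoded into a feature mask (for example, by thresholding at 0.5) and `f` would return a classification error. That decoding is an assumption and is not shown.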


Author(s):  
NEERAJ SAHU ◽  
D. S. RAJPUT ◽  
R. S. THAKUR ◽  
G. S. THAKUR

This paper presents a clustering-based approach to document classification and data analysis. The proposed approach is based on both unsupervised and supervised document classification. It consists of the following steps: document collection, text preprocessing, feature selection, indexing, clustering, and results analysis. The Twenty Newsgroups data set [20] is used in the experiments, and the experimental results are analyzed using SAS 9.0 analytics software. The experimental results show that the proposed approach performs better.


Author(s):  
Mojtaba Khanzadeh ◽  
Matthew Dantin ◽  
Wenmeng Tian ◽  
Matthew W. Priddy ◽  
Haley Doude ◽  
...  

The objective of this research is to study an effective thermal history prediction method for additive manufacturing (AM) processes using thermal image streams in a layer-wise manner. The need for seamless integration of in-process sensing and data-driven approaches to monitor process dynamics in AM has been clearly stated in blueprint reports released over the past five years by various U.S. agencies, such as NIST and DoD. Reliable physics-based models have been developed to delineate the underlying thermo-mechanical dynamics of AM processes; however, their computational cost is extremely high. We propose a tensor-based surrogate modeling methodology to predict the layer-wise relationship in the thermal history of AM parts, which is time-efficient compared to available physics-based prediction models. We construct a network-tensor structure for freeform shapes based on thermal image streams obtained in a metal-based AM process. Subsequently, we simplify the network-tensor structure by concatenating images to obtain a layer-wise structure. Each succeeding layer is predicted from its antecedent layer using the tensor regression model. A generalized multilinear structure, higher-order partial least squares (HOPLS), is used to estimate the tensor regression model parameters. With the proposed method, the high-dimensional thermal history of AM components can be predicted accurately in a computationally efficient manner. The proposed thermal history prediction is applied to simulated thermal images from finite element method (FEM) simulations. This shows that the proposed model can be used alongside simulation-based models to enhance their performance.
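The layer-wise scheme above (predicting each layer's thermal image from the preceding one) can be sketched with a much simpler regressor. In the Python sketch below, ordinary ridge regression on vectorized images is swapped in for HOPLS, the network-tensor construction is skipped, and the input is assumed to be an array of shape `(n_layers, height, width)`. It only illustrates the data flow, not the paper's tensor model.

```python
import numpy as np


def predict_next_layer(frames, lam=1e-3):
    """Predict the thermal image of the layer after the last one in `frames`.

    frames: array of shape (n_layers, height, width). Consecutive pairs
    (layer k, layer k+1) form the training set. Ridge regression (standing
    in for HOPLS) is solved in its dual form, which is cheap when there are
    far fewer layers than pixels.
    """
    n = frames.shape[0]
    X = frames[:-1].reshape(n - 1, -1)  # antecedent layers, one row each
    Y = frames[1:].reshape(n - 1, -1)   # succeeding layers
    # Dual ridge: B = X^T (X X^T + lam I)^{-1} Y, applied without forming B.
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n - 1), Y)
    x_last = frames[-1].reshape(1, -1)
    return (x_last @ X.T @ alpha).reshape(frames.shape[1:])
```

Vectorizing each image discards the spatial (multilinear) structure that HOPLS is designed to exploit, which is the main reason a tensor method is preferable on real thermal image streams.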


Author(s):  
Weichan Zhong ◽  
Xiaojun Chen ◽  
Guowen Yuan ◽  
Yiqin Li ◽  
Feiping Nie

In this paper, we propose a novel semi-supervised feature selection method based on adaptive discriminant analysis, namely SADA. Instead of computing fixed similarities before performing feature selection, SADA simultaneously learns an adaptive similarity matrix S and a projection matrix W with an iterative method. In each iteration, S is computed from the projected distances with the learned W, and W is computed with the learned S. Therefore, SADA can learn a better projection matrix W by weakening the effect of noisy features through the adaptive similarity matrix. Experimental results on four data sets show the superiority of SADA over five semi-supervised feature selection methods.
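The alternating scheme described above (S from distances in the projected space, then W from S) can be illustrated with a deliberately simplified, unsupervised Python sketch. Everything beyond the alternation itself is an assumption rather than SADA's formulation: S is a row-normalized heat-kernel similarity, W maximizes total scatter relative to the graph-Laplacian scatter induced by S (solved as a generalized eigenproblem), the labeled-data term of the semi-supervised model is omitted, and features are ranked by the row norms of W.

```python
import numpy as np


def _gen_eigh_top(A, B, m):
    """Top-m solutions of A w = lam B w for symmetric A and SPD B."""
    C = np.linalg.cholesky(B)           # B = C C^T
    Cinv = np.linalg.inv(C)
    M = Cinv @ A @ Cinv.T               # equivalent standard symmetric problem
    _, V = np.linalg.eigh(M)            # eigenvalues in ascending order
    return Cinv.T @ V[:, ::-1][:, :m]   # map back, keep the m largest


def adaptive_similarity(Z):
    """Row-normalized heat-kernel similarity computed in the projected space Z."""
    D = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    S = np.exp(-D / (np.mean(D) + 1e-12))
    np.fill_diagonal(S, 0.0)
    return S / S.sum(axis=1, keepdims=True)


def alternate_s_w(X, m=1, iters=5, reg=1e-6):
    """Alternate S <- similarity(X W) and W <- argmax scatter ratio given S.

    Returns one score per feature (row norms of the final W).
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    St = Xc.T @ Xc                      # total scatter
    W = np.eye(d)                       # start from the original feature space
    for _ in range(iters):
        S = adaptive_similarity(Xc @ W)
        A = (S + S.T) / 2
        L = np.diag(A.sum(axis=1)) - A  # graph Laplacian of the learned S
        Sl = Xc.T @ L @ Xc + reg * np.eye(d)  # local (neighborhood) scatter
        W = _gen_eigh_top(St, Sl, m)
    return np.linalg.norm(W, axis=1)
```

Because S is recomputed from the projected data, features that W already down-weights stop influencing the neighborhood structure, which is the intuition the abstract gives for weakening noisy features.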


Author(s):  
Shuangli Liao ◽  
Quanxue Gao ◽  
Feiping Nie ◽  
Yang Liu ◽  
Xiangdong Zhang

Feature selection plays a critical role in data mining, driven by the increasing feature dimensionality of target problems. In this paper, we propose a new criterion for discriminative feature selection, worst-case discriminative feature selection (WDFS). Unlike Fisher Score and other methods whose discriminative criteria consider the overall (or average) separation of the data, WDFS adopts a new perspective, called the worst-case view, which is arguably more suitable for classification applications. Specifically, WDFS directly maximizes the ratio of the minimum between-class variance over all class pairs to the maximum within-class variance, and thus duly considers the separation of all classes. In addition, we take a greedy strategy that finds one feature at a time; although greedy, it is very easy to implement. Moreover, we use the correlation between features to help reduce redundancy, extending WDFS to uncorrelated WDFS (UWDFS). To evaluate the effectiveness of the proposed algorithms, we conduct classification experiments on many real data sets. In these experiments, we calculate the correlation coefficients using either the original features or the score vectors of the features over all class pairs, and analyze the experimental results for both settings. Experimental results demonstrate the effectiveness of WDFS and UWDFS.
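The worst-case criterion and the greedy search described above can be sketched in Python as follows. The concrete definitions are our interpretation, not necessarily the paper's: the between-class variance of a class pair is taken as the squared distance between the two class means on the selected features, the within-class variance of a class as the sum of its per-feature variances, and the redundancy handling of UWDFS is left out.

```python
from itertools import combinations

import numpy as np


def worst_case_ratio(X, y, feats):
    """Min over class pairs of squared mean distance divided by the max
    over classes of within-class variance (summed over selected features)."""
    classes = np.unique(y)
    Xf = X[:, feats]
    means = {c: Xf[y == c].mean(axis=0) for c in classes}
    between = min(np.sum((means[a] - means[b]) ** 2)
                  for a, b in combinations(classes, 2))
    within = max(Xf[y == c].var(axis=0).sum() for c in classes)
    return between / (within + 1e-12)


def greedy_worst_case_selection(X, y, k):
    """Greedily add the feature that most improves the worst-case ratio."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda f: worst_case_ratio(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

On toy data where feature 0 pushes class 0 far from classes 1 and 2 but does not separate 1 from 2, while feature 1 separates all three classes moderately, this criterion selects feature 1 first, whereas an average-separation criterion such as Fisher Score would favor feature 0.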


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In this context, designing an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in the epidemiology of a particular population.
Results: In this article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To this end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important sources of biological data (gene ontology, protein interaction data, and protein sequences), along with gene expression values, are collectively used to design two different views. UMVMO-select aims to reduce the gene space without compromising (or while only minimally compromising) sample classification efficiency, and determines relevant and non-redundant gene markers from three benchmark cancer gene expression data sets.
Conclusion: A thorough comparative analysis has been performed against five clustering methods and nine existing feature selection methods with respect to several internal and external validity metrics. The results demonstrate the superiority of the proposed method. The reported results are also validated through a biological significance test and heatmap plotting.

