Outburst prediction and influencing factors analysis based on Boruta-Apriori and BO-SVM algorithms

2021 ◽  
pp. 1-18
Author(s):  
Zhang Zixian ◽  
Liu Xuning ◽  
Li Zhixiang ◽  
Hu Hongqiang

The influencing factors of coal and gas outburst are complex, and the accuracy and efficiency of current outburst prediction are not high. To obtain effective features from the influencing factors and realize accurate, fast dynamic prediction of coal and gas outburst, this article proposes an outburst prediction model that couples feature selection with an intelligently optimized classifier. First, in view of the redundancy and irrelevance among the influencing factors of coal and gas outburst, the Boruta feature selection method is used to obtain the optimal feature subset from those factors. Second, the Apriori association rule mining method is used to mine the internal associations among the influencing factors, and the strong association rules among the factors and samples that affect the classification of coal and gas outburst are extracted. Finally, an SVM is used to classify coal and gas outbursts based on the obtained optimal feature subset and sample data, with a Bayesian optimization algorithm tuning the SVM kernel parameters; the resulting pattern recognition prediction model is compared with existing coal and gas outburst prediction models in the literature. Compared with feature selection or association rule mining alone, the proposed model achieves the highest prediction accuracy of 93% when the feature dimension is 3, which is higher than that of Apriori association rules or Boruta feature selection used individually; classification accuracy is significantly improved while the feature dimension is significantly reduced. The results show that the proposed model outperforms other prediction models, further verifying the accuracy and applicability of the coupled prediction model as well as its high stability and robustness.
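The Boruta-then-tuned-SVM pipeline described above can be sketched as follows. This is a minimal illustration on synthetic data: a single shadow-feature round stands in for the full iterative Boruta procedure, and scikit-learn's grid search stands in for Bayesian optimization of the RBF kernel parameters; the outburst dataset itself is not available here.

```python
# Minimal sketch of a Boruta-style selection step followed by SVM tuning.
# Synthetic data; grid search substitutes for Bayesian optimization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=4, random_state=0)

# Boruta idea: compare real-feature importances against "shadow" copies
# of the same features with their values shuffled (so they carry no signal).
shadow = X.copy()
for j in range(shadow.shape[1]):
    rng.shuffle(shadow[:, j])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(np.hstack([X, shadow]), y)
n = X.shape[1]
real_imp, shadow_imp = rf.feature_importances_[:n], rf.feature_importances_[n:]
selected = np.where(real_imp > shadow_imp.max())[0]
if selected.size == 0:                       # safety fallback for the sketch
    selected = np.array([real_imp.argmax()])

# Tune the RBF-SVM kernel parameters on the selected subset.
search = GridSearchCV(SVC(kernel="rbf"),
                      {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}, cv=5)
search.fit(X[:, selected], y)
print(len(selected), round(search.best_score_, 3))
```

A real Bayesian optimizer would model the cross-validation score as a function of (C, gamma) and propose new candidates sequentially instead of exhausting a fixed grid.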

Author(s):  
ZENGLIN XU ◽  
IRWIN KING ◽  
MICHAEL R. LYU

Feature selection is an important task in pattern recognition. The Support Vector Machine (SVM) and the Minimax Probability Machine (MPM) have been successfully used as classification frameworks for feature selection. However, these paradigms cannot automatically control the balance between prediction accuracy and the number of selected features. In addition, the selected feature subsets are not stable across different data partitions. The Minimum Error Minimax Probability Machine (MEMPM) has recently been proposed for classification. In this paper, we employ MEMPM to select an optimal feature subset with good stability and an automatic balance between prediction accuracy and feature subset size. Experiments against feature selection with SVM and MPM show the advantages of the proposed MEMPM formulation in stability and in automatically balancing feature subset size against prediction accuracy.
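The stability problem raised above can be made concrete by measuring how much a selector's chosen subset changes across data partitions. The sketch below computes the pairwise Jaccard overlap of SVM-based recursive feature elimination (RFE) subsets over bootstrap resamples; it illustrates the stability evaluation only, not the MEMPM formulation, and the dataset and subset size are placeholders.

```python
# Quantifying feature-selection stability across data partitions:
# rerun an SVM-based RFE selector on bootstrap resamples and compare
# the selected subsets by pairwise Jaccard similarity.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)
subsets = []
for _ in range(5):                              # five bootstrap partitions
    idx = rng.randint(0, len(X), len(X))
    rfe = RFE(LinearSVC(max_iter=10000, dual=False), n_features_to_select=5)
    rfe.fit(X[idx], y[idx])
    subsets.append(frozenset(np.flatnonzero(rfe.support_)))

# Pairwise Jaccard similarity: 1.0 would mean a perfectly stable selector.
pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
stability = np.mean([len(a & b) / len(a | b) for a, b in pairs])
print(round(stability, 3))
```

A selector with MEMPM-style stability guarantees would drive this score toward 1.0 across resamples.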


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and reasonable signal feature extraction and selection algorithms are the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract single features: power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy, and Rényi entropy. Then, a distance-measure criterion and Sequential Feature Selection (SFS) are applied to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four different information entropies can be used to distinguish the signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
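One of the single features named above, the power spectrum entropy, can be sketched directly: it is the Shannon entropy of the signal's normalized FFT power distribution, so a narrowband signal scores low and broadband noise scores high. The test signals below are illustrative placeholders, not the paper's modulated waveforms.

```python
# Power spectrum (Shannon) entropy of a signal, computed from the
# normalized FFT power distribution.
import numpy as np

def power_spectrum_entropy(x):
    """Shannon entropy (bits) of the normalized FFT power spectrum."""
    power = np.abs(np.fft.rfft(x)) ** 2
    p = power / power.sum()
    p = p[p > 0]                        # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

t = np.linspace(0, 1, 1024, endpoint=False)
tone = np.sin(2 * np.pi * 50 * t)       # energy concentrated in one FFT bin
rng = np.random.RandomState(0)
noise = rng.standard_normal(1024)       # energy spread across all bins

print(power_spectrum_entropy(tone), power_spectrum_entropy(noise))
```

The other entropy features follow the same pattern with a different underlying distribution (wavelet sub-band energies, singular values, or the Rényi generalization of the Shannon sum).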


2019 ◽  
Vol 5 ◽  
pp. e237 ◽  
Author(s):  
Davide Nardone ◽  
Angelo Ciaramella ◽  
Antonino Staiano

In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology), making it possible to treat models with fewer variables which, in turn, are easier to explain, provide valuable insights on the importance of their role, and likely speed up experimental validation. Unfortunately, as the no-free-lunch theorems also suggest, no approach in the literature is best suited to detect the optimal feature subset for building a final model, so this still represents a challenge. The proposed feature selection procedure follows a two-step approach: (a) a sparse modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme, in order to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, where each classifier is trained on its own feature subset discovered in the previous phase, and a proper decision rule is adopted to compute the ensemble responses. To evaluate the performance of the proposed method, extensive experiments have been performed on publicly available datasets, in particular from the computational biology field where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify/retrieve the most representative features that maximize classification accuracy.
With the top 20 and 80 features, SMBA-CSFS exhibits promising performance compared to its competitors from the literature on all considered datasets, especially those with a higher number of features. The experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the introduced approach lends itself to the selection and classification of data with a large number of features and classes.
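The two-step scheme above can be sketched with off-the-shelf components. In this hedged illustration, an L1-penalized logistic regression stands in for the authors' sparse-modeling solver to pick a per-class subset, and one one-vs-rest expert per class is combined by an argmax decision rule; the iris dataset and all hyperparameters are placeholders.

```python
# Class-specific feature selection sketch: (a) a sparse model picks a
# feature subset per class; (b) one expert classifier per class is trained
# on its own subset, and the ensemble predicts the class whose expert
# assigns the highest one-vs-rest probability.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
subsets, experts = {}, {}
for c in classes:
    yc = (y == c).astype(int)
    # Step (a): L1 sparsity zeroes out irrelevant features for this class.
    sparse = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    sparse.fit(X, yc)
    feats = np.flatnonzero(sparse.coef_[0])
    if feats.size == 0:                      # safety fallback for the sketch
        feats = np.array([np.abs(sparse.coef_[0]).argmax()])
    subsets[c] = feats
    # Step (b): a class-specific expert trained only on that subset.
    experts[c] = LogisticRegression(max_iter=1000).fit(X[:, feats], yc)

# Decision rule: each expert scores its own class; predict the argmax.
scores = np.column_stack([experts[c].predict_proba(X[:, subsets[c]])[:, 1]
                          for c in classes])
pred = classes[scores.argmax(axis=1)]
acc = float((pred == y).mean())
print({int(c): subsets[c].tolist() for c in classes}, round(acc, 3))
```

The per-class subsets may differ, which is the point of the class-specific formulation: features irrelevant globally can still be discriminative for one class.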


Feature selection in high-dimensional multispectral data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the feature selection schemes in the literature ignore the problem of class imbalance by choosing features from the classes having more instances while overlooking significant features of the classes having fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from the minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. The model performs dimensionality reduction by opting for a subset of relevant spectral, textural, and spatial features while eliminating redundant features. A binary ALO is employed to solve the feature selection model for the optimal selection of features. The proposed ALO-SVM wrapper is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
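The SMOTE step used above can be sketched in a few lines: each synthetic minority sample is a random interpolation between a minority instance and one of its k nearest minority neighbours. This illustrates the oversampling idea only; the binary ALO search, the SVM wrapper, and the LANDSAT data are outside the scope of this sketch, and the toy data here is random.

```python
# Minimal SMOTE sketch: interpolate between minority samples and their
# nearest minority neighbours to synthesize new minority instances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    rng = np.random.RandomState(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)           # idx[:, 0] is the point itself
    out = []
    for _ in range(n_new):
        i = rng.randint(len(X_min))
        j = idx[i, rng.randint(1, k + 1)]   # one of the k true neighbours
        lam = rng.rand()                    # interpolation coefficient in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.RandomState(0)
minority = rng.standard_normal((20, 3))     # toy minority-class samples
synthetic = smote(minority, n_new=30)
print(synthetic.shape)
```

Because each synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's local neighbourhoods rather than being arbitrary noise.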


Optimal feature subset selection over very high-dimensional data is a vital issue. Even when the optimal features are selected, the classification of those selected features remains a complicated task. To handle these problems, a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm is suggested in this work. For solving the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, a consistent subset is combined with the K-Nearest Neighbour (KNN) classifier, known as the FMCCSC-KNN classifier. The two datasets Dorothea and Madelon from the UCI machine learning repository are used in experiments on optimal feature selection and classification. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared to Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.
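The simulated-annealing-with-mutation search described above can be sketched generically: a binary feature mask is mutated by flipping one bit, and worse masks are accepted with a temperature-controlled probability so the search can escape local optima. The dataset, cooling schedule, and iteration budget below are placeholders, and cross-validated KNN accuracy stands in for the FMCCSC-KNN fitness.

```python
# Simulated-annealing feature selection sketch with a KNN wrapper fitness.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)

def fitness(mask):
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

mask = rng.rand(X.shape[1]) < 0.5            # random initial subset
cur_fit = fitness(mask)
best_mask, best_fit = mask.copy(), cur_fit
T = 1.0                                      # initial temperature
for step in range(40):
    cand = mask.copy()
    cand[rng.randint(X.shape[1])] ^= True    # mutation: flip one feature bit
    f = fitness(cand)
    # Accept improvements always; accept worse masks with prob exp(df / T).
    if f > cur_fit or rng.rand() < np.exp((f - cur_fit) / T):
        mask, cur_fit = cand, f
        if f > best_fit:
            best_mask, best_fit = cand.copy(), f
    T *= 0.9                                 # geometric cooling schedule

print(int(best_mask.sum()), round(best_fit, 3))
```

As the temperature decays, the acceptance rule hardens from near-random exploration toward pure hill climbing.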


2020 ◽  
Vol 17 (5) ◽  
pp. 721-730
Author(s):  
Kamal Bashir ◽  
Tianrui Li ◽  
Mahama Yahaya

The most frequently used machine learning feature ranking approaches fail to produce an optimal feature subset for accurate prediction of defective software modules on out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), ReliefF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets, in their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. The performance of the models, captured using the Area Under the Receiver Operating Characteristic Curve (AUC) metric, is compared across all FS methods considered. The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, on both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
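A coefficient-based ranking in the spirit of the MLLR method can be sketched as: standardize the features, fit a logistic regression by (penalized) maximum likelihood, and rank features by absolute coefficient magnitude. This is a hedged illustration, not the authors' exact procedure: a default ridge-penalized fit stands in for plain maximum likelihood to keep the optimization stable, and a built-in dataset replaces the software defect data.

```python
# Logistic-regression coefficient ranking as a feature selection criterion.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)   # makes coefficients comparable
logit = LogisticRegression(max_iter=2000).fit(Xs, y)

# Rank features by |coefficient|; larger magnitude = stronger influence
# on the log-odds of the positive class.
ranking = np.argsort(-np.abs(logit.coef_[0]))
top5 = ranking[:5]
print(top5)
```

Standardization matters here: without it, coefficient magnitudes reflect feature scales rather than feature relevance.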


2011 ◽  
Vol 2 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Liau Heng Fui ◽  
Dino Isa

Feature selection is crucial for selecting an “optimized” subset of features from the original feature set based on a certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. This paper proposes a feature selection algorithm that aims to minimize the area under the detection error trade-off (DET) curve. Particle swarm optimization (PSO) is employed to search for the optimal feature subset. The proposed method is implemented in face recognition and iris recognition systems. The results show that the proposed method is able to find an optimal subset of features that sufficiently describes iris and face images, removing unwanted and redundant features while improving classification accuracy in terms of total error rate (TER).
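The PSO search loop described above can be sketched with a binary swarm. In this hedged illustration, the objective is the total error rate of a cross-validated linear SVM rather than the DET-curve area (which requires score distributions from a verification system), and the dataset, swarm size, and coefficients are all placeholders.

```python
# Binary-PSO feature selection sketch minimizing a total error rate.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)
rng = np.random.RandomState(0)

def ter(p):                          # total error rate of a candidate subset
    mask = p.astype(bool)
    if not mask.any():
        return 1.0
    clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
    return 1.0 - cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_particles, n_feat = 8, X.shape[1]
pos = (rng.rand(n_particles, n_feat) < 0.5).astype(float)   # binary positions
vel = rng.uniform(-1, 1, (n_particles, n_feat))
pbest, pbest_f = pos.copy(), np.array([ter(p) for p in pos])
g = pbest_f.argmin()
gbest, gbest_f = pbest[g].copy(), pbest_f[g]

for _ in range(10):
    r1, r2 = rng.rand(n_particles, n_feat), rng.rand(n_particles, n_feat)
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Sigmoid transfer: velocity sets the probability of selecting each bit.
    pos = (rng.rand(n_particles, n_feat) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
    f = np.array([ter(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    g = pbest_f.argmin()
    if pbest_f[g] < gbest_f:
        gbest, gbest_f = pbest[g].copy(), pbest_f[g]

print(int(gbest.sum()), round(gbest_f, 3))
```

Swapping `ter` for a DET-area estimate computed from genuine/impostor score distributions recovers the paper's objective without changing the search loop.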


Author(s):  
Amit Kumar ◽  
Manish Kumar ◽  
Nidhya R.

In recent years, a huge increase in the demand for medical data has been reported. As a result, research in medical disease diagnosis has emerged as one of the most demanding research domains. The research reported in this chapter develops an ACO (ant colony optimization)-based Bayesian hybrid prediction model for medical disease diagnosis. The proposed model has two phases. In the first phase, the authors perform feature selection using a nature-inspired algorithm, ACO. In the second phase, they use the obtained feature subset as input to a naïve Bayes (NB) classifier to enhance classification performance on medical-domain datasets. Twelve datasets from different organizations are considered for experimental purposes. The experimental analysis demonstrates the superiority of the presented model in dealing with medical data for disease prediction and diagnosis.
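The ACO + NB pairing above can be sketched with a simple pheromone model over binary feature subsets: each ant samples a subset with per-feature selection probabilities given by the pheromone vector, the subset is scored by cross-validated Gaussian naive Bayes accuracy, and pheromone evaporates and is reinforced on the best subset. The colony size, evaporation rate, and dataset are illustrative placeholders, not the chapter's settings.

```python
# Ant-colony feature selection sketch with a naive Bayes wrapper fitness.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
rng = np.random.RandomState(0)
n_feat = X.shape[1]
tau = np.full(n_feat, 0.5)                 # pheromone = selection probability

def score(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

best_mask = np.ones(n_feat, dtype=bool)    # start from the full feature set
best_acc = score(best_mask)
for it in range(8):                        # colony iterations
    for ant in range(6):                   # ants per iteration
        mask = rng.rand(n_feat) < tau      # sample a subset from pheromone
        acc = score(mask)
        if acc > best_acc:
            best_mask, best_acc = mask.copy(), acc
    # Evaporate, then deposit pheromone on the best subset found so far.
    tau = 0.9 * tau + 0.1 * best_mask
    tau = np.clip(tau, 0.05, 0.95)         # keep some exploration alive

print(int(best_mask.sum()), round(best_acc, 3))
```

Over iterations the pheromone vector concentrates on features that keep appearing in high-accuracy subsets, which is the ACO analogue of a feature-importance ranking.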

