Outburst prediction and influencing factors analysis based on Boruta-Apriori and BO-SVM algorithms

2021 ◽  
pp. 1-18
Author(s):  
Zhang Zixian ◽  
Liu Xuning ◽  
Li Zhixiang ◽  
Hu Hongqiang

The influencing factors of coal and gas outburst are complex, and the accuracy and efficiency of current outburst prediction are not high. To obtain effective features from the influencing factors and realize accurate, fast dynamic prediction of coal and gas outburst, this article proposes an outburst prediction model that couples feature selection with an intelligently optimized classifier. First, in view of the redundancy and irrelevance among the influencing factors of coal and gas outburst, the Boruta feature selection method is used to obtain the optimal feature subset from those factors. Second, the Apriori association rule mining method is used to mine the internal associations among the influencing factors, and the strong association rules among the factors and samples that affect the classification of coal and gas outburst are extracted. Finally, an SVM is used to classify coal and gas outbursts based on the obtained optimal feature subset and sample data, with a Bayesian optimization algorithm tuning the SVM kernel parameters; the resulting pattern recognition prediction model is compared with existing coal and gas outburst prediction models in the literature. Compared with feature selection or association rule mining alone, the proposed model achieves the highest prediction accuracy of 93% when the feature dimension is 3, which is higher than that of Apriori association rules or Boruta feature selection used individually; classification accuracy is significantly improved while the feature dimension is significantly reduced. The results show that the proposed model outperforms other prediction models, further verifying the accuracy and applicability of the coupled prediction model as well as its high stability and robustness.
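The Boruta-then-tuned-SVM pipeline described above can be sketched as follows. This is a minimal illustration on synthetic data: a single shadow-feature round stands in for the full iterative Boruta procedure, and scikit-learn's grid search stands in for Bayesian optimization of the RBF kernel parameters; the outburst dataset itself is not available here.

```python
# Minimal sketch of a Boruta-style selection step followed by SVM tuning.
# Synthetic data; grid search substitutes for Bayesian optimization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=4, random_state=0)

# Boruta idea: compare real-feature importances against "shadow" copies
# of the same features with their values shuffled (so they carry no signal).
shadow = X.copy()
for j in range(shadow.shape[1]):
    rng.shuffle(shadow[:, j])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(np.hstack([X, shadow]), y)
n = X.shape[1]
real_imp, shadow_imp = rf.feature_importances_[:n], rf.feature_importances_[n:]
selected = np.where(real_imp > shadow_imp.max())[0]
if selected.size == 0:                       # safety fallback for the sketch
    selected = np.array([real_imp.argmax()])

# Tune the RBF-SVM kernel parameters on the selected subset.
search = GridSearchCV(SVC(kernel="rbf"),
                      {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}, cv=5)
search.fit(X[:, selected], y)
print(len(selected), round(search.best_score_, 3))
```

A real Bayesian optimizer would model the cross-validation score as a function of (C, gamma) and propose new candidates sequentially instead of exhausting a fixed grid.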

Author(s):  
ZENGLIN XU ◽  
IRWIN KING ◽  
MICHAEL R. LYU

Feature selection is an important task in pattern recognition. The Support Vector Machine (SVM) and the Minimax Probability Machine (MPM) have been successfully used as classification frameworks for feature selection. However, these paradigms cannot automatically control the balance between prediction accuracy and the number of selected features. In addition, the selected feature subsets are not stable across different data partitions. The Minimum Error Minimax Probability Machine (MEMPM) has recently been proposed for classification. In this paper, we employ MEMPM to select an optimal feature subset with good stability and an automatic balance between prediction accuracy and feature subset size. Experiments against feature selection with SVM and MPM show the advantages of the proposed MEMPM formulation in stability and in automatically balancing feature subset size against prediction accuracy.
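The stability problem raised above can be made concrete by measuring how much a selector's chosen subset changes across data partitions. The sketch below computes the pairwise Jaccard overlap of SVM-based recursive feature elimination (RFE) subsets over bootstrap resamples; it illustrates the stability evaluation only, not the MEMPM formulation, and the dataset and subset size are placeholders.

```python
# Quantifying feature-selection stability across data partitions:
# rerun an SVM-based RFE selector on bootstrap resamples and compare
# the selected subsets by pairwise Jaccard similarity.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)
subsets = []
for _ in range(5):                              # five bootstrap partitions
    idx = rng.randint(0, len(X), len(X))
    rfe = RFE(LinearSVC(max_iter=10000, dual=False), n_features_to_select=5)
    rfe.fit(X[idx], y[idx])
    subsets.append(frozenset(np.flatnonzero(rfe.support_)))

# Pairwise Jaccard similarity: 1.0 would mean a perfectly stable selector.
pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
stability = np.mean([len(a & b) / len(a | b) for a, b in pairs])
print(round(stability, 3))
```

A selector with MEMPM-style stability guarantees would drive this score toward 1.0 across resamples.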


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and reasonable signal feature extraction and selection algorithms are the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract single features: power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy, and Rényi entropy. Then, a distance-measure criterion and Sequential Feature Selection (SFS) are applied to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four different information entropies can be used to distinguish the signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
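One of the single features named above, the power spectrum entropy, can be sketched directly: it is the Shannon entropy of the signal's normalized FFT power distribution, so a narrowband signal scores low and broadband noise scores high. The test signals below are illustrative placeholders, not the paper's modulated waveforms.

```python
# Power spectrum (Shannon) entropy of a signal, computed from the
# normalized FFT power distribution.
import numpy as np

def power_spectrum_entropy(x):
    """Shannon entropy (bits) of the normalized FFT power spectrum."""
    power = np.abs(np.fft.rfft(x)) ** 2
    p = power / power.sum()
    p = p[p > 0]                        # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

t = np.linspace(0, 1, 1024, endpoint=False)
tone = np.sin(2 * np.pi * 50 * t)       # energy concentrated in one FFT bin
rng = np.random.RandomState(0)
noise = rng.standard_normal(1024)       # energy spread across all bins

print(power_spectrum_entropy(tone), power_spectrum_entropy(noise))
```

The other entropy features follow the same pattern with a different underlying distribution (wavelet sub-band energies, singular values, or the Rényi generalization of the Shannon sum).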


2019 ◽  
Vol 5 ◽  
pp. e237 ◽  
Author(s):  
Davide Nardone ◽  
Angelo Ciaramella ◽  
Antonino Staiano

In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology), making it possible to treat models with fewer variables which, in turn, are easier to explain, provide valuable insights on the importance of their role, and likely speed up experimental validation. Unfortunately, as the no-free-lunch theorems also suggest, no approach in the literature is best suited to detect the optimal feature subset for building a final model, so this still represents a challenge. The proposed feature selection procedure follows a two-step approach: (a) a sparse modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme, in order to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, where each classifier is trained on its own feature subset discovered in the previous phase, and a proper decision rule is adopted to compute the ensemble responses. To evaluate the performance of the proposed method, extensive experiments have been performed on publicly available datasets, in particular from the computational biology field where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify/retrieve the most representative features that maximize classification accuracy.
With the top 20 and 80 features, SMBA-CSFS exhibits promising performance compared to its competitors from the literature on all considered datasets, especially those with a higher number of features. The experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the introduced approach lends itself to the selection and classification of data with a large number of features and classes.
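The two-step scheme above can be sketched with off-the-shelf components. In this hedged illustration, an L1-penalized logistic regression stands in for the authors' sparse-modeling solver to pick a per-class subset, and one one-vs-rest expert per class is combined by an argmax decision rule; the iris dataset and all hyperparameters are placeholders.

```python
# Class-specific feature selection sketch: (a) a sparse model picks a
# feature subset per class; (b) one expert classifier per class is trained
# on its own subset, and the ensemble predicts the class whose expert
# assigns the highest one-vs-rest probability.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
subsets, experts = {}, {}
for c in classes:
    yc = (y == c).astype(int)
    # Step (a): L1 sparsity zeroes out irrelevant features for this class.
    sparse = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    sparse.fit(X, yc)
    feats = np.flatnonzero(sparse.coef_[0])
    if feats.size == 0:                      # safety fallback for the sketch
        feats = np.array([np.abs(sparse.coef_[0]).argmax()])
    subsets[c] = feats
    # Step (b): a class-specific expert trained only on that subset.
    experts[c] = LogisticRegression(max_iter=1000).fit(X[:, feats], yc)

# Decision rule: each expert scores its own class; predict the argmax.
scores = np.column_stack([experts[c].predict_proba(X[:, subsets[c]])[:, 1]
                          for c in classes])
pred = classes[scores.argmax(axis=1)]
acc = float((pred == y).mean())
print({int(c): subsets[c].tolist() for c in classes}, round(acc, 3))
```

The per-class subsets may differ, which is the point of the class-specific formulation: features irrelevant globally can still be discriminative for one class.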


Feature selection in high-dimensional multispectral data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the feature selection schemes in the literature ignore the problem of class imbalance by choosing features from the classes having more instances while overlooking significant features of the classes having fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from the minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. The model performs dimensionality reduction by opting for a subset of relevant spectral, textural, and spatial features while eliminating redundant features. A binary ALO is employed to solve the feature selection model for the optimal selection of features. The proposed ALO-SVM wrapper is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
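The SMOTE step used above can be sketched in a few lines: each synthetic minority sample is a random interpolation between a minority instance and one of its k nearest minority neighbours. This illustrates the oversampling idea only; the binary ALO search, the SVM wrapper, and the LANDSAT data are outside the scope of this sketch, and the toy data here is random.

```python
# Minimal SMOTE sketch: interpolate between minority samples and their
# nearest minority neighbours to synthesize new minority instances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    rng = np.random.RandomState(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)           # idx[:, 0] is the point itself
    out = []
    for _ in range(n_new):
        i = rng.randint(len(X_min))
        j = idx[i, rng.randint(1, k + 1)]   # one of the k true neighbours
        lam = rng.rand()                    # interpolation coefficient in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.RandomState(0)
minority = rng.standard_normal((20, 3))     # toy minority-class samples
synthetic = smote(minority, n_new=30)
print(synthetic.shape)
```

Because each synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's local neighbourhoods rather than being arbitrary noise.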


Optimal feature subset selection over very high-dimensional data is a vital issue. Even when the optimal features are selected, the classification of those selected features remains a complicated task. To handle these problems, a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm is suggested in this work. For solving the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, a consistent subset is combined with the K-Nearest Neighbour (KNN) classifier, known as the FMCCSC-KNN classifier. The two datasets Dorothea and Madelon from the UCI machine learning repository are used in experiments on optimal feature selection and classification. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared to Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.
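The simulated-annealing-with-mutation search described above can be sketched generically: a binary feature mask is mutated by flipping one bit, and worse masks are accepted with a temperature-controlled probability so the search can escape local optima. The dataset, cooling schedule, and iteration budget below are placeholders, and cross-validated KNN accuracy stands in for the FMCCSC-KNN fitness.

```python
# Simulated-annealing feature selection sketch with a KNN wrapper fitness.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)

def fitness(mask):
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

mask = rng.rand(X.shape[1]) < 0.5            # random initial subset
cur_fit = fitness(mask)
best_mask, best_fit = mask.copy(), cur_fit
T = 1.0                                      # initial temperature
for step in range(40):
    cand = mask.copy()
    cand[rng.randint(X.shape[1])] ^= True    # mutation: flip one feature bit
    f = fitness(cand)
    # Accept improvements always; accept worse masks with prob exp(df / T).
    if f > cur_fit or rng.rand() < np.exp((f - cur_fit) / T):
        mask, cur_fit = cand, f
        if f > best_fit:
            best_mask, best_fit = cand.copy(), f
    T *= 0.9                                 # geometric cooling schedule

print(int(best_mask.sum()), round(best_fit, 3))
```

As the temperature decays, the acceptance rule hardens from near-random exploration toward pure hill climbing.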


2020 ◽  
Vol 17 (5) ◽  
pp. 721-730
Author(s):  
Kamal Bashir ◽  
Tianrui Li ◽  
Mahama Yahaya

The most frequently used machine learning feature ranking approaches fail to produce an optimal feature subset for accurate prediction of defective software modules on out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), ReliefF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets, in their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. The performance of the models, captured using the Area Under the Receiver Operating Characteristic Curve (AUC) metric, is compared across all FS methods considered. The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, on both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
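A coefficient-based ranking in the spirit of the MLLR method can be sketched as: standardize the features, fit a logistic regression by (penalized) maximum likelihood, and rank features by absolute coefficient magnitude. This is a hedged illustration, not the authors' exact procedure: a default ridge-penalized fit stands in for plain maximum likelihood to keep the optimization stable, and a built-in dataset replaces the software defect data.

```python
# Logistic-regression coefficient ranking as a feature selection criterion.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
Xs = StandardScaler().fit_transform(X)   # makes coefficients comparable
logit = LogisticRegression(max_iter=2000).fit(Xs, y)

# Rank features by |coefficient|; larger magnitude = stronger influence
# on the log-odds of the positive class.
ranking = np.argsort(-np.abs(logit.coef_[0]))
top5 = ranking[:5]
print(top5)
```

Standardization matters here: without it, coefficient magnitudes reflect feature scales rather than feature relevance.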


2011 ◽  
Vol 2 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Liau Heng Fui ◽  
Dino Isa

Feature selection is crucial for selecting an “optimized” subset of features from the original feature set based on a certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. This paper proposes a feature selection algorithm that aims to minimize the area under the detection error trade-off (DET) curve. Particle swarm optimization (PSO) is employed to search for the optimal feature subset. The proposed method is implemented in face recognition and iris recognition systems. The results show that the proposed method is able to find an optimal subset of features that sufficiently describes iris and face images, removing unwanted and redundant features while improving classification accuracy in terms of total error rate (TER).
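The PSO search loop described above can be sketched with a binary swarm. In this hedged illustration, the objective is the total error rate of a cross-validated linear SVM rather than the DET-curve area (which requires score distributions from a verification system), and the dataset, swarm size, and coefficients are all placeholders.

```python
# Binary-PSO feature selection sketch minimizing a total error rate.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_wine(return_X_y=True)
rng = np.random.RandomState(0)

def ter(p):                          # total error rate of a candidate subset
    mask = p.astype(bool)
    if not mask.any():
        return 1.0
    clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
    return 1.0 - cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_particles, n_feat = 8, X.shape[1]
pos = (rng.rand(n_particles, n_feat) < 0.5).astype(float)   # binary positions
vel = rng.uniform(-1, 1, (n_particles, n_feat))
pbest, pbest_f = pos.copy(), np.array([ter(p) for p in pos])
g = pbest_f.argmin()
gbest, gbest_f = pbest[g].copy(), pbest_f[g]

for _ in range(10):
    r1, r2 = rng.rand(n_particles, n_feat), rng.rand(n_particles, n_feat)
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Sigmoid transfer: velocity sets the probability of selecting each bit.
    pos = (rng.rand(n_particles, n_feat) < 1.0 / (1.0 + np.exp(-vel))).astype(float)
    f = np.array([ter(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    g = pbest_f.argmin()
    if pbest_f[g] < gbest_f:
        gbest, gbest_f = pbest[g].copy(), pbest_f[g]

print(int(gbest.sum()), round(gbest_f, 3))
```

Swapping `ter` for a DET-area estimate computed from genuine/impostor score distributions recovers the paper's objective without changing the search loop.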


Author(s):  
Amit Kumar ◽  
Manish Kumar ◽  
Nidhya R.

In recent years, a huge increase in the demand for medical data has been reported. As a result, research in medical disease diagnosis has emerged as one of the most demanding research domains. The research reported in this chapter develops an ACO (ant colony optimization)-based Bayesian hybrid prediction model for medical disease diagnosis. The proposed model has two phases. In the first phase, the authors perform feature selection using a nature-inspired algorithm, ACO. In the second phase, they use the obtained feature subset as input to a naïve Bayes (NB) classifier to enhance classification performance on medical-domain datasets. Twelve datasets from different organizations are considered for experimental purposes. The experimental analysis demonstrates the superiority of the presented model in dealing with medical data for disease prediction and diagnosis.
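The ACO + NB pairing above can be sketched with a simple pheromone model over binary feature subsets: each ant samples a subset with per-feature selection probabilities given by the pheromone vector, the subset is scored by cross-validated Gaussian naive Bayes accuracy, and pheromone evaporates and is reinforced on the best subset. The colony size, evaporation rate, and dataset are illustrative placeholders, not the chapter's settings.

```python
# Ant-colony feature selection sketch with a naive Bayes wrapper fitness.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)
rng = np.random.RandomState(0)
n_feat = X.shape[1]
tau = np.full(n_feat, 0.5)                 # pheromone = selection probability

def score(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

best_mask = np.ones(n_feat, dtype=bool)    # start from the full feature set
best_acc = score(best_mask)
for it in range(8):                        # colony iterations
    for ant in range(6):                   # ants per iteration
        mask = rng.rand(n_feat) < tau      # sample a subset from pheromone
        acc = score(mask)
        if acc > best_acc:
            best_mask, best_acc = mask.copy(), acc
    # Evaporate, then deposit pheromone on the best subset found so far.
    tau = 0.9 * tau + 0.1 * best_mask
    tau = np.clip(tau, 0.05, 0.95)         # keep some exploration alive

print(int(best_mask.sum()), round(best_acc, 3))
```

Over iterations the pheromone vector concentrates on features that keep appearing in high-accuracy subsets, which is the ACO analogue of a feature-importance ranking.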

