Factorial Analysis for Gas Leakage Risk Predictions from a Vehicle-Based Methane Survey

2021 ◽  
Vol 12 (1) ◽  
pp. 115
Author(s):  
Khongorzul Dashdondov ◽  
Mi-Hwa Song

Natural gas (NG), typically methane, is released into the air, causing significant air pollution and environmental and health problems. There is a growing need for machine learning methods to predict such gas losses. In this article, we propose to predict NG leakage levels through feature selection based on a factorial analysis (FA) of open urban natural gas data from the USA. The paper is divided into three sections. First, we select essential features using FA. Then, the dataset is labeled by k-means clustering with OrdinalEncoder (OE)-based normalization. The final module uses six algorithms (extreme gradient boosting (XGBoost), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), Naive Bayes (NB), and multilayer perceptron (MLP)) to predict gas leakage levels. The proposed method is evaluated by accuracy, F1-score, mean squared error (MSE), and area under the ROC curve (AUC). The test results indicate that the F-OE-based classification method successfully improves prediction performance. Moreover, F-OE-based XGBoost (F-OE-XGBoost) showed the best performance, with 95.14% accuracy, an F1-score of 95.75%, an MSE of 0.028, and an AUC of 96.29%. The second-best results, an accuracy of 95.09%, an F1-score of 95.60%, an MSE of 0.029, and an AUC of 96.11%, were achieved by the F-OE-RF model.
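For readers who want to experiment with a comparable pipeline, the following Python sketch strings together factor analysis, OrdinalEncoder-based encoding, k-means labeling, and an XGBoost classifier using scikit-learn and xgboost. It is a minimal illustration, not the authors' implementation: the synthetic data, the number of factors, the number of leakage levels, and all hyperparameters are assumptions.

```python
# Minimal sketch of an F-OE-style pipeline: factor analysis for feature
# reduction, OrdinalEncoder-based encoding, k-means labeling of leakage
# levels, and an XGBoost classifier. All settings are illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import OrdinalEncoder
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

X_raw = np.random.rand(1000, 12)          # stand-in for the survey features

# 1) Encode the (discretized) feature values with OrdinalEncoder.
X_enc = OrdinalEncoder().fit_transform(X_raw.round(2))

# 2) Factor analysis to obtain a reduced feature representation.
X_fa = FactorAnalysis(n_components=5, random_state=0).fit_transform(X_enc)

# 3) Label leakage levels by k-means clustering (3 levels assumed here).
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_fa)

# 4) Train and evaluate the classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X_fa, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="mlogloss")
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("F1 (macro):", f1_score(y_te, pred, average="macro"))
```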

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 139512-139528
Author(s):  
Shuangjie Li ◽  
Kaixiang Zhang ◽  
Qianru Chen ◽  
Shuqin Wang ◽  
Shaoqiang Zhang

Author(s):  
*Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
(Mrs) Oyinloye Oghenerukevwe Eloho ◽  
Mogaji Stephen Alaba

The multiplication of malware variants is probably the greatest problem in PC security, and the protection of information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy, 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting Classifier achieves the highest accuracy, 97.407% with Chi-square feature selection and 91.72% without feature selection. The Extreme Gradient Boosting Classifier and Random Forest lead across the seven evaluation measures with Chi-square feature selection and without feature selection, respectively. The study results show that the tree-based ensemble model is compelling for malware classification.
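The pipeline described above can be approximated with scikit-learn's SelectKBest(chi2) followed by the classifiers of interest. The sketch below is a hedged illustration rather than the paper's code: the synthetic feature matrix, the number of selected features, and the hyperparameters are assumptions, and only a KNN base learner and an XGBoost ensemble are shown.

```python
# Hedged sketch of Chi-square feature selection followed by a KNN base
# learner and a tree-based ensemble (XGBoost). Data and settings are assumed.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Stand-in for non-negative malware features (e.g. API-call or byte counts).
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(2000, 300)).astype(float)
y = rng.integers(0, 2, size=2000)            # 0 = benign, 1 = malware

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Chi-square keeps the k features most dependent on the class label.
selector = SelectKBest(chi2, k=100).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr_sel, y_tr)
xgb = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss").fit(X_tr_sel, y_tr)

print("KNN accuracy:", accuracy_score(y_te, knn.predict(X_te_sel)))
print("XGBoost accuracy:", accuracy_score(y_te, xgb.predict(X_te_sel)))
```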


Author(s):  
Sang Michael Xie ◽  
Stefano Ermon

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with respect to the parameters of the underlying distribution. However, stochastic optimization involving subset sampling is typically not reparameterizable. To overcome this limitation, we define a continuous relaxation of subset sampling that provides reparameterization gradients by generalizing the Gumbel-max trick. We use this approach to sample subsets of features in an instance-wise feature selection task for model interpretability, subsets of neighbors to implement a deep stochastic k-nearest neighbors model, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors. We improve performance in all these tasks by incorporating subset sampling in end-to-end training.
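A minimal PyTorch sketch of the idea, relaxed top-k subset sampling with Gumbel-perturbed log-weights, is given below. It follows the general recipe of repeatedly applying a temperature-scaled softmax and down-weighting already-selected items; the function name, temperature, and toy objective are illustrative assumptions and may differ from the paper's exact formulation.

```python
# Relaxed top-k subset sampling: Gumbel-perturbed log-weights plus an
# iterated softmax give a differentiable k-hot indicator.
import torch

def relaxed_subset_sample(log_w, k, tau=0.5):
    """Differentiable relaxation of sampling a size-k subset with weights exp(log_w)."""
    gumbel = -torch.log(-torch.log(torch.rand_like(log_w)))
    keys = log_w + gumbel                    # perturbed log-weights (Gumbel-max keys)
    onehot_sum = torch.zeros_like(keys)
    kappa = keys.clone()
    for _ in range(k):
        alpha = torch.softmax(kappa / tau, dim=-1)        # soft selection of one item
        onehot_sum = onehot_sum + alpha
        # down-weight already-selected items so the next softmax picks a new one
        kappa = kappa + torch.log1p(-alpha.clamp(max=1 - 1e-6))
    return onehot_sum                        # approximately k-hot indicator vector

log_w = torch.randn(10, requires_grad=True)  # learnable (log) weights
mask = relaxed_subset_sample(log_w, k=3)
loss = (mask * torch.arange(10.0)).sum()     # toy downstream objective
loss.backward()                              # reparameterized gradients flow to log_w
print(mask.detach(), log_w.grad)
```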


Kursor ◽  
2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Miftahus Sholihin

Classification aims to assign objects to specific classes based on the values of the attributes associated with the objects being observed. This research designed a system that classifies Lamongan batik cloth based on color features using color moments, texture using the Gray Level Co-occurrence Matrix (GLCM), and shape using moment invariants, with classification performed by the K-Nearest Neighbors (K-NN) method. In outline, the system consists of three main processes, namely pre-processing, feature extraction, and classification. The highest accuracy rate in this study was 90.4%, obtained when k = 6.
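A rough Python equivalent of this three-stage pipeline (color moments, GLCM texture, Hu moment invariants, then K-NN with k = 6) can be assembled from OpenCV, scikit-image, and scikit-learn, as in the hedged sketch below. The synthetic images, image size, and GLCM settings are assumptions standing in for the actual Lamongan batik dataset and pre-processing.

```python
# Illustrative feature-extraction + K-NN pipeline: color moments, GLCM
# texture properties, and Hu moment invariants, classified with k = 6.
import numpy as np
import cv2
from scipy.stats import skew
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def extract_features(img_bgr):
    # Color moments (mean, std, skewness) per channel.
    chans = img_bgr.reshape(-1, 3).astype(float)
    color = np.concatenate([chans.mean(0), chans.std(0), skew(chans, axis=0)])

    # GLCM texture features on the grayscale image.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy", "correlation")])

    # Shape features: Hu moment invariants.
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    return np.concatenate([color, texture, hu])

# Synthetic stand-ins for batik images and motif labels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(20)]
labels = rng.integers(0, 4, size=20)

X = np.array([extract_features(im) for im in images])
knn = KNeighborsClassifier(n_neighbors=6).fit(X, labels)
print("predicted class:", knn.predict(X[:1]))
```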


2020 ◽  
Author(s):  
Jinlong Liu ◽  
Christopher Ulishney ◽  
Cosmin E. Dumitrescu

Abstract The location of the peak cylinder pressure and the crank angle associated with half of the energy release during the combustion process are generally used to define engine combustion phasing and control engine efficiency. To accelerate the optimization of a natural gas spark ignition internal combustion engine, this study proposes a black-box modeling approach that reduces the experimental or computational time needed to estimate highly efficient operating conditions at a particular engine speed and load via combustion phasing information. Specifically, a k-nearest neighbors (KNN) algorithm used key engine operating variables such as spark timing, air-fuel ratio, and engine speed as inputs to predict combustion phasing parameters such as the crank angles associated with peak cylinder pressure and 50% energy release. After training the correlative model, the selected engine variables produced acceptable errors for most operating conditions investigated. The results showed that the KNN algorithm predicted the location of the peak pressure much better than the location of the 50% energy release, as evidenced by the larger R2 values and smaller prediction errors. In addition, the regression model built in this study produced larger errors in sparsely distributed regions of the input space. Therefore, a more uniformly distributed training dataset is suggested for the KNN algorithm, at least for the situations investigated in this research.
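A minimal sketch of such a correlative model, assuming synthetic data in place of the measured engine data, is a multi-output k-nearest neighbors regressor mapping spark timing, air-fuel ratio, and engine speed to the two phasing targets. The operating ranges, target relationships, and neighbor count below are illustrative assumptions, not the study's settings.

```python
# KNN regression from engine operating variables to combustion phasing
# targets (crank angle of peak pressure and CA50), on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
spark_timing = rng.uniform(-40, -10, n)      # deg before TDC (assumed range)
afr = rng.uniform(15, 25, n)                 # air-fuel ratio
speed = rng.uniform(800, 2000, n)            # rpm
X = np.column_stack([spark_timing, afr, speed])

# Synthetic targets standing in for CA of peak pressure and CA50.
y = np.column_stack([
    10 - 0.3 * spark_timing + 0.2 * (afr - 17) + rng.normal(0, 1, n),
    8 - 0.4 * spark_timing + 0.3 * (afr - 17) + rng.normal(0, 1, n),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn.fit(X_tr, y_tr)
print("R^2 per target:", r2_score(y_te, knn.predict(X_te), multioutput="raw_values"))
```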


2014 ◽  
Vol 644-650 ◽  
pp. 4325-4329 ◽  
Author(s):  
Chen Chen Huang ◽  
Wei Gong ◽  
Wen Long Fu ◽  
Dong Yu Feng

Feature extraction is a very important part of a speaker recognition system. We proposed and implemented a speaker recognition algorithm based on vector quantization (VQ) and a weighted Fisher ratio of MFCC features. To evaluate the performance of this algorithm, we built a small speaker recognition system in MATLAB. Compared with traditional feature selection methods, the feature vector obtained with this algorithm has the greatest degree of discrimination at the same dimensionality. According to the test results, the speaker recognition algorithm proposed in this paper can significantly increase training and recognition accuracy and reduce the amount of data required for computation while maintaining a high recognition rate.
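In the same spirit, a small Python sketch of a VQ-based speaker identification flow with Fisher-ratio weighting of MFCC dimensions is shown below. It is an assumption-laden illustration (synthetic signals in place of recordings, an arbitrary codebook size, librosa and scikit-learn instead of MATLAB), not the authors' implementation.

```python
# VQ speaker identification with Fisher-ratio-weighted MFCCs: one KMeans
# codebook per speaker, lowest average quantization distortion wins.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_feats(signal, sr=16000, n_mfcc=13):
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T   # frames x coeffs

# Synthetic "speakers" stand in for real enrollment recordings.
rng = np.random.default_rng(0)
train = {spk: mfcc_feats(rng.normal(0, 0.1 * (spk + 1), 16000)) for spk in range(3)}

# Fisher ratio per MFCC dimension: between-speaker / within-speaker variance.
means = np.array([f.mean(0) for f in train.values()])
within = np.mean([f.var(0) for f in train.values()], axis=0)
weights = means.var(0) / (within + 1e-8)

# One weighted VQ codebook (KMeans) per speaker.
codebooks = {spk: KMeans(n_clusters=8, n_init=10, random_state=0).fit(f * weights)
             for spk, f in train.items()}

def identify(signal):
    feats = mfcc_feats(signal) * weights
    # KMeans.score returns negative inertia; negate for total distortion.
    distortion = {spk: -cb.score(feats) / len(feats) for spk, cb in codebooks.items()}
    return min(distortion, key=distortion.get)

print("predicted speaker:", identify(rng.normal(0, 0.2, 16000)))
```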

