Factorial Analysis for Gas Leakage Risk Predictions from a Vehicle-Based Methane Survey

2021 ◽  
Vol 12 (1) ◽  
pp. 115
Author(s):  
Khongorzul Dashdondov ◽  
Mi-Hwa Song

Natural gas (NG), typically methane, is released into the air, causing significant air pollution and environmental and health problems. There is a growing need for machine learning methods to predict such gas losses. In this article, we propose to predict NG leakage levels through feature selection based on a factorial analysis (FA) of open urban natural gas data from the USA. The paper is divided into three sections. First, we select essential features using FA. Then, the dataset is labeled by k-means clustering with OrdinalEncoder (OE)-based normalization. The final module uses six algorithms (extreme gradient boosting (XGBoost), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), Naive Bayes (NB), and multilayer perceptron (MLP)) to predict gas leakage levels. The proposed method is evaluated by accuracy, F1-score, mean squared error (MSE), and area under the ROC curve (AUC). The test results indicate that the F-OE-based classification method successfully improves prediction performance. Moreover, F-OE-based XGBoost (F-OE-XGBoost) showed the best performance, with 95.14% accuracy, an F1-score of 95.75%, an MSE of 0.028, and an AUC of 96.29%. The second-best results, an accuracy of 95.09%, an F1-score of 95.60%, an MSE of 0.029, and an AUC of 96.11%, were achieved by the F-OE-RF model.
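For readers who want to experiment with a comparable pipeline, the following Python sketch strings together factor analysis, OrdinalEncoder-based encoding, k-means labeling, and an XGBoost classifier using scikit-learn and xgboost. It is a minimal illustration, not the authors' implementation: the synthetic data, the number of factors, the number of leakage levels, and all hyperparameters are assumptions.

```python
# Minimal sketch of an F-OE-style pipeline: factor analysis for feature
# reduction, OrdinalEncoder-based encoding, k-means labeling of leakage
# levels, and an XGBoost classifier. All settings are illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import OrdinalEncoder
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

X_raw = np.random.rand(1000, 12)          # stand-in for the survey features

# 1) Encode the (discretized) feature values with OrdinalEncoder.
X_enc = OrdinalEncoder().fit_transform(X_raw.round(2))

# 2) Factor analysis to obtain a reduced feature representation.
X_fa = FactorAnalysis(n_components=5, random_state=0).fit_transform(X_enc)

# 3) Label leakage levels by k-means clustering (3 levels assumed here).
y = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_fa)

# 4) Train and evaluate the classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X_fa, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="mlogloss")
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("F1 (macro):", f1_score(y_te, pred, average="macro"))
```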

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 139512-139528
Author(s):  
Shuangjie Li ◽  
Kaixiang Zhang ◽  
Qianru Chen ◽  
Shuqin Wang ◽  
Shaoqiang Zhang

Author(s):  
*Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
(Mrs) Oyinloye Oghenerukevwe Eloho ◽  
Mogaji Stephen Alaba

The multiplication of malware variants is probably the greatest problem in PC security, and the protection of information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers built on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy, 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting Classifier achieves the highest accuracy, 97.407% with Chi-square feature selection and 91.72% without feature selection. The Extreme Gradient Boosting Classifier and Random Forest lead across the seven evaluation measures with Chi-square feature selection and without feature selection, respectively. The study results show that the tree-based ensemble model is compelling for malware classification.
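The pipeline described above can be approximated with scikit-learn's SelectKBest(chi2) followed by the classifiers of interest. The sketch below is a hedged illustration rather than the paper's code: the synthetic feature matrix, the number of selected features, and the hyperparameters are assumptions, and only a KNN base learner and an XGBoost ensemble are shown.

```python
# Hedged sketch of Chi-square feature selection followed by a KNN base
# learner and a tree-based ensemble (XGBoost). Data and settings are assumed.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Stand-in for non-negative malware features (e.g. API-call or byte counts).
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(2000, 300)).astype(float)
y = rng.integers(0, 2, size=2000)            # 0 = benign, 1 = malware

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Chi-square keeps the k features most dependent on the class label.
selector = SelectKBest(chi2, k=100).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr_sel, y_tr)
xgb = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss").fit(X_tr_sel, y_tr)

print("KNN accuracy:", accuracy_score(y_te, knn.predict(X_te_sel)))
print("XGBoost accuracy:", accuracy_score(y_te, xgb.predict(X_te_sel)))
```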


Author(s):  
Sang Michael Xie ◽  
Stefano Ermon

Many machine learning tasks require sampling a subset of items from a collection based on a parameterized distribution. The Gumbel-softmax trick can be used to sample a single item, and allows for low-variance reparameterized gradients with respect to the parameters of the underlying distribution. However, stochastic optimization involving subset sampling is typically not reparameterizable. To overcome this limitation, we define a continuous relaxation of subset sampling that provides reparameterization gradients by generalizing the Gumbel-max trick. We use this approach to sample subsets of features in an instance-wise feature selection task for model interpretability, subsets of neighbors to implement a deep stochastic k-nearest neighbors model, and sub-sequences of neighbors to implement parametric t-SNE by directly comparing the identities of local neighbors. We improve performance in all these tasks by incorporating subset sampling in end-to-end training.
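A minimal PyTorch sketch of the idea, relaxed top-k subset sampling with Gumbel-perturbed log-weights, is given below. It follows the general recipe of repeatedly applying a temperature-scaled softmax and down-weighting already-selected items; the function name, temperature, and toy objective are illustrative assumptions and may differ from the paper's exact formulation.

```python
# Relaxed top-k subset sampling: Gumbel-perturbed log-weights plus an
# iterated softmax give a differentiable k-hot indicator.
import torch

def relaxed_subset_sample(log_w, k, tau=0.5):
    """Differentiable relaxation of sampling a size-k subset with weights exp(log_w)."""
    gumbel = -torch.log(-torch.log(torch.rand_like(log_w)))
    keys = log_w + gumbel                    # perturbed log-weights (Gumbel-max keys)
    onehot_sum = torch.zeros_like(keys)
    kappa = keys.clone()
    for _ in range(k):
        alpha = torch.softmax(kappa / tau, dim=-1)        # soft selection of one item
        onehot_sum = onehot_sum + alpha
        # down-weight already-selected items so the next softmax picks a new one
        kappa = kappa + torch.log1p(-alpha.clamp(max=1 - 1e-6))
    return onehot_sum                        # approximately k-hot indicator vector

log_w = torch.randn(10, requires_grad=True)  # learnable (log) weights
mask = relaxed_subset_sample(log_w, k=3)
loss = (mask * torch.arange(10.0)).sum()     # toy downstream objective
loss.backward()                              # reparameterized gradients flow to log_w
print(mask.detach(), log_w.grad)
```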


Kursor ◽  
2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Miftahus Sholihin

Classification aims to assign objects to specific classes based on the values of the attributes associated with the objects being observed. This research designed a system that classifies Lamongan batik cloth based on color features using color moments, texture using the Gray Level Co-occurrence Matrix (GLCM), and shape using moment invariants, with classification performed by the K-Nearest Neighbors (K-NN) method. In outline, the system consists of three main processes, namely pre-processing, feature extraction, and classification. The highest accuracy rate in this study was 90.4%, obtained when k = 6.
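A rough Python equivalent of this three-stage pipeline (color moments, GLCM texture, Hu moment invariants, then K-NN with k = 6) can be assembled from OpenCV, scikit-image, and scikit-learn, as in the hedged sketch below. The synthetic images, image size, and GLCM settings are assumptions standing in for the actual Lamongan batik dataset and pre-processing.

```python
# Illustrative feature-extraction + K-NN pipeline: color moments, GLCM
# texture properties, and Hu moment invariants, classified with k = 6.
import numpy as np
import cv2
from scipy.stats import skew
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def extract_features(img_bgr):
    # Color moments (mean, std, skewness) per channel.
    chans = img_bgr.reshape(-1, 3).astype(float)
    color = np.concatenate([chans.mean(0), chans.std(0), skew(chans, axis=0)])

    # GLCM texture features on the grayscale image.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy", "correlation")])

    # Shape features: Hu moment invariants.
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    return np.concatenate([color, texture, hu])

# Synthetic stand-ins for batik images and motif labels.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8) for _ in range(20)]
labels = rng.integers(0, 4, size=20)

X = np.array([extract_features(im) for im in images])
knn = KNeighborsClassifier(n_neighbors=6).fit(X, labels)
print("predicted class:", knn.predict(X[:1]))
```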


2020 ◽  
Author(s):  
Jinlong Liu ◽  
Christopher Ulishney ◽  
Cosmin E. Dumitrescu

Abstract The location of the peak cylinder pressure and the crank angle associated with half of the energy release during the combustion process are generally used to define engine combustion phasing and control engine efficiency. To accelerate the optimization of a natural gas spark ignition internal combustion engine, this study proposes a black-box modeling approach that reduces the experimental or computational time needed to estimate highly efficient operating conditions at a particular engine speed and load via combustion phasing information. Specifically, a k-nearest neighbors (KNN) algorithm used key engine operating variables such as spark timing, air-fuel ratio, and engine speed as inputs to predict combustion phasing parameters such as the crank angles associated with peak cylinder pressure and 50% energy release. After training the correlative model, the selected engine variables produced acceptable errors for most operating conditions investigated. The results showed that the KNN algorithm predicted the location of the peak pressure much better than the location of the 50% energy release, as evidenced by the larger R2 values and smaller prediction errors. In addition, the regression model built in this study produced larger errors in sparsely distributed regions of the input space. Therefore, a more uniformly distributed training dataset is suggested for the KNN algorithm, at least for the situations investigated in this research.
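A minimal sketch of such a correlative model, assuming synthetic data in place of the measured engine data, is a multi-output k-nearest neighbors regressor mapping spark timing, air-fuel ratio, and engine speed to the two phasing targets. The operating ranges, target relationships, and neighbor count below are illustrative assumptions, not the study's settings.

```python
# KNN regression from engine operating variables to combustion phasing
# targets (crank angle of peak pressure and CA50), on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
spark_timing = rng.uniform(-40, -10, n)      # deg before TDC (assumed range)
afr = rng.uniform(15, 25, n)                 # air-fuel ratio
speed = rng.uniform(800, 2000, n)            # rpm
X = np.column_stack([spark_timing, afr, speed])

# Synthetic targets standing in for CA of peak pressure and CA50.
y = np.column_stack([
    10 - 0.3 * spark_timing + 0.2 * (afr - 17) + rng.normal(0, 1, n),
    8 - 0.4 * spark_timing + 0.3 * (afr - 17) + rng.normal(0, 1, n),
])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn.fit(X_tr, y_tr)
print("R^2 per target:", r2_score(y_te, knn.predict(X_te), multioutput="raw_values"))
```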


2014 ◽  
Vol 644-650 ◽  
pp. 4325-4329 ◽  
Author(s):  
Chen Chen Huang ◽  
Wei Gong ◽  
Wen Long Fu ◽  
Dong Yu Feng

Feature extraction is a very important part of a speaker recognition system. We proposed and implemented a speaker recognition algorithm based on vector quantization (VQ) and a weighted Fisher ratio of MFCC features. To evaluate the performance of this algorithm, we built a small speaker recognition system in MATLAB. Compared with traditional feature selection methods, the feature vector obtained with this algorithm has the greatest degree of discrimination at the same dimensionality. According to the test results, the speaker recognition algorithm proposed in this paper can significantly increase training and recognition accuracy and reduce the amount of data required for computation while maintaining a high recognition rate.
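In the same spirit, a small Python sketch of a VQ-based speaker identification flow with Fisher-ratio weighting of MFCC dimensions is shown below. It is an assumption-laden illustration (synthetic signals in place of recordings, an arbitrary codebook size, librosa and scikit-learn instead of MATLAB), not the authors' implementation.

```python
# VQ speaker identification with Fisher-ratio-weighted MFCCs: one KMeans
# codebook per speaker, lowest average quantization distortion wins.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def mfcc_feats(signal, sr=16000, n_mfcc=13):
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T   # frames x coeffs

# Synthetic "speakers" stand in for real enrollment recordings.
rng = np.random.default_rng(0)
train = {spk: mfcc_feats(rng.normal(0, 0.1 * (spk + 1), 16000)) for spk in range(3)}

# Fisher ratio per MFCC dimension: between-speaker / within-speaker variance.
means = np.array([f.mean(0) for f in train.values()])
within = np.mean([f.var(0) for f in train.values()], axis=0)
weights = means.var(0) / (within + 1e-8)

# One weighted VQ codebook (KMeans) per speaker.
codebooks = {spk: KMeans(n_clusters=8, n_init=10, random_state=0).fit(f * weights)
             for spk, f in train.items()}

def identify(signal):
    feats = mfcc_feats(signal) * weights
    # KMeans.score returns negative inertia; negate for total distortion.
    distortion = {spk: -cb.score(feats) / len(feats) for spk, cb in codebooks.items()}
    return min(distortion, key=distortion.get)

print("predicted speaker:", identify(rng.normal(0, 0.2, 16000)))
```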

