Comparative study on total nitrogen prediction in wastewater treatment plant and effect of various feature selection methods on machine learning algorithms performance

2021 ◽  
Vol 41 ◽  
pp. 102033
Author(s):  
Faramarz Bagherzadeh ◽  
Mohamad-Javad Mehrani ◽  
Milad Basirifard ◽  
Javad Roostaei
Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4716
Author(s):  
Federico Cangialosi ◽  
Edoardo Bruno ◽  
Gabriella De Santis

The development of low-cost sensors, the introduction of technical performance specifications, and increasingly effective machine learning algorithms for managing big data have led to growing interest in the use of instrumental odor monitoring systems (IOMS) for measuring odors from industrial plants. The main goals of IOMS installed inside industrial plants are the classification and quantification of odor concentration, in order to identify the most important odor sources and to assess whether regulatory thresholds have been exceeded. This paper illustrates the use of two machine learning algorithms for the concurrent classification and quantification of odors. The first is Random Forest, an algorithm that had not previously been used for odor quantification and classification in complex industrial settings. Its results were compared with those of algorithms commonly used in this field, such as the artificial neural network (ANN), here employed in the form of a deep neural network. Both techniques were applied to data collected from an IOMS installed for fenceline monitoring at a wastewater treatment plant. Cohen's kappa and normalized RMSE were used as the performance indicators for classification and regression, respectively; both were calculated on the test dataset, and the results were compared with literature data obtained in contexts of similar complexity. A Cohen's kappa of 97% was reached for the classification task, while the best normalized RMSE, 4% over the interval 20–2435 ouE/m³, was obtained with Random Forest.
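The two indicators named above can be computed as in the following minimal Python sketch. It assumes that normalized RMSE means the RMSE divided by the range of the observed values; the abstract quotes the 20–2435 ouE/m³ interval, but the exact normalization used by the authors is an assumption here, and the function names are illustrative.

```python
import math
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Chance-corrected agreement between true and predicted class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: share of samples whose predicted label matches the true one
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement p_e from the marginal frequency of each class
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both label sets contain one identical class")
    return (p_o - p_e) / (1 - p_e)


def normalized_rmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return rmse / (max(y_true) - min(y_true))
```

For example, `cohens_kappa(["A", "A", "B", "B"], ["A", "A", "B", "A"])` gives 0.5: the observed agreement is 0.75 and the chance agreement is 0.5. A kappa of 97% corresponds to a return value of 0.97.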


2018 ◽  
Vol 14 (1) ◽  
pp. 64-73 ◽  
Author(s):  
ShaoPeng Wang ◽  
Deling Wang ◽  
JiaRui Li ◽  
Tao Huang ◽  
Yu-Dong Cai

Several machine learning algorithms were adopted to investigate cleavage sites in signal peptides. An optimal dagging-based classifier was constructed, and 870 features were identified as important for this classifier.
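Dagging (disjoint aggregating) trains one copy of a base classifier on each of several disjoint, class-stratified partitions of the training data and combines their predictions by voting. The Python sketch below is a generic illustration, not the authors' implementation: it uses a simple majority vote, and the nearest-centroid base learner and all helper names are hypothetical stand-ins chosen to keep the example dependency-free.

```python
from collections import Counter, defaultdict


def nearest_centroid_fit(X, y):
    """Hypothetical base learner: store one mean feature vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def nearest_centroid_predict(model, x):
    # Choose the class whose centroid has the smallest squared Euclidean distance to x
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def dagging_fit(X, y, n_folds=3):
    """Split the data into disjoint, class-stratified folds and fit one base model per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Deal each class's samples round-robin so every fold keeps similar class proportions
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % n_folds].append(idx)
    return [nearest_centroid_fit([X[i] for i in f], [y[i] for i in f]) for f in folds if f]


def dagging_predict(models, x):
    # Majority vote over the base models; ties go to the label encountered first
    votes = Counter(nearest_centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each base model sees only a disjoint slice of the data, training cost per model shrinks with the number of folds, which is the usual motivation for dagging over bagging on large datasets.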


Author(s):  
Mohammad Almseidin ◽  
AlMaha Abu Zuraiq ◽  
Mouhammd Al-kasassbeh ◽  
Nidal Alnidami

With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There is a considerable number of web pages serving different purposes, but not all of these sites are legitimate. So-called phishing sites deceive users in order to serve the interests of their operators. This paper addresses the problem with machine learning algorithms and a novel phishing-detection dataset containing 5000 legitimate and 5000 phishing web pages. To obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset to improve the efficiency of the models. The best result was achieved by applying the Random Forest algorithm to 20 of the 48 features, with an accuracy of 98.11%.
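The abstract does not name the feature selection tools used, so the following is only an illustrative filter approach, assumed here: rank each discrete feature by its information gain with respect to the phishing/legitimate label and keep the top k (k = 20 of 48 in the reported experiment). The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def select_top_k(X, y, k):
    """Rank feature indices by information gain and return the k best, in index order."""
    n_features = len(X[0])
    scores = [(information_gain([row[j] for row in X], y), j) for j in range(n_features)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return sorted(j for _, j in scores[:k])
```

The selected indices would then be used to subset the dataset before training a classifier such as Random Forest.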


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 216
Author(s):  
Mi Luo ◽  
Yifu Wang ◽  
Yunhong Xie ◽  
Lai Zhou ◽  
Jingjing Qiao ◽  
...  

Increasing numbers of explanatory variables tend to result in information redundancy and “dimensional disaster” in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. 
The combination of RFE and CatBoost performed better than the VSURF–RFR combination, in which random forests were used for both feature selection and regression, indicating that performing feature selection and regression with a single machine learning algorithm may not always yield optimal AGB estimates. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.
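RFE repeatedly fits a model, ranks the remaining features by importance, and discards the weakest until the desired number remains. The dependency-free Python sketch below uses ordinary least squares on standardized features, with the absolute coefficient as the importance score; this estimator is an assumption for illustration, since the abstract does not state which model drove RFE in the study, and the helper names are hypothetical. A singular design matrix would cause a division by zero, which this sketch does not guard against.

```python
import statistics


def standardize(X):
    """Scale each column to zero mean and unit variance so coefficient sizes are comparable."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)] + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: move the row with the largest entry in this column into place
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rfe(X, y, n_keep):
    """Recursive feature elimination with |OLS coefficient| as the importance score."""
    Xs = standardize(X)
    y_mean = statistics.fmean(y)
    yc = [t - y_mean for t in y]  # centring y (and X) removes the need for an intercept
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in Xs]
        coefs = least_squares(sub, yc)
        # Drop the feature whose standardized coefficient is smallest in magnitude
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        del remaining[weakest]
    return remaining
```

Removing one feature per iteration is the simplest schedule; library implementations such as scikit-learn's `RFE` also allow several features to be removed per round through a `step` parameter.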

