Retraining prior state performances of anaerobic digestion improves prediction accuracy of methane yield in various machine learning models

2021 ◽  
Vol 298 ◽  
pp. 117250
Author(s):  
Jun-Gyu Park ◽  
Hang-Bae Jun ◽  
Tae-Young Heo
Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 158
Author(s):  
Ain Cheon ◽  
Jwakyung Sung ◽  
Hangbae Jun ◽  
Heewon Jang ◽  
Minji Kim ◽  
...  

The application of a machine learning (ML) model to bio-electrochemical anaerobic digestion (BEAD) is a future-oriented approach for improving process stability by predicting performances that have nonlinear relationships with various operational parameters. Five ML models, which included tree-, regression-, and neural network-based algorithms, were applied to predict the methane yield in a BEAD reactor. The results showed that various 1-step ahead ML models, which utilized prior data of BEAD performances, could enhance prediction accuracy. In addition, the 1-step ahead algorithm with retraining could improve prediction accuracy by 37.3% compared with the conventional multi-step ahead algorithm. The improvement was particularly noteworthy in tree- and regression-based ML models. Moreover, the 1-step ahead algorithm with retraining showed high potential for efficient prediction using pH as a single input, which is plausibly an easier parameter to monitor than the other parameters required in bioprocess models.
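The contrast between 1-step ahead prediction with retraining and conventional multi-step ahead prediction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lag-1 linear model, the function names, and the synthetic "methane yield" series are all assumptions made for the example, and the abstract's tree-, regression-, and neural network-based models are replaced here by a plain least-squares fit.

```python
import numpy as np

def fit_linear(x, y):
    """Least-squares fit of y ≈ a*x + b; returns (a, b)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1]

def one_step_ahead_with_retraining(series, n_train):
    """Predict series[t] from series[t-1], refitting on all data observed before t."""
    series = np.asarray(series, dtype=float)
    preds = []
    for t in range(n_train, len(series)):
        history = series[:t]                      # everything measured so far
        a, b = fit_linear(history[:-1], history[1:])
        preds.append(a * history[-1] + b)         # one step beyond the history
    return np.array(preds)

def multi_step_ahead(series, n_train):
    """Fit once on the first n_train points, then feed predictions back in."""
    series = np.asarray(series, dtype=float)
    a, b = fit_linear(series[:n_train - 1], series[1:n_train])
    preds, last = [], series[n_train - 1]
    for _ in range(n_train, len(series)):
        last = a * last + b                       # prediction becomes next input
        preds.append(last)
    return np.array(preds)

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic, made-up series used purely for illustration (not BEAD data).
rng = np.random.default_rng(0)
t = np.arange(60)
yield_series = 0.30 + 0.05 * np.sin(t / 6.0) + rng.normal(0, 0.005, size=t.size)

n_train = 30
actual = yield_series[n_train:]
print("1-step ahead with retraining RMSE:",
      rmse(actual, one_step_ahead_with_retraining(yield_series, n_train)))
print("multi-step ahead RMSE:",
      rmse(actual, multi_step_ahead(yield_series, n_train)))
```

The design point is that the retraining variant always conditions on the latest measured value and refits on it, whereas the multi-step variant accumulates its own prediction errors.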


2019 ◽  
Vol 14 (2) ◽  
pp. 97-106
Author(s):  
Ning Yan ◽  
Oliver Tat-Sheung Au

Purpose The purpose of this paper is to analyze the correlation between students’ online learning behavior features and course grade, and to attempt to build effective prediction models based on limited data. Design/methodology/approach The prediction label in this paper is the course grade of students, and the features available are student age, student gender, connection time, hits count and days of access. The machine learning model used in this paper is the classical three-layer feedforward neural network, trained with the scaled conjugate gradient algorithm. Pearson correlation analysis is used to find the relationships between course grade and the student features. Findings Days of access has the highest correlation with course grade, followed by hits count, and connection time is less relevant to students’ course grade. Student age and gender have the lowest correlation with course grade. Binary classification models have much higher prediction accuracy than multi-class classification models. Data normalization and data discretization can effectively improve the prediction accuracy of machine learning models, such as the ANN model in this paper. Originality/value This paper may help teachers to find clues for identifying students with learning difficulties in advance and to give timely help through the online learning behavior data. It shows that acceptable prediction models based on machine learning can be built using a small and limited data set. However, introducing external data into machine learning models to improve their prediction accuracy is still a valuable and hard issue.
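The Pearson correlation screening described above can be sketched in a few lines. The records below are invented for illustration and are not the paper's data; `pearson_r` and `rank_features` are hypothetical helper names, and only three of the five listed features are shown.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def rank_features(features, target):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Made-up student records, for illustration only.
features = {
    "days_of_access": [5, 12, 20, 8, 25, 15],
    "hits_count": [40, 90, 150, 70, 160, 130],
    "connection_time": [3.0, 2.5, 6.0, 5.5, 4.0, 3.5],
}
course_grade = [52, 64, 80, 58, 88, 70]

for name, r in rank_features(features, course_grade):
    print(f"{name}: r = {r:+.2f}")
```

Ranking by absolute value treats strong negative correlations as informative too, which is a choice of this sketch rather than something stated in the abstract.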


2021 ◽  
Vol 229 ◽  
pp. 01022
Author(s):  
Fatima Walid ◽  
Sanaa El Fkihi ◽  
Houda Benbrahim ◽  
Hicham Tagemouati

Anaerobic digestion is recognized as an advantageous waste management technique that provides a source of clean and renewable energy. However, biogas production through this practice is complex and relies on the interaction of several factors, including changes in operating and monitoring parameters. Many researchers have focused on mathematical modeling of anaerobic digestion to gain insight into process dynamics, aiming to optimize its efficiency. This paper gives an overview of the different approaches applied to tackle this challenge, including mechanistic and data-driven models. The review leads us to conclude that neural networks combined with metaheuristic techniques have the potential to outperform mechanistic and classical machine learning models.


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 360-381
Author(s):  
Matthew T. O. Worsey ◽  
Hugo G. Espinosa ◽  
Jonathan B. Shepherd ◽  
David V. Thiel

Machine learning is a powerful tool for data classification and has been used to classify movement data recorded by wearable inertial sensors in general living and sports. Inertial sensors can provide valuable biofeedback in combat sports such as boxing; however, the use of such technology has not had a global uptake. If simple inertial sensor configurations can be used to automatically classify strike type, then cumbersome tasks such as video labelling can be bypassed and the foundation for automated workload monitoring of combat sport athletes is set. This investigation evaluates the classification performance of six different supervised machine learning models (tuned and untuned) when using two simple inertial sensor configurations (configuration 1: inertial sensor worn on both wrists; configuration 2: inertial sensor worn on both wrists and third thoracic vertebra [T3]). When trained on one athlete, strike prediction accuracy was good using both configurations (sensor configuration 1 mean overall accuracy: 0.90 ± 0.12; sensor configuration 2 mean overall accuracy: 0.87 ± 0.09). There was no statistically significant difference in prediction accuracy between the two configurations or between tuned and untuned models (p > 0.05). Moreover, there was no statistically significant difference in computational training time for tuned and untuned models (p > 0.05). For sensor configuration 1, a support vector machine (SVM) model with a Gaussian RBF kernel performed the best (accuracy = 0.96); for sensor configuration 2, a multi-layered perceptron neural network (MLP-NN) model performed the best (accuracy = 0.98). Wearable inertial sensors can be used to accurately classify strike type in boxing pad work, which means that cumbersome tasks such as video and notational analysis can be bypassed. Additionally, automated workload and performance monitoring of athletes throughout training camp is possible. Future investigations will evaluate the performance of this algorithm on a greater sample size and test the influence of impact window-size on prediction accuracy. Additionally, supervised machine learning models should be trained on data collected during sparring to see if high accuracy holds in a competition setting. This can help move closer towards automatic scoring in boxing.


2021 ◽  
Vol 12 (6) ◽  
pp. 1-24
Author(s):  
Shaojie Qiao ◽  
Nan Han ◽  
Jianbin Huang ◽  
Kun Yue ◽  
Rui Mao ◽  
...  

Bike-sharing systems are becoming popular and generate a large volume of trajectory data. In a bike-sharing system, users can borrow and return bikes at different stations. In particular, a bike-sharing system will be affected by weather, the time period, and other dynamic factors, which challenges the scheduling of shared bikes. In this article, a new shared-bike demand forecasting model based on dynamic convolutional neural networks, called SDF, is proposed to predict the demand of shared bikes. SDF chooses the most relevant weather features from real weather data by using the Pearson correlation coefficient and transforms them into a two-dimensional dynamic feature matrix, taking into account the states of stations from historical data. The feature information in the matrix is extracted, learned, and trained with a newly proposed dynamic convolutional neural network to predict the demand of shared bikes in a dynamic and intelligent fashion. The phase of parameter update is optimized from three aspects: the loss function, optimization algorithm, and learning rate. Then, an accurate shared-bike demand forecasting model is designed based on the basic idea of minimizing the loss value. Compared with classical machine learning models, the weight-sharing strategy employed by SDF reduces the complexity of the network. It allows a high prediction accuracy to be achieved within a relatively short period of time. Extensive experiments are conducted on real-world bike-sharing datasets to evaluate SDF. The results show that SDF significantly outperforms classical machine learning models in prediction accuracy and efficiency.


Geofluids ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jia Rong ◽  
Zongyuan Zheng ◽  
Xiaorong Luo ◽  
Chao Li ◽  
Yuping Li ◽  
...  

The total organic carbon content (TOC) is a core indicator for shale gas reservoir evaluations. Machine learning-based models can quickly and accurately predict TOC, which is of great significance for the production of shale gas. Based on conventional logs, the measured TOC values, and other data of 9 typical wells in the Jiaoshiba area of the Sichuan Basin, this paper performed a Bayesian linear regression and applied a random forest machine learning model to predict TOC values of the shale from the Wufeng Formation and the lower part of the Longmaxi Formation. The results showed that the TOC value prediction accuracy was improved by more than 50% by using the well-trained machine learning models compared with the traditional Δ Log R method in an overmature and tight shale. Using the halving random search cross-validation method to optimize hyperparameters can greatly improve the speed of building the model. Furthermore, excluding the factors that affect the log value other than the TOC and taking the corrected data as input data for training could improve the prediction accuracy of the random forest model by approximately 5%. Data can be easily updated with machine learning models, which is of primary importance for improving the efficiency of shale gas exploration and development.


2020 ◽  
Vol 59 (01) ◽  
pp. 001-008
Author(s):  
Mayumi Suzuki ◽  
Takuma Shibahara ◽  
Yoshihiro Muragaki

Abstract Background Although advances in prediction accuracy have been made with new machine learning methods, such as support vector machines and deep neural networks, these methods produce nonlinear machine learning models and thus lack the ability to explain the basis of their predictions. Improving their explanatory capabilities would increase the reliability of their predictions. Objective Our objective was to develop a factor analysis technique that enables the presentation of the feature variables used in making predictions, even in nonlinear machine learning models. Methods The factor analysis technique consisted of two techniques: a backward analysis technique and a factor extraction technique. The factor extraction technique we developed extracts feature variables from the posterior probability distribution of a machine learning model, which is calculated by the backward analysis technique. Results In evaluation, using gene expression data from prostate tumor patients and healthy subjects, the prediction accuracy of a model of deep neural networks was approximately 5% better than that of a model of support vector machines. Then the rate of concordance between the feature variables extracted in an earlier report using Jensen–Shannon divergence and the ones extracted in this report using backward elimination using Hilbert–Schmidt independence criteria was 40% for the top five variables, 40% for the top 10, and 49% for the top 100. Conclusion The results showed that models can be evaluated from different viewpoints by using different factor extraction techniques. In the future, we hope to use this technique to verify the characteristics of features extracted by the factor extraction technique, and to perform clinical studies using the genes we extracted in this experiment.


2021 ◽  
Vol 9 ◽  
Author(s):  
Wenbin Li ◽  
Yu Shi ◽  
Faming Huang ◽  
Haoyuan Hong ◽  
Guquan Song

For collapse susceptibility prediction (CSP), little attention has been paid to the uncertainty characteristics of different machine learning models predicting collapse susceptibility. In this study, six typical machine learning methods, namely, logistic regression (LR), radial basis function neural network (RBF), multilayer perceptron (MLP), support vector machine (SVM), chi-square automatic interactive detection decision tree (CHAID), and random forest (RF) models, are constructed to perform CSP. An’yuan County in China, with a total of 108 collapses and 11 related environmental factors acquired through remote sensing and GIS technologies, is selected as a case study. The spatial dataset is first constructed, and then these machine learning models are used to implement CSP. Finally, the uncertainty characteristics of the CSP results are explored according to the accuracies, mean values, and standard deviations of the collapse susceptibility indexes (CSIs) and the Kendall synergy coefficient test. In addition, Huichang County, China, is used as another study case to avoid the uncertainty of different study areas. Results show that 1) overall, all six kinds of machine learning models reasonably and accurately predict the collapse susceptibility in An’yuan County; 2) the RF model has the highest prediction accuracy, followed by the CHAID, SVM, MLP, RBF, and LR models; and 3) the CSP results of these models are significantly different, with the mean value (0.2718) and average rank (2.72) of RF being smaller than those of the other five models, followed by the CHAID (0.3210 and 3.29), SVM (0.3268 and 3.48), MLP (0.3354 and 3.64), RBF (0.3449 and 3.81), and LR (0.3496 and 4.06), and with a Kendall synergy coefficient value of 0.062. In conclusion, it is necessary to adopt a series of different machine learning models to predict collapse susceptibility for cross-validation and comparison. Furthermore, the RF model has the highest prediction accuracy and the lowest uncertainty of the CSP results among the machine learning models.


2020 ◽  
Vol 12 (20) ◽  
pp. 3423
Author(s):  
Alireza Arabameri ◽  
Sunil Saha ◽  
Kaustuv Mukherjee ◽  
Thomas Blaschke ◽  
Wei Chen ◽  
...  

The uncertainty of flash floods makes them highly difficult to predict through conventional models. Physical hydrologic models for flash flood prediction over any large area are very difficult to compute, as they require a lot of data and time. Therefore, remote sensing data-based models (from statistical to machine learning) have become highly popular due to open data access and shorter prediction times. There is a continuous effort to improve the prediction accuracy of these models by introducing new methods. This study focuses on flash flood modeling through novel hybrid machine learning models, which can improve the prediction accuracy. The hybrid machine learning ensemble approaches that combine the three meta-classifiers (Real AdaBoost, Random Subspace, and MultiBoosting) with J48 (a tree-based algorithm that can be used to evaluate the behavior of the attribute vector for any defined number of instances) were used in the Gorganroud River Basin of Iran to assess flood susceptibility (FS). A total of 426 flood positions as dependent variables and a total of 14 flood conditioning factors (FCFs) as independent variables were used to model the FS. Several threshold-dependent and threshold-independent statistical tests were applied to verify the performance and predictive capability of these machine learning models, such as the receiver operating characteristic (ROC) curve of the success rate curve (SRC) and prediction rate curve (PRC), efficiency (E), root-mean-square error (RMSE), and true skill statistics (TSS). The valuation of the FCFs was done using AdaBoost, frequency ratio (FR), and Boosted Regression Tree (BRT) models. In the flooding of the study area, altitude, land use/land cover (LU/LC), distance to stream, normalized difference vegetation index (NDVI), and rainfall played important roles. The Random Subspace J48 (RSJ48) ensemble method, with an area under the curve (AUC) of 0.931 (SRC) and 0.951 (PRC), E of 0.89, sensitivity of 0.87, and TSS of 0.78, was the most effective ensemble in predicting the FS. The FR technique also showed good performance and reliability for all models. Map removal sensitivity analysis (MRSA) revealed that the FS maps have the highest sensitivity to elevation. Based on the findings of the validation methods, the FS maps prepared using the machine learning ensemble techniques have high robustness and can be used to advise flood management initiatives in flood-prone areas.
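For readers unfamiliar with the true skill statistic, the sketch below uses its usual definition, TSS = sensitivity + specificity − 1, computed from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study; the counts are chosen only so that sensitivity and TSS land on the values quoted above, in which case the formula implies a specificity of about 0.91, a figure the abstract itself does not report.

```python
def skill_scores(tp, fn, tn, fp):
    """Return (sensitivity, specificity, TSS) from confusion-matrix counts.

    tp/fn: flood locations predicted as flood / as non-flood
    tn/fp: non-flood locations predicted as non-flood / as flood
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    tss = sensitivity + specificity - 1.0
    return sensitivity, specificity, tss

# Hypothetical counts, for illustration only (not the paper's validation data).
sens, spec, tss = skill_scores(tp=87, fn=13, tn=91, fp=9)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} TSS={tss:.2f}")
```

TSS ranges from −1 to 1, with 0 indicating performance no better than random guessing.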


2021 ◽  
Vol 13 (5) ◽  
pp. 1018
Author(s):  
Chao Song ◽  
Xiaohong Chen

It has become increasingly difficult in recent years to predict precipitation scientifically and accurately due to the dual effects of human activities and climatic conditions. This paper focuses on four aspects to improve precipitation prediction accuracy. Five decomposition methods (time-varying filter-based empirical mode decomposition (TVF-EMD), robust empirical mode decomposition (REMD), complementary ensemble empirical mode decomposition (CEEMD), wavelet transform (WT), and extreme-point symmetric mode decomposition (ESMD)) combined with the Elman neural network (ENN) are used to construct five prediction models, i.e., TVF-EMD-ENN, REMD-ENN, CEEMD-ENN, WT-ENN, and ESMD-ENN. The variance contribution rate (VCR) and Pearson correlation coefficient (PCC) are utilized to compare the performances of the five decomposition methods. The wavelet transform coherence (WTC) is used to determine the reason for the poor prediction performance of machine learning algorithms in individual years and the relationship with climate indicators. A secondary decomposition of the TVF-EMD is used to improve the prediction accuracy of the models. The proposed methods are used to predict the annual precipitation in Guangzhou. The subcomponents obtained from the TVF-EMD are the most stable among the four decomposition methods, and the North Atlantic Oscillation (NAO) index, the Nino 3.4 index, and sunspots have a smaller influence on the first subcomponent (Sc-1) than the other subcomponents. The TVF-EMD-ENN model has the best prediction performance and outperforms traditional machine learning models. The secondary decomposition of the Sc-1 of the TVF-EMD model significantly improves the prediction accuracy.

