stacked generalization
Recently Published Documents


TOTAL DOCUMENTS: 122 (FIVE YEARS 49)

H-INDEX: 18 (FIVE YEARS 4)

Author(s):  
Naipeng Liu ◽  
Hui Gao ◽  
Zhen Zhao ◽  
Yule Hu ◽  
Longchen Duan

Abstract: In gas drilling operations, the rate of penetration (ROP) has an important influence on drilling costs. Predicting ROP makes it possible to optimize the drilling operational parameters and reduce overall cost. To predict ROP with satisfactory precision, a stacked generalization ensemble model is developed in this paper. Drilling data were collected from a shale gas survey well in Xinjiang, northwestern China. First, Pearson correlation analysis is used for feature selection. Then, a Savitzky-Golay smoothing filter is used to reduce noise in the dataset. In the next stage, we propose a stacked generalization ensemble model that combines six machine learning models: support vector regression (SVR), extremely randomized trees (ET), random forest (RF), gradient boosting machine (GB), light gradient boosting machine (LightGBM) and extreme gradient boosting (XGB). The stacked model generates meta-data from five base models (SVR, ET, RF, GB, LightGBM), which an XGB model then uses to compute ROP predictions. Then, the leave-one-out method is used to verify modeling performance. The stacked model performs better than each single model, achieving R2 = 0.9568 and a root mean square error of 0.4853 m/h on the testing dataset. Hence, the proposed approach will be useful in optimizing gas drilling. Finally, the particle swarm optimization (PSO) algorithm is used to optimize the relevant ROP parameters.
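The two-level structure described in this abstract (base models produce meta-data, a second model learns from it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a closed-form ridge regressor and a k-nearest-neighbour regressor stand in for the five base learners, a near-unregularized ridge stands in for the XGB meta-learner, and all function names are hypothetical. The meta-data are built from out-of-fold predictions so that the meta-learner never sees a base prediction made on a sample the base model was trained on.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with a bias column; returns a predict function."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor (mean target of the k closest training rows)."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def stack_fit_predict(X_train, y_train, X_test, base_fitters, n_folds=5, seed=0):
    """Level 0: out-of-fold base predictions. Level 1: meta-learner on those predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X_train)), n_folds)
    oof = np.zeros((len(X_train), len(base_fitters)))  # the "meta-data"
    for held_out in folds:
        train_idx = np.setdiff1d(np.arange(len(X_train)), held_out)
        for j, fit in enumerate(base_fitters):
            model = fit(X_train[train_idx], y_train[train_idx])
            oof[held_out, j] = model(X_train[held_out])
    # Stand-in meta-learner (the abstract uses XGB at this level).
    meta = fit_ridge(oof, y_train, alpha=1e-3)
    # Refit each base model on all training data to build test-time meta-features.
    test_meta = np.column_stack([fit(X_train, y_train)(X_test) for fit in base_fitters])
    return meta(test_meta)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 3))  # synthetic features, not drilling data
    y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)
    pred = stack_fit_predict(X[:150], y[:150], X[150:], [fit_ridge, fit_knn])
    print("test RMSE:", float(np.sqrt(np.mean((pred - y[150:]) ** 2))))
```

Swapping in other base learners only requires adding functions with the same `fit(X, y) -> predict` shape to `base_fitters`.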


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 1143
Author(s):  
Kalaiarasi Sonai Muthu Anbananthen ◽  
Sridevi Subbiah ◽  
Deisy Chelliah ◽  
Prithika Sivakumar ◽  
Varsha Somasundaram ◽  
...  

Background: Digitization has recently been gaining importance in different domains of knowledge such as agriculture, medicine, recommendation platforms, the Internet of Things (IoT), and weather forecasting. In agriculture, crop yield estimation is essential for improving productivity, for decision-making processes such as financial market forecasting, and for addressing food security issues. The main objective of the article is to predict crop yield and improve the accuracy of crop yield forecasting using hybrid machine learning (ML) algorithms. Methods: This article proposes hybrid ML algorithms that use specialized ensembling methods such as stacked generalization, gradient boosting, random forest, and least absolute shrinkage and selection operator (LASSO) regression. Stacked generalization trains a new model that learns how to best combine the predictions from two or more models trained on the dataset. To demonstrate the applications of the proposed algorithm, aerial-intel datasets from the GitHub data science repository are used. Results: The performance of the individual algorithms and the hybrid ML algorithms is compared using cross-validation to identify the most promising performers for the agricultural dataset. The accuracies of the random forest regressor, gradient boosted tree regression, and stacked generalization ensemble methods are 87.71%, 86.98%, and 88.89%, respectively. Conclusions: The proposed stacked generalization ML algorithm statistically outperforms the other methods with an accuracy of 88.89%, which demonstrates that the proposed approach is an effective algorithm for predicting crop yield. The system also gives fast and accurate responses to farmers.
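The cross-validated comparison mentioned in the Results can be sketched as below. This is an illustration only: the abstract does not define how "accuracy" is computed for a regression task, so R² per fold is used here as an assumption, and the two candidate models (ordinary least squares and a mean-only baseline) are hypothetical stand-ins for the regressors being compared.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    Xb = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ w

def fit_mean(X, y):
    """Baseline that always predicts the training mean."""
    mu = float(np.mean(y))
    return lambda Z: np.full(len(Z), mu)

def r2_score(y_true, y_pred):
    """Coefficient of determination on one held-out fold."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def kfold_r2(fit, X, y, k=5, seed=0):
    """Return the R^2 score of `fit` on each of k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        model = fit(X[train], y[train])
        scores.append(r2_score(y[held_out], model(X[held_out])))
    return np.array(scores)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    X = rng.normal(size=(120, 2))  # synthetic data, not the crop dataset
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=120)
    for name, fit in [("linear", fit_linear), ("mean baseline", fit_mean)]:
        print(name, "mean R^2:", float(kfold_r2(fit, X, y).mean()))
```

Using the same `seed` for every candidate keeps the fold assignment identical, so score differences reflect the models and not the split.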


2021 ◽  
Vol 147 (11) ◽  
pp. 04021050
Author(s):  
Manish Pandey ◽  
Mehdi Jamei ◽  
Masoud Karbasi ◽  
Iman Ahmadianfar ◽  
Xuefeng Chu

Author(s):  
Duc-Khanh Nguyen ◽  
Chung-Hsien Lan ◽  
Chien-Lung Chan

With the development of information technology, and especially with the boom in big data, healthcare support systems are becoming much better. Patient data can be collected, retrieved, and stored in real time. These data are valuable and meaningful for monitoring, diagnosing, and further applications in data analysis and decision-making. Essentially, the data can be divided into three types, namely, statistical, image-based, and sequential data. Each type has a different method of retrieval, processing, and deployment. Additionally, the application of machine learning (ML) and deep learning (DL) in healthcare support systems is growing more rapidly than ever. Numerous high-performance architectures have been proposed to optimize decision-making. As reliability and stability are the most important factors in a healthcare support system, enhancing predictive performance and maintaining the stability of the model are always the top priorities. The main idea of our study comes from ensemble techniques. Numerous studies and data science competitions show that by combining several weak models into one, ensemble models can attain outstanding performance and reliability. We propose three deep ensemble learning (DEL) approaches, each with stable and reliable performance, that are workable on the above-mentioned data types. These are deep-stacked generalization ensemble learning, gradient deep learning boosting, and deep aggregation learning. The experimental results show that our proposed approaches achieve more robust and reliable performance than traditional ML and DL techniques on statistical, image-based, and sequential benchmark datasets. In particular, on the Heart Disease UCI dataset, representing the statistical type, the gradient deep learning boosting approach dominates the others with accuracy, recall, F1-score, Matthews correlation coefficient, and area under the curve values of 0.87, 0.81, 0.83, 0.73, and 0.91, respectively. On the X-ray dataset, representing the image-based type, the deep aggregation learning approach shows the highest performance with values of 0.91, 0.97, 0.93, 0.80, and 0.94, respectively. On the Depresjon dataset, representing the sequence type, the deep-stacked generalization ensemble learning approach outperforms the others with values of 0.91, 0.84, 0.86, 0.8, and 0.94, respectively. Overall, we conclude that applying DL models using our proposed approaches is a promising method for the healthcare support system to enhance prediction and diagnosis performance. Furthermore, our study reveals that these approaches are flexible and easy to apply to achieve optimal performance.
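For reference, four of the five reported metrics can be computed from a binary confusion matrix as in the sketch below. The labels are made up for illustration and the function name is hypothetical. Area under the curve is left out because it needs predicted scores, not hard labels.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, recall, F1-score and Matthews correlation coefficient for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1, "mcc": mcc}

if __name__ == "__main__":
    # Illustrative labels only; not taken from any dataset in the abstract.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

The Matthews correlation coefficient uses all four confusion-matrix cells, which is why it can be noticeably lower than accuracy or F1 on the same predictions.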


2021 ◽  
Vol 8 (3) ◽  
pp. 1442-1456
Author(s):  
RICO BAYU WIRANATA

Investors must predict stocks accurately to maximize profit while avoiding bankruptcy. However, the state of the stock market is difficult to detect. Its behavior fluctuates under the influence of various factors such as the political situation, company and global economic conditions, and investor expectations available through the news. This study aims to develop a model that predicts stocks more accurately by combining technical stock indicators with news sentiment. A genetic algorithm (GA) is used to optimize several decision tree-based ensembles that are stacked using the stacked-generalization method with a meta-learner concept. The methodology has five main stages: collection of stock and news data, data preprocessing, extraction of technical indicator and sentiment features, data analysis, and model development. A series of trials on the GA crossover and mutation parameters yielded the optimum result of the combinatorial search over the model hyper-parameters, with an accuracy of 81.63% and an f1-score of 82.21%. Evaluating the model on combinations of dataset types raised prediction accuracy from 75.91% to 81.63% and f1-score from 77.56% to 82.21%. In the trading evaluation, the proposed method delivered a remarkable return of 121.27% in one year, with the smallest maximum drawdown and a high Sharpe ratio. These results surpass those of similar previous studies and are far above the performance of the stock's own price movement, as indicated by a buy & hold strategy.
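A genetic algorithm search over hyper-parameter combinations, with crossover and mutation rates as tunable settings, might look like the sketch below. Everything specific here is an assumption made for illustration: the search space, the rates, and the `toy_fitness` function, which stands in for the cross-validated score of the stacked model. The abstract does not list the actual hyper-parameters or GA settings.

```python
import random

# Hypothetical search space; the source does not list the real hyper-parameters.
SEARCH_SPACE = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}
KEYS = list(SEARCH_SPACE)

def decode(individual):
    """Map a chromosome (one index per hyper-parameter) to a parameter dict."""
    return {k: SEARCH_SPACE[k][i] for k, i in zip(KEYS, individual)}

def toy_fitness(params):
    """Stand-in for the validation score of a stacked model (higher is better)."""
    return -(abs(params["n_estimators"] - 200) / 400
             + abs(params["max_depth"] - 7) / 9
             + abs(params["learning_rate"] - 0.1))

def run_ga(fitness, pop_size=12, generations=20,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    score = lambda ind: fitness(decode(ind))
    pop = [[rng.randrange(len(SEARCH_SPACE[k])) for k in KEYS] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        best = max(pop, key=score)
        history.append(score(best))
        next_pop = [best[:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            child = p1[:]
            if rng.random() < crossover_rate:  # single-point crossover
                cut = rng.randrange(1, len(KEYS))
                child = p1[:cut] + p2[cut:]
            for g, key in enumerate(KEYS):  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[g] = rng.randrange(len(SEARCH_SPACE[key]))
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    history.append(score(best))
    return decode(best), history

if __name__ == "__main__":
    best_params, history = run_ga(toy_fitness)
    print(best_params, history[-1])
```

In a real setting the fitness call would train and validate the stacked model, so each evaluation is expensive and caching scores per chromosome would be worthwhile.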


2021 ◽  
Author(s):  
Siddharth Sanghavi ◽  
Parag Vaid ◽  
Palash Rathod ◽  
Kriti Srivastava

2021 ◽  
Vol 24 (2) ◽  
pp. 139-183
Author(s):  
Kristoffer B. Birkeland ◽  
Allan D. D’Silva ◽  
Roland Füss ◽  
Are Oust ◽  
...  

We develop an automated valuation model (AVM) for the residential real estate market by leveraging stacked generalization and a comparable market analysis. Specifically, we combine four novel ensemble learning methods with a repeat sales method and tailor the data selection for each value estimate. We calibrate and evaluate the model for the residential real estate market in Oslo by producing out-of-sample estimates for the value of 1,979 dwellings sold in the first quarter of 2018. Our novel approach of using stacked generalization achieves a median absolute percentage error of 5.4%, and more than 96% of the dwellings are estimated within 20% of their actual sales price. A comparison of the valuation accuracy of our AVM to that of the local estate agents in Oslo generally demonstrates its viability as a valuation tool. However, in stable market phases, the machine falls short of human capability.
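The two accuracy figures quoted here, median absolute percentage error and the share of estimates within 20% of the sales price, can be computed as in this short sketch. The prices are invented for illustration and the function names are hypothetical.

```python
from statistics import median

def absolute_percentage_errors(actual, predicted):
    """Absolute error of each estimate as a percentage of the actual price."""
    return [abs(p - a) * 100.0 / a for a, p in zip(actual, predicted)]

def mdape(actual, predicted):
    """Median absolute percentage error."""
    return median(absolute_percentage_errors(actual, predicted))

def share_within(actual, predicted, tolerance_pct=20.0):
    """Fraction of estimates whose absolute percentage error is within the tolerance."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(1 for e in errors if e <= tolerance_pct) / len(errors)

if __name__ == "__main__":
    # Illustrative prices only; not data from the Oslo study.
    actual = [100.0, 200.0, 400.0, 500.0]
    predicted = [110.0, 190.0, 300.0, 500.0]
    print("MdAPE (%):", mdape(actual, predicted))
    print("share within 20%:", share_within(actual, predicted))
```

The median is less sensitive than the mean to a few badly mispriced dwellings, which is one reason valuation studies often report it alongside a within-tolerance share.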

