Predicting attributes based movie success through ensemble machine learning

Author(s):  
Vedika Gupta ◽  
Nikita Jain ◽  
Harshit Garg ◽  
Srishti Jhunthra ◽  
Senthilkumar Mohan ◽  
...  
Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1052
Author(s):  
Baozhong Wang ◽  
Jyotsna Sharma ◽  
Jianhua Chen ◽  
Patricia Persaud

Estimation of fluid saturation is an important step in dynamic reservoir characterization. Machine learning techniques have been increasingly used in recent years for reservoir saturation prediction workflows. However, most of these studies require input parameters derived from cores, petrophysical logs, or seismic data, which may not always be readily available. Additionally, very few studies incorporate the production data, which is an important reflection of the dynamic reservoir properties and also typically the most frequently and reliably measured quantity throughout the life of a field. In this research, the random forest ensemble machine learning algorithm is implemented that uses the field-wide production and injection data (both measured at the surface) as the only input parameters to predict the time-lapse oil saturation profiles at well locations. The algorithm is optimized using feature selection based on feature importance score and Pearson correlation coefficient, in combination with geophysical domain-knowledge. The workflow is demonstrated using the actual field data from a structurally complex, heterogeneous, and heavily faulted offshore reservoir. The random forest model captures the trends from three and a half years of historical field production, injection, and simulated saturation data to predict future time-lapse oil saturation profiles at four deviated well locations with over 90% R-square, less than 6% Root Mean Square Error, and less than 7% Mean Absolute Percentage Error, in each case.


2021 ◽  
Vol 10 (1) ◽  
pp. 42
Author(s):  
Kieu Anh Nguyen ◽  
Walter Chen ◽  
Bor-Shiun Lin ◽  
Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1285
Author(s):  
Mohammed Al-Sarem ◽  
Faisal Saeed ◽  
Zeyad Ghaleb Al-Mekhlafi ◽  
Badiea Abdulkarem Mohammed ◽  
Tawfik Al-Hadhrami ◽  
...  

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites—the Phishing Websites Data Set from UCI (Dataset 1); Phishing Dataset for Machine Learning from Mendeley (Dataset 2, and Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.


2021 ◽  
Vol 598 ◽  
pp. 126266
Author(s):  
Mohammad Zounemat-Kermani ◽  
Okke Batelaan ◽  
Marzieh Fadaee ◽  
Reinhard Hinkelmann

Sign in / Sign up

Export Citation Format

Share Document