Industry Experience of Developing Day-Ahead Photovoltaic Plant Forecasting System Based on Machine Learning

This article highlights the industry experience of the development and practical implementation of a short-term photovoltaic forecasting system based on machine learning methods for a real industry-scale photovoltaic power plant implemented in a Russian power system using remote data acquisition. One of the goals of the study is to improve photovoltaic power plants generation forecasting accuracy based on open-source meteorological data, which is provided in regular weather forecasts. In order to improve the robustness of the system in terms of the forecasting accuracy, we apply newly derived feature introduction, a factor obtained as a result of feature engineering procedure, characterizing the relationship between photovoltaic power plant energy production and solar irradiation on a horizontal surface, thus taking into account the impacts of atmospheric and electrical nature. The article scrutinizes the application of different machine learning algorithms, including Random Forest regressor, Gradient Boosting Regressor, Linear Regression and Decision Trees regression, to the remotely obtained data. As a result of the application of the aforementioned approaches together with hyperparameters, tuning and pipelining of the algorithms, the optimal structure, parameters and the application sphere of different regressors were identified for various testing samples. The mathematical model developed within the framework of the study gave us the opportunity to provide robust photovoltaic energy forecasting results with mean accuracy over 92% for mostly-sunny sample days and over 83% for mostly cloudy days with different types of precipitation.

Download Full-text

A New Hybrid Model for Hourly Solar Radiation Forecasting Using Daily Classification Technique and Machine Learning Algorithms

Mathematical Problems in Engineering ◽

10.1155/2021/6692626 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Hamza Ali-Ou-Salah ◽

Benyounes Oukarfi ◽

Khalid Bahani ◽

Mohammed Moujabbir

Keyword(s):

Machine Learning ◽

Solar Radiation ◽

Global Solar Radiation ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forecasting Accuracy ◽

Photovoltaic Power ◽

Classification Technique ◽

Artificial Neural Network Ann

Photovoltaic power generation depends significantly on solar radiation, which is variable and unpredictable in nature. As a result, the production of electricity from photovoltaic power cannot be guaranteed permanently during the operational phase. Forecasting global solar radiation can play a key role in overcoming this drawback of intermittency. This paper proposes a new hybrid method based on machine learning (ML) algorithms and daily classification technique to forecast 1 h ahead of global solar radiation in the city of Évora. Firstly, several comparative studies have been done between random forest (RF), gradient boosting (GB), support vector machines (SVM), and artificial neural network (ANN). These comparisons were made using annual, seasonal, and daily testing sets in order to determine the best ML algorithm under different meteorological conditions. Subsequently, the daily classification technique has been applied to classify the original training set into sunny and cloudy training subsets in order to enhance the forecasting accuracy. The evaluation of the proposed ML algorithms was carried out using the normalized root mean square error (nRMSE) and the normalized absolute mean error (nMAE). The results of the seasonal comparison show that the RF model performs well for spring and autumn seasons with nRMSE equaling 22.53% and 23.42%, respectively. While the SVR model gives good results for winter and summer seasons with nRMSE equaling 24.31% and 8.41%, respectively. In addition, the daily comparison demonstrates that the RF model performs well for cloudy days with nRMSE = 41.40%, while the SVR model yields good results for sunny days with nRMSE = 8.88%. The results show that the daily classification technique enhances the forecasting accuracy of ML models. Furthermore, this study demonstrates that the forecasting accuracy of ML algorithms depends significantly on sky conditions.

Download Full-text

Economic feasibility of developing large scale solar photovoltaic power plants in Spain

E3S Web of Conferences ◽

10.1051/e3sconf/201912202004 ◽

2019 ◽

Vol 122 ◽

pp. 02004 ◽

Cited By ~ 1

Author(s):

Javier Menéndez ◽

Jorge Loredo

Keyword(s):

Power Plant ◽

Power Plants ◽

Large Scale ◽

Economic Feasibility ◽

Solar Irradiation ◽

Solar Photovoltaic ◽

Solar Pv ◽

Photovoltaic Energy ◽

Profitability Analysis ◽

Photovoltaic Power

In 2017, electricity generation from renewable sources contributed more than one quarter (30.7%) to total EU-28 gross electricity consumption. Wind power is for the first time the most important source, followed closely by hydro power. The growth in electricity from photovoltaic energy has been dramatic, rising from just 3.8 TWh in 2007, reaching a level of 119.5 TWh in 2017. Over this period, the contribution of photovoltaic energy to all electricity generated in the EU-28 from renewable energy sources increased from 0.7% to 12.3%. During this period the investment cost of a photovoltaic power plant has decreased considerably. Fundamentally, the cost of solar panels and inverters has decreased by more than 50%. The solar photovoltaic energy potential depends on two parameters: global solar irradiation and photovoltaic panel efficiency. The average solar irradiation in Spain is 1,600 kWh m-2. This paper analyzes the economic feasibility of developing large scale solar photovoltaic power plants in Spain. Equivalent hours between 800-1,800 h year-1 and output power between 100-400 MW have been considered. The profitability analysis has been carried out considering different prices of the electricity produced in the daily market (50-60 € MWh-1). Net Present Value (NPV) and Internal Rate of Return (IRR) were estimated for all scenarios analyzed. A solar PV power plant with 400 MW of power and 1,800 h year-1, reaches a NPV of 196 M€ and the IRR is 11.01%.

Download Full-text

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performances and the industry still struggles to predict box office performances in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. Firstly, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristic. The number of attendances is discretized into three classes. Four popular machine learning algorithms, artificial neural networks, decision tree regression and gradient boosting tree and random forest are employed, and the impact of each group is observed by compared by the performance models. Then the number of target classes is increased into five and eight and results are compared with the previously developed models in the literature.

Download Full-text

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71(0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off of at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data.This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.

Download Full-text

Feasibility of Machine Learning Algorithms for Predicting the Deformation of Anodic Titanium Films by Modulating Anodization Processes

Materials ◽

10.3390/ma14051089 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1089

Author(s):

Sung-Hee Kim ◽

Chanyoung Jeong

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Multiclass Classification ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Smart Manufacturing ◽

Gradient Boosting ◽

Experimental Conditions ◽

Learning Techniques ◽

Tio2 Nanostructures

This study aims to demonstrate the feasibility of applying eight machine learning algorithms to predict the classification of the surface characteristics of titanium oxide (TiO2) nanostructures with different anodization processes. We produced a total of 100 samples, and we assessed changes in TiO2 nanostructures’ thicknesses by performing anodization. We successfully grew TiO2 films with different thicknesses by one-step anodization in ethylene glycol containing NH4F and H2O at applied voltage differences ranging from 10 V to 100 V at various anodization durations. We found that the thicknesses of TiO2 nanostructures are dependent on anodization voltages under time differences. Therefore, we tested the feasibility of applying machine learning algorithms to predict the deformation of TiO2. As the characteristics of TiO2 changed based on the different experimental conditions, we classified its surface pore structure into two categories and four groups. For the classification based on granularity, we assessed layer creation, roughness, pore creation, and pore height. We applied eight machine learning techniques to predict classification for binary and multiclass classification. For binary classification, random forest and gradient boosting algorithm had relatively high performance. However, all eight algorithms had scores higher than 0.93, which signifies high prediction on estimating the presence of pore. In contrast, decision tree and three ensemble methods had a relatively higher performance for multiclass classification, with an accuracy rate greater than 0.79. The weakest algorithm used was k-nearest neighbors for both binary and multiclass classifications. We believe that these results show that we can apply machine learning techniques to predict surface quality improvement, leading to smart manufacturing technology to better control color appearance, super-hydrophobicity, super-hydrophilicity or batter efficiency.

Download Full-text

Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network Ann

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions; and may occur anytime in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks based on physical or chemical reactions in concrete matrix. It is imperative to investigate the healing performance that autogenous healing concrete possesses, to assess the extent of the cracking and to predict the extent of healing. In the research of self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a mass of instances may be needed to explore reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms, which are capable of predicting the healing performance (HP) of self-healing concrete. These algorithms involve an artificial neural network (ANN), a k-nearest neighbours (kNN), a gradient boosting regression (GBR), a decision tree regression (DTR), a support vector regression (SVR) and a random forest (RF). Parameters of these algorithms are tuned utilising grid search algorithm (GSA) and genetic algorithm (GA). The prediction performance indicated by coefficient of determination (R2) and root mean square error (RMSE) measures of these algorithms are evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR performs higher prediction performance (R2GSA-GBR = 0.958) and stronger robustness (RMSEGSA-GBR = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance on the design of autogenous healing concrete can be achieved.

Download Full-text

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.

Download Full-text

Evaluation of energy generation in Iraqi territory by solar photovoltaic power plants with a capacity of 20 MW

Energy Harvesting and Systems ◽

10.1515/ehs-2021-0075 ◽

2022 ◽

Vol 0 (0) ◽

Author(s):

Qusay Hassan ◽

Saadoon Abdul Hafedh ◽

Ali Hasan ◽

Marek Jaszczur

Keyword(s):

Experimental Data ◽

Wind Speed ◽

Power Plant ◽

Ambient Temperature ◽

Solar Irradiance ◽

Electricity Generation ◽

Power Plants ◽

Solar Photovoltaic ◽

Photovoltaic Power ◽

Plant Capacity

Abstract The study evaluates the visibility of solar photovoltaic power plant construction for electricity generation based on a 20 MW capacity. The assessment was performed for four main cities in Iraq by using hourly experimental weather data (solar irradiance, wind speed, and ambient temperature). The experimental data was measured for the period from 1st January to 31st December of the year 2019, where the simulation process was performed at a 1 h time step resolution at the same resolution as the experimental data. There are two positionings considered for solar photovoltaic modules: (i) annual optimum tilt angle and (ii) two-axis tracking system. The effect of the ambient temperature and wind on the overall system energy generated was taken into consideration. The study is targeted at evaluating the potential solar energy in Iraq and the viability of electricity generation using a 20 MW solar photovoltaic power plant. The results showed that the overall performance of the suggested power plant capacity is highly dependent on the solar irradiance intensity and the ambient temperature with wind speed. The current 20 MW solar photovoltaic power plant capacity shows the highest energy that can be generated in the mid-western region and the lowest in the northeast regions. The greatest influence of the ambient temperature on the energy genrated by power plants is observed in the southern regions.

Download Full-text

Prediction and Analysis of Gold Prices using Ensemble Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36028 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4367-4374

Author(s):

Gudipally Chandrashakar

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Gold Price ◽

Machine Learning Algorithms ◽

Series Data ◽

Gradient Boosting ◽

Support Vector ◽

Average Value ◽

Ensemble Machine Learning

In this article, we used historical time series data up to the current day gold price. In this study of predicting gold price, we consider few correlating factors like silver price, copper price, standard, and poor’s 500 value, dollar-rupee exchange rate, Dow Jones Industrial Average Value. Considering the prices of every correlating factor and gold price data where dates ranging from 2008 January to 2021 February. Few algorithms of machine learning are used to analyze the time-series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, ExtraTrees Regressor and Gradient boosting Regression. While seeing the results the Extra Tree Regressor algorithm gives the predicted value of gold prices more accurately.

Download Full-text

Predicting hospitalization following psychiatric crisis care using machine learning

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01361-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Matthijs Blankers ◽

Louk F. M. van der Post ◽

Jack J. M. Dekker

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Learning Algorithms ◽

Nearest Neighbors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Ensemble Model ◽

K Nearest Neighbors ◽

Crisis Care

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.

Download Full-text