Big Data Analytics for Short and Medium-Term Electricity Load Forecasting Using an AI Techniques Ensembler

Energies ◽  
2020 ◽  
Vol 13 (19) ◽  
pp. 5193
Author(s):  
Nasir Ayub ◽  
Muhammad Irfan ◽  
Muhammad Awais ◽  
Usman Ali ◽  
Tariq Ali ◽  
...  

Electrical load forecasting provides knowledge about the future consumption and generation of electricity. Energy generation and consumption fluctuate considerably relative to each other: sometimes consumer demand exceeds the energy already generated, and vice versa. Electricity load forecasting provides a framework for monitoring future energy generation and consumption and for balancing the two. In this paper, we propose a framework in which deep learning and supervised machine learning techniques are implemented for electricity load forecasting. A three-step model is proposed, comprising feature selection, feature extraction, and classification. In the feature selection step, a hybrid of Random Forest (RF) and Extreme Gradient Boosting (XGB) is used to calculate feature importance, and the average importance from the two techniques is used to select the most relevant, high-importance features. In the feature extraction step, Recursive Feature Elimination (RFE) is used to eliminate irrelevant features. Load forecasting is performed with Support Vector Machines (SVM) and a hybrid of Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN). The meta-heuristic algorithms Grey Wolf Optimization (GWO) and Earth Worm Optimization (EWO) are applied to tune the hyper-parameters of SVM and CNN-GRU, respectively. The accuracy of our enhanced techniques CNN-GRU-EWO and SVM-GWO is 96.33% and 90.67%, respectively, and they perform 7% and 3% better than the State-Of-The-Art (SOTA). A final comparison with SOTA techniques shows that the proposed techniques achieve the lowest error rates and the highest accuracy among the techniques compared.
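The averaged-importance selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the importance scores, the feature names, and the 0.20 threshold are all hypothetical, and the scores stand in for whatever a fitted RF and XGB model would report.

```python
def normalize(scores):
    """Scale importance scores so they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("importance scores sum to zero")
    return {name: value / total for name, value in scores.items()}


def average_importance(scores_a, scores_b):
    """Average two normalized importance maps feature by feature."""
    a, b = normalize(scores_a), normalize(scores_b)
    if a.keys() != b.keys():
        raise ValueError("both models must score the same features")
    return {name: (a[name] + b[name]) / 2 for name in a}


def select_features(avg_scores, threshold):
    """Keep features whose averaged importance reaches the threshold."""
    kept = [name for name, score in avg_scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: avg_scores[name], reverse=True)


# Hypothetical importance scores, standing in for RF and XGB outputs.
rf_scores = {"temperature": 0.50, "hour": 0.30, "humidity": 0.15, "wind": 0.05}
xgb_scores = {"temperature": 0.40, "hour": 0.40, "humidity": 0.10, "wind": 0.10}

avg = average_importance(rf_scores, xgb_scores)
selected = select_features(avg, threshold=0.20)
```

Normalizing before averaging keeps either model from dominating simply because its scores are on a larger scale; the abstract does not say whether the authors normalize, so that step is an assumption.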

Energies ◽  
2019 ◽  
Vol 12 (6) ◽  
pp. 1140 ◽  
Author(s):  
Xin Gao ◽  
Xiaobing Li ◽  
Bing Zhao ◽  
Weijia Ji ◽  
Xiao Jing ◽  
...  

Many factors affect short-term electric load, and the superposition of these factors makes the load non-linear and non-stationary. Separating different load components from the original load series can help to improve the accuracy of prediction, but directly modeling and predicting the decomposed time series components gives rise to multiple random errors and increases the workload of prediction. This paper proposes a short-term electricity load forecasting model based on an empirical mode decomposition-gated recurrent unit (EMD-GRU) with feature selection (FS-EMD-GRU). First, the original load series is decomposed into several sub-series by EMD. Then, we analyze the correlation between the sub-series and the original load series using the Pearson correlation coefficient. Sub-series that are highly correlated with the original load series are selected as features and input into the GRU network together with the original load series to establish the prediction model. Three public data sets provided by a U.S. public utility and load data from a region in northwestern China were used to evaluate the effectiveness of the proposed method. The experimental results showed that the average prediction accuracy of the proposed method on the four data sets was 96.9%, 95.31%, 95.72%, and 97.17%, respectively. The prediction accuracy of the proposed method was higher than that of the single GRU, support vector regression (SVR), and random forest (RF) models and of the EMD-GRU, EMD-SVR, and EMD-RF models.
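The correlation-based selection step might look like the sketch below. It assumes the sub-series are already available; the three toy components are hand-built and are not the output of EMD, and the 0.3 threshold is a hypothetical choice, since the abstract does not state one.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def select_subseries(original, subseries, threshold):
    """Return indices of sub-series whose |r| with the original reaches threshold."""
    return [i for i, s in enumerate(subseries)
            if abs(pearson(original, s)) >= threshold]


# Toy components (hypothetical, not produced by EMD):
# a daily cycle, a slow trend, and a small fast alternating term.
n = 48
slow = [math.sin(2 * math.pi * t / 24) for t in range(n)]
trend = [0.05 * t for t in range(n)]
fast = [0.1 * (-1) ** t for t in range(n)]
load = [s + tr + f for s, tr, f in zip(slow, trend, fast)]

chosen = select_subseries(load, [slow, trend, fast], threshold=0.3)
```

In this toy case the daily cycle and the trend pass the threshold, while the small alternating component does not. The selected sub-series would then be fed to the forecasting network alongside the original series.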


Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to malware detection, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features, such as packet size, arrival time, source and destination addresses, and similar metadata, to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
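For readers unfamiliar with the area under the curve metric reported above, the following sketch computes it directly from its pairwise definition: the fraction of (malicious, benign) pairs in which the malicious sample receives the higher score. The labels and scores are hypothetical and unrelated to the paper's dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Here 8 of the 9 pairs are ordered correctly, so the AUC is 8/9. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable on large test sets.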


Author(s):  
Manuel Martín-Merino Acera

Electricity load forecasting has become increasingly important due to its strong impact on the operational efficiency of the power system. However, accurate load prediction remains a challenging task due to several issues, such as the nonlinear character of the time series and the seasonal patterns it exhibits. A large variety of techniques have been proposed for this purpose, such as statistical models, fuzzy systems and artificial neural networks. Support Vector Machines (SVM) have been widely applied to electricity load forecasting with remarkable results. In this chapter, the authors study the performance of the classical SVM on the problem of electricity load forecasting. Next, an algorithm is developed that takes advantage of the local character of the time series. The proposed method first splits the time series into homogeneous regions using Self Organizing Maps (SOM) and then trains an SVM locally in each region. The methods presented have been applied to the prediction of the maximum daily electricity demand. The properties of the time series are analyzed in depth. All the models are compared rigorously through several objective functions. The experimental results show that the proposed local model outperforms several statistical and machine learning forecasting techniques.


2016 ◽  
pp. 1161-1183 ◽  
Author(s):  
Tuncay Ozcan ◽  
Tarik Küçükdeniz ◽  
Funda Hatice Sezgin

Electricity load forecasting is crucial for electricity generation companies, distributors and other electricity market participants. In this study, several forecasting techniques are applied to time series modeling and forecasting of hourly loads. Seasonal grey model, support vector regression, random forests, seasonal ARIMA and linear regression are benchmarked on seven data sets. A rolling forecasting model is developed, and the 24 hours of the next day are predicted for each of the last 14 days of each data set. This day-ahead forecasting model is especially important in day-ahead market activities and plant scheduling operations. Experimental results indicate that support vector regression and the seasonal grey model outperform the other approaches in terms of forecast accuracy for day-ahead load forecasting.
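The rolling day-ahead evaluation described above can be sketched as follows. This is an illustration under stated assumptions: the seasonal-naive forecaster is a placeholder rather than one of the benchmarked models, the load series is synthetic, and MAPE is used as the error measure although the abstract does not name the metric.

```python
def seasonal_naive(history, horizon=24, season=24):
    """Forecast the next `horizon` hours by repeating the last `season` hours."""
    last = history[-season:]
    return [last[h % season] for h in range(horizon)]


def rolling_day_ahead(series, forecaster, test_days=14, hours=24):
    """Forecast each of the last `test_days` days from all data before it.

    Returns a list of (forecast, actual) pairs, one per test day.
    """
    start = len(series) - test_days * hours
    if start < hours:
        raise ValueError("series too short for the requested test window")
    results = []
    for day in range(test_days):
        cut = start + day * hours  # forecast origin moves forward one day
        forecast = forecaster(series[:cut], hours)
        actual = series[cut:cut + hours]
        results.append((forecast, actual))
    return results


def mape(pairs):
    """Mean absolute percentage error over all forecast/actual pairs."""
    errors = [abs(f - a) / abs(a)
              for forecast, actual in pairs
              for f, a in zip(forecast, actual)]
    return 100 * sum(errors) / len(errors)


# Hypothetical hourly load: a repeating daily profile over 30 days.
profile = [100 + 10 * (h % 12) for h in range(24)]
series = profile * 30

pairs = rolling_day_ahead(series, seasonal_naive)
error = mape(pairs)
```

Any function with the same `(history, horizon)` signature could be passed in place of `seasonal_naive`, which is what makes a rolling scheme convenient for benchmarking several models on the same test days.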


2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data of high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) were evaluated in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGBtree and XGBdart) and linear (XGBlin) classifiers. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R² = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R² = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Xiuzhi Sang ◽  
Wanyue Xiao ◽  
Huiwen Zheng ◽  
Yang Yang ◽  
Taigang Liu

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science because of the crucial role these proteins play in all aspects of biological activity. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been shown to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. Experimental results on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.


2020 ◽  
Author(s):  
Patrick Schratz ◽  
Jannes Muenchow ◽  
Eugenia Iturritxa ◽  
José Cortés ◽  
Bernd Bischl ◽  
...  

This study analyzed highly-correlated, feature-rich datasets from hyperspectral remote sensing data using multiple machine and statistical-learning methods. The effect of filter-based feature-selection methods on predictive performance was compared. Also, the effect of multiple expert-based and data-driven feature sets, derived from the reflectance data, was investigated. Defoliation of trees (%) was modeled as a function of reflectance, and variable importance was assessed using permutation-based feature importance. Overall support vector machine (SVM) outperformed others such as random forest (RF), extreme gradient boosting (XGBoost), lasso (L1) and ridge (L2) regression by at least three percentage points. The combination of certain feature sets showed small increases in predictive performance while no substantial differences between individual feature sets were observed. For some combinations of learners and feature sets, filter methods achieved better predictive performances than the unfiltered feature sets, while ensemble filters did not have a substantial impact on performance. Permutation-based feature importance estimated features around the red edge to be most important for the models. However, the presence of features in the near-infrared region (800 nm - 1000 nm) was essential to achieve the best performances. More training data and replication in similar benchmarking studies is needed for more generalizable conclusions. Filter methods have the potential to be helpful in high-dimensional situations and are able to improve the interpretation of feature effects in fitted models, which is an essential constraint in environmental modeling studies.
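Permutation-based feature importance, as mentioned above, measures how much a model's error grows when one feature's values are shuffled. The sketch below is a generic illustration rather than the study's pipeline: the data are synthetic, the "model" is a fixed function standing in for a fitted learner, and mean squared error is an assumed choice of loss.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    n_features = len(rows[0])
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical data: the target depends on feature 0 only.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]


def model(row):
    """Stand-in for a fitted learner; ignores feature 1 by construction."""
    return 3.0 * row[0]


scores = permutation_importance(model, rows, targets)
```

Shuffling feature 0 raises the error, while shuffling feature 1 leaves it unchanged because the stand-in model never reads that feature. With highly correlated features, as in hyperspectral data, such scores should be read with care, since a correlated neighbor can compensate for a shuffled band.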

