DATA SELECTION TEST METHOD FOR BETTER PREDICTION OF BUILDING ELECTRICITY CONSUMPTION

2016 ◽  
Vol 78 (5-7) ◽  
Author(s):  
Iqbal Faridian Syah ◽  
Md Pauzi Abdullah ◽  
Husna Syadli ◽  
Mohammad Yusri Hassan ◽  
Faridah Hussin

The issue of obtaining an accurate prediction of electricity consumption has been widely discussed in previous work. Various techniques have been used, such as statistical methods, time-series analysis, and heuristic methods. Whatever technique is used, the accuracy of prediction depends on the availability of historical data as well as the proper selection of that data. Even when the data are exhaustive, they must be selected carefully so that prediction accuracy can be improved. This paper presents a test method, named the Data Selection Test (DST) method, that tests historical data in order to select the correct data set for prediction. The DST method is demonstrated and tested on practical electricity consumption data from a selected commercial building. Three different prediction methods, namely Moving Average (MA), Exponential Smoothing (ES), and Linear Regression (LR), are used to evaluate the prediction accuracy obtained with the data set recommended by the DST method.
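The three baseline forecasters named in the abstract (MA, ES, LR) can be sketched as one-step-ahead predictors. The consumption figures below are illustrative only, not the building data from the paper:

```python
import numpy as np

def moving_average_forecast(history, window=3):
    """One-step-ahead MA forecast: mean of the last `window` observations."""
    return float(np.mean(history[-window:]))

def exponential_smoothing_forecast(history, alpha=0.3):
    """Simple exponential smoothing; the final smoothed level is the forecast."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return float(level)

def linear_regression_forecast(history):
    """Fit y = a*t + b on the time index and extrapolate one step ahead."""
    t = np.arange(len(history))
    a, b = np.polyfit(t, history, 1)
    return float(a * len(history) + b)

# Hypothetical monthly consumption figures (kWh), for illustration only.
data = [210.0, 220.0, 215.0, 230.0, 240.0, 235.0, 250.0]
forecasts = {
    "MA": moving_average_forecast(data),
    "ES": exponential_smoothing_forecast(data),
    "LR": linear_regression_forecast(data),
}
```

The DST idea in the abstract is then to choose which slice of the history feeds such forecasters; the window, smoothing constant, and data here are assumptions for the sketch.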

Energies ◽  
2019 ◽  
Vol 12 (7) ◽  
pp. 1201 ◽  
Author(s):  
Moon Kim ◽  
Jaehoon Cha ◽  
Eunmi Lee ◽  
Van Pham ◽  
Sanghyuk Lee ◽  
...  

With growing urbanization, it has become necessary to manage this growth smartly. In China specifically, rapid urbanization has driven increased electrical energy consumption. A building model based on a neural network was proposed to overcome the difficulties of analytical modelling. However, large amounts of data, repetitive computation, and long training times limit this approach. A simplified model can be used instead of the full-order model if its performance is acceptable. To select effective inputs, the Mean Impact Value (MIV) method has been applied to identify meaningful data. To verify this neural network method, we used real electricity consumption data from a shopping mall in China as a case study. In this paper, a Bayesian Regularization Neural Network (BRNN) is utilized to avoid overfitting on the small amount of data. With the simplified data set, the building model showed reasonable performance: the mean Root Mean Square Error achieved is around 10% of actual consumption, and the standard deviation is low, which reflects the model’s reliability. We also compare the results with our previous approach using the Levenberg–Marquardt back-propagation (LM-BP) method. The main difference is the output reliability of the two methods: LM-BP shows higher error than BRNN due to overfitting, while BRNN gives reliable predictions when the simplified neural network model is applied.
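The MIV screening step described above can be sketched as follows. A known linear map stands in for the trained BRNN, and all data are synthetic; only the perturb-and-compare mechanic is the point:

```python
import numpy as np

def mean_impact_value(model_predict, X, delta=0.1):
    """MIV: perturb each feature by +/- delta (relative) and average the
    resulting change in the model output; large |MIV| = influential input."""
    mivs = []
    for j in range(X.shape[1]):
        X_up, X_dn = X.copy(), X.copy()
        X_up[:, j] *= (1 + delta)
        X_dn[:, j] *= (1 - delta)
        mivs.append(float(np.mean(model_predict(X_up) - model_predict(X_dn))))
    return mivs

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(200, 3))
# Stand-in "trained model": a known linear map (feature 1 matters most).
predict = lambda Z: 0.5 * Z[:, 0] + 5.0 * Z[:, 1] + 0.1 * Z[:, 2]
mivs = mean_impact_value(predict, X)
ranking = np.argsort(np.abs(mivs))[::-1]  # most influential feature first
```

Inputs with small |MIV| would then be dropped to form the simplified data set the abstract refers to.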


Author(s):  
Nada Mohammed Ahmed Alamin

The purpose of this research is to forecast monthly electricity consumption in Gezira State, Sudan, for the period June 2018 to December 2020, by applying models to historical electric power consumption data (January 2006 to May 2018) obtained from the National Control Center. The research methodology applies the seasonal Autoregressive Integrated Moving Average (SARIMA) model because of the seasonal behavior in the data. A good forecast was obtained from SARIMA(2, 1, 7)(0, 1, 1), whose quality was examined using the Theil coefficient. The study recommends the seasonal Autoregressive Integrated Moving Average model for data with seasonal behavior because of its simple application and the accuracy of the results obtained.
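The core transformation behind any SARIMA(p, d, q)(P, D, Q)s fit is regular and seasonal differencing. A minimal sketch on a synthetic monthly series (not the Gezira data) shows how a trend plus yearly seasonality is removed by one seasonal and one regular difference:

```python
import numpy as np

def seasonal_difference(y, d=1, D=1, s=12):
    """Apply D seasonal (lag-s) differences then d regular differences,
    the pre-whitening step of a SARIMA(p,d,q)(P,D,Q)_s model."""
    y = np.asarray(y, dtype=float)
    for _ in range(D):
        y = y[s:] - y[:-s]
    for _ in range(d):
        y = y[1:] - y[:-1]
    return y

# Hypothetical monthly series: linear trend plus yearly (period-12) seasonality.
t = np.arange(48)
y = 100 + 2.0 * t + 10.0 * np.sin(2 * np.pi * t / 12)
z = seasonal_difference(y, d=1, D=1, s=12)  # trend and seasonality both removed
```

After differencing, the ARMA part of the model is fitted to `z`; that estimation step is omitted here.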


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks are considered: statistical modelling and computational intelligence. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models estimated by Maximum Likelihood (ML). As a competitive tool to statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are implemented on the large data set. A comparative analysis of the selected learning methods is performed and evaluated. From the experiments we find that the optimal population size is likely 20, giving the lowest training time of all NNs trained by the evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.
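An elitist genetic algorithm of the kind used to train the NN can be sketched on a toy weight-fitting task. The population size of 20 matches the abstract's finding; the task, selection scheme, and mutation schedule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy task: recover the weights of y = 2*x1 - 3*x2 + 1 with a linear "perceptron"
# whose parameters are evolved rather than back-propagated.
X = rng.uniform(-1, 1, size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

def fitness(w):
    pred = X @ w[:2] + w[2]
    return -float(np.mean((pred - y) ** 2))  # negative MSE: higher is better

pop_size, n_gen, sigma = 20, 200, 0.3
pop = rng.normal(0, 1, size=(pop_size, 3))
for _ in range(n_gen):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
    children = parents + rng.normal(0, sigma, size=parents.shape)  # Gaussian mutation
    pop = np.vstack([parents, children])  # elitism: parents survive unchanged
    sigma *= 0.98  # anneal the mutation step
best = max(pop, key=fitness)
```

A micro-GA would use a much smaller population with periodic restarts; that variant is not shown.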


2020 ◽  
Vol 15 (6) ◽  
pp. 517-527
Author(s):  
Yunyun Liang ◽  
Shengli Zhang

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism and are very important for understanding the mechanisms of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using the PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. This model, named SOMAPKLNMF-OS, is constructed on the ZD98, ZW225 and CL317 benchmark datasets. The support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate the accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.
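The Kullback-Leibler NMF component of the pipeline can be sketched with the classic Lee-Seung multiplicative updates. The matrix below is synthetic, not a PSSM descriptor matrix, and the number of iterations and the rank are assumptions:

```python
import numpy as np

def nmf_kl(V, rank, n_iter=500, seed=0):
    """NMF minimizing KL divergence via Lee-Seung multiplicative updates: V ~ W @ H,
    with W and H kept elementwise nonnegative throughout."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, rank))
    H = rng.uniform(0.1, 1.0, size=(rank, m))
    eps = 1e-12
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H

# Hypothetical nonnegative feature matrix (rows = proteins, cols = descriptors),
# built to have exact rank 4 so the factorization can fit it closely.
rng = np.random.default_rng(1)
V = rng.uniform(0, 1, size=(20, 4)) @ rng.uniform(0, 1, size=(4, 15))
W, H = nmf_kl(V, rank=4)
recon_err = float(np.mean(np.abs(V - W @ H)))
```

In the pipeline above, the reduced representation `W` would then feed the SVM classifier.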


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case obtained when λ = 0. In this study, we present the methodology and demonstrate, using two wheat data sets with phenotypes collected in ten different environments, that the SSI can achieve significant gains (between 5% and 10%) in prediction accuracy relative to G-BLUP.
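G-BLUP, which the abstract identifies as the λ = 0 special case of the SSI, can be sketched as kernel ridge regression on the genomic relationship matrix. Genotypes and phenotypes below are simulated, and `lam` here is G-BLUP's own shrinkage parameter, not the SSI sparsity parameter:

```python
import numpy as np

def gblup_predict(M_train, y_train, M_test, lam=1.0):
    """G-BLUP as kernel ridge regression on the genomic relationship matrix
    G = M M' / p built from SNP genotype matrices."""
    p = M_train.shape[1]
    G = M_train @ M_train.T / p          # training-vs-training relationships
    G_ts = M_test @ M_train.T / p        # test-vs-training relationships
    alpha = np.linalg.solve(G + lam * np.eye(len(y_train)), y_train - y_train.mean())
    return y_train.mean() + G_ts @ alpha

rng = np.random.default_rng(3)
M = rng.integers(0, 3, size=(120, 50)).astype(float)  # SNP genotypes coded 0/1/2
beta = rng.normal(0, 1, size=50)                      # simulated SNP effects
g = (M - M.mean(axis=0)) @ beta                       # true genetic values
y = g + rng.normal(0, 0.5, size=120)                  # phenotypes with noise
y_hat = gblup_predict(M[:100], y[:100], M[100:])
corr = float(np.corrcoef(y_hat, g[100:])[0, 1])       # prediction accuracy
```

The SSI would replace the dense weights implied by this solve with a sparse, per-individual set of support points; that machinery is not shown.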


2021 ◽  
pp. 135481662110088
Author(s):  
Sefa Awaworyi Churchill ◽  
John Inekwe ◽  
Kris Ivanovski

Using a historical data set and recent advances in non-parametric time series modelling, we investigate the nexus between tourism flows and house prices in Germany over nearly 150 years. We use time-varying non-parametric techniques given that historical data tend to exhibit abrupt changes and other forms of non-linearities. Our findings show evidence of a time-varying effect of tourism flows on house prices, although with mixed effects. The pre-World War II time-varying estimates of tourism show both positive and negative effects on house prices. While changes in tourism flows contribute to increasing housing prices over the post-1950 period, this is short-lived, and the effect declines until the mid-1990s. However, we find a positive and significant relationship after 2000, where the impact of tourism on house prices becomes more pronounced in recent years.


2013 ◽  
Vol 2013 ◽  
pp. 1-13 ◽  
Author(s):  
Helena Mouriño ◽  
Maria Isabel Barão

Missing-data problems are extremely common in practice. To achieve reliable inferential results, we need to take this feature of the data into account. Suppose that the univariate data set under analysis has missing observations. This paper examines the impact of selecting an auxiliary complete data set, whose underlying stochastic process is to some extent interdependent with the former, to improve the efficiency of the estimators for the relevant parameters of the model. The Vector AutoRegressive (VAR) Model has proven to be an extremely useful tool for capturing the dynamics of bivariate time series. We propose maximum likelihood estimators for the parameters of the VAR(1) Model based on a monotone missing-data pattern. The estimators’ precision is also derived. Afterwards, we compare the bivariate modelling scheme with its univariate counterpart. More precisely, the univariate data set with missing observations is modelled by an AutoRegressive Moving Average (ARMA(2,1)) Model. We also analyse the behaviour of the AutoRegressive Model of order one, AR(1), due to its practical importance. We focus on the mean value of the main stochastic process. Through simulation studies, we conclude that the estimator based on the VAR(1) Model is preferable to those derived from the univariate context.
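A minimal sketch of the complete-data case: simulating a bivariate VAR(1) and estimating its coefficient matrix by least squares (equivalent to conditional ML under Gaussian errors). The paper's monotone missing-data machinery is omitted, and the coefficient matrix is an assumption chosen to be stable:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a bivariate VAR(1): x_t = A x_{t-1} + e_t, e_t ~ N(0, I).
A = np.array([[0.5, 0.2],
              [0.1, 0.6]])   # eigenvalues 0.7 and 0.4, so the process is stable
T = 2000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.normal(0, 1, size=2)

# Least-squares estimate of A: regress x_t on x_{t-1}.
X_lag, X_now = x[:-1], x[1:]
A_hat = np.linalg.lstsq(X_lag, X_now, rcond=None)[0].T
```

With missing observations, the likelihood no longer factorizes this simply, which is where the paper's monotone-pattern ML estimators come in.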


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Sen Zhang ◽  
Qiang Fu ◽  
Wendong Xiao

Accurate click-through rate (CTR) prediction can not only improve an advertising company’s reputation and revenue, but also help advertisers optimize advertising performance. There are two main unsolved problems in CTR prediction: low prediction accuracy due to the imbalanced distribution of the advertising data, and the lack of a real-time advertisement bidding implementation. In this paper, we develop a novel online CTR prediction approach incorporating real-time bidding (RTB) advertising through the following strategies: a user profile system is constructed from the historical RTB advertising data to describe the user features, historical CTR features, ID features, and other numerical features; and a novel CTR prediction approach is presented that addresses the imbalanced learning sample distribution by integrating the Weighted Extreme Learning Machine (WELM) with the AdaBoost algorithm. Compared to commonly used algorithms, the proposed approach improves the CTR significantly.
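The Weighted-ELM component can be sketched as a random hidden layer followed by a class-weighted ridge solve; minority samples get larger weights so they are not drowned out. The AdaBoost wrapper is omitted, and the imbalanced data are synthetic:

```python
import numpy as np

def welm_fit_predict(X_tr, y_tr, X_ts, n_hidden=50, C=10.0, seed=0):
    """Weighted ELM: fixed random hidden layer, then a weighted ridge solve
    for the output weights. Each sample is weighted by 1/class_count."""
    rng = np.random.default_rng(seed)
    Wh = rng.normal(0, 1, size=(X_tr.shape[1], n_hidden))
    b = rng.normal(0, 1, size=n_hidden)
    H_tr = np.tanh(X_tr @ Wh + b)
    H_ts = np.tanh(X_ts @ Wh + b)
    counts = {c: int(np.sum(y_tr == c)) for c in (0, 1)}
    w = np.array([1.0 / counts[c] for c in y_tr])  # minority class upweighted
    t = np.where(y_tr == 1, 1.0, -1.0)
    A = H_tr.T @ (w[:, None] * H_tr) + np.eye(n_hidden) / C
    beta = np.linalg.solve(A, H_tr.T @ (w * t))
    return (H_ts @ beta > 0).astype(int)

# Hypothetical imbalanced training set: 190 non-clicks vs 10 clicks.
rng = np.random.default_rng(4)
X_tr = np.vstack([rng.normal(0, 1, size=(190, 2)), rng.normal(3, 1, size=(10, 2))])
y_tr = np.array([0] * 190 + [1] * 10)
X_ts = np.vstack([rng.normal(0, 1, size=(25, 2)), rng.normal(3, 1, size=(25, 2))])
y_ts = np.array([0] * 25 + [1] * 25)
pred = welm_fit_predict(X_tr, y_tr, X_ts)
accuracy = float(np.mean(pred == y_ts))
```

AdaBoost would reweight the training samples across several such WELM learners and combine their votes; a single learner is shown here.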


2020 ◽  
pp. 1-22
Author(s):  
Luis E. Nieto-Barajas ◽  
Rodrigo S. Targino

ABSTRACT We propose a stochastic model for claims reserving that captures dependence along development years within a single triangle. This dependence is based on a gamma process with a moving average form of order $p \ge 0$, which is achieved through the use of Poisson latent variables. We carry out Bayesian inference on the model parameters and borrow strength across several triangles, coming from different lines of business or companies, through the use of hierarchical priors. We carry out a simulation study as well as a real data analysis. The results show that reserve estimates for the real data set studied are more accurate with our gamma dependence model than with the benchmark over-dispersed Poisson model, which assumes independence.
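The over-dispersed Poisson benchmark mentioned at the end is known to reproduce the reserve estimates of the classical chain-ladder method, which can be sketched on a made-up run-off triangle (the data below are invented for illustration):

```python
import numpy as np

# Run-off triangle of cumulative claims (rows = accident years,
# cols = development years); NaN marks the unobserved lower triangle.
tri = np.array([
    [100., 180., 220., 240.],
    [110., 200., 245., np.nan],
    [120., 215., np.nan, np.nan],
    [130., np.nan, np.nan, np.nan],
])
n = tri.shape[0]

# Chain-ladder development factors: f_j = sum(col j+1) / sum(col j),
# taken over the rows where both columns are observed.
factors = []
for j in range(n - 1):
    rows = ~np.isnan(tri[:, j + 1])
    factors.append(float(tri[rows, j + 1].sum() / tri[rows, j].sum()))

# Complete the lower triangle by cascading the factors rightwards.
full = tri.copy()
for i in range(n):
    for j in range(n - 1):
        if np.isnan(full[i, j + 1]):
            full[i, j + 1] = full[i, j] * factors[j]

# Reserve = projected ultimates minus the latest observed diagonal.
latest = sum(tri[i, n - 1 - i] for i in range(n))
reserve = float(full[:, -1].sum() - latest)
```

The paper's gamma dependence model replaces the independence assumption behind this benchmark; that model is not sketched here.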


2004 ◽  
Vol 35 (2) ◽  
pp. 165-174 ◽  
Author(s):  
Hafzullah Aksoy ◽  
Tanju Akar ◽  
N. Erdem Ünal

Wavelets, functions with zero mean and finite variance, have recently been found to be appropriate tools in investigating geophysical, hydrological, meteorological, and environmental processes. In this study, a wavelet-based modeling technique is presented for suspended sediment discharge time series. The model generates synthetic series statistically similar to the observed data. In the model in which the Haar wavelet is used, the available data are decomposed into detail functions. By choosing randomly from among the detail functions, synthetic suspended sediment discharge series are composed. Results are compared with those obtained from a moving-average process fitted to the data set.
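The decompose, shuffle, and recompose idea can be sketched with a one-level Haar transform. The series below is synthetic gamma-distributed data, and the paper's multi-level scheme is reduced to a single level for brevity:

```python
import numpy as np

def haar_decompose(x):
    """One-level Haar transform: pairwise approximations and details."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / 2.0
    detail = (x[0::2] - x[1::2]) / 2.0
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert the one-level Haar transform."""
    x = np.empty(2 * len(approx))
    x[0::2] = approx + detail
    x[1::2] = approx - detail
    return x

rng = np.random.default_rng(5)
series = rng.gamma(2.0, 50.0, size=64)   # hypothetical sediment discharge values
a, d = haar_decompose(series)
# Synthetic series: keep the approximations, randomly resample the details.
synthetic = haar_reconstruct(a, rng.permutation(d))
```

Because the approximations are kept, the synthetic series preserves the mean of the observed data exactly; higher-order statistics are matched only approximately.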

