Evaluation Procedures for Forecasting with Spatiotemporal Data

Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 691
Author(s):  
Mariana Oliveira ◽  
Luís Torgo ◽  
Vítor Santos Costa

The increasing use of sensor networks has led to an ever larger number of available spatiotemporal datasets. Forecasting applications using this type of data are frequently motivated by important domains such as environmental monitoring. Being able to properly assess the performance of different forecasting approaches is fundamental to achieving progress. However, traditional performance estimation procedures, such as cross-validation, face challenges due to the implicit dependence between observations in spatiotemporal datasets. In this paper, we empirically compare several variants of cross-validation (CV) and out-of-sample (OOS) performance estimation procedures, using both artificially generated and real-world spatiotemporal datasets. Our results show that both CV and OOS report useful estimates, but they suggest that blocking data in space and/or in time may help mitigate CV’s tendency to underestimate error. Overall, our study shows the importance of considering data dependencies when estimating the performance of spatiotemporal forecasting models.
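To make the contrast between blocked CV and OOS estimation concrete, here is a minimal sketch of how the two kinds of split could be built over time indices. It is an illustration only: the abstract does not specify the exact procedures compared, and the function names `time_blocked_folds` and `oos_holdout` are hypothetical.

```python
from typing import List, Tuple


def time_blocked_folds(n_times: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Cut time indices 0..n_times-1 into k contiguous blocks.

    Each block serves as the test set once; all other time steps are training data.
    (Hypothetical helper, not taken from the paper.)
    """
    bounds = [round(i * n_times / k) for i in range(k + 1)]
    folds = []
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        held_out = set(test)
        train = [t for t in range(n_times) if t not in held_out]
        folds.append((train, test))
    return folds


def oos_holdout(n_times: int, test_fraction: float) -> Tuple[List[int], List[int]]:
    """Out-of-sample split: train on the earliest time steps, test on the latest."""
    cut = n_times - int(round(n_times * test_fraction))
    return list(range(cut)), list(range(cut, n_times))


folds = time_blocked_folds(n_times=10, k=5)
train, test = oos_holdout(n_times=10, test_fraction=0.2)
```

Blocking in space could be sketched the same way by grouping sensor locations instead of time steps.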


2021 ◽  
Vol 21 (4) ◽  
pp. 1-28
Author(s):  
Song Deng ◽  
Fulin Chen ◽  
Xia Dong ◽  
Guangwei Gao ◽  
Xindong Wu

Short-term load forecasting is very important to the economic dispatch and safety assessment of power systems. Although existing short-term load forecasting algorithms have reached the required forecast accuracy, most of the forecasting models are black boxes and cannot be expressed as explicit mathematical models. At the same time, abnormal load caused by failure of the load data collection device, time synchronization, and malicious tampering greatly reduces the accuracy of existing load forecasting models. To address these problems, this article proposes a Short-Term Load Forecasting algorithm using Improved Gene Expression Programming and Abnormal Load Recognition (STLF-IGEP_ALR). First, the Recognition algorithm of Abnormal Load based on Probability Distribution and Cross Validation is proposed. By analyzing the probability distribution of rows and columns in load data, and using the probability distribution of rows and columns for cross-validation, the misjudgment of normal load in abnormal load data can be better addressed. Second, by designing strategies for adaptive generation of population parameters, individual evolution of populations, and dynamic adjustment of genetic operation probability, an Improved Gene Expression Programming based on Evolutionary Parameter Optimization is proposed. Finally, the experimental results on two real load datasets and one open load dataset show that, compared with existing abnormal data detection algorithms, the algorithm proposed in this article performs better in missing detection rate, false detection rate, and precision rate, and that STLF-IGEP_ALR is superior to other short-term load forecasting algorithms in terms of convergence speed, MAE, MAPE, RMSE, and R².



Energies ◽  
2021 ◽  
Vol 14 (14) ◽  
pp. 4173
Author(s):  
Rangan Gupta ◽  
Christian Pierdzioch

We use a dataset for the group of G7 countries and China to study the out-of-sample predictive value of uncertainty and its international spillovers for the realized variance of crude oil (West Texas Intermediate and Brent) over the sample period from 1996Q1 to 2020Q4. Using the Lasso estimator, we found evidence that uncertainty and international spillovers had predictive value for the realized variance at intermediate (two quarters) and long (one year) forecasting horizons in several of the forecasting models that we studied. This result holds also for upside (good) and downside (bad) variance, and irrespective of whether we used a recursive or a rolling estimation window. Our results have important implications for investors and policymakers.
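The difference between a recursive (expanding) and a rolling estimation window can be sketched as index arithmetic. This is a minimal illustration, not the authors' code: the function name `estimation_windows` and the initial window length of 40 quarters are assumptions, while the 100 observations correspond to the quarterly sample 1996Q1 to 2020Q4 and the horizon of two quarters matches the intermediate horizon mentioned in the abstract.

```python
def estimation_windows(n_obs, first_train_size, horizon, scheme="recursive"):
    """Return (train_start, train_end, target) triples; train_end is exclusive.

    recursive: the window always starts at 0 and grows by one observation.
    rolling:   the window keeps a fixed length and slides forward.
    target is the index forecast `horizon` steps after the last training point.
    """
    windows = []
    for end in range(first_train_size, n_obs - horizon + 1):
        start = 0 if scheme == "recursive" else end - first_train_size
        target = end + horizon - 1
        windows.append((start, end, target))
    return windows


# 100 quarterly observations; the initial window of 40 is an illustrative assumption.
recursive = estimation_windows(100, first_train_size=40, horizon=2, scheme="recursive")
rolling = estimation_windows(100, first_train_size=40, horizon=2, scheme="rolling")
```

In either scheme a model (the Lasso in the abstract) would be refit on each training window and evaluated on the corresponding target observation.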



2018 ◽  
Author(s):  
Παντελής Σταυρούλιας

Valid forecasts of financial crises have always safeguarded the stability of both the financial system as a whole and the banking sector in particular. This dissertation forecasts systemic banking crises for the EU-14 countries several quarters before they become apparent, using the most widely used variables (macroeconomic, banking, and market) through two approaches, the binary and the multilevel. Following the binary approach, classification models are derived by applying Discriminant Analysis, Linear Regression, Logistic Regression, and Probit Regression for the early prediction of crises 12 to 7 quarters before their onset. In addition, the performance of the above analysis is compared using the newer and more promising methods of the Classification Tree, Random Forest, and C5. At the same time, a new measure for threshold selection and goodness of fit (GoF) of the forecasting models and a new combined classification method are proposed. To investigate the performance of the above analysis, out-of-sample testing is used with country-blocked cross-validation. Under this method, the analysis is carried out and the forecasting models are derived using thirteen of the fourteen countries in the sample (in-sample); the derived models are applied to the fourteenth country, which was excluded from the initial sample (out-of-sample); and the predictions are checked against that country's actual data. This procedure is repeated fourteen times, leaving one country out of the sample each time, and finally the average over the repetitions is computed.
In this dissertation, using out-of-sample testing, 82.4% correct classification (Accuracy), a 78.4% True Positive Rate (TPR), and an 80.6% Positive Predictive Value (PPV) are achieved. Under the multilevel approach, two levels (periods) of prediction of systemic banking crises are distinguished. The first level is called early warning and covers the period 12 to 7 quarters before the onset of the crisis, while the second level is called late warning and covers the period 6 to 1 quarters before the onset of the crisis. For this multilevel classification, Neural Networks, Multinomial Logistic Regression, and Multinomial Discriminant Analysis are used. Applying the same out-of-sample testing as in the first approach, 85.7% correct classification is achieved with the best method, which proves to be Multinomial Discriminant Analysis. Applying the above analysis, interested policy makers can detect the existence of a crisis up to three years ahead with the proposed models, using only publicly available data, and can thereby exercise the appropriate macroprudential policy in each case.
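The country-blocked procedure and the three reported rates can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `leave_one_group_out` and `classification_rates`, the country codes, and the toy labels are all hypothetical and are not taken from the dissertation.

```python
def leave_one_group_out(groups):
    """For each distinct group label, hold that group out as the test set.

    Returns a list of (held_out_label, train_indices, test_indices).
    """
    splits = []
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        splits.append((held_out, train, test))
    return splits


def classification_rates(y_true, y_pred):
    """Accuracy, True Positive Rate, and Positive Predictive Value for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


# Toy data: one country label per observation (illustrative codes only).
countries = ["AT", "AT", "BE", "BE", "DE"]
splits = leave_one_group_out(countries)
rates = classification_rates([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

With fourteen countries the loop would produce fourteen splits, and the per-country rates would be averaged at the end.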



2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Nhu-Ty Nguyen ◽  
Thanh-Tuyen Tran

Inflation is a key element of a national economy, and it is also a prominent and important issue influencing the whole economy in terms of marketing. This is a complex problem requiring a large investment of time and wisdom to attain positive results. Thus, appropriate tools for forecasting inflation variables are crucial for policy making. In this study, both clarified value calculation and the use of a genetic algorithm to find the optimal parameters are adopted simultaneously to construct improved models, namely ARIMA, GM(1,1), Verhulst, DGM(1,1), and DGM(2,1), using data on Vietnamese inflation output from January 2005 to November 2013. The MAPE, MSE, RMSE, and MAD are the four criteria with which the results of the various forecasting models are compared. Moreover, to see whether differences exist, Friedman and Wilcoxon tests are applied. Both in-sample and out-of-sample forecast performance results show that the ARIMA model forecasts Raw Materials Price (RMP) and Gold Price (GP) with high accuracy, whereas GM(1,1) and DGM(1,1) are suitable for forecasting the Consumer Price Index (CPI). Therefore, ARIMA, GM(1,1), and DGM(1,1) can handle the forecast accuracy of the issue, and they are suitable for modeling and forecasting inflation in the case of Vietnam.
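The four comparison criteria can be written out in a few lines. The sketch below uses common textbook definitions, which is an assumption: the abstract does not give formulas, and MAD is taken here to mean the mean absolute deviation of the forecast errors. The function name `forecast_errors` and the sample numbers are illustrative.

```python
import math


def forecast_errors(actual, forecast):
    """Compute MAPE (in percent), MSE, RMSE, and MAD for paired series.

    Assumes every actual value is nonzero, since MAPE divides by it.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAD": sum(abs(e) for e in errors) / n,
    }


# Illustrative values only, not data from the study.
scores = forecast_errors(actual=[100, 200, 400], forecast=[110, 190, 400])
```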



2020 ◽  
Vol 109 (11) ◽  
pp. 1997-2028 ◽  
Author(s):  
Vitor Cerqueira ◽  
Luis Torgo ◽  
Igor Mozetič


1973 ◽  
Vol 10 (2) ◽  
pp. 115-129 ◽  
Author(s):  
Gerald J. Eskin

A depth of repeat model is presented that can forecast the demand for new consumer products. The relation of the model to other forecasting models is noted. Data analysis, estimation procedures, and the observed accuracy of forecasts are discussed.



2017 ◽  
Vol 35 (15_suppl) ◽  
pp. e23059-e23059
Author(s):  
Oluf D. Røe ◽  
Vincenzo Lagani ◽  
Hans Fredrik Kvitvang ◽  
Maria Markaki ◽  
Ioannis Tsamardinos ◽  
...  

e23059 Background: The Cancer-Biomarkers in HUNT initiative seeks to identify novel biomarkers for early cancer diagnosis. For lung cancers and mesothelioma, clinically useful early markers are not available. In the prospective HUNT study in Norway, pre-diagnostic samples taken 0-20 years before diagnosis are available for research purposes. Here we present our first results on high-throughput metabolomics analysis in serum two months to 16 years before diagnosis. Methods: LC-MS untargeted (Amide-) metabolites (n = 1042) were profiled in serum samples from 48 future patients (12 each of adeno-, squamous cell carcinoma, small-cell lung cancer and mesothelioma) and from 48 controls who were cancer-free 5 years after blood sampling. All were active smokers. Metabolic features for (a) each cancer and (b) all cancers pooled together were analyzed with a moderated t-test (R limma package). Multivariate analyses included (a) OPLS-DA and (b) signature identification through a data-analysis pipeline that includes feature selection (such as the algorithm in [1]), non-linear modelers (e.g., Random Forests), and Cross-Validation with bootstrapping [2] for optimizing algorithms and providing unbiased performance estimation. The pipeline is implemented in the Just Add Data software (Gnosis Data Analysis). Results: Univariate and OPLS-DA analyses did not identify any association between metabolites and cancer. The non-linear data analysis pipeline identified a signature containing five metabolites able to discriminate between cancer and non-cancer patients statistically significantly better than random (AUC = 0.667, CI = [0.536, 0.784]). Conclusions: Our results indicate that metabolic profiling in serum may help in identifying subjects who are likely to be diagnosed with lung cancer/mesothelioma several years before diagnosis. More data will be presented at the annual meeting. Further validation studies are planned to confirm the replicability of these findings.
[1] Lagani V et al., 2016. arXiv:1611.03227
[2] Greasidou L, 2017. Bias Correction of the Cross-Validation Performance Estimate and Speed Up of its Execution Time, MSc Thesis, University of Crete
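An AUC with a confidence interval, as reported in the Results, can be illustrated with a plain percentile bootstrap. This sketch is generic and is not the bias-corrected cross-validation procedure of [2] or the Just Add Data pipeline; the function names `auc` and `bootstrap_auc_ci`, the resample count, and the toy scores are assumptions.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC.

    Resamples cases with replacement and skips resamples that contain only one
    class. Requires both classes to be present in `labels`.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels and scores, for illustration only.
y = [1, 1, 1, 0, 0, 0, 1, 0]
s = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.8, 0.6]
lower, upper = bootstrap_auc_ci(y, s, n_boot=200)
```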



2015 ◽  
Vol 34 (5) ◽  
pp. 461-484 ◽  
Author(s):  
Ore Koren

Forecasting models of state-led mass killing are limited in their use of structural indicators, despite a large body of research that emphasizes the importance of agency and security repertoires in conditioning political violence. I seek to overcome these limitations by developing a theoretical and statistical framework that highlights the advantages of using pro-government militias (PGMs) as a predictive indicator in forecasting models of state-led mass killing. I argue that PGMs can lower the potential costs associated with mass killing for a regime faced with an internal threat, and might hence “tip the balance” in its favor. In estimating a series of statistical models and their receiver–operator characteristic curves to evaluate this hypothesis globally for the years 1981–2007, focusing on 270 internal threat episodes, I find robust support for my expectations: including PGM indicators in state-led mass killing models significantly improves their predictive strength. Moreover, these results hold even when coefficient estimates produced by in-sample data are used to predict state-led mass killing in cross-validation and out-of-sample data for the years 2008–2013. This study hence provides an introductory demonstration of the potential advantages of including security repertoires, in addition to structural factors, in forecasting models.



Author(s):  
Pierre O. Jacquet ◽  
Farid Pazhoohi ◽  
Charles Findling ◽  
Hugo Mell ◽  
Coralie Chevallier ◽  
...  

Abstract: Why do moral religions exist? An influential psychological explanation is that religious belief in supernatural punishment is a cultural group adaptation enhancing prosocial attitudes and thereby large-scale cooperation. An alternative explanation is that religiosity is an individual strategy that results from high levels of mistrust and the need for individuals to control others’ behaviors through moralizing. Existing evidence is mixed, but most works are limited by sample size and generalizability issues. The present study overcomes these limitations by applying k-fold cross-validation to multivariate modeling of data from >295,000 individuals in 108 countries of the World Values Surveys and the European Value Study. First, this methodology reveals no evidence that European and non-European religious people invest more in collective actions and are more trustful of unrelated conspecifics. Instead, individuals’ level of religiosity is found to be weakly but positively associated with social mistrust and negatively associated with the production of behaviors that benefit unrelated members of the large-scale community. Second, our models show that individual variation in religiosity is well explained by the interaction of increased levels of social mistrust and increased needs to moralize other people’s sexual behaviors. Finally, stratified k-fold cross-validation demonstrates that the structures of these association patterns are robust to sampling variability and reliable enough to generalize to out-of-sample data.
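Stratified k-fold splitting can be sketched as dealing the members of each stratum round-robin across folds, so every fold keeps roughly the same class mix. This is an illustration only: the abstract does not say which variable the folds were stratified on, and the function name `stratified_kfold` is hypothetical.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Assign indices to k test folds so each stratum is spread evenly across folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for y in sorted(by_class):
        # Deal this stratum's members out one fold at a time.
        for j, i in enumerate(by_class[y]):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


# Toy strata: six observations in stratum 0 and three in stratum 1.
folds = stratified_kfold([0] * 6 + [1] * 3, k=3)
```

Each fold then serves once as the out-of-sample test set while the model is fit on the remaining folds.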




