Evaluating the performance of Gulf of Alaska walleye pollock (Theragra chalcogramma) recruitment forecasting models using a Monte Carlo resampling strategy

2009 · Vol. 66 (3) · pp. 367–381 · Author(s): Yong-Woo Lee, Bernard A. Megrey, S. Allen Macklin

Multiple linear regressions (MLRs), generalized additive models (GAMs), and artificial neural networks (ANNs) were compared as methods to forecast recruitment of Gulf of Alaska walleye pollock (Theragra chalcogramma). Each model, based on a conceptual model, was applied to a 41-year time series of recruitment, spawner biomass, and environmental covariates. An in-sample subset of 35 of the 41 data points was used to fit an environment-dependent recruitment model, with influential covariates identified through statistical variable selection to build the best explanatory model. The remaining six data points were retained as an out-of-sample set, and each model's forecasting ability was tested against it. For a more robust evaluation of forecast accuracy, the models were also tested with Monte Carlo resampling trials. The ANNs outperformed the other techniques during model fitting, but their forecasts were not statistically different from those of the MLRs or GAMs. The results indicate that more complex models tend to be more susceptible to overparameterization. The procedures described in this study show promise for building and testing recruitment forecasting models for other fish species.
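
An illustrative sketch of the evaluation logic, not the authors' code: a multiple linear regression and a small neural network (stand-ins for the MLR and ANN) are compared on synthetic recruitment data using repeated 35/6 in-sample/out-of-sample splits. All data, covariates, and model settings below are assumptions.

```python
# Compare two recruitment forecasters with Monte Carlo resampling (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 41                                   # length of the recruitment time series
X = rng.normal(size=(n, 3))              # e.g. spawner biomass + two environmental covariates
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)  # synthetic recruitment

models = {
    "MLR": LinearRegression(),
    "ANN": MLPRegressor(hidden_layer_sizes=(5,), max_iter=5000, random_state=0),
}
errors = {name: [] for name in models}

for trial in range(200):                 # Monte Carlo resampling trials
    idx = rng.permutation(n)
    fit_idx, test_idx = idx[:35], idx[35:]          # 35 in-sample, 6 out-of-sample points
    for name, model in models.items():
        model.fit(X[fit_idx], y[fit_idx])
        errors[name].append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

for name, errs in errors.items():
    print(f"{name}: mean out-of-sample MSE = {np.mean(errs):.3f}")
```

Comparing the distributions of out-of-sample errors across trials, rather than a single split, is what makes the forecast comparison robust.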

2007 · Vol. 64 (4) · pp. 713–722 · Author(s): Lorenzo Ciannelli, Kevin M. Bailey, Kung-Sik Chan, Nils Chr. Stenseth

Over 20 years of egg sampling data were used to reconstruct the geographical and phenological patterns of walleye pollock (Theragra chalcogramma) spawning aggregations in the Gulf of Alaska (GOA). The analyzed time series (1972, 1978–1979, 1981–2000) included a documented event of climate change (1988–1989) and the rise and fall of GOA pollock abundance and harvests. We compared results from two generalized additive model (GAM) formulations: one assuming no change of egg distribution and phenology over the examined time series (stationary) and the other allowing such changes across epochs determined from the data (nonstationary). Results from both formulations corroborate the existence of a high egg concentration in Shelikof Strait, historically the primary spawning area of pollock in the GOA. However, the models also highlight other secondary, and possibly transitory, centers of egg distribution at various locations along the shelf and slope regions of the GOA. In addition, results from the nonstationary (and statistically superior) formulation indicate that the abundance of the non-Shelikof aggregations has increased over time, and that the regions of high egg density have tended to occur earlier in the season and to shift toward shallower areas.
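
A hedged sketch of the stationary-versus-nonstationary contrast using the third-party pygam package on synthetic egg-density data; the covariates, the hard split at 1989, and all values are illustrative assumptions rather than the authors' model.

```python
# Contrast a "stationary" GAM fitted to all years with an epoch-specific fit (synthetic data).
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(1)
n = 600
year = rng.integers(1972, 2001, size=n)
lon, lat, doy = rng.uniform(-158, -148, n), rng.uniform(55, 60, n), rng.uniform(60, 150, n)
shift = (year >= 1989).astype(float)           # synthetic phenology shift after 1989
logdens = (np.sin(lon / 3) + 0.05 * (lat - 57) ** 2
           - 0.01 * (doy - 100 - 15 * shift) ** 2 + rng.normal(scale=0.3, size=n))
X = np.column_stack([lon, lat, doy])

stationary = LinearGAM(s(0) + s(1) + s(2)).fit(X, logdens)
pre = LinearGAM(s(0) + s(1) + s(2)).fit(X[year < 1989], logdens[year < 1989])
post = LinearGAM(s(0) + s(1) + s(2)).fit(X[year >= 1989], logdens[year >= 1989])

sse_stat = np.sum((logdens - stationary.predict(X)) ** 2)
sse_nonstat = (np.sum((logdens[year < 1989] - pre.predict(X[year < 1989])) ** 2)
               + np.sum((logdens[year >= 1989] - post.predict(X[year >= 1989])) ** 2))
print(f"stationary SSE = {sse_stat:.1f}   epoch-specific SSE = {sse_nonstat:.1f}")
```

An in-sample comparison like this always favours the split fit; a formal comparison, as in the paper, would penalize the additional degrees of freedom (for example via GCV or AIC).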


2015 · Vol. 34 (5) · pp. 461–484 · Author(s): Ore Koren

Forecasting models of state-led mass killing are limited in their use of structural indicators, despite a large body of research that emphasizes the importance of agency and security repertoires in conditioning political violence. I seek to overcome these limitations by developing a theoretical and statistical framework that highlights the advantages of using pro-government militias (PGMs) as a predictive indicator in forecasting models of state-led mass killing. I argue that PGMs can lower the potential costs associated with mass killing for a regime faced with an internal threat, and might hence "tip the balance" in its favor. Estimating a series of statistical models and their receiver operating characteristic (ROC) curves to evaluate this hypothesis globally for the years 1981–2007, focusing on 270 internal-threat episodes, I find robust support for my expectations: including PGM indicators in state-led mass killing models significantly improves their predictive strength. Moreover, these results hold even when coefficient estimates produced by in-sample data are used to predict state-led mass killing in cross-validation and out-of-sample data for the years 2008–2013. This study hence provides an introductory demonstration of the potential advantages of including security repertoires, in addition to structural factors, in forecasting models.
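
A minimal sketch of the evaluation logic, on hypothetical data rather than the study's replication files: add a PGM indicator to a probit model of mass-killing onset and compare ROC performance with and without it. Variable names and effect sizes are assumptions.

```python
# Probit models with and without a PGM indicator, compared by ROC AUC (synthetic data).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 270                                   # internal-threat episodes
structural = rng.normal(size=(n, 2))      # e.g. standardized regime type, GDP per capita
pgm = rng.binomial(1, 0.4, size=n)        # pro-government militia present in the episode
latent = 0.6 * structural[:, 0] - 0.4 * structural[:, 1] + 0.9 * pgm + rng.normal(size=n)
onset = (latent > 1.0).astype(int)        # mass-killing onset

X_base = sm.add_constant(structural)
X_pgm = sm.add_constant(np.column_stack([structural, pgm]))

base = sm.Probit(onset, X_base).fit(disp=0)
full = sm.Probit(onset, X_pgm).fit(disp=0)

print("AUC without PGM:", round(roc_auc_score(onset, base.predict(X_base)), 3))
print("AUC with PGM:   ", round(roc_auc_score(onset, full.predict(X_pgm)), 3))
```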


Solar Energy · 2002 · Author(s): Juan-Carlos Baltazar, David E. Claridge

A study of cubic splines and Fourier series as interpolation techniques for filling in missing data in energy and meteorological time series is presented. The procedure created artificial gaps (pseudo-gaps) in measured data sets and evaluated each technique on the local behavior of the data around those pseudo-gaps. Five variants of the cubic-spline technique and 12 variants of Fourier series were tested and compared with linear interpolation for filling gaps of 1 to 6 hours in 20 samples of energy-use and weather data, each at least one year in length. The analysis showed that linear interpolation is superior to the spline and Fourier-series techniques for filling 1–6 hour gaps in dry-bulb and dew-point temperature time series. For filling 1–6 hour gaps in building cooling and heating energy use, the Fourier-series approach with 24 data points before and after each gap and six constants was found to be the most suitable. Where there are insufficient data points for this approach, simple linear interpolation is recommended.
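
An illustrative sketch of the comparison on an assumed hourly series (not the paper's data sets): a 4-hour pseudo-gap is filled by linear interpolation, a cubic spline, and a truncated Fourier series fitted to the 24 points on each side of the gap. The harmonic count used here is illustrative, not the paper's exact six-constant variant.

```python
# Fill a pseudo-gap three ways and compare RMSE against the withheld true values.
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(3)
t = np.arange(24 * 14, dtype=float)                       # two weeks of hourly data
y = 20 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=0.3, size=t.size)

gap = np.arange(100, 104)                                 # 4-hour pseudo-gap
known = np.setdiff1d(t.astype(int), gap)

y_lin = np.interp(gap, known, y[known])                   # 1) linear interpolation
y_spl = CubicSpline(known, y[known])(gap)                 # 2) cubic spline

# 3) Fourier series (constant + two daily harmonics), least-squares fitted to the
#    24 points before and the 24 points after the gap
local = np.concatenate([np.arange(gap[0] - 24, gap[0]), np.arange(gap[-1] + 1, gap[-1] + 25)])
w = 2 * np.pi / 24
A = np.column_stack([np.ones_like(local)] +
                    [f(k * w * local) for k in (1, 2) for f in (np.cos, np.sin)])
coef, *_ = np.linalg.lstsq(A.astype(float), y[local], rcond=None)
Ag = np.column_stack([np.ones_like(gap)] +
                     [f(k * w * gap) for k in (1, 2) for f in (np.cos, np.sin)])
y_fou = Ag.astype(float) @ coef

truth = y[gap]
for name, est in [("linear", y_lin), ("spline", y_spl), ("Fourier", y_fou)]:
    print(f"{name:8s} RMSE = {np.sqrt(np.mean((est - truth) ** 2)):.3f}")
```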


2016 · Vol. 9 (1) · pp. 108–136 · Author(s): Marian Alexander Dietzel

Purpose – Recent research has found significant relationships between internet search volume and real estate markets. This paper examines whether Google search volume data can serve as a leading sentiment indicator and predict turning points in the US housing market. One of the main objectives is to find a model based on internet search interest that generates reliable real-time forecasts. Design/methodology/approach – Starting from seven individual real-estate-related Google search volume indices, a multivariate probit model is derived through a selection procedure. The best model is then tested for its in- and out-of-sample forecasting ability. Findings – The model predicts the direction of monthly price changes correctly in over 89 per cent of in-sample cases and just above 88 per cent of one- to four-month out-of-sample forecasts. The out-of-sample tests show that although the Google model is not always accurate in its timing, its signals are always correct in foreseeing an upcoming turning point. Because signals are generated up to six months early, the model functions as a satisfactory and timely indicator of future house price changes. Practical implications – The results suggest that Google data can serve as an early market indicator and that applying this data set in binary forecasting models can produce useful predictions of upward and downward movements in US house prices, as measured by the Case–Shiller 20-City House Price Index. Real estate forecasters, economists and policymakers should therefore consider incorporating this free and very current data set into their market forecasts or plausibility checks for future investment decisions. Originality/value – This is the first paper to apply Google search query data as a sentiment indicator in binary forecasting models to predict turning points in the housing market.
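
A hedged sketch of the modelling idea on synthetic series (not Google Trends data and not the paper's exact specification): a probit model of the direction of monthly price changes driven by search-volume indices, with an in-sample/out-of-sample split and directional hit rates.

```python
# Binary (direction-of-change) probit forecast with in- and out-of-sample hit rates.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
T = 150                                            # months
search = rng.normal(size=(T, 3))                   # three standardized search indices
latent = 0.8 * search[:, 0] - 0.5 * search[:, 1] + rng.normal(scale=0.7, size=T)
up = (latent > 0).astype(int)                      # 1 = price index rises next month

X = sm.add_constant(search)
split = 120                                        # first 120 months for estimation
model = sm.Probit(up[:split], X[:split]).fit(disp=0)

in_hit = np.mean((model.predict(X[:split]) > 0.5) == up[:split])
out_hit = np.mean((model.predict(X[split:]) > 0.5) == up[split:])
print(f"in-sample hit rate {in_hit:.2%}, out-of-sample hit rate {out_hit:.2%}")
```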


1999 · Vol. 09 (08) · pp. 1485–1500 · Author(s): J. McNames, J. A. K. Suykens, J. Vandewalle

In this paper we describe the winning entry of the time-series prediction competition that was part of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, held at K.U. Leuven, Belgium, on July 8–10, 1998. We also describe the source of the data set, a nonlinear transform of a 5-scroll generalized Chua's circuit. Participants were given 2000 data points and were asked to predict the next 200 points in the series. The winning entry exploited symmetry discovered during exploratory data analysis and a method of local modeling designed specifically for the prediction of chaotic time series. This method includes an exponentially weighted metric, a nearest-trajectory algorithm, integrated local averaging, and a novel multi-step-ahead cross-validation estimate of model error for parameter optimization.
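
A simplified sketch of local modelling for chaotic series, not the winning entry's code: the series is embedded in delay coordinates, recent coordinates are weighted more heavily (an exponentially weighted metric), and the next value is the average of the successors of the nearest embedded neighbours. Plain nearest-neighbour averaging here stands in for the paper's nearest-trajectory algorithm; all parameters are illustrative.

```python
# One-step-ahead local prediction with an exponentially weighted metric (toy example).
import numpy as np

def local_predict(series, m=8, lam=0.9, k=10):
    """Predict the next value from the last delay vector of `series`."""
    x = np.asarray(series, dtype=float)
    # delay vectors x[t-m+1..t] and their successors x[t+1]
    emb = np.array([x[t - m + 1:t + 1] for t in range(m - 1, len(x) - 1)])
    succ = x[m:]
    weights = lam ** np.arange(m - 1, -1, -1)     # most recent coordinate gets weight 1
    query = x[-m:]
    d = np.sqrt(((emb - query) ** 2 * weights).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return succ[nearest].mean()

# toy usage on a chaotic logistic-map series; iterate for a multi-step free-run forecast
s = [0.4]
for _ in range(2000):
    s.append(3.9 * s[-1] * (1 - s[-1]))
history = list(s[:2000])
for _ in range(20):
    history.append(local_predict(history))
print("first forecast values:", np.round(history[2000:2005], 4))
```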


2008 · Vol. 12 (2) · pp. 657–667 · Author(s): M. Herbst, M. C. Casper

Abstract. Aggregating statistical performance measures discard a very large share of the information contained in model time series, relative to the information one would like to draw from them for model identification and calibration. It has been shown that this loss imposes important limitations on model identification and diagnostics and thus constitutes an element of the overall model uncertainty. In this contribution we present an approach that uses a Self-Organizing Map (SOM) to circumvent the identifiability problem induced by the low discriminatory power of aggregating performance measures. The SOM differentiates the spectrum of model realizations, obtained from Monte Carlo simulations with a distributed conceptual watershed model, by recognizing different patterns in the time series. The SOM is then used, in place of a classical optimization algorithm, to identify those model realizations among the Monte Carlo results that most closely approximate the pattern of the measured discharge time series. The results are analyzed and compared with the manually calibrated model as well as with the results of the Shuffled Complex Evolution algorithm (SCE-UA). In our study the latter slightly outperformed the SOM. The SOM method, however, yields a set of equivalent model parameterizations and therefore also allows the parameter space to be confined to a region that closely represents the measured data set. This feature renders the SOM potentially useful for future model identification applications.
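
A conceptual sketch using the third-party minisom package, which is an assumption here (the study used its own SOM implementation): a SOM is trained on Monte Carlo discharge simulations and the realizations mapping to the same node as the observed hydrograph are retained. All series and parameters below are synthetic.

```python
# Train a SOM on simulated hydrographs and select those matching the observed pattern.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(6)
t = np.arange(365)
observed = 5 + 3 * np.exp(-((t - 120) / 30.0) ** 2) + rng.normal(scale=0.1, size=t.size)

# 1000 Monte Carlo realizations with perturbed peak height, timing, width and baseflow
params = rng.uniform([3, 80, 15, 1], [8, 160, 45, 4], size=(1000, 4))
sims = np.array([p[3] + p[0] * np.exp(-((t - p[1]) / p[2]) ** 2) for p in params])

som = MiniSom(8, 8, input_len=t.size, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(sims, 5000)

obs_node = som.winner(observed)
matches = [i for i, sim in enumerate(sims) if som.winner(sim) == obs_node]
print(f"observed series maps to node {obs_node}; "
      f"{len(matches)} simulations share that node (behavioural set)")
```

Selecting by shared SOM node yields a set of near-equivalent parameterizations rather than a single optimum, which is the feature the abstract highlights.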


Energetika · 2020 · Vol. 66 (1) · Author(s): Mindaugas Česnavičius

Electricity price changes can significantly affect expenses in energy-intensive industries, alter profits or losses for electricity retailers, and complicate the implementation of a country's national energy strategy. Forecasting models based on statistical methods and past values help to predict future prices and to adjust strategy accordingly. This paper concentrates on the Lithuanian electricity market and applies widely used ARIMA forecasting models based on univariate time series analysis. The Lithuanian market is selected because of a lack of statistical research on electricity prices in Lithuania and because of significant upcoming market liberalization projects. Price data are taken from the website of the Nord Pool electricity market operator, the Northern European power exchange on which Lithuania and 14 other European countries trade electricity daily. To provide a long-term outlook, average monthly prices from July 2012 to December 2019 are analyzed. Before building the ARIMA model, the data are checked with statistical tests to confirm that the time series is stationary and free of autocorrelation and structural breaks. Once validity is confirmed, the series is divided into training and test sets: the training set is used to fit candidate ARIMA models, while the test set is used to measure forecasting accuracy. The models' forecasts are compared using common accuracy statistics, and the most accurate models are identified. Finally, the selected model is retrained on the full data set and a price forecast for 2020 is constructed. The AR(1) model had the smallest error against the test set, while the SARIMA(1,1,1) model had the best approximation statistics; combining both yields a weighted SARIMA(1,1,1) model with low forecasting error and a close fit to the actual series. The final forecast for 2020 shows the monthly average electricity price decreasing at the beginning of the year, increasing significantly in the second half, and dropping at the end of the year. These results can help companies plan their electricity production and maintenance periods so as to maximize income from sold energy and minimize losses from planned shutdowns.
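
A sketch of the workflow on synthetic monthly prices rather than Nord Pool data: an AR(1) and a seasonal ARIMA candidate are fitted to a training window, compared on a held-out year, and combined with simple weights. The seasonal order shown is an assumption, since the abstract's SARIMA(1,1,1) notation does not specify it.

```python
# Fit AR(1) and SARIMA candidates, compare out-of-sample errors, and combine forecasts.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
idx = pd.date_range("2012-07-01", "2019-12-01", freq="MS")     # monthly averages
price = 45 + 5 * np.sin(2 * np.pi * idx.month / 12) + np.cumsum(rng.normal(scale=1.0, size=len(idx)))
y = pd.Series(price, index=idx)

train, test = y[:-12], y[-12:]                                  # last year held out
ar1 = SARIMAX(train, order=(1, 0, 0), trend="c").fit(disp=0)
sarima = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=0)

f_ar1 = ar1.forecast(steps=12)
f_sar = sarima.forecast(steps=12)
for name, f in [("AR(1)", f_ar1), ("SARIMA", f_sar)]:
    print(name, "test RMSE:", round(np.sqrt(np.mean((f - test) ** 2)), 2))

# crude 50/50 weighted combination; weights could instead be derived from test errors
combo = 0.5 * f_ar1 + 0.5 * f_sar
print("combined test RMSE:", round(np.sqrt(np.mean((combo - test) ** 2)), 2))
```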


2018 · Vol. 28 (4) · pp. 475–499 · Author(s): Milica Bogicevic, Milan Merkle

We present a new fast approximate algorithm for Tukey (halfspace) depth level sets and its implementation, ABCDepth. Given a d-dimensional data set for any d ≥ 1, the algorithm is based on a representation of level sets as intersections of balls in R^d. Our approach does not require calculating projections of sample points onto directions. This novel idea enables approximate level sets to be computed in very high dimensions with complexity that is linear in d, a great advantage over all other approximate algorithms. Using different versions of this algorithm, we demonstrate approximate calculation of the deepest set of points (the "Tukey median") and of the Tukey depth of a sample or out-of-sample point, all with complexity linear in d. An additional theoretical advantage of this approach is that the data points are not assumed to be in "general position". Examples with real and synthetic data show that the execution time of all versions of the algorithm in high dimensions is much smaller than that of other implemented algorithms, and that our algorithms can handle thousands of multidimensional observations.
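
For orientation only, a brute-force projection-based approximation of Tukey depth in low dimensions. This is not the ABCDepth ball-intersection algorithm described above (which explicitly avoids projections); it is the kind of baseline such algorithms are compared against, and its cost grows quickly with dimension.

```python
# Projection-based upper bound on halfspace (Tukey) depth, for comparison purposes.
import numpy as np

def tukey_depth_approx(point, data, n_dirs=2000, seed=0):
    """Upper bound on the normalized halfspace depth of `point` w.r.t. rows of `data`."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    dirs = rng.normal(size=(n_dirs, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # for each direction, count points on either side of the hyperplane through `point`
    proj = (data - point) @ dirs.T                 # shape (n_points, n_dirs)
    counts = np.minimum((proj >= 0).sum(axis=0), (proj <= 0).sum(axis=0))
    return counts.min() / len(data)                # minimize over sampled directions

rng = np.random.default_rng(8)
cloud = rng.normal(size=(500, 3))
print("approx. depth of the origin: ", tukey_depth_approx(np.zeros(3), cloud))
print("approx. depth of an outlier: ", tukey_depth_approx(np.array([4.0, 4.0, 4.0]), cloud))
```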


2017 · Vol. 45 (5) · pp. 864–887 · Author(s): Lars Pforte, Chris Brunsdon, Conor Cahalane, Martin Charlton

This paper discusses a project to complete a database of socio-economic indicators across the European Union, from 1990 onward, at various spatial scales; the database thus consists of various time series with a spatial component. Because a substantial amount of the data was missing, a method of imputation was required to complete the database, and a Markov chain Monte Carlo (MCMC) approach was adopted. We describe the Markov chain Monte Carlo method in detail and explain how we achieved spatial coherence between different time series and their observed and estimated data points.
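
A simplified, hedged sketch of MCMC-style imputation by data augmentation under a multivariate normal model; it omits the spatial-coherence component described in the paper, re-estimates the mean and covariance from the completed data rather than sampling them, and uses synthetic indicators.

```python
# Gibbs-style data augmentation: alternately impute missing values and re-estimate moments.
import numpy as np

rng = np.random.default_rng(9)
true = rng.multivariate_normal([0, 0, 0], [[1, .6, .3], [.6, 1, .4], [.3, .4, 1]], size=200)
data = true.copy()
data[rng.random(data.shape) < 0.2] = np.nan          # 20% missing at random

filled = np.where(np.isnan(data), np.nanmean(data, axis=0), data)   # crude starting values
for sweep in range(200):
    mu, sigma = filled.mean(axis=0), np.cov(filled, rowvar=False)
    for i, row in enumerate(data):
        m = np.isnan(row)
        if not m.any():
            continue
        if m.all():                                   # fully missing record
            filled[i] = rng.multivariate_normal(mu, sigma)
            continue
        o = ~m
        # conditional normal of the missing entries given the observed entries
        s_oo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
        cond_mu = mu[m] + sigma[np.ix_(m, o)] @ s_oo_inv @ (row[o] - mu[o])
        cond_cov = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ s_oo_inv @ sigma[np.ix_(o, m)]
        filled[i, m] = rng.multivariate_normal(cond_mu, cond_cov)

rmse = np.sqrt(np.mean((filled[np.isnan(data)] - true[np.isnan(data)]) ** 2))
print("imputation RMSE on the withheld true values:", round(rmse, 3))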


2011 · Vol. 7 (S285) · pp. 291–293 · Author(s): Seo-Won Chang, Yong-Ik Byun, Dae-Won Kim

We present a new photometric reduction method for precise time-series photometry of non-crowded fields that does not require relatively complicated and CPU-intensive techniques such as point-spread-function (PSF) fitting or difference image analysis. The method, which combines multi-aperture index photometry with a spatio-temporal de-trending algorithm, gives much superior performance in data recovery and light-curve precision. In practice, the aggressive filtering often applied to remove outlying data points can discard vital data, with seriously negative impacts on short-term variations such as flares. Our method utilizes nearly 100% of the available data and reduces the rms scatter of bright-star light curves to several times below that of the archived light curves. We outline the details of the new method and apply it to sample data from the MMT survey of the M37 field and from the HAT-South survey.
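
A toy illustration of the multi-aperture index idea on synthetic fluxes, not MMT or HAT-South data and without the spatio-temporal de-trending step: each star is measured in several apertures and, per star, the aperture whose light curve has the smallest fractional rms is kept.

```python
# Choose, per star, the photometric aperture that minimizes light-curve scatter.
import numpy as np

rng = np.random.default_rng(10)
n_epochs, n_stars, n_aper = 300, 5, 4
true_flux = rng.uniform(1e3, 1e5, size=n_stars)

# aperture-dependent behaviour: small apertures lose flux, large ones add sky noise
flux = np.empty((n_stars, n_aper, n_epochs))
for a in range(n_aper):
    scale = 0.85 + 0.05 * a                      # growing enclosed flux fraction
    sky = 50.0 * (a + 1) ** 2                    # growing sky-noise contribution
    flux[:, a, :] = scale * true_flux[:, None] + rng.normal(scale=sky, size=(n_stars, n_epochs))

rel_rms = flux.std(axis=2) / flux.mean(axis=2)   # fractional rms per star and aperture
best = rel_rms.argmin(axis=1)                    # aperture index chosen for each star
for star in range(n_stars):
    print(f"star {star}: best aperture {best[star]}, rms {rel_rms[star, best[star]]:.2e}")
```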

