Analysing “Long Data” on Collective Violence in Indonesia

2015 ◽  
Vol 43 (5) ◽  
pp. 613-633
Author(s):  
David A. Meyer ◽  
Arthur Stein

“Long data”, i.e., temporal data disaggregated to short time intervals to form a long time series, is a particularly interesting type of “big data”. Financial data are often available in this form (e.g., many years of daily stock prices), but until recently long data for other social, and even other economic, processes have been rare. Over the last decade, however, long data have begun to be extracted from (digitized) text, and then used to assess or formulate micro-level and macro-level theories. The UN Support Facility for Indonesian Recovery (UNSFIR) collected a long data set of incidents of collective violence in 14 Indonesian provinces during the 14-year period 1990–2003. In this paper we exploit the “length” of the UNSFIR data by applying several time series analysis methods. These reveal some previously unobserved features of collective violence in Indonesia—including periodic components and long time correlations—with important social/political interpretations and consequences for explanatory model building.
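
A minimal sketch of the kind of analysis such long data permit: a periodogram to surface periodic components and a sample autocorrelation to probe long time correlations. The weekly incident-count series below is synthetic and purely illustrative, not the UNSFIR data.

```python
# Sketch: probing periodic components and long time correlations in a long
# event-count series; the weekly series below is synthetic, not UNSFIR data.
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)
weeks = 14 * 52
counts = rng.poisson(5 + 2 * np.sin(2 * np.pi * np.arange(weeks) / 52))  # toy annual cycle

# Periodogram: spectral peaks indicate periodic components.
freqs, power = periodogram(counts - counts.mean(), fs=1.0)  # fs = 1 sample per week
dominant_period = 1.0 / freqs[np.argmax(power[1:]) + 1]
print(f"dominant period ~ {dominant_period:.1f} weeks")

# Sample autocorrelation: slow decay suggests long time correlations.
def autocorr(x, max_lag):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

acf = autocorr(counts, 104)
print("ACF at lags 1, 26, 52:", acf[0], acf[25], acf[51])
```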

2021 ◽  
Vol 18 (32) ◽  
Author(s):  
Stanko Stanić ◽  
Bojan Baškot

A panel regression model may seem like an appealing solution under conditions of limited time series. It is often used as a shortcut to obtain a deeper data set by stacking several individual cases along the same time dimension, where the cross-sectional units visually, but not genuinely, multiply the time frame. Macroeconometrics of the Western Balkan region faces a short time series problem, and the structural breaks are numerous. Panel regression may seem like a solution, but there are limitations that should be considered.
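
As a rough illustration of the pooling the abstract questions, the sketch below fits a fixed-effects panel regression to a few short, synthetic country series with statsmodels; the country codes and the variables (investment, gdp_growth) are illustrative placeholders, not data from the paper.

```python
# Sketch: pooling short country series into a panel and fitting a
# fixed-effects regression (hypothetical variables gdp_growth, investment).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
countries = ["ALB", "BIH", "MKD", "MNE", "SRB"]
years = range(2005, 2020)  # only 15 observations per country
rows = []
for c in countries:
    effect = rng.normal(0, 1)  # country-specific intercept
    for y in years:
        inv = rng.normal(20, 5)
        rows.append({"country": c, "year": y, "investment": inv,
                     "gdp_growth": 0.2 * inv + effect + rng.normal(0, 1)})
panel = pd.DataFrame(rows)

# Country dummies absorb the fixed effects; pooling deepens the sample, but the
# time dimension of each unit stays short, so dynamics remain hard to identify.
fe = smf.ols("gdp_growth ~ investment + C(country)", data=panel).fit()
print(fe.params["investment"])
```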


2020 ◽  
Author(s):  
Mieke Kuschnerus ◽  
Roderik Lindenbergh ◽  
Sander Vos

Abstract. Sandy coasts are constantly changing environments governed by complex interacting processes. Permanent laser scanning is a promising technique to monitor such coastal areas and support analysis of geomorphological deformation processes. This novel technique delivers 3D representations of a part of the coast at hourly temporal and centimetre spatial resolution and makes it possible to observe small-scale changes in elevation over extended periods of time. These observations have the potential to improve understanding and modelling of coastal deformation processes. However, to be of use to coastal researchers and coastal management, an efficient way to find and extract deformation processes from the large spatio-temporal data set is needed. In order to allow data mining in an automated way, we extract time series in elevation or range and use unsupervised learning algorithms to derive a partitioning of the observed area according to change patterns. We compare three well-known clustering algorithms, k-means, agglomerative clustering and DBSCAN, and identify areas that undergo similar evolution during one month. We test whether they fulfil our criteria for a suitable clustering algorithm on our exemplary data set. The three clustering methods are applied to time series of 30 epochs (during one month) extracted from a data set of daily scans covering a part of the coast at Kijkduin, the Netherlands. A small section of the beach, where a pile of sand was accumulated by a bulldozer, is used to evaluate the performance of the algorithms against a ground truth. The k-means algorithm and agglomerative clustering deliver similar clusters, and both allow identification of a fixed number of dominant deformation processes in sandy coastal areas, such as sand accumulation by a bulldozer or erosion in the intertidal area. The DBSCAN algorithm finds clusters for only about 44 % of the area and turns out to be more suitable for the detection of outliers, caused for example by temporary objects on the beach. Our study provides a methodology to efficiently mine a spatio-temporal data set for predominant deformation patterns and the associated regions where they occur.
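
A condensed sketch of the comparison described above, using scikit-learn's k-means, agglomerative clustering and DBSCAN on synthetic 30-epoch elevation time series; the simulated accumulation and erosion patterns stand in for the Kijkduin data.

```python
# Sketch: partitioning per-point elevation time series (30 epochs) with three
# clustering algorithms; the synthetic change patterns are illustrative only.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

rng = np.random.default_rng(2)
n_points, n_epochs = 500, 30
series = rng.normal(0, 0.01, (n_points, n_epochs))          # stable beach surface
series[:100] += np.linspace(0, 0.5, n_epochs)                # accumulation (e.g. bulldozer)
series[100:200] -= np.linspace(0, 0.3, n_epochs)             # intertidal erosion

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(series)
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(series)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(series)  # -1 marks outliers

print("DBSCAN outlier fraction:", np.mean(dbscan_labels == -1))
```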


Entropy ◽  
2019 ◽  
Vol 21 (4) ◽  
pp. 385 ◽  
Author(s):  
David Cuesta-Frau ◽  
Juan Pablo Murillo-Escobar ◽  
Diana Alexandra Orrego ◽  
Edilson Delgado-Trejos

Permutation Entropy (PE) is a time series complexity measure commonly used in a variety of contexts, with medicine being the prime example. In its general form, it requires three input parameters for its calculation: time series length N, embedded dimension m, and embedded delay τ. Inappropriate choices of these parameters may potentially lead to incorrect interpretations. However, there are no specific guidelines for an optimal selection of N, m, or τ, only general recommendations such as N ≫ m!, τ = 1, or m = 3, …, 7. This paper deals specifically with the study of the practical implications of N ≫ m!, since long time series are often not available, or non-stationary, and other preliminary results suggest that low N values do not necessarily invalidate PE usefulness. Our study analyses the PE variation as a function of the series length N and embedded dimension m in the context of a diverse experimental set, both synthetic (random, spikes, or logistic model time series) and real-world (climatology, seismic, financial, or biomedical time series), and the classification performance achieved with varying N and m. The results seem to indicate that shorter lengths than those suggested by N ≫ m! are sufficient for a stable PE calculation, and even very short time series can be robustly classified based on PE measurements before the stability point is reached. This may be due to the fact that there are forbidden patterns in chaotic time series, not all the patterns are equally informative, and differences among classes are already apparent at very short lengths.
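
The sketch below is a plain NumPy implementation of normalized permutation entropy, used to observe how the estimate behaves for series far shorter than N ≫ m! would require; it follows the standard ordinal-pattern definition rather than the authors' own code, and the test series are arbitrary.

```python
# Sketch: normalized permutation entropy and its behaviour as the series
# length N grows for a fixed embedding dimension m (here m = 5, so m! = 120).
import numpy as np
from math import factorial, log

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of series x (0 = fully regular, 1 = maximal)."""
    x = np.asarray(x)
    patterns = {}
    for i in range(len(x) - (m - 1) * tau):
        window = x[i:i + m * tau:tau]
        pattern = tuple(np.argsort(window))      # ordinal pattern of the window
        patterns[pattern] = patterns.get(pattern, 0) + 1
    total = sum(patterns.values())
    probs = [c / total for c in patterns.values()]
    h = -sum(p * log(p) for p in probs)
    return h / log(factorial(m))                 # normalize by log(m!)

rng = np.random.default_rng(3)
for n in (100, 500, 5000):                       # mostly well below N >> m!
    print(n, permutation_entropy(rng.normal(size=n), m=5))
```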


2020 ◽  
Vol 12 (1) ◽  
pp. 54-61
Author(s):  
Abdullah M. Almarashi ◽  
Khushnoor Khan

The current study focused on modeling time series using the Bayesian Structural Time Series (BSTS) technique on a univariate data set. Real-life secondary data on stock prices for Flying Cement covering a period of one year were used for the analysis. Statistical results were based on simulation procedures using the Kalman filter and Markov chain Monte Carlo (MCMC). Though the current study involved stock price data, the same approach can be applied to complex engineering processes involving lead times. Results from the current study were compared with the classical Autoregressive Integrated Moving Average (ARIMA) technique. The Bayesian posterior sampling distributions were worked out with the BSTS package in R. Four BSTS models were applied to a real data set to demonstrate the working of the BSTS technique. The predictive accuracy of the competing models was assessed using forecast plots and the Mean Absolute Percent Error (MAPE). An easy-to-follow approach was adopted so that both academicians and practitioners can easily replicate the mechanism. Findings from the study revealed that, for short-term forecasting, ARIMA and BSTS are equally good, but for long-term forecasting, BSTS with a local level is the most plausible option.
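
A rough Python counterpart of the comparison: the paper's BSTS models are fitted by MCMC with the BSTS package in R, so the maximum-likelihood local-level model from statsmodels below only stands in for the "BSTS with local level" case; the toy price series, the ARIMA order, and the train/test split are illustrative assumptions.

```python
# Sketch: local-level state-space model vs. ARIMA on a toy price series,
# scored with MAPE; a stand-in for the R bsts/ARIMA comparison in the paper.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.statespace.structural import UnobservedComponents

rng = np.random.default_rng(4)
prices = 20 + np.cumsum(rng.normal(0, 0.3, 260))   # hypothetical daily closes, one year
train, test = prices[:240], prices[240:]

def mape(actual, forecast):
    return 100 * np.mean(np.abs((actual - forecast) / actual))

local_level = UnobservedComponents(train, level="local level").fit(disp=False)
arima = ARIMA(train, order=(1, 1, 1)).fit()

print("local level MAPE:", mape(test, local_level.forecast(len(test))))
print("ARIMA(1,1,1) MAPE:", mape(test, arima.forecast(len(test))))
```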


Author(s):  
Richard Heuver ◽  
Ronald Heijmans

In this chapter the authors provide a method to aggregate large-value payment system transaction data for executing simulations with the Bank of Finland payment simulator. When transaction data sets become large, simulation may require too much computing time, so that only an amount of data that is insufficient from a statistical point of view can be processed. The method described in this chapter provides a solution to this statistical problem. In order to work around this problem, the authors provide a method to aggregate the transaction data set in such a way that it does not compromise the outcome of the simulation significantly. Depending on the type of simulation, only a few business days or up to a year of data is required. In the case of stress scenario analysis, in which, for example, the liquidity position of banks deteriorates, long time series are preferred, as business days can differ substantially. As an example, this chapter shows that aggregating all low-value transactions in the Dutch part of TARGET2 will not lead to a significantly different simulation outcome.
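
A small pandas sketch of the aggregation idea: low-value payments between the same pair of banks within the same settlement minute are collapsed into one transaction while large payments are kept as-is. The column names and the value threshold are assumptions for illustration, not the structure of the actual TARGET2 data.

```python
# Sketch: collapsing low-value payments per sender/receiver/settlement-minute
# so the simulation input shrinks while large transactions stay untouched.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 100_000
tx = pd.DataFrame({
    "sender": rng.choice(["A", "B", "C", "D"], n),
    "receiver": rng.choice(["A", "B", "C", "D"], n),
    "minute": rng.integers(0, 600, n),            # settlement time within the day
    "value": np.exp(rng.normal(10, 2, n)),        # skewed payment values
})

THRESHOLD = 1e6                                   # illustrative cut-off
small, large = tx[tx["value"] < THRESHOLD], tx[tx["value"] >= THRESHOLD]

# Collapse all small payments between the same banks in the same minute.
aggregated = (small.groupby(["sender", "receiver", "minute"], as_index=False)
                   ["value"].sum())
reduced = pd.concat([large, aggregated], ignore_index=True)
print(len(tx), "->", len(reduced), "transactions")
```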


2016 ◽  
Vol 14 (1) ◽  
pp. 1074-1086 ◽  
Author(s):  
Mehmet Ali Balci

Abstract. In this study, we present an epidemic model that characterizes the behavior of a financial network of globally operating stock markets. Since long time series have a global memory effect, we represent our model using fractional calculus. This model operates on a network whose vertices are the stock markets and whose edges are constructed from the correlation distances. Thereafter, we find an analytical solution to the commensurate system and use the well-known differential transform method to obtain the solution of the incommensurate system of fractional differential equations. Our findings are confirmed and complemented by the data set of the relevant stock markets between 2006 and 2016. Rather than hypothetical values, we use the Hurst exponent of each time series to approximate the fraction size and graph-theoretical concepts to obtain the variables.
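
One ingredient that translates readily into code is the construction of the market graph from correlation distances, d_ij = sqrt(2(1 − ρ_ij)) between return series; the sketch below uses synthetic returns and an arbitrary distance threshold, and leaves the fractional-order dynamics aside.

```python
# Sketch: building a market network from correlation distances between
# return series; the index names and threshold are illustrative placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
markets = ["SPX", "FTSE", "DAX", "NIKKEI", "BIST"]
common = rng.normal(0, 1, 2500)                       # shared global factor
returns = pd.DataFrame(
    {m: 0.6 * common + rng.normal(0, 1, 2500) for m in markets})

rho = returns.corr().to_numpy()
dist = np.sqrt(2.0 * (1.0 - rho))                     # correlation distance

# Keep an edge where markets are "close"; these weights feed the model's graph.
threshold = 1.0
edges = [(markets[i], markets[j], dist[i, j])
         for i in range(len(markets)) for j in range(i + 1, len(markets))
         if dist[i, j] < threshold]
print(edges)
```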


2010 ◽  
Vol 17 (6) ◽  
pp. 753-764 ◽  
Author(s):  
H. F. Astudillo ◽  
F. A. Borotto ◽  
R. Abarca-del-Rio

Abstract. We propose an alternative approach to the embedding space reconstruction method for short time series. An m-dimensional embedding space is reconstructed with a set of time delays including the relevant time scales characterizing the dynamical properties of the system. By using a maximal predictability criterion, a d-dimensional subspace is selected with its associated set of time delays, in which a local nonlinear blind forecasting prediction performs the best reconstruction of a particular event of a time series. A locally unfolded d-dimensional embedding space is then obtained. The efficiency of the methodology, which is mathematically consistent with the fundamental definitions of local nonlinear long time-scale predictability, was tested with a chaotic time series of the Lorenz system. When applied to the Southern Oscillation Index (SOI) (observational data associated with the El Niño–Southern Oscillation (ENSO) phenomenon), an optimal set of embedding parameters exists that allows the main characteristics of the El Niño 1982–1983 and 1997–1998 events to be reconstructed directly from measurements up to 3 to 4 years in advance.
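
A bare-bones sketch of delay embedding with a set of time delays plus a nearest-neighbour (local) forecast, in the spirit of the reconstruction described above; the chosen delays, the noisy sine used in place of a Lorenz or SOI series, and the single-neighbour predictor are all simplifying assumptions.

```python
# Sketch: embedding with a set of time delays and a local (nearest-neighbour)
# forecast; not the paper's maximal-predictability subspace selection.
import numpy as np

def embed(x, delays):
    """Stack delayed copies of x into embedding vectors (one row per time point)."""
    max_d = max(delays)
    return np.column_stack([x[max_d - d:len(x) - d] for d in delays])

def nn_forecast(x, delays, horizon):
    """Predict `horizon` steps beyond the end of x from its nearest neighbour."""
    max_d = max(delays)
    vectors = embed(x, delays)                    # vectors[i] describes time max_d + i
    library = vectors[:len(x) - horizon - max_d]  # states whose future is still inside x
    query = vectors[-1]                           # the most recent state
    nearest = np.argmin(np.linalg.norm(library - query, axis=1))
    return x[max_d + nearest + horizon]           # the neighbour's observed future

t = np.arange(2000)
x = np.sin(2 * np.pi * t / 50) + 0.05 * np.random.default_rng(7).normal(size=2000)
print("forecast 10 steps ahead:", nn_forecast(x, delays=[0, 5, 10, 25], horizon=10))
```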


2005 ◽  
Vol 16 (11) ◽  
pp. 1733-1743 ◽  
Author(s):  
A. CHAKRABORTI ◽  
M. S. SANTHANAM

In this paper, we review some of the properties of financial and other spatio-temporal time series generated from coupled map lattices, GARCH(1,1) processes and random processes (for which analytical results are known). We use the Hurst exponent (R/S analysis) and detrended fluctuation analysis as the tools to study the long-time correlations in the time series. We also compare the eigenvalue properties of the empirical correlation matrices, especially in relation to random matrices.
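
As a pointer to the first of the two tools mentioned, the sketch below gives a basic rescaled-range (R/S) estimate of the Hurst exponent; the window sizes and the white-noise test series are arbitrary choices, and a fuller analysis would also cross-check with detrended fluctuation analysis.

```python
# Sketch: rescaled-range (R/S) estimate of the Hurst exponent; H ~ 0.5 for
# uncorrelated noise, H > 0.5 for persistent (long-time correlated) series.
import numpy as np

def hurst_rs(x, window_sizes):
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_values = []
        for start in range(0, len(x) - n + 1, n):
            w = x[start:start + n]
            dev = np.cumsum(w - w.mean())         # cumulative deviation profile
            r = dev.max() - dev.min()             # range
            s = w.std()
            if s > 0:
                rs_values.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_values)))
    # The slope of log(R/S) against log(n) estimates H.
    return np.polyfit(log_n, log_rs, 1)[0]

rng = np.random.default_rng(8)
print("H (white noise) ~", hurst_rs(rng.normal(size=10000), [16, 32, 64, 128, 256, 512]))
```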


Author(s):  
Heesung Yoon ◽  
Yongcheol Kim ◽  
Soo-Hyoung Lee ◽  
Kyoochul Ha

In the present study, we designed time series models for predicting groundwater level fluctuations using an artificial neural network (ANN) and a support vector machine (SVM). To estimate the sensitivity of the models to the range of the data set used for model building, numerical tests were conducted using hourly measured groundwater level data from a coastal aquifer of Jeju Island in South Korea. The performance of the two models is similar and acceptable when the range of the input variables lies within that of the data set used for model building. However, when the range of the input variables lies beyond it, both models show abnormal prediction results: an oscillation for the ANN model and a constant value for the SVM model. The results of the numerical tests indicate that it is necessary to obtain various types of input and output variables and assign them to the model building process for the successful design of time series models for groundwater level prediction.
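
The sketch below mimics this extrapolation test in scikit-learn: an MLP regressor (ANN) and an SVR are trained on lagged values of a synthetic tidal-like hourly series and then fed an input far outside the training range. The lag structure and the synthetic series are assumptions, not the Jeju Island data.

```python
# Sketch: ANN (MLP) and SVM regressors on lagged groundwater-like levels,
# probing their behaviour when inputs leave the training range.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(9)
t = np.arange(4000)
level = 2 * np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=4000)  # hourly data

lags = 6
X = np.column_stack([level[i:len(level) - lags + i] for i in range(lags)])
y = level[lags:]                                   # predict the next hourly value

ann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)
svm = SVR(C=10.0, epsilon=0.01).fit(X, y)

inside = X[100:101]            # input within the training range
outside = X[100:101] + 5.0     # levels far beyond anything seen in training
print("ANN:", ann.predict(inside), ann.predict(outside))
print("SVM:", svm.predict(inside), svm.predict(outside))
```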


2021 ◽  
Author(s):  
Tiziano Tirabassi ◽  
Daniela Buske

The recording of air pollution concentration values involves the measurement of a large volume of data. Generally, automatic selectors and explicators are provided by statistics. The use of the Representative Day allows the compilation of large amounts of data into a compact format that supplies meaningful information on the whole data set. The Representative Day (RD) is a real day that best represents (in the sense of the least squares technique) the set of daily trends of the considered time series. The Least Representative Day (LRD), on the contrary, is a real day that worst represents (in the sense of the least squares technique) the set of daily trends of the same time series. The identification of the RD and LRD can prove to be a very important tool for identifying both anomalous and standard behaviors of pollutants within the selected period and for establishing measures of prevention, limitation and control. Two application examples, in two different areas, are presented, related to meteorological and SO2 and O3 concentration data sets.
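
One plausible least-squares reading of the RD/LRD definitions is sketched below: among the observed days, the RD minimizes and the LRD maximizes the total squared distance to all other daily profiles. The toy hourly concentration matrix and the injected anomalous day are purely illustrative.

```python
# Sketch: Representative Day (RD) and Least Representative Day (LRD) chosen by
# total squared distance to all other daily profiles; data are synthetic.
import numpy as np

rng = np.random.default_rng(10)
n_days, n_hours = 60, 24
typical = 30 + 10 * np.sin(2 * np.pi * np.arange(n_hours) / 24)    # typical daily profile
profiles = typical + rng.normal(0, 3, (n_days, n_hours))           # e.g. SO2, in ug/m3
profiles[17] += 40                                                  # one anomalous day

# Total squared distance of each day to every other day.
diffs = profiles[:, None, :] - profiles[None, :, :]
score = (diffs ** 2).sum(axis=(1, 2))

print("Representative Day:", int(np.argmin(score)))
print("Least Representative Day:", int(np.argmax(score)))
```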

