Data science, statistics, and time series

Author(s):  
Patrick Bangert
2020 ◽  
Vol 26 (3) ◽  
Author(s):  
Rex W. Douglass ◽  
Thomas Leo Scherer ◽  
Erik Gartzke

Abstract: One of the main ways we try to understand the COVID-19 pandemic is through time-series cross-section counts of cases and deaths. Observational studies based on these kinds of data have concrete and well-known methodological issues that suggest significant caution for both consumers and producers of COVID-19 knowledge. We briefly enumerate some of these issues in the areas of measurement, inference, and interpretation.


2021 ◽  
Vol 257 ◽  
pp. 83-100
Author(s):  
Andrew Harvey

This article shows how new time series models can be used to track the progress of an epidemic, forecast key variables and evaluate the effects of policies. The univariate framework of Harvey and Kattuman (2020, Harvard Data Science Review, Special Issue 1—COVID-19, https://hdsr.mitpress.mit.edu/pub/ozgjx0yn) is extended to model the relationship between two or more series, and the role of common trends is discussed. Data on daily deaths from COVID-19 in Italy and the UK provide an example of leading indicators under balanced growth. When growth is not balanced, the model can be extended by including a non-stationary component in one of the series. The viability of this model is investigated by examining the relationship between new cases and deaths in the Florida second wave of summer 2020. The balanced growth framework is then used as the basis for policy evaluation by showing how some variables can serve as control groups for a target variable. This approach is used to investigate the consequences of Sweden’s soft lockdown coronavirus policy in the spring of 2020.


Author(s):  
Clony Junior ◽  
Pedro Gusmão ◽  
José Moreira ◽  
Ana Maria M. Tome

Data science highlights fields of study and research such as time series, which, although widely explored in the past, gain new perspectives in the context of this discipline. This chapter presents two approaches to time series forecasting: long short-term memory (LSTM), a special kind of recurrent neural network (RNN), and Prophet, an open-source library developed by Facebook for time series forecasting. Aimed at data mining and machine learning experts who build forecasting processes, LSTM uses gating mechanisms to deal with long-term dependencies, reducing the short-term memory effect inherent in the traditional RNN. Prophet, on the other hand, encapsulates statistical and computational complexity to allow broad use of time series forecasting, prioritizing the expert's business knowledge through exploration and experimentation. Both approaches were applied to a retail time series. The case study comprises daily and half-hourly forecasts, and the performance of both methods was measured using standard metrics.
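As a rough illustration of the gating idea mentioned in this abstract, the sketch below runs one scalar LSTM cell over a few inputs. It is not the chapter's model: the function names, the `params` layout, and all weight values are hypothetical, chosen so that the forget gate stays nearly open and the input gate nearly closed, which lets the cell state persist across steps.

```python
import math


def sigmoid(x):
    """Logistic function used by the three LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps a gate name to a (w_x, w_h, bias) tuple (hypothetical layout).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value for the cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hypothetical weights: forget gate close to 1, input gate close to 0.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

h, c = 0.0, 1.0  # start with a cell state worth remembering
for x in [0.5, -0.3, 0.8, 0.1]:
    h, c = lstm_step(x, h, c, params)
```

With these settings the cell state `c` remains close to its starting value regardless of the inputs, which is the long-term memory behaviour that the gates make possible. In a trained network the gate weights are learned rather than fixed by hand.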


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kenji Yamanishi ◽  
Linchuan Xu ◽  
Ryo Yuki ◽  
Shintaro Fukushima ◽  
Chuan-hao Lin

Abstract: We are concerned with the issue of detecting changes and their signs from a data stream. For example, when given time series of COVID-19 cases in a region, we may raise early warning signals of an epidemic by detecting signs of changes in the data. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for measuring change-sign scores. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets. We apply it to detecting early warning signals of the COVID-19 epidemic using time series of the cases for individual countries. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as a significant increase or decrease in cases. Remarkably, for about 64% of the events of significant increase in cases in the studied countries, our method detects warning signals nearly six days on average before the events, buying considerable time for responses. We further relate the warning signals to the dynamics of the basic reproduction number R0 and the timing of social distancing. The results show that our method is a promising approach to epidemic analysis from a data science viewpoint.


Author(s):  
Sean J Taylor ◽  
Benjamin Letham

Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high-quality forecasts, especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Andrei Bratu ◽  
Gabriela Czibula

Data augmentation is a commonly used technique in data science for improving the robustness and performance of machine learning models. The purpose of the paper is to study the feasibility of generating synthetic data points of a temporal nature towards this end. A general approach named DAuGAN (Data Augmentation using Generative Adversarial Networks) is presented for identifying poorly represented sections of a time series, studying the synthesis and integration of new data points, and measuring the performance improvement on a benchmark machine learning model. The problem is studied and applied in the domain of algorithmic trading, whose constraints are presented and taken into consideration. The experimental results highlight an improvement in performance on a benchmark reinforcement learning agent trained on a dataset enhanced with DAuGAN to trade a financial instrument.


2021 ◽  
Vol 62 (1) ◽  
pp. 42-52
Author(s):  
Dung Mai Thi Nguyen ◽  
Thu Hoai Thi Vu ◽  

Spatial big data is large in scale and complex; therefore, it cannot be collected, managed, and analyzed by traditional data analysis software in a reasonable time. In many situations these platforms are restricted to vector data. However, the raster data generated by sensors on the enormous number of satellites now needs to be processed in parallel in a cluster environment. The article introduces a method for analyzing satellite image data using the RasterFrames library on the Apache Spark platform. The RasterFrames library supports raster data processing in Python, Scala, and SQL, bringing the power of Spark DataFrames to Earth observation, cloud computing, and data science. In the experimental part, the NDVI and the change in its average value over the time series are calculated to demonstrate changes in vegetation cover in Phu Tho province. These results serve as reference data for assessing weather, climate, and environmental changes in the study area during that time.
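The NDVI computation mentioned in this abstract can be sketched in plain Python using the standard definition NDVI = (NIR - Red) / (NIR + Red). This is only an illustration: it does not use the RasterFrames or Spark APIs, and the function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: map empty pixels to 0 instead of NaN
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over two equally sized lists of band values."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)


# Hypothetical reflectance values for the same three pixels on two dates.
scene_t0 = {"nir": [0.50, 0.60, 0.40], "red": [0.10, 0.20, 0.10]}
scene_t1 = {"nir": [0.30, 0.40, 0.30], "red": [0.20, 0.20, 0.20]}

mean_t0 = mean_ndvi(scene_t0["nir"], scene_t0["red"])
mean_t1 = mean_ndvi(scene_t1["nir"], scene_t1["red"])

# A negative change suggests reduced vegetation cover between the two dates.
change = mean_t1 - mean_t0
```

On a cluster the same per-pixel arithmetic would be applied to whole raster tiles in parallel; the scalar version here only shows the calculation.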


2019 ◽  
Vol 15 (2) ◽  
pp. 43-57
Author(s):  
Seng Hansun ◽  
Vincent Charles ◽  
Christiana Rini Indrati ◽  
Subanar

Time series are one of the most common data types encountered by data scientists and, in the context of today's exponentially increasing data, learning how to best model them to derive meaningful insights is an important skill in the Big Data and Data Science toolbox. As a result, many researchers have dedicated their efforts to developing time series analysis methods to predict future values based on previously observed values. One of the well-known methods is the Holt-Winters' seasonal method, which is commonly used to capture the seasonality effect in time series data. In this study, the authors aim to build upon the Holt-Winters' additive method by introducing new formulas for finding the initial values. Obtaining more accurate estimates of the initial values could result in better forecasts. The authors use the basic principle of the weighted moving average method, assigning more weight to the most recent data, and combine it with the original initial conditions of the Holt-Winters' additive method. Based on the experiment performed, the authors conclude that the new formulas for finding the initial values in the Holt-Winters' additive method could give more accurate forecasts than the traditional Holt-Winters' additive method and the weighted moving average method.
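For orientation, a minimal sketch of the traditional Holt-Winters' additive method follows. It uses one common textbook initialization (mean of the first season for the level, difference of the first two seasonal means for the trend), not the new initial-value formulas proposed by the authors, which the abstract does not state. The function name, parameter names, and sample data are hypothetical.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters.

    y: observations, m: season length, alpha/beta/gamma: smoothing
    parameters in [0, 1] for level, trend, and seasonality.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Conventional initial values (an assumption of this sketch).
    first_mean = sum(y[:m]) / m
    second_mean = sum(y[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    seasonal = [y[i] - first_mean for i in range(m)]

    # Recursive updates, starting after the first season.
    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        s_old = seasonal[t % m]
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = (gamma * (y[t] - prev_level - prev_trend)
                           + (1 - gamma) * s_old)

    n = len(y)
    # Each forecast adds the extrapolated trend and the matching seasonal term.
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical series: three identical seasons of length 3, no trend.
history = [10, 20, 30, 10, 20, 30, 10, 20, 30]
forecast = holt_winters_additive(history, m=3, alpha=0.5, beta=0.5,
                                 gamma=0.5, horizon=3)
```

Because the sample series repeats exactly, the forecast reproduces the seasonal pattern. On real data the choice of initial level, trend, and seasonal values affects the early recursions, which is the part of the method the study sets out to improve.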


Author(s):  
Luis Alexander Calvo-Valverde ◽  
David Elías Alfaro-Barboza

The ability to make short- or long-term predictions is at the heart of much of science. In the last decade, the data science community has been highly interested in foretelling real-life events, using data mining techniques to discover meaningful rules or patterns from different data types, including time series. Short-term predictions based on “the shape” of meaningful rules lead to a vast number of applications. The discovery of meaningful rules is achieved through efficient algorithms equipped with a robust and accurate distance measure. Consequently, it is important to choose wisely a distance measure that can deal with noise, entropy and other technical constraints, to get accurate similarity outcomes from the comparison between two time series. In this work, we believe that Dynamic Time Warping based on Cubic Spline Interpolation (SIDTW) can be useful for carrying out the similarity computation for two specific algorithms: (1) DiscoverRules() and (2) TestRules(). Mohammad Shokoohi-Yekta et al. developed a framework, using these two algorithms, to find and test meaningful rules from time series. Our research expanded the scope of their project, adding a set of well-known similarity search measures, including SIDTW as a novel and enhanced version of DTW.
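To make the distance measure concrete, here is a minimal sketch of classic Dynamic Time Warping using dynamic programming. It is plain DTW with an absolute-difference local cost, not the SIDTW variant described in this abstract (the spline interpolation step is omitted), and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Same shape with one stretched segment: the warping absorbs the difference.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# Constant offset between the series: every aligned pair costs 1.
shifted = dtw_distance([0, 0], [1, 1])
```

The stretched pair aligns at zero cost, while a point-by-point Euclidean comparison could not even be computed for sequences of unequal length. This tolerance to local time shifts is why DTW-style measures are used for shape-based rule matching.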

