Data science, statistics, and time series

The Data Science of COVID-19 Spread: Some Troubling Current and Future Trends

Peace Economics Peace Science and Public Policy ◽

10.1515/peps-2020-0053 ◽

2020 ◽

Vol 26 (3) ◽

Author(s):

Rex W. Douglass ◽

Thomas Leo Scherer ◽

Erik Gartzke

Keyword(s):

Time Series ◽

Cross Section ◽

Observational Studies ◽

Data Science ◽

Future Trends ◽

Methodological Issues

One of the main ways we try to understand the COVID-19 pandemic is through time series cross section counts of cases and deaths. Observational studies based on these kinds of data have concrete and well known methodological issues that suggest significant caution for both consumers and producers of COVID-19 knowledge. We briefly enumerate some of these issues in the areas of measurement, inference, and interpretation.

TIME SERIES MODELLING OF EPIDEMICS: LEADING INDICATORS, CONTROL GROUPS AND POLICY ASSESSMENT

National Institute Economic Review ◽

10.1017/nie.2021.21 ◽

2021 ◽

Vol 257 ◽

pp. 83-100

Author(s):

Andrew Harvey

Keyword(s):

Time Series ◽

Data Science ◽

Leading Indicators ◽

Control Groups ◽

Balanced Growth ◽

Time Series Modelling ◽

Key Variables ◽

The UK ◽

The Relationship

This article shows how new time series models can be used to track the progress of an epidemic, forecast key variables and evaluate the effects of policies. The univariate framework of Harvey and Kattuman (2020, Harvard Data Science Review, Special Issue 1—COVID-19, https://hdsr.mitpress.mit.edu/pub/ozgjx0yn) is extended to model the relationship between two or more series and the role of common trends is discussed. Data on daily deaths from COVID-19 in Italy and the UK provide an example of leading indicators when there is balanced growth. When growth is not balanced, the model can be extended by including a non-stationary component in one of the series. The viability of this model is investigated by examining the relationship between new cases and deaths in the Florida second wave of summer 2020. The balanced growth framework is then used as the basis for policy evaluation by showing how some variables can serve as control groups for a target variable. This approach is used to investigate the consequences of Sweden’s soft lockdown coronavirus policy in the spring of 2020.

Time Series Forecasting in Retail Sales Using LSTM and Prophet

Advances in Business Information Systems and Analytics - Handbook of Research on Applied Data Science and Artificial Intelligence in Business and Industry ◽

10.4018/978-1-7998-6985-6.ch011 ◽

2021 ◽

pp. 241-262

Author(s):

Clony Junior ◽

Pedro Gusmão ◽

José Moreira ◽

Ana Maria M. Tome

Keyword(s):

Time Series ◽

Data Science ◽

Short Term Memory ◽

Special Kind ◽

Time Series Forecasting ◽

Short Term ◽

Term Memory ◽

Use Of Time

Data science highlights fields of study and research such as time series, which, although widely explored in the past, gain new perspectives in the context of this discipline. This chapter presents two approaches to time series forecasting, long short-term memory (LSTM), a special kind of recurrent neural network (RNN), and Prophet, an open-source library developed by Facebook for time series forecasting. With a focus on developing forecasting processes by data mining or machine learning experts, LSTM uses gating mechanisms to deal with long-term dependencies, reducing the short-term memory effect inherent to the traditional RNN. On the other hand, Prophet encapsulates statistical and computational complexity to allow broad use of time series forecasting, prioritizing the expert's business knowledge through exploration and experimentation. Both approaches were applied to a retail time series. This case study comprises daily and half-hourly forecasts, and the performance of both methods was measured using the standard metrics.
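The gating mechanism mentioned in the abstract can be illustrated with a minimal sketch of one LSTM cell step. This is not the chapter's model: it is the textbook LSTM update reduced to scalars, and the function name `lstm_step` and the hand-picked weights are assumptions made only for illustration. The weights are chosen so that the forget gate stays near 1 and the input gate near 0, which shows how the cell state can carry information across steps largely unchanged.

```python
import math


def sigmoid(x):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell (illustrative, not a trained model).

    `w` maps a gate name to a tuple (input weight, hidden weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to admit
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value computed from the input
    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (step output)
    return h, c


# Hypothetical weights: forget gate near 1, input gate near 0.
weights = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for x in [5.0, -3.0, 2.0]:
    h, c = lstm_step(x, h, c, weights)
```

With these settings the cell state stays close to its initial value of 1.0 regardless of the inputs, which is the long-term dependency behaviour that a plain RNN struggles to reproduce.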

Change sign detection with differential MDL change statistics and its applications to COVID-19 pandemic analysis

Scientific Reports ◽

10.1038/s41598-021-98781-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kenji Yamanishi ◽

Linchuan Xu ◽

Ryo Yuki ◽

Shintaro Fukushima ◽

Chuan-hao Lin

Keyword(s):

Time Series ◽

Early Warning ◽

Data Science ◽

Minimum Description Length ◽

Length Change ◽

Warning Signals ◽

Early Warning Signals ◽

New Information ◽

Long Time ◽

Synthetic Datasets

We are concerned with the issue of detecting changes and their signs from a data stream. For example, when given time series of COVID-19 cases in a region, we may raise early warning signals of an epidemic by detecting signs of changes in the data. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minimum description length change statistics (D-MDL), for measuring the scores of change sign. We first give a fundamental theory for D-MDL. We then demonstrate its effectiveness using synthetic datasets. We apply it to detecting early warning signals of the COVID-19 epidemic using time series of the cases for individual countries. We empirically demonstrate that D-MDL is able to raise early warning signals of events such as a significant increase or decrease of cases. Remarkably, for about 64% of the events of significant increase of cases in the studied countries, our method can detect warning signals nearly six days on average before the events, buying considerable time to respond. We further relate the warning signals to the dynamics of the basic reproduction number R0 and the timing of social distancing. The results show that our method is a promising approach to epidemic analysis from a data science viewpoint.

Forecasting at scale

10.7287/peerj.preprints.3190 ◽

2017 ◽

Cited By ~ 4

Author(s):

Sean J Taylor ◽

Benjamin Letham

Keyword(s):

Time Series ◽

Performance Analysis ◽

Goal Setting ◽

Capacity Planning ◽

Domain Knowledge ◽

Data Science ◽

Time Series Modeling ◽

High Quality ◽

Performance Analyses ◽

Manual Review

Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high quality forecasts — especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series.

DAuGAN: An Approach for Augmenting Time Series Imbalanced Datasets via Latent Space Sampling Using Adversarial Techniques

Scientific Programming ◽

10.1155/2021/7877590 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Andrei Bratu ◽

Gabriela Czibula

Keyword(s):

Machine Learning ◽

Time Series ◽

Data Science ◽

Data Augmentation ◽

Synthetic Data ◽

Generative Adversarial Networks ◽

Learning Agent ◽

Machine Learning Model ◽

Data Points ◽

And Performance

Data augmentation is a commonly used technique in data science for improving the robustness and performance of machine learning models. The purpose of the paper is to study the feasibility of generating synthetic data points of temporal nature towards this end. A general approach named DAuGAN (Data Augmentation using Generative Adversarial Networks) is presented for identifying poorly represented sections of a time series, studying the synthesis and integration of new data points, and performance improvement on a benchmark machine learning model. The problem is studied and applied in the domain of algorithmic trading, whose constraints are presented and taken into consideration. The experimental results highlight an improvement in performance on a benchmark reinforcement learning agent trained on a dataset enhanced with DAuGAN to trade a financial instrument.

Vegetation change detection based on time series analysis by Apache Spark and RasterFrame

Journal of Mining and Earth Sciences ◽

10.46326/jmes.2021.62(1).06 ◽

2021 ◽

Vol 62 (1) ◽

pp. 42-52

Author(s):

Dung Mai Thi Nguyen ◽

Thu Hoai Thi Vu ◽

Keyword(s):

Time Series ◽

Large Scale ◽

Data Science ◽

Vegetation Change ◽

Environmental Changes ◽

Satellite Image ◽

Image Data ◽

Apache Spark ◽

Raster Data ◽

Average Value

Spatial big data is large in scale and complex, so it cannot be collected, managed, and analyzed in a reasonable time by traditional data analytics software. In many situations these platforms are also restricted to vector data. However, the raster data generated by sensors on the growing number of satellites now needs to be processed in parallel in a cluster environment. The article introduces a method for analyzing satellite image data using the RasterFrames library on the Apache Spark platform. The RasterFrames library provides raster data processing for Python, Scala, and SQL, bringing the power of Spark DataFrames to Earth observation, cloud computing, and data science. In the experimental part, the NDVI and the change in its average value over the time series are calculated to demonstrate changes in vegetation cover in Phu Tho province. These results serve as reference data for assessing weather, climate, and environmental changes in the study area during that time.
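The NDVI calculation referred to in the abstract can be sketched in a few lines. The sketch uses the standard definition NDVI = (NIR - Red) / (NIR + Red) in plain Python rather than RasterFrames or Spark; the function names, the zero-denominator handling, and the reflectance values are assumptions for illustration and are not taken from the article.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: pixels with no signal are reported as 0
    return (nir - red) / total


def mean_ndvi(nir_band, red_band):
    """Average NDVI over paired near-infrared and red pixel values."""
    values = [ndvi(n, r) for n, r in zip(nir_band, red_band)]
    return sum(values) / len(values)


# Hypothetical reflectance values for the same three pixels on two dates.
nir_t1, red_t1 = [0.50, 0.60, 0.55], [0.10, 0.10, 0.15]
nir_t2, red_t2 = [0.40, 0.45, 0.35], [0.20, 0.15, 0.25]

# A negative change in the average suggests a loss of vegetation cover.
change = mean_ndvi(nir_t2, red_t2) - mean_ndvi(nir_t1, red_t1)
```

On a cluster the same per-pixel formula would be applied to whole raster tiles; the scalar version here only shows the arithmetic behind the change in average NDVI.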

Revisiting the Holt-Winters' Additive Method for Better Forecasting

International Journal of Enterprise Information Systems ◽

10.4018/ijeis.2019040103 ◽

2019 ◽

Vol 15 (2) ◽

pp. 43-57

Author(s):

Seng Hansun ◽

Vincent Charles ◽

Christiana Rini Indrati ◽

Subanar

Keyword(s):

Time Series ◽

Average Method ◽

Data Science ◽

Time Series Data ◽

Initial Conditions ◽

Moving Average ◽

Series Data ◽

Initial Values ◽

Additive Method ◽

Weighted Moving Average

Time series are one of the most common data types encountered by data scientists and, in the context of today's exponentially increasing data, learning how to best model them to derive meaningful insights is an important skill in the Big Data and Data Science toolbox. As a result, many researchers have dedicated their efforts to developing time series analysis methods to predict future values based on previously observed values. One of the well-known methods is the Holt-Winters' seasonal method, which is commonly used to capture the seasonality effect in time series data. In this study, the authors aim to build upon the Holt-Winters' additive method by introducing new formulas for finding the initial values. Obtaining more accurate estimates of the initial values could result in better forecasts. The authors use the basic principle of the weighted moving average method, which assigns more weight to the most recent data, and combine it with the original initial conditions of the Holt-Winters' additive method. Based on the experiment performed, the authors conclude that the new formulas for finding the initial values in the Holt-Winters' additive method can give more accurate forecasts than the traditional Holt-Winters' additive method and the weighted moving average method.
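For context, the baseline additive method that the abstract builds on can be sketched as follows. This shows the classical Holt-Winters additive recursions with one common textbook initialisation (first-season mean for the level, difference of the first two season means for the trend). It does not implement the authors' new initial-value formulas, which the abstract does not give, and the function name and smoothing parameters are assumptions for illustration.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Classical additive Holt-Winters forecast.

    y: observations, m: season length, alpha/beta/gamma: smoothing
    parameters in [0, 1], horizon: number of steps to forecast.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Conventional initial values (assumed here, not the paper's formulas).
    first, second = y[:m], y[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) / m - level) / m
    seasonal = [v - level for v in first]

    for t in range(m, len(y)):
        s_prev = seasonal[t - m]          # seasonal term one season back
        prev_level = level
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] - level) + (1 - gamma) * s_prev)

    n = len(y)
    # Forecast = last level + h * trend + most recent matching seasonal term.
    return [
        level + h * trend + seasonal[n - m + (h - 1) % m]
        for h in range(1, horizon + 1)
    ]


# A purely seasonal toy series with no trend.
series = [10, 20, 30, 20] * 3
forecast = holt_winters_additive(series, m=4, alpha=0.5, beta=0.5,
                                 gamma=0.5, horizon=4)
```

Because the recursions start from the initial level, trend, and seasonal terms, any change to those starting values propagates through every later update, which is why the choice of initial values studied in the paper can affect forecast accuracy.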

An HPC-Driven Data Science Platform to Speed-up Time Series Data Analysis of Patients with the Acute Respiratory Distress Syndrome

10.23919/mipro52101.2021.9596840 ◽

2021 ◽

Author(s):

C. Barakat ◽

S. Fritsch ◽

M. Riedel ◽

S. Brynjolfsson

Keyword(s):

Time Series ◽

Acute Respiratory Distress Syndrome ◽

Data Analysis ◽

Respiratory Distress Syndrome ◽

Data Science ◽

Distress Syndrome ◽

Time Series Data ◽

Series Data ◽

Speed Up ◽

Time Series Data Analysis

Discovery of Meaningful Rules by using DTW based on Cubic Spline Interpolation

Revista Tecnología en Marcha ◽

10.18845/tm.v33i2.4073 ◽

2020 ◽

Author(s):

Luis Alexander Calvo-Valverde ◽

David Elías Alfaro-Barboza

Keyword(s):

Time Series ◽

Data Science ◽

Distance Measure ◽

Spline Interpolation ◽

Real Life ◽

Cubic Spline ◽

Data Types ◽

Vast Number ◽

Cubic Spline Interpolation ◽

Using Data

The ability to make short- or long-term predictions is at the heart of much of science. In the last decade, the data science community has been highly interested in predicting real-life events, using data mining techniques to discover meaningful rules or patterns from different data types, including time series. Short-term predictions based on “the shape” of meaningful rules lead to a vast number of applications. The discovery of meaningful rules is achieved through efficient algorithms equipped with a robust and accurate distance measure. Consequently, it is important to choose a distance measure that can deal with noise, entropy and other technical constraints, in order to get accurate similarity results when comparing two time series. In this work, we argue that Dynamic Time Warping based on Cubic Spline Interpolation (SIDTW) can be useful for the similarity computation in two specific algorithms: (1) DiscoverRules() and (2) TestRules(). Mohammad Shokoohi-Yekta et al. developed a framework, using these two algorithms, to find and test meaningful rules from time series. Our research expanded the scope of their project, adding a set of well-known similarity search measures, including SIDTW as a novel and enhanced version of DTW.
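The distance measure underlying the abstract can be sketched with the classic Dynamic Time Warping recurrence. This is plain DTW with an absolute-difference local cost; it does not include the cubic spline interpolation step that distinguishes SIDTW, and it is not the DiscoverRules() or TestRules() algorithm. The function name `dtw_distance` is an assumption for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])       # local point-to-point cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances while b repeats a point
                cost[i][j - 1],      # b advances while a repeats a point
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Same shape at a different pace: the warping absorbs the repeated value.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
# A constant offset cannot be warped away.
offset = dtw_distance([0, 0], [1, 1])
```

The first pair has distance zero even though the sequences differ in length, which is the property that makes DTW attractive for comparing the "shape" of time series subsequences.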
