Reconstructing Missing and Anomalous Data Collected from High-Frequency In-Situ Sensors in Fresh Waters

In situ sensors that collect high-frequency data are increasingly used to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed, creating gaps in the time series. We present a framework based on generalized additive and auto-regressive models to recover these missing data. To mimic sporadically missing (i) single observations and (ii) periods of contiguous observations, we randomly removed (i) point data and (ii) day- and week-long sequences of data from a two-year time series of nitrate concentration data collected from Arikaree River, USA, where synoptically collected water temperature, turbidity, conductance, elevation, and dissolved oxygen data were available. In 72% of cases with missing point data, predicted values were within the sensor precision interval of the original value, although predictive ability declined when sequences of missing data occurred. Precision also depended on the availability of other water quality covariates. When covariates were available, even a sudden, event-based peak in nitrate concentration was reconstructed well. By providing a promising method for accurate prediction of missing data, this framework should increase the utility of, and confidence in, summary statistics and statistical trends, thereby assisting the effective monitoring and management of fresh waters and other at-risk ecosystems.
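
The idea of combining a covariate-based model with an auto-regressive term can be illustrated with a minimal sketch. This is not the authors' framework: a straight-line fit on one covariate stands in for the generalized additive model, the residuals follow a simple AR(1) process, and the function names and toy values are assumptions made purely for illustration.

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x, skipping missing (None) targets."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = sum(xi for xi, _ in pairs) / len(pairs)
    mean_y = sum(yi for _, yi in pairs) / len(pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_ar1(resid):
    """Estimate an AR(1) coefficient from consecutive observed residuals."""
    num = den = 0.0
    for prev, cur in zip(resid, resid[1:]):
        if prev is not None and cur is not None:
            num += prev * cur
            den += prev * prev
    phi = num / den if den > 0 else 0.0
    return max(-0.99, min(0.99, phi))  # keep the process stationary


def reconstruct(covariate, target):
    """Fill gaps with the covariate prediction plus a decaying AR(1) residual."""
    a, b = fit_line(covariate, target)
    resid = [None if y is None else y - (a + b * x)
             for x, y in zip(covariate, target)]
    phi = fit_ar1(resid)
    filled, last_resid = [], 0.0
    for x, y, r in zip(covariate, target, resid):
        if y is None:
            last_resid = phi * last_resid  # carry the residual forward, damped
            filled.append(a + b * x + last_resid)
        else:
            last_resid = r
            filled.append(y)
    return filled


# Toy data (invented): the target depends linearly on one covariate.
turbidity = [float(t) for t in range(10)]
nitrate = [2.0 + 3.0 * x for x in turbidity]
nitrate[4] = None                  # a single missing point
nitrate[7] = nitrate[8] = None     # a short contiguous gap
filled = reconstruct(turbidity, nitrate)
```

In this sketch the covariate carries the prediction across a gap while the AR(1) term only adjusts the first few filled values, which is one plausible reading of why covariate availability matters for longer gaps.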

An easy way to create duration variables in binary cross-sectional time-series data

The Stata Journal: Promoting communications on statistics and Stata ◽

10.1177/1536867x20976322 ◽

2020 ◽

Vol 20 (4) ◽

pp. 916-930

Author(s):

Andrew Q. Philips

Keyword(s):

Time Series ◽

Missing Data ◽

Time Series Data ◽

Series Data ◽

Duration Dependence ◽

Cross Sectional ◽

Common Solution ◽

The Common

In cross-sectional time-series data with a dichotomous dependent variable, failing to account for duration dependence when it exists can lead to faulty inferences. A common solution is to include duration dummies, polynomials, or splines to proxy for duration dependence. Because creating these is not easy for the common practitioner, I introduce a new command, mkduration, that is a straightforward way to generate a duration variable for binary cross-sectional time-series data in Stata. mkduration can handle various forms of missing data and allows the duration variable to easily be turned into common parametric and nonparametric approximations.
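
To make the notion of a duration variable concrete, here is a minimal Python sketch. It is not the mkduration command, and it does not reproduce its options or its handling of missing data. The counting convention (start at 0, reset to 0 in the period after an event) and the function names are assumptions chosen for illustration.

```python
def make_duration(events):
    """Count periods since the last event (or since the start) for one unit.

    `events` is a time-ordered list of 0/1 outcomes for a single unit.
    """
    duration, since_event = [], 0
    for y in events:
        duration.append(since_event)
        since_event = 0 if y == 1 else since_event + 1
    return duration


def duration_polynomials(duration):
    """Return (t, t^2, t^3) terms, a common parametric proxy for duration."""
    return [(t, t ** 2, t ** 3) for t in duration]


# Toy panel (invented): the counter is computed separately for each unit.
panel = {
    "A": [0, 0, 1, 0, 0, 0, 1, 1, 0],
    "B": [0, 1, 0, 0],
}
durations = {unit: make_duration(ys) for unit, ys in panel.items()}
# durations["A"] is [0, 1, 2, 0, 1, 2, 3, 0, 0]
```

The resulting counter can then enter a binary-outcome model directly, as dummies for each value, or through the polynomial terms.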

A New Method to Detect Outliers in High-frequency Time Series

International Journal of Statistics and Probability ◽

10.5539/ijsp.v8n1p16 ◽

2018 ◽

Vol 8 (1) ◽

pp. 16

Author(s):

Ilaria Lucrezia Amerise ◽

Agostino Tarsitano

Keyword(s):

Time Series ◽

High Frequency ◽

Time Series Data ◽

New Method ◽

Series Data ◽

Absolute Difference ◽

Data Preparation ◽

Simple Method ◽

Pass Through ◽

Nonparametric Procedure

The objective of this research is to develop a fast, simple method for detecting and replacing extreme spikes in high-frequency time series data. The method primarily consists of a nonparametric procedure that pursues a balance between fidelity to observed data and smoothness. Furthermore, through examination of the absolute difference between original and smoothed values, the technique is also able to detect and, where necessary, replace outliers with less extreme data. Unlike other filtering procedures found in the literature, our method does not require a model to be specified for the data. Additionally, the filter makes only a single pass through the time series. Experiments show that the new method can be validly used as a data preparation tool to ensure that time series modeling is supported by clean data, particularly in a complex context such as one with high-frequency data.
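
The detect-and-replace step based on the absolute difference between original and smoothed values can be sketched as follows. The abstract does not specify the smoother, so a running median is swapped in here; the threshold rule (a multiple `k` of the median absolute difference) and all names are likewise assumptions, not the authors' procedure.

```python
from statistics import median


def running_median(values, half_window):
    """Smooth with a centered running median (window shrinks at the edges)."""
    n = len(values)
    return [
        median(values[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]


def replace_spikes(values, half_window=2, k=5.0):
    """Flag points far from the smooth curve and replace them with it.

    A point is flagged when |original - smoothed| exceeds k times the
    median of those absolute differences. If that median is zero, any
    nonzero deviation is flagged.
    """
    smooth = running_median(values, half_window)
    abs_diff = [abs(v - s) for v, s in zip(values, smooth)]
    threshold = k * median(abs_diff)
    flagged = [i for i, d in enumerate(abs_diff) if d > threshold]
    cleaned = list(values)
    for i in flagged:
        cleaned[i] = smooth[i]
    return cleaned, flagged


# Toy series (invented) with one extreme spike at index 4.
series = [10.0, 10.2, 9.9, 10.1, 50.0, 10.0, 9.8, 10.3, 10.1, 9.9]
cleaned, flagged = replace_spikes(series)
# flagged is [4]; cleaned[4] is the local median, 10.0
```

Like the method described above, this sketch needs no parametric model of the data, only a window width and a threshold multiplier.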

Understanding Collective Human Mobility Spatiotemporal Patterns on Weekdays from Taxi Origin-Destination Point Data

Sensors ◽

10.3390/s19122812 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2812 ◽

Cited By ~ 3

Author(s):

Jing Yang ◽

Yizhong Sun ◽

Bowen Shang ◽

Lei Wang ◽

Jie Zhu

Keyword(s):

Time Series ◽

Time Series Data ◽

Human Mobility ◽

Spatiotemporal Patterns ◽

Classification Model ◽

Series Data ◽

Similarity Measurement ◽

Mobility Patterns ◽

Point Data ◽

Destination Point

With the availability of large geospatial datasets, the study of collective human mobility spatiotemporal patterns provides a new way to explore urban spatial environments from the perspective of residents. In this paper, we constructed a classification model for mobility patterns that is suitable for taxi OD (Origin-Destination) point data and that comprises three parts. First, a new aggregate unit, which uses a road intersection as the constraint condition, is designed for the analysis of the taxi OD point data. Second, the time series similarity measurement is improved by adding a normalization procedure and time windows to address the particular characteristics of the taxi time series data. Finally, the DBSCAN algorithm is used to classify the time series into different mobility patterns based on a proximity index that is calculated using the improved similarity measurement. In addition, we used the random forest algorithm to establish a correlation model between the mobility patterns and the regional functional characteristics. Based on the taxi OD point data from Nanjing, we delimited seven mobility patterns and illustrated that the regional functions have obvious driving effects on these mobility patterns. These findings are applicable to urban planning, traffic management and planning, and land use analyses in the future.
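
The clustering step, normalizing each time series and then running DBSCAN on pairwise distances, can be sketched in plain Python. The paper's improved similarity measurement and time windows are not reproduced: Euclidean distance between z-normalized series is used as a stand-in proximity index, and the `eps` and `min_pts` values and the toy profiles are assumptions.

```python
import math
from statistics import mean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit variance (shape, not volume)."""
    m, s = mean(series), pstdev(series)
    return [0.0] * len(series) if s == 0 else [(v - m) / s for v in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix; -1 marks noise.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it.
    """
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = [k for k in range(n) if dist[j][k] <= eps]
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, keep expanding
    return labels


# Toy demand profiles (invented): two shapes at different volumes, one oddity.
profiles = [
    [1, 5, 1, 1], [2, 10, 2, 2], [10, 50, 10, 10],   # early peak
    [1, 1, 5, 1], [3, 3, 15, 3], [4, 4, 20, 4],      # late peak
    [1, 2, 3, 4],                                    # steady rise
]
normalized = [z_normalize(p) for p in profiles]
dist = [[euclidean(a, b) for b in normalized] for a in normalized]
labels = dbscan(dist, eps=0.5, min_pts=2)
# labels is [0, 0, 0, 1, 1, 1, -1]
```

Normalizing first means that units with the same temporal shape but very different trip volumes fall into the same pattern, which is the motivation the abstract gives for adding a normalization procedure.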

Missing data imputation of high‐resolution temporal climate time series data

Meteorological Applications ◽

10.1002/met.1873 ◽

2020 ◽

Vol 27 (1) ◽

Author(s):

E Afrifa‐Yamoah ◽

U. A. Mueller ◽

S. M. Taylor ◽

A. J. Fisher

Keyword(s):

Time Series ◽

Missing Data ◽

High Resolution ◽

Time Series Data ◽

Series Data ◽

Data Imputation ◽

Missing Data Imputation ◽

Climate Time Series

Use of remote sensing and long-term in-situ time-series data in an integrated hydrological model of the Central Kalahari Basin, Southern Africa

Hydrogeology Journal ◽

10.1007/s10040-019-01954-9 ◽

2019 ◽

Vol 27 (5) ◽

pp. 1541-1562 ◽

Cited By ~ 6

Author(s):

Moiteela Lekula ◽

Maciek W. Lubczynski

Keyword(s):

Remote Sensing ◽

Time Series ◽

Southern Africa ◽

Hydrological Model ◽

Time Series Data ◽

Series Data

A comparison of ground vehicle mobility analysis based on soil moisture time series datasets from WindSat, LIS, and in situ sensors

Journal of Terramechanics ◽

10.1016/j.jterra.2016.02.002 ◽

2016 ◽

Vol 65 ◽

pp. 49-59 ◽

Cited By ~ 3

Author(s):

Maria T. Stevens ◽

George B. McKinley ◽

Farshid Vahedifard

Keyword(s):

Time Series ◽

Soil Moisture ◽

Mobility Analysis ◽

Ground Vehicle ◽

In Situ Sensors ◽

Vehicle Mobility

Time-Series Causality with Missing Data

Journal of Computational Vision and Imaging Systems ◽

10.15353/jcvis.v6i1.3552 ◽

2021 ◽

Vol 6 (1) ◽

pp. 1-4

Author(s):

Bo Yuan Chang ◽

Mohamed A. Naiel ◽

Steven Wardell ◽

Stan Kleinikkink ◽

John S. Zelek

Keyword(s):

Time Series ◽

Missing Data ◽

Missing Values ◽

Time Series Data ◽

Multivariate Time Series ◽

Gaussian Process Regression ◽

Series Data ◽

Causal Relationships ◽

Sampled Data ◽

The Past

Over the past years, researchers have proposed various methods to discover causal relationships among time-series data as well as algorithms to fill in missing entries in time-series data. Little to no work has been done on combining the two strategies for the purpose of learning causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt using the evenly sampled data (without missing entries). However, obtaining the causal relationship from a given time series requires evenly sampled data, which suggests filling in the missing values before estimating the causal parameters. Therefore, the proposed method applies a Gaussian Process Regression (GPR) model for missing data recovery, followed by several pairwise Granger causality equations in Vector Autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters generated using GPR data filling offer much lower RMSE than those of the dummy model (fill with last seen entry) at all missing-value percentages, suggesting that GPR data filling better preserves the causal relationships than dummy data filling and should therefore be considered when dealing with unevenly sampled time-series causality learning.
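
The contrast between GPR filling and the last-seen-entry dummy fill can be illustrated with a small, self-contained sketch. It covers only the data-recovery step, not the Granger causality estimation, and it is not the authors' implementation: the RBF kernel, its length scale, the noise term, and the toy sine series are all assumptions.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel between two time stamps."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_fill(times, values, noise=1e-4):
    """Replace None entries with the zero-mean GP posterior mean."""
    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    obs_t = [t for t, _ in obs]
    obs_v = [v for _, v in obs]
    gram = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(obs_t)] for i, a in enumerate(obs_t)]
    weights = solve(gram, obs_v)  # (K + noise * I)^-1 y
    return [
        v if v is not None
        else sum(w * rbf(t, s) for w, s in zip(weights, obs_t))
        for t, v in zip(times, values)
    ]


def forward_fill(values):
    """Dummy model: repeat the last seen entry."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled


# Toy series (invented): a sine wave with one missing sample at t = 3.0.
times = [0.5 * i for i in range(13)]
truth = [math.sin(t) for t in times]
gappy = list(truth)
gappy[6] = None
gp_error = abs(gp_fill(times, gappy)[6] - truth[6])
ff_error = abs(forward_fill(gappy)[6] - truth[6])
```

On a smooth signal like this one, the GP estimate uses observations on both sides of the gap, whereas forward filling ignores everything after it; the filled, evenly sampled series is what a subsequent vector autoregressive fit would consume.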

Contextualizing time-series data: quantification of short-term regional variability in the San Pedro Channel using high-resolution in situ glider data

Biogeosciences ◽

10.5194/bg-15-6151-2018 ◽

2018 ◽

Vol 15 (20) ◽

pp. 6151-6165 ◽

Cited By ~ 1

Author(s):

Elizabeth N. Teel ◽

Xiao Liu ◽

Bridget N. Seegers ◽

Matthew A. Ragan ◽

William Z. Haskell ◽

...

Keyword(s):

Time Series ◽

High Resolution ◽

Water Column ◽

Time Series Data ◽

Series Data ◽

Regional Variability ◽

Modes Of Variability ◽

Subsurface Chlorophyll Maximum ◽

San Pedro

Abstract. Oceanic time series have been instrumental in providing an understanding of biological, physical, and chemical dynamics in the oceans and how these processes change over time. However, the extrapolation of these results to larger oceanographic regions requires an understanding and characterization of local versus regional drivers of variability. Here we use high-frequency spatial and temporal glider data to quantify variability at the coastal San Pedro Ocean Time-series (SPOT) site in the San Pedro Channel (SPC) and provide insight into the underlying oceanographic dynamics for the site. The dataset could be described by a combination of four water column profile types that typified active upwelling, a surface bloom, warm-stratified low-nutrient conditions, and a subsurface chlorophyll maximum. On weekly timescales, the SPOT station was on average representative of 64 % of profiles taken within the SPC. In general, shifts in water column profile characteristics at SPOT were also observed across the entire channel. On average, waters across the SPC were most similar to offshore profiles, suggesting that SPOT time series data would be more impacted by regional changes in circulation than local coastal events. These results indicate that high-resolution in situ glider deployments can be used to quantify major modes of variability and provide context for interpreting time series data, allowing for broader application of these datasets and greater integration into modeling efforts.
