Reconstructing Missing and Anomalous Data Collected from High-Frequency In-Situ Sensors in Fresh Waters

Author(s):  
Claire Kermorvant ◽  
Benoit Liquet ◽  
Guy Litt ◽  
Jeremy B. Jones ◽  
Kerrie Mengersen ◽  
...  

In situ sensors that collect high-frequency data are increasingly used to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed, creating gaps in the time series. We present a framework based on generalized additive and autoregressive models to recover these missing data. To mimic sporadically missing (i) single observations and (ii) periods of contiguous observations, we randomly removed (i) point data and (ii) day- and week-long sequences of data from a two-year time series of nitrate concentration collected from the Arikaree River, USA, where synoptically collected water temperature, turbidity, conductance, elevation, and dissolved oxygen data were available. In 72% of cases with missing point data, predicted values were within the sensor precision interval of the original value, although predictive ability declined when sequences of data were missing. Precision also depended on the availability of other water quality covariates. When covariates were available, even a sudden, event-based peak in nitrate concentration was reconstructed well. By providing a promising method for accurately predicting missing data, this framework should increase the utility of, and confidence in, summary statistics and statistical trends, thereby assisting the effective monitoring and management of fresh waters and other at-risk ecosystems.
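The reconstruction idea can be illustrated with a short sketch. This is not the authors' model. It assumes a per-covariate cubic polynomial basis as a crude stand-in for GAM smooth terms, an AR(1) process for the residuals, and covariates with no gaps. The function names `design_matrix`, `fit_and_impute`, and `share_within_precision` are hypothetical.

```python
import numpy as np

# Illustrative sketch: the polynomial basis and AR(1) residual model are
# assumptions standing in for the paper's GAM + autoregressive framework.


def design_matrix(covariates, degree=3):
    """Intercept plus a polynomial basis per covariate (covariates must have no NaNs)."""
    covariates = np.asarray(covariates, dtype=float)
    cols = [np.ones(covariates.shape[0])]
    for j in range(covariates.shape[1]):
        x = covariates[:, j]
        x = (x - x.mean()) / (x.std() + 1e-12)  # standardize for numerical stability
        cols.extend(x ** d for d in range(1, degree + 1))
    return np.column_stack(cols)


def fit_and_impute(y, covariates, degree=3):
    """Fill NaNs in y from a covariate regression plus an AR(1) residual forecast."""
    y = np.asarray(y, dtype=float)
    X = design_matrix(covariates, degree)
    obs = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    fitted = X @ beta
    resid = y - fitted  # NaN wherever y is missing

    # Lag-1 autocorrelation from consecutive pairs of observed residuals
    r0, r1 = resid[:-1], resid[1:]
    pair = ~np.isnan(r0) & ~np.isnan(r1)
    phi = np.sum(r0[pair] * r1[pair]) / np.sum(r0[pair] ** 2) if pair.any() else 0.0

    filled = y.copy()
    carry = 0.0  # most recent observed or forecast residual
    for t in range(len(y)):
        if obs[t]:
            carry = resid[t]
        else:
            carry = phi * carry  # inside a gap the forecast decays toward the covariate fit
            filled[t] = fitted[t] + carry
    return filled


def share_within_precision(predicted, original, precision):
    """Fraction of reconstructed points within +/- precision of the true values."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(original)) <= precision))
```

To mimic the point-gap experiment, one could set random entries of a complete series to NaN, call `fit_and_impute`, and score the gaps with `share_within_precision` using the sensor's stated precision. Because the AR(1) term decays inside long gaps, predictions for day- or week-long gaps fall back on the covariates alone. This is consistent with the abstract's observation that performance depends on covariate availability.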

Author(s):  
Andrew Q. Philips

In cross-sectional time-series data with a dichotomous dependent variable, failing to account for duration dependence when it exists can lead to faulty inferences. A common solution is to include duration dummies, polynomials, or splines to proxy for duration dependence. Because creating these variables is not easy for the typical practitioner, I introduce a new command, mkduration, which provides a straightforward way to generate a duration variable for binary cross-sectional time-series data in Stata. mkduration can handle various forms of missing data and allows the duration variable to be easily turned into common parametric and nonparametric approximations.
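As a rough illustration of what a duration variable is, the sketch below counts, within each cross-sectional unit, the periods since that unit's last event. It then expands the count into a cubic polynomial. It is written in Python rather than Stata and is not the mkduration implementation. It assumes one particular counting convention and no missing outcomes, and its function names are hypothetical.

```python
# Illustrative sketch, not mkduration: the counting convention and the
# no-missing-data assumption are choices made here.


def make_duration(units, periods, events):
    """Periods elapsed since each unit's last event (or since its first observation).

    Assumed convention: the counter starts at 0, rises by 1 after every
    non-event period, and resets to 0 in the period following an event.
    Rows may arrive in any order; they are processed sorted by (unit, period).
    """
    order = sorted(range(len(units)), key=lambda i: (units[i], periods[i]))
    duration = [0] * len(units)
    counter = {}
    for i in order:
        d = counter.get(units[i], 0)
        duration[i] = d
        counter[units[i]] = 0 if events[i] == 1 else d + 1
    return duration


def cubic_duration_terms(duration):
    """Duration, duration^2 and duration^3: a common cubic polynomial approximation."""
    return [(d, d ** 2, d ** 3) for d in duration]
```

For example, a unit observed over five periods with outcomes 0, 0, 1, 0, 0 receives durations 0, 1, 2, 0, 1. The cubic terms can then enter a logit or probit model alongside the other regressors.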


2018 ◽  
Vol 8 (1) ◽  
pp. 16
Author(s):  
Ilaria Lucrezia Amerise ◽  
Agostino Tarsitano

The objective of this research is to develop a fast, simple method for detecting and replacing extreme spikes in high-frequency time series data. The method primarily consists of a nonparametric procedure that seeks a balance between fidelity to the observed data and smoothness. Furthermore, by examining the absolute difference between the original and smoothed values, the technique is also able to detect outliers and, where necessary, replace them with less extreme values. Unlike other filtering procedures found in the literature, our method does not require a model to be specified for the data. Additionally, the filter makes only a single pass through the time series. Experiments show that the new method can be used effectively as a data preparation tool to ensure that time series modeling is supported by clean data, particularly in complex contexts such as high-frequency data.


Sensors ◽  
2019 ◽  
Vol 19 (12) ◽  
pp. 2812 ◽  
Author(s):  
Jing Yang ◽  
Yizhong Sun ◽  
Bowen Shang ◽  
Lei Wang ◽  
Jie Zhu

With the availability of large geospatial datasets, the study of the spatiotemporal patterns of collective human mobility provides a new way to explore urban spatial environments from the perspective of residents. In this paper, we constructed a classification model for mobility patterns that is suitable for taxi OD (Origin-Destination) point data; the model comprises three parts. First, a new aggregate unit, which uses road intersections as the constraint condition, is designed for analyzing the taxi OD point data. Second, the time series similarity measurement is improved by adding a normalization procedure and time windows to address the particular characteristics of taxi time series data. Finally, the DBSCAN algorithm is used to classify the time series into different mobility patterns based on a proximity index calculated with the improved similarity measurement. In addition, we used the random forest algorithm to build a correlation model between the mobility patterns and regional functional characteristics. Using taxi OD point data from Nanjing, we identified seven mobility patterns and showed that regional functions clearly drive these mobility patterns. These findings can inform future urban planning, traffic management and planning, and land use analyses.
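The sequence of normalizing each series, comparing series with a time-window-constrained similarity, and clustering with DBSCAN can be sketched as follows. The specific choices are assumptions. The sketch uses z-normalization, and dynamic time warping restricted to a Sakoe-Chiba band of width `w` stands in for the paper's time-window measure. The `eps` and `min_pts` values are illustrative. The paper's proximity index and road-intersection aggregation are not reproduced, and all function names are hypothetical.

```python
import numpy as np

# Illustrative sketch: z-normalization + banded DTW + DBSCAN are assumed
# stand-ins for the paper's improved similarity measure and classification.


def znorm(series):
    """Normalize a series to zero mean and unit variance (constant series -> zeros)."""
    s = np.asarray(series, dtype=float)
    sd = s.std()
    return (s - s.mean()) / sd if sd > 0 else s - s.mean()


def dtw_window(a, b, w):
    """Dynamic time warping distance restricted to a Sakoe-Chiba band of width w."""
    n, m = len(a), len(b)
    w = max(w, abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))


def dbscan(dist, eps, min_pts):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 means noise."""
    n = dist.shape[0]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = list(np.flatnonzero(dist[p] <= eps))
        if len(neighbors) < min_pts:
            continue  # not a core point; may still become a border point later
        labels[p] = cluster
        queue = neighbors
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster
            if visited[q]:
                continue
            visited[q] = True
            q_neighbors = np.flatnonzero(dist[q] <= eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
        cluster += 1
    return labels


def cluster_series(series_list, w=2, eps=2.0, min_pts=2):
    """Pairwise banded-DTW distances on z-normalized series, then DBSCAN labels."""
    z = [znorm(s) for s in series_list]
    k = len(z)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_window(z[i], z[j], w)
    return dbscan(dist, eps, min_pts)
```

Each input series might be, for example, hourly OD counts for one aggregate unit. Normalization removes differences in volume, so units are grouped by the shape of their daily profile. The band lets peaks that differ by up to `w` hours still match. DBSCAN needs no preset number of patterns, and units that fit no pattern are labelled -1.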


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
E. Afrifa‐Yamoah ◽  
U. A. Mueller ◽  
S. M. Taylor ◽  
A. J. Fisher

2021 ◽  
Vol 6 (1) ◽  
pp. 1-4
Author(s):  
Bo Yuan Chang ◽  
Mohamed A. Naiel ◽  
Steven Wardell ◽  
Stan Kleinikkink ◽  
John S. Zelek

In recent years, researchers have proposed various methods for discovering causal relationships among time-series data, as well as algorithms for filling in missing entries in time-series data. Little to no work has combined the two strategies to learn causal relationships from unevenly sampled multivariate time-series data. In this paper, we examine how the causal parameters learnt from unevenly sampled data (with missing entries) deviate from the parameters learnt from evenly sampled data (without missing entries). However, obtaining the causal relationship from a given time series requires evenly sampled data, so the missing values must be filled before the causal parameters are estimated. The proposed method therefore applies a Gaussian Process Regression (GPR) model to recover the missing data, followed by several pairwise Granger causality equations in Vector Autoregressive form to fit the recovered data and obtain the causal parameters. Experimental results show that the causal parameters obtained with GPR data filling have much lower RMSE than those from the dummy model (filling with the last seen entry) at all missing-value percentages. This suggests that GPR data filling preserves causal relationships better than dummy data filling and should therefore be considered for causality learning on unevenly sampled time series.
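A compact sketch of the fill-then-fit pipeline follows, including the last-seen-entry baseline. It is an illustration under stated assumptions, not the authors' code. The squared-exponential kernel is a choice made here. So are its fixed `length` and `noise` hyperparameters (a full GPR would normally fit them by maximizing the marginal likelihood), the constant prior mean, and the lag order `p`.

```python
import numpy as np

# Illustrative sketch: kernel, fixed hyperparameters, and lag order are
# assumptions, not the paper's configuration.


def rbf_kernel(a, b, length=5.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)


def gpr_fill(t, y, length=5.0, noise=1e-2):
    """Replace NaNs with the GP posterior mean conditioned on the observed points."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    obs = ~np.isnan(y)
    t_obs, y_obs = t[obs], y[obs]
    mu = y_obs.mean()  # constant prior mean (assumed)
    K = rbf_kernel(t_obs, t_obs, length) + noise * np.eye(obs.sum())
    alpha = np.linalg.solve(K, y_obs - mu)
    filled = y.copy()
    filled[~obs] = mu + rbf_kernel(t[~obs], t_obs, length) @ alpha
    return filled


def last_seen_fill(y):
    """Dummy baseline: carry the last observed value forward (leading NaNs stay NaN)."""
    filled = np.asarray(y, dtype=float).copy()
    for i in range(1, len(filled)):
        if np.isnan(filled[i]):
            filled[i] = filled[i - 1]
    return filled


def granger_params(x, y, p=2):
    """Least-squares fit of one VAR equation: y_t on p lags of y and p lags of x.

    Returns the p coefficients on lagged x (x_{t-1}, ..., x_{t-p}), i.e. the
    parameters describing how x drives y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = [np.r_[1.0, y[t - p:t][::-1], x[t - p:t][::-1]] for t in range(p, len(y))]
    beta, *_ = np.linalg.lstsq(np.array(rows), y[p:], rcond=None)
    return beta[1 + p:]
```

To compare filling strategies in the spirit of the experiment, one could delete entries from a complete pair of series and fill them with both `gpr_fill` and `last_seen_fill`. One would then compute the RMSE between each resulting set of `granger_params` and the parameters from the complete data.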


2018 ◽  
Vol 15 (20) ◽  
pp. 6151-6165 ◽  
Author(s):  
Elizabeth N. Teel ◽  
Xiao Liu ◽  
Bridget N. Seegers ◽  
Matthew A. Ragan ◽  
William Z. Haskell ◽  
...  

Abstract. Oceanic time series have been instrumental in providing an understanding of biological, physical, and chemical dynamics in the oceans and how these processes change over time. However, the extrapolation of these results to larger oceanographic regions requires an understanding and characterization of local versus regional drivers of variability. Here we use high-frequency spatial and temporal glider data to quantify variability at the coastal San Pedro Ocean Time-series (SPOT) site in the San Pedro Channel (SPC) and provide insight into the underlying oceanographic dynamics for the site. The dataset could be described by a combination of four water column profile types that typified active upwelling, a surface bloom, warm-stratified low-nutrient conditions, and a subsurface chlorophyll maximum. On weekly timescales, the SPOT station was on average representative of 64 % of profiles taken within the SPC. In general, shifts in water column profile characteristics at SPOT were also observed across the entire channel. On average, waters across the SPC were most similar to offshore profiles, suggesting that SPOT time series data would be more strongly affected by regional changes in circulation than by local coastal events. These results indicate that high-resolution in situ glider deployments can be used to quantify major modes of variability and provide context for interpreting time series data, allowing for broader application of these datasets and greater integration into modeling efforts.
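The profile-typing and representativeness ideas can be illustrated with a small sketch. The abstract does not say how profiles were grouped, so k-means with deterministic farthest-point seeding is an assumption here. Representativeness is also defined by assumption: the share of same-period channel profiles assigned to the same type as the SPOT profile. All names are hypothetical.

```python
import numpy as np

# Illustrative sketch: k-means typing and this representativeness definition
# are assumptions, not the method used in the study.


def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly take the farthest point."""
    idx = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()


def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans_profiles(X, k=4, n_iter=100):
    """Group profiles (rows of X, one value per depth bin) into k profile types."""
    X = np.asarray(X, dtype=float)
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers


def representativeness(station_type, channel_types):
    """Share of channel profiles that fall in the same type as the station profile."""
    return float(np.mean(np.asarray(channel_types) == station_type))
```

In practice, each row might be a chlorophyll, temperature, or nutrient profile interpolated onto common depth bins. Averaging `representativeness` over weekly windows would then give a figure comparable in kind, though not in method, to the 64 % reported above.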

