Technical note: A procedure to clean, decompose and aggregate time series
Abstract. Errors, gaps and outliers complicate and sometimes invalidate the analysis of time series. Although most fields have developed their own strategies for cleaning raw data, no generic procedure has been promoted to standardize this pre-processing. This lack of harmonization makes studies difficult to compare and leads to screening methods that are often ambiguous or case-specific. This study provides a generic pre-processing procedure (called past, implemented in R) applicable to any univariate time series. Based on data binning, past decomposes the time series into a long-term trend and a cyclic component (quantified by a new metric, the Stacked Cycles Index) and then aggregates the data. Outliers are flagged with an enhanced boxplot rule called Logbox. Three Earth science datasets contaminated with gaps and outliers are successfully cleaned and aggregated with past, illustrating the robustness of a procedure that can be valuable to any discipline.
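To make the general idea of binning, outlier flagging and aggregation concrete, the following is a minimal Python sketch and not the past implementation, which is written in R. It makes several assumptions that do not come from this abstract: gaps are encoded as None, bins have a fixed width, the aggregate is the per-bin median, and outliers are flagged with the classic boxplot rule (Tukey fences at 1.5 times the interquartile range) as a stand-in for Logbox, whose modification is not specified here. The function names bin_index, boxplot_flags and clean_and_aggregate are hypothetical, and the trend/cycle decomposition and the Stacked Cycles Index are not attempted.

```python
from statistics import median, quantiles


def bin_index(t, width):
    """Return the index of the fixed-width bin that contains time t."""
    return int(t // width)


def boxplot_flags(values, k=1.5):
    """Flag values outside the classic boxplot fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: this is the standard Tukey rule, used here only as a
    stand-in; it is NOT the Logbox rule named in the abstract.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def clean_and_aggregate(times, values, width):
    """Drop gaps, drop flagged outliers, and return the median of each bin."""
    # Remove gaps (assumed to be encoded as None).
    pairs = [(t, v) for t, v in zip(times, values) if v is not None]
    # Flag outliers over the whole series (a simplification: no detrending).
    flags = boxplot_flags([v for _, v in pairs])
    # Collect the retained values per bin.
    bins = {}
    for (t, v), is_outlier in zip(pairs, flags):
        if not is_outlier:
            bins.setdefault(bin_index(t, width), []).append(v)
    # Aggregate each bin with its median.
    return {b: median(vs) for b, vs in sorted(bins.items())}


# Hypothetical toy series: one gap (None) and one obvious outlier (50.0).
times = list(range(12))
values = [1.0, 1.2, 0.9, None, 1.1, 50.0, 1.0, 1.3, 0.8, 1.1, 1.2, 1.0]
aggregated = clean_and_aggregate(times, values, width=4)
print(aggregated)
```

In this toy series the gap is skipped and the value 50.0 falls outside the fences, so neither contributes to the bin medians. Flagging on the raw series is only reasonable when the trend and the cycles are small relative to the spread of the data; otherwise the flagging would need to be applied to residuals after decomposition.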