Real-time data-driven missing data imputation for short-term sensor data of marine systems. A comparative study

2020 ◽  
Vol 218 ◽  
pp. 108261
Author(s):  
Christian Velasco-Gallego ◽  
Iraklis Lazakis
Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1782
Author(s):  
Yulong Deng ◽  
Chong Han ◽  
Jian Guo ◽  
Lijuan Sun

Missing data is a common problem in wireless sensor networks. To preserve the performance of downstream processing, imputing the missing values is currently the most common step before sensor data analysis. This paper presents temporal and spatial nearest neighbor values-based missing data imputation (TSNN), a new imputation method based on temporal and spatial nearest neighbor values. First, four nearest neighbor values are defined from the perspective of the space and time dimensions as well as the geometrical and data distances; these form the basis of the algorithm and help exploit the correlations among sensor data on the nodes with a regression tool. Next, the algorithm is elaborated along with its two parameters, the best number of neighbors and the spatial–temporal coefficient. Finally, the algorithm is tested on an indoor and an outdoor wireless sensor network, and the results show that TSNN improves the accuracy of imputation and increases the number of cases that can be imputed effectively.
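The idea of combining temporal and spatial neighbors can be illustrated with a minimal sketch. This is not the TSNN algorithm itself: it assumes a simplified scheme in which a temporal estimate (linear interpolation between the node's nearest observed readings) is blended with a spatial estimate (inverse-distance weighting over the k geometrically nearest nodes). The function names, the parameter `alpha` (standing in for a spatial–temporal coefficient), and the sample data are all hypothetical.

```python
import math

def temporal_estimate(series, t):
    """Interpolate series[t] from the nearest observed readings before and after t."""
    prev = next((i for i in range(t - 1, -1, -1) if series[i] is not None), None)
    nxt = next((i for i in range(t + 1, len(series)) if series[i] is not None), None)
    if prev is None and nxt is None:
        return None
    if prev is None:
        return series[nxt]
    if nxt is None:
        return series[prev]
    w = (t - prev) / (nxt - prev)
    return (1 - w) * series[prev] + w * series[nxt]

def spatial_estimate(readings, positions, node, t, k):
    """Inverse-distance weighted mean over the k nearest nodes observed at time t."""
    candidates = [
        (math.dist(positions[node], positions[other]), series[t])
        for other, series in readings.items()
        if other != node and series[t] is not None
    ]
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # epsilon avoids division by zero
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

def impute(readings, positions, node, t, k=2, alpha=0.5):
    """Blend both estimates; alpha is an assumed spatial-temporal coefficient."""
    temporal = temporal_estimate(readings[node], t)
    spatial = spatial_estimate(readings, positions, node, t, k)
    if temporal is None:
        return spatial
    if spatial is None:
        return temporal
    return alpha * temporal + (1 - alpha) * spatial

# Hypothetical three-node network; node "a" is missing its reading at t = 1.
readings = {"a": [20.0, None, 22.0], "b": [19.0, 20.0, 21.0], "c": [25.0, 26.0, 27.0]}
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0)}
estimate = impute(readings, positions, "a", 1, k=2, alpha=0.5)
```

In practice, k and alpha would be tuned on held-out readings, which mirrors the two parameters the abstract mentions.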


2019 ◽  
Vol 28 (1) ◽  
pp. 58-70 ◽  
Author(s):  
Concepción Crespo-Turrado ◽  
José Luis Casteleiro-Roca ◽  
Fernando Sánchez-Lasheras ◽  
José Antonio López-Vázquez ◽  
Francisco Javier De Cos Juez ◽  
...  

Student performance and its evaluation remain a serious challenge for education systems. The recording and processing of students' scores in a specific curriculum are frequently flawed for various reasons. In this context, the absence of some student scores undermines the efficiency of any later analysis carried out in order to reach conclusions. When this is the case, missing data imputation algorithms are needed. These algorithms are capable of replacing the missing data with predicted values with a high level of accuracy. This research presents the hybridization of an algorithm previously proposed by the authors, called the adaptive assignation algorithm (AAA), with a well-known technique called multivariate imputation by chained equations (MICE). The results show that the suggested methodology outperforms both algorithms.


Proceedings ◽  
2018 ◽  
Vol 2 (11) ◽  
pp. 698 ◽  
Author(s):  
Klemen Kenda ◽  
Filip Koprivec ◽  
Dunja Mladenić

In this study, an algorithm for missing data imputation is presented. The algorithm uses measurements from neighboring sensors to estimate the missing values. A data-driven approach is used, and the methodology chooses the optimal available combination of modeling algorithm and available measurements to produce an estimate from the model with the lowest error. The methodology was tested on Ljubljana polje aquifer data and produced close to perfect results.
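A minimal sketch of this selection step follows. It is an assumption-laden illustration, not the authors' implementation: the only candidate models are one least-squares line per neighboring sensor, the error is the in-sample RMSE, and the names `fit_line`, `train_candidates`, and `impute` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def rmse(a, b, xs, ys):
    """Root mean squared error of the line on the given points."""
    return math.sqrt(sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def train_candidates(target, neighbors):
    """Fit one model per neighbor on time steps where both series are observed."""
    models = {}
    for name, series in neighbors.items():
        pairs = [(x, y) for x, y in zip(series, target) if x is not None and y is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        a, b = fit_line(xs, ys)
        models[name] = (a, b, rmse(a, b, xs, ys))
    return models

def impute(t, neighbors, models):
    """Use the lowest-error model whose input measurement is available at time t."""
    usable = [(err, name) for name, (_, _, err) in models.items()
              if neighbors[name][t] is not None]
    if not usable:
        return None
    _, best = min(usable)
    a, b, _ = models[best]
    return a + b * neighbors[best][t]

# Hypothetical data: the target sensor is missing at t = 2 and t = 4.
target = [1.0, 2.0, None, 4.0, None]
neighbors = {"n1": [2.0, 4.0, 6.0, 8.0, None], "n2": [1.0, 2.5, 2.0, 4.5, 5.0]}
models = train_candidates(target, neighbors)
filled_t2 = impute(2, neighbors, models)  # n1 is available and fits best
filled_t4 = impute(4, neighbors, models)  # n1 is missing, so n2 is used
```

A held-out error would be a fairer selection criterion than the in-sample RMSE used here; the in-sample version keeps the sketch short.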


2020 ◽  
Author(s):  
Fabio Oriani ◽  
Simon Stisen ◽  
Mehmet C. Demirel ◽  
Gregoire Mariethoz

In the era of big data, missing data imputation remains a delicate topic, both for the analysis of natural processes and for providing input data to physical models. We propose here a comparative study of missing data imputation on daily rainfall, a variable that can exhibit a complex structure composed of a dry/wet pattern and anisotropic sharp variations.

The seven algorithms considered can be grouped into two families: geostatistical interpolation techniques based on inverse-distance weighting and Kriging, widely used in gap-filling [1], and data-driven techniques based on the analysis of historical data patterns. The latter family has already been applied to rainfall generation [2, 3], but it was not originally suited to historical datasets with many data gaps. This is because these algorithms usually operate in a rigid framework where, when a rainfall value is estimated for one station, the other stations are treated as predictor variables and must be informed. To overcome this limitation, we propose here i) an adaptation of k-nearest neighbor (KNN) and ii) a new algorithm called Vector Sampling (VS), which combines concepts of multiple-point statistics and resampling. These data-driven algorithms can draw estimates from largely and variably incomplete data patterns, allowing the target dataset to serve as the training dataset at the same time.

Tested on different case studies from Denmark, Australia, and Switzerland, the algorithms perform differently in a way that seems to be related to terrain type: on flat terrain with spatially uniform rain events, geostatistical interpolation tends to minimize the error, while in mountainous regions with non-stationary rainfall statistics, data mining can better recover the complex rainfall patterns. The VS algorithm, being faster than KNN and requiring minimal parametrization, turns out to be a convenient option for routine application if a representative historical dataset is available. VS is open-source and freely available at .
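One way a nearest-neighbor search can tolerate incomplete patterns is sketched below. It is an assumed simplification, not the adaptation proposed by the authors: days are compared only on the stations observed on both days, and the estimate is the plain mean of the k most similar days. The function names and the rainfall values are hypothetical.

```python
def pattern_distance(day_a, day_b, skip):
    """Mean absolute difference over stations observed on both days, excluding `skip`."""
    shared = [
        abs(day_a[s] - day_b[s])
        for s in day_a
        if s != skip and day_a[s] is not None and day_b.get(s) is not None
    ]
    return sum(shared) / len(shared) if shared else None

def knn_impute(days, target_day, station, k=2):
    """Average the station's value over the k most similar days that observed it."""
    query = days[target_day]
    scored = []
    for i, day in enumerate(days):
        if i == target_day or day.get(station) is None:
            continue
        d = pattern_distance(query, day, station)
        if d is not None:  # skip days sharing no observed station with the query
            scored.append((d, i))
    if not scored:
        return None
    nearest = sorted(scored)[:k]
    return sum(days[i][station] for _, i in nearest) / len(nearest)

# Hypothetical daily rainfall (mm) at stations A, B, C; None marks a gap.
days = [
    {"A": 0.0, "B": 0.0, "C": 0.0},
    {"A": 5.0, "B": None, "C": 6.0},
    {"A": 4.0, "B": 5.0, "C": None},
    {"A": None, "B": 4.5, "C": 5.0},  # station A is missing on this day
    {"A": 0.5, "B": 0.0, "C": None},
]
estimate = knn_impute(days, 3, "A", k=2)
```

Because the same incomplete table supplies both the query day and the candidate days, the target dataset doubles as the training dataset, as the abstract describes.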


2019 ◽  
Vol 50 (3) ◽  
pp. 860-877 ◽  
Author(s):  
Jie Lin ◽  
NianHua Li ◽  
Md Ashraful Alam ◽  
Yuqing Ma

Due to cluster instability, not in the cluster monitoring system. This paper focuses on missing data imputation for the cluster monitoring application and proposes a new hybrid multiple imputation framework. The new approach differs from conventional multiple imputation techniques in that it attempts to impute the missing data for an arbitrary missing pattern with an architecture that combines model-based and data-driven components. Essentially, a deep neural network, acting as the data model, extracts deep features from the data; these deep features are then processed by regression or data-driven strategies to estimate the missing data under the arbitrary missing pattern. This paper gives evidence that if a deep neural network can be trained to construct the deep features of the data, imputation based on the deep features is better than imputation performed directly on the original data. In the experiments, the proposed method is compared with other conventional multiple imputation approaches for varying missing data patterns, missing ratios, and different datasets, including real cluster data. The results illustrate that when the data has a larger missing ratio and various missing patterns, the proposed algorithm achieves more accurate and stable imputation performance.

