A Machine Learning Approach for Data Quality Control of Earth Observation Data Management System

Weiguo Han
Matthew Jochum
Edlira Skrami
Flavia Carle
Simona Villani
Paola Borrelli
Antonella Zambon

The purpose of the study was to map and describe the healthcare utilization databases (HUDs) available in Italy’s 19 regions and two autonomous provinces and develop a tool to navigate through them. A census of the HUDs covering the population of a single region/province and recording local-level data was conducted between January 2014 and October 2016. The characteristics of each HUD regarding the start year, data type and completeness, data management system (DMS), data protection procedures, and data quality control adopted were collected through interviews with the database managers using a standard questionnaire or directly from the website of the regional body managing them. Overall, 352 HUDs met the study criteria. The DMSs, anonymization procedures of personal identification data, and frequency of data quality control were fairly homogeneous within regions, whereas the number of HUDs, data availability, type of identification code, and anonymization procedures were considerably heterogeneous across regions. The study provides an updated inventory of the available regional HUDs in Italy and highlights the need for greater homogeneity across regions to improve comparability of health data from secondary sources. It could represent a reference model for other countries to provide information on the available HUDs and their features, enhancing epidemiological studies across countries.

2020
Dario Lucente
Freddy Bouchet
Corentin Herbert

There is growing interest in the climate community in improving the prediction of high-impact climate events, for instance ENSO (El Niño-Southern Oscillation) or extreme events, using a combination of model and observation data. In this talk we present a machine learning approach for predicting the committor function, which is the relevant concept for such predictions.

Because the dynamics of the climate system are chaotic, one usually distinguishes between time scales much shorter than a Lyapunov time, for which a deterministic weather forecast is relevant, and time scales much longer than a mixing time, beyond which any deterministic forecast is irrelevant and only climate-averaged or probabilistic quantities can be predicted. However, for most applications, the greatest interest lies in intermediate time scales, for which some information more precise than the climate averages might be predicted but for which a deterministic forecast is not relevant. We call this range of time scales the predictability margin. We stress in this talk that the prediction problem at the predictability margin is of a probabilistic nature. Indeed, such time scales might typically be of the order of the Lyapunov time scale or larger, where errors in the initial condition and model errors limit our ability to compute the evolution deterministically. In this talk we explain that, in a dynamical context, the relevant quantity for predicting a future event at the predictability margin is a committor function. A committor function is the probability that an event will occur in the future, as a function of the current state of the system.

We compute and discuss the committor function from data, either through a direct approach or through a machine learning approach using neural networks. We discuss two examples: a) the computation of the committor function for the Jin and Timmermann model, a low-dimensional model proposed to explain the decadal amplitude changes of El Niño, and b) the computation of the committor function for extreme heat waves. We compare several machine learning approaches, using neural networks or kernel-based analogue methods.

From the point of view of climate extremes, our main conclusion is that one should generically distinguish between states with either intrinsic predictability or intrinsic unpredictability. This predictability concept is markedly different from the deterministic unpredictability arising from chaotic dynamics and exponential sensitivity to initial conditions.
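The direct approach to estimating a committor function from data can be illustrated with a minimal sketch. It assumes a toy symmetric random walk in place of climate data; the walk, the target sets A and B, and the function names `simulate_walk` and `direct_committor` are illustrative assumptions, not the setup used in the talk. Each point of the recorded trajectory is labelled by whether the trajectory next reaches B (the "event") or A, and the labels are averaged per state.

```python
import random


def simulate_walk(n_steps, n_states, seed=0):
    """Symmetric random walk on 0..n_states-1, clamped at both ends.

    Hypothetical toy stand-in for an observed or simulated time series.
    """
    rng = random.Random(seed)
    x = n_states // 2
    traj = [x]
    for _ in range(n_steps):
        x = min(max(x + rng.choice((-1, 1)), 0), n_states - 1)
        traj.append(x)
    return traj


def direct_committor(traj, in_a, in_b, n_states):
    """Estimate q(x) = P(reach B before A | current state x) by counting.

    The trajectory is scanned backwards so that `next_hit` always holds
    the label of the first set (A -> 0, B -> 1) reached at or after the
    current time. Points after the last visit to A or B are skipped
    because their outcome is not observed.
    """
    hits_b = [0] * n_states
    visits = [0] * n_states
    next_hit = None
    for x in reversed(traj):
        if in_b(x):
            next_hit = 1
        elif in_a(x):
            next_hit = 0
        if next_hit is not None:
            visits[x] += 1
            hits_b[x] += next_hit
    return [h / v if v else float("nan") for h, v in zip(hits_b, visits)]


# Assumed toy setup: A is the left end, B is the right end of the walk.
n_states = 11
traj = simulate_walk(200_000, n_states, seed=0)
q = direct_committor(traj, lambda x: x == 0, lambda x: x == n_states - 1, n_states)
for state, value in enumerate(q):
    print(state, round(value, 3))
```

For this toy walk the exact committor is linear, q(i) = i / (n_states - 1), so the printed estimates can be checked against it. In a high-dimensional system the per-state counting is no longer feasible, which is where the neural network and analogue methods mentioned in the abstract come in.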
