Extended isolation forest – application to outlier detection in geomagnetic data

2021 ◽  
Vol 929 (1) ◽  
pp. 012022
Author(s):  
S A Imashev

Abstract The aim of this study is to present a method for detecting outliers in time series of the total intensity of the geomagnetic field using the Extended Isolation Forest algorithm. The method consists of three steps: 1) generation of additional features that take into account the regular daily variation and smooth behaviour of normal data, 2) detection of potential outliers with an ensemble of extended isolation trees and 3) subsequent refinement based on the difference between each potential outlier and its replacement with an interpolated value. Applying the method to yearly time series of the total geomagnetic field at the Ak-Suu and Kegety stations showed that the algorithm identifies both global and contextual outliers. The average classification metrics are high: precision 94.3%, recall 93.9% and F-score 94.5%. The probabilities of errors of the first and second kind are comparable to those of similar algorithms used for detecting outliers in magnetograms of different sampling rates.
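The third step of the method (refinement against an interpolated replacement) can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme or the decision rule, so the linear interpolation, the function name `refine_outliers` and the `threshold` parameter below are assumptions made purely for illustration, not the author's implementation.

```python
def refine_outliers(series, candidates, threshold):
    """Keep only the candidates that differ strongly from an interpolated value.

    series     -- list of field values (e.g. in nT), evenly sampled
    candidates -- indices flagged as potential outliers by an earlier step
    threshold  -- hypothetical cut-off on |value - interpolated value|
    """
    candidate_set = set(candidates)
    confirmed = []
    for i in sorted(candidate_set):
        # Find the nearest samples on each side that are not candidates.
        left = i - 1
        while left >= 0 and left in candidate_set:
            left -= 1
        right = i + 1
        while right < len(series) and right in candidate_set:
            right += 1
        if left < 0 or right >= len(series):
            # Assumption: without a neighbour on both sides we cannot
            # interpolate, so the candidate is left unconfirmed.
            continue
        # Linear interpolation between the two clean neighbours.
        weight = (i - left) / (right - left)
        replacement = series[left] + weight * (series[right] - series[left])
        if abs(series[i] - replacement) > threshold:
            confirmed.append(i)
    return confirmed


# Made-up values: index 3 is a spike, index 6 is a false alarm.
values = [100.0, 100.2, 100.4, 130.0, 100.8, 101.0, 101.3, 101.4]
print(refine_outliers(values, candidates=[3, 6], threshold=5.0))
```

In this toy series the spike at index 3 is confirmed, while the candidate at index 6 is discarded because it lies close to the line joining its neighbours.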

Author(s):  
Andrei Vorobev ◽  
Vyacheslav Pilipenko ◽  
Gulnara Vorobeva ◽  
Olga Khristodulo

Introduction: Magnetic stations are one of the main tools for observing the geomagnetic field. However, gaps and anomalies in time series of geomagnetic data, which often exceed 30% of the number of recorded values, negatively affect the effectiveness of the implemented approach and complicate the application of mathematical tools which require that the information signal is continuous. Besides, the missing values add extra uncertainty in computer simulation of dynamic spatial distribution of geomagnetic variations and related parameters. Purpose: To develop a methodology for improving the efficiency of technical means for observing the geomagnetic field. Method: Creation of problem-oriented digital twins of magnetic stations, and their integration into the collection and preprocessing of geomagnetic data, in order to simulate the functioning of their physical prototypes with a certain accuracy. Results: Using Kilpisjärvi magnetic station (Finland) as an example, it is shown that the use of digital twins, whose information environment is made up of geomagnetic data from adjacent stations, can provide the opportunity for reconstruction (retrospective forecast) of geomagnetic variation parameters with a mean square error in the auroral zone of up to 11.5 nT. The integration of problem-oriented digital twins of magnetic stations into the processes of collecting and registering geomagnetic data can provide automatic identification and replacement of missing and abnormal values, increasing, due to the redundancy effect, the fault tolerance of the magnetic station as a data source object. For example, the digital twin of Kilpisjärvi station recovers 99.55% of annual information, and 86.73% of it has an error not exceeding 12 nT.
Discussion: Due to the spatial anisotropy of geomagnetic field parameters, the error at the digital twin output will be different in each specific case, depending on the geographic location of the magnetic station, as well as on the number of the surrounding magnetic stations and the distance to them. However, this problem can be minimized by integrating geomagnetic data from satellites into the information environment of the digital twin. Practical relevance: The proposed methodology provides the opportunity for automated diagnostics of time series of geomagnetic data for outliers and anomalies, as well as restoration of missing values and identification of small-scale disturbances.


2020 ◽  
Vol 9 (3) ◽  
pp. 336-345
Author(s):  
Alvi Waldira ◽  
Abdul Hoyyi ◽  
Dwi Ispriyanti

Transportation plays a strategic role and has become one of the community's main needs, especially air transportation services. The number of air passengers varies from month to month. One source of this variation is the approach of Eid al-Fitr, whose date shifts every year because it follows the Islamic calendar rather than the Gregorian (Masehi) calendar. This shift in the timing of Eid al-Fitr forms a pattern called calendar variation. The effects of calendar variation can be handled with an additional variable, such as a dummy variable, which is then used in the ARIMAX model. Time series observations are also often affected by unexpected events that appear as outliers, which make the results of data analysis less valid, so outlier detection was added to this study. Based on the analysis results, the ARIMA calendar variation model obtained is (1.0, [12]), with a time variable t, a dummy variable, and the addition of one outlier. This model has a MAPE value of 0.07079609, which means it is very good for forecasting. The forecasts showed an increase in the number of passengers during the two months before Eid. Keywords: Passenger, calendar variation, outlier detection
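As a rough illustration of two ingredients named in this abstract, the sketch below builds a 0/1 calendar dummy variable and computes the mean absolute percentage error (MAPE). The function names, the "YYYY-MM" period labels and the sample numbers are hypothetical, and the abstract does not state how its dummy variable was defined or whether its MAPE is expressed as a fraction or a percentage; here MAPE is returned as a fraction.

```python
def calendar_dummy(periods, event_periods):
    """Return 1 for periods affected by the calendar event, 0 otherwise."""
    return [1 if p in event_periods else 0 for p in periods]


def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.1 = 10%)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the observed value.
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative values only, not data from the study.
months = ["2019-04", "2019-05", "2019-06", "2019-07"]
print(calendar_dummy(months, {"2019-06"}))
print(mape([100, 200], [110, 180]))
```

A dummy series of this kind would be passed as an exogenous regressor alongside the ARIMA terms; the MAPE is then evaluated on the resulting forecasts.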


2021 ◽  
Vol 54 (3) ◽  
pp. 1-33
Author(s):  
Ane Blázquez-García ◽  
Angel Conde ◽  
Usue Mori ◽  
Jose A. Lozano

Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art on unsupervised outlier detection techniques in the context of time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique.


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0247119
Author(s):  
Gen Li ◽  
Jason J. Jung

Existing dynamic graph embedding-based outlier detection methods mainly focus on the evolution of graphs and ignore the similarities among them. To overcome this limitation for the effective detection of abnormal climatic events from meteorological time series, we proposed a dynamic graph embedding model based on graph proximity, called DynGPE. Climatic events are represented as a graph where each vertex indicates meteorological data and each edge indicates a spurious relationship between two meteorological time series that are not causally related. The graph proximity is described as the distance between two graphs. DynGPE can cluster similar climatic events in the embedding space. Abnormal climatic events are distant from most of the other events and can be detected using outlier detection methods. We conducted experiments by applying three outlier detection methods (i.e., isolation forest, local outlier factor, and box plot) to real meteorological data. The results showed that DynGPE achieves better results than the baseline by 44.3% on average in terms of the F-measure. Isolation forest provides the best performance and stability. It achieved higher results than the local outlier factor and box plot methods, namely, by 15.4% and 78.9% on average, respectively.
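Of the three outlier detection methods compared here, the box plot rule is the simplest to write down. The following is a minimal sketch of the conventional rule (flag values beyond 1.5 interquartile ranges from the quartiles), applied to a plain list of scores. The abstract does not say which quantity the rule was applied to or which whisker length was used, so the input data, the function name and the `whisker=1.5` default are assumptions.

```python
import statistics


def boxplot_outliers(values, whisker=1.5):
    """Return indices of values outside [Q1 - w*IQR, Q3 + w*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up scores: the last value is far from the rest.
scores = [10, 11, 12, 11, 10, 12, 11, 50]
print(boxplot_outliers(scores))
```

Because the rule looks only at the marginal distribution of a single score, it has no notion of local density, which is one plausible reason it might trail isolation forest and local outlier factor on the same data.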


2009 ◽  
Vol 27 (6) ◽  
pp. 2483-2490
Author(s):  
P. De Michelis ◽  
R. Tozzi ◽  
A. Meloni

Abstract. The target of this work is to investigate the nature of magnetic perturbations produced by ionospheric and magnetospheric currents as recorded at high-latitude geomagnetic stations. In particular, we investigate the effects of these currents on geomagnetic data recorded in Antarctica. To this purpose we apply a mathematical method, known as Natural Orthogonal Composition, to analyze the magnetic field disturbances along the three geomagnetic field components (X, Y and Z) recorded at Mario Zucchelli Station (IAGA code TNB; geographic coordinates: 74.7° S, 164.1° E) from 1995 to 1998. Using this type of analysis, we characterize the dominant modes of the geomagnetic field daily variability through a set of empirical orthogonal functions (EOFs). While such mathematically independent EOFs do not necessarily represent physically independent modes of variability, we find that some of them are actually related to well known current patterns located at high latitudes.


2021 ◽  
Vol 73 (1) ◽  
Author(s):  
Magnus D. Hammer ◽  
Grace A. Cox ◽  
William J. Brown ◽  
Ciarán D. Beggan ◽  
Christopher C. Finlay

Abstract We present geomagnetic main field and secular variation time series, at 300 equal-area distributed locations and at 490 km altitude, derived from magnetic field measurements collected by the three Swarm satellites. These Geomagnetic Virtual Observatory (GVO) series provide a convenient means to globally monitor and analyze long-term variations of the geomagnetic field from low-Earth orbit. The series are obtained by robust fits of local Cartesian potential field models to along-track and East–West sums and differences of Swarm satellite data collected within a radius of 700 km of the GVO locations during either 1-monthly or 4-monthly time windows. We describe two GVO data products: (1) ‘Observed Field’ GVO time series, where all observed sources contribute to the estimated values, without any data selection or correction, and (2) ‘Core Field’ GVO time series, where additional data selection is carried out, then de-noising schemes and epoch-by-epoch spherical harmonic analysis are applied to reduce contamination by magnetospheric and ionospheric signals. Secular variation series are provided as annual differences of the Core Field GVOs. We present examples of the resulting Swarm GVO series, assessing their quality through comparisons with ground observatories and geomagnetic field models. In benchmark comparisons with six high-quality mid-to-low latitude ground observatories we find the secular variation of the Core Field GVO field intensities, calculated using annual differences, agrees to an rms of 1.8 nT/yr and 1.2 nT/yr for the 1-monthly and 4-monthly versions, respectively. Regular sampling in space and time, and the availability of data error estimates, make the GVO series well suited for users wishing to perform data assimilation studies of core dynamics, or to study long-period magnetospheric and ionospheric signals and their induced counterparts.
The Swarm GVO time series will be regularly updated, approximately every four months, allowing ready access to the latest secular variation data from the Swarm satellites.
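The idea of secular variation as annual differences, and of an rms comparison between two such series, can be sketched as follows. This is only an illustration under simple assumptions: evenly sampled series, a plain difference between values one year apart, and hypothetical function names and numbers. It does not reproduce the GVO processing chain.

```python
import math


def annual_differences(values, samples_per_year=12):
    """Difference between samples one year apart (nT/yr for input in nT).

    Use samples_per_year=12 for 1-monthly series and 3 for 4-monthly series.
    """
    return [
        values[i + samples_per_year] - values[i]
        for i in range(len(values) - samples_per_year)
    ]


def rms_difference(a, b):
    """Root-mean-square difference between two equally long series."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic monthly field intensity rising by 2 nT per month.
monthly = [50000 + 2 * i for i in range(15)]
print(annual_differences(monthly))
print(rms_difference([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Differencing values exactly one year apart also suppresses signals with an annual period, which is a common motivation for this definition.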


2021 ◽  
Vol 723 (4) ◽  
pp. 042070
Author(s):  
I Vorotnikov ◽  
A Rozanov ◽  
M Sidelnikova ◽  
S Tkachev ◽  
L Volochuk

Author(s):  
Saeed Mehrang ◽  
Elina Helander ◽  
Misha Pavel ◽  
Angela Chieh ◽  
Ilkka Korhonen
