A Safe Zone SMOTE Oversampling Algorithm Used in Earthquake Prediction Based on Extreme Imbalanced Precursor Data

Author(s):  
Dongmei Wang ◽  
Yiwen Liang ◽  
Xinmin Yang ◽  
Hongbin Dong ◽  
Chengyu Tan

Earthquake prediction based on extremely imbalanced precursor data is a challenging task for standard algorithms, because even in an earthquake-prone zone the days on which earthquakes occur make up only a small minority of each year. The general remedy is to generate more artificial data for the minority class, that is, the earthquake occurrence data. However, the most popular oversampling methods generate synthetic samples along line segments that join minority class instances, which is not suitable for earthquake precursor data. In this paper, we propose the Safe Zone Synthetic Minority Oversampling Technique (SZ-SMOTE), an oversampling method that enhances the SMOTE data generation mechanism. SZ-SMOTE generates synthetic samples with a concentration mechanism in the hyper-sphere area around each selected minority instance. The performance of SZ-SMOTE is compared against no oversampling, SMOTE, and its popular modifications adaptive synthetic sampling (ADASYN) and borderline SMOTE (B-SMOTE) on six different classifiers. The experimental results show that earthquake prediction quality with SZ-SMOTE as the oversampling algorithm significantly exceeds that obtained with the other oversampling algorithms.
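The abstract above does not spell out how the hyper-sphere or the concentration mechanism is defined, so the following is only a minimal sketch of the general idea, not the authors' SZ-SMOTE. It assumes (hypothetically) that the "safe" radius around a minority instance is a fraction `shrink` of its distance to the nearest majority instance, and that concentration is obtained by raising a uniform random number to the power `concentration`, which biases samples toward the centre. The function names and both parameters are illustrative.

```python
import math
import random


def sample_in_hypersphere(center, radius, rng, concentration=2.0):
    """Draw one point inside the ball of the given radius around `center`.

    concentration > 1 pulls samples toward the centre (assumed mechanism).
    """
    # Random direction: normalise a vector of independent Gaussians.
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Random distance in [0, radius); the exponent biases it toward 0.
    dist = radius * rng.random() ** concentration
    return [c + dist * d / norm for c, d in zip(center, direction)]


def safe_zone_oversample(minority, majority, n_new, seed=0,
                         shrink=0.5, concentration=2.0):
    """Generate `n_new` synthetic minority points around random minority seeds."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Assumed safe radius: a fraction of the distance to the
        # nearest majority-class sample.
        radius = shrink * min(math.dist(x, m) for m in majority)
        synthetic.append(sample_in_hypersphere(x, radius, rng, concentration))
    return synthetic
```

Unlike SMOTE's interpolation along line segments, every point produced here stays inside a ball centred on an existing minority instance, so under the stated assumption it cannot land closer to a majority sample than `shrink` allows.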

2021 ◽  
Vol 13 (9) ◽  
pp. 1842
Author(s):  
Valeri Gitis ◽  
Alexander Derendyaev ◽  
Konstantin Petrov

The results of earthquake prediction largely depend on the quality of data and the methods of their joint processing. At present, for a number of regions, it is possible, in addition to data from earthquake catalogs, to use space geodesy data obtained with the help of GPS. The purpose of our study is to evaluate the efficiency of using the time series of displacements of the Earth’s surface according to GPS data for the systematic prediction of earthquakes. The criterion of efficiency is the probability of successful prediction of an earthquake with a limited size of the alarm zone. We use a machine learning method, namely the method of the minimum area of alarm, to predict earthquakes with a magnitude greater than 6.0 and a hypocenter depth of up to 60 km, which occurred from 2016 to 2020 in Japan, and earthquakes with a magnitude greater than 5.5 and a hypocenter depth of up to 60 km, which occurred from 2013 to 2020 in California. For each region, we compare the following results: random forecast of earthquakes, forecast obtained with the field of spatial density of earthquake epicenters, and forecasts obtained with spatio-temporal fields based on GPS data, on seismological data, and on combined GPS and seismological data. The results confirm the effectiveness of using GPS data for the systematic prediction of earthquakes.


2013 ◽  
Vol 13 (10) ◽  
pp. 2605-2618 ◽  
Author(s):  
Q. Li ◽  
G.-M. Xu

Abstract. We found a possible correlation between the precursory pattern of tidal triggering of earthquakes and crustal heterogeneities, which is of particular importance to researchers in earthquake prediction and earthquake hazard prevention. We investigated the connection between tidal variations and earthquake occurrence in the Liyang, Wunansha, Cangshan, Wenan, Luquan and Yaoan regions of China. Most of the regions show a higher correlation with tidal triggering in the several years preceding large or destructive earthquakes than at other times, indicating that tidal triggering may inherently relate to the nucleation of the destructive earthquakes during this time. In addition, the analysis results indicate that the Liyang, Cangshan and Luquan regions, with stronger heterogeneity, show statistically significant effects of tidal triggering preceding large or destructive earthquakes, while the Wunansha, Wenan and Yaoan regions, with relatively weak heterogeneity, show statistically insignificant effects, signifying that the precursory pattern of tidal triggering of earthquakes in these six regions is possibly related to the heterogeneities of the crustal rocks. These results suggest that when trying to identify potentially earthquake-hazardous areas or to make middle–long-term earthquake forecasts by means of the precursory pattern of tidal triggering, the crustal heterogeneity of these areas has to be taken into consideration in order to increase the prediction efficiency. If the influence of crustal heterogeneity on the tidal triggering of earthquakes is not considered, the prediction efficiency might decrease greatly.


2018 ◽  
Vol 18 (03) ◽  
pp. e23 ◽  
Author(s):  
María José Basgall ◽  
Waldo Hasperué ◽  
Marcelo Naiouf ◽  
Alberto Fernández ◽  
Francisco Herrera

The volume of data in today's applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario involves scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually more difficult to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples among the classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
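As an illustration of the neighborhood-based generation mentioned in the abstract above, here is a minimal single-machine sketch of classic SMOTE interpolation. It is not SMOTE-BD and uses no Spark or distributed processing; the function name `smote`, the brute-force neighbour search, and the parameters `k` and `seed` are illustrative assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself (brute force).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        # Pick a random position on the segment from x to the neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

The brute-force neighbour search costs a full pass over the minority class for every generated point, which is the kind of step a scalable design has to distribute or approximate.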


The paper describes an investigation carried out at the National Physical Laboratory to determine the colorimetric properties of a group of seven subjects as obtained from direct measurements of the trichromatic coefficients of the spectrum on a trichromatic colorimeter. The “spectral distribution curves of the primaries,” by means of which the colorimetric quality of a heterochromatic stimulus may be computed from its energy distribution curve, are obtained by combining the experimentally determined trichromatic coefficients with the International Standard visibility curve. This procedure is a simplification, applicable to the mean results of a normal group, of a general method by which the chromatic and luminosity functions of any subject or group of subjects can be determined from one set of observations. The general method is described in an Appendix.


Author(s):  
CHENGYING MAO ◽  
XINXIN YU

The quality of test data has an important impact on the effectiveness of software testing, so test data generation has always been a key task for finding potential faults in program code. In structural testing, the primary goal is to cover certain kinds of structural elements with specific inputs. Search-based test data generation provides a rational way to handle this difficult problem. In the past, some well-known meta-heuristic search algorithms have been successfully applied to it. In this paper, we introduce a variant of the genetic algorithm (GA), called the quantum-inspired genetic algorithm (QIGA), to generate test data with stronger coverage ability. In this new algorithm, the traditional binary bit is replaced by a quantum bit (Q-bit) to enlarge the search space and so avoid falling into a local optimum. Other strategies, such as the quantum rotation gate and the catastrophe operation, are also used to improve algorithm efficiency and the quality of test data. In addition, an experimental analysis on eight real-world programs is performed to validate the effectiveness of our method. The results show that the QIGA-based method can generate test data with higher coverage, and converges in far fewer generations, than the GA-based method. More importantly, our proposed method is more robust to changes in algorithm parameters.
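The Q-bit representation and rotation gate named in the abstract above can be sketched generically as follows. This is a textbook-style quantum-inspired GA on a toy "count the ones" fitness, not the authors' QIGA: it omits the catastrophe operation and any test-coverage fitness, and the names `observe`, `rotate`, `qiga_onemax` and the step size `delta` are illustrative assumptions. Each Q-bit is stored as an angle `t` with amplitudes `(cos t, sin t)`.

```python
import math
import random


def observe(thetas, rng):
    """Collapse each Q-bit to a classical bit; P(bit = 1) = sin(t)**2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def rotate(thetas, bits, best_bits, delta=0.05 * math.pi):
    """Rotation gate: nudge each angle toward the best solution's bit."""
    new = []
    for t, b, best in zip(thetas, bits, best_bits):
        if b != best:
            t += delta if best == 1 else -delta
        # Clamp to [0, pi/2] so a larger angle always means a higher P(bit = 1).
        new.append(min(max(t, 0.0), math.pi / 2))
    return new


def qiga_onemax(n_bits=16, pop_size=6, generations=60, seed=0):
    """Toy run: maximise the number of ones in a bit string."""
    rng = random.Random(seed)
    # Every Q-bit starts in an equal superposition (angle pi/4).
    pop = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best_bits, best_fit = None, -1
    for _ in range(generations):
        observed = [observe(ind, rng) for ind in pop]
        for bits in observed:
            fit = sum(bits)  # toy fitness, stands in for coverage
            if fit > best_fit:
                best_bits, best_fit = bits, fit
        pop = [rotate(ind, bits, best_bits) for ind, bits in zip(pop, observed)]
    return best_bits, best_fit
```

Because each individual is a vector of probabilities rather than a fixed bit string, a single individual can yield different candidate inputs on each observation, which is the sense in which the Q-bit encoding widens the search.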


2012 ◽  
Vol 09 (03) ◽  
pp. 1250024
Author(s):  
KARTLOS J. KACHIASHVILI ◽  
MUNTAZIM A. HASHMI ◽  
ABDUL MUEED

This work formalizes the problem of sustainable development of production, i.e., the optimal choice of parameter values of a technological process so as to minimize the risk of obtaining production of unplanned quality and of making incorrect decisions about production quality, while maximizing the profit of production at guaranteed social and economic effects. Different statements of the problem, depending on the ultimate purpose set, are considered. A general method for solving the stated task, using the Bayesian approach to testing many hypotheses, is offered.


Author(s):  
Alain Combescure ◽  
Najib Mahjoubi ◽  
Anthony Gravouil ◽  
Nicolas Greffet

This paper briefly presents recent research results on structural mechanics code coupling in transient analysis. The domain is assumed to be decomposed into a series of subdomains, which are treated independently, each with its own time integration scheme and/or its own code. The paper gives a general method for coupling these subdomains, based on a weak formulation of the dynamic equilibrium equation. This formulation makes it possible to design a coupling strategy that ensures, by construction, that no energy is introduced or dissipated at the interfaces between the subdomains. The proposed coupling method hence does not degrade the quality of the time integrators of the individual subdomains. It also makes it possible to develop a general code coupler for transient dynamics. Two examples are given to illustrate the method.

