A Safe Zone SMOTE Oversampling Algorithm Used in Earthquake Prediction Based on Extreme Imbalanced Precursor Data

Earthquake prediction based on extremely imbalanced precursor data is a challenging task for standard algorithms, because even in an earthquake-prone zone the days on which earthquakes occur make up only a small minority of each year. The general remedy is to generate additional artificial data for the minority class, that is, the earthquake occurrence data. However, the most popular oversampling methods generate synthetic samples along line segments that join minority class instances, which is not suitable for earthquake precursor data. In this paper, we propose the Safe Zone Synthetic Minority Oversampling Technique (SZ-SMOTE), an oversampling method that enhances the SMOTE data generation mechanism. SZ-SMOTE generates synthetic samples with a concentration mechanism in the hyper-sphere area around each selected minority instance. The performance of SZ-SMOTE is compared against no oversampling, SMOTE, and its popular modifications adaptive synthetic sampling (ADASYN) and borderline SMOTE (B-SMOTE) on six different classifiers. The experimental results show that earthquake prediction using SZ-SMOTE as the oversampling algorithm significantly outperforms prediction using the other oversampling algorithms.
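
The abstract does not spell out the concentration mechanism, so the following is only a minimal sketch of the general idea of sampling inside a hyper-sphere around a minority instance rather than along a line segment. The function names, the fixed `radius`, and the `concentration` exponent are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def sample_in_hypersphere(center, radius, concentration=2.0, rng=random):
    """Draw one synthetic point inside the hyper-sphere around `center`.

    Assumption: `concentration` > 1 pulls points towards the centre. This is
    an illustrative stand-in, not the mechanism defined in the paper.
    """
    # Random direction: normalise a vector of independent Gaussian draws.
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Random distance in [0, radius); a power > 1 favours small distances.
    distance = radius * rng.random() ** concentration
    return [c + distance * d / norm for c, d in zip(center, direction)]


def oversample(minority, n_new, radius, concentration=2.0, seed=0):
    """Generate `n_new` synthetic points around randomly chosen minority points."""
    rng = random.Random(seed)
    return [
        sample_in_hypersphere(rng.choice(minority), radius, concentration, rng)
        for _ in range(n_new)
    ]


if __name__ == "__main__":
    earthquake_days = [[0.0, 0.0], [10.0, 10.0]]  # toy minority class
    synthetic = oversample(earthquake_days, n_new=5, radius=1.0)
    for point in synthetic:
        print(point)
```

In this sketch every synthetic point stays within `radius` of an existing minority instance, which is what distinguishes it from interpolation between two instances; how the radius of the safe zone should be chosen is left open here.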

Analyzing the Performance of GPS Data for Earthquake Prediction

Remote Sensing ◽

10.3390/rs13091842 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1842

Author(s):

Valeri Gitis ◽

Alexander Derendyaev ◽

Konstantin Petrov

Keyword(s):

Earthquake Prediction ◽

Spatial Density ◽

Gps Data ◽

Quality Of Data ◽

Minimum Area ◽

Seismological Data ◽

Successful Prediction ◽

Spatio Temporal ◽

Hypocenter Depth

The results of earthquake prediction largely depend on the quality of the data and the methods of their joint processing. At present, for a number of regions, it is possible to use space geodesy data obtained with the help of GPS in addition to data from earthquake catalogs. The purpose of our study is to evaluate the efficiency of using time series of displacements of the Earth’s surface, derived from GPS data, for the systematic prediction of earthquakes. The criterion of efficiency is the probability of successfully predicting an earthquake with a limited size of the alarm zone. We use a machine learning method, namely the method of the minimum area of alarm, to predict earthquakes with a magnitude greater than 6.0 and a hypocenter depth of up to 60 km that occurred from 2016 to 2020 in Japan, and earthquakes with a magnitude greater than 5.5 and a hypocenter depth of up to 60 km that occurred from 2013 to 2020 in California. For each region, we compare the following results: a random forecast of earthquakes, a forecast obtained with the field of spatial density of earthquake epicenters, and forecasts obtained with spatio-temporal fields based on GPS data, on seismological data, and on combined GPS and seismological data. The results confirm the effectiveness of using GPS data for the systematic prediction of earthquakes.

Improving Imbalanced Multidimensional Dataset Learner Performance with Artificial Data Generation: Density-Based Class-Boost Algorithm

Advances in Data Mining. Medical Applications, E-Commerce, Marketing, and Theoretical Aspects - Lecture Notes in Computer Science ◽

10.1007/978-3-540-70720-2_13 ◽

2008 ◽

pp. 165-176 ◽

Cited By ~ 2

Author(s):

Ladan Malazizi ◽

Daniel Neagu ◽

Qasim Chaudhry

Keyword(s):

Data Generation ◽

Artificial Data ◽

Learner Performance

Precursory pattern of tidal triggering of earthquakes in six regions of China: the possible relation to the crustal heterogeneity

Natural Hazards and Earth System Science ◽

10.5194/nhess-13-2605-2013 ◽

2013 ◽

Vol 13 (10) ◽

pp. 2605-2618 ◽

Cited By ~ 2

Author(s):

Q. Li ◽

G.-M. Xu

Keyword(s):

Earthquake Prediction ◽

Earthquake Hazard ◽

Earthquake Forecasting ◽

Earthquake Occurrence ◽

Crustal Rocks ◽

Tidal Triggering ◽

Crustal Heterogeneity ◽

Tidal Variations ◽

Prediction Efficiency

Abstract. We found a possible correlation between the precursory pattern of tidal triggering of earthquakes and crustal heterogeneities, which is of particular importance to researchers in earthquake prediction and earthquake hazard prevention. We investigated the connection between tidal variations and earthquake occurrence in the Liyang, Wunansha, Cangshan, Wenan, Luquan and Yaoan regions of China. Most of the regions show a higher correlation with tidal triggering in the several years preceding large or destructive earthquakes than at other times, indicating that tidal triggering may be inherently related to the nucleation of destructive earthquakes during this time. In addition, the analysis indicates that the Liyang, Cangshan and Luquan regions, with stronger heterogeneity, show statistically significant effects of tidal triggering preceding large or destructive earthquakes, while the Wunansha, Wenan and Yaoan regions, with relatively weak heterogeneity, show statistically insignificant effects, signifying that the precursory pattern of tidal triggering of earthquakes in these six regions is possibly related to the heterogeneity of the crustal rocks. These results suggest that when the precursory pattern of tidal triggering is used to identify potentially hazardous areas or to make medium- to long-term earthquake forecasts, the crustal heterogeneity of these areas has to be taken into consideration to increase the prediction efficiency. If the influence of crustal heterogeneity on the tidal triggering of earthquakes is ignored, the prediction efficiency might decrease greatly.

SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data

Journal of Computer Science and Technology ◽

10.24215/16666038.18.e23 ◽

2018 ◽

Vol 18 (03) ◽

pp. e23 ◽

Cited By ~ 7

Author(s):

María José Basgall ◽

Waldo Hasperué ◽

Marcelo Naiouf ◽

Alberto Fernández ◽

Francisco Herrera

Keyword(s):

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Model Design ◽

Minority Class ◽

Imbalanced Classification ◽

Design And Implementation ◽

Learning Issues ◽

Intelligent Model

The volume of data in today's applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario involves scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually more difficult to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
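
To make the neighborhood-based generation step concrete, here is a minimal single-machine sketch of classical SMOTE interpolation. It is not the distributed Spark design of SMOTE-BD; the function name `smote` and its parameters are hypothetical, and the brute-force neighbour search is chosen only for brevity.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (needs >= 2 points)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Brute-force k nearest neighbours of `base` within the minority class.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # The new point lies on the line segment from `base` towards `neighbour`.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    for point in smote(minority_class, n_new=3, k=2):
        print(point)
```

The neighbour search is the part that becomes expensive at scale, since each minority example has to be compared with the others; that is the step a partitioned or distributed implementation has to handle carefully.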

The colorimetric properties of the spectrum

Proceedings of the Royal Society of London. Series B, Containing Papers of a Biological Character ◽

10.1098/rspb.1931.0091 ◽

1931 ◽

Vol 108 (759) ◽

pp. 576-576 ◽

Cited By ~ 4

Keyword(s):

Distribution Curve ◽

National Physical Laboratory ◽

Group Of Seven ◽

Physical Laboratory ◽

Luminosity Functions ◽

Direct Measurements ◽

Energy Distribution Curve ◽

The Mean ◽

General Method

The paper describes an investigation carried out at the National Physical Laboratory to determine the colorimetric properties of a group of seven subjects as obtained from direct measurements of the trichromatic coefficients of the spectrum on a trichromatic colorimeter. The “spectral distribution curves of the primaries,” by means of which the colorimetric quality of a heterochromatic stimulus may be computed from its energy distribution curve, are obtained by combining the experimentally determined trichromatic coefficients with the International Standard visibility curve. This procedure is a simplification, applicable to the mean results of a normal group, of a general method by which the chromatic and luminosity functions of any subject or group of subjects can be determined from one set of observations. The general method is described in an Appendix.

TEST DATA GENERATION FOR SOFTWARE TESTING BASED ON QUANTUM-INSPIRED GENETIC ALGORITHM

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026813500041 ◽

2013 ◽

Vol 12 (01) ◽

pp. 1350004 ◽

Cited By ~ 2

Author(s):

CHENGYING MAO ◽

XINXIN YU

Keyword(s):

Genetic Algorithm ◽

Software Testing ◽

Test Data ◽

Optimal Solution ◽

Search Space ◽

Structural Testing ◽

Test Data Generation ◽

Data Generation ◽

Quantum Bit

The quality of test data has an important impact on the effectiveness of software testing, so test data generation has always been a key task in finding potential faults in program code. In structural testing, the primary goal is to cover certain kinds of structural elements with specific inputs. Search-based test data generation provides a rational way to handle this difficult problem. In the past, some well-known meta-heuristic search algorithms have been successfully applied to it. In this paper, we introduce a variant of the genetic algorithm (GA), called the quantum-inspired genetic algorithm (QIGA), to generate test data with stronger coverage ability. In this new algorithm, the traditional binary bit is replaced by a quantum bit (Q-bit) to enlarge the search space and so avoid falling into a locally optimal solution. In addition, other strategies such as the quantum rotation gate and the catastrophe operation are used to improve the efficiency of the algorithm and the quality of the test data. An experimental analysis on eight real-world programs is performed to validate the effectiveness of our method. The results show that the QIGA-based method can generate test data with higher coverage in far fewer generations than the GA-based method. More importantly, our proposed method is more robust to changes in the algorithm parameters.
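
The Q-bit and rotation gate mentioned above can be illustrated with a small sketch. It assumes the usual quantum-inspired representation in which a Q-bit is a pair of real amplitudes (alpha, beta) with alpha² + beta² = 1 and beta² is the probability of observing a 1. The helper names and the simplified update policy in `step_towards` are hypothetical and do not reproduce the paper's rotation table or its catastrophe operation.

```python
import math
import random


def observe(qbits, rng):
    """Collapse each Q-bit to a classical bit; beta**2 is P(bit == 1)."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in qbits]


def rotate(qbit, delta):
    """Apply a rotation gate by angle `delta`; the norm of (alpha, beta) is kept."""
    alpha, beta = qbit
    return (
        alpha * math.cos(delta) - beta * math.sin(delta),
        alpha * math.sin(delta) + beta * math.cos(delta),
    )


def step_towards(qbits, observed, best, delta=0.05):
    """Simplified policy (an assumption): where the observed bit differs from
    the best known solution, rotate so that the best bit becomes more likely.
    Assumes amplitudes stay in the first quadrant for small `delta`."""
    updated = []
    for qbit, bit, target in zip(qbits, observed, best):
        if bit != target:
            qbit = rotate(qbit, delta if target == 1 else -delta)
        updated.append(qbit)
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    # Start every Q-bit in an equal superposition of 0 and 1.
    qbits = [(1 / math.sqrt(2), 1 / math.sqrt(2))] * 8
    bits = observe(qbits, rng)
    qbits = step_towards(qbits, bits, best=[1] * 8)
    print(bits)
```

Because each Q-bit encodes a probability rather than a fixed value, a single individual can yield many different classical bit strings on observation, which is the sense in which the search space is enlarged.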

THE STATISTICAL RISK ANALYSIS AS THE BASIS OF THE SUSTAINABLE DEVELOPMENT

International Journal of Innovation and Technology Management ◽

10.1142/s0219877012500241 ◽

2012 ◽

Vol 09 (03) ◽

pp. 1250024

Author(s):

KARTLOS J. KACHIASHVILI ◽

MUNTAZIM A. HASHMI ◽

ABDUL MUEED

Keyword(s):

Sustainable Development ◽

Bayesian Approach ◽

Economic Effects ◽

The Sustainable Development ◽

Method Of Solution ◽

Parameter Values ◽

Ultimate Purpose ◽

Statistical Risk ◽

General Method

This work formalizes the problem of sustainable development of production, i.e., the optimal choice of parameter values of a technological process so as to minimize the risk of obtaining products of unplanned quality and of making incorrect decisions about product quality, while maximizing the profit of production at guaranteed social and economic effects. Different statements of the problem, depending on the ultimate purpose, are considered. A general method for solving the stated task, using the Bayesian approach to testing many hypotheses, is proposed.

Synchronized earthquake occurrence in the Hellenic Arc and implications for earthquake prediction in the Dodecanese Islands (Greece)

Tectonophysics ◽

10.1016/0040-1951(88)90207-7 ◽

1988 ◽

Vol 145 (3-4) ◽

pp. 343-347 ◽

Cited By ~ 4

Author(s):

Gerassimos A. Papadopoulos

Keyword(s):

Earthquake Prediction ◽

Earthquake Occurrence ◽

Hellenic Arc

How to Measure Quality of Service Using Unstructured Data Analysis: A General Method Design

Journal of Systems Integration ◽

10.20470/jsi.v6i4.242 ◽

2015 ◽

pp. 3-16 ◽

Cited By ~ 5

Author(s):

Lucie Sperkova ◽

Filip Vencovsky ◽

Tomas Bruckner

Keyword(s):

Quality Of Service ◽

Data Analysis ◽

Unstructured Data ◽

General Method

A time variational method to couple heterogeneous time integrators

European Journal of Computational Mechanics ◽

10.13052/ejcm.19.11-24 ◽

2010 ◽

pp. 11-24

Author(s):

Alain Combescure ◽

Najib Mahjoubi ◽

Anthony Gravouil ◽

Nicolas Greffet

Keyword(s):

Transient Analysis ◽

Dynamic Equilibrium ◽

Time Integration ◽

Integration Scheme ◽

Time Integrators ◽

Coupling Strategy ◽

Time Integration Scheme ◽

New Vision ◽

General Method

This paper gives a brief presentation of recent research results on structural mechanics code coupling in transient analysis. The domain is assumed to be decomposed into a series of subdomains, which are treated independently with their own time integration scheme and/or their own code. The paper gives a general method for coupling these subdomains. The proposed method is rather general and is based on a weak form of the dynamic equilibrium equation. This new vision makes it possible to design a coupling strategy which ensures by construction that no energy is introduced or dissipated at the interfaces between the subdomains. The proposed coupling method hence does not perturb the quality of the time integrators of each subdomain. This also makes it possible to develop a general code coupler for transient dynamics. Two examples are given to illustrate the method.