Simple statistics for complex Earthquakes’ time distribution

Abstract. Here we investigated a statistical feature of earthquakes time distribution in southern Californian earthquake catalogue. As a main data analysis tool, we used simple statistical approach based on the calculation of integral deviation times (IDT) from the time distribution of regular markers. The research objective is to define whether the process of earthquakes time distribution approaches to randomness. Effectiveness of the IDT calculation method was tested on the set of simulated color noise data sets with the different extent of regularity. Standard methods of complex data analysis have also been used, such as power spectrum regression, Lempel and Ziv complexity and recurrence quantification analysis as well as multi-scale entropy calculation. After testing the IDT calculation method for simulated model data sets, we have analyzed the variation of the extent of regularity in southern Californian earthquake catalogue. Analysis was carried out for different time periods and at different magnitude thresholds. It was found that the extent of the order in earthquakes time distribution is fluctuating over the catalogue. Particularly, we show that the process of earthquakes’ time distribution becomes most random-like in periods of decreased local seismic activity.

Download Full-text

Simple statistics for complex Earthquake time distributions

Nonlinear Processes in Geophysics ◽

10.5194/npg-25-497-2018 ◽

2018 ◽

Vol 25 (3) ◽

pp. 497-510 ◽

Cited By ~ 4

Author(s):

Teimuraz Matcharashvili ◽

Takahiro Hatano ◽

Tamaz Chelidze ◽

Natalia Zhukova

Keyword(s):

Data Analysis ◽

Calculation Method ◽

Time Distribution ◽

Southern California ◽

Multiscale Entropy ◽

Data Sets ◽

Analysis Tool ◽

Earthquake Catalog ◽

Complex Data ◽

Process Data

Abstract. Here we investigated a statistical feature of earthquake time distributions in the southern California earthquake catalog. As a main data analysis tool, we used a simple statistical approach based on the calculation of integral deviation times (IDT) from the time distribution of regular markers. The research objective is to define whether and when the process of earthquake time distribution approaches to randomness. Effectiveness of the IDT calculation method was tested on the set of simulated color noise data sets with the different extent of regularity, as well as for Poisson process data sets. Standard methods of complex data analysis have also been used, such as power spectrum regression, Lempel and Ziv complexity, and recurrence quantification analysis, as well as multiscale entropy calculations. After testing the IDT calculation method for simulated model data sets, we have analyzed the variation in the extent of regularity in the southern California earthquake catalog. Analysis was carried out for different periods and at different magnitude thresholds. It was found that the extent of the order in earthquake time distributions is fluctuating over the catalog. Particularly, we show that in most cases, the process of earthquake time distributions is less random in periods of strong earthquake occurrence compared to periods with relatively decreased local seismic activity. Also, we noticed that the strongest earthquakes occur in periods when IDT values increase.

Download Full-text

Anomaly Detection for Inferring Social Structure

Social Computing ◽

10.4018/978-1-60566-984-7.ch118 ◽

2010 ◽

pp. 1797-1803

Author(s):

Lisa Friedland

Keyword(s):

Data Analysis ◽

Anomaly Detection ◽

Social Structure ◽

Small Groups ◽

Analysis Data ◽

Data Sets ◽

Complex Data ◽

Detection Approach ◽

Complex Data Sets ◽

Data Points

In traditional data analysis, data points lie in a Cartesian space, and an analyst asks certain questions: (1) What distribution can I fit to the data? (2) Which points are outliers? (3) Are there distinct clusters or substructure? Today, data mining treats richer and richer types of data. Social networks encode information about people and their communities; relational data sets incorporate multiple types of entities and links; and temporal information describes the dynamics of these systems. With such semantically complex data sets, a greater variety of patterns can be described and views constructed of the data. This article describes a specific social structure that may be present in such data sources and presents a framework for detecting it. The goal is to identify tribes, or small groups of individuals that intentionally coordinate their behavior—individuals with enough in common that they are unlikely to be acting independently. While this task can only be conceived of in a domain of interacting entities, the solution techniques return to the traditional data analysis questions. In order to find hidden structure (3), we use an anomaly detection approach: develop a model to describe the data (1), then identify outliers (2).

Download Full-text

Development of a Powerful Data-Analysis Tool Using Nonparametric Smoothing Models To Identify Drillsites in Tight Shale Reservoirs With High Economic Potential

SPE Journal ◽

10.2118/189440-pa ◽

2017 ◽

Vol 23 (03) ◽

pp. 719-736 ◽

Cited By ~ 2

Author(s):

Quan Cai ◽

Wei Yu ◽

Hwa Chi Liang ◽

Jenn-Tai Liang ◽

Suojin Wang ◽

...

Keyword(s):

Big Data ◽

Data Analysis ◽

Predictive Power ◽

Oil And Gas ◽

Predictor Variables ◽

Data Sets ◽

Analysis Tool ◽

Data Set ◽

Data Analysis Tool ◽

Shale Reservoirs

Summary The oil-and-gas industry is entering an era of “big data” because of the huge number of wells drilled with the rapid development of unconventional oil-and-gas reservoirs during the past decade. The massive amount of data generated presents a great opportunity for the industry to use data-analysis tools to help make informed decisions. The main challenge is the lack of the application of effective and efficient data-analysis tools to analyze and extract useful information for the decision-making process from the enormous amount of data available. In developing tight shale reservoirs, it is critical to have an optimal drilling strategy, thereby minimizing the risk of drilling in areas that would result in low-yield wells. The objective of this study is to develop an effective data-analysis tool capable of dealing with big and complicated data sets to identify hot zones in tight shale reservoirs with the potential to yield highly productive wells. The proposed tool is developed on the basis of nonparametric smoothing models, which are superior to the traditional multiple-linear-regression (MLR) models in both the predictive power and the ability to deal with nonlinear, higher-order variable interactions. This data-analysis tool is capable of handling one response variable and multiple predictor variables. To validate our tool, we used two real data sets—one with 249 tight oil horizontal wells from the Middle Bakken and the other with 2,064 shale gas horizontal wells from the Marcellus Shale. Results from the two case studies revealed that our tool not only can achieve much better predictive power than the traditional MLR models on identifying hot zones in the tight shale reservoirs but also can provide guidance on developing the optimal drilling and completion strategies (e.g., well length and depth, amount of proppant and water injected). By comparing results from the two data sets, we found that our tool can achieve model performance with the big data set (2,064 Marcellus wells) with only four predictor variables that is similar to that with the small data set (249 Bakken wells) with six predictor variables. This implies that, for big data sets, even with a limited number of available predictor variables, our tool can still be very effective in identifying hot zones that would yield highly productive wells. The data sets that we have access to in this study contain very limited completion, geological, and petrophysical information. Results from this study clearly demonstrated that the data-analysis tool is certainly powerful and flexible enough to take advantage of any additional engineering and geology data to allow the operators to gain insights on the impact of these factors on well performance.

Download Full-text

LIVE: A Work-Centered Approach to Support Visual Analytics of Multi-Dimensional Engineering Design Data With Interactive Visualization and Data-Mining

Volume 5: 37th Design Automation Conference, Parts A and B ◽

10.1115/detc2011-48333 ◽

2011 ◽

Cited By ~ 1

Author(s):

Xin Yan ◽

Mu Qiao ◽

Timothy W. Simpson ◽

Jia Li ◽

Xiaolong Luke Zhang

Keyword(s):

Data Analysis ◽

Engineering Design ◽

Visual Analytics ◽

Interactive Visualization ◽

User Interaction ◽

Data Sets ◽

Preliminary Evaluation ◽

Complex Data ◽

Design Data ◽

Depth Analysis

During the process of trade space exploration, information overload has become a notable problem. To find the best design, designers need more efficient tools to analyze the data, explore possible hidden patterns, and identify preferable solutions. When dealing with large-scale, multi-dimensional, continuous data sets (e.g., design alternatives and potential solutions), designers can be easily overwhelmed by the volume and complexity of the data. Traditional information visualization tools have some limits to support the analysis and knowledge exploration of such data, largely because they usually emphasize the visual presentation of and user interaction with data sets, and lack the capacity to identify hidden data patterns that are critical to in-depth analysis. There is a need for the integration of user-centered visualization designs and data-oriented data analysis algorithms in support of complex data analysis. In this paper, we present a work-centered approach to support visual analytics of multi-dimensional engineering design data by combining visualization, user interaction, and computational algorithms. We describe a system, Learning-based Interactive Visualization for Engineering design (LIVE), that allows designer to interactively examine large design input data and performance output data analysis simultaneously through visualization. We expect that our approach can help designers analyze complex design data more efficiently and effectively. We report our preliminary evaluation on the use of our system in analyzing a design problem related to aircraft wing sizing.

Download Full-text

Using visual analytics to make sense of railway Close Calls

Proceedings of the Institution of Mechanical Engineers Part F Journal of Rail and Rapid Transit ◽

10.1177/0954409716676221 ◽

2016 ◽

Vol 231 (10) ◽

pp. 1107-1114

Author(s):

Miguel Figueres-Esteban ◽

Peter Hughes ◽

Coen van Gulijk

Keyword(s):

Big Data ◽

Data Analysis ◽

Visual Analytics ◽

Near Miss ◽

Test Case ◽

Data Sets ◽

Complex Data ◽

Network Text Analysis ◽

Complex Data Sets ◽

Close Call

In the big data era, large and complex data sets will exceed scientists’ capacity to make sense of them in the traditional way. New approaches in data analysis, supported by computer science, will be necessary to address the problems that emerge with the rise of big data. The analysis of the Close Call database, which is a text-based database for near-miss reporting on the GB railways, provides a test case. The traditional analysis of Close Calls is time consuming and prone to differences in interpretation. This paper investigates the use of visual analytics techniques, based on network text analysis, to conduct data analysis and extract safety knowledge from 500 randomly selected Close Call records relating to worker slips, trips and falls. The results demonstrate a straightforward, yet effective, way to identify hazardous conditions without having to read each report individually. This opens up new ways to perform data analysis in safety science.

Download Full-text

An Improved MOEA/D Algorithm for Complex Data Analysis

Wireless Communications and Mobile Computing ◽

10.1155/2021/6393638 ◽

2021 ◽

Vol 2021 ◽

pp. 1-20

Author(s):

Weihua Qian ◽

Jiahui Liu ◽

Yuanguo Lin ◽

Lvqing Yang ◽

Jianwei Zhang ◽

...

Keyword(s):

Data Analysis ◽

Population Distribution ◽

Global Search ◽

Data Sets ◽

Complex Data ◽

Multiple Level ◽

Exploration And Exploitation ◽

Objective Space ◽

Artificial Intelligence Technology ◽

Offspring Generation

There are a large number of multiple level datasets in the Industry 4.0 era. Thus, it is necessary to utilize artificial intelligence technology for the complex data analysis. In fact, the technology often suffers from the self-optimization issue of multiple level datasets, which is taken as a kind of multiobjective optimization problem (MOP). Naturally, the MOP can be solved by the multiobjective evolutionary algorithm based on decomposition (MOEA/D). However, most existing MOEA/D algorithms usually fail to adapt neighborhood for the offspring generation, since these algorithms have shortcomings in both global search and adaptive control. To address this issue, we propose a MOEA/D with adaptive exploration and exploitation, termed MOEA/D-AEE, which adopts random numbers with a uniform distribution to explore the objective space and introduces a joint exploitation coefficient between parents to generate better offspring. By dynamic exploration and joint exploitation, MOEA/D-AEE improves both global search ability and diversity of the algorithm. Experimental results on benchmark data sets demonstrate that our proposed approach achieves global search ability and diversity in terms of the population distribution than state-of-the-art MOEA/D algorithms.

Download Full-text

Comparison of Multivariate Data Analysis Strategies for High-Content Screening

CrossRef Listing of Deleted DOIs ◽

10.1177/1087057110395390 ◽

2011 ◽

Vol 16 (3) ◽

pp. 338-347 ◽

Cited By ~ 19

Author(s):

Anne Kümmel ◽

Paul Selzer ◽

Martin Beibel ◽

Hanspeter Gubler ◽

Christian N. Parker ◽

...

Keyword(s):

Data Analysis ◽

High Content Screening ◽

Data Sets ◽

Complex Data ◽

Data Set ◽

Processing Strategies ◽

Analysis Strategies ◽

Complex Data Sets ◽

High Degree ◽

Cell Data

High-content screening (HCS) is increasingly used in biomedical research generating multivariate, single-cell data sets. Before scoring a treatment, the complex data sets are processed (e.g., normalized, reduced to a lower dimensionality) to help extract valuable information. However, there has been no published comparison of the performance of these methods. This study comparatively evaluates unbiased approaches to reduce dimensionality as well as to summarize cell populations. To evaluate these different data-processing strategies, the prediction accuracies and the Z′ factors of control compounds of a HCS cell cycle data set were monitored. As expected, dimension reduction led to a lower degree of discrimination between control samples. A high degree of classification accuracy was achieved when the cell population was summarized on well level using percentile values. As a conclusion, the generic data analysis pipeline described here enables a systematic review of alternative strategies to analyze multiparametric results from biological systems.

Download Full-text

Simpler methods do it better: Success of Recurrence Quantification Analysis as a general purpose data analysis tool

Physics Letters A ◽

10.1016/j.physleta.2009.08.052 ◽

2009 ◽

Vol 373 (41) ◽

pp. 3753-3756 ◽

Cited By ~ 35

Author(s):

Charles L. Webber ◽

Norbert Marwan ◽

Angelo Facchini ◽

Alessandro Giuliani

Keyword(s):

Data Analysis ◽

General Purpose ◽

Recurrence Quantification Analysis ◽

Analysis Tool ◽

Recurrence Quantification ◽

Quantification Analysis ◽

Data Analysis Tool

Download Full-text

MLPAnalyzer: Data Analysis Tool for Reliable Automated Normalization of MLPA Fragment Data

Analytical Cellular Pathology ◽

10.1155/2008/605109 ◽

2008 ◽

Vol 30 (4) ◽

pp. 323-335

Author(s):

Jordy Coffa ◽

Mark A. van de Wiel ◽

Begoña Diosdado ◽

Beatriz Carvalho ◽

Jan Schouten ◽

...

Keyword(s):

Data Analysis ◽

Data Processing ◽

Copy Number ◽

Data Sets ◽

Analysis Tool ◽

Visual Examination ◽

High Throughput Analysis ◽

Analysis Strategy ◽

Suggested Strategy ◽

Copy Number Changes

Background: Multiplex Ligation dependent Probe Amplification (MLPA) is a rapid, simple, reliable and customized method for detection of copy number changes of individual genes at a high resolution and allows for high throughput analysis. This technique is typically applied for studying specific genes in large sample series. The large amount of data, dissimilarities in PCR efficiency among the different probe amplification products, and sample-to-sample variation pose a challenge to data analysis and interpretation. We therefore set out to develop an MLPA data analysis strategy and tool that is simple to use, while still taking into account the above-mentioned sources of variation.Materials and Methods: MLPAnalyzer was developed in Visual Basic for Applications, and can accept a large number of file formats directly from capillary sequence systems. Sizes of all MLPA probe signals are determined and filtered, quality control steps are performed, and variation in peak intensity related to size is corrected for. DNA copy number ratios of test samples are computed, displayed in a table view and a set of comprehensive figures is generated. To validate this approach, MLPA reactions were performed using a dedicated MLPA mix on 6 different colorectal cancer cell lines. The generated data were normalized using our program and results were compared to previously performed array-CGH results using both statistical methods and visual examination.Results and Discussion: Visual examination of bar graphs and direct ratios for both techniques showed very similar results, while the average Pearson moment correlation over all MLPA probes was found to be 0.42. Our results thus show that automated MLPA data processing following our suggested strategy may be of significant use, especially when handling large MLPA data sets, when samples are of different quality, or interpretation of MLPA electropherograms is too complex. It remains, however, important to recognize that automated MLPA data processing may only be successful when a dedicated experimental setup is also considered.

Download Full-text

Complex Big Data Analysis Based on Multi-granularity Generalized Functions

International Journal of Online Engineering (iJOE) ◽

10.3991/ijoe.v14i04.8368 ◽

2018 ◽

Vol 14 (04) ◽

pp. 43

Author(s):

Zhang Xueya ◽

Jianwei Zhang

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Model ◽

Input Data ◽

Big Data Analysis ◽

Generalized Functions ◽

Data Sets ◽

Test Results ◽

Data Space ◽

Noise Data

<p>A new method for the big data analysis - multi-granularity generalized functions data model (referred to as MGGF for short) is put forward. This method adopts the dynamic adaptive multi-granularity clustering technique, transforms the grid like "Hard partitioning" to the input data space by the generalized functions data model (referred to as GFDM for short) into the multi-granularity partitioning, and identifies the multi-granularity pattern class in the input data space. By defining the type of the mapping relationship between the multi-granularity model class and the decision-making category ftype:Ci→y, and the concept of the Degree of Fulfillment (referred to as DoF (x)) of the input data to the classification rules of the various pattern classes, the corresponding MGGF model is established. Experimental test results of different data sets show that, compared with the GFDM method, the method proposed in this paper has better data summarization ability, stronger noise data processing ability and higher searching efficiency.</p>

Download Full-text