Statistical methods of fracture characterization using acoustic borehole televiewer log interpretation

2021 ◽  
Author(s):  
C Massiot ◽  
John Townend ◽  
A Nicol ◽  
DD McNamara

Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, as well as other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. The Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels at which fractures are deemed fully sampled increases the reliability of the analysis. Spacing data comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) by several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of the statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
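
As a rough illustration of the fitting-and-ranking step described above, the sketch below fits two candidate distributions to synthetic fracture spacing data by maximizing a truncated likelihood and ranks them with AIC and SBC (BIC). The candidate distributions, truncation limit, and data are placeholders, not the Rotokawa values.

```python
# Minimal sketch: truncated maximum likelihood fits ranked by AIC and SBC.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
spacing = rng.lognormal(mean=0.0, sigma=1.0, size=300)   # synthetic spacings (m)
t_min = 0.05                                             # truncation limit (m)
spacing = spacing[spacing >= t_min]                      # only spacings above the limit are observed

def truncated_nll(params, make_dist, data, t_min):
    """Negative log-likelihood of a distribution truncated below t_min."""
    frozen = make_dist(*params)
    # truncated density: f(x) / P(X >= t_min)
    ll = frozen.logpdf(data).sum() - len(data) * np.log(frozen.sf(t_min))
    return -ll if np.isfinite(ll) else np.inf

candidates = {
    # name: (frozen-distribution factory, initial parameter guess)
    "lognormal":   (lambda s, scale: stats.lognorm(s, scale=scale), [1.0, 1.0]),
    "exponential": (lambda scale: stats.expon(scale=scale), [1.0]),
}

n = len(spacing)
for name, (make_dist, x0) in candidates.items():
    res = optimize.minimize(truncated_nll, x0, args=(make_dist, spacing, t_min),
                            method="Nelder-Mead")
    k = len(x0)
    aic = 2 * k + 2 * res.fun           # AIC     = 2k - 2 log L
    sbc = k * np.log(n) + 2 * res.fun   # SBC/BIC = k log n - 2 log L
    print(f"{name:11s} AIC={aic:7.1f}  SBC={sbc:7.1f}")
```

The fitted distributions can then be evaluated over the full spacing range, beyond the truncation limit, as the abstract describes.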





2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data-generating mechanism, the parametric methods perform well, but not when there is a nonnegligible deviation between them. The nonparametric methods, which usually make no distributional assumptions, are robust but pay a price in efficiency. In an attempt to exploit the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach retains the form of a parametric mixture but makes no assumptions about the subdistributions, which are estimated nonparametrically with constraints imposed only on their modes. An expectation-maximization (EM) algorithm combined with a classification step is used to cluster the data, and a modified Bayesian information criterion (BIC) guides the choice of the number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method, and the results show that it yields reasonable partitions of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
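
The fully parametric analogue of this pipeline, an EM-fitted Gaussian mixture with a BIC-guided choice of the number of clusters, can be sketched as follows; the paper's semiparametric estimator and modified BIC are not reproduced, and the data are synthetic stand-ins for expression profiles.

```python
# Minimal sketch: EM mixture clustering with BIC model selection (parametric analogue).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# 210 synthetic "genes": three groups of 70 ten-dimensional expression profiles
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(70, 10)) for m in (-2.0, 0.0, 2.0)])

# EM-fitted Gaussian mixtures for a range of cluster numbers, scored by BIC
bic = {}
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          n_init=5, random_state=0).fit(X)
    bic[k] = gmm.bic(X)

best_k = min(bic, key=bic.get)
labels = GaussianMixture(n_components=best_k, covariance_type="diag",
                         random_state=0).fit_predict(X)
print("BIC-selected number of clusters:", best_k)
```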



Author(s):  
Valentin Raileanu ◽  

The article briefly describes the history and fields of application of the theory of extreme values, including climatology. The data format, the Generalized Extreme Value (GEV) probability distributions with Bock Maxima, the Generalized Pareto (GP) distributions with Point of Threshold (POT) and the analysis methods are presented. Estimating the distribution parameters is done using the Maximum Likelihood Estimation (MLE) method. Free R software installation, the minimum set of required commands and the GUI in2extRemes graphical package are described. As an example, the results of the GEV analysis of a simulated data set in in2extRemes are presented.
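
The article works in R with the in2extRemes GUI; for readers outside that environment, a block-maxima GEV fit by maximum likelihood can be approximated with scipy, as in the rough sketch below (the simulated annual-maxima series and parameter values are purely illustrative).

```python
# Minimal sketch: block-maxima GEV fit by MLE and a return-level estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
annual_maxima = stats.genextreme.rvs(c=-0.1, loc=30.0, scale=5.0,
                                     size=60, random_state=rng)

# Maximum likelihood fit of the GEV distribution (note: scipy's shape c = -xi,
# the negative of the xi convention used in most extreme-value texts).
c_hat, loc_hat, scale_hat = stats.genextreme.fit(annual_maxima)

# 100-year return level: the value exceeded with probability 1/100 in a given year
rl_100 = stats.genextreme.ppf(1 - 1 / 100, c_hat, loc=loc_hat, scale=scale_hat)
print(f"xi={-c_hat:.3f}, loc={loc_hat:.2f}, scale={scale_hat:.2f}, "
      f"100-yr return level={rl_100:.2f}")
```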



2019 ◽  
Vol 7 (4) ◽  
pp. 23-34
Author(s):  
I. A. Osmakov ◽  
T. A. Savelieva ◽  
V. B. Loschenov ◽  
S. A. Goryajnov ◽  
A. A. Potapov

The paper presents the results of a comparative study of cluster analysis methods applied to optical intraoperative spectroscopy data acquired during surgery of glial tumors of varying degrees of malignancy. The analysis was carried out both for individual patients and for the entire dataset. The data were obtained using a combined optical spectroscopy technique that allows simultaneous registration of diffuse reflectance spectra of broadband radiation in the 500–600 nm spectral range (for the analysis of tissue blood supply and the degree of hemoglobin oxygenation), fluorescence spectra of 5-ALA-induced protoporphyrin IX (PpIX) (for analysis of the degree of malignancy), and the signal of diffusely reflected laser light used to excite PpIX fluorescence (to account for the scattering properties of tissues). To determine the threshold values of these parameters for tumor, infiltration zone, and normal white matter, we searched for natural clusters in the available intraoperative optical spectroscopy data and compared them with the pathomorphology results. It was shown that, among the considered clustering methods, the EM algorithm and k-means are optimal for this data set and can be used to build a decision support system (DSS) for spectroscopic intraoperative navigation in neurosurgery. Clustering results consistent with the pathomorphological studies were also obtained using spectral and agglomerative clustering; these methods can be used to post-process the combined spectroscopy data.
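
Below is a minimal sketch of the two methods the study found optimal, k-means and an EM-fitted Gaussian mixture, applied to three spectroscopy-derived features per measurement; the feature construction and the synthetic three-group data are assumptions for illustration, not the authors' calibrated indices.

```python
# Minimal sketch: k-means vs EM (Gaussian mixture) clustering of spectral features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# synthetic "normal white matter", "infiltration zone" and "tumour" measurements:
# columns ~ (blood-supply/oxygenation index, PpIX fluorescence index, scattering index)
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(80, 3))
               for m in ([0.0, 0.0, 0.0], [1.0, 1.0, 0.5], [2.0, 3.0, 1.0])])
X = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
em_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# agreement between the two partitions (invariant to label permutation)
print("k-means vs EM agreement (ARI):", adjusted_rand_score(km_labels, em_labels))
```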



2016 ◽  
Vol 13 (10) ◽  
pp. 6935-6943 ◽  
Author(s):  
Jia-Lin Hua ◽  
Jian Yu ◽  
Miin-Shen Yang

Mountain functions, which build up from the densities of a data set, intuitively reflect the structure of the data points, and mountain clustering methods are useful for grouping them. However, previous mountain-based clustering suffers from the choice of the parameters used to compute the density. In this paper, we adopt correlation analysis to determine the density and propose a new clustering algorithm, called Correlative Density-based Clustering (CDC). The new algorithm computes the density in a modified way and determines the parameters from the inherent structure of the data points. Experiments on artificial and real datasets demonstrate the simplicity and effectiveness of the proposed approach.
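
For orientation, the sketch below shows the classical mountain-style density that this line of work builds on: a sum of Gaussian kernels evaluated at each data point, with the highest peak taken as the first cluster centre. It is not the proposed CDC algorithm, and the kernel width alpha is set by hand here, whereas CDC derives such parameters from correlation analysis.

```python
# Minimal sketch: classical mountain-function density at the data points.
import numpy as np

def mountain_density(X, alpha=1.0):
    """Mountain height at every data point: a sum of Gaussian kernels over the data."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-alpha * sq_dists).sum(axis=1)

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(50, 2)) for c in ([0.0, 0.0], [2.0, 2.0])])

# alpha is chosen manually here; CDC instead determines it from the data structure
density = mountain_density(X, alpha=2.0)
first_centre = X[np.argmax(density)]
print("estimated first cluster centre:", first_centre)
```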



Author(s):  
Diamond O. Tuoyo ◽  
Festus C. Opone ◽  
N. Ekhosuehi

This paper presents a new generalization of the Topp-Leone distribution called the Topp-Leone Weibull Distribution (TLWD). Some of the mathematical properties of the proposed distribution are derived, and the maximum likelihood estimation method is adopted for estimating its parameters. An application of the proposed distribution, alongside some well-known distributions belonging to the Topp-Leone generated family, to a real lifetime data set reveals that the proposed distribution is more flexible for modeling lifetime data according to several comparison criteria: the maximized log-likelihood, the Akaike Information Criterion (AIC = 2k − 2 log L), the Kolmogorov–Smirnov test statistic (K-S), the Anderson–Darling test statistic (A*), and the Cramér–von Mises test statistic (W*).
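
The TLWD itself is not available in standard libraries, so the sketch below only illustrates the comparison machinery named above (maximum likelihood fits, the maximized log-likelihood, AIC = 2k − 2 log L, and the Kolmogorov–Smirnov statistic) using two stand-in lifetime distributions on synthetic data.

```python
# Minimal sketch: MLE fits of candidate lifetime distributions compared by AIC and K-S.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
lifetimes = stats.weibull_min.rvs(c=1.5, scale=10.0, size=100, random_state=rng)

for name, dist in [("Weibull", stats.weibull_min), ("exponential", stats.expon)]:
    params = dist.fit(lifetimes, floc=0)           # MLE with location fixed at 0
    loglik = dist.logpdf(lifetimes, *params).sum()
    k = len(params) - 1                            # the fixed loc is not a free parameter
    aic = 2 * k - 2 * loglik                       # AIC = 2k - 2 log L
    ks = stats.kstest(lifetimes, dist.cdf, args=params).statistic
    print(f"{name:11s} logL={loglik:8.2f}  AIC={aic:8.2f}  K-S={ks:.3f}")
```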



2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
A. S. Al-Moisheer

Finite mixture models provide a flexible tool for handling heterogeneous data. This paper introduces a new mixture model, the mixture of Lindley and lognormal distributions (MLLND). First, the model is formulated and some of its statistical properties are studied. Next, maximum likelihood estimation of the model parameters is considered, and the performance of the estimators is evaluated via simulation. The flexibility of the proposed mixture distribution is demonstrated by its superior fit to a well-known real data set of 128 bladder cancer patients compared with several mixture and nonmixture distributions. The Kolmogorov–Smirnov test and several information criteria are used to compare the fitted models on the real dataset. Finally, the results are verified using several graphical methods.
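
A rough sketch of direct maximum likelihood estimation for a two-component Lindley-lognormal mixture is given below, written from the standard Lindley density f(x; θ) = θ²/(θ + 1)(1 + x)e^(−θx); the synthetic data, starting values, and optimizer settings are illustrative and do not reproduce the paper's bladder cancer analysis.

```python
# Minimal sketch: direct MLE of a Lindley + lognormal two-component mixture.
import numpy as np
from scipy import stats, optimize

def lindley_pdf(x, theta):
    return theta**2 / (theta + 1.0) * (1.0 + x) * np.exp(-theta * x)

def neg_loglik(params, x):
    p, theta, mu, sigma = params
    mix = p * lindley_pdf(x, theta) + (1 - p) * stats.lognorm.pdf(x, sigma, scale=np.exp(mu))
    return -np.sum(np.log(mix))

rng = np.random.default_rng(6)
x = np.concatenate([rng.exponential(2.0, 70), rng.lognormal(1.5, 0.5, 60)])  # stand-in data

res = optimize.minimize(neg_loglik, x0=[0.5, 0.5, 1.0, 0.5], args=(x,),
                        method="L-BFGS-B",
                        bounds=[(1e-3, 1 - 1e-3), (1e-3, None), (None, None), (1e-3, None)])
p, theta, mu, sigma = res.x
aic = 2 * len(res.x) + 2 * res.fun
print(f"weight={p:.2f}, theta={theta:.2f}, mu={mu:.2f}, sigma={sigma:.2f}, AIC={aic:.1f}")
```

An EM algorithm is the more common route for mixtures; direct numerical maximization is shown here only because it is compact.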



2020 ◽  
Vol 501 (2) ◽  
pp. 2268-2278
Author(s):  
John K Webb ◽  
Chung-Chi Lee ◽  
Robert F Carswell ◽  
Dinko Milaković

Robust model-fitting to spectroscopic transitions is a requirement across many fields of science. The corrected Akaike and Bayesian information criteria (AICc and BIC) are most frequently used to select the optimal number of fitting parameters. In general, AICc modelling is thought to overfit (too many model parameters) and BIC underfits. For spectroscopic modelling, both AICc and BIC fall short in two important respects: (a) no penalty distinction is made according to line strength, such that parameters of weak lines close to the detection threshold are treated with the same importance as strong lines, and (b) no account is taken of the way in which a narrow spectral line impacts only a very small section of the overall data. In this paper, we introduce a new information criterion that addresses these shortcomings, the Spectral Information Criterion (SpIC). Spectral simulations are used to compare performances. The main findings are (i) SpIC clearly outperforms AICc for high signal-to-noise data, (ii) SpIC and AICc work equally well for lower signal-to-noise data, although SpIC achieves this with fewer parameters, and (iii) BIC does not perform well (for this application) and should be avoided. The new method should be of broader applicability (beyond spectroscopy), wherever different model parameters influence separated small ranges within a larger data set and/or have widely varying sensitivities.
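
The SpIC itself is defined in the paper and is not reproduced here; for reference, the sketch below shows the two baselines it is compared against, written in the chi-squared form commonly used for spectral fits (AICc = χ² + 2k + 2k(k+1)/(n − k − 1) and BIC = χ² + k ln n, each up to an additive constant shared by all models). The example numbers are made up.

```python
# Minimal sketch: AICc and BIC for chi-squared spectral fits (baselines for SpIC).
import numpy as np

def aicc(chi2, k, n):
    """Corrected Akaike criterion for a fit with k free parameters and n pixels."""
    return chi2 + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def bic(chi2, k, n):
    """Bayesian information criterion for the same fit."""
    return chi2 + k * np.log(n)

# e.g. compare a 2-component (k=6) and a 3-component (k=9) absorption-line model
# fitted to n=1000 spectral pixels (chi-squared values are invented for illustration)
for k, chi2 in [(6, 1042.0), (9, 1015.0)]:
    print(f"k={k}: AICc={aicc(chi2, k, 1000):.1f}  BIC={bic(chi2, k, 1000):.1f}")
```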



2021 ◽  
Vol 9 (1) ◽  
pp. 89-103
Author(s):  
Mieke Kuschnerus ◽  
Roderik Lindenbergh ◽  
Sander Vos

Abstract. Sandy coasts are constantly changing environments governed by complex, interacting processes. Permanent laser scanning is a promising technique to monitor such coastal areas and to support analysis of geomorphological deformation processes. This novel technique delivers 3-D representations of the coast at hourly temporal and centimetre spatial resolution and allows us to observe small-scale changes in elevation over extended periods of time. These observations have the potential to improve understanding and modelling of coastal deformation processes. However, to be of use to coastal researchers and coastal management, an efficient way to find and extract deformation processes from the large spatiotemporal data set is needed. To enable automated data mining, we extract time series of surface elevation and use unsupervised learning algorithms to derive a partitioning of the observed area according to change patterns. We compare three well-known clustering algorithms (k-means clustering, agglomerative clustering and density-based spatial clustering of applications with noise; DBSCAN), apply them on the set of time series and identify areas that undergo similar evolution during 1 month. We test if these algorithms fulfil our criteria for suitable clustering on our exemplary data set. The three clustering methods are applied to time series over 30 d extracted from a data set of daily scans covering about 2 km of coast in Kijkduin, the Netherlands. A small section of the beach, where a pile of sand was accumulated by a bulldozer, is used to evaluate the performance of the algorithms against a ground truth. The k-means algorithm and agglomerative clustering deliver similar clusters, and both allow us to identify a fixed number of dominant deformation processes in sandy coastal areas, such as sand accumulation by a bulldozer or erosion in the intertidal area. The level of detail found with these algorithms depends on the choice of the number of clusters k. The DBSCAN algorithm finds clusters for only about 44 % of the area and turns out to be more suitable for the detection of outliers, caused, for example, by temporary objects on the beach. Our study provides a methodology to efficiently mine a spatiotemporal data set for predominant deformation patterns with the associated regions where they occur.
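
A condensed sketch of the comparison described above is given below: per-location elevation time series are clustered with k-means, agglomerative clustering, and DBSCAN. The synthetic stable, accreting, and eroding series stand in for the permanent-laser-scanning grid, and all parameter values are illustrative.

```python
# Minimal sketch: three clustering algorithms applied to elevation time series.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

rng = np.random.default_rng(7)
t = np.arange(30)                                    # 30 daily epochs
stable    = rng.normal(0.0, 0.01, size=(100, 30))
accreting = 0.01 * t + rng.normal(0.0, 0.01, size=(100, 30))
eroding   = -0.01 * t + rng.normal(0.0, 0.01, size=(100, 30))
series = np.vstack([stable, accreting, eroding])     # one elevation time series per location

km  = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(series)
agg = AgglomerativeClustering(n_clusters=3).fit_predict(series)
db  = DBSCAN(eps=0.15, min_samples=5).fit_predict(series)   # label -1 marks outliers

print("DBSCAN outlier fraction:", float(np.mean(db == -1)))
```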



2020 ◽  
Author(s):  
Mio Hosoe ◽  
Masashi Kuwano ◽  
Taku Moriyama

With the development of ICT (Information and Communication Technology), interest in using the large amounts of accumulated data for traffic policy planning has been increasing. In recent years, data polishing has been proposed as a new methodology for big data analysis. Data polishing is a graph-based clustering method that extracts patterns that are similar or related to each other by clarifying the cluster structures in the data. The purpose of this study is to reveal the travel patterns of railway passengers by applying data polishing to smart card data collected in Kagawa Prefecture, Japan. The study uses 9,008,709 records collected during the 15 months from December 1st, 2013 to February 28th, 2015; the data set includes information such as trip histories and passenger types. Data polishing is applied to cluster 4,667,520 combinations of information about individual rides: day of the week, time of day, passenger type, origin station, and destination station. As a result, 127 characteristic travel patterns were identified from these combinations.
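
Data polishing itself (a graph-polishing algorithm) is not sketched here; the fragment below only illustrates, under assumed column names and toy records, how individual smart card records might be aggregated into the day-of-week/time/passenger-type/origin/destination combinations that such a clustering takes as input.

```python
# Minimal sketch: aggregating smart card records into ride-attribute combinations.
# Column names, station names and records are hypothetical, not the actual schema.
import pandas as pd

records = pd.DataFrame({
    "timestamp":   pd.to_datetime(["2014-01-06 07:45", "2014-01-06 08:02", "2014-01-07 07:50"]),
    "passenger":   ["commuter pass", "commuter pass", "regular"],
    "origin":      ["Station A", "Station B", "Station A"],
    "destination": ["Station B", "Station C", "Station B"],
})

records["day_of_week"] = records["timestamp"].dt.day_name()
records["hour"] = records["timestamp"].dt.hour

combinations = (records
                .groupby(["day_of_week", "hour", "passenger", "origin", "destination"])
                .size()
                .reset_index(name="count"))
print(combinations)
```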


