A methodology for integrating unconventional geologic and engineering data into a geocellular model

2020 ◽  
Vol 8 (1) ◽  
pp. B13-B33 ◽  
Author(s):  
Kathryn Tamulonis

Unconventional field development and well performance analysis encompass multiple disciplines and large data sets. Even when seismic and other data sets are not available, geologists can build geocellular models to determine factors that improve operational efficiency by incorporating well log, geosteering, stratigraphic, structural, completion, and production data. I have developed a methodology to integrate these data sets from vertical and horizontal wells to build a sequence stratigraphic and structurally framed geocellular model for an unconventional Marcellus Formation field in the Appalachian Basin, USA. The model would benefit from additional data sets to perform a rigorous investigation of performance drivers. However, the presented methodology emphasizes the value of constructing geocellular models for fields with sparse data by building a geologically detailed model in a field area without seismic and core data. I used third-order stratigraphic sequences interpreted from vertical wells and geosteering data to define model layers and then incorporate completion treating pressures and proppant delivered per stage into the model. These data were upscaled and geostatistically distributed throughout the model to visualize completion trends. Based on these results, I conclude that geologic structure and treating pressures coincide, as treating pressures increase with stage proximity to a left-lateral strike-slip fault, and completion trends vary among third-order systems tracts. Mapped completion issues are further emphasized by areas with higher model proppant values, and all treating pressure and proppant realizations for each systems tract have the greatest variance away from data points. Similar models can be built to further understand any global unconventional play, even when data are sparse, and, by doing so, geologists and engineers can (1) predict completion trends based on geology, (2) optimize efficiency in the planning and operational phases of field development, and (3) foster supportive relationships within integrated subsurface teams.
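
For readers who want to experiment with the upscaling-and-distribution step outside commercial geocellular software, the sketch below shows the general idea on synthetic data: stage-level treating pressures are spread across a regular grid of cells using a simple inverse-distance weighting as a stand-in for the geostatistical distribution described above. The stage locations, pressure values, and grid dimensions are invented for illustration and are not from the study.

```python
# Illustrative sketch (not the author's workflow): upscale stage-level
# treating pressures to grid cells, then fill the grid with a simple
# inverse-distance weighting as a stand-in for geostatistical distribution.
# Stage data and grid dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stage data: x, y locations (m) and treating pressure (psi)
stage_xy = rng.uniform(0, 2000, size=(40, 2))
stage_pressure = rng.normal(8500, 400, size=40)

# Regular 2D proxy for one model layer (50 x 50 cells, 40 m spacing)
nx, ny, dx = 50, 50, 40.0
cell_x, cell_y = np.meshgrid(np.arange(nx) * dx, np.arange(ny) * dx)
cells = np.column_stack([cell_x.ravel(), cell_y.ravel()])

# Inverse-distance weighting of stage values onto every cell
d = np.linalg.norm(cells[:, None, :] - stage_xy[None, :, :], axis=2)
w = 1.0 / np.maximum(d, dx) ** 2          # clamp to avoid division by zero
pressure_grid = (w @ stage_pressure) / w.sum(axis=1)
pressure_grid = pressure_grid.reshape(ny, nx)

print(pressure_grid.min(), pressure_grid.max())
```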

2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a relatively new clustering algorithm based on a similarity matrix between pairs of data points; messages are exchanged between data points until a clustering result emerges. It is efficient and fast, and it can handle clustering of large data sets. The traditional Affinity Propagation algorithm has several limitations, however. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: refining the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the further development of Affinity Propagation.
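
As a concrete illustration of the knobs discussed above (the similarity matrix, the preference, and the damping factor), the sketch below runs scikit-learn's AffinityPropagation on synthetic data; the parameter values are arbitrary examples, not recommendations from the paper.

```python
# A minimal sketch of Affinity Propagation's tuning parameters on toy data.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# preference controls how many exemplars emerge (lower -> fewer clusters);
# damping (0.5-1.0) stabilizes the message-passing updates.
ap = AffinityPropagation(damping=0.9, preference=-50, random_state=0)
labels = ap.fit_predict(X)
print("clusters found:", len(ap.cluster_centers_indices_))
```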


Author(s):  
M. EMRE CELEBI ◽  
HASSAN A. KINGRAVI

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. Many of these methods, however, have superlinear complexity in the number of data points, making them impractical for large data sets. On the other hand, linear methods are often random and/or order-sensitive, which renders their results unrepeatable. Recently, Su and Dy proposed two highly successful hierarchical initialization methods named Var-Part and PCA-Part that are not only linear, but also deterministic (nonrandom) and order-invariant. In this paper, we propose a discriminant analysis based approach that addresses a common deficiency of these two methods. Experiments on a large and diverse collection of data sets from the UCI machine learning repository demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e. k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
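
The sketch below illustrates the flavor of a deterministic, hierarchical initialization in the spirit of PCA-Part: the cluster with the largest within-cluster SSE is repeatedly split along its first principal component at the mean, and the resulting cluster means seed k-means. It follows the published idea only loosely and is not the authors' implementation.

```python
# Sketch of a PCA-Part style deterministic initialization followed by k-means.
# Details of Su and Dy's method (and of the proposed improvement) may differ.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def pca_part_init(X, k):
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # pick the cluster with the largest within-cluster SSE
        sse = [((X[idx] - X[idx].mean(0)) ** 2).sum() for idx in clusters]
        idx = clusters.pop(int(np.argmax(sse)))
        centered = X[idx] - X[idx].mean(0)
        # split along the first principal direction, at the mean (projection 0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        proj = centered @ vt[0]
        clusters += [idx[proj <= 0], idx[proj > 0]]
    return np.array([X[idx].mean(0) for idx in clusters])


X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)
centers = pca_part_init(X, k=5)
km = KMeans(n_clusters=5, init=centers, n_init=1).fit(X)
print(km.inertia_)
```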


Author(s):  
Md. Zakir Hossain ◽  
Md.Nasim Akhtar ◽  
R.B. Ahmad ◽  
Mostafijur Rahman

Data mining is the process of finding structure in large data sets. With this process, decision makers can make particular decisions for the further development of real-world problems. Several data clustering techniques are used in data mining to find specific patterns in data. The K-means method is one of the most familiar clustering techniques for clustering large data sets. The K-means clustering method partitions the data set based on the assumption that the number of clusters is fixed. The main problem of this method is that if the number of clusters is chosen to be small, then there is a higher probability of adding dissimilar items to the same group. On the other hand, if the number of clusters is chosen to be large, then there is a higher chance of adding similar items to different groups. In this paper, we address this issue by proposing a new K-Means clustering algorithm. The proposed method performs data clustering dynamically: it initially calculates a threshold value as a centroid of K-Means, and based on this value the number of clusters is formed. At each iteration of K-Means, if the Euclidean distance between two points is less than or equal to the threshold value, then these two data points will be in the same group; otherwise, the proposed method creates a new cluster for the dissimilar data point. The results show that the proposed method outperforms the original K-Means method.
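
A minimal sketch of the thresholded assignment idea described above is given below. The threshold used here (the mean distance of the data to the global centroid) is a hypothetical stand-in for the paper's threshold calculation, and the code is not the authors' algorithm.

```python
# Thresholded dynamic clustering sketch: a point joins the nearest existing
# cluster only if it lies within a distance threshold, otherwise it seeds a
# new cluster. Threshold choice is a hypothetical assumption.
import numpy as np


def threshold_clustering(X):
    threshold = np.linalg.norm(X - X.mean(0), axis=1).mean()
    centroids, members = [X[0]], [[0]]
    for i, x in enumerate(X[1:], start=1):
        dists = np.linalg.norm(np.asarray(centroids) - x, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= threshold:
            members[j].append(i)
            centroids[j] = X[members[j]].mean(0)   # update running centroid
        else:
            centroids.append(x)                    # dissimilar point: new cluster
            members.append([i])
    return centroids, members


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])
centroids, members = threshold_clustering(X)
print("clusters formed:", len(centroids))
```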


Author(s):  
Ana Cristina Bicharra Garcia ◽  
Inhauma Ferraz ◽  
Adriana S. Vivacqua

Most past approaches to data mining have been based on association rules. However, the simple application of association rules usually only changes the user's problem from dealing with millions of data points to dealing with thousands of rules. Although this may somewhat reduce the scale of the problem, it is not a completely satisfactory solution. This paper presents a new data mining technique, called knowledge cohesion (KC), which takes into account a domain ontology and the user's interest in exploring certain data sets to extract knowledge, in the form of semantic nets, from large data sets. The KC method has been successfully applied to mine causal relations from oil platform accident reports. In a comparison with association rule techniques for the same domain, KC has shown a significant improvement in the extraction of relevant knowledge, using processing complexity and knowledge manageability as the evaluation criteria.


2014 ◽  
Vol 574 ◽  
pp. 728-733
Author(s):  
Shu Xia Lu ◽  
Cai Hong Jiao ◽  
Le Tong ◽  
Yang Fan Zhou

The Core Vector Machine (CVM) can handle large data sets by finding the minimum enclosing ball (MEB), but one drawback is that CVM is very sensitive to outliers. To tackle this problem, we propose a novel Position Regularized Core Vector Machine (PCVM). In the proposed PCVM, the data points are regularized by assigning position-based weights. Experimental results on several benchmark data sets show that the performance of PCVM is much better than that of CVM.
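
The exact PCVM formulation is not reproduced here, but the sketch below illustrates the general idea of position-based weighting: points far from the bulk of the data receive small weights so that outliers have less influence on the fit. The weighting function and the use of scikit-learn's SVC with per-sample weights are illustrative assumptions, not the paper's method.

```python
# Illustrative only: downweight points by their distance from the data
# centroid (a hypothetical position-based weighting) and pass the weights
# to a kernel classifier that accepts per-sample weights.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X[:5] += 25                                  # inject a few gross outliers

# position-based weights: points far from the centroid get small weights
dist = np.linalg.norm(X - X.mean(0), axis=1)
weights = np.exp(-dist / dist.std())

clf = SVC(kernel="rbf", gamma="scale").fit(X, y, sample_weight=weights)
print("support vectors per class:", clf.n_support_)
```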


Geophysics ◽  
2002 ◽  
Vol 67 (6) ◽  
pp. 1823-1834 ◽  
Author(s):  
Stephen D. Billings ◽  
Garry N. Newsam ◽  
Rick K. Beatson

Continuous global surfaces (CGS) are a general framework for interpolation and smoothing of geophysical data. The first of the two smoothing techniques we consider in this paper is generalized cross validation (GCV), a bootstrap measure of the predictive error of a surface that requires no prior knowledge of noise levels. The second smoothing technique is to define the CGS surface with fewer centers than data points and compute the fit by least squares (LSQR); the noise levels are implicitly estimated by the number and placement of the centers relative to the data points. We show that both smoothing methods can be implemented using extensions to the existing fast framework for interpolation, so that it is now possible to construct realistic smooth fits to the very large data sets typically collected in geophysics. Thin-plate spline and kriging surfaces with GCV smoothing appear to produce realistic fits to noisy radiometric data. The resulting surfaces are similar, yet the thin-plate spline required less parameterization. Given the simplicity and parsimony of GCV, a combination of the two methods is a reasonable default choice for the smoothing problem. LSQR smooth fitting with sinc functions defined on a regular grid of centers effectively low-pass filters the data and produces a reasonable surface, although one not as visually appealing as those from splines and kriging.
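
As an illustration of the "fewer centers than data points" strategy, the sketch below fits a thin-plate-spline expansion on a coarse grid of centers to noisy samples by ordinary least squares; the center count, noise level, and test surface are arbitrary choices, and no GCV step is included.

```python
# Least-squares surface fit with far fewer centers than data points:
# smoothing is implicit in the coarseness of the center grid.
import numpy as np

rng = np.random.default_rng(0)

# noisy samples of a smooth surface
xy = rng.uniform(0, 1, size=(2000, 2))
z = np.sin(2 * np.pi * xy[:, 0]) * np.cos(2 * np.pi * xy[:, 1])
z += rng.normal(0, 0.2, size=len(z))

# coarse 8 x 8 grid of centers (64 centers for 2000 data points)
gx, gy = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
centers = np.column_stack([gx.ravel(), gy.ravel()])


def tps(r):
    # thin-plate spline kernel r^2 log r, with the limit 0 at r = 0
    with np.errstate(divide="ignore", invalid="ignore"):
        out = r ** 2 * np.log(r)
    return np.where(r > 0, out, 0.0)


A = tps(np.linalg.norm(xy[:, None, :] - centers[None, :, :], axis=2))
coef, *_ = np.linalg.lstsq(A, z, rcond=None)   # least-squares fit to the centers
print("rms residual:", np.sqrt(np.mean((A @ coef - z) ** 2)))
```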


Author(s):  
Stefan Bamberger ◽  
Felix Krahmer

Johnson–Lindenstrauss embeddings are widely used to reduce the dimension and thus the processing time of data. To reduce the total complexity, fast algorithms for applying these embeddings are also necessary. To date, such fast algorithms are only available either for a non-optimal embedding dimension or up to a certain threshold on the number of data points. We address a variant of this problem where one aims to simultaneously embed larger subsets of the data set. Our method follows an approach by Nelson et al. (New constructions of RIP matrices with fast multiplication and fewer rows. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1515-1528, 2014): a subsampled Hadamard transform maps points into a space of lower, but not optimal, dimension. Subsequently, a random matrix with independent entries projects to an optimal embedding dimension. For subsets whose size scales at least polynomially in the ambient dimension, the complexity of this method comes close to the number of operations needed just to read the data, under mild assumptions on the size of the data set that are considerably less restrictive than in previous works. We also prove a lower bound showing that subsampled Hadamard matrices alone cannot reach an optimal embedding dimension. Hence, the second embedding cannot be omitted.
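
The two-stage construction can be sketched directly: a randomized, subsampled Hadamard transform maps to an intermediate dimension, and a dense Gaussian matrix projects to the target dimension. The dimensions below are illustrative, and the Hadamard step is applied as an explicit matrix product rather than the fast transform analyzed in the paper.

```python
# Sketch of a subsampled randomized Hadamard transform followed by a
# dense Gaussian projection. Dimensions are illustrative assumptions.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)

n, d = 500, 1024          # number of points, ambient dimension (power of 2)
m_mid, m_out = 256, 64    # intermediate and final embedding dimensions
X = rng.normal(size=(n, d))

# Stage 1: random signs, normalized Hadamard transform, subsample m_mid coordinates
signs = rng.choice([-1.0, 1.0], size=d)
H = hadamard(d) / np.sqrt(d)
rows = rng.choice(d, size=m_mid, replace=False)
Y = (X * signs) @ H[:, rows] * np.sqrt(d / m_mid)

# Stage 2: dense Gaussian projection to the target dimension
G = rng.normal(size=(m_mid, m_out)) / np.sqrt(m_out)
Z = Y @ G

# pairwise distances are approximately preserved
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Z[0] - Z[1]))
```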


2016 ◽  
Vol 8 (5) ◽  
pp. 708-712 ◽  
Author(s):  
Meghan McConnell ◽  
Jonathan Sherbino ◽  
Teresa M. Chan

Background: The increasing use of workplace-based assessments (WBAs) in competency-based medical education has led to large data sets that assess resident performance longitudinally. With large data sets, problems that arise from missing data are increasingly likely. Objective: The purpose of this study is to examine (1) whether data are missing at random across various WBAs, and (2) the relationship between resident performance and the proportion of missing data. Methods: During 2012–2013, a total of 844 WBAs of CanMEDS Roles were completed for 9 second-year emergency medicine residents. To identify whether missing data were randomly distributed across the various WBAs, the total number of missing data points was calculated for each Role. To examine whether the amount of missing data was related to resident performance, 5 faculty members rank-ordered the residents based on performance. A median rank score was calculated for each resident and was correlated with the proportion of missing data. Results: More data were missing for the Health Advocate and Professional WBAs relative to other competencies (P < .001). Furthermore, resident rankings were not related to the proportion of missing data points (r = 0.29, P > .05). Conclusions: The results of the present study illustrate that some CanMEDS Roles are less likely to be assessed than others. At the same time, the amount of missing data did not correlate with resident performance, suggesting that lower-performing residents are no more likely to have missing data than their higher-performing peers. This article discusses several approaches to dealing with missing data.
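
The two analyses described above can be sketched on hypothetical data as follows; the column names, score scale, missingness rates, and ranks are all invented, and this is not the study's actual analysis code.

```python
# Hedged sketch: (1) test whether missingness is evenly spread across Roles,
# (2) correlate per-resident missingness with performance rank. All data are
# synthetic and the column names are made up.
import numpy as np
import pandas as pd
from scipy.stats import chisquare, spearmanr

rng = np.random.default_rng(0)
roles = ["MedicalExpert", "Communicator", "HealthAdvocate", "Professional"]
wba = pd.DataFrame(
    {role: np.where(rng.random(844) < p, np.nan, rng.integers(1, 6, 844))
     for role, p in zip(roles, [0.05, 0.08, 0.25, 0.22])}
)
wba["resident"] = rng.integers(0, 9, len(wba))    # 9 hypothetical residents

# (1) Are missing assessments evenly distributed across Roles?
missing_per_role = wba[roles].isna().sum()
print(missing_per_role, chisquare(missing_per_role), sep="\n")

# (2) Does the proportion of missing data track resident rank?
prop_missing = wba.groupby("resident")[roles].apply(lambda g: g.isna().mean().mean())
median_rank = pd.Series(rng.permutation(9) + 1, index=range(9))  # hypothetical ranks
print(spearmanr(median_rank, prop_missing.sort_index()))
```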


Author(s):  
Frank Klawonn ◽  
Olga Georgieva

Most clustering methods have to face the problem of characterizing good clusters amid noisy data. Arbitrary noise points that simply do not belong to any of the classes being searched for are a real concern. Outliers, or noise data points, are data that deviate severely from the pattern set by the majority of the data, while rounding and grouping errors result from the inherent inaccuracy in the collection and recording of data. In fact, a single outlier can completely spoil the least squares (LS) estimate and thus the results of most LS-based clustering techniques, such as the hard C-means (HCM) and fuzzy C-means (FCM) algorithms (Bezdek, 1999).
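
A tiny numerical illustration of that sensitivity: a single gross outlier drags the least-squares location estimate (the mean) arbitrarily far, while a robust estimate such as the median barely moves.

```python
# One outlier is enough to spoil the least-squares (mean) estimate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=100)

print("mean / median without outlier:", x.mean(), np.median(x))
x[0] = 1e6                                    # a single gross outlier
print("mean / median with outlier:   ", x.mean(), np.median(x))
```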

