Discovery Process Models
Recently Published Documents

TOTAL DOCUMENTS: 11 (last five years: 1)
H-INDEX: 3 (last five years: 0)

Author(s): Mouhib Alnoukari, Asim El Sheikh

The Knowledge Discovery (KD) process model was first discussed in 1989, and different models have since been suggested, starting with Fayyad et al.'s (1996) process model. The common factor of all data-driven discovery processes is that knowledge is the final outcome. In this chapter, the authors analyze most of the KD process models suggested in the literature, discuss in detail those KD process models that introduce innovative life-cycle steps, and propose a categorization of the existing KD models. The chapter analyzes in depth the strengths and weaknesses of the leading KD process models, together with their supporting commercial systems and reported applications, and presents a matrix of their characteristics.


2010, Vol. 25(2), pp. 137-166
Author(s): Gonzalo Mariscal, Óscar Marbán, Covadonga Fernández

Abstract: Up to now, many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. In this paper, we describe the data mining and knowledge discovery methodologies and process models most used in industrial and academic projects and most cited in the scientific literature, providing an overview of their evolution throughout data mining and knowledge discovery history and setting down the state of the art in this topic. For every approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process, discussing its special features and its outstanding advantages and disadvantages. In addition, a global comparison of all presented data mining approaches is provided, focusing on the different steps and tasks into which every approach decomposes the whole KDD process. As a result of the comparison, we propose a new data mining and knowledge discovery process, named the refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.
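For orientation in the step-by-step comparison described above, the sketch below strings the classic KDD stages reported by Fayyad et al. (selection, preprocessing, transformation, data mining, interpretation/evaluation) into a minimal Python pipeline. The function names, the toy data, and the choice of k-means as the mining step are illustrative assumptions only; they are not the refined data mining process proposed in the paper.

```python
# A minimal, illustrative KDD pipeline following the classic Fayyad et al. stages:
# selection -> preprocessing -> transformation -> data mining -> interpretation.
# The dataset, function names, and model choice (k-means) are assumptions for
# illustration only; none of the surveyed methodologies prescribes this code.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


def select(raw: np.ndarray, columns: list[int]) -> np.ndarray:
    """Selection: keep only the attributes relevant to the analysis goal."""
    return raw[:, columns]


def preprocess(data: np.ndarray) -> np.ndarray:
    """Preprocessing: drop rows containing missing values (NaNs)."""
    return data[~np.isnan(data).any(axis=1)]


def transform(data: np.ndarray) -> np.ndarray:
    """Transformation: scale features to zero mean and unit variance."""
    return StandardScaler().fit_transform(data)


def mine(data: np.ndarray, k: int = 3) -> np.ndarray:
    """Data mining: clustering used here as an example mining task."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)


def interpret(labels: np.ndarray) -> dict:
    """Interpretation/evaluation: summarize the mined patterns."""
    values, counts = np.unique(labels, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=(200, 5))             # stand-in for a raw data source
    raw[rng.random(raw.shape) < 0.02] = np.nan  # a few missing values

    knowledge = interpret(mine(transform(preprocess(select(raw, [0, 1, 2])))))
    print(knowledge)  # cluster sizes: the "knowledge" output of this toy run
```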


Author(s): P.J. Lee

The procedure and steps of petroleum resource assessment involve a learning process that is characterized by an interactive loop between geological and statistical models and their feedback mechanisms. Geological models represent natural populations and are the basic units for petroleum resource evaluation. Statistical models include the superpopulation, finite population, and discovery process models that may be used for estimating the distributions for pool size and number of pools, and can be estimated from somewhat biased exploration data. Methods for assessing petroleum resources have been developed using different geological perspectives. Each of them can be applied to a specific case. When we consider using a particular method, the following aspects should be examined:

• Types of data required: Some methods can only incorporate certain types of data; others can incorporate all data that are available.
• Assumptions required: We must study what specific assumptions should be made and what role they play in the process of estimation.
• Types of estimates: What types of estimates does the method provide (aggregate estimates vs. pool-size estimates)? Do the types of estimates fulfill our needs for economic analysis?
• Feedback mechanisms: What types of feedback mechanism does the method offer?

PETRIMES is based on a probabilistic framework that uses superpopulation and finite population concepts, discovery process models, and the optional use of lognormal distributions. The reasoning behind the application of discovery process models is that they offer the only known way to incorporate petroleum assessment fundamentals (i.e., realism) into the estimates. PETRIMES requires an exploration time series as basic input and can be applied to both mature and frontier petroleum resource evaluations.
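To make the superpopulation, finite population, and discovery process concepts concrete, the sketch below draws a finite population of pool sizes from an assumed lognormal superpopulation and then simulates a discovery sequence in which pools are sampled without replacement with probability proportional to a power of pool size, so larger pools tend to be found first. This is one common way of formulating a discovery process model, and it illustrates why raw exploration data are biased. All parameter values are arbitrary assumptions, not PETRIMES settings.

```python
# Minimal sketch of the superpopulation / finite population / discovery process
# ideas described above. Parameter values are arbitrary assumptions, not the
# settings used by PETRIMES.

import numpy as np

rng = np.random.default_rng(42)

# Superpopulation: a lognormal distribution of pool sizes.
MU, SIGMA = 0.0, 1.5          # assumed lognormal parameters
N_POOLS = 50                  # assumed finite-population size

# Finite population: N pools drawn from the superpopulation.
pool_sizes = rng.lognormal(mean=MU, sigma=SIGMA, size=N_POOLS)

# Discovery process: sample pools without replacement with probability
# proportional to (a power of) pool size, so larger pools tend to be
# discovered earlier. This is why raw exploration data are size-biased.
BETA = 1.0                    # assumed size-bias exponent
order = []
remaining = list(range(N_POOLS))
while remaining:
    weights = pool_sizes[remaining] ** BETA
    pick = rng.choice(remaining, p=weights / weights.sum())
    order.append(pick)
    remaining.remove(pick)

discovered = pool_sizes[order]
print("mean size of first 10 discoveries :", discovered[:10].mean())
print("mean size of whole population     :", pool_sizes.mean())
```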


Author(s): P.J. Lee

A conceptual play has not yet been proved through exploration and can only be postulated from geological information. An immature play contains several discoveries, but not enough for discovery process models (described in Chapter 3) to be applied. The amount of data available for evaluating a conceptual play can be highly variable. Therefore, the evaluation methods used are related to the amount and types of data available, some of which are listed in Table 5.1. Detailed descriptions of these methods are beyond the scope of this book. However, an overview of these and other methods will be presented in Chapter 7. This chapter deals with the application of numerical methods to conceptual or immature plays. For immature plays, discoveries can be used to validate the estimates obtained. In this chapter, the Beaverhill Lake play and a play from the East Coast of Canada are examined.

A play consists of a number of pools and/or prospects that may or may not contain hydrocarbons. Therefore, associated with each prospect is an exploration risk that measures the probability of a prospect being a pool. Estimating exploration risk in petroleum resource evaluation is important. Methods for quantifying exploration risks are described later.

Geological factors that determine the accumulation of hydrocarbons include the presence of closure and of reservoir facies, as well as adequate seal, porosity, timing, source, migration, preservation, and recovery. For a specific play, only a few of these factors are recognized as critical to the amount of final accumulation. Consequently, if a prospect located within a sandstone play, for example, were tested, it might prove unsuccessful for any of the following reasons: lack of closure, unfavorable reservoir facies, lack of adequate source or migration path, and/or absence of cap rock. The frequency of occurrence of a geological factor can be measured from marginal probabilities. For example, if the marginal probability for the presence-of-closure factor is 0.9, there is a 90% chance that prospects drilled will have adequate closure. For a prospect to be a pool, the simultaneous presence of all the geological factors in the prospect is necessary. This requirement leads us to exploration risk analysis.
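The closing argument, that a prospect is a pool only if all geological factors are present simultaneously, can be illustrated with a small numerical example. Assuming the factors are treated as independent (a common simplification when their marginal probabilities are estimated separately), the probability that a prospect is a pool is the product of the marginals. The factor names and values below are hypothetical, not taken from the Beaverhill Lake play.

```python
# Exploration risk as the joint probability that all geological factors are
# present. Treating the factors as independent (a simplifying assumption),
# the probability that a prospect is a pool is the product of the marginal
# probabilities. Factor names and values are illustrative only.

from math import prod

marginal_probabilities = {
    "closure": 0.9,            # e.g., 90% of drilled prospects have adequate closure
    "reservoir_facies": 0.8,
    "seal": 0.7,
    "source_and_migration": 0.6,
}

p_pool = prod(marginal_probabilities.values())
print(f"exploration risk (probability the prospect is a pool): {p_pool:.3f}")  # 0.302
print(f"probability of a dry prospect:                         {1 - p_pool:.3f}")  # 0.698
```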


Author(s): P.J. Lee

In Chapter 3 we discussed the concepts, functions, and applications of the two discovery process models, LDSCV and NDSCV. In this chapter we will use various simulated populations to validate these two models and examine whether their performance meets our expectations. In addition, lognormal assumptions are applied to Weibull and Pareto populations to assess the impact on petroleum evaluation of incorrectly specifying the probability distribution. A mixed population of two lognormal populations and a mixed population of lognormal, Weibull, and Pareto populations were generated to test the impact of mixed populations on assessment quality. NDSCV was then applied to all these data sets to validate the performance of the models. Finally, justifications for choosing a lognormal distribution in petroleum assessments are discussed in detail.

Known populations were created as follows. A finite population was generated from a random sample of size 300 (N = 300) drawn from the lognormal, Pareto, and Weibull superpopulations. For the lognormal case, a population with μ = 0 and σ² = 5 was assumed. The truncated and shifted Pareto population was created with shape factor θ = 0.4, maximum pool size = 4000, and minimum pool size = 1. The Weibull population with λ = 20 and θ = 1.0 was generated for the current study. The first mixed population was created by mixing two lognormal populations: for population I, μ = 0, σ² = 3, and N₁ = 150; for population II, μ = 3.0, σ² = 3.2, and N₂ = 150. The second mixed population was generated by mixing lognormal (N₁ = 100), Pareto (N₂ = 100), and Weibull (N₃ = 100) populations, for a total of 300 pools. In addition, a gamma distribution was also used for reference. The lognormal distribution is J-shaped if an arithmetic scale is used for the horizontal axis, but shows an almost symmetrical pattern when a logarithmic scale is applied.
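A sketch of how finite populations like those described above could be generated with NumPy follows. The exact parameterizations are assumptions: the Weibull λ is treated as a scale and θ as a shape parameter, and the "truncated and shifted" Pareto is approximated by a Pareto truncated to [1, 4000] via inverse-CDF sampling; only the parameter values quoted in the text are taken from it.

```python
# Sketch of simulated pool-size populations resembling those described above.
# Assumptions: Weibull lambda = scale, theta = shape; the "truncated and
# shifted" Pareto is approximated by a Pareto truncated to [lo, hi] via
# inverse-CDF sampling. These conventions may differ from the original study.

import numpy as np

rng = np.random.default_rng(0)
N = 300  # finite-population size used throughout


def truncated_pareto(theta, lo, hi, size, rng):
    """Sample a Pareto distribution with shape theta restricted to [lo, hi]."""
    u = rng.random(size)
    return (lo**-theta - u * (lo**-theta - hi**-theta)) ** (-1.0 / theta)


# Lognormal population: mu = 0, sigma^2 = 5.
lognormal_pop = rng.lognormal(mean=0.0, sigma=np.sqrt(5.0), size=N)

# Truncated Pareto population: shape theta = 0.4 on [1, 4000].
pareto_pop = truncated_pareto(0.4, 1.0, 4000.0, N, rng)

# Weibull population: lambda = 20 (scale, assumed), theta = 1.0 (shape, assumed).
weibull_pop = 20.0 * rng.weibull(1.0, size=N)

# Mixed population 1: two lognormals, (mu=0, s2=3, N1=150) and (mu=3.0, s2=3.2, N2=150).
mixed_lognormal = np.concatenate([
    rng.lognormal(0.0, np.sqrt(3.0), 150),
    rng.lognormal(3.0, np.sqrt(3.2), 150),
])

# Mixed population 2: lognormal, Pareto, and Weibull, 100 pools each (300 total).
mixed_all = np.concatenate([
    rng.lognormal(0.0, np.sqrt(5.0), 100),
    truncated_pareto(0.4, 1.0, 4000.0, 100, rng),
    20.0 * rng.weibull(1.0, 100),
])

for name, pop in [("lognormal", lognormal_pop), ("pareto", pareto_pop),
                  ("weibull", weibull_pop), ("mixed lognormal", mixed_lognormal),
                  ("mixed all", mixed_all)]:
    print(f"{name:16s} n={pop.size:3d}  median={np.median(pop):8.2f}  max={pop.max():10.2f}")
```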

