Summary
Artificial neural networks (ANNs) have been used widely for prediction and classification problems, and many methods for building ANNs have appeared over the last two decades. One important and persistent limitation of ANNs, however, is that they perform poorly on small data sets because of overfitting. Several methods have been proposed in the literature to overcome this problem. Our results indicate that ANNs that use radial basis functions (RBFs) can effectively reduce prediction error when there is an underlying relationship between the variables. We have applied this and other methods to identify the factors that control or correlate with fracture spacing in the Lisburne Group, northeastern Alaska.
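As a rough illustration of what an RBF network computes, the sketch below builds one common variant: Gaussian basis functions centered at the training inputs form the hidden layer, and the linear output weights are obtained by ridge-regularized least squares. The function names, the Gaussian width (0.15), the ridge strength, and the synthetic sine data are all assumptions for illustration; this is not necessarily the training procedure used in this study.

```python
import numpy as np

def gaussian_design(x, centers, width):
    """Hidden-layer outputs: one Gaussian basis function per center, plus a bias column."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances (n, m)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((x.shape[0], 1))])

def fit_rbf(x, y, centers, width, ridge=1e-3):
    """Output-layer weights by ridge-regularized linear least squares (assumed variant)."""
    phi = gaussian_design(x, centers, width)
    a = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(a, phi.T @ y)

def predict_rbf(x, centers, width, weights):
    """Network output: weighted sum of the basis-function activations."""
    return gaussian_design(x, centers, width) @ weights

# Usage on a small synthetic set (15 points), with centers at the training inputs.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(15, 1))
y_train = np.sin(2 * np.pi * x_train[:, 0]) + 0.1 * rng.normal(size=15)
weights = fit_rbf(x_train, y_train, centers=x_train, width=0.15)
y_fit = predict_rbf(x_train, x_train, 0.15, weights)
y_new = predict_rbf(np.linspace(0.0, 1.0, 5)[:, None], x_train, 0.15, weights)
```

Because only the output layer is fitted, training reduces to a linear least-squares problem; in this sketch the basis-function width and the ridge strength are the main settings that control how smooth the fitted function is.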
By comparing the RBF results with those from other ANN methods, we find that the RBF method gives a substantially smaller error than many of the alternatives. For example, the errors in predicted fracture spacing for the Lisburne Group with conventional ANN methods are approximately 50 to 200% larger than those obtained with RBFs. Because this method predicts fracture spacing more accurately, we could more reliably identify how factors such as bed thickness, lithology, structural position, and degree of folding affect fracture spacing.
Comparing the performance of all the methods we tested, we observed that a method that performed well in one test did not necessarily do as well in another. This suggests that, although RBF networks can be expected to be among the best methods, there is no "best universal method" for all cases, and several methods should be tested for each case. Nonetheless, this study identified several candidate methods and thereby narrows the work required to find a suitable ANN.
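The recommendation to test several methods for each case can be made concrete with leave-one-out cross-validation, which lets every point of a small data set serve for both fitting and evaluation. In this minimal sketch, polynomial fits of different degree are hypothetical stand-ins for the ANN variants one would actually compare; `loo_rmse`, the candidate set, and the synthetic data are assumptions, not the procedure of this study.

```python
import numpy as np

def loo_rmse(fit, predict, x, y):
    """Leave-one-out RMSE: refit n times, each time predicting the held-out point."""
    n = len(x)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(x[mask], y[mask])
        errors[i] = predict(model, x[i:i + 1])[0] - y[i]
    return float(np.sqrt(np.mean(errors ** 2)))

# Hypothetical candidates: polynomial fits stand in for competing ANN methods.
# The default argument d=deg captures each degree at definition time.
candidates = {
    f"poly{deg}": (lambda xs, ys, d=deg: np.polyfit(xs, ys, d), np.polyval)
    for deg in (1, 3, 6)
}

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 20))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)
scores = {name: loo_rmse(f, p, x, y) for name, (f, p) in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest held-out error
```

Repeating this comparison on each new data set, rather than fixing one method in advance, is one way to act on the observation that the best-performing method changes from test to test.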
In petroleum engineering and geosciences, the amount of available data is often limited by expense or logistics (e.g., limited core, poor borehole conditions, or restricted logging suites). Thus, the methods used in this study should be attractive in many petroleum-engineering contexts in which complex, nonlinear relationships must be modeled from small data sets.
Introduction
An ANN is "an information-processing system that has certain performance characteristics in common with biological neural networks" (Fausett 1994). According to the universal approximation theorem, a multilayer neural network (Fig. 1) with a sufficient number of hidden nodes can approximate any continuous function to arbitrary accuracy (Haykin 1999). ANNs are widely used in prediction and classification problems and have numerous applications in geosciences and petroleum engineering, including permeability prediction (Aminian et al. 2003), fluid-properties prediction (Sultan and Al-Kaabi 2002), and well-test-data analysis (Osman and Al-Marhoun 2005).
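The intuition behind the universal approximation theorem can be illustrated with a single hidden layer of sigmoid units. In this sketch (hand-picked weights, not a trained network, and not a proof), two steep sigmoids with output weights +1 and -1 produce a "bump" that is close to 1 between 0.4 and 0.6 and close to 0 elsewhere; adding more such bumps with suitable heights lets a network approximate a continuous function more closely. The function name and all weight values are hypothetical.

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer of sigmoids: y = w2 . sigmoid(w1 * x + b1) + b2 (scalar input/output)."""
    hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w1) + b1)))  # (n, number of hidden nodes)
    return hidden @ w2 + b2

# Hand-picked weights: the first unit switches on near x = 0.4, the second near x = 0.6.
steepness = 200.0
w1 = np.array([steepness, steepness])
b1 = np.array([-steepness * 0.4, -steepness * 0.6])
w2 = np.array([1.0, -1.0])  # difference of the two steps gives the bump
b2 = 0.0

x = np.array([0.2, 0.5, 0.8])
y = mlp_forward(x, w1, b1, w2, b2)  # approximately [0, 1, 0]
```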
Even for a given basic network structure, a wide variety of ANNs can be produced. For example, different methods or criteria used to train the network (e.g., the early-stopping and weight-decay methods) produce ANNs that give different predictions. Two or more neural networks can also be combined into an ANN with lower error or other desirable properties; such approaches are known as "ensemble learning methods," a term that covers a large variety of techniques, including stacked generalization and ensemble averaging. A further difficulty arises when data sets are small. This is a common situation in petroleum-engineering and geosciences applications, in which the cost of data or collection logistics may limit the number of measurements. In such instances, ANNs are prone to overfitting, in which the model fits the training data points closely but predicts other points poorly (Fig. 2).
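The training variants named above can be combined in a short sketch: a one-hidden-layer tanh network trained by full-batch gradient descent with a weight-decay penalty, early stopping implemented by keeping the weights with the lowest error on a held-out validation subset (one common variant), and ensemble averaging of several networks trained from different random initializations. The function names, network size, learning rate, decay strength, epoch count, 14/6 train/validation split, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def predict(params, x):
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

def train_mlp(x, y, x_val, y_val, hidden=10, lr=0.05, decay=1e-3, epochs=3000, seed=0):
    """Gradient descent on MSE + decay * (sum of squared weights); biases are not penalized.

    Early stopping: return the weights that gave the lowest validation MSE.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=1.0, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0
    n = len(y)
    best_mse, best_params = np.inf, None
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)              # hidden activations (n, hidden)
        err = h @ w2 + b2 - y                 # residuals (n,)
        g_out = 2.0 * err / n                 # d(MSE)/d(output)
        g_w2 = h.T @ g_out + 2.0 * decay * w2
        g_b2 = g_out.sum()
        g_h = np.outer(g_out, w2) * (1.0 - h ** 2)  # back through tanh
        g_w1 = x.T @ g_h + 2.0 * decay * w1
        g_b1 = g_h.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
        val_mse = np.mean((predict((w1, b1, w2, b2), x_val) - y_val) ** 2)
        if val_mse < best_mse:
            best_mse, best_params = val_mse, (w1.copy(), b1.copy(), w2.copy(), b2)
    return best_params

def ensemble_predict(members, x):
    """Ensemble averaging: mean of the member networks' predictions."""
    return np.mean([predict(p, x) for p in members], axis=0)

rng = np.random.default_rng(2)
x_all = rng.uniform(0.0, 1.0, size=(20, 1))
y_all = np.sin(2 * np.pi * x_all[:, 0]) + 0.1 * rng.normal(size=20)
x_tr, y_tr, x_va, y_va = x_all[:14], y_all[:14], x_all[14:], y_all[14:]
members = [train_mlp(x_tr, y_tr, x_va, y_va, seed=s) for s in range(5)]
x_new = np.linspace(0.0, 1.0, 7)[:, None]
y_new = ensemble_predict(members, x_new)
```

Note that the validation subset used for early stopping is taken from the training data, so with small data sets it competes directly with the points available for fitting.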
In this study, we try to identify—among myriad possibilities—a few ANNs that provide good error performance with limited sample numbers. After a brief review of various types of ANNs, we use a synthetic data set to discuss, apply, and compare the methods that have been proposed in the literature to overcome the small-data-sets problem. Finally, we apply these methods to an actual data set—fracture-spacing data from the Lisburne Group, northeastern Alaska—and evaluate the results.