Prediction of drilling leakage locations based on optimized neural networks and the standard random forest method

Author(s):  
Junlin Su ◽  
Yang Zhao ◽  
Tao He ◽  
Pingya Luo

Circulation loss is one of the most serious and complex hindrances to normal and safe drilling operations. Detecting the layer at which the circulation loss has occurred is important for formulating technical measures related to leakage prevention and plugging and for reducing the waste caused by circulation loss as far as possible. Unfortunately, because of the lack of a general method for predicting the potential location of circulation loss during drilling, most current procedures depend on the plugging test. Therefore, the aim of this study was to use an Artificial Intelligence (AI)-based method to screen and process the historical data of 240 wells and 1029 original well loss cases in a localized area of southwestern China and to perform data mining. Using comparative analysis involving the Genetic Algorithm-Back Propagation (GA-BP) neural network and random forest optimization algorithms, we proposed an efficient real-time model for predicting leakage layer locations. For this purpose, data processing and correlation analysis were first performed using existing data to improve the effects of data mining. The well history data were then divided into training and testing sets in a 3:1 ratio. The parameter values of the BP network were then corrected according to the network training error, resulting in the final output of a prediction value with a globally optimal solution. The standard random forest model is a particularly capable model that can deal with high-dimensional data without feature selection. To evaluate and confirm the generated model, the model was applied to eight oil wells at a well site in southwestern China. Empirical results demonstrate that the proposed method can satisfy the requirements of actual application to drilling and plugging operations and is able to accurately predict the locations of leakage layers.
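The 3:1 division into training and testing sets mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the shuffling, the fixed seed, and the function name `split_3_to_1` are assumptions made here for illustration; the integer IDs merely stand in for the 1029 loss cases.

```python
import random


def split_3_to_1(records, seed=0):
    """Shuffle the records and split them into training and testing sets in a 3:1 ratio.

    Hypothetical helper: the random shuffle and the seed are assumptions,
    not details taken from the study.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4  # three quarters go to training
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 1029 original well loss cases.
cases = list(range(1029))
train, test = split_3_to_1(cases)
print(len(train), len(test))  # 771 258
```

Because 1029 is not divisible by four, the split is only approximately 3:1; rounding down the training share is one of several reasonable choices.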

2018 ◽  
Vol 5 (1) ◽  
pp. 47-55
Author(s):  
Florensia Unggul Damayanti

Data mining helps industries make intelligent decisions on complex problems. Data mining algorithms can be applied to data in order to forecast, identify patterns, make rules and recommendations, analyze sequences in complex data sets, and retrieve fresh insights. Advances in technology and the availability of various data mining techniques give industries the opportunity to explore their data, gain valuable information from it, and use that information to support business decision making. This paper implements classification data mining to retrieve knowledge from customer databases in order to support the marketing department in planning a strategy for predicting plan premium. The dataset is decomposed through conceptual analysis to identify data characteristics that can be used as input parameters of the data mining model. Business decisions and applications are characterized by processing step, processing characteristic, and processing outcome (Seng, J.L., Chen T.C. 2010). This paper sets up a data mining experiment based on the J48 and Random Forest classifiers and sheds light on the comparative performance of J48 and Random Forest on a dataset from the insurance industry. The experimental results cover the classification accuracy and efficiency of J48 and Random Forest, and also identify the attributes most useful for predicting plan premium in the context of strategic planning to support business strategy.


2020 ◽  
Vol 71 (6) ◽  
pp. 66-74
Author(s):  
Younis M. Younis ◽  
Salman H. Abbas ◽  
Farqad T. Najim ◽  
Firas Hashim Kamar ◽  
Gheorghe Nechifor

A comparison between artificial neural network (ANN) and multiple linear regression (MLR) models was carried out for predicting the heat of combustion, and the gross and net heat values, of a diesel fuel engine, based on the chemical composition of the diesel fuel. One hundred and fifty samples of Iraqi diesel provided data from chromatographic analysis. Eight parameters were applied as inputs in order to predict the gross and net heat of combustion of the diesel fuel. A trial-and-error method was used to determine the architecture of each ANN. The results showed that the prediction accuracy of the ANN model was greater than that of the MLR model in predicting the gross heat value. The best neural network for predicting the gross heating value was a back-propagation network (8-8-1), using the Levenberg-Marquardt algorithm for the second step of network training, with R = 0.98502 for the test data. In the same way, the best neural network for predicting the net heating value was a back-propagation network (8-5-1), using the Levenberg-Marquardt algorithm for the second step of network training, with R = 0.95112 for the test data.
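The abstract reports R values for the test data without defining R. Assuming it denotes the Pearson correlation coefficient between measured and predicted heating values (a common convention for neural network regression, but an assumption here), it could be computed as in the following sketch. The function name `pearson_r` and the sample numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up illustrative values, not measurements from the paper.
measured = [45.1, 45.6, 44.9, 45.3]
predicted = [45.0, 45.7, 44.8, 45.4]
r = pearson_r(measured, predicted)
print(round(r, 4))
```

An R close to 1 indicates that predictions rise and fall with the measurements; it does not by itself show that the predictions are unbiased.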


Author(s):  
Guoshi Wang ◽  
Ying Liu ◽  
Xiaowen Chen ◽  
Qing Yan ◽  
Haibin Sui ◽  
...  

The transformer is the most important piece of equipment in the power system. The research and development of fault diagnosis technology for Internet of Things equipment can effectively detect the operating status of equipment and eliminate hidden faults in time, which helps reduce the incidence of accidents and improve people's life safety index. Objective: To explore the utility of the Internet of Things in a power transformer fault diagnosis system. Methods: A total of 30 groups of transformer fault samples were selected; 10 groups were randomly selected for network training, and the remaining samples were used for testing. The matter-element extension mathematical model of power transformer fault diagnosis was established, and the correlation function was improved according to the characteristics of the three-ratio method. Each group of power transformers was diagnosed continuously for four months, and the monitoring data and diagnosis results were recorded and analyzed. A GPRS communication network was used to complete the communication between the data acquisition terminal and the monitoring terminal. According to the parameters of the database, the working state of the equipment is set, and various sensors are controlled by the instrument driver module to complete the transformer fault diagnosis. Results: The detection success rate of the power transformer fault diagnosis system model established in this paper is as high as 95.6%, the training error is less than 0.0001, and the model can correctly identify the fault types of the non-training samples. These results suggest that the technical support of the Internet of Things is helpful to the upgrading and maintenance of the power transformer fault diagnosis system.


2016 ◽  
Vol 51 (20) ◽  
pp. 2853-2862 ◽  
Author(s):  
Serkan Ballı

The aim of this study is to diagnose and classify the failure modes of two serial fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured from glass fiber reinforced layers and aluminum sheets. Results obtained in a previous experimental study of sandwich composite plates, which were mechanically fastened with two serial pins or bolts, were used for classification of the failure modes. The experimental data from that study cover different geometrical parameters for various applied preload moments: 0 (pinned), 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied using these geometrical parameters and the pinned/bolted joint configurations. Three geometrical parameters and 100 test data were used for classification with the support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. According to the experiments, the Random Forest method achieved better results than the others and was appropriate for diagnosing and classifying the failure modes. The performance of all the data mining methods used is discussed in terms of accuracy and error ratios.


Author(s):  
Chenxi Li ◽  
Zhendong Guo ◽  
Liming Song ◽  
Jun Li ◽  
Zhenping Feng

The design of turbomachinery cascades is a typical high-dimensional and computationally expensive problem, and a metamodel-based global optimization and data mining method is proposed to solve it. A modified Efficient Global Optimization (EGO) algorithm named Multi-Point Search based Efficient Global Optimization (MSEGO) is proposed, which is characterized by adding multiple samples per iteration. In tests on typical mathematical functions, MSEGO outperforms EGO in accuracy and convergence rate. MSEGO is then used for the optimization of a turbine vane with non-axisymmetric endwall contouring (NEC); the total pressure coefficient of the optimal vane is increased by 0.499%. Under the same settings, two further optimization processes are conducted using EGO and an Adaptive Range Differential Evolution algorithm (ARDE), respectively. The optimal solution of MSEGO is far better than that of EGO. While achieving a similar optimal solution, MSEGO costs only 3% as much as ARDE. Further, data mining techniques are used to extract information about the design space and analyze the influence of the variables on design performance. Through the analysis of variance (ANOVA), the section profile variables are found to have the most significant effects on cascade loss performance. The NEC, by contrast, appears less important in the ANOVA, because the performance difference between different NEC designs is very small in the prescribed space. However, the designs with NEC are always much better than the reference design, as shown by the parallel axis plot; that is, the NEC does significantly influence the cascade performance. This indicates that ensemble learning, combining the results of ANOVA and the parallel axis plot, is very useful for gaining full knowledge of the design space.
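The abstract does not describe the infill criterion of MSEGO. For background, the baseline EGO algorithm it modifies is usually driven by the expected improvement (EI) of a Gaussian surrogate prediction. The sketch below shows that standard criterion for a minimization problem; the function name and arguments are illustrative assumptions, and nothing here should be read as the authors' multi-point search.

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Standard expected improvement for minimization.

    mu, sigma: surrogate mean and standard deviation at a candidate point.
    f_best:    best (lowest) objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (f_best - mu) * cdf + sigma * pdf


# A candidate predicted to equal the current best still has positive EI
# because of its uncertainty.
print(expected_improvement(mu=0.0, sigma=1.0, f_best=0.0))
```

EI rewards both a low predicted mean and a high predictive uncertainty, which is what lets EGO balance exploitation and exploration. Classical EGO adds only the single point of maximum EI per iteration, which is the limitation that a multi-point variant addresses.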


Author(s):  
T R Stella Mary ◽  
Shoney Sebastian

Data mining can be defined as a process of extracting unknown, verifiable, and possibly helpful data from information. Among the various ailments, heart disease is one of the primary causes of death around the globe; hence, in order to curb it, a detailed analysis is carried out using data mining. Often, we limit ourselves to the minimal set of attributes required to predict whether a patient has heart disease. By doing so, we miss many important attributes that are main causes of heart disease. Hence, this research aims to consider almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest, and Random Tree, which are applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and the Naïve Bayes and Random Forest algorithms perform comparatively better on these datasets.


2016 ◽  
Vol 31 (2) ◽  
pp. 581-599 ◽  
Author(s):  
David Ahijevych ◽  
James O. Pinto ◽  
John K. Williams ◽  
Matthias Steiner

A data mining and statistical learning method known as a random forest (RF) is employed to generate 2-h forecasts of the likelihood for initiation of mesoscale convective systems (MCS-I). The RF technique uses an ensemble of decision trees to relate a set of predictors [in this case radar reflectivity, satellite imagery, and numerical weather prediction (NWP) model diagnostics] to a predictand (in this case MCS-I). The RF showed a remarkable ability to detect MCS-I events. Over 99% of the 550 observed MCS-I events were detected to within 50 km. However, this high detection rate came with a tendency to issue false alarms either because of premature warning of an MCS-I event or in the continued elevation of RF forecast likelihoods well after an MCS-I event occurred. The skill of the RF forecasts was found to increase with the number of trees and the fraction of positive events used in the training set. The skill of the RF was also highly dependent on the types of predictor fields included in the training set and was notably better when a more recent training period was used. The RF offers advantages over high-resolution NWP because it can be run in a fraction of the time and can account for nonlinearly varying biases in the model data. In addition, as part of the training process, the RF ranks the importance of each predictor, which can be used to assess the utility of new datasets in the prediction of MCS-I.


2021 ◽  
Vol 9 (1) ◽  
pp. 25
Author(s):  
Maulida Ayu Fitriani ◽  
Dany Candra Febrianto

Direct marketing is an effort made by the Bank to increase sales of its products and services, but the Bank sometimes has to contact a customer or prospective customer more than once to ascertain whether the customer or prospective customer is willing to subscribe to a product or service. To overcome this ineffective process, several data mining methods are proposed. This study compares several data mining methods, namely Naïve Bayes, K-NN, Random Forest, SVM, J48, and AdaBoost J48. Prior to classification, the SMOTE pre-processing technique was applied in order to eliminate the class imbalance problem in the Bank Marketing dataset. The SMOTE + Random Forest method produced the highest accuracy in this study, 92.61%.
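The core idea of SMOTE, creating synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: the function name `smote_sketch`, the value of `k`, the seed, and the toy points are all assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance (excluding x itself).
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=6)
print(len(new_points))  # 6
```

Oversampling of this kind is normally applied to the training portion only, so that synthetic points do not leak into the data used to measure accuracy; the abstract does not say which procedure the study followed.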

