Classification random forest with exact conditioning for spatial prediction of categorical variables

Author(s):  
Francky Fouedjio
Author(s):  
Cheng-ming Ye ◽  
Rui-long Wei ◽  
Yong-gang Ge ◽  
Yao Li ◽  
José Marcato Junior ◽  
...  

PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242458
Author(s):  
Minzheng Jiang ◽  
Tiancai Cheng ◽  
Kangxing Dong ◽  
Shufan Xu ◽  
Yulong Geng

The difficulty of directly determining the failure mode of a submersible screw pump shortens the service life of the system and disrupts normal oil well production. This paper aims to identify the fault types of submersible screw pumps accurately and efficiently, and proposes a fault diagnosis method for the submersible screw pump based on random forest. An HDFS storage system and a MapReduce processing system are established on the Hadoop big data processing platform. The Bagging algorithm is then used to draw the training set samples, and the CART method is used to build the sample library and the decision trees of the random forest model. Six continuous variables, four categorical variables, and the fault categories of the submersible screw pump oil production system are used to train the decision trees. Because a random forest consists of many decision trees, the parameters to be tested are fed into the random forest model, and its decision trees jointly determine the failure category of the submersible screw pump. The fault diagnosis accuracy was verified to be 92.86%. This work can guide the timely detection of the causes of downhole unit failures, reduce oil well production losses, and support the wider adoption of submersible screw pumps in oil fields.
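The pipeline described above (bootstrap sampling by Bagging, CART trees, and a forest that combines the trees' outputs) can be sketched in plain Python. This is a minimal, from-scratch illustration under stated assumptions, not the authors' Hadoop/MapReduce implementation: all function names are hypothetical, only numeric features are handled (categorical variables would first need an encoding, which is an assumption here), trees split on Gini impurity with a random feature subset per node, and the forest predicts by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score, n = None, gini(y), len(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth, max_depth, n_feat, rng):
    """Grow a CART-style binary tree; leaves hold the majority class label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random forests consider only a random subset of features at each node.
    features = rng.sample(range(len(X[0])), n_feat)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feat, rng)

    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    # Internal nodes are tuples, so class labels must not be tuples.
    return (f, t, child(left), child(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, n_feat=None, seed=0):
    rng = random.Random(seed)
    n_feat = n_feat or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bagging: each tree sees a bootstrap sample drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feat, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees' predicted classes."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

For example, with a toy dataset `X = [[i, 2 * i] for i in range(10)]` labelled `"normal"` for the first five rows and `"fault"` for the rest, `predict_forest(fit_forest(X, y), [9, 18])` votes for `"fault"`. Majority voting is the usual way a classification forest combines its trees; the source does not state which combination rule the authors used.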


Geoderma ◽  
2018 ◽  
Vol 316 ◽  
pp. 100-114 ◽  
Author(s):  
Carlos M. Guio Blanco ◽  
Victor M. Brito Gomez ◽  
Patricio Crespo ◽  
Mareike Ließ

Geoderma ◽  
2012 ◽  
Vol 171-172 ◽  
pp. 35-43 ◽  
Author(s):  
Shiwen Zhang ◽  
Yuanfang Huang ◽  
Chongyang Shen ◽  
Huichun Ye ◽  
Yichun Du

2021 ◽  
Vol 19 ◽  
pp. 310-320
Author(s):  
Suboh Alkhushayni ◽  
Taeyoung Choi ◽  
Du’a Alzaleq

This work aims to expand knowledge in data analysis through both persistent homology and representations of directed graphs. Specifically, we investigated how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. In addition, the Wine dataset available in RStudio was analyzed with several clustering algorithms: hierarchical clustering, K-Means clustering, and PAM clustering. The goal of the analysis was to determine which clustering method is appropriate for a given numerical dataset. By testing the data, we sought to identify the optimal clustering algorithm among three methods, K-Means, PAM, and Random Forest, in comparison with agglomerative hierarchical clustering. By comparing each model's accuracy against the cultivar coefficients, we concluded that K-Means methods are the most helpful when working with numerical variables, whereas PAM clustering and Gower distance with random forest are the most beneficial approaches when working with categorical variables. When the analysis is done properly, these tests can determine the optimal number of clusters for a given dataset. The approach can be applied in several industrial areas, such as clinical and business settings. For example, in clinical settings, patients can be grouped by a common disease, required therapy, or other attributes. Likewise, in business, groups can be formed based on marginal profit, marginal cost, or other economic indicators.
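For categorical or mixed data, the combination named above of Gower dissimilarity and medoid-based clustering can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, Gower weights every variable equally and assumes no missing values, and the clustering step is a swap-based k-medoids in the spirit of PAM's SWAP phase with a naive initialization instead of PAM's BUILD step. The random forest part of the original approach is not shown.

```python
def feature_ranges(data, kinds):
    """Range (max - min) of each numeric column; None for categorical columns."""
    columns = list(zip(*data))
    return [max(col) - min(col) if kind == "num" else None
            for col, kind in zip(columns, kinds)]


def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity: range-scaled |x - z| for numeric variables,
    0/1 mismatch for categorical ones, averaged over all variables."""
    total = 0.0
    for x, z, kind, r in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - z) / r if r else 0.0
        else:
            total += 0.0 if x == z else 1.0
    return total / len(kinds)


def k_medoids(dist, k):
    """Swap-based k-medoids on a precomputed distance matrix."""
    n = len(dist)

    def cost(meds):
        # Sum of each point's distance to its nearest medoid.
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    medoids = list(range(k))  # naive initialization (PAM's BUILD step omitted)
    best = cost(medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                c = cost(candidate)
                if c < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = candidate, c, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

Usage: build `dist = [[gower_distance(a, b, kinds, ranges) for b in data] for a in data]` with `kinds` such as `["num", "cat"]`, then call `k_medoids(dist, k)`. Because the algorithm only needs pairwise dissimilarities, it works unchanged for any mix of variable types that Gower can score.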


Author(s):  
Jean Michel Moura-Bueno ◽  
Ricardo Simão Diniz Dalmolin ◽  
Taciara Zborowski Horst-Heinen ◽  
Luciano Campos Cancian ◽  
Ricardo Bergamo Schenato ◽  
...  

Abstract: The objective of this work was to evaluate the effect of covariate selection by expert knowledge on the performance of soil class predictive models in a complex landscape, in order to identify the best predictive model for digital soil mapping in the Southern region of Brazil. A total of 164 points were sampled in the field using conditioned Latin hypercube sampling, with elevation, slope, and aspect as covariates. Environmental covariates were extracted from the digital elevation model and grouped into three sets: all 21 covariates, the covariates remaining after exclusion of the multicollinear ones, and the covariates chosen by expert knowledge. Prediction was performed with four models: decision tree, random forest, multiple logistic regression, and support vector machine. Model accuracy was evaluated by the kappa index (K), general accuracy (GA), and class accuracy. The prediction models were sensitive to the disproportionate sampling of soil classes. The best predicted map achieved a GA of 71% and a K of 0.59. Using the covariate set chosen by expert knowledge improves model performance in predicting soil classes in a complex landscape, and random forest is the best model for the spatial prediction of soil classes.
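The three accuracy measures named above can all be computed from a confusion matrix. The sketch below is illustrative and the function name is hypothetical. General accuracy is the fraction of correctly classified samples, and the kappa index is Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from the row and column totals. "Class accuracy" is assumed here to be the per-class share of reference samples predicted correctly (producer's accuracy); the source does not say which per-class definition was used.

```python
def accuracy_metrics(confusion):
    """confusion[i][j] = number of samples of reference class i predicted as class j.

    Returns (general accuracy, Cohen's kappa, per-class accuracy list).
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # general accuracy
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    # Assumed per-class definition: correct / reference total (producer's accuracy).
    class_acc = [confusion[i][i] / row_tot[i] if row_tot[i] else float("nan")
                 for i in range(k)]
    return observed, kappa, class_acc
```

For the two-class matrix `[[20, 5], [10, 15]]`, general accuracy is 35/50 = 0.7, chance agreement is (25·30 + 25·20)/50² = 0.5, so kappa is (0.7 − 0.5)/(1 − 0.5) = 0.4, and the per-class accuracies are 0.8 and 0.6. The example shows why kappa is reported alongside GA: it discounts the agreement a classifier would reach by chance, which matters when soil classes are sampled disproportionately.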