Application and Implementation of the CART Algorithm in Determining the Eligibility of PKH Aid Recipients in Ngadirejo Village

2021 ◽  
Vol 7 (1) ◽  
pp. 40
Author(s):  
Agustrika Aribowo ◽  
Rakhmad Kuswandhie ◽  
Yogi Primadasa

According to data from Badan Pusat Statistik (BPS), the number of poor people in Indonesia in March 2019 reached 25.14 million, or about 9.41%. To address this situation, the Indonesian government has created several social assistance programs, one of which is Program Keluarga Harapan (PKH). In Ngadirejo village, the eligibility of PKH recipients is determined by deliberation between the village government and local community leaders. To help the Ngadirejo village government obtain information on eligible PKH recipients quickly and accurately, data mining with the classification and regression tree (CART) algorithm can be used. The CART algorithm produces a decision tree, which serves as the rule set for classifying the eligibility of PKH aid recipients in Ngadirejo village. Once the decision tree was obtained, the researchers designed a system that the Ngadirejo village administration will use to perform the classification. Keywords: Data Mining, Classification, CART, PKH
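
As a rough illustration of how a CART-style tree derives a rule, the sketch below scores candidate thresholds on one numeric attribute by weighted Gini impurity and keeps the best binary split. The attribute (household income) and the eligibility labels are hypothetical toy values, not data from the study, and the function names are made up for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: household income (millions of rupiah) and eligibility.
income = [0.8, 1.0, 1.2, 2.5, 3.0, 3.5]
eligible = ["yes", "yes", "yes", "no", "no", "no"]
threshold, impurity = best_split(income, eligible)
print(f"rule: income <= {threshold:.2f} -> eligible (weighted Gini {impurity:.2f})")
```

A full CART implementation would repeat this search over every attribute and recurse on the two resulting subsets until a stopping criterion is met.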

2018 ◽  
Vol 8 (8) ◽  
pp. 1369 ◽  
Author(s):  
Alireza Arabameri ◽  
Biswajeet Pradhan ◽  
Hamid Reza Pourghasemi ◽  
Khalil Rezaei ◽  
Norman Kerle

Gully erosion triggers land degradation and restricts the use of land. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, namely Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs were selected to model GE: elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI. The variable importance results from the RF and BRT models indicated that distance from roads, elevation, and lithology had the highest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The results showed that the AUC for the three models varies from 0.911 to 0.927, whereas the RF model had a prediction accuracy of 0.927 as per SCAI values when compared to the other models. The findings will help in planning and developing the studied region.


Water ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 1405 ◽  
Author(s):  
Seyed Naghibi ◽  
Mehdi Vafakhah ◽  
Hossein Hashemi ◽  
Biswajeet Pradhan ◽  
Seyed Alavi

It is a well-known fact that sustainable development goals are difficult to achieve without a proper water resources management strategy. This study applies several state-of-the-art statistical and data mining models, namely weights-of-evidence (WoE), boosted regression trees (BRT), and classification and regression tree (CART), to identify suitable areas for artificial recharge through floodwater spreading (FWS). First, suitable areas for the FWS project were identified in a basin in north-eastern Iran based on the national guidelines and a literature survey. Using the same methodology, an identical number of FWS unsuitable areas were also determined. Afterward, a set of FWS conditioning factors was selected for modeling FWS suitability. The models were applied using 70% of the suitable and unsuitable locations and validated with the remaining 30% of the input data. Finally, receiver operating characteristic (ROC) curves were plotted to compare the produced FWS suitability maps. The findings showed acceptable performance of BRT, CART, and WoE for FWS suitability mapping, with areas under the ROC curve of 92%, 87.5%, and 81.6%, respectively. Among the considered variables, transmissivity, distance from rivers, aquifer thickness, and electrical conductivity were identified as the most important contributors to the modeling. FWS suitability maps produced by the proposed method could be used as a guideline for water resource managers to control flood damage and obtain new sources of groundwater. This methodology could be easily replicated to produce FWS suitability maps in other regions with similar hydrogeological conditions.
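
The validation scheme described above (a 70/30 split followed by a comparison of areas under the ROC curve) can be sketched in a few lines. This is a minimal illustration, not the authors' workflow: the helper names, the location identifiers, and the scores are all hypothetical, and the AUC is computed through its rank interpretation rather than by plotting a curve.

```python
import random

def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical location identifiers (suitable and unsuitable sites together).
locations = list(range(20))
train, valid = train_test_split(locations)

# Hypothetical model scores for four validation sites (1 = suitable, 0 = unsuitable).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(len(train), len(valid), auc)
```

Computing one such AUC per model on the same validation set gives the kind of side-by-side comparison reported in the abstract.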


2021 ◽  
Vol 35 (3) ◽  
pp. 209-215
Author(s):  
Pratibha Verma ◽  
Vineet Kumar Awasthi ◽  
Sanat Kumar Sahu

Data mining techniques are combined with ensemble learning and deep learning for classification. The methods used for classification are a single C5.0 tree (C5.0), Classification and Regression Tree (CART), a Support Vector Machine (SVM) with a linear kernel, an ensemble (CART, SVM, C5.0), a single-hidden-layer neural network (NN), a neural network with Principal Component Analysis (PCA-NN), the deep learning-based H2OBinomialModel-Deeplearning (HBM-DNN), and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). In this study, experiments were conducted on pre-processed datasets using R programming and the 10-fold cross-validation technique. The findings show that the ensemble model (CART, SVM, and C5.0) and EHBM-DNN are more accurate for classification than the other methods.
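
For readers unfamiliar with the evaluation setup, the sketch below shows how 10-fold cross-validation partitions a dataset and how an ensemble might combine the labels predicted by several classifiers. It is written in Python rather than the R used in the study, and the majority-vote rule is an assumption: the abstract does not say how the CART, SVM, and C5.0 outputs were combined. All names are hypothetical.

```python
from collections import Counter

def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def majority_vote(predictions):
    """Combine the labels predicted by several models for one sample (assumed rule)."""
    return Counter(predictions).most_common(1)[0][0]

# Each of the 25 hypothetical samples lands in exactly one test fold.
splits = list(k_fold_indices(25, k=10))

# Hypothetical per-model predictions for a single sample.
combined = majority_vote(["positive", "negative", "positive"])
print(len(splits), combined)
```

In practice the samples are usually shuffled before the folds are formed; that step is omitted here to keep the example deterministic.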


Author(s):  
Marko Robnik-Šikonja

Research in machine learning, data mining, and statistics has provided a number of methods that estimate the usefulness of an attribute (feature) for prediction of the target variable. The estimates of attributes’ utility are subsequently used in various important tasks, e.g., feature subset selection, feature weighting, feature ranking, feature construction, data transformation, decision and regression tree building, data discretization, visualization, and comprehension. These tasks frequently occur in data mining, robotics, and in the construction of intelligent systems in general. A majority of the attribute evaluation measures in use are myopic in the sense that they estimate the quality of one feature independently of the context of other features. In problems that may involve many feature interactions, these measures are not appropriate. The measures historically based on the Relief algorithm (Kira & Rendell, 1992) take context into account through the distance between instances and are efficient in problems with strong dependencies between attributes.
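
To make the contrast with myopic measures concrete, here is a simplified sketch of the Relief idea for two classes: each instance pulls a feature's weight up by its difference to the nearest instance of the other class (nearest miss) and down by its difference to the nearest instance of the same class (nearest hit). It assumes numeric features already scaled to [0, 1] and visits every instance once instead of sampling randomly, which is a simplification of the original algorithm. The toy data are hypothetical: the class is the XOR of the first two features, so neither is informative alone, and the third feature is noise.

```python
def relief(X, y):
    """Simplified two-class Relief; returns one weight per feature."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        # Manhattan distance over all features supplies the "context".
        return sum(abs(u - v) for u, v in zip(a, b))

    for i in range(n):
        others = [j for j in range(n) if j != i]
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: dist(X[i], X[j]))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: dist(X[i], X[j]))
        for a in range(d):
            # Reward separation from the nearest miss, penalize it from the nearest hit.
            weights[a] += (abs(X[i][a] - X[miss][a]) - abs(X[i][a] - X[hit][a])) / n
    return weights

# Hypothetical XOR-style data: label = f1 XOR f2, third column is noise.
X = [(0, 0, 0.2), (0, 0, 0.8), (1, 1, 0.2), (1, 1, 0.8),
     (0, 1, 0.2), (0, 1, 0.8), (1, 0, 0.2), (1, 0, 0.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
weights = relief(X, y)
print(weights)
```

On this toy set the two interacting features receive positive weights while the noise feature receives a negative one, which is the behavior the abstract attributes to Relief-based measures.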


Author(s):  
Zdravko Pecar ◽  
Ivan Bratko

The aim of this research was to study the performance of 58 Slovenian administrative districts (state government offices at local level), to identify the factors that affect the performance, and how these effects interact. The main idea was to analyze the available statistical data relevant to the performance of the administrative districts with machine learning tools for data mining, and to extract from available data clear relations between various parameters of administrative districts and their performance. The authors introduced the concept of basic unit of administrative service, which enables the measurement of an administrative district’s performance. The main data mining tool used in this study was the method of regression tree induction. This method can handle numeric and discrete data, and has the benefit of providing clear insight into the relations between the parameters in the system, thereby facilitating the interpretation of the results of data mining. The authors investigated various relations between the parameters in their domain, for example, how the performance of an administrative district depends on the trends in the number of applications, employees’ level of professional qualification, etc. In the chapter, they report on a variety of (occasionally surprising) findings extracted from the data, and discuss how these findings can be used to improve decisions in managing administrative districts.


2021 ◽  
Author(s):  
Bohan Zheng

With the Internet of Things (IoT) being widely adopted in recent years, traditional machine learning and data mining methods can hardly cope with complex big data problems when applied alone. However, hybridizing methods that have complementary advantages could achieve optimized practical solutions. This work discusses how to solve multivariate regression problems and extract intrinsic knowledge by hybridizing Self-Organizing Maps (SOM) and Regression Trees. A dual-layer SOM map is developed in which the first layer performs unsupervised learning and the second, regression tree layer performs supervised learning to produce predictions and extract knowledge. In this framework, SOM neurons serve as kernels to which similar training samples are mapped, so that the regression tree can perform regression locally. In this way, the difficulties of applying and visualizing local regression on high-dimensional data are overcome. Further, we provide an automated growing mechanism based on a few stopping criteria without adding new parameters. A case study on the Electrical Vehicle (EV) range anxiety problem is presented, and it demonstrates that our proposed hybrid model is quantitatively precise and interpretable. Keywords: Multivariate Regression, Big Data, Machine Learning, Data Mining, Self-Organizing Maps (SOM), Regression Tree, Electrical Vehicle (EV), Range Estimation, Internet of Things (IoT)


2003 ◽  
Vol 17 (1) ◽  
pp. 109-114 ◽  
Author(s):  
S.A. Gansky

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with the use of validation procedures and evaluating concordance, sensitivity, specificity, and likelihood ratio. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, problems of naïve KDD analyses (Schwarzer et al., 2000) can be carefully avoided.
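
The performance measures named above follow directly from a 2x2 confusion matrix. The sketch below computes sensitivity, specificity, and the positive and negative likelihood ratios from their standard definitions; the counts are hypothetical and are not taken from the Rochester caries study.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard measures from true/false positive and negative counts.

    Assumes specificity is strictly between 0 and 1, otherwise a likelihood
    ratio is undefined (division by zero).
    """
    sensitivity = tp / (tp + fn)   # share of true cases the model flags
    specificity = tn / (tn + fp)   # share of non-cases the model clears
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical validation counts for one model.
measures = diagnostic_measures(tp=40, fp=10, fn=10, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Computing the same dictionary for each competing model on the same validation data makes the comparison between logistic regression, CART, and ANN explicit.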


2017 ◽  
Vol 7 (1.2) ◽  
pp. 43 ◽  
Author(s):  
K. Sreenivasa Rao ◽  
N. Swapna ◽  
P. Praveen Kumar

Data mining is the process of extracting useful information from large sets of data. Data mining enables users to gain insights into the data and make useful decisions from the knowledge mined from databases. The purpose of higher education organizations is to offer superior opportunities to their students. Like data mining in general, Education Data Mining (EDM) is nowadays considered a powerful tool in the field of education. It offers an effective method for mining students' performance based on various parameters to predict and analyze whether a student will be recruited in campus placement. Predictions are made using the machine learning algorithms J48, Naïve Bayes, Random Forest, and Random Tree in the Weka tool, and Multiple Linear Regression, binomial logistic regression, Recursive Partitioning and Regression Tree (rpart), conditional inference tree (ctree), and Neural Network (nnet) algorithms in RStudio. The results obtained from each approach are then compared with respect to their performance and accuracy levels by graphical analysis. Based on the results, higher education organizations can offer superior training to their students.

