Application and Implementation of the CART Algorithm for Determining the Eligibility of PKH Aid Recipients in Ngadirejo Village

Agustrika Aribowo; Rakhmad Kuswandhie; Yogi Primadasa

doi:10.31154/cogito.v7i1.293.40-51

CogITo Smart Journal ◽

2021 ◽

Vol 7 (1) ◽

pp. 40-51

Keyword(s):

Data Mining ◽

Regression Tree

According to data from Statistics Indonesia (Badan Pusat Statistik, BPS), the number of poor people in Indonesia in March 2019 reached 25.14 million, or about 9.41% of the population. To address this situation, the Indonesian government has created several social assistance programs, one of which is the Program Keluarga Harapan (PKH, Family Hope Program). In Ngadirejo village, eligibility for PKH is decided through deliberation (musyawarah) between the village government and local community leaders. To help the Ngadirejo village government identify eligible PKH recipients quickly and accurately, data mining can be applied using the classification and regression tree (CART) algorithm. The CART algorithm produces a decision tree, which is then used as the rule set for classifying the eligibility of PKH aid recipients in Ngadirejo Village. After obtaining the decision tree, the researchers designed a system that the Ngadirejo village administration can use to perform the classification.

Keywords: Data Mining, Classification, CART, PKH
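The abstract says CART produces a decision tree whose branches serve as classification rules. The Python sketch below is a minimal illustration of that idea, not the authors' system: it chooses binary splits by weighted Gini impurity (the criterion CART uses for classification), grows a small tree, and then follows it as a chain of if-then rules. The attributes `income` and `dependents` and the household records are hypothetical, and the cost-complexity pruning that full CART applies is left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Find the binary split `row[feature] <= threshold` with the lowest
    weighted Gini impurity. Returns (feature, threshold, score) or None."""
    n = len(rows)
    best = None
    for f in features:
        # Using each observed value as a threshold gives the same partitions
        # as the midpoints between adjacent sorted values.
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow the tree recursively; each leaf stores the majority class.
    Pruning, which full CART performs afterwards, is omitted."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], features, depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], features, depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the if-then rules from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical household records (illustrative only, not data from the study).
rows = [
    {"income": 500, "dependents": 4},
    {"income": 700, "dependents": 3},
    {"income": 900, "dependents": 5},
    {"income": 2500, "dependents": 2},
    {"income": 3000, "dependents": 4},
    {"income": 4000, "dependents": 1},
]
labels = ["eligible"] * 3 + ["not eligible"] * 3

tree = build_tree(rows, labels, ["income", "dependents"])
print(tree)
print(predict(tree, {"income": 800, "dependents": 2}))  # eligible
```

On this toy data the whole tree reduces to one rule (income <= 900 means eligible), which is the kind of readable rule a village administrator could apply directly.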

Spatial Modelling of Gully Erosion Using GIS and R Programing: A Comparison among Three Data Mining Algorithms

Applied Sciences ◽

10.3390/app8081369 ◽

2018 ◽

Vol 8 (8) ◽

pp. 1369 ◽

Cited By ~ 52

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Hamid Reza Pourghasemi ◽

Khalil Rezaei ◽

Norman Kerle

Keyword(s):

Data Mining ◽

Spatial Relationship ◽

Area Under The Curve ◽

Regression Tree ◽

Drainage Density ◽

Gully Erosion ◽

Slope Aspect ◽

Topographic Wetness Index ◽

Boosted Regression Tree ◽

Area Index

Gully erosion triggers land degradation and restricts the use of land. This study assesses the spatial relationship between gully erosion (GE) and geo-environmental variables (GEVs) using Weights-of-Evidence (WoE) Bayes theory, and then applies three data mining methods, namely Random Forest (RF), boosted regression tree (BRT), and multivariate adaptive regression spline (MARS), for gully erosion susceptibility mapping (GESM) in the Shahroud watershed, Iran. Gully locations were identified by extensive field surveys, and a total of 172 GE locations were mapped. Twelve gully-related GEVs were selected to model GE: elevation, slope degree, slope aspect, plan curvature, convergence index, topographic wetness index (TWI), lithology, land use/land cover (LU/LC), distance from rivers, distance from roads, drainage density, and NDVI. The variable-importance results of the RF and BRT models indicated that distance from roads, elevation, and lithology had the greatest effect on GE occurrence. The area under the curve (AUC) and seed cell area index (SCAI) methods were used to validate the three GE maps. The AUC of the three models ranged from 0.911 to 0.927; the RF model achieved the highest prediction accuracy (0.927) and also performed best compared with the other models according to the SCAI values. The findings can support planning and development in the study region.
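The abstract uses Weights-of-Evidence (WoE) to relate gully locations to each geo-environmental variable. As a hedged sketch of the standard WoE quantities for one evidence class B and event D, namely W+ = ln[P(B|D) / P(B|not D)], W- = ln[P(not B|D) / P(not B|not D)], and the contrast C = W+ - W-, the Python function below computes them from raster cell counts. The function name and the counts are hypothetical, and the study's own GIS workflow may compute them differently.

```python
import math


def weights_of_evidence(n_b_and_d, n_b, n_d, n_total):
    """Weights of evidence for one binary evidence class B (e.g. one
    lithology class) with respect to an event D (e.g. gully presence).

    n_b_and_d: cells where B is present and D occurs
    n_b:       cells where B is present
    n_d:       cells where D occurs
    n_total:   all cells in the study area
    The conditional probabilities must lie strictly between 0 and 1,
    otherwise a division by zero or log(0) occurs.
    """
    p_b_given_d = n_b_and_d / n_d
    p_b_given_not_d = (n_b - n_b_and_d) / (n_total - n_d)
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    contrast = w_plus - w_minus  # C > 0: B is positively associated with D
    return w_plus, w_minus, contrast


# Hypothetical counts: 1000 cells, 100 with gullies, 200 in class B, 50 of those gullied.
w_plus, w_minus, c = weights_of_evidence(50, 200, 100, 1000)
print(round(w_plus, 4), round(w_minus, 4), round(c, 4))  # 1.0986 -0.5108 1.6094
```

Here P(B|D) = 0.5 and P(B|not D) = 1/6, so W+ = ln 3, W- = ln 0.6, and C = ln 5: the class is over-represented among gully cells.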

Groundwater Augmentation through the Site Selection of Floodwater Spreading Using a Data Mining Approach (Case study: Mashhad Plain, Iran)

Water ◽

10.3390/w10101405 ◽

2018 ◽

Vol 10 (10) ◽

pp. 1405 ◽

Cited By ~ 12

Author(s):

Seyed Naghibi ◽

Mehdi Vafakhah ◽

Hossein Hashemi ◽

Biswajeet Pradhan ◽

Seyed Alavi

Keyword(s):

Data Mining ◽

Regression Tree ◽

Roc Curves ◽

Flood Damage ◽

Boosted Regression Trees ◽

Classification And Regression Tree ◽

Operating Characteristics ◽

National Guidelines ◽

Conditioning Factors ◽

Suitability Maps

It is a well-known fact that sustainable development goals are difficult to achieve without a proper water resources management strategy. This study implements several state-of-the-art statistical and data mining models, i.e., weights-of-evidence (WoE), boosted regression trees (BRT), and classification and regression tree (CART), to identify suitable areas for artificial recharge through floodwater spreading (FWS). First, suitable areas for the FWS project were identified in a basin in north-eastern Iran based on the national guidelines and a literature survey. Using the same methodology, an identical number of areas unsuitable for FWS were also determined. Afterward, a set of FWS conditioning factors was selected for modeling FWS suitability. The models were fitted using 70% of the suitable and unsuitable locations and validated with the remaining 30% of the input data. Finally, receiver operating characteristic (ROC) curves were plotted to compare the produced FWS suitability maps. The findings showed acceptable performance of BRT, CART, and WoE for FWS suitability mapping, with areas under the ROC curve of 92%, 87.5%, and 81.6%, respectively. Among the considered variables, transmissivity, distance from rivers, aquifer thickness, and electrical conductivity were the most important contributors to the models. The FWS suitability maps produced by the proposed method could be used as a guideline by water resource managers to control flood damage and obtain new sources of groundwater. This methodology could easily be replicated to produce FWS suitability maps in other regions with similar hydrogeological conditions.
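The validation protocol described in this abstract (fit on 70% of the locations, validate on the remaining 30%, compare models by the area under the ROC curve) can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the seed, the toy scores, and the function names are assumptions. The split is a plain random split; a per-class (stratified) version would apply it separately to the suitable and unsuitable locations. AUC is computed through its rank interpretation, which equals the Mann-Whitney U statistic divided by the number of positive-negative pairs, rather than by integrating a plotted curve.

```python
import random


def train_validation_split(items, train_frac=0.7, seed=42):
    """Shuffle a copy of `items` and split it into training and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive (label 1) gets a higher score than a randomly chosen negative
    (label 0), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical: 10 location IDs split 70/30.
train, validation = train_validation_split(range(10))
print(len(train), len(validation))  # 7 3

# Hypothetical suitability scores for 3 suitable (1) and 3 unsuitable (0) locations:
# 8 of the 9 positive-negative pairs are ranked correctly, so AUC is about 0.889.
print(roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]))
```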

Prediction of Body Weight of Turkish Tazi Dogs using Data Mining Techniques: Classification and Regression Tree (CART) and Multivariate Adaptive Regression Splines (MARS)

Pakistan Journal of Zoology ◽

10.17582/journal.pjz/2018.50.2.575.583 ◽

2018 ◽

Vol 50 (2) ◽

Cited By ~ 4

Author(s):

Senol Celik ◽

Orhan Yilmaz

Keyword(s):

Data Mining ◽

Body Weight ◽

Regression Tree ◽

Multivariate Adaptive Regression Splines ◽

Classification And Regression Tree ◽

Regression Splines ◽

Adaptive Regression ◽

Classification And Regression ◽

Using Data ◽

Adaptive Regression Splines

A Novel Design of Classification of Coronary Artery Disease Using Deep Learning and Data Mining Algorithms

Revue d'Intelligence Artificielle ◽

10.18280/ria.350304 ◽

2021 ◽

Vol 35 (3) ◽

pp. 209-215

Author(s):

Pratibha Verma ◽

Vineet Kumar Awasthi ◽

Sanat Kumar Sahu

Keyword(s):

Neural Network ◽

Data Mining ◽

Deep Learning ◽

Regression Tree ◽

Principal Component ◽

Classification And Regression Tree ◽

Support Vector ◽

Data Mining Algorithms ◽

R Programming ◽

Hidden Layer

This study combines data mining techniques with ensemble learning and deep learning for classification. The classification methods used are: single C5.0 tree (C5.0), Classification and Regression Tree (CART), kernel-based Support Vector Machine (SVM) with a linear kernel, an ensemble (CART, SVM, C5.0), a single-hidden-layer neural network (NN), neural networks with principal component analysis (PCA-NN), the deep-learning-based H2OBinomialModel-Deeplearning (HBM-DNN), and Enhanced H2OBinomialModel-Deeplearning (EHBM-DNN). Experiments were conducted on pre-processed datasets using R programming and 10-fold cross-validation. The findings show that the ensemble model (CART, SVM, and C5.0) and EHBM-DNN classify more accurately than the other methods.
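The study evaluated each classifier with 10-fold cross-validation in R. For readers unfamiliar with the procedure, here is a hedged Python sketch of how the folds are formed and scored; it is not the authors' R code. The fixed shuffle seed, the helper names, and the majority-class baseline in the usage example are assumptions; any of the listed classifiers could be plugged in as the `fit`/`predict` pair.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Indices are shuffled once and dealt into k folds whose sizes differ by
    at most one, so every sample is used for testing exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over the k folds for any fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Usage with a majority-class baseline standing in for CART, SVM, C5.0, etc.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = [[i] for i in range(30)]
y = [0] * 20 + [1] * 10
acc = cross_val_accuracy(X, y, fit_majority, predict_majority, k=10)
print(acc)  # about 0.667: the baseline always predicts class 0
```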

Context-Sensitive Attribute Evaluation

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch052 ◽

2011 ◽

pp. 328-332

Author(s):

Marko Robnik-Šikonja

Keyword(s):

Data Mining ◽

Intelligent Systems ◽

Regression Tree ◽

Feature Weighting ◽

Feature Subset Selection ◽

Context Sensitive ◽

Relief Algorithm ◽

Sensitive Attribute ◽

Attribute Evaluation ◽

Tree Building

Research in machine learning, data mining, and statistics has provided a number of methods that estimate the usefulness of an attribute (feature) for predicting the target variable. These estimates of attribute utility are subsequently used in various important tasks, e.g., feature subset selection, feature weighting, feature ranking, feature construction, data transformation, decision and regression tree building, data discretization, visualization, and comprehension. These tasks frequently occur in data mining, robotics, and the construction of intelligent systems in general. Most attribute evaluation measures in use are myopic in the sense that they estimate the quality of one feature independently of the context of other features. In problems that may involve strong feature interactions, these measures are not appropriate. The measures historically based on the Relief algorithm (Kira & Rendell, 1992) take context into account through distances between instances and are efficient in problems with strong dependencies between attributes.
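The contrast between myopic measures and Relief can be made concrete with a small Python sketch. It implements a simplified Relief (two classes, features already scaled to [0, 1], Manhattan distance, every instance used once as the reference instead of m randomly sampled ones, ties broken by the first neighbour found) and compares it with information gain on an XOR toy dataset chosen here for illustration. On XOR, each relevant feature is uninformative on its own, so the myopic measure scores all three features at zero, while Relief separates the two interacting features from the irrelevant one.

```python
from collections import Counter
from itertools import product
from math import log2


def information_gain(X, y, a):
    """Myopic measure: information gain of feature `a` evaluated on its own."""
    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    gain = entropy(y)
    for v in {row[a] for row in X}:
        subset = [label for row, label in zip(X, y) if row[a] == v]
        gain -= len(subset) / len(y) * entropy(subset)
    return gain


def relief(X, y):
    """Simplified Relief for two classes and features scaled to [0, 1]."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(p, q):
        return sum(abs(pi - qi) for pi, qi in zip(p, q))

    for i, r in enumerate(X):
        hit = miss = None  # nearest same-class and nearest other-class instance
        for j, other in enumerate(X):
            if j == i:
                continue
            if y[j] == y[i]:
                if hit is None or dist(r, other) < dist(r, X[hit]):
                    hit = j
            elif miss is None or dist(r, other) < dist(r, X[miss]):
                miss = j
        for a in range(d):
            # Differing from the nearest hit lowers the weight;
            # differing from the nearest miss raises it.
            weights[a] -= abs(r[a] - X[hit][a]) / n
            weights[a] += abs(r[a] - X[miss][a]) / n
    return weights


# XOR toy data: class = x1 XOR x2; x3 is irrelevant noise.
X = list(product([0, 1], repeat=3))
y = [x1 ^ x2 for x1, x2, _ in X]
print([information_gain(X, y, a) for a in range(3)])  # [0.0, 0.0, 0.0]
print(relief(X, y))                                   # [0.5, 0.5, -1.0]
```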

The Use of Data Mining for Assessing Performance of Administrative Services

Advances in Data Mining and Database Management - Data Mining in Public and Private Sectors ◽

10.4018/978-1-60566-906-9.ch004 ◽

2010 ◽

pp. 67-82

Author(s):

Zdravko Pecar ◽

Ivan Bratko

Keyword(s):

Data Mining ◽

State Government ◽

Local Level ◽

Regression Tree ◽

Main Idea ◽

Basic Unit ◽

Learning Tools ◽

Administrative District ◽

Administrative Services ◽

Use Of Data

The aim of this research was to study the performance of 58 Slovenian administrative districts (state government offices at the local level), to identify the factors that affect their performance, and to determine how these effects interact. The main idea was to analyze the available statistical data relevant to the performance of the administrative districts with machine learning tools for data mining, and to extract clear relations between the various parameters of the administrative districts and their performance. The authors introduced the concept of a basic unit of administrative service, which enables the measurement of an administrative district's performance. The main data mining tool used in this study was regression tree induction. This method can handle numeric and discrete data, and it provides clear insight into the relations between the parameters in the system, which facilitates the interpretation of the data mining results. The authors investigated various relations between the parameters in their domain, for example, how the performance of an administrative district depends on trends in the number of applications, employees' level of professional qualification, etc. In the chapter, they report on a variety of (occasionally surprising) findings extracted from the data and discuss how these findings can be used to improve decisions in managing administrative districts.
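Regression tree induction, the main tool in this study, repeatedly picks the test on a numeric or discrete attribute that best separates the target values. A minimal Python sketch of one such step follows, under stated assumptions: the split criterion is the reduction in the sum of squared errors (a common choice for regression trees; the exact tool and settings the authors used are not specified here), discrete attributes are split as one value versus the rest, and the district attributes and performance scores are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean, i.e. the cost of a leaf
    that predicts the mean of its target values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(rows, target, features):
    """Return (sse_after_split, feature, value) for the test that most
    reduces SSE. Numeric features are tested as `x <= value`; discrete
    (string) features as `x == value` (one value versus the rest)."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        numeric = all(isinstance(v, (int, float)) for v in values)
        for v in values:
            goes_left = [(r[f] <= v) if numeric else (r[f] == v) for r in rows]
            left = [t for t, g in zip(target, goes_left) if g]
            right = [t for t, g in zip(target, goes_left) if not g]
            if not left or not right:
                continue  # skip tests that send every row to one side
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, v)
    return best


# Hypothetical districts: share of highly qualified staff (numeric),
# office size (discrete), and a performance score as the regression target.
rows = [
    {"qualified_share": 0.2, "size": "small"},
    {"qualified_share": 0.3, "size": "large"},
    {"qualified_share": 0.7, "size": "small"},
    {"qualified_share": 0.8, "size": "large"},
]
performance = [10.0, 12.0, 20.0, 22.0]
print(best_regression_split(rows, performance, ["qualified_share", "size"]))
# (4.0, 'qualified_share', 0.3)
```

Applying the same step recursively to each side, until a stopping rule such as a minimum leaf size is met, yields the full tree whose branches can be read as the kind of relations the chapter reports.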

Growing self-organizing tree-based kernel smoother for machine learning and data mining

10.32920/ryerson.14645937 ◽

2021 ◽

Author(s):

Bohan Zheng

Keyword(s):

Machine Learning ◽

Data Mining ◽

Big Data ◽

Internet Of Things ◽

Multivariate Regression ◽

Regression Tree ◽

Local Regression ◽

Self Organizing Maps ◽

Electrical Vehicle ◽

Self Organizing

With the Internet of Things (IoT) widely adopted in recent years, traditional machine learning and data mining methods can hardly cope with complex big data problems when applied alone. However, hybridizing methods that have complementary advantages can yield optimized practical solutions. This work discusses how to solve multivariate regression problems and extract intrinsic knowledge by hybridizing Self-Organizing Maps (SOM) and regression trees. A dual-layer SOM map is developed in which the first layer performs unsupervised learning and the second, regression tree layer performs supervised learning to produce predictions and extract knowledge. In this framework, SOM neurons serve as kernels onto which similar training samples are mapped, so that the regression trees can perform regression locally. In this way, the difficulties of applying and visualizing local regression on high-dimensional data are overcome. Further, we provide an automated growing mechanism based on a few stopping criteria, without adding new parameters. A case study on the electric vehicle (EV) range anxiety problem demonstrates that the proposed hybrid model is quantitatively precise and interpretable.

Keywords: Multivariate Regression, Big Data, Machine Learning, Data Mining, Self-Organizing Maps (SOM), Regression Tree, Electric Vehicle (EV), Range Estimation, Internet of Things (IoT)

Dental Data Mining: Potential Pitfalls and Practical Issues

Advances in Dental Research ◽

10.1177/154407370301700125 ◽

2003 ◽

Vol 17 (1) ◽

pp. 109-114 ◽

Cited By ~ 19

Author(s):

S.A. Gansky

Keyword(s):

Data Mining ◽

Operating Characteristic ◽

Regression Tree ◽

Careful Analysis ◽

Classification And Regression Tree ◽

Receiver Operating Characteristic Curves ◽

Ann Models ◽

Artificial Neural Network Ann ◽

Sensitivity Specificity ◽

Better Than

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined using validation procedures and by evaluating concordance, sensitivity, specificity, and likelihood ratios. To aid interpretation, various plots of predicted probabilities are used, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study serves as an illustrative example: this paper compares the performance of logistic regression with the KDD methods CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with a sufficient sample size and the use of proper competitors, the problems of naïve KDD analyses (Schwarzer et al., 2000) can be avoided.
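The abstract evaluates models by sensitivity, specificity, and likelihood ratios. The following Python sketch shows how these follow from a 2x2 confusion matrix; the class and function names and the example outcomes are illustrative assumptions, not data from the Rochester caries study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    fn: int  # actual positive, predicted negative
    tn: int  # actual negative, predicted negative

    @property
    def sensitivity(self) -> float:
        """True positive rate: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """True negative rate: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        """LR+ = sensitivity / (1 - specificity); raises ZeroDivisionError
        when specificity is exactly 1."""
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        """LR- = (1 - sensitivity) / specificity."""
        return (1.0 - self.sensitivity) / self.specificity


def confusion_from(labels, predictions):
    """Tally a 2x2 confusion matrix from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(labels, predictions))
    return Confusion(
        tp=pairs.count((1, 1)),
        fp=pairs.count((0, 1)),
        fn=pairs.count((1, 0)),
        tn=pairs.count((0, 0)),
    )


# Hypothetical outcomes (1 = caries) and model predictions.
cm = confusion_from([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
print(cm.sensitivity, cm.specificity, cm.lr_positive)  # 0.75 0.75 3.0
```

An LR+ of 3 means a positive prediction is three times as likely in a subject with the condition as in one without it.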

Creating an Educational Roadmap for Engineering Students via an Optimal and Iterative Yearly Regression Tree using Data Mining

Proceedings of the International Conference on Knowledge Engineering and Ontology Development ◽

10.5220/0004130300430052 ◽

2012 ◽

Keyword(s):

Data Mining ◽

Engineering Students ◽

Regression Tree ◽

Using Data

Educational data mining for student placement prediction using machine learning algorithms

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.2.8988 ◽

2017 ◽

Vol 7 (1.2) ◽

pp. 43 ◽

Cited By ~ 3

Author(s):

K. Sreenivasa Rao ◽

N. Swapna ◽

P. Praveen Kumar

Keyword(s):

Higher Education ◽

Machine Learning ◽

Data Mining ◽

Recursive Partitioning ◽

Learning Algorithms ◽

Educational Data Mining ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Conditional Inference ◽

Higher Education Organizations

Data mining is the process of extracting useful information from large data sets. It enables users to gain insight into the data and make useful decisions based on the knowledge mined from databases. The purpose of higher education organizations is to offer superior opportunities to their students. Like data mining in general, Educational Data Mining (EDM) is nowadays considered a powerful tool in the field of education. This work presents an effective method for mining students' performance on various parameters to predict and analyze whether a student will be recruited in campus placement. Predictions are made using the machine learning algorithms J48, Naïve Bayes, Random Forest, and Random Tree in the Weka tool, and the Multiple Linear Regression, binomial logistic regression, Recursive Partitioning and Regression Trees (rpart), conditional inference tree (ctree), and Neural Network (nnet) algorithms in RStudio. The results obtained from each approach are then compared with respect to their performance and accuracy levels through graphical analysis. Based on the results, higher education organizations can offer superior training to their students.
