A Comparative Analysis of Multiple Regression in Data Mining

Data Mining to Predict Student's Achievement Based on Socio-Economic, Motivation, Discipline and Achievement of the Past

Abstract: This study aims to predict student achievement from parents' socio-economic status, student motivation, student discipline, and past achievement, using data mining with the J48 algorithm. For comparison, the data were also analyzed with CHAID (Chi-squared Automatic Interaction Detection) and multiple regression. The research approach is quantitative. The subjects were 416 grade X (tenth-grade) students at SMK Negeri 4 Surakarta. Data were collected through documentation and questionnaires. The results show that prediction with the J48 decision tree algorithm achieved an accuracy of 95.7%, prediction with CHAID achieved an accuracy of 82.1%, and multiple regression analysis yielded a significance level of 90.6%. Based on these results, the study concludes that the J48 method performs better than CHAID and multiple regression.
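
The accuracy figures reported in this abstract are, in the usual sense of the term, the share of students whose predicted achievement class matches the actual one. The sketch below shows that calculation only; the function name `accuracy`, the class labels, and both prediction lists are hypothetical illustrations and are not data or code from the study.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical achievement classes for five students (not from the study).
actual = ["high", "low", "high", "medium", "low"]
predicted_a = ["high", "low", "high", "medium", "high"]  # hypothetical model A
predicted_b = ["high", "high", "low", "medium", "high"]  # hypothetical model B

acc_a = accuracy(actual, predicted_a)  # 4 of 5 labels match
acc_b = accuracy(actual, predicted_b)  # 2 of 5 labels match
```

Comparing two classifiers on the same labelled students, as the study does for J48 and CHAID, amounts to calling such a function once per model on the same `actual` list.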

Network Data Mining Application in Earnings Management of Private Holding Enterprise: An Empirical Analysis Based on Multiple Regression Model

International Journal of Database Theory and Application ◽ 10.14257/ijdta.2017.10.1.25 ◽ 2017 ◽ Vol 10 (1) ◽ pp. 271-284

Author(s): Hongtao Liu

Keyword(s): Data Mining ◽ Earnings Management ◽ Regression Model ◽ Multiple Regression ◽ Empirical Analysis ◽ Multiple Regression Model ◽ Network Data ◽ Data Mining Application

A Comparative Analysis of Data Mining Techniques on Breast Cancer Diagnosis Data using WEKA Toolbox

International Journal of Advanced Computer Science and Applications ◽ 10.14569/ijacsa.2020.0110829 ◽ 2020 ◽ Vol 11 (8) ◽ Cited By ~ 1

Author(s): Majdah Alshammari ◽ Mohammad Mezher

Keyword(s): Breast Cancer ◽ Data Mining ◽ Comparative Analysis ◽ Cancer Diagnosis ◽ Breast Cancer Diagnosis ◽ Data Mining Techniques

Distance Based Pattern Driven Mining for Outlier Detection in High Dimensional Big Dataset

ACM Transactions on Management Information Systems ◽ 10.1145/3469891 ◽ 2022 ◽ Vol 13 (1) ◽ pp. 1-17

Author(s): Ankit Kumar ◽ Abhishek Kumar ◽ Ali Kashif Bashir ◽ Mamoon Rashid ◽ V. D. Ambeth Kumar ◽ ...

Keyword(s): Data Mining ◽ Comparative Analysis ◽ Outlier Detection ◽ Credit Card ◽ High Dimensional ◽ Work Efficiency ◽ Average Value ◽ Novel Method ◽ Detection Of Outliers ◽ Better Than

Detecting outliers, or anomalies, is one of the vital issues in pattern-driven data mining. Outlier detection identifies inconsistent behavior of individual objects. It is an important area of data mining with applications such as detecting credit card fraud, discovering hacking, and uncovering criminal activity. Tools are needed to uncover the critical information buried in extensive data. This paper investigates a novel method for detecting cluster outliers in a multidimensional dataset that can identify both clusters and outliers in datasets containing noise. The proposed method detects the groups and outliers left by the clustering process, such as irregular sets of clusters (C) and outliers (O), to improve the results. Applying the algorithm to the dataset improved several parameters. For the comparative analysis, the average accuracy value and the average recall value are computed. The average accuracy value is 74.05% for the existing COID algorithm and 77.21% for the proposed algorithm. The average recall value is 81.19% for the existing algorithm and 89.51% for the proposed algorithm, which indicates that the proposed method performs better than the existing COID algorithm.
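
As a rough illustration of the two ideas this abstract relies on, distance-based outlier flagging and recall, the sketch below marks a point as an outlier when the distance to its k-th nearest neighbour exceeds a threshold, then computes recall against known labels. This is a generic textbook technique, not the paper's proposed method and not COID; the function names, the points, `k`, the threshold, and the labels are all hypothetical.

```python
import math


def kth_neighbor_distance(points, index, k):
    """Distance from points[index] to its k-th nearest other point."""
    dists = sorted(
        math.dist(points[index], q)
        for j, q in enumerate(points)
        if j != index
    )
    return dists[k - 1]


def flag_outliers(points, k, threshold):
    """Flag a point when its k-th nearest neighbour is farther than threshold."""
    return [
        kth_neighbor_distance(points, i, k) > threshold
        for i in range(len(points))
    ]


def recall(actual, flagged):
    """True outliers that were flagged, divided by all true outliers."""
    true_pos = sum(a and f for a, f in zip(actual, flagged))
    false_neg = sum(a and not f for a, f in zip(actual, flagged))
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Hypothetical 2-D data: a tight cluster, one borderline point, one far point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (10, 10)]
# Hypothetical ground truth: the last two points are labelled as outliers.
actual = [False, False, False, False, True, True]

flagged = flag_outliers(points, k=2, threshold=3.0)
outlier_recall = recall(actual, flagged)
```

With these assumed values only the far point (10, 10) is flagged, while the borderline point (2, 2) is missed, so recall is one of two true outliers. A higher recall, as the abstract reports for the proposed algorithm, means fewer true outliers are missed.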
