Identifying factors that influence student failure rate using Exhaustive CHAID (Chi-square automatic interaction detection)

Author(s):  
Riasyah Novita ◽  
Mira Kania Sabariah ◽  
Veronikha Effendy

2020 ◽  
Vol 2019 (1) ◽  
pp. 357-367
Author(s):  
Isti Samrotul Hidayati ◽  
I Made Arcana

The Chi-squared Automatic Interaction Detection (CHAID) method is a segmentation method based on the relationship between response and explanatory variables, assessed with the chi-square test; when applying it, the balance of the data must be taken into account to minimize classification errors. One approach that can be used for imbalanced data is the Synthetic Minority Over-sampling Technique (SMOTE). In this study, CHAID with a SMOTE approach is applied to under-five mortality (Angka Kematian Balita, AKBa) in Eastern Indonesia (Kawasan Timur Indonesia, KTI). The aim is to identify the variables that characterize under-five deaths using CHAID and to compare plain CHAID with the SMOTE approach. The comparison shows that the SMOTE approach performs better, with a sensitivity of 48.3% and a precision of 75.9%. The variables that significantly characterize under-five mortality in KTI are birth weight, type of birth, the mother's employment status, and household wealth; the main profile is a child with low birth weight who was born as a twin.
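A minimal sketch of the SMOTE oversampling idea mentioned above, assuming numeric feature vectors and Euclidean distance: each synthetic record is placed at a random point on the segment between a minority-class record (here, a child who died) and one of its k nearest minority-class neighbours. The function name `smote` and its parameters are illustrative, not the authors' code; k=5 is a common default, and categorical predictors such as those in this study would need a variant such as SMOTE-NC.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority-class
    samples and their k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def dist2(a, b):
        # Squared Euclidean distance; enough for ranking neighbours.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Precompute each sample's k nearest minority neighbours (excluding itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: dist2(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # pick a minority sample at random
        j = rng.choice(neighbours[i])      # and one of its near neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic
```

The synthetic records are added to the training set only; evaluation metrics such as sensitivity and precision should still be computed on original, non-synthetic records.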


2018 ◽  
Vol 7 (3) ◽  
pp. 18
Author(s):  
Sunita Mall ◽  
Prasun Ghosh ◽  
Parita Shah

Insurance fraud typically occurs when a fraudster tries to gain undue benefit from an insurance contract through ignorance or wilful manipulation. Using motor insurance claims data for 2010-2016 obtained from a Mumbai-based insurance company, this study examines the patterns exhibited by both rejected and accepted claims. The prime objective is to identify the significant triggers of fraud and to predict customers' fraudulent behaviour by applying the identified triggers in an existing algorithm. The study uses logistic regression to identify the significant fraud triggers and the CHAID (Chi-Square Automatic Interaction Detection) technique to determine the probability of rejection or acceptance of each future claim. Data mining techniques such as decision trees and confusion matrices are applied to the important parameters to find all possible combinations of these significant variables and the bucket for each combination. The study finds that Seats/Tonnage, No Claim Bonus, Type of Vehicle, Gross Written Premium, Sum Insured, Discounts, State Similarity, and Previous Insurance details are significant at the 1% level, while Branch Code and Risk Types are significant at the 5% level. The gain chart indicates that the model performs fairly well. This research should help the insurance company settle legitimate claims in less time and at lower cost, and help it identify fraudulent claims.
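The gain chart cited above can be computed roughly as follows, assuming each claim already has a predicted probability of rejection (for example from the logistic regression) and a 0/1 label (1 = rejected). Claims are ranked by score and, for each decile, the share of all rejected claims captured so far is recorded; a curve well above the diagonal (random selection) suggests a useful model. `cumulative_gains` is a hypothetical helper, not the study's implementation.

```python
def cumulative_gains(scores, labels, n_bins=10):
    """Return (fraction_of_cases, fraction_of_positives_captured) for the
    top 1/n_bins, 2/n_bins, ... of cases ranked by descending score."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive (rejected) claim")
    # Rank labels by model score, highest risk first (stable for ties).
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)        # number of top-ranked cases
        captured = sum(ranked[:cutoff])       # rejected claims among them
        gains.append((cutoff / n, captured / total_pos))
    return gains
```

For example, the entry at index 1 gives the share of rejected claims found in the top 20% of ranked claims; random ranking would be expected to capture about 20%.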


2020 ◽  
Vol 18 (1) ◽  
pp. 29
Author(s):  
Muhammad Rizki ◽  
Muhammad Isnaini Hadiyul Umam ◽  
Muhammad Luthfi Hamzah

With the push toward Industry 4.0, data mining has become a hot topic among researchers. The rapid pace of technological development forces us to make decisions equally quickly. Bad credit (non-performing loans) is one of the biggest risks for financial institutions. This risk must be minimized by analysing customer status factors based on personal data, so that customers can be classified according to the relationships among those factors. One key to winning market competition is determining the target market. Data mining provides many tools for classification, one of which is CHAID (Chi-square Automatic Interaction Detection) analysis. The decision tree produced by CHAID analysis gives information about the degree of association between the independent and dependent variables, as well as the characteristics of each category. Here, CHAID analysis is used to classify customers, with credit status as the dependent variable and customers' personal data as the independent variables. Using the chi-square test, only 5 of the 7 independent variables were significantly associated with the dependent variable: age, occupation, education, loan term, and loan amount. The CHAID analysis yields four classes. Customers working as civil servants (Aparatur Sipil Negara, ASN) form the class with the lowest risk of bad credit.
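The chi-square screening step described above (testing each independent variable against credit status) can be sketched as below. This is a simplified illustration under stated assumptions: each record is a dict of categorical values, the p-value comes from the closed-form chi-square survival function for integer degrees of freedom, and the category-merging and Bonferroni-adjustment steps of full CHAID are omitted. All function and field names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df, closed form."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_r x^{r-1/2} / (1*3*...*(2r-1))
    term, series = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


def pearson_chi2(table):
    """Pearson chi-square statistic and df for an r x c table of counts
    (rows: predictor categories, columns: response classes). Assumes no
    empty rows or columns, so every expected count is positive."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(table[0]) - 1)


def screen_predictors(records, target, predictors, alpha=0.05):
    """Keep predictors whose association with the target has p < alpha,
    returned as (name, chi2, df, p) sorted by p-value."""
    classes = sorted({r[target] for r in records})
    results = []
    for name in predictors:
        levels = sorted({r[name] for r in records})
        table = [[sum(1 for r in records if r[name] == lv and r[target] == c)
                  for c in classes] for lv in levels]
        stat, df = pearson_chi2(table)
        p = chi2_sf(stat, df)
        if p < alpha:
            results.append((name, stat, df, p))
    return sorted(results, key=lambda t: t[3])
```

In a full CHAID tree, the predictor with the smallest (adjusted) p-value becomes the split at each node, and the same test is repeated within each child node.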


2008 ◽  
Vol 78 (5) ◽  
pp. 935-940 ◽  
Author(s):  
Davide Mirabella ◽  
Raffaele Spena ◽  
Giovanni Scognamiglio ◽  
Lombardo Luca ◽  
Antonio Gracco ◽  
...  

Abstract Objective: To test the hypothesis that bonding with a blue light-emitting diode (LED) curing unit produces no more failures in adhesive-precoated (APC) orthodontic brackets than bonding with a conventional halogen lamp. Materials and Methods: Sixty-five patients were selected for this randomized clinical trial, in which a total of 1152 stainless steel APC brackets were used. To allow a valid comparison of bracket failure rates between the two curing units, each patient's mouth was divided into four quadrants. In 34 randomly selected patients (group A), the APC brackets in the right maxillary and left mandibular quadrants were bonded with a halogen light, while the remaining quadrants were treated with an LED curing unit. In the other 31 patients (group B), halogen light was used to cure the left maxillary and right mandibular quadrants, and the APC brackets in the remaining quadrants were bonded with an LED dental curing light. The bonding date, the type of light used for curing, and the date of any bracket failure over a mean period of 8.9 months were recorded for each bracket; the results were then analyzed with the chi-square test, the Yates-corrected chi-square test, the Fisher exact test, Kaplan-Meier survival estimates, and the log-rank test. Results: No statistically significant difference in bond failure rate was found between APC brackets cured with the halogen unit and those cured with LED light. However, significantly fewer bonding failures occurred in the maxillary arch (1.67%) than in the mandibular arch (4.35%) with both light-curing techniques. Conclusions: The hypothesis cannot be rejected: the LED curing unit produced APC bracket failure rates similar to those of the conventional halogen light, with the advantage of a far shorter curing time (10 seconds).
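Kaplan-Meier survival estimates like those used in the trial follow the product-limit formula S(t) = product over failure times t_i <= t of (1 - d_i / n_i), where d_i brackets fail at t_i and n_i brackets are still at risk just before t_i. The sketch below assumes each bracket has a follow-up time (for example in months) and a flag marking failure versus censoring (still bonded at last check); `kaplan_meier` is a hypothetical helper, not the trial's analysis code, and the log-rank comparison is omitted.

```python
def kaplan_meier(durations, failed):
    """Product-limit estimate of S(t), the probability a bracket is still
    bonded after time t. Returns [(t, S(t))] at each observed failure time.
    durations: follow-up times; failed: True = failed, False = censored."""
    events = sorted(zip(durations, failed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        failures = censored = 0
        # Group all brackets sharing this follow-up time.
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                failures += 1
            else:
                censored += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored brackets both leave the risk set after t.
        at_risk -= failures + censored
    return curve
```

Censored brackets still count in the risk set at their own time, which is the usual convention when failures and censoring coincide.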


1982 ◽  
Vol 19 (4) ◽  
pp. 461-471 ◽  
Author(s):  
Jay Magidson

Examples of some common pitfalls in the analysis of categorical data are discussed in the context of causal interpretation of the results. Though no statistical technique can replace theory, the author shows that log-linear modeling and chi-square automatic interaction detection can provide researchers with powerful tools for gaining valuable causal insights into their data. Examples include the biasing effects of omitted variables, omitted interactions, improper contrast coding, and misspecification of the structure of a hypothesized interaction.


Author(s):  
Benjamin H. Cottrell ◽  
In-Kyu Lim

This paper discusses the process used to develop a safety improvement plan for unsignalized intersections using systemic low-cost countermeasures. The scope of this project focused on unsignalized intersections with stop sign control on the minor approaches. The first objective was to assess Virginia's unsignalized intersection crashes over a five-year period to determine predominant crash trends and collision types to target for treatment. The four focus collision types with the highest frequency of crashes and the greatest potential reduction in crashes were 3-leg angle, 3-leg fixed object off the road, 4-leg angle, and 4-leg rear end. Chi-square automatic interaction detection decision tree analysis was used to perform a systemic analysis identifying a group of intersections associated with potential risk factors related to the focus collision types. A tiered list of systemic countermeasures to deploy was developed. The countermeasures were intended to warn of the stop ahead, make the stop sign and stop location more visible on a minor street, and warn of the intersection ahead on a major street. The potential-for-safety-improvement measure was used to prioritize the candidate treatment intersections. Before deployment, a study of each intersection by district traffic engineering staff was planned to finalize the plan. The output from the research was a safety improvement plan to systemically deploy treatments to unsignalized intersections as part of the safety program.


2020 ◽  
Vol 36 (3) ◽  
pp. 503-511
Author(s):  
Fermín Torrano Montalvo ◽  
Iván Fernández-Suárez ◽  
María Botey

The purpose of this research is to analyze the relationship between hiring conditions and work absenteeism in a sample of 5,524 workers, in order to identify which segments (by type of contract and working hours, contracted time, seniority in the company, and sick leave taken in the last three years) are most closely related to the likelihood of suffering an illness in 2017. Descriptive analyses, the chi-square test for contingency tables with two independent samples, and decision trees based on the CHAID (Chi-squared Automatic Interaction Detection) algorithm were carried out to detect the most important variables for identifying profiles with a greater probability of temporary disability. The results show differences across the variables studied. The hiring modality is considered an important risk factor for work absenteeism.


2019 ◽  
Vol 90 (8) ◽  
pp. 834-846 ◽  
Author(s):  
Momen A. Atieh ◽  
Ju Keat Pang ◽  
Kylie Lian ◽  
Stephanie Wong ◽  
Andrew Tawse‐Smith ◽  
...  

2014 ◽  
Vol 30 (2) ◽  
pp. 311-334 ◽  
Author(s):  
Celeste Stone ◽  
Leslie Scott ◽  
Danielle Battle ◽  
Patricia Maher

Abstract Many longitudinal and follow-up studies face a common challenge: locating study participants. This study examines the extent to which a geographically dispersed subsample of participants can be relocated after 37 to 51 years of noncontact. Relying mostly on commercially available databases and administrative records, the 2011-12 Project Talent Follow-up Pilot Study (PTPS12) located nearly 85 percent of the original sample members, many of whom had not participated in the study since 1960. This study uses data collected in the base year to examine which subpopulations were the hardest to find after this extended hiatus. The results indicate that females were located at significantly lower rates than males. As expected, sample members with lower cognitive abilities were among the hardest-to-reach subpopulations. We next evaluate the extent to which biases introduced during the tracking phase can be minimized by using the multivariate chi-square automatic interaction detection (CHAID) technique to calculate tracking loss adjustments. Unlike a 1995 study that found that these adjustments reduced statistical biases among its sample of located females, our results suggest that statistical adjustments were not as effective in PTPS12, where many participants had not been contacted in nearly 50 years and the tracking rates varied so greatly across subgroups.
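One common way to turn CHAID output into tracking-loss adjustments is a weighting-class adjustment: the terminal nodes of the tree serve as adjustment cells, and within each cell the weights of located members are inflated by (total weight in cell) / (located weight in cell). The sketch below assumes the cell assignments are already available; it illustrates the general technique and is not necessarily the exact procedure used in PTPS12 or in the 1995 study. `tracking_loss_weights` is a hypothetical name.

```python
from collections import defaultdict


def tracking_loss_weights(cells, located, base_weights=None):
    """Weighting-class adjustment for tracking loss.
    cells[i]: adjustment cell of member i (e.g. a CHAID terminal node);
    located[i]: True if member i was found; base_weights default to 1."""
    weights = list(base_weights) if base_weights is not None else [1.0] * len(cells)
    total = defaultdict(float)   # weight of all sample members per cell
    found = defaultdict(float)   # weight of located members per cell
    for cell, ok, w in zip(cells, located, weights):
        total[cell] += w
        if ok:
            found[cell] += w
    adjusted = []
    for cell, ok, w in zip(cells, located, weights):
        if ok and found[cell] > 0:
            # Located members also carry the weight of unlocated members in their cell.
            adjusted.append(w * total[cell] / found[cell])
        else:
            # Unlocated members drop out; a cell with no located members loses
            # its weight entirely (in practice such cells are collapsed first).
            adjusted.append(0.0)
    return adjusted
```

Such adjustments only remove bias to the extent that located and unlocated members within a cell are similar, which is consistent with the abstract's finding that they helped less when tracking rates varied widely across subgroups.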


2018 ◽  
Vol 147 ◽  
pp. 02007 ◽  
Author(s):  
Darwin ◽  
Benecditus Kombaitan ◽  
Gatot Yudoko ◽  
Heru Purboyo

Floods in the Bandung area often occur when heavy rainfall pushes the water volume beyond the capacity of the Citarum watershed, causing economic and social losses. The purpose of this research is to develop a GIS application model for estimating inundated areas and disrupted road networks in the Bandung Metropolitan Area. The geospatial mapping methodology used statistical data from 11,041 flood points, divided into two groups: 7,729 flood points to estimate the decision tree model and 3,312 flood points to validate it. Flood vulnerability maps were produced with the Chi-square Automatic Interaction Detection (CHAID) method and validated with the Receiver Operating Characteristic (ROC) method. Validation gave an area under the curve (AUC) of 93.1% for the success rate and 92.7% for the prediction rate. The CHAID result classes are: 0-0.047, covering 76.68% of the research area; 0.047-0.307, covering 5.37%; 0.307-0.599 (low), covering 5.36%; 0.599-0.844, covering 5.31%; and 0.844-1 (high), covering 7.27%. The flood-prone road links are the Rancaekek link (PT Kahatex area), the Solokan Jeruk link (Cicalengka-Majalaya), the Baleendah link, and the Dayeuhkolot link (M. Toha - Andir).
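The split above is roughly 70/30 (7,729 / 11,041 ≈ 0.70). AUC validation of this kind can be sketched with the Mann-Whitney formulation of the ROC area: the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen non-flood location, counting ties as one half. In susceptibility mapping, the success rate is typically this AUC on the training points and the prediction rate the AUC on the validation points. The sketch assumes both flood (1) and non-flood (0) locations with model scores are available; the abstract does not say how non-flood locations were sampled, and `roc_auc` is a hypothetical helper.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a flood point > score of a non-flood point),
    with ties counted as 1/2. O(n_pos * n_neg); a rank-based version is
    preferable for very large validation sets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both flood (1) and non-flood (0) points")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random scoring and 1.0 to perfect separation, so the reported 93.1% and 92.7% indicate strong discrimination on both the training and validation points.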

