ANALYSIS AND APPLICATION OF THE C4.5 ALGORITHM IN DATA MINING TO SUPPORT THE PROMOTION STRATEGY OF THE UPGRIS INFORMATICS STUDY PROGRAM

ABSTRACT Attracting new student applicants requires a dedicated strategy. One such strategy is data analysis, which aims to turn a collection of data into something with business value through analytic reports, producing information whose patterns can be extracted as knowledge [Kusrini, 2009]. Classification is a data mining function used to predict information that was not previously known [Larose, 2005]. A decision tree is a method of classification and prediction. In this study, the decision tree is built with the C4.5 algorithm [Larose, 2005]. The data processed are the records of new students from the 2014 and 2015 cohorts. The results show that the variables with the greatest influence on student registration outcomes are School Origin and Gender. Most students come from Semarang and took the science (IPA) track in senior high school (SMU), while those from outside the city mostly come from Batang and Pati. Among students from the social studies (IPS) track, the males come from Batang and the females come from Pati. The accuracy of the resulting model is 89.33% (Good Classification).
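
As a rough illustration of how a C4.5-style tree chooses the attribute to split on, the sketch below computes the gain ratio, the splitting criterion C4.5 uses. The records, attribute names (`school_origin`, `gender`, `registered`) and values are hypothetical and are not the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio obtained by splitting `rows` (a list of dicts) on `attribute`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, invented for illustration only.
applicants = [
    {"school_origin": "Semarang", "gender": "M", "registered": "yes"},
    {"school_origin": "Semarang", "gender": "F", "registered": "yes"},
    {"school_origin": "Batang", "gender": "M", "registered": "no"},
    {"school_origin": "Pati", "gender": "F", "registered": "no"},
]

ratios = {a: gain_ratio(applicants, a, "registered") for a in ("school_origin", "gender")}
best_attribute = max(ratios, key=ratios.get)  # attribute chosen for the root split
```

On this toy input the root split falls on `school_origin`. A full C4.5 implementation would also handle continuous attributes, missing values and pruning, which this sketch omits.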

Social Media Data using Various Classification Algorithms in Datamaning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l1145.10812s19 ◽

2019 ◽

Vol 8 (12S) ◽

pp. 588-589

Keyword(s):

Data Mining ◽

Social Media ◽

Training Data ◽

Classification Algorithms ◽

Data Set ◽

Social Media Data ◽

The Past ◽

Classification Technique ◽

Mathematical Techniques ◽

Media Data

Data mining is one of the most successful research domains. It describes the past and predicts the future through analysis. Several techniques are used in data mining; among them, classification is one of the main techniques based on machine learning. In classification, the records of a data set are assigned to a predefined set of groups or classes. Mathematical techniques such as decision trees, linear regression, neural networks and statistics are used for classification. Classification is the problem of identifying which category a new observation belongs to, using a training data set. This paper analyses data taken from social media and uses classification algorithms to make a comparative study on social advertisement using Python.
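
A minimal sketch of such a comparative study is given below. It trains two classifiers on the same training set and compares their accuracy on held-out observations. The abstract does not name the algorithms compared, so a majority-class baseline and a 1-nearest-neighbour classifier are swapped in here as assumptions; the features and labels are invented for illustration.

```python
from collections import Counter


def majority_classifier(train):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def nearest_neighbour_classifier(train):
    """Return a 1-nearest-neighbour classifier using squared Euclidean distance."""
    def predict(features):
        def squared_distance(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], features))
        return min(train, key=squared_distance)[1]
    return predict


def accuracy(classifier, test):
    """Fraction of test examples whose predicted label matches the true label."""
    return sum(classifier(x) == y for x, y in test) / len(test)


# Hypothetical (age, income in thousands) -> "clicked the ad" labels.
train = [((22, 20), "no"), ((25, 30), "no"), ((30, 35), "no"),
         ((45, 80), "yes"), ((50, 90), "yes")]
test = [((24, 25), "no"), ((48, 85), "yes"), ((52, 95), "yes"), ((28, 32), "no")]

results = {
    "majority": accuracy(majority_classifier(train), test),
    "1-NN": accuracy(nearest_neighbour_classifier(train), test),
}
```

Keeping the train and test sets separate is what makes the comparison meaningful: each classifier is scored on observations it did not see during training.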

Territorial Distribution of Alcohol and Drug Addictions Mortality Concerning Regional Disparities in the Slovak Republic from Year 1996 to Year 2015

10.35198/01-2019-003-0005 ◽

2020 ◽

Keyword(s):

Mortality Rate ◽

Slovak Republic ◽

Regional Disparities ◽

Point Of View ◽

Extreme Position ◽

Data Set ◽

Female Sex ◽

Male Sex ◽

The Individual ◽

High Level

BACKGROUND: This paper deals with the territorial distribution of mortality from alcohol and drug addictions at the level of the districts of the Slovak Republic. AIM: The aim of the paper is to explore the relations within the administrative territorial division of the Slovak Republic, that is, between the individual districts, and hence to reveal possibly hidden relations in alcohol and drug mortality. METHODS: The analysis is carried out in two parts, one for the female sex and one for the male sex. The standardised mortality rate is computed according to a sequence of mathematical relations. The Euclidean distance is used to compute the similarity within each pair of districts across the whole data set. A cluster analysis is then performed, with the clusters created from the mutual distances of the districts. The data are collected from the database of the Statistical Office of the Slovak Republic for all the districts of the Slovak Republic. The time span covered begins in 1996 and ends in 2015. RESULTS: The most substantial finding is that the Slovak Republic shows considerably high regional disparities in mortality, as expressed by the standardised mortality rate computed specifically for the diagnoses assigned to alcohol and drug addictions. However, the outcomes differ between the female sex and the male sex. The Bratislava III District holds by far the most extreme position and forms its own cluster for both sexes. The Topoľčany District holds a similarly extreme position for the male sex. All the Bratislava districts remain notably dissimilar from one another. By contrast, the development of the regional disparities among the districts appears notably heterogeneous. CONCLUSIONS: There are considerable regional discrepancies throughout the districts of the Slovak Republic. Hence, it is necessary to create a common platform for addressing this issue.
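
The distance-based clustering step can be sketched as follows. The abstract states only that Euclidean distances between districts drive the clustering, so the single-linkage merging rule and the `threshold` parameter are assumptions, and the district labels and rate vectors are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage_clusters(points, threshold):
    """Merge clusters while any cross-cluster pair lies within `threshold`."""
    clusters = [{name} for name in points]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = any(dist(points[a], points[b]) <= threshold
                            for a in clusters[i] for b in clusters[j])
                if close:
                    # Absorb cluster j into cluster i, then restart the scan.
                    clusters[i] |= clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical standardised mortality rates for three periods per district.
rates = {
    "A": (10.0, 11.0, 12.0),
    "B": (11.0, 12.0, 12.0),
    "C": (10.0, 10.0, 13.0),
    "D": (40.0, 42.0, 45.0),
}

clusters = single_linkage_clusters(rates, threshold=3.0)
```

Here districts A, B and C end up in one cluster, while D, whose rates lie far from the rest, forms a cluster of its own. This mirrors the way an extreme district can sit alone in the reported results.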

Description of multimorbidity clusters of admitted patients in medical departments of a general hospital

Postgraduate Medical Journal ◽

10.1136/postgradmedj-2020-139361 ◽

2021 ◽

pp. postgradmedj-2020-139361

Author(s):

María Matesanz-Fernández ◽

Teresa Seoane-Pillado ◽

Iria Iñiguez-Vázquez ◽

Roi Suárez-Gil ◽

Sonia Pértega-Díaz ◽

...

Keyword(s):

General Hospital ◽

Multiple Correspondence Analysis ◽

Hypertensive Heart Disease ◽

Chronic Kidney Failure ◽

Hospital Environment ◽

Heart Valve Disease ◽

Malignant Neoplasms ◽

Hospital Data ◽

Data Set ◽

And Gender

Objective: We aim to identify patterns of disease clusters among inpatients of a general hospital and to describe the characteristics and evolution of each group. Methods: We used two data sets from the CMBD (Conjunto mínimo básico de datos - Minimum Basic Hospital Data Set (MBDS)) of the Lucus Augusti Hospital (Spain), hospitalisations and patients, conducting a retrospective cohort study among the 74 220 patients discharged from the Medical Area between 01 January 2000 and 31 December 2015. We created multimorbidity clusters using multiple correspondence analysis. Results: We identified five clusters for both gender and age. Cluster 1: alcoholic liver disease, alcoholic dependency syndrome, lung and digestive tract malignant neoplasms (age under 50 years). Cluster 2: large intestine, prostate, breast and other malignant neoplasms, lymphoma and myeloma (age over 70, mostly males). Cluster 3: malnutrition, Parkinson disease and other mobility disorders, dementia and other mental health conditions (age over 80 years and mostly women). Cluster 4: atrial fibrillation/flutter, cardiac failure, chronic kidney failure and heart valve disease (age between 70–80 and mostly women). Cluster 5: hypertension/hypertensive heart disease, type 2 diabetes mellitus, ischaemic cardiomyopathy, dyslipidaemia, obesity and sleep apnea, including mostly men (age range 60–80). We found significant differences among the clusters in gender, age, number of chronic pathologies, number of rehospitalisations and mortality during the hospitalisation (p<0.001 in all cases). Conclusions: We identify for the first time in a hospital environment five clusters of disease combinations among the inpatients. These clusters contain several high-incidence diseases related to both age and gender that express their own evolution and clinical characteristics over time.

Rationale for Timing of Follow-Up Visits to Assess Gluten-Free Diet in Celiac Disease Patients Based on Data Mining

Nutrients ◽

10.3390/nu13020357 ◽

2021 ◽

Vol 13 (2) ◽

pp. 357

Author(s):

Alfonso Rodríguez-Herrera ◽

Joaquín Reyes-Andrade ◽

Cristina Rubio-Escudero

Keyword(s):

Data Mining ◽

Celiac Disease ◽

Cultural Context ◽

Gluten Free Diet ◽

Gluten Free ◽

Adherence To Diet ◽

And Gender ◽

Few Data

The assessment of compliance with a gluten-free diet (GFD) is a keystone in the supervision of celiac disease (CD) patients. Few data are available documenting evidence-based follow-up frequency for CD patients. In this work we aim to create a criterion for the timing of clinical follow-up for CD patients using data mining. We applied data mining to a dataset of 188 CD patients on GFD (75% of them children below 14 years old), evaluating the presence of gluten immunogenic peptides (GIP) in stools as a marker of adherence to the diet. The variables considered are gender, age, years following GFD and adherence to the GFD by fecal GIP. The results identify patients on GFD for more than two years (41.5% of the patients) as more prone to poor compliance and so needing more frequent follow-up than patients with less than two years on GFD. This runs against the usual clinical practice of following up patients on long-term GFD less often, as they are supposed to perform better. Our results support varying the follow-up frequency according to the number of years on GFD, age and gender. Patients on long-term GFD should be monitored more frequently, as they show a higher level of gluten exposure. A gender perspective should also be considered, as non-compliance is partially linked to gender in our results: males tend to get more gluten exposure, at least in the cultural context where our study was carried out. Children tend to perform better than teenagers or adults.

WHY DON’T OLDER ADULTS USE SENIOR CENTERS? EVIDENCE FROM A SAMPLE OF MASSACHUSETTS ADULTS AGE 50 AND OLDER

Innovation in Aging ◽

10.1093/geroni/igz038.1845 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

pp. S498-S498

Author(s):

Ceara Somerville ◽

Nidya Velasco Roldan ◽

Cindy N Bui ◽

Caitlin E Coyle

Keyword(s):

Older Adults ◽

Age Groups ◽

Community Dwelling ◽

Senior Center ◽

Community Resource ◽

Senior Centers ◽

Data Set ◽

Younger Age ◽

And Gender ◽

Vast Range

Senior centers are an integral community resource, providing programs and services intended to meet the vast range of needs and interests of older adults. There is a growing literature describing senior center participants and the benefits of participation, but little is known about those who choose not to participate at a local senior center. This presentation uniquely characterizes non-users of senior centers, based on a sample of community-dwelling adults aged 50+ from seven communities in Massachusetts (N = 9,462). To date, this is the largest data set that describes senior center usage. Most of the sample were women (60%) and in the 60-69 age group (36%). More than three quarters of the sample do not use the local senior center (77%). The most common reasons for non-usage were lack of interest (27%) and not feeling old enough (26%). There are significant differences in reasons for non-usage among age groups and gender (p < .001). Among the younger age groups (50-69), the most common reasons for non-usage were not feeling old enough, not having time, inconvenient senior center hours, and not knowing what is offered. In contrast, older age groups (80+) more frequently reported having no interest or using programs elsewhere. Men were more likely to report not being interested and not being familiar with what is offered. Women were more likely to report not having time, inconvenient hours of programming, and using programs elsewhere. Based on results from this study, this presentation will outline implications for the future of senior centers and their programming.

Characterization of Road Condition with Data Mining Based on Measured Kinematic Vehicle Parameters

Journal of Advanced Transportation ◽

10.1155/2018/8647607 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Johannes Masino ◽

Jakob Thumm ◽

Guillaume Levasseur ◽

Michael Frey ◽

Frank Gauterin ◽

...

Keyword(s):

Data Mining ◽

Support Vector ◽

Matlab Toolbox ◽

Data Set ◽

The Road ◽

Acceleration Sensors ◽

Road Surfaces ◽

Road Condition ◽

Sensor Signals

This work aims to classify the road condition with data mining methods using simple acceleration sensors and gyroscopes installed in vehicles. Two classifiers are developed with a support vector machine (SVM) to distinguish between different types of road surfaces, such as asphalt and concrete, and obstacles, such as potholes or railway crossings. Frequency-based features are extracted from the sensor signals and evaluated automatically with MANOVA. The selected features and their relevance for predicting the classes are discussed. The best features are used for designing the classifiers. Finally, the methods developed and applied in this work are implemented in a Matlab toolbox with a graphical user interface. The toolbox visualizes the classification results on maps, thus enabling manual verification of the results. Cross-validation yields an average accuracy of 81.0% for classifying obstacles and 96.1% for classifying road material. The results are discussed on a comprehensive exemplary data set.
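
To illustrate what a frequency-based feature might look like, the sketch below computes the spectral energy of an acceleration signal in a few frequency bands with a naive discrete Fourier transform. The abstract does not specify the features used, so the band limits, sample rate and synthetic signal are assumptions, and Python stands in for the authors' Matlab toolbox.

```python
import cmath
import math


def band_energies(signal, sample_rate, bands):
    """Spectral energy of `signal` in each (low_hz, high_hz) band via a naive DFT."""
    n = len(signal)
    energies = [0.0] * len(bands)
    # Only non-negative frequencies up to the Nyquist limit are considered.
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        power = abs(coeff) ** 2
        for idx, (low, high) in enumerate(bands):
            if low <= freq < high:
                energies[idx] += power
    return energies


# Synthetic vertical acceleration: a 5 Hz sine sampled at 64 Hz for one second.
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

# Two hypothetical bands: low frequencies (0-10 Hz) and higher frequencies (10-32 Hz).
features = band_energies(signal, sample_rate, bands=[(0, 10), (10, 32)])
```

For this signal almost all of the energy falls in the first band. Feature vectors of this kind, one per signal window, are what a classifier such as an SVM would be trained on.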

Vegetable price prediction using data mining classification technique

International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012) ◽

10.1109/icprime.2012.6208294 ◽

2012 ◽

Cited By ~ 4

Author(s):

G. M. Nasira ◽

N. Hemageetha

Keyword(s):

Data Mining ◽

Price Prediction ◽

Classification Technique ◽

Using Data

Sexism and Gender Inequality Across 57 Societies

Psychological Science ◽

10.1177/0956797611420445 ◽

2011 ◽

Vol 22 (11) ◽

pp. 1413-1418 ◽

Cited By ~ 76

Author(s):

Mark J. Brandt

Keyword(s):

Longitudinal Data ◽

Multilevel Modeling ◽

Gender Inequality ◽

Past Research ◽

Status Quo ◽

Data Set ◽

Representative Data ◽

Gender Hierarchy ◽

And Gender

Theory predicts that individuals’ sexism serves to exacerbate inequality in their society’s gender hierarchy. Past research, however, has provided only correlational evidence to support this hypothesis. In this study, I analyzed a large longitudinal data set that included representative data from 57 societies. Multilevel modeling showed that sexism directly predicted increases in gender inequality. This study provides the first evidence that sexist ideologies can create gender inequality within societies, and this finding suggests that sexism not only legitimizes the societal status quo, but also actively enhances the severity of the gender hierarchy. Three potential mechanisms for this effect are discussed briefly.

A Survey on Major Classification Algorithms and Comparative Analysis of Few Classification Algorithms on Contact Lenses Data Set Using Data Mining Tool

New Trends in Computational Vision and Bio-inspired Computing ◽

10.1007/978-3-030-41862-5_121 ◽

2020 ◽

pp. 1201-1209

Author(s):

Syed Nawaz Pasha ◽

D. Ramesh ◽

Mohammad Sallauddin

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

Contact Lenses ◽

Classification Algorithms ◽

Data Set ◽

Data Mining Tool ◽

Mining Tool ◽

Using Data

Failure Analysis in University and Computer Science Contexts With Data Mining

10.5753/wei.2020.11132 ◽

2020 ◽

Author(s):

Daniela De Souza Gomes ◽

Marcos Henrique Fonseca Ribeiro ◽

Giovanni Ventorim Comarela ◽

Gabriel Philippe Pereira

Keyword(s):

Data Mining ◽

Decision Making ◽

Failure Analysis ◽

Computer Science ◽

Educational Administration ◽

Intelligent Systems ◽

Data Set ◽

Data Mining Techniques ◽

Study Case ◽

Support Students

High failure rates are a worrying and relevant problem in Brazilian universities. Using a data set of student transcripts, we performed a case study for both general and Computer Science contexts, in which data mining techniques were used to find patterns concerning failures. The knowledge acquired can be used to improve educational administration and to build intelligent systems that support students’ decision making.
