Combining Data Mining Techniques to Analyse Factors Associated with Allocation of Socioeconomic Resources at IFMG

Socioeconomic assistance for students at Federal Education Institutions is one way of providing financial support during their studies, focusing primarily on those who are most socially vulnerable. Institutions run selection processes to identify students whose profile indicates need and distribute the grants according to the budget available for this purpose. This article applied Data Mining techniques to data on students who applied for scholarships at IFMG - Campus Bambuí, seeking to identify the attributes associated with the distribution of benefits and to analyze the adequacy of the indicator the institution currently uses to classify students' level of social vulnerability. The proposed methodology combined different machine learning techniques, including data classification and feature selection. In addition to identifying the importance of each attribute in the constructed model, the distinguishing contribution of this article is a set of well-founded suggestions for new attributes that could improve the index used by the institution and, consequently, optimize the workload of those who analyze the selection processes. Adding five new attributes to the institution's index resulted in a gain of around 10% in classification performance.

KLASIFIKASI SMS SPAM MENGGUNAKAN SUPPORT VECTOR MACHINE (SMS Spam Classification Using Support Vector Machine)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v15i2.693 ◽

2019 ◽

Vol 15 (2) ◽

pp. 275-280

Author(s):

Agus Setiyono ◽

Hilman F Pardede

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Spam Detection ◽

Support Vector Machine Algorithm ◽

Data Mining Techniques ◽

To Receive

It is now common for cellphones to receive spam messages. The large number of messages received makes it difficult for people to classify them manually as spam or not spam. One way to overcome this problem is to use Data Mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine performs best of the three evaluated algorithms: it achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
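To make one of the compared techniques concrete, the following is a minimal sketch of a Multinomial Naïve Bayes text classifier with Laplace smoothing, written with the Python standard library only. It is not the authors' implementation: the whitespace tokenizer, the function names `train_multinomial_nb` and `predict`, and the four toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(examples):
    """Collect per-class word counts and class frequencies from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter()            # label -> number of messages
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()   # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under Laplace smoothing."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_messages)
        total_tokens = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed log likelihood of the word given the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total_tokens + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy messages, not the data set used in the paper.
TOY_MESSAGES = [
    ("win a free prize now", "spam"),
    ("free entry claim your prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you get home", "ham"),
]

model = train_multinomial_nb(TOY_MESSAGES)
print(predict(model, "claim free prize"))
print(predict(model, "see you at lunch"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and add-one smoothing keeps a word that is absent from one class from driving that class's probability to zero.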

Prediction of Skin Diseases Using Machine Learning

10.4018/978-1-7998-7888-9.ch008 ◽

2022 ◽

pp. 154-178

Author(s):

Siddhartha Kumar Arjaria ◽

Vikas Raj ◽

Sunil Kumar ◽

Priyanshu Shrivastava ◽

Monu Kumar ◽

...

Keyword(s):

Machine Learning ◽

Data Mining ◽

Skin Disease ◽

Skin Diseases ◽

Information Gain ◽

Machine Learning Algorithms ◽

Ensemble Method ◽

Chi Square ◽

Data Mining Techniques ◽

Disease Rates

Skin disease rates have been increasing over the past few decades. These diseases have led to both fatal and non-fatal disabilities all around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. It proposes a new method of analyzing skin disease in which the six data mining techniques are integrated into a single ensemble method. Applied to the dermatology dataset, the ensemble method gives an improved result of 94% accuracy in comparison to the other classifier algorithms and hence is more effective in this area.
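As an illustration of the first ranking criterion mentioned above, here is a small sketch of information gain for categorical features, using only the Python standard library. The feature names (`itching`, `age_group`) and the four toy records are hypothetical and are not taken from the dermatology dataset; the helper names are likewise assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records for illustration only.
LABELS = ["disease", "disease", "healthy", "healthy"]
COLUMNS = {
    "age_group": ["young", "old", "young", "old"],
    "itching": ["yes", "yes", "no", "no"],
}

ranking = rank_features(COLUMNS, LABELS)
print(ranking)
```

In this toy case `itching` separates the two classes perfectly and `age_group` carries no information about the label, so the ranking places `itching` first.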

Combining Data Mining Techniques for Evolutionary Analysis of Programming Languages

2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI) ◽

10.1109/iri.2019.00015 ◽

2019 ◽

Author(s):

Rafael Jose de Alencar Almeida ◽

Vinicius Humberto Serapilha Durelli ◽

Igor Campos Moraes ◽

Matheus Carvalho Viana ◽

Elverton Carvalho Fazzion ◽

...

Keyword(s):

Data Mining ◽

Programming Languages ◽

Evolutionary Analysis ◽

Data Mining Techniques ◽

Combining Data

Factores de éxito de un emprendimiento: Un estudio exploratorio con base en técnicas de data mining (Entrepreneurial success factors: An exploratory study based on Data Mining Techniques)

TEC Empresarial ◽

10.18845/te.v9i1.2206 ◽

2015 ◽

Vol 9 (1) ◽

pp. 30 ◽

Cited By ~ 4

Author(s):

María Messina ◽

Esther Hochsztain

Keyword(s):

Data Mining ◽

Logistic Regression ◽

Exploratory Study ◽

Preliminary Analysis ◽

Success Factors ◽

Entrepreneurial Success ◽

Data Mining Techniques ◽

Factors Associated ◽

Main Factors

Since 2007, the CCEEmprende Entrepreneurship Centre has run a support program for entrepreneurs. To improve the program's management, it is important to classify ventures, on a preliminary basis, into one of two categories: success or failure. This article identifies the main factors associated with the success of a venture and how they combine to anticipate the venture's future. A case study is presented based on data from a survey of entrepreneurs participating in the program, applying classification techniques. The two data mining techniques used are decision trees and logistic regression, and the results were consistent across both. The findings show that the two most relevant elements for anticipating a venture's success are having funding and the entrepreneur's previous experience as self-employed. These first results from the case study provide useful insight into the best ways to support entrepreneurs, how to create incentives for them, and how to define tools or activities that favorably affect the success of ventures. Although theory and studies of other contexts offer information on the factors that contribute to success, no similar studies have been identified for Uruguay.

Heart disease prediction using Advanced Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35495 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 2160-2163

Author(s):

Minal Shahakar

Keyword(s):

Machine Learning ◽

Data Mining ◽

Heart Disease ◽

Web Application ◽

Intelligent System ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Disease Prediction ◽

End User ◽

Data Mining Techniques

It may often happen that you or someone close to you needs a doctor's help immediately, but none is available for some reason. The Heart Disease Prediction application is an online end-user support system. Here, we propose a web application that allows users to get instant guidance on their heart disease through an intelligent online system. The application is fed with various details and the heart disease associated with those details. It allows users to share their heart-related issues and then processes the user-specific details to check for various illnesses that could be associated with them. We use intelligent data mining techniques to identify the illness most accurately associated with the patient's details. Based on the result, the system automatically shows doctors specific to that result for further treatment and allows the user to view the doctors' details.

An Integrated Approach for Cancer Survival Prediction Using Data Mining Techniques

Computational Intelligence and Neuroscience ◽

10.1155/2021/6342226 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Ishleen Kaur ◽

M. N. Doja ◽

Tanvir Ahmad ◽

Musheer Ahmad ◽

Amir Hussain ◽

...

Keyword(s):

Data Mining ◽

Ovarian Cancer ◽

Cancer Patients ◽

Missing Values ◽

Life Quality ◽

Advanced Ovarian Cancer ◽

Machine Learning Algorithms ◽

Survival Prediction ◽

Data Mining Techniques ◽

Using Data

Ovarian cancer is the third most common gynecologic cancer worldwide. Advanced ovarian cancer patients bear a significant mortality rate. Survival estimation is essential for clinicians and patients to better understand and tolerate future outcomes. The present study investigates different survival predictors available for cancer prognosis using data mining techniques. A dataset of 140 advanced ovarian cancer patients containing data from different data profiles (clinical, treatment, and overall life quality) was collected and used to predict cancer patients' survival. Attributes from each data profile were processed accordingly. Clinical data were cleaned for missing values and outliers. Treatment data including varying time periods were created using sequence mining techniques to identify the treatments given to the patients. Lastly, different comorbidities were combined into a single factor by computing the Charlson Comorbidity Index for each patient. After appropriate preprocessing, the integrated dataset was classified using appropriate machine learning algorithms. The proposed integrated model approach gave the highest accuracy, 76.4%, using an ensemble technique with sequential pattern mining including time intervals of 2 months between treatments. Thus, the treatment sequences and, most importantly, life quality attributes contribute significantly to the survival prediction of cancer patients.

FEATURE SELECTION FOR OPTIMIZATION OF WAVELET PACKET DECOMPOSITION IN RELIABILITY ANALYSIS OF SYSTEMS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013600117 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1360011 ◽

Cited By ~ 4

Author(s):

RANDALL WALD ◽

TAGHI M. KHOSHGOFTAAR ◽

JOHN C. SLOAN

Keyword(s):

Data Mining ◽

Feature Selection ◽

Wavelet Packet ◽

Vibration Signal ◽

Machine Learning Algorithms ◽

Wavelet Packet Decomposition ◽

Time Frequency ◽

Speed Up ◽

Frequency Domain Techniques ◽

Feature Selection Techniques

One of the most important types of signal found in the area of machine condition monitoring/prognostic health monitoring (MCM/PHM) is the vibration signal, a type of waveform. Many time-frequency domain techniques have been proposed to interpret such signals, including wavelet packet decomposition (WPD). Previous work has shown how to extend the WPD algorithm to operate on streaming signals, but the number of output variables becomes exponential in the number of levels of decomposition, hindering data mining in limited-memory environments. Feature selection techniques, well understood in other areas of data mining, can be used to greatly reduce the number of output variables and speed up the machine learning algorithms. This paper presents a case study comparing two versions of WPD both with and without feature selection, demonstrating that removing most of the features produced by the WPD does not impair its performance within the context of MCM/PHM.
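The exponential growth in output variables can be seen in a short sketch of a full wavelet packet decomposition. The version below uses the Haar wavelet purely for simplicity; that choice, the batch (non-streaming) formulation, and the function names are assumptions for illustration and do not reproduce the streaming algorithm described in the paper.

```python
import math


def haar_step(signal):
    """Split a signal into Haar approximation and detail halves."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_decomposition(signal, levels):
    """Return the terminal nodes of a full Haar wavelet packet tree.

    Unlike the plain wavelet transform, both the approximation and the
    detail branch are split again at every level, so the number of
    terminal nodes is 2 ** levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def node_energies(nodes):
    """Energy (sum of squares) per terminal node, a common WPD-derived feature."""
    return [sum(x * x for x in node) for node in nodes]


# Hypothetical eight-sample signal for illustration.
SIGNAL = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
for levels in (1, 2, 3):
    nodes = wavelet_packet_decomposition(SIGNAL, levels)
    print(levels, len(nodes))
```

Each extra level doubles the number of terminal nodes, and therefore the number of per-node features such as energy, which is why pruning them with feature selection matters in limited-memory settings.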

Combining Data Mining Techniques to Enhance Cardiac Arrhythmia Detection

Lecture Notes in Computer Science - Computational Science – ICCS 2018 ◽

10.1007/978-3-319-93701-4_24 ◽

2018 ◽

pp. 321-333

Author(s):

Christian Gomes ◽

Alan Cardoso ◽

Thiago Silveira ◽

Diego Dias ◽

Elisa Tuler ◽

...

Keyword(s):

Data Mining ◽

Cardiac Arrhythmia ◽

Arrhythmia Detection ◽

Data Mining Techniques ◽

Combining Data

Identification of Factors Associated With School Effectiveness With Data Mining Techniques: Testing a New Approach

Frontiers in Psychology ◽

10.3389/fpsyg.2019.02583 ◽

2019 ◽

Vol 10 ◽

Cited By ~ 1

Author(s):

Fernando Martínez-Abad

Keyword(s):

Data Mining ◽

School Effectiveness ◽

New Approach ◽

Data Mining Techniques ◽

Factors Associated

Differential Demographic and Clinical Characteristics between MMR Vaccinated and Unvaccinated Children in South Korea: A Nationwide Study

Vaccines ◽

10.3390/vaccines9060653 ◽

2021 ◽

Vol 9 (6) ◽

pp. 653

Author(s):

Dongwon Yoon ◽

Juhwan Kim ◽

Juyoung Shin

Keyword(s):

South Korea ◽

Clinical Characteristics ◽

Mmr Vaccine ◽

Targeted Interventions ◽

Factors Associated ◽

Immunization Registry ◽

Combining Data ◽

Measles Outbreaks ◽

To Receive ◽

Information Database

In the context of recent measles outbreaks, substantial factors associated with measles-mumps-rubella (MMR) unvaccination need to be clarified. This study aimed to identify differential demographic and clinical characteristics between MMR vaccinated and unvaccinated groups. We used a large linked database to identify children born between 2008 and 2016 by combining data from the Korea Immunization Registry Information System and the National Health Information database. MMR vaccination status was ascertained up to the age of 2 to define the MMR vaccinated and unvaccinated groups. We conducted a multivariate logistic regression to estimate odds ratios (ORs) with 95% confidence intervals (CIs) to identify factors associated with MMR unvaccination. Of 3,973,253 children, 75,674 (1.9%) did not receive the MMR vaccine. Compared with the MMR vaccinated group, the underutilization of healthcare resources was more notable in the MMR unvaccinated group (number of outpatient visits: 5.73 ± 12.1 vs. 25.8 ± 17.06; days hospitalized: 1.69 ± 14.5 vs. 2.32 ± 6.90). Children were less likely to receive the MMR vaccine if they were born with a congenital anomaly (OR 2.12; 95% CI 1.90–2.36), were never admitted to an intensive care unit (1.88; 1.78–1.98), or never visited an emergency room (3.57; 3.53–3.72). There were substantial factors associated with MMR unvaccination, underscoring a need to optimize targeted interventions tailored to this subset of children in South Korea.
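For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes an unadjusted odds ratio from a 2×2 table with a Wald interval on the log scale. This is a simplification: the study reports adjusted estimates from a multivariate logistic regression, which this sketch does not reproduce. The cell counts are hypothetical, and the function name is an assumption; only the vaccination totals in the last lines come from the abstract.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only; not taken from the study.
or_value, ci_low, ci_high = odds_ratio_with_ci(a=20, b=80, c=10, d=90)
print(f"OR {or_value:.2f}; 95% CI {ci_low:.2f}-{ci_high:.2f}")

# Share of unvaccinated children, using the totals reported in the abstract.
unvaccinated_pct = 75_674 / 3_973_253 * 100
print(f"{unvaccinated_pct:.1f}%")
```

An interval that excludes 1 indicates an association at the chosen confidence level; in a regression setting the same interval is built from the coefficient and its standard error instead of from cell counts.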
