Educational Data Classification and prediction using Data Mining Algorithms

Data mining is the process of extracting interesting patterns from huge data sets and converting those patterns into a logical structure for further analysis. Predictive modeling comprises processes that use data mining, machine learning and probability methods to forecast outcomes. Engineering is the most widely accepted stream of education in India, yet students are often uncertain about which engineering department to join. It is important to improve individual performance and to help students make a well-informed choice of department. In this paper, the hidden information in previously recorded enrollment details from the admission process is used to resolve students' uncertainty in their choice of department. In addition, teachers need to analyze the performance of alumnae to form a clear idea about the future of existing students. Our main goal is to address these problems using predictive modeling. We focus on three classification algorithms, namely support vector machine, Random Forest and Naïve Bayes. Data were collected, normalized and applied to the three classification algorithms, and the best model was selected using various evaluation parameters. We present our approach to implementing the best model, which is built on the profession of parents, demographic features, the type of location of the student and the correlation between high school and higher secondary examinations. The results show that Random Forest is more efficient on the data set used than the other two classification algorithms.

Empirical study on the effect of using synthetic attributes on classification algorithms

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-08-2016-0029 ◽

2017 ◽

Vol 10 (2) ◽

pp. 111-129 ◽

Cited By ~ 2

Author(s):

Ali Hasan Alsaffar

Keyword(s):

Data Mining ◽

Empirical Study ◽

Student Performance ◽

Software Tool ◽

Real Data ◽

Support Vector ◽

Classification Algorithms ◽

Past Performance ◽

Data Set ◽

Content Type

Purpose: The purpose of this paper is to present an empirical study of the effect of two synthetic attributes on popular classification algorithms applied to data originating from student transcripts. The attributes represent past performance achievements in a course and are defined as global performance (GP) and local performance (LP). The GP of a course is an aggregated performance achieved by all students who have taken this course, and the LP of a course is an aggregated performance achieved in the prerequisite courses by the student taking the course.

Design/methodology/approach: The paper uses Educational Data Mining techniques to predict student performance in courses. It identifies the attributes that most strongly influence the prediction of the final grade (performance) and reports the effect of the two suggested attributes on the classification algorithms. As a research paradigm, the paper follows the Cross-Industry Standard Process for Data Mining, using the RapidMiner Studio software tool. Six classification algorithms are tested: C4.5 and CART decision trees, Naive Bayes, k-nearest neighbors, rule-based induction and support vector machines.

Findings: The outcomes of the paper show that the synthetic attributes improved the performance of the classification algorithms and were ranked highly according to their influence on the target variable.

Originality/value: This paper proposes two synthetic attributes that are integrated into a real data set. The key motivation is to improve the quality of the data and make classification algorithms perform better. The paper also presents empirical results showing the effect of these attributes on selected classification algorithms.
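
The GP and LP definitions in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which aggregation is used, so the mean is an assumption here; the transcript records, the prerequisite map, and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical transcript records: (student, course, grade points on a 0-4 scale).
records = [
    ("s1", "MATH101", 3.0),
    ("s2", "MATH101", 2.0),
    ("s1", "CS101", 4.0),
    ("s2", "CS101", 3.0),
    ("s3", "CS101", 2.0),
]

# Hypothetical prerequisite map: course -> list of prerequisite courses.
prerequisites = {"CS201": ["MATH101", "CS101"]}


def global_performance(course, records):
    """GP: aggregate (assumed: mean) grade of all students who took `course`."""
    grades = [g for _, c, g in records if c == course]
    return mean(grades) if grades else None


def local_performance(student, course, records, prerequisites):
    """LP: aggregate (assumed: mean) grade of `student` in the prerequisites of `course`."""
    prereqs = set(prerequisites.get(course, []))
    grades = [g for s, c, g in records if s == student and c in prereqs]
    return mean(grades) if grades else None


gp_cs101 = global_performance("CS101", records)  # mean of 4.0, 3.0, 2.0
lp_s1_cs201 = local_performance("s1", "CS201", records, prerequisites)  # mean of 3.0, 4.0
```

Both values would then be appended as extra columns to each (student, course) row before training a classifier. Returning None when no grades exist leaves the handling of missing values to the caller.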

Comparison of Some Classification Algorithms for the Analysis of Students Academic Performance in Educational Data Mining Using Orange

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1394 ◽

2021 ◽

pp. 318-324

Author(s):

Vanthana V

Keyword(s):

Data Mining ◽

Academic Performance ◽

Random Forest ◽

Educational Data Mining ◽

Evaluation Process ◽

Support Vector ◽

Classification Algorithms ◽

Academic Improvement ◽

Academic Information ◽

Tools And Techniques

In the modern education system, many higher education institutions prefer data mining tools and techniques to analyze the academic improvement of their students, and many such techniques and tools are available. This paper uses classification to analyze students' academic performance. It presents a comparison of five classification algorithms: Decision Tree, Naïve Bayesian, K-Nearest Neighbour, Support Vector Machine and Random Forest, applied to data collected from three colleges of Assam, India. The data consist of socio-economic, demographic and academic information on three hundred students, with twenty-four attributes. The data mining tool used was ORANGE. The internal assessment attribute in the continuous evaluation process has the highest impact on the final semester results of the students in the dataset. The results show that Random Forest outperforms the other classifiers in terms of accuracy.

Diagnosis of Various Thyroid Ailments using Data Mining Classification Techniques

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195119 ◽

2019 ◽

pp. 131-136

Author(s):

Umar Sidiq ◽

Syed Mutahar Aaqib ◽

Rafi Ahmad Khan

Keyword(s):

Data Mining ◽

Decision Tree ◽

Research Work ◽

Support Vector ◽

Data Sets ◽

Data Mining Technique ◽

K Nearest Neighbors ◽

Data Set ◽

Classification Techniques ◽

Using Data

Classification is one of the most important supervised learning data mining techniques used to classify predefined data sets. It is widely used in the healthcare sector for decision making, diagnosis systems and providing better treatment to patients. In this work, the data set is taken from a recognized lab in Kashmir. The research work is carried out with ANACONDA3-5.2.0, an open source platform, under a Windows 10 environment. An experimental study is carried out using classification techniques such as k nearest neighbors, Support Vector Machine, Decision Tree and Naïve Bayes. The Decision Tree obtained the highest accuracy, 98.89%, among the classification techniques.

Prediction of Student Performance using Hybrid Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrted8241.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 6566-6570

Keyword(s):

Data Mining ◽

Academic Performance ◽

Student Performance ◽

Research Work ◽

Educational Data Mining ◽

Classification Algorithm ◽

Classification Algorithms ◽

Data Types ◽

Data Set ◽

Hybrid Classification

Data mining technologies allow the collection, storage and processing of huge amounts of data covering a large variety of data types and samples. Predicting the academic performance of students is an active research topic. In previous research work, researchers used different classification algorithms to predict student performance. Much research remains to be done in educational data mining and big data in education to increase the accuracy of classification algorithms and to predict the academic performance of students. In this research work we used a hybrid classification algorithm to predict the performance of students. Two popular classification algorithms, ID3 and J48, were applied to the data set. To build the hybrid classifier, a voting technique was applied using the Weka machine learning tool. We tested how accurately the hybrid algorithm predicts the student data set by computing the classification accuracy. The hybrid classification algorithm achieves an accuracy of 62.67%.
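
A voting ensemble of the kind described in this abstract can be sketched in a few lines of Python. The abstract does not state which combination rule was used, so averaging the class probabilities of the base classifiers is an assumption here, and the two probability dictionaries are hypothetical stand-ins for the outputs of two decision trees rather than real ID3 or J48 models.

```python
def average_of_probabilities(distributions):
    """Combine per-classifier class-probability dicts by averaging them."""
    classes = sorted({c for d in distributions for c in d})
    n = len(distributions)
    return {c: sum(d.get(c, 0.0) for d in distributions) / n for c in classes}


def vote(distributions):
    """Return the class with the highest averaged probability."""
    combined = average_of_probabilities(distributions)
    return max(combined, key=combined.get)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical outputs of two base classifiers for one student record.
tree_a = {"pass": 0.9, "fail": 0.1}
tree_b = {"pass": 0.4, "fail": 0.6}

combined = average_of_probabilities([tree_a, tree_b])
predicted = vote([tree_a, tree_b])
```

With only two base classifiers, a plain majority vote ties whenever they disagree, which is why this sketch averages probabilities instead. Classification accuracy, the measure reported in the abstract, is then the fraction of records for which the voted label matches the true label.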

Improving the Automobile Purchasing Behavior of Customer: Classification Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2924.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2219-2223

Keyword(s):

Data Mining ◽

Research Work ◽

Classification Algorithms ◽

Purchasing Behavior ◽

Data Set ◽

Vehicle Data ◽

Complex Process ◽

Online Shoppers ◽

Automate Detection ◽

Customer Classification

Data mining (DM) is the automated detection of relevant patterns in large information repositories. E-commerce is a popular and frequently used technology in real-world applications, and it is a killer domain for data mining. DM is often a complex process and may require a variety of steps before results are obtained. Many DM tools are available to predict behaviors and future trends, allowing businesses to plan proactively for their customers. This research work takes a data set of online shoppers purchasing vehicles and measures how accurately some classification algorithms predict purchasing behavior. The Bayes Net and Naive Bayes classification algorithms are used for the analysis, and a comparative study of the two is carried out. Finally, the better-performing algorithm is suggested for analyzing the vehicle data set based on the purchasing behavior of the customer.

Review of Data Mining Techniques Used in Healthcare

Advances in Medical Technologies and Clinical Practice - Diagnostic Applications of Health Intelligence and Surveillance Systems ◽

10.4018/978-1-7998-6527-8.ch001 ◽

2021 ◽

pp. 1-26

Author(s):

Usha Gupta ◽

Kamlesh Sharma

Keyword(s):

Data Mining ◽

Vital Role ◽

Mining Machine ◽

Support Vector ◽

Data Set ◽

Data Mining Techniques ◽

Network Support ◽

Data Mining Algorithms ◽

Clinical Databases ◽

Mining Algorithms

Data mining plays a vital role in converting medical data such as text, images, and graphs into meaningful new data, which helps in making better decisions. This chapter gives an overview of current research that uses data mining techniques for the detection, analysis, and prediction of various diseases. The focus of this study is to identify the well-performing data mining algorithms used on medical and clinical databases. Multiple algorithms have been identified: text-based mining, association rule-based mining, pattern-based mining, keyword-based mining, machine learning, neural network, support vector machine, apriori algorithm, k-means clustering, and natural language. Analysis of the algorithms shows that no single algorithm or model is most suitable for diagnosing or predicting diseases. Some algorithms work very well in some scenarios but not on other data sets. There are many examples in clinical or medical research where a combination of different algorithms gives good results.

Early Detection of Red Palm Weevil, Rhynchophorus ferrugineus (Olivier), Infestation Using Data Mining

Plants ◽

10.3390/plants10010095 ◽

2021 ◽

Vol 10 (1) ◽

pp. 95

Author(s):

Heba Kurdi ◽

Amal Al-Aldawsari ◽

Isra Al-Turaiki ◽

Abdulrahman S. Aldawood

Keyword(s):

Data Mining ◽

Plant Size ◽

Support Vector ◽

Classification Algorithms ◽

Palm Tree ◽

Rhynchophorus Ferrugineus ◽

Red Palm Weevil ◽

Palm Weevil ◽

Using Data ◽

F Measure

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has rapidly spread worldwide. However, detecting infestation with the RPW is highly challenging because symptoms are not visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms, Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, J48 Decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression, to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is caused to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure using a real RPW dataset. The experimental results showed that infestations with RPW can be predicted with an accuracy of up to 93%, precision above 87%, recall of 100%, and F-measure greater than 93% using data mining. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. However, we strongly call for collecting and aggregating more RPW datasets to run more experiments to validate these results and provide more conclusive findings.
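
The four evaluation measures named in this abstract follow from the counts of a binary confusion matrix. The Python sketch below uses the standard definitions; the counts are hypothetical and are not taken from the study, and returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: 8 infested trees found, 2 false alarms,
# no missed infestations, 10 healthy trees correctly cleared.
metrics = classification_metrics(tp=8, fp=2, fn=0, tn=10)
```

A recall of 100% alongside a lower precision, as in these hypothetical counts, means that no infested tree is missed while some healthy trees are flagged. For early pest detection that is usually the safer direction for errors to fall.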

A data mining approach to the diagnosis of failure modes for two serial fastened sandwich composite plates

Journal of Composite Materials ◽

10.1177/0021998316679720 ◽

2016 ◽

Vol 51 (20) ◽

pp. 2853-2862 ◽

Cited By ~ 2

Author(s):

Serkan Ballı

Keyword(s):

Data Mining ◽

Random Forest ◽

Failure Modes ◽

Composite Plates ◽

Study Data ◽

Sandwich Composite ◽

Support Vector ◽

Geometrical Parameters ◽

Mining Methods

The aim of this study is to diagnose and classify the failure modes of two serial fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured from glass fiber reinforced layers and aluminum sheets. Results of a previous experimental study on sandwich composite plates mechanically fastened with two serial pins or bolts were used to classify the failure modes. The experimental data from that study cover different geometrical parameters for applied preload moments of 0 (pinned), 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied using these geometrical parameters and the pinned/bolted joint configurations. Three geometrical parameters and 100 test data were used for classification with the support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. In the experiments, the Random Forest method achieved better results than the others and was appropriate for diagnosing and classifying the failure modes. The performance of all the data mining methods used is discussed in terms of accuracy and error ratios.

Data Mining Approach to Analyze COVID-19 Clinical Dataset

10.53350/pjmhs211561812 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1812-1819

Author(s):

Azita Yazdani ◽

Ramin Ravangard ◽

Roxana Sharifian

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Clinical Signs ◽

Study Data ◽

Mining Machine ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Approach

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 with non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can be effective in reducing the burden on the health care system by providing the best possible way to diagnose and predict the COVID-19 epidemic. In this study, data mining models for early detection of COVID-19 in patients were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, Random Forest, and k-nearest neighbor algorithms were applied directly to the dataset using RapidMiner to develop the models. Given a patient's clinical signs, the model diagnoses the risk of contracting the COVID-19 virus. Examination of the models in this study showed that the support vector machine, with 93.41% accuracy, is the most efficient in diagnosing patients with COVID-19 and is the best of the developed models. Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification

Characterization of Road Condition with Data Mining Based on Measured Kinematic Vehicle Parameters

Journal of Advanced Transportation ◽

10.1155/2018/8647607 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Johannes Masino ◽

Jakob Thumm ◽

Guillaume Levasseur ◽

Michael Frey ◽

Frank Gauterin ◽

...

Keyword(s):

Data Mining ◽

Support Vector ◽

Matlab Toolbox ◽

Data Set ◽

The Road ◽

Acceleration Sensors ◽

Road Surfaces ◽

Road Condition ◽

Sensor Signals

This work aims at classifying the road condition with data mining methods using simple acceleration sensors and gyroscopes installed in vehicles. Two classifiers are developed with a support vector machine (SVM) to distinguish between different types of road surfaces, such as asphalt and concrete, and obstacles, such as potholes or railway crossings. Frequency-based features are extracted from the sensor signals and evaluated automatically with MANOVA. The selected features and their relevance for predicting the classes are discussed. The best features are used for designing the classifiers. Finally, the methods developed and applied in this work are implemented in a Matlab toolbox with a graphical user interface. The toolbox visualizes the classification results on maps, thus enabling manual verification of the results. In cross-validation, the accuracy is 81.0% on average for classifying obstacles and 96.1% on average for classifying road material. The results are discussed on a comprehensive exemplary data set.
