Prediction of warning level in aircraft accidents using data mining techniques

Abstract Data mining is a data analysis process designed for large amounts of data. This paper proposes a methodology for evaluating risk and safety and describes the main issues of aircraft accidents. Aviation companies hold a huge amount of knowledge and collected data. The paper focuses on different feature selection techniques applied to airline databases in order to understand and clean the dataset. The CFS subset evaluator, consistency subset evaluator, gain ratio feature evaluator, information gain attribute evaluator, OneR attribute evaluator, principal components attribute transformer, ReliefF attribute evaluator and symmetrical uncertainty attribute evaluator are used to reduce the number of initial attributes. Classification algorithms such as decision tree (DT), k-nearest neighbours (KNN), support vector machine (SVM), neural network (NN) and naive Bayes (NB) are then used to predict the warning level of the component as the class attribute. Weka software tools are used for this purpose. The study finds that the principal components attribute transformer combined with a decision tree classifier performs better than the other attribute evaluators and techniques on airline data, and that accuracy is considerably improved. This work may help an aviation company make better predictions. Some safety recommendations are also addressed to airline companies.
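As an illustration of one of the evaluators named above, the following minimal sketch ranks attributes by information gain, that is, the entropy of the class minus the expected entropy after splitting on the attribute. It does not use Weka; the records, the attribute names (`engine_hours_band`, `operator`) and the class values are hypothetical and are not taken from the paper's airline data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels):
    """Class entropy minus the expected class entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: (engine_hours_band, operator, warning_level)
records = [
    ("high", "A", "warn"),
    ("high", "B", "warn"),
    ("low", "A", "ok"),
    ("low", "B", "ok"),
]
labels = [r[2] for r in records]
gain_hours = information_gain([r[0] for r in records], labels)
gain_operator = information_gain([r[1] for r in records], labels)

# Attributes sorted from most to least informative about the class
ranking = sorted(
    [("engine_hours_band", gain_hours), ("operator", gain_operator)],
    key=lambda kv: kv[1],
    reverse=True,
)
```

In this toy data the first attribute separates the classes perfectly while the second carries no information, so a ranking-based filter would keep the first and could drop the second.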

Student Performance Predictions Using Knowledge Discovery Database and Data Mining, DPU Students Records as Sample

Academic Journal of Nawroz University ◽

10.25007/ajnu.v10n3a875 ◽

2021 ◽

Vol 10 (3) ◽

pp. 121-127

Author(s):

Bareen Haval ◽

Karwan Jameel Abdulrahman ◽

Araz Rajab

Keyword(s):

Data Mining ◽

Decision Tree ◽

Student Performance ◽

Educational Data Mining ◽

Data Sets ◽

Decision Tree Classifier ◽

Data Mining Techniques ◽

Academic History ◽

Tree Classifier ◽

Using Data

This article presents the results of applying educational data mining techniques to the academic performance of students. Three classification models (Decision Tree, Random Forest and Deep Learning) were developed to analyze data sets and predict student performance. The predictive performance of the three classifiers was calculated and compared. The academic history and data of the students from the Office of the Registrar were used to train the models. The analysis aims to evaluate student results using various variables such as the student's grade. Data from 221 students with 9 different attributes were used. The results provide a better understanding of student success assessments and stress the importance of data mining in education. The main purpose of the study is to show how student success can be forecast with data mining techniques in order to improve academic programs. The results indicate that the Decision Tree classifier outperforms the other two classifiers, achieving a total prediction accuracy of 97%.

Prediction of Black Sigatoka Disease in Banana Plants By Data Mining Classification Techniques using Scikit for Python

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8714.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 1273-1278

Keyword(s):

Data Mining ◽

Decision Tree ◽

Gaussian Process ◽

Linear Models ◽

Control Measures ◽

Decision Tree Classifier ◽

Black Sigatoka ◽

Tree Classifier ◽

Using Data ◽

Black Sigatoka Disease

Agriculture has been evolving since humans started cultivating plants for food. As agriculture has evolved, disease control measures have evolved too. In the modern era, diseases in plants can be identified readily using computers. Data mining is the process of obtaining useful information from data. Before the electronic era, diseases in plants were identified simply by observing the plants' symptoms. Similarly, diseases in plants can be identified using data mining by supplying disease symptom data and classifying it accordingly. This paper focuses on predicting disease from images of black sigatoka disease and uses the following methods: multilayer perceptrons, SVM, KNeighborsClassifier, KNeighborsRegressor, Gaussian Process Regressor, Gaussian Process Classifier, GaussianNB, Decision Tree Classifier, Decision Tree Regressor, linear models such as Linear Regression, RidgeCV, Lasso, ElasticNet, LogisticRegressionCV, SGD Classifier, Perceptron and Passive Aggressive Classifier, and ensemble models of the above classifiers. The results are compared: the multilayer perceptron model gives better results among the individual classifiers, and the ensemble of weak classifiers gives better results among the ensembles. In future, a new hybrid algorithm built from the above algorithms would be used to attain better accuracy. scikit-learn is a Python library used for classification, clustering, regression, dimensionality reduction, model selection and preprocessing. The paper discusses various classifiers from the scikit-learn library and their ensembling, which can be applied to all classification tasks. Here, classification distinguishes banana leaves with black sigatoka disease from healthy leaves. Banana plants are highly vulnerable to this disease.

Prediction of Heart Disease Using Different Classification Techniques

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.106 ◽

2017 ◽

Vol 2 (2) ◽

pp. 68-76 ◽

Cited By ~ 3

Author(s):

Sonam Nikhar ◽

A.M. Karandikar

Keyword(s):

Data Mining ◽

Heart Disease ◽

Decision Tree ◽

Bayesian Classifier ◽

Decision Tree Classifier ◽

Data Mining Technique ◽

Naive Bayesian ◽

Naïve Bayesian ◽

Tree Classifier ◽

Better Than

Data mining is an essential area of research that is increasingly popular in health organizations. Heart disease has been the leading cause of death in the world over the past 10 years. The healthcare industry gathers enormous amounts of heart disease data which are not "mined" to discover hidden information for effective decision making. This research provides a detailed description of the Naïve Bayes, decision tree and Selective Bayesian classifiers applied in the prediction of heart disease. The Naïve Bayesian classifier (NB) is known to work very well on some domains and poorly on others. Its performance suffers in domains that involve correlated features. C4.5 decision trees, on the other hand, typically perform better than the Naïve Bayesian algorithm on such domains. This paper describes a Selective Bayesian classifier (SBC) that uses only those features that C4.5 would use in its decision tree when learning from a small sample of the training set, combining the two different natures of the classifiers. Experiments conducted on the Cleveland datasets indicate that SBC performs reliably better than NB on all domains, and that SBC outperforms C4.5 on this dataset, on which C4.5 in turn outperforms NB. Experiments were also conducted to compare the performance of predictive data mining techniques on the same dataset. The results reveal that the decision tree outperforms the Bayesian classifier and that the selective Bayesian classifier has better accuracy than the other classifiers.
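The idea of a Selective Bayesian classifier can be sketched as a Naive Bayes model that only looks at a chosen subset of columns. In the abstract, that subset consists of the features a C4.5 tree would use. In the sketch below the subset is simply passed in by hand as `selected`, which is a simplifying assumption, and the two-column toy data is hypothetical rather than drawn from the Cleveland dataset.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, selected):
    """Fit a categorical Naive Bayes model using only the columns in `selected`."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (column, class) -> Counter of values
    domains = defaultdict(set)           # column -> set of observed values
    for row, y in zip(rows, labels):
        for j in selected:
            value_counts[(j, y)][row[j]] += 1
            domains[j].add(row[j])
    return class_counts, value_counts, domains, list(selected)


def predict_nb(model, row):
    """Return the class with the highest log-posterior under the model."""
    class_counts, value_counts, domains, selected = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior
        for j in selected:
            count = value_counts[(j, y)][row[j]]
            # Laplace smoothing keeps unseen values from zeroing the product
            score += math.log((count + 1) / (n_y + len(domains[j])))
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Hypothetical toy data: column 0 is informative, column 1 is noise
rows = [("yes", "a"), ("yes", "b"), ("no", "a"), ("no", "b")]
labels = ["disease", "disease", "healthy", "healthy"]

# Assume a decision tree would split only on column 0, so only it is selected
model = train_nb(rows, labels, selected=[0])
prediction = predict_nb(model, ("yes", "b"))
```

Restricting the model to selected columns is what distinguishes this from plain Naive Bayes: correlated or irrelevant features that the tree never uses no longer contribute to the posterior.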

Student Academic Performance Prediction using Supervised Learning Techniques

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v14i14.10310 ◽

2019 ◽

Vol 14 (14) ◽

pp. 92 ◽

Cited By ~ 1

Author(s):

Muhammad Imran ◽

Shahzad Latif ◽

Danish Mehmood ◽

Muhammad Saqlain Shah

Keyword(s):

Data Mining ◽

Supervised Learning ◽

Student Performance ◽

Performance Prediction ◽

Class Imbalance ◽

Ensemble Methods ◽

Fine Tuning ◽

Classification Error ◽

Decision Tree Classifier ◽

Tree Classifier

Automatic student performance prediction is a crucial job due to the large volume of data in educational databases. This job is addressed by educational data mining (EDM). EDM develops methods for exploring data derived from educational environments. These methods are used to understand students and their learning environment. Educational institutions often want to know how many students will pass or fail so that they can make the necessary arrangements. Previous studies show that many researchers focus on selecting an appropriate algorithm for classification alone and ignore the problems that arise during the data mining phases, such as high data dimensionality, class imbalance and classification error. Such problems reduce the accuracy of the model. Several well-known classification algorithms have been applied in this domain; this paper proposes a student performance prediction model based on a supervised learning decision tree classifier. In addition, an ensemble method is applied to improve the performance of the classifier. Ensemble methods are designed to solve classification and prediction problems. This study demonstrates the importance of data preprocessing and algorithm fine-tuning in resolving data quality issues. The experimental dataset comes from the Alentejo region of Portugal and was obtained from the UCI Machine Learning Repository. Three supervised learning algorithms (J48, NNge and MLP) are employed for the experiments. The results show that J48 achieved the highest accuracy, 95.78%.

Data Mining: A Bagged Decision Tree Classifier Algorithm For Ids Intrusion Detection System Based Attacks Classification

Design Engineering ◽

10.17762/de.v2021i04.1800 ◽

2021 ◽

pp. 1826-1839

Author(s):

Sandeep Adhikari, Dr. Sunita Chaudhary

Keyword(s):

Data Mining ◽

Intrusion Detection ◽

Decision Tree ◽

Intrusion Detection System ◽

Detection System ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Decision Tree Classifier ◽

Tree Classifier

The exponential growth in the use of computers over networks, as well as the proliferation of applications that operate on different platforms, has drawn attention to network security. Attackers take advantage of security flaws in all operating systems that are both technically difficult and costly to fix. As a result, intrusion is used as a key to compromise a computer resource's credibility, availability, and confidentiality. The Intrusion Detection System (IDS) is critical in detecting network anomalies and attacks. In this paper, the data mining principle is combined with IDS to efficiently and quickly identify important, secret data of interest to the user. The proposed algorithm addresses four issues: data classification, high levels of human interaction, lack of labeled data, and the effectiveness of distributed denial of service attacks. We also work on a decision tree classifier that has a variety of parameters. The previous algorithm classified IDS data correctly up to 90% of the time and was not appropriate for large data sets. Our proposed algorithm was designed to accurately classify large data sets. In addition, we quantify a few more decision tree classifier parameters.
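The abstract does not spell out the algorithm, so the following is only a generic sketch of bagging under stated assumptions: each base model is a one-level decision tree (a stump) trained on a bootstrap sample, and the ensemble predicts by majority vote. The single numeric feature (read here as something like connections per second), the class names and the parameter values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Choose the threshold and leaf labels on one numeric feature with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left, right in (("normal", "attack"), ("attack", "normal")):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Majority vote over the base models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy traffic data: low rates are normal, high rates are attacks
xs = [1, 2, 3, 4, 20, 22, 25, 30]
ys = ["normal"] * 4 + ["attack"] * 4
models = fit_bagged(xs, ys)
low_rate = predict_bagged(models, 2)
high_rate = predict_bagged(models, 28)
```

Because every base model sees a slightly different resample, individual stumps may pick different thresholds; the vote averages out that variance, which is the usual motivation for bagging tree classifiers.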

DEVELOPING A PARALLEL CLASSIFIER FOR MINING IN BIG DATA SETS

IIUM Engineering Journal ◽

10.31436/iiumej.v22i2.1541 ◽

2021 ◽

Vol 22 (2) ◽

pp. 119-134

Author(s):

Ahad Shamseen ◽

Morteza Mohammadi Zanjireh ◽

Mahdi Bahaghighat ◽

Qin Xin

Keyword(s):

Data Mining ◽

Big Data ◽

Decision Tree ◽

Main Memory ◽

Experimental Results ◽

Primary Data ◽

Data Sets ◽

Decision Tree Classifier ◽

Vast Amount ◽

Tree Classifier

Data mining is the extraction of information and its roles from a vast amount of data. It is one of the most important topics these days. Massive amounts of data are now generated and stored each day. This data contains useful information in different fields that attracts the attention of programmers and engineers. One of the primary data mining classification algorithms is the decision tree. Decision tree techniques have several advantages but also drawbacks. One of the main drawbacks is that the data must reside in main memory. SPRINT is a decision tree builder that proposed a fix for this problem. In this paper, we develop a new parallel decision tree classifier by building on SPRINT's results. Our experimental results show considerable improvements in runtime and memory requirements compared with the SPRINT classifier. The proposed classifier algorithm can be implemented in serial and parallel environments and can deal with big data.

Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

International Journal of Computer Science in Sport ◽

10.1515/ijcss-2016-0007 ◽

2016 ◽

Vol 15 (2) ◽

pp. 91-112 ◽

Cited By ~ 11

Author(s):

C. Soto Valero

Keyword(s):

Data Mining ◽

Cross Validation ◽

Data Contamination ◽

Past Data ◽

Mining Methods ◽

Using Data ◽

New Statistics ◽

Fold Cross Validation ◽

Better Than ◽

Model Approach

Abstract Baseball is a statistics-rich sport, and predicting the winner of a particular Major League Baseball (MLB) game is an interesting and challenging task. So far there is no definitive formula for determining which factors will lead a team to victory, but analysis of many years of historical records can reveal many trends. Recent studies have concentrated on using and generating new statistics, called sabermetrics, to rank teams and players according to their perceived strengths and then applying these rankings to forecast specific games. In this paper, we employ sabermetrics statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. Our model approach uses only past data when making a prediction, corresponding to ten years of publicly available data. We create a dataset with cumulative sabermetrics statistics for each MLB team during this period, for which data contamination is not possible. The inherent difficulties of this specific sports prediction are confirmed using two geometry- or topology-based measures of data complexity. Results reveal that the classification scheme forecasts game outcomes better than the regression scheme and that, of the four data mining methods used, SVMs produce the best predictive results, with a mean prediction accuracy of nearly 60% for each team. The evaluation of our model is performed using stratified 10-fold cross-validation.
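Stratified k-fold cross-validation, the evaluation scheme mentioned at the end of the abstract, splits the data into k test folds that each keep roughly the same class proportions as the whole data set. The sketch below shows one simple way to build such folds. The 60/40 win-loss labels are made up for illustration, and the round-robin dealing strategy is an assumption, not the paper's procedure.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Return k lists of test indices whose class proportions mirror the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


# Hypothetical outcome labels for 100 games
labels = ["win"] * 60 + ["loss"] * 40
folds = stratified_k_fold(labels, k=10)

# Each fold serves once as the test set; the remaining indices form the training set
first_test = set(folds[0])
first_train = [i for i in range(len(labels)) if i not in first_test]
```

With these labels every fold holds six wins and four losses, so each of the ten accuracy estimates is computed on a test set with the same class balance as the full sample.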

Applying particle swarm optimization-based decision tree classifier for wart treatment selection

Complex & Intelligent Systems ◽

10.1007/s40747-021-00348-3 ◽

2021 ◽

Author(s):

Junhua Hu ◽

Xiangzhu Ou ◽

Pei Liang ◽

Bo Li

Keyword(s):

Decision Tree ◽

Particle Swarm ◽

Classification And Regression Tree ◽

Particle Swarm Algorithm ◽

Decision Tree Classifier ◽

Tree Model ◽

Proposed Model ◽

Tree Classifier ◽

Cart Algorithm ◽

Better Than

Abstract Wart is a disease caused by human papillomavirus, with common and plantar warts as its general forms. Commonly used methods to treat warts are immunotherapy and cryotherapy. Selecting the proper treatment is vital to curing warts. This paper establishes a classification and regression tree (CART) model based on particle swarm optimisation to help patients choose between immunotherapy and cryotherapy. The proposed model can accurately predict the response of patients to the two methods. By using an improved particle swarm optimisation (PSO) algorithm to optimise the parameters of the model instead of the traditional pruning algorithm, a more concise and more accurate model is obtained. Two experiments are conducted to verify the feasibility of the proposed model. On the one hand, five benchmarks are used to verify the performance of the improved PSO algorithm. On the other hand, an experiment on two wart datasets is conducted. Results show that the proposed model is effective. The proposed method classifies better than k-nearest neighbour, C4.5 and logistic regression. It also performs better than the conventional optimisation method for the CART algorithm. Moreover, the decision tree model established in this study is interpretable and understandable. Therefore, the proposed model can help patients and doctors reduce medical costs and improve the quality of treatment.
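For orientation, here is a minimal sketch of a plain particle swarm optimiser. It is the textbook inertia-weight variant, not the improved PSO of the paper, and it minimises a stand-in benchmark (the sphere function) rather than the validation error of a CART model. The parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are common defaults chosen for illustration.

```python
import random


def pso(objective, dim, bounds, n_particles=20, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box using a basic inertia-weight PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the coordinate to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Stand-in benchmark objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

To tune a tree in the spirit of the abstract, `objective` would instead map a parameter vector (for example a depth limit and a minimum leaf size) to a cross-validated error; that mapping is left out here because the paper's exact parameterisation is not given in the abstract.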

Random Forest: A Hybrid Implementation for Sarcasm Detection in Public Opinion Mining

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3758.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 5022-5025

Keyword(s):

Decision Making ◽

Public Opinion ◽

Random Forest ◽

Decision Tree ◽

Opinion Mining ◽

Random Forest Classifier ◽

Decision Tree Classifier ◽

Wrong Decision ◽

Tree Classifier ◽

Better Than

Modelling sentiment with context is one of the most important parts of sentiment analysis. Various classifiers help in detecting and classifying it. Taking sarcasm into account makes sentiment detection more accurate. However, detecting sarcasm in people's reviews is a challenging task, and undetected sarcasm may lead to wrong decision making or classification. This paper uses Decision Tree and Random Forest classifiers and compares the performance of both. Here the random forest is considered a hybrid decision tree classifier. We propose, with appropriate reasoning, that the random forest classifier performs better than any other normal decision tree classifier.

Using T3, an Improved Decision Tree Classifier, for Mining Stroke-related Medical Data

Methods of Information in Medicine ◽

10.1160/me0317 ◽

2007 ◽

Vol 46 (05) ◽

pp. 523-529 ◽

Cited By ~ 8

Author(s):

M. Saraee ◽

B. Theodoulidis ◽

J. A. Keane ◽

C. Tjortjis

Keyword(s):

Data Mining ◽

Decision Tree ◽

Predictive Models ◽

Medical Data ◽

Classification Algorithm ◽

Medical Decision ◽

Classification Error ◽

Decision Tree Classifier ◽

Data Set ◽

Tree Classifier

Summary Objectives: Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well-established data mining methods and identify strengths and weaknesses. Method: We propose T3, a decision tree classifier which builds predictive models based on known classes by allowing for a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets, in order to identify strengths and weaknesses. We also compare performance with a very successful and well-established decision tree classifier. Results: T3 demonstrated impressive performance when predicting unseen cases of stroke, resulting in as little as 0.4% classification error, while the state-of-the-art decision tree classifier resulted in 33.6% classification error. Conclusions: This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three and achieves high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.
