Large-scale data analysis on aviation accident database using different data mining techniques

2016 ◽  
Vol 120 (1234) ◽  
pp. 1849-1866 ◽  
Author(s):  
A.B. Arockia Christopher ◽  
V. Shunmughavel Vivekanandam ◽  
A.B. Antony Anderson ◽  
S. Markkandeyan ◽  
V. Sivakumar

ABSTRACTData mining is an iterative process in which progress is defined by discovery through either automatic or manual methods. A data cleaning procedure is proposed to improve the quality of classification tasks in the knowledge discovery process by taking into account both redundant and conflicting data. The redundancy check is performed on the original dataset and the resultant dataset is preserved. This resultant dataset is then checked for conflicting data and, if any are found, they are corrected and updated on the original aircraft dataset. This updated dataset is then classified using a variety of classifiers such as Bayes, functions, lazy, MISC, rules and decision trees. The performance of the updated datasets on these classifiers is examine, and the result shows a significant improvement in the classification accuracy after redundancy and conflicts are removed. The conflicts after correction are updated in the original dataset, and when the performance of the classifier is evaluated, great improvement is observed. This paper aims to address how data mining techniques can be used to understand complex system accidents in the aviation domain. Decision trees are considered to be the one of the most powerful and popular approaches in knowledge discovery and data mining. The objective is to develop a classification model for aviation risk investigation and reduction using a decision tree induction method that enhances the ability to form decision trees and thereby proves that the classification accuracy of decision trees is greater. Different feature selectors are used in this study in order to reduce the number of initial attributes.

2021 ◽  
pp. 097215092098485
Author(s):  
Sonika Gupta ◽  
Sushil Kumar Mehta

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.


Author(s):  
Shadi Aljawarneh ◽  
Aurea Anguera ◽  
John William Atwood ◽  
Juan A. Lara ◽  
David Lizcano

AbstractNowadays, large amounts of data are generated in the medical domain. Various physiological signals generated from different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires the use of specific approaches such as the Knowledge Discovery in Databases process. The application of such process in the domain of medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors applying data mining techniques to real medical patient data including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purpose obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper that intends to be a guide for experts who have to face similar medical data mining projects.


2014 ◽  
Vol 5 (3) ◽  
pp. 11-28
Author(s):  
Ljiljana Kašćelan ◽  
Vladimir Kašćelan ◽  
Milijana Novović-Burić

This paper has proposed a data mining approach for risk assessment in car insurance. Standard methods imply classification of policies to great number of tariff classes and assessment of risk on basis of them. With application of data mining techniques, it is possible to get functional dependencies between the level of risk and risk factors as well as better results in predictions. On the case study data it has been proved that data mining techniques can, with better accuracy than the standard methods, predict claim sizes and occurrence of claims, and this represents the basis for calculation of net risk premium and risk classification. This paper, also, discusses advantages of data mining methods compared to standard methods for risk assessment in car insurance, as well as the specificities of the obtained results due to small insurance market, such is the one in Montenegro.


2018 ◽  
Vol 7 (2.4) ◽  
pp. 10
Author(s):  
V Mala ◽  
K Meena

Traditional signature based approach fails in detecting advanced malwares like stuxnet, flame, duqu etc. Signature based comparison and correlation are not up to the mark in detecting such attacks. Hence, there is crucial to detect these kinds of attacks as early as possible. In this research, a novel data mining based approach were applied to detect such attacks. The main innovation lies on Misuse signature detection systems based on supervised learning algorithm. In learning phase, labeled examples of network packets systems calls are (gave) provided, on or after which algorithm can learn about the attack which is fast and reliable to known. In order to detect advanced attacks, unsupervised learning methodologies were employed to detect the presence of zero day/ new attacks. The main objective is to review, different intruder detection methods. To study the role of Data Mining techniques used in intruder detection system. Hybrid –classification model is utilized to detect advanced attacks.


The improvement of an information processing and Memory capacity, the vast amount of data is collected for various data analyses purposes. Data mining techniques are used to get knowledgeable information. The process of extraction of data by using data mining techniques the data get discovered publically and this leads to breaches of specific privacy data. Privacypreserving data mining is used to provide to protection of sensitive information from unwanted or unsanctioned disclosure. In this paper, we analysis the problem of discovering similarity checks for functional dependencies from a given dataset such that application of algorithm (l, d) inference with generalization can anonymised the micro data without loss in utility. [8] This work has presented Functional dependency based perturbation approach which hides sensitive information from the user, by applying (l, d) inference model on the dependency attributes based on Information Gain. This approach works on both categorical and numerical attributes. The perturbed data set does not affects the original dataset it maintains the same or very comparable patterns as the original data set. Hence the utility of the application is always high, when compared to other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using tools, data mining classification algorithm.


2019 ◽  
Vol 3 (2) ◽  
pp. 10
Author(s):  
Ardalan Husin Awlla

In this period of computerization, schooling has additionally remodeled itself and is not restrained to old lecture technique. The everyday quest is on to discover better approaches to make it more successful and productive for students. These days, masses of data are gathered in educational databases, however it stays unutilized. To be able to get required advantages from such major information, effective tools are required. Data mining is a developing capable tool for examination and expectation. It is effectively applied in the field of fraud detection, marketing, promoting, forecast and loan assessment. However, it is in incipient stage in the area of education. In this paper, data mining techniques have been applied to construct a classification model to predict the performance of students.


Author(s):  
Feyza Gürbüz ◽  
Fatma Gökçe Önen

The previous decades have witnessed major change within the Information Systems (IS) environment with a corresponding emphasis on the importance of specifying timely and accurate information strategies. Currently, there is an increasing interest in data mining and information systems optimization. Therefore, it makes data mining for optimization of information systems a new and growing research community. This chapter surveys the application of data mining to optimization of information systems. These systems have different data sources and accordingly different objectives for knowledge discovery. After the preprocessing stage, data mining techniques can be applied on the suitable data for the objective of the information systems. These techniques are prediction, classification, association rule mining, statistics and visualization, clustering and outlier detection.


Big Data ◽  
2016 ◽  
pp. 2028-2046
Author(s):  
Ljiljana Kašćelan ◽  
Vladimir Kašćelan ◽  
Milijana Novović-Burić

This paper has proposed a data mining approach for risk assessment in car insurance. Standard methods imply classification of policies to great number of tariff classes and assessment of risk on basis of them. With application of data mining techniques, it is possible to get functional dependencies between the level of risk and risk factors as well as better results in predictions. On the case study data it has been proved that data mining techniques can, with better accuracy than the standard methods, predict claim sizes and occurrence of claims, and this represents the basis for calculation of net risk premium and risk classification. This paper, also, discusses advantages of data mining methods compared to standard methods for risk assessment in car insurance, as well as the specificities of the obtained results due to small insurance market, such is the one in Montenegro.


2019 ◽  
Vol 123 (1267) ◽  
pp. 1415-1436 ◽  
Author(s):  
A. B. A. Anderson ◽  
A. J. Sanjeev Kumar ◽  
A. B. Arockia Christopher

ABSTRACTData mining is a process of finding correlations and collecting and analysing a huge amount of data in a database to discover patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are desired for analysing the performance in which micro-level causes propagate to make system-level patterns of delay. Analysing flight delays is very difficult – both when looking from a historical view as well as when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircrafts. The performance of different data mining methods is found in the different regions of the updated datasets on these classifiers. Finally, the result shows a significant variation in the performance of different data mining methods and feature selection for this problem. This paper aims to deal with how data mining techniques can be used to understand difficult aircraft system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. The different feature selectors are used in this study in order to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects happen from subsystem-level causes.


2018 ◽  
Vol 5 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Sharmila Subudhi ◽  
Suvasini Panigrahi

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. Initially, the most relevant attributes are chosen from the original dataset by using an evolutionary algorithm based feature selection method. A test set is then extracted from the selected attribute set and the remaining dataset is subjected to the Possibilistic Fuzzy C-Means (PFCM) clustering technique for the undersampling approach. The 10-fold cross validation method is then used on the balanced dataset for training and validating a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the test set is applied on the best performing model for classification purpose. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world automobile insurance defraud dataset. Besides, a comparative analysis with another approach justifies the superiority of the proposed system.


Sign in / Sign up

Export Citation Format

Share Document