The Impact of Feature Selection on Meta-Heuristic Algorithms to Data Mining Methods

ABSTRACT Data mining is the process of collecting and analysing large amounts of data in a database to discover correlations, patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are desired for analysing how micro-level causes propagate to make system-level patterns of delay. Analysing flight delays is very difficult, both when looking from a historical view and when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircraft. The performance of the different data mining methods is evaluated on different regions of the updated datasets. The results show significant variation in performance across data mining methods and feature selection approaches for this problem. This paper aims to show how data mining techniques can be used to understand complex system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. Different feature selectors are used in this study to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects arise from subsystem-level causes.

Identifying Metabolite Biomarkers in Unstable Angina In-Patients by Feature Selection Based Data Mining Methods

2010 Second International Conference on Computer Modeling and Simulation ◽

10.1109/iccms.2010.335 ◽

2010 ◽

Cited By ~ 1

Author(s):

Huihui Zhao ◽

Jianxin Chen ◽

Na Hou ◽

Chenglong Zheng ◽

Wei Wang

Keyword(s):

Data Mining ◽

Feature Selection ◽

Unstable Angina ◽

Mining Methods

Use of Particle Swarm Optimization for Feature Selection and Data Mining Methods for Efficient Detection of Automobile Insurance Fraud

2018 International Conference on Recent Innovations in Electrical, Electronics & Communication Engineering (ICRIEECE) ◽

10.1109/icrieece44171.2018.9009411 ◽

2018 ◽

Author(s):

Anmol Pattanaik ◽

Suvasini Panigrahi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Particle Swarm Optimization ◽

Particle Swarm ◽

Automobile Insurance ◽

Insurance Fraud ◽

Swarm Optimization ◽

Efficient Detection ◽

Mining Methods

THE IMPACT OF FEATURE SELECTION: A DATA-MINING APPLICATION IN DIRECT MARKETING

Intelligent Systems in Accounting Finance & Management ◽

10.1002/isaf.1335 ◽

2013 ◽

Vol 20 (1) ◽

pp. 23-38 ◽

Cited By ~ 4

Author(s):

Ding-Wen Tan ◽

William Yeoh ◽

Yee Ling Boo ◽

Soung-Yue Liew

Keyword(s):

Data Mining ◽

Feature Selection ◽

Direct Marketing ◽

The Impact ◽

Data Mining Application

Application of the Data Mining Methods to Assess the Impact of Meteorological Conditions on the Episodes of High Concentrations of PM10 along the Polish – Czech Border

E3S Web of Conferences ◽

10.1051/e3sconf/20182801027 ◽

2018 ◽

Vol 28 ◽

pp. 01027

Author(s):

Leszek Ośródka ◽

Ewa Krajny ◽

Marek Wojtylak

Keyword(s):

Data Mining ◽

Air Pollution ◽

Particulate Matter ◽

Concentration Level ◽

Meteorological Data ◽

Meteorological Conditions ◽

Mining Methods ◽

High Concentrations ◽

The Impact ◽

Derivatives Of

The paper presents an attempt to use selected data mining methods to determine the influence of a complex of meteorological conditions on the concentrations of PM10 (PM2.5), using the regions of Silesia and Northern Moravia as an example. The collection of standard meteorological data has been supplemented by increments and derivatives of measurable weather elements such as the vertical pseudo-gradient of air temperature. The main objective was to develop a universal methodology for the assessment of these impacts, i.e. one that would be independent of the analysed pollution. The probability of occurrence (at a given location) of the assumed concentration level exceeding the value of the specified distributional quintile was adopted as the discriminant of the incidence. As a result of the analyses conducted, incidences of elevated concentrations of particulate matter PM10 have been identified and the types of weather responsible for the emergence of such situations have also been determined.
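
The exceedance criterion described in this abstract can be illustrated with a minimal sketch. It assumes that the threshold is an empirical quantile of the measured concentrations and that the probability is estimated as a simple frequency; the function name, the choice of the fourth quintile cut point and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import quantiles


def exceedance_probability(concentrations, n_quantiles=5, which=4):
    """Flag values above an empirical quantile and estimate how often that happens.

    n_quantiles=5 and which=4 select the upper quintile cut point
    (an assumption made for illustration only).
    """
    cuts = quantiles(concentrations, n=n_quantiles, method="inclusive")
    threshold = cuts[which - 1]
    flags = [value > threshold for value in concentrations]
    return threshold, flags, sum(flags) / len(flags)


# Hypothetical daily PM10 concentrations at one location.
pm10 = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
threshold, flags, probability = exceedance_probability(pm10)
```

In a study of this kind the flags would presumably serve as the target variable that the meteorological predictors are asked to explain.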

Data mining to estimate broiler mortality when exposed to heat wave

Scientia Agricola ◽

10.1590/s0103-90162008000300001 ◽

2008 ◽

Vol 65 (3) ◽

pp. 223-229 ◽

Cited By ~ 13

Author(s):

Marcos Martinez Vale ◽

Daniella Jorge de Moura ◽

Irenilza de Alencar Nääs ◽

Stanley Robson de Medeiros Oliveira ◽

Luiz Henrique Antunes Rodrigues

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heat Wave ◽

High Mortality ◽

Animal Production ◽

Environmental Data ◽

Data Mining Techniques ◽

Broiler Production ◽

The Impact ◽

Wave Incidence

Heat waves usually result in losses of animal production, since the animals are exposed to thermal stress that increases mortality and causes consequent economic losses. Animal science and meteorological databases from recent years contain enough data in the poultry production business to allow the modeling of mortality losses due to heat wave incidence. This research analyzes a database of broiler production associated with climatic data, using data mining techniques such as attribute selection and data classification (decision tree) to model the impact of heat wave incidence on broiler mortality. The temperature and humidity index (THI) was used for screening environmental data. The data mining techniques allowed the development of three comprehensible models for estimating specifically high mortality during broiler production. Two models yielded a classification accuracy of 89.3% by using Principal Component Analysis (PCA) and Wrapper feature selection approaches. Both models obtained a class precision of 0.83 for classifying high mortality. When the feature selection was made by the domain experts, the model accuracy reached 85.7%, while the class precision of high mortality was 0.76. Meteorological data and the calculated THI from meteorological stations were helpful to select the range of harmful environmental conditions for broilers 29 and 42 days old. The data mining techniques were useful for building animal production models.
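
The two figures reported for each model, overall accuracy and class precision for high mortality, measure different things. The sketch below shows how both are usually computed from true and predicted labels. The function name and the label values are hypothetical; they are not the study's data.

```python
def accuracy_and_class_precision(y_true, y_pred, positive):
    """Return (accuracy, precision for the `positive` class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_positive = sum(t == positive and p == positive
                        for t, p in zip(y_true, y_pred))
    predicted_positive = sum(p == positive for p in y_pred)
    accuracy = correct / len(y_true)
    # Precision is undefined when the class is never predicted; report 0.0 then.
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    return accuracy, precision


# Hypothetical labels for six flocks.
y_true = ["high", "high", "normal", "normal", "high", "normal"]
y_pred = ["high", "normal", "normal", "high", "high", "normal"]
accuracy, precision = accuracy_and_class_precision(y_true, y_pred, positive="high")
```

Accuracy counts every correct prediction, while class precision only asks how many of the flocks predicted as high mortality really were, which is why the two numbers can differ for the same model.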

Classification Ratemaking Using Decision Tree in the Insurance Market of Bosnia and Herzegovina

South East European Journal of Economics and Business ◽

10.2478/jeb-2020-0020 ◽

2020 ◽

Vol 15 (2) ◽

pp. 124-139

Author(s):

Amela Omerašević ◽

Jasmina Selimović

Keyword(s):

Data Mining ◽

Bosnia And Herzegovina ◽

Linear Models ◽

Risk Model ◽

Parameter Estimates ◽

East European ◽

Reliable Parameter ◽

Mining Methods ◽

Claims Frequency ◽

The Impact

Abstract This paper investigates the impact of risk classification on life insurance ratemaking with particular reference to Bosnia and Herzegovina (BiH). The research is based on a sample of over eighteen thousand insurance policies for passenger vehicles collected over the period 2015-2020. In our empirical investigation we develop a standard risk model based on the application of Poisson generalized linear models (GLM) for claims frequency estimation and Gamma GLM for claim severity estimation. The analysis reveals that GLM does not provide reliable parameter estimates for multi-level factor (MLF) categorical predictors. Although GLM is a widely used method to determine insurance premiums, improvements of GLM by using the data mining methods identified in this paper may solve practical challenges for the risk models. The popularity of applying data mining methods in the actuarial community has been growing in recent years due to their efficiency and precision. These models are recommended for consideration in BiH and the South East European region in general.
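
A frequency-severity risk model of the kind described here is commonly combined into a pure premium by multiplying expected claim frequency by expected claim severity. The sketch below assumes log links for both GLMs, which the abstract does not state, and uses made-up coefficients and a made-up risk factor name. It shows only the prediction step, not the model fitting.

```python
import math


def pure_premium(risk_factors, freq_coef, sev_coef):
    """Combine a Poisson frequency model and a Gamma severity model.

    Both models are assumed to use a log link, so each prediction is
    exp(intercept + sum of coefficient * factor value).
    """
    def predict(coef):
        linear = coef["intercept"] + sum(
            coef.get(name, 0.0) * value for name, value in risk_factors.items()
        )
        return math.exp(linear)

    frequency = predict(freq_coef)   # expected claims per policy-year
    severity = predict(sev_coef)     # expected cost per claim
    return frequency, severity, frequency * severity


# Hypothetical coefficients: base frequency 0.1, young drivers 1.5 times higher,
# base severity 1000 currency units.
freq_coef = {"intercept": math.log(0.1), "young_driver": math.log(1.5)}
sev_coef = {"intercept": math.log(1000.0)}
frequency, severity, premium = pure_premium({"young_driver": 1}, freq_coef, sev_coef)
```

With a log link each coefficient acts as a multiplicative relativity, so a categorical predictor with many levels needs one coefficient per level. That is one plausible reading of why multi-level factors are hard to estimate reliably when some levels have few policies.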

An Optimization of Feature Selection for Classification using Bat Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f5331.039621 ◽

2021 ◽

Vol 9 (6) ◽

pp. 39-43

Author(s):

Yasaswini V. ◽

Santhi Baskaran

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heuristic Algorithms ◽

Harmony Search ◽

Medical Engineering ◽

Bat Algorithm ◽

Cuckoo Search ◽

Vital Role ◽

Target Class ◽

Data Set

Data mining is the action of searching a large existing database in order to obtain new and useful information. It plays a vital role nowadays in all sorts of fields such as Medical, Engineering, Banking, Education and Fraud detection. In this paper, feature selection, which is a part of data mining, is performed to do classification. The role of feature selection is in the context of deep learning and how it is related to feature engineering. Feature selection is a preprocessing technique which selects the appropriate features from the data set to get an accurate classification result. Nature-inspired optimization algorithms like Ant Colony, Firefly, Cuckoo Search and Harmony Search showed better performance, giving the best accuracy rate with fewer features selected and a good F-measure value. These algorithms are used to perform classification that accurately predicts the target class for each case in the data set. We propose a technique to optimize feature selection for classification using meta-heuristic algorithms. We applied a recent optimization algorithm, the Bat algorithm, to UCI datasets; it showed results comparable to the best-performing existing algorithm, Firefly, but with fewer features selected. The work is implemented in Java using medical datasets from UCI. These datasets were chosen because of their nominal class features. The number of attributes, instances and classes varies across the chosen datasets to represent different combinations. Classification is done using the J48 classifier in the WEKA tool. We present a thorough comparison of the results of the presently used algorithms with those of the existing algorithms.
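
To make the idea of bat-based feature selection concrete, here is a minimal sketch of one common binary variant of the Bat algorithm, which uses a V-shaped transfer function to decide bit flips. It is not the authors' Java implementation. The toy fitness function with hypothetical relevance scores stands in for the classifier accuracy that the paper obtains from J48, and all parameter defaults are assumptions.

```python
import math
import random


def binary_bat_select(fitness, n_features, n_bats=10, n_iter=50,
                      f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Search for a 0/1 feature mask that maximises `fitness(mask)`."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    fit = [fitness(p) for p in pos]
    loud = [1.0] * n_bats                        # loudness A_i
    rate0 = [rng.random() for _ in range(n_bats)]
    rate = [0.0] * n_bats                        # pulse emission rate r_i
    lead = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[lead][:], fit[lead]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = pos[i][:]
            for j in range(n_features):
                vel[i][j] += (pos[i][j] - best[j]) * freq
                # V-shaped transfer: larger |velocity| means a likelier bit flip.
                flip = abs(2.0 / math.pi * math.atan(math.pi / 2.0 * vel[i][j]))
                if rng.random() < flip:
                    cand[j] = 1 - cand[j]
            if rng.random() > rate[i]:
                # Local search: flip one random bit of the best mask so far.
                cand = best[:]
                j = rng.randrange(n_features)
                cand[j] = 1 - cand[j]
            cand_fit = fitness(cand)
            if cand_fit > fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = rate0[i] * (1.0 - math.exp(-gamma * t))
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


# Hypothetical per-feature relevance scores; each selected feature costs 0.3.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask):
    return sum(r for r, m in zip(RELEVANCE, mask) if m) - 0.3 * sum(mask)


best_mask, best_score = binary_bat_select(toy_fitness, n_features=len(RELEVANCE))
```

In a wrapper setting such as the one described above, `toy_fitness` would be replaced by a function that trains a classifier on the selected columns and returns its validation accuracy, possibly with a penalty on the number of features.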

An Empirical Study on Initializing Centroid in K-Means Clustering for Feature Selection

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2021010101 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-16

Author(s):

Amit Saxena ◽

John Wang ◽

Wutiphol Sintunavarat

Keyword(s):

Data Mining ◽

Feature Selection ◽

Empirical Study ◽

Data Sets ◽

Data Set ◽

Unsupervised Feature Selection ◽

The Impact

One of the main problems in K-means clustering is the setting of initial centroids, which can cause misclustering of patterns and thus reduce clustering accuracy. Recently, a density and distance-based technique for determining initial centroids has claimed faster convergence of clusters. Motivated by this key idea, the authors study the impact of initial centroids on clustering accuracy for unsupervised feature selection. Three metrics are used to rank the features of a data set. The centroids of the clusters in the data sets, to be applied in K-means clustering, are initialized randomly as well as by density and distance-based approaches. Extensive experiments are performed on 15 datasets. The main significance of the paper is that K-means clustering yields higher accuracies in the majority of these datasets using the proposed density and distance-based approach. As an impact of the paper, good clustering accuracy can be achieved with fewer features, which can be useful in data mining of data sets with thousands of features.
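
The abstract does not spell out the density and distance-based initialization, so the following is only a sketch of one plausible reading. It assumes density is the number of neighbours within a fixed radius, the first centroid is the densest point, and each further centroid maximises density times distance to the nearest centroid already chosen. The function names, the radius and the sample points are hypothetical.

```python
import math


def density_distance_init(points, k, radius):
    """Pick k initial centroids that are both dense and far apart."""
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    chosen = [max(range(len(points)), key=density.__getitem__)]
    while len(chosen) < k:
        def score(i):
            nearest = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * nearest
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the given centroids."""
    labels = []
    for _ in range(n_iter):
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])   # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
initial = density_distance_init(points, k=2, radius=1.5)
centroids, labels = kmeans(points, initial)
```

Because this initialization is deterministic, repeated runs give the same clustering, which makes it easier to attribute accuracy differences to the selected features and not to random starts.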

An Optimization of Feature Selection for Classification using Modified Bat Algorithm

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2021.04.04 ◽

2021 ◽

Vol 13 (4) ◽

pp. 38-46

Author(s):

V. Yasaswini ◽

Santhi Baskaran

Keyword(s):

Data Mining ◽

Feature Selection ◽

Heuristic Algorithms ◽

Predictive Accuracy ◽

Harmony Search ◽

Medical Engineering ◽

Bat Algorithm ◽

Target Class ◽

Data Set ◽

Modified Bat Algorithm

Data mining is the action of searching a large existing database in order to obtain new and useful information. It plays a vital role nowadays in all sorts of fields such as Medical, Engineering, Banking, Education and Fraud detection. In this paper, feature selection, which is a part of data mining, is performed to do classification. The role of feature selection is in the context of deep learning and how it is related to feature engineering. Feature selection is a preprocessing technique which selects the appropriate features from the data set to get an accurate classification result. Nature-inspired optimization algorithms like Ant Colony, Firefly, Cuckoo Search and Harmony Search showed better performance, giving the best accuracy rate with fewer features selected and a good F-measure value. These algorithms are used to perform classification that accurately predicts the target class for each case in the data set. We propose a technique to optimize feature selection for classification using meta-heuristic algorithms. We applied a recent optimization algorithm, the Modified Bat algorithm, to University of California Irvine datasets; it showed results comparable to the best-performing existing algorithm, Firefly, but with fewer features selected. The work is implemented in Java using medical datasets. These datasets were chosen because of their nominal class features. The number of attributes, instances and classes varies across the chosen datasets to represent different combinations. Classification is done using the J48 classifier in the WEKA tool. We present a thorough comparison of the results of the presently used algorithms with those of the existing algorithms.
The significance of this research is that it supports selecting the best features out of all the existing features, which gives the best accuracy rates and helps in extracting information from raw data in the data mining domain. The value of this research is that it can serve major fields such as medicine and banking by giving exact and proper results in each field. Its main strength is that it optimizes the selection of features to achieve maximum predictive accuracy on the data sets, solving both single-variable and multi-variable functions through the generation of a binary structuring of features in the dataset, and that it increases classification performance by using nature-inspired and meta-heuristic algorithms.
