The Impact of Feature Selection on Meta-Heuristic Algorithms to Data Mining Methods

Author(s):  
Maysam Toghraee ◽  
Hamid Parvin ◽  
Farhad Rad
2019 ◽  
Vol 123 (1267) ◽  
pp. 1415-1436 ◽  
Author(s):  
A. B. A. Anderson ◽  
A. J. Sanjeev Kumar ◽  
A. B. Arockia Christopher

ABSTRACT Data mining is the process of collecting and analysing large amounts of data in a database to discover correlations, patterns or relationships. Flight delay creates significant problems in the present aviation system. Data mining techniques are needed to analyse how micro-level causes propagate into system-level patterns of delay. Analysing flight delays is very difficult, both from a historical view and when estimating delays with forecast demand. This paper proposes using Decision Tree (DT), Support Vector Machine (SVM), Naive Bayesian (NB), K-nearest neighbour (KNN) and Artificial Neural Network (ANN) to study and analyse delays among aircraft. The performance of these classifiers is evaluated on different regions of the updated datasets. The results show significant variation in performance across the data mining methods and feature selection approaches for this problem. This paper aims to show how data mining techniques can be used to understand complex system delays in aviation. Our aim is to develop a classification model for studying and reducing delay using different data mining methods and, in this manner, to show that DT has a greater classification accuracy. Different feature selectors are used in this study to reduce the number of initial attributes. Our results clearly demonstrate the value of DT for analysing and visualising how system-level effects arise from subsystem-level causes.
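The abstract does not say which feature selectors were used. As a minimal sketch of one common filter-style selector, the following ranks nominal attributes by information gain, the same entropy-based criterion many decision tree learners use for splits. The function names and the attribute names in the usage note are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one nominal attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_attributes(rows, labels, k):
    """Keep the k attributes with the highest information gain."""
    attrs = list(rows[0].keys())
    return sorted(attrs, key=lambda a: information_gain(rows, labels, a), reverse=True)[:k]
```

With toy records such as `{"weather": "storm", "carrier": "A"}` labelled `"delayed"` or `"on_time"`, `top_k_attributes(rows, labels, 1)` returns the single attribute that best separates the two classes. The reduced attribute set would then be passed to whichever classifier is being compared.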


2013 ◽  
Vol 20 (1) ◽  
pp. 23-38 ◽  
Author(s):  
Ding-Wen Tan ◽  
William Yeoh ◽  
Yee Ling Boo ◽  
Soung-Yue Liew

2018 ◽  
Vol 28 ◽  
pp. 01027
Author(s):  
Leszek Ośródka ◽  
Ewa Krajny ◽  
Marek Wojtylak

The paper presents an attempt to use selected data mining methods to determine the influence of a complex of meteorological conditions on concentrations of PM10 (PM2.5), using the regions of Silesia and Northern Moravia as an example. The collection of standard meteorological data has been supplemented by increments and derivatives of measurable weather elements, such as the vertical pseudo-gradient of air temperature. The main objective was to develop a universal methodology for the assessment of these impacts, i.e. one that would be independent of the analysed pollution. The probability of occurrence (at a given location) of a concentration level exceeding the value of the specified distributional quintile was adopted as the discriminant of the incidence. As a result of the analyses conducted, incidences of elevated concentrations of particulate matter PM10 have been identified, and the types of weather responsible for the emergence of such situations have been determined.
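The exceedance discriminant described above can be sketched as follows. This is an illustration only: the choice of the top quintile cut point, the "inclusive" quantile method, the function names and the idea of grouping by a weather-type label are all assumptions, not details given in the abstract.

```python
from statistics import quantiles

def exceedance_flags(concentrations, n=5, k=4):
    """Return the k-th of the n-1 cut points and a flag per value above it.

    The defaults (n=5, k=4) flag values above the upper quintile boundary.
    """
    cuts = quantiles(concentrations, n=n, method="inclusive")
    threshold = cuts[k - 1]
    return threshold, [c > threshold for c in concentrations]

def exceedance_probability(flags, weather_types):
    """Empirical probability of an exceedance for each weather type."""
    counts = {}
    for flag, weather in zip(flags, weather_types):
        hits, total = counts.get(weather, (0, 0))
        counts[weather] = (hits + int(flag), total + 1)
    return {w: hits / total for w, (hits, total) in counts.items()}
```

Weather types with a high empirical exceedance probability would be the candidates for "types of weather responsible" for elevated concentrations.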


2008 ◽  
Vol 65 (3) ◽  
pp. 223-229 ◽  
Author(s):  
Marcos Martinez Vale ◽  
Daniella Jorge de Moura ◽  
Irenilza de Alencar Nääs ◽  
Stanley Robson de Medeiros Oliveira ◽  
Luiz Henrique Antunes Rodrigues

Heat waves usually cause losses in animal production because the animals are exposed to thermal stress, which increases mortality and leads to economic losses. Animal science and meteorological databases from recent years contain enough data on the poultry production business to allow the modeling of mortality losses due to heat wave incidence. This research analyzes a database of broiler production associated with climatic data, using data mining techniques such as attribute selection and data classification (decision tree) to model the impact of heat wave incidence on broiler mortality. The temperature and humidity index (THI) was used for screening environmental data. The data mining techniques allowed the development of three comprehensible models for estimating specifically high mortality during broiler production. Two models yielded a classification accuracy of 89.3% by using Principal Component Analysis (PCA) and Wrapper feature selection approaches. Both models obtained a class precision of 0.83 for classifying high mortality. When the feature selection was made by the domain experts, the model accuracy reached 85.7%, while the class precision of high mortality was 0.76. Meteorological data and the calculated THI from meteorological stations were helpful to select the range of harmful environmental conditions for broilers 29 and 42 days old. The data mining techniques were useful for building animal production models.
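Accuracy and class precision measure different things, which is why the abstract reports both. A minimal sketch of the two metrics for a binary "high mortality" classifier is given below; the confusion-matrix counts are hypothetical and are not the paper's data.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def class_precision(tp, fp):
    """Share of instances predicted as the class that truly belong to it."""
    return tp / (tp + fp)

# Hypothetical counts for the "high mortality" class (illustration only).
tp, fp, fn, tn = 8, 2, 1, 9
print(f"accuracy  = {accuracy(tp, fp, fn, tn):.3f}")   # accuracy  = 0.850
print(f"precision = {class_precision(tp, fp):.3f}")    # precision = 0.800
```

A model can therefore have high overall accuracy while still raising a noticeable share of false "high mortality" alarms, which is what the class precision captures.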


2020 ◽  
Vol 15 (2) ◽  
pp. 124-139
Author(s):  
Amela Omerašević ◽  
Jasmina Selimović

Abstract This paper investigates the impact of risk classification on life insurance ratemaking with particular reference to Bosnia and Herzegovina (BiH). The research is based on a sample of over eighteen thousand insurance policies for passenger vehicles collected over the period 2015-2020. In our empirical investigation we develop a standard risk model based on Poisson Generalized Linear Models (GLM) for estimating claim frequency and Gamma GLM for estimating claim severity. The analysis reveals that GLM does not provide reliable parameter estimates for Multi-level factor (MLF) categorical predictors. Although GLM is a widely used method to determine insurance premiums, improving GLM with the data mining methods identified in this paper may solve practical challenges for the risk models. The popularity of data mining methods in the actuarial community has been growing in recent years due to their efficiency and precision. These models are recommended for consideration in BiH and in the South East European region in general.
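For readers unfamiliar with the frequency-severity setup, the standard form of such a risk model can be written as below. The log links, the exposure term $e_i$, and the use of the same rating-factor vector $\mathbf{x}_i$ in both models are assumptions made for illustration; the abstract does not state them.

```latex
\begin{align}
N_i &\sim \mathrm{Poisson}(e_i \lambda_i), & \log \lambda_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta}, \\
Y_i &\sim \mathrm{Gamma}(\text{mean } \mu_i), & \log \mu_i &= \mathbf{x}_i^{\top}\boldsymbol{\gamma}, \\
\text{pure premium}_i &= \lambda_i \mu_i = \exp\!\big(\mathbf{x}_i^{\top}(\boldsymbol{\beta} + \boldsymbol{\gamma})\big).
\end{align}
```

Here $N_i$ is the claim count and $Y_i$ the average claim size for policy $i$. Under these assumptions each level of a categorical predictor receives its own coefficient, so a multi-level factor with many sparsely populated levels yields many coefficients estimated from few observations, which is consistent with the reliability problem the abstract reports.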


Author(s):  
Yasaswini V. ◽  
Santhi Baskaran

Data mining is the search of large existing databases for new and useful information. It now plays a vital role in fields such as medicine, engineering, banking, education and fraud detection. In this paper, feature selection, which is a part of data mining, is performed for classification. The role of feature selection is considered in the context of deep learning, along with how it relates to feature engineering. Feature selection is a preprocessing technique that selects the appropriate features from the data set so that classification produces accurate results. Nature-inspired optimization algorithms such as Ant Colony, Firefly, Cuckoo Search and Harmony Search performed better, giving the best accuracy rate with fewer selected features, and a good F-measure value was also noted. These algorithms are used to perform classification that accurately predicts the target class for each case in the data set. We propose a technique for optimized feature selection for classification using meta-heuristic algorithms. We applied a recent optimization algorithm, the Bat algorithm, to UCI datasets; it showed results comparable to the best-performing existing algorithm, Firefly, but with fewer selected features. The work is implemented in Java, and medical datasets (UCI) have been used. These datasets were chosen because they have nominal class features. The number of attributes, instances and classes varies across the chosen datasets to represent different combinations. Classification is done using the J48 classifier in the WEKA tool. We present a thorough comparison of the results of the presently used algorithms with those of the existing algorithms.
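The general shape of wrapper-based feature selection over binary feature masks can be sketched as follows. This is not the Bat algorithm and not the authors' Java/WEKA implementation: the search is plain random bit-flip hill climbing, the classifier is a leave-one-out 1-nearest-neighbour stand-in for J48, and the fitness weighting `alpha` is an assumed parameter. All names are hypothetical.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: treat as zero accuracy
    correct = 0
    for i, xi in enumerate(X):
        best_label, best_dist = None, float("inf")
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_dist:
                best_label, best_dist = y[k], d
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, alpha=0.9):
    """Weighted sum of accuracy and the fraction of features dropped."""
    kept = sum(mask)
    return alpha * loo_1nn_accuracy(X, y, mask) + (1 - alpha) * (1 - kept / len(mask))

def bit_flip_search(X, y, iterations=200, seed=0):
    """Random bit-flip hill climbing over binary feature masks."""
    rng = random.Random(seed)
    n = len(X[0])
    best = [1] * n  # start with every feature selected
    best_fit = fitness(X, y, best)
    for _ in range(iterations):
        cand = best[:]
        cand[rng.randrange(n)] ^= 1  # toggle one feature
        f = fitness(X, y, cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit
```

A meta-heuristic such as Firefly or Bat would replace `bit_flip_search` with a population-based update rule while keeping the same kind of mask encoding and fitness function, which is how a method can match another's accuracy while selecting fewer features.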


Author(s):  
Amit Saxena ◽  
John Wang ◽  
Wutiphol Sintunavarat

One of the main problems in K-means clustering is the choice of initial centroids, which can cause patterns to be misclustered and so reduce clustering accuracy. Recently, a density and distance-based technique for determining initial centroids has claimed faster convergence of clusters. Motivated by this key idea, the authors study the impact of initial centroids on clustering accuracy for unsupervised feature selection. Three metrics are used to rank the features of a data set. The centroids of the clusters in the data sets, to be applied in K-means clustering, are initialized randomly as well as by density and distance-based approaches. Extensive experiments are performed on 15 datasets. The main significance of the paper is that K-means clustering yields higher accuracies in the majority of these datasets when the proposed density and distance-based approach is used. As an impact of the paper, good clustering accuracy can be achieved with fewer features, which can be useful in data mining of data sets with thousands of features.
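The abstract does not spell out the density and distance-based initialization, so the following is only one plausible variant, under stated assumptions: density is the number of points within a fixed `radius`, the first centroid is the densest point, and each later centroid maximizes density times distance to the nearest centroid already chosen. The function names and the `radius` parameter are hypothetical.

```python
import math

def density_distance_init(points, k, radius):
    """Pick k initial centroids: dense points that lie far from those already chosen."""
    density = [sum(math.dist(p, q) <= radius for q in points) for p in points]
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * nearest
        chosen.append(max(range(len(points)), key=score))
    return [points[i] for i in chosen]

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd iterations starting from the supplied centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

Compared with random initialization, seeding from dense, well-separated points makes it less likely that two starting centroids fall in the same natural cluster, which is the failure mode behind the misclustering the abstract mentions.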


Author(s):  
V. Yasaswini ◽  
Santhi Baskaran

Data mining is the search of large existing databases for new and useful information. It now plays a vital role in fields such as medicine, engineering, banking, education and fraud detection. In this paper, feature selection, which is a part of data mining, is performed for classification. The role of feature selection is considered in the context of deep learning, along with how it relates to feature engineering. Feature selection is a preprocessing technique that selects the appropriate features from the data set so that classification produces accurate results. Nature-inspired optimization algorithms such as Ant Colony, Firefly, Cuckoo Search and Harmony Search performed better, giving the best accuracy rate with fewer selected features, and a good F-measure value was also noted. These algorithms are used to perform classification that accurately predicts the target class for each case in the data set. We propose a technique for optimized feature selection for classification using meta-heuristic algorithms. We applied a recent optimization algorithm, the Modified Bat algorithm, to University of California Irvine datasets; it showed results comparable to the best-performing existing algorithm, Firefly, but with fewer selected features. The work is implemented in Java, and medical datasets have been used. These datasets were chosen because they have nominal class features. The number of attributes, instances and classes varies across the chosen datasets to represent different combinations. Classification is done using the J48 classifier in the WEKA tool. We present a thorough comparison of the results of the presently used algorithms with those of the existing algorithms.
The significance of this research is that it shows the impact of selecting the best features out of all the existing features, which gives the best accuracy rates and helps in extracting information from raw data in the data mining domain. The value of this research is that it can serve major fields such as medicine and banking by giving accurate results in their respective domains. The main contribution of the research is to optimize the selection of features to achieve maximum predictive accuracy on the data sets, handling both single-variable and multi-variable functions through a binary encoding of the features in the dataset, and to increase classification performance by using nature-inspired and meta-heuristic algorithms.

