An Optimization of Feature Selection for Classification using Modified Bat Algorithm

Author(s):  
V. Yasaswini ◽  
Santhi Baskaran

Data mining is the process of searching large existing databases to extract new and useful information. It plays a vital role nowadays in fields such as medicine, engineering, banking, education and fraud detection. In this paper, feature selection, a subtask of data mining, is performed to support classification; its role in the context of deep learning and its relation to feature engineering are also discussed. Feature selection is a preprocessing technique that selects the appropriate features from a data set to obtain accurate classification results. Nature-inspired optimization algorithms such as Ant Colony, Firefly, Cuckoo Search and Harmony Search have shown good performance, achieving the best accuracy rates with fewer selected features and good f-Measure values. These algorithms are used to perform classification that accurately predicts the target class for each case in the data set. We propose a technique to optimize feature selection for classification using meta-heuristic algorithms. We applied a recent optimization algorithm, the Modified Bat algorithm, to University of California Irvine (UCI) datasets; it produced results comparable to the best-performing existing Firefly algorithm while selecting fewer features. The work is implemented in Java on medical datasets, chosen for their nominal class features; the numbers of attributes, instances and classes vary across the chosen datasets to represent different combinations. Classification is done with the J48 classifier in the WEKA tool. We demonstrate comparative results of the proposed algorithm against the existing algorithms.
The significance of this research is that it selects the best features from all existing features, yielding the best accuracy rates and helping to extract information from raw data in the data mining domain. Its value lies in serving major fields such as medicine and banking with exact and proper results. The core contribution is optimizing feature selection to achieve maximum predictive accuracy, solving both single-variable and multi-variable functions through a binary structuring of the dataset's features, and increasing classification performance using nature-inspired, meta-heuristic algorithms.
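The abstract does not give the Modified Bat algorithm's update rules, so the following is only a minimal sketch of how a binary bat algorithm is commonly used for feature selection: each bat is a 0/1 mask over the features, and a sigmoid transfer function maps velocity components to bit probabilities. The `binary_bat_select` name, the toy fitness function and the parameter values are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def binary_bat_select(fitness, n_features, n_bats=10, n_iter=30, seed=0):
    """Illustrative binary bat feature selection: each bat is a 0/1 mask;
    a sigmoid maps each velocity component to a bit probability."""
    rng = random.Random(seed)
    bats = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    best = max(bats, key=fitness)[:]
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = rng.uniform(0.0, 2.0)           # pulse frequency
            cand = bats[i][:]
            for d in range(n_features):
                vel[i][d] += (bats[i][d] - best[d]) * freq
                # sigmoid transfer: large velocity -> bit likely set to 1
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                cand[d] = 1 if rng.random() < prob else 0
            if fitness(cand) >= fitness(bats[i]):  # greedy acceptance
                bats[i] = cand
            if fitness(cand) > fitness(best):
                best = cand[:]
    return best

# Toy fitness: reward bits in a hidden "relevant" set, penalise mask size,
# mimicking "best accuracy with fewer selected features".
relevant = {0, 2, 5}
fitness = lambda mask: sum(mask[d] for d in relevant) - 0.1 * sum(mask)
best_mask = binary_bat_select(fitness, n_features=8)
```

In the paper's setting the fitness of a mask would be the J48 classification accuracy on the masked dataset, penalised by the number of selected features.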



2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shahla U. Umar ◽  
Tarik A. Rashid

Purpose The purpose of this study is to provide the reader with a full survey of the bat algorithm (BA): its limitations, the fields in which it has been applied, the optimization problems it has addressed across domains, and the studies that assess its performance against other meta-heuristic algorithms. Design/methodology/approach The bat algorithm is presented in depth in terms of its background, characteristics and limitations. The survey also covers the algorithms that have been hybridized with BA (K-medoids, back-propagation neural network, harmony search algorithm, differential evolution strategies, enhanced particle swarm optimization and cuckoo search algorithm) and their theoretical results, as well as modifications of the algorithm (modified bat algorithm, enhanced bat algorithm, bat algorithm with mutation (BAM), uninhabited combat aerial vehicle-BAM and non-linear optimization). It further summarizes improved and new bat algorithms (directed artificial bat algorithm, complex-valued bat algorithm, principal component analysis-BA, multiple-strategies-coupling bat algorithm and directional bat algorithm). Findings The study sheds light on the advantages and disadvantages of the algorithm through all the research studies that have dealt with it, along with the fields and applications it has addressed, in the hope of helping scientists understand and develop it further. Originality/value To the best of the research community's knowledge, no comprehensive survey covering all aspects of this algorithm has previously been conducted.
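For readers unfamiliar with BA itself, the core loop the survey discusses can be sketched as follows: frequency-tuned velocity updates toward the best bat, an occasional local random walk scaled by the loudness, and greedy acceptance. This is a minimal illustrative rendition of the canonical algorithm; the fixed loudness/pulse-rate values and the `bat_minimise` name are assumptions for the sketch, not from the survey.

```python
import random

def bat_minimise(obj, dim, bounds, n_bats=20, n_iter=100, seed=1):
    """Minimal sketch of the canonical bat algorithm minimising obj.
    Random pulse frequencies tune the velocities; loudness A and pulse
    rate r control exploitation via local walks around the best bat."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    V = [[0.0] * dim for _ in range(n_bats)]
    A, r = 0.9, 0.5                       # loudness and pulse emission rate
    best = min(X, key=obj)[:]
    for _ in range(n_iter):
        for i in range(n_bats):
            f = rng.uniform(0.0, 2.0)     # pulse frequency
            cand = []
            for d in range(dim):
                V[i][d] += (X[i][d] - best[d]) * f
                cand.append(min(hi, max(lo, X[i][d] + V[i][d])))
            if rng.random() > r:          # local random walk around the best bat
                cand = [min(hi, max(lo, b + 0.01 * A * rng.gauss(0, 1))) for b in best]
            if obj(cand) <= obj(X[i]) and rng.random() < A:
                X[i] = cand               # accept the move
            if obj(cand) < obj(best):
                best = cand[:]
    return best

# Usage: minimise the sphere function, whose optimum is the origin.
sphere = lambda x: sum(v * v for v in x)
best = bat_minimise(sphere, dim=3, bounds=(-5.0, 5.0))
```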


An increase in blood glucose (hyperglycaemia) leads to diabetes mellitus. There are two kinds: Type 1 Diabetes Mellitus (T1DM) and Type 2 Diabetes Mellitus (T2DM); the former is insulin-dependent, the latter insulin-independent. Various factors make diagnosis difficult, so the authors focus on introducing and analyzing a method for building a novel, robust diagnosis system using data mining techniques. Such techniques require complete datasets and do not give accurate results with missing values or with all features retained, so handling missing values and selecting important features become major issues for prediction. Hence, an Adaptive Neuro-Fuzzy Inference System (ANFIS) is proposed to impute the missing values in the dataset and rectify this issue. For effective seed selection in an improved K-means algorithm, an Enhanced Inertia Weight Binary Bat Algorithm (EIWBBA) is proposed, which yields high convergence speed. For feature selection, the work proposes Improved Distributed Kernel-based Principal Component Analysis (IDKPCA), which reduces the entire feature space to the best feature set in less time. The clustered samples are then classified with a Support Vector Machine (SVM). The experimental results confirm that the proposed algorithm gives the best classification accuracy compared with other methods. The Pima Indians Diabetes dataset is used, the experiments are implemented in MATLAB, and the results are compared with other outcomes using appropriate toolkits.
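ANFIS imputation, EIWBBA seeding and IDKPCA are bespoke methods the abstract does not specify in detail, so the snippet below only illustrates the general shape of such a preprocessing pipeline with deliberately simple stand-ins: column-mean imputation in place of ANFIS, and a between-class mean-separation score in place of IDKPCA. All function names and the toy data are assumptions for illustration.

```python
def mean_impute(rows):
    """Replace None entries with the column mean over the observed values
    (a simple stand-in for the paper's ANFIS-based imputation)."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        obs = [r[c] for r in rows if r[c] is not None]
        means.append(sum(obs) / len(obs))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]

def top_k_features(rows, labels, k):
    """Rank features by absolute difference of per-class means for two
    classes (a simple stand-in for IDKPCA-style feature reduction)."""
    n_cols = len(rows[0])
    scores = []
    for c in range(n_cols):
        m0 = [r[c] for r, y in zip(rows, labels) if y == 0]
        m1 = [r[c] for r, y in zip(rows, labels) if y == 1]
        scores.append(abs(sum(m1) / len(m1) - sum(m0) / len(m0)))
    return sorted(range(n_cols), key=lambda c: scores[c], reverse=True)[:k]

# Toy data with missing entries; feature 2 separates the classes best.
data = [[1.0, None, 10.0], [2.0, 5.0, 11.0], [None, 7.0, 30.0], [4.0, 9.0, 31.0]]
labels = [0, 0, 1, 1]
filled = mean_impute(data)
selected = top_k_features(filled, labels, k=1)
```

In the paper's pipeline, the reduced feature set would then be passed to the SVM classifier.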


2017 ◽  
Vol E100.D (8) ◽  
pp. 1860-1869 ◽  
Author(s):  
Bin YANG ◽  
Yuliang LU ◽  
Kailong ZHU ◽  
Guozheng YANG ◽  
Jingwei LIU ◽  
...  

Author(s):  
Barak Chizi ◽  
Lior Rokach ◽  
Oded Maimon

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the "curse of dimensionality" (Bellman, 1961). The objective of feature selection is to identify the important features in the data set and discard all others as irrelevant or redundant information. Since feature selection reduces the dimensionality of the data, it allows data mining algorithms to operate faster and more effectively. In some cases, feature selection also improves the performance of the data mining method, mainly because of the resulting more compact, easily interpreted representation of the target concept. There are three main approaches to feature selection: wrapper, filter and embedded. The filter approach (Kohavi, 1995; Kohavi and John, 1996) operates independently of the data mining method employed subsequently: undesirable features are filtered out of the data before learning begins. These algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods, referred to here as rankers, employ some criterion to score each feature and provide a ranking; from this ordering, several feature subsets can be chosen by manually setting a cut-off point. The wrapper approach (Kohavi, 1995; Kohavi and John, 1996) uses an inducer as a black box along with a statistical re-sampling technique such as cross-validation to select the best feature subset according to some predictive measure. The embedded approach (see for instance Guyon and Elisseeff, 2003) is similar to the wrapper approach in that the features are selected for a specific inducer, but it selects the features during the learning process itself.
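As a concrete illustration of the ranker sub-category of filter methods, the sketch below scores each feature by the absolute Pearson correlation with the target and returns an ordering, from which subsets can be cut off at any chosen point. The function names and toy data are illustrative assumptions, not from the chapter.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(rows, target):
    """Ranker-style filter: score each feature by |correlation with the
    target| and return feature indices from most to least relevant."""
    n_cols = len(rows[0])
    scores = [abs(pearson([r[c] for r in rows], target)) for c in range(n_cols)]
    return sorted(range(n_cols), key=lambda c: scores[c], reverse=True)

# Feature 0 tracks the target closely; feature 1 is noise-like.
rows = [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 1.0]]
target = [1.1, 2.0, 2.9, 4.2]
ranking = rank_features(rows, target)
```

Because the scoring ignores the downstream inducer, this is a filter method; a wrapper would instead re-train the inducer on each candidate subset.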


Author(s):  
Saja Taha Ahmed ◽  
Rafah Al-Hamdani ◽  
Muayad Sadik Croock

<p><span>Recently, decision trees have become one of the most widely used classification models. They owe their popularity to their efficiency in predictive analytics, their ease of interpretation, and the fact that they implicitly perform feature selection. This last property is of essential significance in Educational Data Mining (EDM), where selecting the most relevant features has a major impact on classification accuracy. <br /> The main contribution is a new multi-objective decision tree that can be used for both feature selection and classification. The proposed Decisive Decision Tree (DDT) is constructed using a decisive feature value as a feature weight related to the target class label. The traditional Iterative Dichotomizer 3 (ID3) algorithm and the proposed DDT are compared on three datasets with respect to known ID3 issues, including logarithmic calculation complexity and the bias toward multi-valued features. The results indicate that the proposed DDT outperforms ID3 in tree-building time. Classification accuracy is improved under 10-fold cross-validation for all datasets, with the highest accuracy achieved by the proposed method being 92% for the student.por dataset, and under holdout validation for two datasets, i.e. Iraqi and Student-Math. The experiments also show that the proposed DDT tends to select attributes that are important rather than merely multi-valued. </span></p>
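The ID3 criterion the comparison refers to, with the logarithmic calculations the DDT seeks to avoid, is the usual entropy-based information gain. The sketch below is textbook ID3 splitting, not the proposed DDT; the helper names and toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, col):
    """ID3's splitting criterion: parent entropy minus the weighted
    entropy of the partitions induced by attribute `col`."""
    n = len(labels)
    gain = entropy(labels)
    by_value = {}
    for r, y in zip(rows, labels):
        by_value.setdefault(r[col], []).append(y)
    for part in by_value.values():
        gain -= (len(part) / n) * entropy(part)
    return gain

# Attribute 0 separates the classes perfectly; attribute 1 does not.
rows = [["a", "x"], ["a", "y"], ["b", "x"], ["b", "y"]]
labels = ["yes", "yes", "no", "no"]
g0 = information_gain(rows, labels, 0)
g1 = information_gain(rows, labels, 1)
```

ID3 picks the attribute with the highest gain at each node; a known side effect, noted above, is a bias toward attributes with many distinct values.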


2018 ◽  
Vol 7 (1) ◽  
pp. 9-24
Author(s):  
Mohammad Masoud Javidi

Finding a subset of features in a large data set is a problem that arises in many fields of study. It is important to select an effective subset of features so that the system provides acceptable performance, which leads us to use meta-heuristic algorithms to find the optimal subset. The performance of evolutionary algorithms depends on many parameters that have a significant impact on their behaviour, and these algorithms usually use a random process to set those parameters. Chaos appears random and unpredictable, yet it is also deterministic; it can therefore serve as a suitable alternative to the random processes in meta-heuristic algorithms.
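A common way to realise this idea is to replace a uniform random stream with iterates of a chaotic map; the logistic map at r = 4 is a standard choice. A minimal sketch (the generator name and seed value are assumptions):

```python
def logistic_map(x0=0.7, r=4.0):
    """Deterministic chaotic sequence in [0, 1]; at r = 4 the logistic map
    is fully chaotic, so it can stand in for a uniform random stream in a
    meta-heuristic's parameter updates while staying exactly reproducible."""
    x = x0
    while True:
        x = r * x * (1.0 - x)
        yield x

# Draw 200 "random-looking" values; the same seed x0 always reproduces them.
gen = logistic_map()
seq = [next(gen) for _ in range(200)]
```

The determinism is the point: unlike a pseudo-random generator with a hidden state, the whole stream is fixed by the single seed x0, which makes runs of the meta-heuristic exactly repeatable.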


Author(s):  
Laith Mohammad Abualigah ◽  
Mofleh Al‐diabat ◽  
Mohammad Al Shinwan ◽  
Khaldoon Dhou ◽  
Bisan Alsalibi ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-18 ◽  
Author(s):  
Andrea Bommert ◽  
Jörg Rahnenführer ◽  
Michel Lang

Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is not only important to find a model with high predictive accuracy; it is also important that the model uses only few features and that the selection of these features is stable. This is because, in bioinformatics, models are used not only for prediction but also for drawing biological conclusions, which makes their interpretability and reliability crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures and conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that, for the assessment of stability, it is most important that a measure contains a correction for chance or for large numbers of chosen features. We then analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
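The Pearson-based stability assessment the authors favour can be sketched as the mean pairwise correlation between the 0/1 selection vectors obtained across resampling runs: 1.0 means every run selected exactly the same features. The function names and toy selections below are illustrative assumptions, not the paper's experimental setup.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return cov / den

def stability(masks):
    """Mean pairwise Pearson correlation between 0/1 selection vectors,
    one vector per resampling run; 1.0 means identical selections."""
    pairs = [(i, j) for i in range(len(masks)) for j in range(i + 1, len(masks))]
    return sum(pearson(masks[i], masks[j]) for i, j in pairs) / len(pairs)

# Three runs selecting the same features vs. three disagreeing runs.
identical = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
varied = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
s_high = stability(identical)
s_low = stability(varied)
```

Because the correlation is computed on centred vectors, the measure includes the correction for chance agreement that the comparison above identifies as important.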

