Feature Selection for Fatigue Segment Classification System Using Elitist Non-Dominated Sorting in Genetic Algorithm

2012 ◽  
Vol 165 ◽  
pp. 232-236 ◽  
Author(s):  
Mohd Haniff Osman ◽  
Z.M. Nopiah ◽  
S. Abdullah

Having relevant features to represent a dataset motivates learning algorithms to produce a highly accurate classification system in less computation time. Unfortunately, one good set of features does not always fit all learning algorithms. To confirm that the choice of learning algorithm does not weigh on system accuracy, the user has to validate that the given dataset is feature-oriented. Thus, in this study we propose a simple verification procedure based on a multi-objective approach by means of the elitist Non-dominated Sorting Genetic Algorithm (NSGA-II). The way NSGA-II is applied in this work is quite similar to a standard feature selection procedure, except in the interpretation of the results, i.e. the set of optimal solutions. Two conflicting minimization objectives, namely classification error and the number of selected features, are taken as the objective functions. A case study of fatigue segment classification was chosen for the purpose of this study, where simulations were repeated using four single classifiers: Naive Bayes, k-nearest neighbours, decision tree and radial basis function. The proposed procedure demonstrates that only two features are needed for the fatigue segment classification task, regardless of the learning algorithm.
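The procedure's two conflicting objectives reduce to Pareto dominance over (classification error, number of features) pairs, and NSGA-II repeatedly extracts non-dominated fronts of such pairs. A minimal sketch of the dominance test and front extraction (plain Python; the candidate subsets below are hypothetical, not from the paper):

```python
def dominates(a, b):
    """True if a Pareto-dominates b: no worse in every objective and
    strictly better in at least one (both objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Non-dominated subset of (classification error, n_features) pairs."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o != s)]

# Hypothetical candidate subsets: (error, number of features used)
candidates = [(0.1, 5), (0.12, 2), (0.1, 3), (0.25, 1), (0.3, 4)]
print(pareto_front(candidates))  # → [(0.12, 2), (0.1, 3), (0.25, 1)]
```

The surviving pairs form the trade-off curve the study inspects; (0.1, 5) and (0.3, 4) drop out because (0.1, 3) matches or beats them on both objectives.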

2021 ◽  
pp. 2796-2812
Author(s):  
Nishath Ansari

Feature selection, a method of dimensionality reduction, is the process of selecting appropriate feature subsets from the total set of features. In this paper, a detailed review of feature selection and its evaluation techniques is presented. The discussion begins with a straightforward approach to handling features and the associated issues using meta-heuristic strategies, which help in obtaining the best feature subsets. The paper then discusses nature-inspired system models and the computations performed with them, so that feature selection can be handled in complex and massive data. Furthermore, algorithms such as the Genetic Algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and other meta-heuristic strategies are discussed. A comparison of these algorithms has been performed; the results show that feature selection benefits machine learning algorithms by improving their performance. This paper also presents various real-world applications of feature selection.


2021 ◽  
Vol 9 (8) ◽  
pp. 888
Author(s):  
Qasem Al-Tashi ◽  
Emelia Akashah Patah Akhir ◽  
Said Jadid Abdulkadir ◽  
Seyedali Mirjalili ◽  
Tareq M. Shami ◽  
...  

The accurate classification of the reservoir recovery factor is hampered by irregularities such as noisy and high-dimensional features associated with the reservoir measurements or characterization. These irregularities, especially the large number of features, make accurate classification of the reservoir recovery factor difficult, as the generated reservoir features are usually heterogeneous. Consequently, it is imperative to select relevant reservoir features while preserving or improving recovery classification accuracy. This can be treated as a multi-objective optimization problem, since there are two conflicting objectives: minimizing the number of measurements and preserving high recovery classification accuracy. In this study, wrapper-based multi-objective feature selection approaches are proposed to estimate the set of Pareto optimal solutions that represents the optimum trade-off between these two objectives. Specifically, three multi-objective optimization algorithms, namely the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the Multi-Objective Grey Wolf Optimizer (MOGWO) and Multi-Objective Particle Swarm Optimization (MOPSO), are investigated for selecting relevant features from the reservoir dataset. To the best of our knowledge, this is the first time multi-objective optimization has been used for reservoir recovery factor classification. The Artificial Neural Network (ANN) classification algorithm is used to evaluate the selected reservoir features. Experimental results show that the proposed MOGWO-ANN outperforms the other two approaches (MOPSO and NSGA-II) in terms of producing non-dominated solutions with a small subset of features and a reduced classification error rate.


Energies ◽  
2018 ◽  
Vol 11 (10) ◽  
pp. 2641 ◽  
Author(s):  
Aydin Jadidi ◽  
Raimundo Menezes ◽  
Nilmar de Souza ◽  
Antonio de Castro Lima

The use of photovoltaics is still considered challenging because of certain reliability issues and a high dependence on the global horizontal irradiance (GHI). GHI forecasting has wide applications, from grid safety to supply–demand balance and economic load dispatching. Given a data set, a multi-layer perceptron neural network (MLPNN) is a strong tool for solving forecasting problems. Furthermore, noise detection and feature selection in a data set with numerous variables, including meteorological parameters and previous values of GHI, are of crucial importance for obtaining the desired results. This paper employs density-based spatial clustering of applications with noise (DBSCAN) and the non-dominated sorting genetic algorithm II (NSGA-II) for noise detection and feature selection, respectively. Tuning the neural network is another important issue, which includes choosing the hidden layer size and the activation functions between the layers of the network. Previous studies have utilized combinations of different parameters based on trial and error, which is inefficient for accurate selection of the desired features and tuning of the neural network. In this research, two different methods, namely the particle swarm optimization (PSO) algorithm and the genetic algorithm (GA), are utilized to tune the MLPNN, and the results of one-hour-ahead forecasting of the GHI are subsequently compared. The methodology is validated using hourly data for Elizabeth City, located in North Carolina, USA, and the results demonstrate a better performance of GA in comparison with PSO. The GA-tuned MLPNN reported a normalized root mean square error (nRMSE) of 0.0458 and a normalized mean absolute error (nMAE) of 0.0238.
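The reported error measures follow directly from their definitions. A small sketch, assuming the common convention of normalizing by the mean of the observed GHI (the paper does not state its convention; other works normalize by the range or by installed capacity):

```python
import math

def nrmse(actual, predicted):
    """Root mean square error, normalized by the mean observed value."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    return rmse / (sum(actual) / n)

def nmae(actual, predicted):
    """Mean absolute error, normalized by the mean observed value."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return mae / (sum(actual) / n)

# Toy GHI values (W/m^2): every forecast is off by exactly 10.
print(nmae([100, 200, 300], [110, 190, 310]))   # → 0.05
print(nrmse([100, 200, 300], [110, 190, 310]))  # → 0.05
```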


Author(s):  
Yuan-Dong Lan

Feature selection aims to choose an optimal subset of features that is necessary and sufficient to improve the generalization performance and the running efficiency of the learning algorithm. To obtain this optimal subset, a hybrid feature selection method based on mutual information and a genetic algorithm is proposed in this paper. In order to make full use of the advantages of the filter and wrapper models, the algorithm is divided into two phases: the filter phase and the wrapper phase. In the filter phase, the algorithm first uses mutual information to rank the features, providing heuristic information for the subsequent genetic algorithm and accelerating its search process. In the wrapper phase, the genetic algorithm is used as the search strategy, with classifier performance and subset dimensionality as the evaluation criteria, to search for the best subset of features. Experimental results on benchmark datasets show that the proposed algorithm achieves higher classification accuracy with a smaller feature dimension, and that its running time is less than that of the genetic algorithm alone.
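The filter phase's ranking criterion can be sketched from the definition of mutual information for discrete variables, I(X;Y) = Σ p(x,y) log2[p(x,y)/(p(x)p(y))]. A minimal illustration (plain Python; `rank_features` and its inputs are illustrative names, not from the paper):

```python
import math
from collections import Counter

def mutual_information(x, y):
    """I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def rank_features(columns, labels):
    """Filter phase: rank feature columns by MI with the class label."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Feature 'a' determines the label; 'b' is independent of it.
print(rank_features({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]}, [0, 0, 1, 1]))
# → ['a', 'b']
```

The resulting ranking is what seeds the GA's search in the wrapper phase.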


2019 ◽  
Vol 8 (3) ◽  
pp. 7020-7023

In our society, aging poses serious problems for health and medical care. Compared with other diseases encountered in daily life, Rheumatoid Arthritis is a common one: it causes pain in the musculoskeletal system and affects people's quality of life. Rheumatoid Arthritis typically has its onset at middle age, but it can also affect children and young adults. If the disease is not monitored and treated as early as possible, it can cause serious joint deformities. Cluster analysis is an unsupervised learning technique in data mining for exploring the structure of data without known class labels. Many clustering algorithms have been proposed to analyze high volumes of data, but many of them do not produce high-quality clusters because of irrelevant features present in the dataset. Feature selection is a prime task in data analysis for high-dimensional datasets: an optimal subset of features is enough to cluster the data. In this study, Rheumatoid Arthritis clinical data were analyzed to predict whether a patient is affected by the disease, using the K-Means clustering algorithm. A genetic algorithm is used to filter the features and, at the end of the process, to find optimal clusters for K-Means. Depending on the initial centroids, K-Means may produce empty clusters, and it does not handle outliers or noisy data in the dataset effectively. When combined with the genetic algorithm, K-Means shows higher clustering quality and a faster evolution process than K-Means alone. In this paper, a machine learning approach called FSKG is used to diagnose Rheumatoid Arthritis, and a predictive FSKG model is explored. After completing data analysis and pre-processing operations, the genetic algorithm and the K-Means clustering algorithm are integrated to choose the correct features among all the features. Experimental results from this study show improved accuracy compared with the plain K-Means algorithm for rheumatoid disease prediction.
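The empty-cluster weakness mentioned above is easy to demonstrate and patch. A minimal K-Means sketch (not the paper's FSKG implementation) that reseeds an empty cluster with the point farthest from every current centroid:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(pts):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=50, seed=0):
    """Plain K-Means; an empty cluster is reseeded with the point
    farthest from all current centroids (one common fix for the
    empty-cluster problem noted above)."""
    cents = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, cents[i]))].append(p)
        for i, g in enumerate(groups):
            cents[i] = centroid(g) if g else max(
                points, key=lambda q: min(dist2(q, c) for c in cents))
    return cents, groups
```

On two well-separated pairs of points, e.g. [(0, 0), (0, 1), (10, 10), (10, 11)] with k = 2, this converges to the centroids (0, 0.5) and (10, 10.5) regardless of which points are sampled as the initial centroids.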


Author(s):  
Hai Thanh Nguyen ◽  
Katrin Franke ◽  
Slobodan Petrovic

In this paper, the authors propose a new feature selection procedure for intrusion detection, based on the filter method used in machine learning. They focus on Correlation Feature Selection (CFS) and transform the feature selection problem, by means of the CFS measure, into a mixed 0-1 linear programming problem with a number of constraints and variables that is linear in the number of features in the full set. The mixed 0-1 linear programming problem can then be solved using a branch-and-bound algorithm. This feature selection algorithm was compared experimentally with the best-first-CFS and genetic-algorithm-CFS methods regarding feature selection capabilities. Classification accuracies obtained after feature selection by means of C4.5 and BayesNet on the KDD CUP'99 dataset were also tested. Experiments show that the authors' method outperforms the best-first-CFS and genetic-algorithm-CFS methods by removing many more redundant features while maintaining or improving classification accuracy.
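The CFS measure being optimized can be stated compactly. A sketch of the standard CFS merit function (Hall's formulation; the example values are illustrative):

```python
import math

def cfs_merit(k, r_cf, r_ff):
    """CFS merit of a k-feature subset:
    Merit = k * r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature inter-correlation within the subset."""
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)

# A single well-correlated feature scores its own correlation ...
print(cfs_merit(1, 0.8, 0.0))  # → 0.8
# ... while adding non-redundant (uncorrelated) features raises the merit.
print(cfs_merit(4, 0.5, 0.0))  # → 1.0
```

Maximizing this ratio over subsets is what the paper recasts as a mixed 0-1 linear program solvable by branch-and-bound.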



2015 ◽  
Vol 15 (02) ◽  
pp. 1540025 ◽  
Author(s):  
IMANE NEDJAR ◽  
MOSTAFA EL HABIB DAHO ◽  
NESMA SETTOUTI ◽  
SAÏD MAHMOUDI ◽  
MOHAMED AMINE CHIKH

Automated classification of medical images is an increasingly important tool for physicians in their daily activities. However, due to its computational complexity, this task is one of the major current challenges in the field of content-based image retrieval (CBIR). In this paper, a medical image classification approach is proposed, composed of two main phases. The first phase consists of pre-processing, in which a texture- and shape-based feature vector is extracted and a feature selection approach is applied using a Genetic Algorithm (GA). The proposed GA uses a kNN-based classification error as its fitness function, which enables it to find a combination of features yielding optimal accuracy. In the second phase, classification is performed using a Random Forest classifier and a supervised multi-class classifier based on the support vector machine (SVM) for classifying X-ray images.
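The GA fitness described above, a kNN classification error restricted to the features a chromosome selects, can be sketched as a leave-one-out evaluation (illustrative, not the authors' exact implementation; a chromosome is modelled as a bit mask over features):

```python
def knn_error(X, y, mask, k=1):
    """Leave-one-out k-NN error rate using only the features where
    mask[j] == 1; this is the kind of fitness a GA chromosome (a bit
    mask over features) would receive, lower being better."""
    def d(a, b):
        return sum((a[j] - b[j]) ** 2 for j in range(len(mask)) if mask[j])
    wrong = 0
    for i in range(len(X)):
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: d(X[i], X[j]))[:k]
        votes = [y[j] for j in neighbours]
        if max(set(votes), key=votes.count) != y[i]:
            wrong += 1
    return wrong / len(X)

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
X = [(0, 5), (1, 0), (10, 5), (11, 0)]
y = [0, 0, 1, 1]
print(knn_error(X, y, mask=(1, 0)))  # informative feature → 0.0
print(knn_error(X, y, mask=(0, 1)))  # noise feature → 1.0
```

A GA evolving such masks would thus be driven toward chromosomes that keep feature 0 and drop feature 1.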


2020 ◽  
Vol 5 (2) ◽  
pp. 153
Author(s):  
Rizki Tri Prasetio

Computer-assisted medical diagnosis is a major machine learning problem being actively researched. General classifiers learn from the data itself through a training process, owing to the inexperience of an expert in determining parameters. This research proposes a methodology based on the machine learning paradigm that integrates a search heuristic inspired by natural evolution, the genetic algorithm, with one of the simplest and most widely used learning algorithms, k-nearest Neighbor. The genetic algorithm was used for feature selection and parameter optimization, while k-nearest Neighbor was used as the classifier. The proposed method was evaluated on five benchmark medical datasets from the University of California Irvine Machine Learning Repository and compared with the original k-NN and other feature selection algorithms, i.e., forward selection, backward elimination and greedy feature selection. Experimental results show that the proposed method achieves good performance, with a significant improvement (t-test p value of 0.0011).

