A Survey on Evolutionary Computation Approaches to Feature Selection

2021 ◽  
Author(s):  
Bing Xue ◽  
Mengjie Zhang ◽  
William Browne ◽  
X Yao

Feature selection is an important task in data mining and machine learning to reduce the dimensionality of the data and increase the performance of an algorithm, such as a classification algorithm. However, feature selection is a challenging task due mainly to the large search space. A variety of methods have been applied to solve feature selection problems, where evolutionary computation techniques have recently gained much attention and shown some success. However, there are no comprehensive guidelines on the strengths and weaknesses of alternative approaches. This leads to a disjointed and fragmented field with ultimately lost opportunities for improving performance and successful applications. This paper presents a comprehensive survey of the state-of-the-art work on evolutionary computation for feature selection, which identifies the contributions of these different algorithms. In addition, current issues and challenges are also discussed to identify promising areas for future research. Index Terms: Evolutionary computation, feature selection, classification, data mining, machine learning. © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
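To make the "large search space" concrete: with n features there are 2^n candidate subsets, so exhaustive search quickly becomes impractical. The following is a minimal sketch of one evolutionary approach, a simple genetic algorithm over bit masks. It is not taken from the survey; the toy fitness function, the `relevant` set, and all parameter values are hypothetical and stand in for a real classifier-based evaluation.

```python
import random


def fitness(mask, relevant, penalty=0.1):
    """Toy fitness (assumption): reward selected relevant features,
    penalize the size of the selected subset."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - penalty * sum(mask)


def evolve(n_features, relevant, pop_size=30, generations=40,
           mutation_rate=0.05, seed=0):
    """Evolve a bit mask (1 = feature selected) with a basic genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        pop.sort(key=lambda m: fitness(m, relevant), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)      # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, relevant))


# Hypothetical problem: 12 features, of which 3 are actually informative.
relevant_features = {1, 4, 7}
best_mask = evolve(12, relevant_features)
print("search space size:", 2 ** 12)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

In practice the fitness function would wrap a learning algorithm (for example, cross-validated classification accuracy on the selected columns), which is where most of the computational cost lies.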


2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

Abstract: A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached by examining the contribution of the features in the models, which makes it possible to identify common MT errors that are captured by the features. Many grammatical/fluency features have a good contribution, few adequacy features have some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


2018 ◽  
Vol 186 ◽  
pp. 09004
Author(s):  
André Schaaff ◽  
Marc Wenger

The work environment has deeply evolved in recent decades with the generalisation of IT in terms of hardware, online resources and software. Librarians do not escape this movement and their working environment is becoming essentially digital (databases, online publications, Wikis, specialised software, etc.). With the Big Data era, new tools will be available, implementing artificial intelligence, text mining, machine learning, etc. Most of these technologies already exist but they will become widespread and strongly impact our ways of working. The development of social networks that are "business" oriented will also have an increasing influence. In this context, it is interesting to reflect on how the work environment of librarians will evolve. Maintaining interest in the daily work is fundamental and over-automation is not desirable. It is imperative to keep the human-driven factor. We draw on state of the art new technologies which impact their work, and initiate a discussion about how to integrate them while preserving their expertise.


BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Zhixun Zhao ◽  
Xiaocai Zhang ◽  
Fang Chen ◽  
Liang Fang ◽  
Jinyan Li

Abstract. Background: DNA N4-methylcytosine (4mC) is a critical epigenetic modification and has various roles in the restriction-modification system. Due to the high cost of experimental laboratory detection, computational methods using sequence characteristics and machine learning algorithms have been explored to identify 4mC sites from DNA sequences. However, state-of-the-art methods have limited performance because of the lack of effective sequence features and the ad hoc choice of learning algorithms to cope with this problem. This paper aims to propose a new sequence feature space and a machine learning algorithm with a feature selection scheme to address the problem. Results: The feature importance score distributions in datasets of six species are first reported and analyzed. Then the impact of feature selection on model performance is evaluated by independent testing on benchmark datasets, where the ACC and MCC measurements after feature selection increase by 2.3% to 9.7% and 0.05 to 0.19, respectively. The proposed method is compared with three state-of-the-art predictors using an independent test and 10-fold cross-validation, and our method outperforms them on all datasets, improving the ACC by 3.02% to 7.89% and the MCC by 0.06 to 0.15 in the independent test. Two detailed case studies with the proposed method have confirmed the excellent overall performance, correctly identifying 24 of 26 4mC sites from the C. elegans gene and 126 of 137 4mC sites from the D. melanogaster gene. Conclusions: The results show that the proposed feature space and learning algorithm with feature selection can improve the performance of DNA 4mC prediction on the benchmark datasets. The two case studies demonstrate the effectiveness of our method in practical situations.
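For readers unfamiliar with the two metrics reported above, the sketch below computes accuracy (ACC) and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The function name `acc_mcc` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def acc_mcc(tp, tn, fp, fn):
    """Return (ACC, MCC) for a binary confusion matrix.

    ACC = (TP + TN) / (TP + TN + FP + FN)
    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical counts, chosen only to illustrate the calculation.
acc, mcc = acc_mcc(tp=24, tn=20, fp=4, fn=2)
print(f"ACC = {acc:.3f}, MCC = {mcc:.3f}")
```

ACC ranges from 0 to 1, while MCC ranges from -1 to 1 and stays informative when the positive and negative classes are imbalanced, which is one reason both are commonly reported together.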


2021 ◽  
Vol 11 (6) ◽  
pp. 7824-7835
Author(s):  
H. Alalawi ◽  
M. Alsuwat ◽  
H. Alhakami

The importance of classification algorithms has increased in recent years. Classification is a branch of supervised learning with the goal of predicting the categorical class labels of new cases. Additionally, with the spread of coronavirus disease (COVID-19) since 2019, the world still faces a great challenge in defeating COVID-19 even with modern methods and technologies. This paper gives an overview of classification algorithms to provide readers with an understanding of state-of-the-art classification algorithms and their applications in COVID-19 diagnosis and detection. It also describes some of the research published on classification algorithms, the existing gaps in the research, and future research directions. This article encourages both academics and machine learning learners to further strengthen the basis of classification methods.


2018 ◽  
Vol 8 (3) ◽  
pp. 46-67 ◽  
Author(s):  
Mehrnoush Barani Shirzad ◽  
Mohammad Reza Keyvanpour

This article describes how feature selection for learning to rank algorithms has become an interesting issue. Noisy and irrelevant features degrade performance and cause overfitting in ranking systems, so reducing the number of features by eliminating irrelevant and noisy ones is a solution. Several studies have applied feature selection to learning to rank, which promotes the efficiency and effectiveness of ranking models. As the number of features, and consequently the number of irrelevant and noisy features, is increasing, a systematic review of feature selection for learning to rank methods is required. In this article, a framework to examine research on feature selection for learning to rank (FSLR) is proposed. Under this framework, the authors review the state-of-the-art methods and suggest several criteria to analyze them. FSLR offers a structured classification of current algorithms for future research to: a) properly select strategies from existing algorithms using certain criteria or b) find ways to develop existing methodologies.


2020 ◽  
Vol 36 (4) ◽  
pp. 1769-1801 ◽  
Author(s):  
Yazhou Xie ◽  
Majid Ebad Sichani ◽  
Jamie E Padgett ◽  
Reginald DesRoches

Machine learning (ML) has evolved rapidly over recent years with the promise to substantially alter and enhance the role of data science in a variety of disciplines. Compared with traditional approaches, ML offers advantages in handling complex problems, providing computational efficiency, propagating and treating uncertainties, and facilitating decision making. Also, the maturing of ML has led to significant advances not only in mainstream artificial intelligence (AI) research but also in other science and engineering fields, such as material science, bioengineering, construction management, and transportation engineering. This study conducts a comprehensive review of the progress and challenges of implementing ML in the earthquake engineering domain. A hierarchical attribute matrix is adopted to categorize the existing literature based on four traits identified in the field: ML method, topic area, data resource, and scale of analysis. The state-of-the-art review indicates to what extent ML has been applied in four topic areas of earthquake engineering: seismic hazard analysis, system identification and damage detection, seismic fragility assessment, and structural control for earthquake mitigation. Moreover, research challenges and the associated future research needs are discussed, which include embracing the next generation of data sharing and sensor technologies, implementing more advanced ML techniques, and developing physics-guided ML models.


2021 ◽  
pp. 1-11
Author(s):  
Carolina Martín-del-Campo-Rodríguez ◽  
Grigori Sidorov ◽  
Ildar Batyrshin

This paper presents a computational model for the unsupervised authorship attribution task based on a traditional machine learning scheme. An improvement over the state of the art is achieved by comparing different feature selection methods on the PAN17 author clustering dataset. To achieve this improvement, specific pre-processing and feature extraction methods were proposed, such as a method to separate tokens by type so that each is assigned to only one category. Similarly, special characters are used as part of the punctuation marks to improve the result obtained when applying typed character n-grams. The weighted cosine similarity measure is applied to improve the B³ F-score by reducing the vector values where attributes are exclusive. This measure is used to define distances between documents, which are later used by the clustering algorithm to perform authorship attribution.
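The B³ (B-cubed) F-score mentioned above is a standard clustering evaluation metric. The sketch below follows the usual definition: per-item precision and recall are averaged over all items, and their harmonic mean is reported. It is an illustration only, not the authors' or the PAN17 evaluation code; the function name `bcubed_f` and the toy document labels are hypothetical.

```python
from collections import Counter


def bcubed_f(clusters, classes):
    """B-cubed precision, recall and F-score.

    clusters: dict mapping item -> predicted cluster label
    classes:  dict mapping item -> true class label (same keys, assumed)
    """
    items = list(clusters)
    cluster_sizes = Counter(clusters[i] for i in items)
    class_sizes = Counter(classes[i] for i in items)
    # Number of items sharing both the cluster and the class of a given item.
    overlap = Counter((clusters[i], classes[i]) for i in items)

    precision = sum(overlap[(clusters[i], classes[i])] / cluster_sizes[clusters[i]]
                    for i in items) / len(items)
    recall = sum(overlap[(clusters[i], classes[i])] / class_sizes[classes[i]]
                 for i in items) / len(items)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: four documents by two authors, all put in one cluster.
truth = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
single_cluster = {"d1": 0, "d2": 0, "d3": 0, "d4": 0}
print(bcubed_f(single_cluster, truth))
```

Merging everything into one cluster keeps recall perfect but halves precision in this toy case, which shows why the combined F-score is the figure usually reported.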

