Correction to: Cooperative co-evolution for feature selection in Big Data with random feature grouping

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Leslie F. Sikos ◽  
Paul Haskell‑Dowland

An amendment to this paper has been published and can be accessed via the original article.

Cooperative co-evolution for feature selection in Big Data with random feature grouping

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
A. N. M. Bazlur Rashid ◽  
Mohiuddin Ahmed ◽  
Leslie F. Sikos ◽  
Paul Haskell-Dowland

A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represents the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. Existing solutions perform poorly because of limitations such as not considering feature interactions, handling only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) approach and its three variants are introduced to dynamically decompose Big Data datasets and to ensure a probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes, hence the name Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed using six widely used ML classifiers on seven datasets from the UCI ML repository and the Princeton University Genomics repository, with and without FS. The results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing CC-based FS solution (CCEAFS), CCFSRFG-2, and the use of all features in terms of accuracy, sensitivity, and specificity.
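To make the random grouping idea concrete, the sketch below illustrates only the decomposition step under stated assumptions: the function name random_feature_grouping, the number of groups, and the near-equal split are illustrative choices, not the authors' CCFSRFG implementation. Reshuffling the feature indices before every co-evolutionary cycle gives any pair of interacting features a nonzero probability of landing in the same subcomponent, and the uneven split handles an odd number of features.

```python
import random

def random_feature_grouping(n_features, n_groups, seed=None):
    """Randomly partition feature indices into subcomponents (illustrative only)."""
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)  # re-shuffle at every CC cycle for a dynamic decomposition
    # Split the shuffled indices into n_groups nearly equal subcomponents;
    # the remainder is spread over the first groups, so odd sizes are fine.
    base, extra = divmod(n_features, n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        groups.append(indices[start:start + size])
        start += size
    return groups

# Example: 11 features decomposed into 3 subcomponents for one CC cycle.
print(random_feature_grouping(11, 3, seed=42))
```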


2021 ◽  
Vol 558 ◽  
pp. 124-139
Author(s):  
D. López ◽  
S. Ramírez-Gallego ◽  
S. García ◽  
N. Xiong ◽  
F. Herrera

Author(s):  
Miguel García-Torres ◽  
Francisco Gómez-Vela ◽  
Federico Divina ◽  
Diego P. Pinto-Roa ◽  
José Luis Vázquez Noguera ◽  
...  

Methods ◽  
2016 ◽  
Vol 111 ◽  
pp. 21-31 ◽  
Author(s):  
Lipo Wang ◽  
Yaoli Wang ◽  
Qing Chang

PLoS ONE ◽  
2018 ◽  
Vol 13 (8) ◽  
pp. e0202674 ◽  
Author(s):  
Simeone Marino ◽  
Jiachen Xu ◽  
Yi Zhao ◽  
Nina Zhou ◽  
Yiwang Zhou ◽  
...  

2021 ◽  
Vol 26 (1) ◽  
pp. 67-77
Author(s):  
Siva Sankari Subbiah ◽  
Jayakumar Chinnappan

Nowadays, organizations collect huge volumes of data without knowing their usefulness. The rapid development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of datasets increases day by day at an extraordinary rate, resulting in large-scale, high-dimensional datasets. The present paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern Big Data world, feature selection plays a significant role in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed for obtaining more relevant features, especially from big datasets, that help provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed Big Data processing using Hadoop and Spark, and the challenges of feature selection, and it summarizes the related research work done by various researchers. As a result, Big Data analysis with feature selection improves the accuracy of learning.
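As a purely illustrative companion to the distributed setting mentioned above, the sketch below applies a chi-squared filter selector from Spark MLlib; the toy data, column names, and the choice of ChiSqSelector are assumptions for demonstration, not methods evaluated in the review.

```python
# A minimal filter-style feature-selection sketch in the Spark setting
# (illustrative assumptions: toy data, column names, ChiSqSelector).
from pyspark.sql import SparkSession
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("fs-sketch").getOrCreate()

df = spark.createDataFrame(
    [
        (Vectors.dense([0.0, 0.0, 18.0, 1.0]), 1.0),
        (Vectors.dense([0.0, 1.0, 12.0, 0.0]), 0.0),
        (Vectors.dense([1.0, 0.0, 15.0, 0.1]), 0.0),
    ],
    ["features", "label"],
)

# Keep the two features most associated with the label (chi-squared filter).
selector = ChiSqSelector(numTopFeatures=2, featuresCol="features",
                         outputCol="selectedFeatures", labelCol="label")
selector.fit(df).transform(df).select("selectedFeatures", "label").show()
```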


Author(s):  
Aakriti Shukla ◽  
Dr Damodar Prasad Tiwari

Dimension reduction, or feature selection, is thought to be the backbone of Big Data applications, as it improves performance. In recent years, many scholars have shifted their attention to data science and analysis for real-time applications using Big Data integration. Interacting with Big Data manually is time-consuming for humans. As a result, when handling a high workload in a distributed system, it is necessary to make feature selection elastic and scalable. In this study, a survey of alternative optimization techniques for feature selection is presented, together with an analysis of their limitations. This study contributes to the development of methods for improving the efficiency of feature selection on big, complicated datasets.
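To make "optimization techniques for feature selection" concrete, the sketch below shows a simple wrapper-style search: random feature subsets are scored by cross-validated accuracy and the best one is kept. The dataset, classifier, sampling rate, and search budget are illustrative assumptions rather than any method surveyed here.

```python
# A hedged sketch of wrapper-style feature-subset optimization via random search.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

best_score, best_mask = -np.inf, None
for _ in range(50):                          # small search budget for the sketch
    mask = rng.random(X.shape[1]) < 0.3      # propose a random feature subset
    if not mask.any():
        continue
    score = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    if score > best_score:
        best_score, best_mask = score, mask

print(f"best CV accuracy {best_score:.3f} using {int(best_mask.sum())} features")
```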

