Feature selection methods for big data bioinformatics: A survey from the search perspective

Opportunities and Challenges of Feature Selection Methods for High Dimensional Data: A Review

Ingénierie des systèmes d'information ◽

10.18280/isi.260107 ◽

2021 ◽

Vol 26 (1) ◽

pp. 67-77

Author(s):

Siva Sankari Subbiah ◽

Jayakumar Chinnappan

Keyword(s):

Feature Selection ◽

Big Data ◽

Large Scale ◽

High Dimensional Data ◽

Research Work ◽

Basic Feature ◽

High Dimensional ◽

Selection Methods ◽

Fast Development ◽

Improved Accuracy

Nowadays, organizations collect huge volumes of data without knowing its usefulness. The rapid development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of datasets is increasing at an extraordinary rate, resulting in large-scale datasets with high dimensionality. This paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big data world, feature selection plays a significant role in reducing dimensionality and overfitting in the learning process. Researchers have proposed many feature selection methods for obtaining the most relevant features, especially from big datasets, which help provide accurate learning results without degrading performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, and the challenges of feature selection, and it summarizes the related research work. The paper concludes that big data analysis with feature selection improves the accuracy of learning.

Lightweight Feature Selection Methods Based on Standardized Measure of Dispersion for Mining Big Data

2016 IEEE International Conference on Computer and Information Technology (CIT) ◽

10.1109/cit.2016.120 ◽

2016 ◽

Author(s):

Simon Fong ◽

Robert P. Biuk-Aghai ◽

Yain-Whar Si

Keyword(s):

Feature Selection ◽

Big Data ◽

Selection Methods ◽

Standardized Measure

Competent of Feature Selection Methods to Classify Big Data Using Social Internet of Things (SIoT)

Algorithms for Intelligent Systems - Information Management and Machine Intelligence ◽

10.1007/978-981-15-4936-6_43 ◽

2020 ◽

pp. 393-398

Author(s):

S. Jayasri ◽

R. Parameswari

Keyword(s):

Feature Selection ◽

Big Data ◽

Internet Of Things ◽

Selection Methods ◽

Social Internet Of Things

A hybrid metaheuristic approach for efficient feature selection methods in big data

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-019-01656-w ◽

2020 ◽

Cited By ~ 4

Author(s):

S. Meera ◽

C. Sundar

Keyword(s):

Feature Selection ◽

Big Data ◽

Selection Methods ◽

Hybrid Metaheuristic

Data Feature Selection Methods on Distributed Big Data Processing Platforms

2018 3rd International Conference on Computer Science and Engineering (UBMK) ◽

10.1109/ubmk.2018.8566451 ◽

2018 ◽

Cited By ~ 2

Author(s):

Mehmet Burak Catalkaya ◽

Oya Kalipsiz ◽

Mehmet S. Aktas ◽

Umut Orcun Turgut

Keyword(s):

Feature Selection ◽

Big Data ◽

Data Processing ◽

Selection Methods ◽

Big Data Processing

Feature selection methods and genomic big data: a systematic review

Journal Of Big Data ◽

10.1186/s40537-019-0241-0 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 6

Author(s):

Khawla Tadist ◽

Said Najah ◽

Nikola S. Nikolov ◽

Fatiha Mrabti ◽

Azeddine Zahi

Keyword(s):

Systematic Review ◽

Feature Selection ◽

Big Data ◽

Selection Methods

Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i9.113121 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 1

Author(s):

Rajwinder Kaur

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Selection Methods

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving software product quality before release through periodic testing is one of the most expensive activities in software projects. Because the resources for testing modules are limited, it is important to identify fault-prone modules and concentrate testing resources on them. Software fault predictors based on machine learning algorithms are effective tools for identifying fault-prone modules. Extensive studies in this field seek the connection between the features of software modules and their fault-proneness. Some features used by predictive algorithms are ineffective and reduce prediction accuracy, so feature selection methods are widely used to improve the performance of fault prediction models. In this study, we propose a feature selection method that combines several filter feature selection methods into a single fused weighted filter method. The proposed method improved both the convergence rate of feature selection and the prediction accuracy. Results on ten datasets from NASA and PROMISE indicate the effectiveness of the proposed method in improving the accuracy and convergence of software fault prediction.
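
The abstract above does not say which filters are fused or how they are weighted, so the following is only a minimal sketch of the general idea of fusing filter scores. It assumes two illustrative filters (absolute Pearson correlation and a Fisher score), min-max normalization, and equal weights; the function names `abs_correlation`, `fisher_score`, `min_max`, and `fused_ranking` are hypothetical and are not taken from the paper.

```python
import numpy as np

def abs_correlation(X, y):
    """Absolute Pearson correlation between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant features score 0
    return np.abs(Xc.T @ yc) / denom

def fisher_score(X, y):
    """Between-class scatter divided by within-class scatter, per feature."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        between += len(Xk) * (Xk.mean(axis=0) - overall_mean) ** 2
        within += len(Xk) * Xk.var(axis=0)
    within = np.where(within == 0, 1e-12, within)  # avoid division by zero
    return between / within

def min_max(scores):
    """Rescale scores to [0, 1] so different filters are comparable."""
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores, dtype=float)
    return (scores - scores.min()) / span

def fused_ranking(X, y, filters, weights):
    """Weighted sum of normalized filter scores; best feature first."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = sum(w * min_max(f(X, y)) for f, w in zip(filters, weights))
    return np.argsort(-fused), fused

# Toy data (assumed, not from the paper): only feature 0 depends on the label.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 5))
X[:, 0] += 3.0 * y

ranking, fused = fused_ranking(
    X, y, filters=[abs_correlation, fisher_score], weights=[0.5, 0.5]
)
print("feature ranking:", ranking)
```

In this sketch the weights are fixed by hand. A real fusion scheme might instead derive them from each filter's validation performance, which is a design choice the abstract leaves open.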

Feature Selection Methods in Sentiment Analysis

Proceedings of the 3rd International Conference on Networking, Information Systems & Security ◽

10.1145/3386723.3387840 ◽

2020 ◽

Author(s):

Nurilhami Izzatie Khairi ◽

Azlinah Mohamed ◽

Nor Nadiah Yusof

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Selection Methods

A Unified View of Causal and Non-causal Feature Selection

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3436891 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-46

Author(s):

Kui Yu ◽

Lin Liu ◽

Jiuyong Li

Keyword(s):

Feature Selection ◽

Bayesian Network ◽

Synthetic Data ◽

Selection Methods ◽

Bayesian Network Model ◽

Real World Data ◽

Feature Sets ◽

Unified View ◽

Optimal Feature ◽

Different Levels

In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view will fill the gap in research on the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective: finding the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximation employed by the methods in their search, which in turn result in approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present a practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.
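
For readers unfamiliar with the term, the standard definition of a Markov blanket can be written as follows. The notation ($C$ for the class attribute, $F$ for the full feature set, $\mathrm{MB}(C)$ for the blanket) is assumed here for illustration and may differ from the article's own.

```latex
% Markov blanket of the class attribute C within the feature set F:
% given MB(C), every remaining feature is independent of C.
\[
  C \;\perp\!\!\!\perp\; \bigl(F \setminus \mathrm{MB}(C)\bigr) \;\big|\; \mathrm{MB}(C)
\]
% Equivalent information-theoretic statement: the remaining features
% carry no additional information about C.
\[
  I\bigl(C;\, F \setminus \mathrm{MB}(C) \,\big|\, \mathrm{MB}(C)\bigr) = 0
  \quad\Longleftrightarrow\quad
  I(C; F) = I\bigl(C; \mathrm{MB}(C)\bigr)
\]
```

In a Bayesian network that is faithful to the data distribution, the Markov blanket of $C$ consists of its parents, its children, and the other parents of its children (spouses).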
