Comparative Study of Microarray Based Disease Prediction - A Survey

Recognition of genetic expression becomes an important issue for research while diagnosing genetic diseases. Microarrays are considered as the representation for identifying gene behaviors that may help in detection process. Hence, it is used in analyzing samples that may be normal or affected, also in diagnosing various gene-based diseases. Various clustering and classification techniques were used to face the challenges in handling microarray. High dimensional data is one of the major issues caused while handling microarray. Also because of this issue, possibilities of redundant, irrelevant and noisy data may occur. To solve this problem feature selection process which optimally extracts the features is introduced in clustering in classification techniques. This survey observes some various techniques of classification, clustering of genes and feature selection methods such as supervised, unsupervised and semi-supervised methods. To determine the suitable semi-supervised algorithm that combines and analyze for detecting new or difficult mutated disease. This survey shows that how semi-supervised approach evolves and outperforms the existing algorithms.

Download Full-text

AN ANALYSIS ON FEATURE SELECTION METHODS, CLUSTERING AND CLASSIFICATION USED IN HEART DISEASE PREDICTION –A MACHINE LEARNING APPROACH

Journal of Critical Reviews ◽

10.31838/jcr.07.06.27 ◽

2020 ◽

Vol 7 (06) ◽

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Heart Disease ◽

Learning Approach ◽

Disease Prediction ◽

Selection Methods ◽

Machine Learning Approach ◽

Clustering And Classification

Download Full-text

Enhanced Filter Feature Selection Methods for Arabic Text Categorization

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018040101 ◽

2018 ◽

Vol 8 (2) ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Abdullah Saeed Ghareb ◽

Azuraliza Abu Bakara ◽

Qasem A. Al-Radaideh ◽

Abdul Razak Hamdan

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Selection Process ◽

High Dimensional Data ◽

Relevant Information ◽

High Dimensional ◽

Arabic Text ◽

Relevant Feature ◽

Associative Classification ◽

Selection Methods

The filtering of a large amount of data is an important process in data mining tasks, particularly for the categorization of unstructured high dimensional data. Therefore, a feature selection process is desired to reduce the space of high dimensional data into small relevant subset dimensions that represent the best features for text categorization. In this article, three enhanced filter feature selection methods, Category Relevant Feature Measure, Modified Category Discriminated Measure, and Odd Ratio2, are proposed. These methods combine the relevant information about features in both the inter- and intra-category. The effectiveness of the proposed methods with Naïve Bayes and associative classification is evaluated by traditional measures of text categorization, namely, macro-averaging of precision, recall, and F-measure. Experiments are conducted on three Arabic text datasets used for text categorization. The experimental results showed that the proposed methods are able to achieve better and comparable results when compared to 12 well known traditional methods.

Download Full-text

Opportunities and Challenges of Feature Selection Methods for High Dimensional Data: A Review

Ingénierie des systèmes d information ◽

10.18280/isi.260107 ◽

2021 ◽

Vol 26 (1) ◽

pp. 67-77

Author(s):

Siva Sankari Subbiah ◽

Jayakumar Chinnappan

Keyword(s):

Feature Selection ◽

Big Data ◽

Large Scale ◽

High Dimensional Data ◽

Research Work ◽

Basic Feature ◽

High Dimensional ◽

Selection Methods ◽

Fast Development ◽

Improved Accuracy

Now a day, all the organizations collecting huge volume of data without knowing its usefulness. The fast development of Internet helps the organizations to capture data in many different formats through Internet of Things (IoT), social media and from other disparate sources. The dimension of the dataset increases day by day at an extraordinary rate resulting in large scale dataset with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing the high dimensional data with reduced complexity and improved accuracy. In the modern big data world the feature selection has a significance in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed by researchers for obtaining more relevant features especially from the big datasets that helps to provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, challenges of feature selection and provides the summary of the related research work done by various researchers. As a result, the big data analysis with the feature selection improves the accuracy of the learning.

Download Full-text

Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures

BioMed Research International ◽

10.1155/2019/2497509 ◽

2019 ◽

Vol 2019 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Suyan Tian ◽

Chi Wang ◽

Bing Wang

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Gene Selection ◽

Selection Process ◽

Biological Knowledge ◽

Expression Data ◽

Selection Methods ◽

Its Gene ◽

Active Research

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, bilevel selection, and pathway-guided gene selection. With bilevel selection methods being regarded as a special case of pathway-guided gene selection process, we discuss pathway-guided gene selection methods in detail and the importance of penalization in such methods. Last, we point out the potential utilizations of pathway-guided gene selection in one active research avenue, namely, to analyze longitudinal gene expression data. We believe this article provides valuable insights for computational biologists and biostatisticians so that they can make biology more computable.

Download Full-text

Ranking Based Unsupervised Feature Selection Methods: An Empirical Comparative Study in High Dimensional Datasets

Advances in Soft Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-04491-6_16 ◽

2018 ◽

pp. 205-218

Author(s):

Saúl Solorio-Fernández ◽

J. Ariel Carrasco-Ochoa ◽

José Fco. Martínez-Trinidad

Keyword(s):

Feature Selection ◽

Comparative Study ◽

High Dimensional ◽

Selection Methods ◽

Unsupervised Feature Selection ◽

High Dimensional Datasets

Download Full-text

Hybrid feature selection methods for high-dimensional multi-class datasets

International Journal of Data Mining Modelling and Management ◽

10.1504/ijdmmm.2017.088411 ◽

2017 ◽

Vol 9 (4) ◽

pp. 315

Author(s):

Amit Kumar Saxena ◽

Vimal Kumar Dubey ◽

John Wang

Keyword(s):

Feature Selection ◽

High Dimensional ◽

Selection Methods

Download Full-text

On the scalability of feature selection methods on high-dimensional data

Knowledge and Information Systems ◽

10.1007/s10115-017-1140-3 ◽

2017 ◽

Vol 56 (2) ◽

pp. 395-442 ◽

Cited By ~ 11

Author(s):

V. Bolón-Canedo ◽

D. Rego-Fernández ◽

D. Peteiro-Barral ◽

A. Alonso-Betanzos ◽

B. Guijarro-Berdiñas ◽

...

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Methods

Download Full-text

Robust Feature Selection on Incomplete Data

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/443 ◽

2018 ◽

Cited By ~ 1

Author(s):

Wei Zheng ◽

Xiaofeng Zhu ◽

Yonghua Zhu ◽

Shichao Zhang

Keyword(s):

Feature Selection ◽

Incomplete Data ◽

High Dimensional ◽

Data Sets ◽

Selection Methods ◽

Limited Ability ◽

Training Samples ◽

Indicator Matrix ◽

Selection Framework ◽

Incomplete Datasets

Feature selection is an indispensable preprocessing procedure for high-dimensional data analysis,but previous feature selection methods usually ignore sample diversity (i.e., every sample has individual contribution for the model construction) andhave limited ability to deal with incomplete datasets where a part of training samples have unobserved data. To address these issues, in this paper, we firstly propose a robust feature selectionframework to relieve the influence of outliers, andthen introduce an indicator matrix to avoid unobserved data to take participation in numerical computation of feature selection so that both our proposed feature selection framework and exiting feature selection frameworks are available to conductfeature selection on incomplete data sets. We further propose a new optimization algorithm to optimize the resulting objective function as well asprove our algorithm to converge fast. Experimental results on both real and artificial incompletedata sets demonstrated that our proposed methodoutperformed the feature selection methods undercomparison in terms of clustering performance.

Download Full-text

Comparative Analysis of Selected Heterogeneous Classifiers for Software Defects Prediction Using Filter-Based Feature Selection Methods

FUOYE Journal of Engineering and Technology ◽

10.46792/fuoyejet.v3i1.178 ◽

2018 ◽

Vol 3 (1) ◽

Cited By ~ 3

Author(s):

Abimbola G Akintola ◽

Abdullateef Balogun ◽

Fatimah B Lafenwa-Balogun ◽

Hameed A Mojeed

Keyword(s):

Feature Selection ◽

Multilayer Perceptron ◽

Principal Component ◽

Classification Model ◽

Selection Methods ◽

Software Defects ◽

Classification Techniques ◽

Irrelevant Attributes ◽

Neural Network Classifiers ◽

Data Program

Classification techniques is a popular approach to predict software defects and it involves categorizing modules, which is represented by a set of metrics or code attributes into fault prone (FP) and non-fault prone (NFP) by means of a classification model. Nevertheless, there is existence of low quality, unreliable, redundant and noisy data which negatively affect the process of observing knowledge and useful pattern. Therefore, researchers need to retrieve relevant data from huge records using feature selection methods. Feature selection is the process of identifying the most relevant attributes and removing the redundant and irrelevant attributes. In this study, the researchers investigated the effect of filter feature selection on classification techniques in software defects prediction. Ten publicly available datasets of NASA and Metric Data Program software repository were used. The topmost discriminatory attributes of the dataset were evaluated using Principal Component Analysis (PCA), CFS and FilterSubsetEval. The datasets were classified by the selected classifiers which were carefully selected based on heterogeneity. Naïve Bayes was selected from Bayes category Classifier, KNN was selected from Instance Based Learner category, J48 Decision Tree from Trees Function classifier and Multilayer perceptron was selected from the neural network classifiers. The experimental results revealed that the application of feature selection to datasets before classification in software defects prediction is better and should be encouraged and Multilayer perceptron with FilterSubsetEval had the best accuracy. It can be concluded that feature selection methods are capable of improving the performance of learning algorithms in software defects prediction.

Download Full-text

SPARSITY SCORE: A NOVEL GRAPH-PRESERVING FEATURE SELECTION METHOD

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001414500098 ◽

2014 ◽

Vol 28 (04) ◽

pp. 1450009 ◽

Cited By ~ 16

Author(s):

MINGXIA LIU ◽

DAOQIANG ZHANG

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Original Data ◽

Selection Method ◽

Compact Representation ◽

Data Sets ◽

Selection Methods ◽

Clustering And Classification ◽

Filter Type ◽

Selection Framework

As thousands of features are available in many pattern recognition and machine learning applications, feature selection remains an important task to find the most compact representation of the original data. In the literature, although a number of feature selection methods have been developed, most of them focus on optimizing specific objective functions. In this paper, we first propose a general graph-preserving feature selection framework where graphs to be preserved vary in specific definitions, and show that a number of existing filter-type feature selection algorithms can be unified within this framework. Then, based on the proposed framework, a new filter-type feature selection method called sparsity score (SS) is proposed. This method aims to preserve the structure of a pre-defined l1 graph that is proven robust to data noise. Here, the modified sparse representation based on an l1-norm minimization problem is used to determine the graph adjacency structure and corresponding affinity weight matrix simultaneously. Furthermore, a variant of SS called supervised SS (SuSS) is also proposed, where the l1 graph to be preserved is constructed by using only data points from the same class. Experimental results of clustering and classification tasks on a series of benchmark data sets show that the proposed methods can achieve better performance than conventional filter-type feature selection methods.

Download Full-text