Population-Based Feature Selection for Biomedical Data Classification

Biomedical Engineering ◽

10.4018/978-1-5225-3158-6.ch008 ◽

2018 ◽

pp. 199-231

Author(s):

Seyed Jalaleddin Mousavirad ◽

Hossein Ebrahimpour-Komleh

Keyword(s):

Feature Selection ◽

Learning Algorithm ◽

Selection Process ◽

Data Classification ◽

Population Based ◽

Statistical Characteristics ◽

Biomedical Data ◽

Filter Methods ◽

Embedded Methods

Classification of biomedical data plays a significant role in the prediction and diagnosis of disease. Redundant and irrelevant features are one of the major problems in biomedical data classification, and excluding them can improve the performance of the classification algorithm. Feature selection is the problem of selecting a subset of features without reducing the accuracy achieved with the original feature set. Feature selection algorithms are divided into three categories: wrapper, filter, and embedded methods. Wrapper methods use the learning algorithm itself to select features, while filter methods use statistical characteristics of the data. In embedded methods, the feature selection process is combined with the learning process. Population-based metaheuristics can be applied to wrapper feature selection. These algorithms create a population of candidate solutions and then try to improve the objective function by applying operators to that population. This chapter presents the application of population-based feature selection to the problem of high dimensionality in biomedical data classification. The results show that population-based feature selection achieves acceptable performance in biomedical data classification.
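The wrapper idea described in this abstract can be illustrated with a minimal sketch. It is not the chapter's method: the genetic-style operators, the leave-one-out 1-nearest-neighbour fitness, the toy data, and all function names (`loo_1nn_accuracy`, `population_feature_selection`, `make_toy_data`) are assumptions chosen to keep the example self-contained.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Wrapper fitness: leave-one-out accuracy of a 1-nearest-neighbour
    classifier that only sees the features switched on in `mask`."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def population_feature_selection(X, y, pop_size=12, generations=15, seed=0):
    """Evolve a population of boolean feature masks (hypothetical operators:
    truncation selection, one-point crossover, single bit-flip mutation)."""
    rng = random.Random(seed)
    n = len(X[0])
    # Candidate solutions: the "keep everything" mask plus random masks.
    population = [[True] * n] + [
        [rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)
    ]

    def fitness(mask):
        return loo_1nn_accuracy(X, y, mask)

    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]      # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n)          # mutate one bit
            child[flip] = not child[flip]
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)

def make_toy_data(n_samples=30, n_noise=3, seed=1):
    """Synthetic data: feature 0 carries the class signal, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([label * 2.0 + rng.gauss(0, 0.3)]
                 + [rng.gauss(0, 2.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

X, y = make_toy_data()
mask, acc = population_feature_selection(X, y)
print(mask, acc)
```

Because the best half of each generation is carried over unchanged and the initial population contains the all-features mask, the returned subset never scores below the full feature set on this fitness function.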


Feature Selection using FFS and PCA in Biomedical Data Classification with AdaBoost-SVM

International Journal of Intelligent Systems and Applications in Engineering ◽

10.18201/ijisae.2018637928 ◽

2018 ◽

Vol 1 (6) ◽

pp. 33-39

Author(s):

Rahime Ceylan

Keyword(s):

Feature Selection ◽

Data Classification ◽

Biomedical Data


Feature selection algorithm for high dimensional biomedical data classification based on redundant removal

10.14236/ewic/hci2018.232 ◽

2018 ◽

Author(s):

Bingtao Zhang ◽

Peng Cao ◽

Yi Zhang ◽

Chaochao Zhang ◽

Zhe Li ◽

...

Keyword(s):

Feature Selection ◽

Data Classification ◽

High Dimensional ◽

Biomedical Data ◽

Selection Algorithm ◽

Feature Selection Algorithm


Classification of Online Grooming on Chat Logs Using Two Term Weighting Schemes

International Journal of Innovative Computing ◽

10.11113/ijic.v9n2.239 ◽

2019 ◽

Vol 9 (2) ◽

Author(s):

Nur Rafeeqkha Sulaiman ◽

Maheyzah Md. Siraj

Keyword(s):

Feature Selection ◽

Selection Process ◽

Support Vector ◽

Term Weighting ◽

Human Errors ◽

Weighting Schemes ◽

Intelligence Gathering ◽

Internet Users ◽

Document Frequency

With the growth of the Internet, it has become not only a medium for obtaining information but also a platform for communication. Social Network Services (SNS) are among the main platforms where Internet users communicate by distributing and sharing information and knowledge. Chatting has become a popular communication medium through which users can communicate directly and privately with each other. However, because of the privacy of chat rooms and other chat media, the content of chat logs is neither monitored nor filtered, which makes it easier for cyber predators to target their victims. Cyber groomers are cyber predators who prey on children or minors to satisfy their sexual desires. Experts involved in intelligence gathering face persistent difficulty as the complexity of crime increases, compounded by human error and time constraints. Hence, it is difficult to prevent undesired content, such as grooming conversations, in chat logs. Two term weighting schemes are investigated on two datasets to improve content-based classification techniques. This study aims to improve content-based classification accuracy on chat logs by comparing the two schemes in classifying grooming content. The two term weighting schemes, Term Frequency – Inverse Document Frequency – Inverse Class Space Density Frequency (TF.IDF.ICSdF) and Fuzzy Rough Feature Selection (FRFS), are used as the feature selection process in filtering chat logs. The performance of these techniques was examined on the datasets, and the accuracy of their results was measured using a Support Vector Machine (SVM). TF.IDF.ICSdF and FRFS are compared on accuracy, precision, recall, and F-score.
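As a rough illustration of term weighting, the sketch below computes plain TF-IDF weights for tokenized messages. It deliberately omits the ICSdF factor and FRFS, whose definitions are not given in this abstract; the function name `tf_idf` and the example tokens are assumptions.

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Return one {term: weight} dict per document.
    tf = term count / document length, idf = ln(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized chat messages.
chat = [["hello", "friend"], ["hello", "secret"]]
for row in tf_idf(chat):
    print(row)
```

A term that occurs in every document, such as "hello" here, receives weight zero, so only the discriminating terms remain as candidate features.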


Analysis of DNA Sequence Classification Using CNN and Hybrid Models

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/1835056 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Hemalatha Gunasekaran ◽

K. Ramalakshmi ◽

A. Rex Macedo Arokiaraj ◽

S. Deepa Kanmani ◽

Chandran Venkatesan ◽

...

Keyword(s):

Dna Sequence ◽

Selection Process ◽

High Accuracy ◽

Machine Learning Techniques ◽

Biomedical Data ◽

Sequence Classification ◽

Learning Techniques ◽

Testing Data ◽

Bidirectional Lstm

In the general computational context of biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19, and they also help in detecting the effects of viruses and in drug design. Nevertheless, the feature selection process remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations worsen the high dimensionality. Recently, deep learning (DL) models have made it possible to extract features from the input automatically. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using label and K-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. In the experimental results, the CNN and CNN-Bidirectional LSTM with K-mer encoding offer high accuracy, 93.16% and 93.13%, respectively, on testing data.
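The two encodings named in this abstract can be sketched as follows. The exact schemes used in the paper are not specified here, so the integer assignments, the overlapping window with stride 1, and the function names (`label_encode`, `kmers`, `kmer_encode`) are assumptions.

```python
def label_encode(seq, alphabet="ACGT"):
    """Map each base to a small integer; unknown symbols map to 0."""
    index = {base: i + 1 for i, base in enumerate(alphabet)}
    return [index.get(base, 0) for base in seq.upper()]

def kmers(seq, k=3):
    """Split a sequence into overlapping substrings of length k."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_encode(seq, vocab, k=3):
    """Replace each k-mer with an integer id, growing `vocab` as needed."""
    return [vocab.setdefault(km, len(vocab) + 1) for km in kmers(seq, k)]

vocab = {}
print(label_encode("ACGT"))          # [1, 2, 3, 4]
print(kmers("ATGCA"))                # ['ATG', 'TGC', 'GCA']
print(kmer_encode("ATGATG", vocab))  # [1, 2, 3, 1]
```

The resulting integer sequences are the kind of input an embedding layer in front of a CNN or LSTM would typically consume.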


MapReduce-based big data classification model using feature subset selection and hyperparameter tuned deep belief network

Scientific Reports ◽

10.1038/s41598-021-03019-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Surendran Rajendran ◽

Osamah Ibrahim Khalaf ◽

Youseef Alotaibi ◽

Saleh Alghamdi

Keyword(s):

Feature Selection ◽

Big Data ◽

Selection Process ◽

Data Classification ◽

Deep Belief Network ◽

Feature Subset Selection ◽

Classification Model ◽

Feature Subset ◽

Belief Network ◽

Big Data Classification

In recent times, big data classification has become a hot research topic in various domains, such as healthcare, e-commerce, and finance. Including a feature selection process helps to improve big data classification, and it can be carried out with metaheuristic optimization algorithms. This study focuses on the design of a big data classification model using chaotic pigeon inspired optimization (CPIO)-based feature selection with an optimal deep belief network (DBN) model. The proposed model is executed in the Hadoop MapReduce environment to manage big data. Initially, the CPIO algorithm is applied to select a useful subset of features. In addition, the Harris hawks optimization (HHO)-based DBN model is derived as a classifier to allocate appropriate class labels. Using the HHO algorithm to tune the hyperparameters of the DBN model helps to boost classification performance. To examine the superiority of the presented technique, a series of simulations was performed, and the results were inspected under various dimensions. The results highlighted the superiority of the presented technique over recent techniques.


Classification of Diabetes using Random Forest with Feature Selection Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3595.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 1295-1300 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Electronic Health Records ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Health Records

Diabetes has become a serious problem nowadays, so serious precautions are needed to eradicate it. To do so, we must first know its level of occurrence. In this project, we predict the level of occurrence of diabetes using Random Forest, a machine learning algorithm. Using patients' Electronic Health Records (EHR), we can build accurate models that predict the presence of diabetes.


A multivariate feature selection framework for high dimensional biomedical data classification

2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽

10.1109/cibcb.2017.8058528 ◽

2017 ◽

Cited By ~ 1

Author(s):

Abeer Alzubaidi ◽

Georgina Cosma

Keyword(s):

Feature Selection ◽

Data Classification ◽

High Dimensional ◽

Biomedical Data ◽

Selection Framework


A Hybrid Feature Selection Based on Mutual Information and Genetic Algorithm

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v7.i1.pp214-225 ◽

2017 ◽

Vol 7 (1) ◽

pp. 214-225

Author(s):

Yuan-Dong Lan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Mutual Information ◽

Learning Algorithm ◽

Selection Process ◽

Optimal Subset ◽

Benchmark Datasets ◽

Two Phases ◽

Feature Dimension ◽

Necessary And Sufficient

Feature selection aims to choose an optimal subset of features that are necessary and sufficient to improve the generalization performance and the running efficiency of the learning algorithm. To obtain the optimal subset in the feature selection process, this paper proposes a hybrid feature selection method based on mutual information and a genetic algorithm. To make full use of the advantages of the filter and wrapper models, the algorithm is divided into two phases: the filter phase and the wrapper phase. In the filter phase, the algorithm first uses mutual information to rank the features, which provides heuristic information for the subsequent genetic algorithm and accelerates its search. In the wrapper phase, the genetic algorithm serves as the search strategy and looks for the best feature subset, using the performance of the classifier and the dimension of the subset as the evaluation criterion. Experimental results on benchmark datasets show that the proposed algorithm achieves higher classification accuracy and a smaller feature dimension, and that its running time is shorter than that of using the genetic algorithm.
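A minimal sketch of the two ingredients described in this abstract: a mutual information ranking for the filter phase, and a fitness value that combines classifier accuracy with subset size for the wrapper phase. The paper's actual formulas are not given here, so the discrete-feature estimator, the weighting parameter `alpha`, and the function names are assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def rank_features(X, y):
    """Filter phase: feature indices sorted by decreasing MI with the label."""
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def subset_fitness(accuracy, n_selected, n_total, alpha=0.9):
    """Wrapper phase (hypothetical weighting): reward accuracy and,
    to a lesser degree, small subsets."""
    return alpha * accuracy + (1 - alpha) * (1 - n_selected / n_total)

# Feature 0 copies the label, feature 1 is independent of it.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
print(rank_features(X, y))        # [0, 1]
print(subset_fitness(1.0, 2, 4, alpha=0.5))  # 0.75
```

One way to pass the ranking on as heuristic information would be to make highly ranked features more likely to be switched on in the genetic algorithm's initial population, although the abstract does not say how the paper does this.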
