Impact of feature representation on supervised classifiers — A comparative analysis

Abstract
Background: Nucleosomes play an important role in genome expression, DNA replication, DNA repair and transcription, so nucleosome positioning has consistently received extensive research attention. Given the diversity of DNA sequence representation methods, we integrated multiple features and analyzed their effect on nucleosome positioning analysis. This analysis can also deepen our theoretical understanding of nucleosome positioning.
Results: We used frequency chaos game representation (FCGR) to construct DNA sequence features, integrated FCGR with other features, and applied the principal component analysis (PCA) algorithm. Support vector machine (SVM), extreme learning machine (ELM), extreme gradient boosting (XGBoost), multilayer perceptron (MLP) and convolutional neural network (CNN) models were each used as predictors for nucleosome positioning. Prediction quality with the integrated feature vector is significantly better than with a single feature. After PCA was used to reduce the feature dimension, prediction quality on the H. sapiens dataset improved significantly.
Conclusions: Comparative analysis and prediction on the H. sapiens, C. elegans, D. melanogaster and S. cerevisiae datasets demonstrate that applying FCGR to nucleosome positioning is feasible, and that an integrated feature representation performs better.
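The FCGR step can be illustrated with a minimal sketch. It assumes the common chaos game convention of placing A, C, G and T at the corners (0,0), (0,1), (1,1) and (1,0) of the unit square and normalizing k-mer counts to relative frequencies; the abstract does not state the k-mer order, corner assignment or normalization that was used, so all of these are assumptions. For an order-k FCGR, every k-mer lands in exactly one cell of a 2^k by 2^k grid, and the cell holds how often that k-mer occurs. The flattened grid is the kind of vector that could then be concatenated with other features before PCA; the integration and PCA steps are not shown.

```python
from typing import List

# Assumed corner convention for the chaos game representation:
# A=(0,0), C=(0,1), G=(1,1), T=(1,0) on the unit square.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence: str, k: int) -> List[List[float]]:
    """Return the 2^k x 2^k frequency chaos game representation of `sequence`.

    Cell grid[y][x] holds the relative frequency of the k-mer mapped to (x, y).
    K-mers containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        # In the CGR midpoint iteration the last base moves the point the
        # furthest, so base j of the k-mer sets bit j (last base = top bit).
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x |= cx << j
            y |= cy << j
        grid[y][x] += 1
        total += 1
    if total:
        # Convert raw counts to relative frequencies (assumed normalization).
        grid = [[count / total for count in row] for row in grid]
    return grid


def flatten(grid: List[List[float]]) -> List[float]:
    """Row-major flattening, e.g. to concatenate with other feature vectors."""
    return [value for row in grid for value in row]
```

For example, `flatten(fcgr(seq, 4))` yields a 256-dimensional vector per sequence, which shows why dimensionality reduction such as PCA becomes attractive once several such feature blocks are concatenated.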


Comparative analysis and prediction of quorum-sensing peptides using feature representation learning and machine learning algorithms

Briefings in Bioinformatics, 2018. DOI: 10.1093/bib/bby107. Cited by ~19

Author(s): Leyi Wei, Jie Hu, Fuyi Li, Jiangning Song, Ran Su, ...

Keyword(s): Machine Learning, Comparative Analysis, Quorum Sensing, Learning Algorithms, Representation Learning, Machine Learning Algorithms, Feature Representation


A Comparative Analysis of Supervised Classifiers Employing NCA for Feature Selection to Secure Generation Control

2021 1st International Conference on Power Electronics and Energy (ICPEE), 2021. DOI: 10.1109/icpee50452.2021.9358601

Author(s): Siddhartha Deb Roy, Sanjoy Debbarma, Subhasish Deb

Keyword(s): Feature Selection, Comparative Analysis, Supervised Classifiers, Generation Control


The Influence of Feature Representation of Text on the Performance of Document Classification

Applied Sciences, 2019, Vol 9 (4), pp. 743. DOI: 10.3390/app9040743. Cited by ~4

Author(s): Sanda Martinčić-Ipšić, Tanja Miličić and Todorovski

Keyword(s): Comparative Analysis, Document Classification, Feature Representation, Superior Performance, Bag Of Words, Document Representation, Text Documents, Continuous Space, The One, Low Dimensional

In this paper we perform a comparative analysis of three models for the feature representation of text documents in the context of document classification. In particular, we consider the most commonly used family of bag-of-words models, the recently proposed continuous-space models word2vec and doc2vec, and a model that represents text documents as language networks. While bag-of-words models have been used extensively for document classification, the performance of the other two models on this task has not been well understood. This is especially true for network-based models, which have rarely been considered for representing text documents for classification. In this study, we measure the performance of document classifiers trained with random forests on features generated by the three models and their variants. Multi-objective rankings are proposed as the framework for a multi-criteria comparative analysis of the results. The results of the empirical comparison show that the commonly used bag-of-words model performs comparably to the emerging continuous-space model doc2vec. In particular, low-dimensional doc2vec variants generating up to 75 features are among the top-performing document representation models. Finally, the results indicate that doc2vec shows superior performance when classifying large documents.
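As a concrete reference point for the bag-of-words family, a minimal sketch of raw term-count vectors follows. The tokenizer (lowercase, then split on `\w+` runs) and the use of raw counts instead of a weighting scheme are assumptions for illustration; the abstract does not say which bag-of-words variants were evaluated. Each row of the resulting matrix is one document's feature vector, the kind of input a random forest classifier would be trained on.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on runs of word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(documents: List[str]) -> Dict[str, int]:
    """Map each distinct token in the corpus to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(documents: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Return the vocabulary and one raw term-count vector per document."""
    vocab = build_vocabulary(documents)
    matrix = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        row = [0] * len(vocab)
        for tok, n in counts.items():
            row[vocab[tok]] = n
        matrix.append(row)
    return vocab, matrix
```

Sorting the vocabulary keeps the column order reproducible across runs. The vector length equals the vocabulary size, which for real corpora is typically far larger than the up-to-75-feature doc2vec variants mentioned above.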
