feature hashing Latest Research Papers

Most common machine-learning methods solve supervised and unsupervised problems on datasets whose features belong to a numerical space. However, many problems involve data in which numerical and categorical features coexist, which makes the data difficult to handle. Transforming categorical data into numeric form requires a preprocessing step. One-hot and feature-hashing encoding have been the most widely used approaches, at the expense of a significant increase in the dimensionality of the dataset. This increase introduces further challenges in dealing with an overabundance of variables and/or noisy data. In this paper we propose a novel encoding approach that maps mixed-type data into an information space, using Shannon’s theory to model the amount of information contained in the original data. We evaluated our proposal on ten mixed-type datasets from the UCI repository and two datasets representing real-world problems, obtaining promising results. To demonstrate its performance, we applied the proposal to prepare these datasets for classification, regression, and clustering tasks. Our encoding proposal is remarkably superior to one-hot and feature-hashing encoding in terms of memory efficiency, and it can preserve the information conveyed by the original data.
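To make the two baseline encodings named in the abstract concrete, the following is a minimal sketch of one-hot and feature-hashing encoding for a single categorical column. It is not the encoding proposed in the paper. The function names (`one_hot`, `hash_encode`), the `n_buckets` parameter, the use of MD5 as a stable hash, and the sample data are all assumptions made for illustration.

```python
import hashlib


def one_hot(values):
    """One-hot encoding: one output column per distinct category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows


def hash_encode(values, n_buckets=4):
    """Feature hashing: a fixed number of columns, chosen by a hash.

    MD5 is used only as a stable, non-cryptographic hash here (an
    assumption of this sketch). One extra bit of the hash picks the
    sign, as in the signed variant of the hashing trick.
    """
    rows = []
    for v in values:
        digest = hashlib.md5(v.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        bucket = h % n_buckets
        sign = 1 if (h >> 63) & 1 == 0 else -1
        row = [0] * n_buckets
        row[bucket] = sign
        rows.append(row)
    return rows


# Hypothetical sample column with six distinct categories.
colors = ["red", "green", "blue", "green", "teal", "olive", "navy"]
one_hot_rows = one_hot(colors)
hashed_rows = hash_encode(colors, n_buckets=4)
print(len(one_hot_rows[0]), len(hashed_rows[0]))  # prints "6 4"
```

In this sketch the one-hot width grows with the number of distinct categories, while the hashed width stays at `n_buckets`, with different categories possibly colliding in the same column.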

Effect of Feature Hashing on Fair Classification

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD ◽

10.1145/3371158.3371230 ◽

2020 ◽

Author(s):

Ritik Dutta ◽

Varun Gohil ◽

Atishay Jain

Keyword(s):

Feature Hashing

Local Feature Hashing With Binary Auto-Encoder for Face Recognition

IEEE Access ◽

10.1109/access.2020.2973472 ◽

2020 ◽

Vol 8 ◽

pp. 37526-37540

Author(s):

Jing Chen ◽

Yunxiao Zu

Keyword(s):

Face Recognition ◽

Local Feature ◽

Feature Hashing

Local Feature Hashing with Graph Regularized Binary Auto-encoder for Face Recognition

2019 11th International Conference on Wireless Communications and Signal Processing (WCSP) ◽

10.1109/wcsp.2019.8928087 ◽

2019 ◽

Author(s):

Jing Chen ◽

Yunxiao Zu

Keyword(s):

Face Recognition ◽

Local Feature ◽

Feature Hashing

Extremely Fast and Accurate Open Modification Spectral Library Searching of High-Resolution Mass Spectra Using Feature Hashing and Graphics Processing Units

Journal of Proteome Research ◽

10.1021/acs.jproteome.9b00291 ◽

2019 ◽

Vol 18 (10) ◽

pp. 3792-3799 ◽

Cited By ~ 7

Author(s):

Wout Bittremieux ◽

Kris Laukens ◽

William Stafford Noble

Keyword(s):

High Resolution ◽

Graphics Processing Units ◽

Mass Spectra ◽

Spectral Library ◽

High Resolution Mass ◽

Graphics Processing ◽

Feature Hashing ◽

Resolution Mass

Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

10.1101/627497 ◽

2019 ◽

Cited By ~ 1

Author(s):

Wout Bittremieux ◽

Kris Laukens ◽

William Stafford Noble

Keyword(s):

High Resolution ◽

Graphics Processing Units ◽

Protein Modification ◽

Large Scale ◽

Computational Cost ◽

Selection Procedure ◽

Spectral Library ◽

Data Set ◽

Graphics Processing ◽

Feature Hashing

Abstract: Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
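The idea of hashing a high-resolution spectrum into a low-dimensional vector can be sketched as follows. This is a generic illustration of feature hashing applied to binned peaks, not ANN-SoLo's actual implementation. The function name `hash_spectrum`, the bin width, the vector length, the MD5-based hash, and the example peaks are all assumptions.

```python
import hashlib
import math


def hash_spectrum(peaks, bin_width=0.05, n_dims=16):
    """Map (m/z, intensity) peaks to a fixed-length, unit-norm vector.

    Each peak falls into a narrow m/z bin, and the bin index is hashed
    to one of n_dims slots. Intensities landing in the same slot add up.
    All parameter values are hypothetical.
    """
    vector = [0.0] * n_dims
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)
        digest = hashlib.md5(str(bin_index).encode("ascii")).digest()
        slot = int.from_bytes(digest[:8], "big") % n_dims
        vector[slot] += intensity
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else vector


# Hypothetical query spectrum: three (m/z, intensity) peaks.
query = [(147.113, 30.0), (500.251, 100.0), (873.402, 55.0)]
vec = hash_spectrum(query)
print(len(vec))
```

Because the vector length is fixed in advance, spectra hashed this way can be compared with a dot product or fed to a nearest-neighbor index without storing one dimension per fine-grained mass bin.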

Sentiment Analysis of Twitter Feeds, Effect of Feature Hashing on Model Accuracy

2018 IEEE 7th International Conference on Adaptive Science & Technology (ICAST) ◽

10.1109/icastech.2018.8507109 ◽

2018 ◽

Keyword(s):

Sentiment Analysis ◽

Model Accuracy ◽

Feature Hashing

feature hashing
Recently Published Documents

Feature Hashing with Insertion and Deletion of Features

Compact feature hashing for machine learning based malware detection

A New Feature Hashing Approach Based on Term Weight for Dimensional Reduction

A Memory-Efficient Encoding Method for Processing Mixed-Type Data on Machine Learning

Effect of Feature Hashing on Fair Classification

Local Feature Hashing With Binary Auto-Encoder for Face Recognition

Local Feature Hashing with Graph Regularized Binary Auto-encoder for Face Recognition

Extremely Fast and Accurate Open Modification Spectral Library Searching of High-Resolution Mass Spectra Using Feature Hashing and Graphics Processing Units

Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units

Sentiment Analysis of Twitter Feeds, Effect of Feature Hashing on Model Accuracy
