Supervised Short-Length Hashing

Author(s):  
Xingbo Liu ◽  
Xiushan Nie ◽  
Quan Zhou ◽  
Xiaoming Xi ◽  
Lei Zhu ◽  
...  

Hashing can compress high-dimensional data into compact binary codes while preserving similarity, which facilitates efficient retrieval and storage. However, when retrieval is performed with an extremely short hash code learned by existing methods, performance cannot be guaranteed because of severe information loss. To address this issue, in this study we propose a novel supervised short-length hashing (SSLH) method. In the proposed SSLH, mutual reconstruction between the short-length hash codes and the original features is performed to reduce semantic loss. Furthermore, to enhance the robustness and accuracy of the hash representation, a robust estimator term is added to fully utilize the label information. Extensive experiments conducted on four image benchmarks demonstrate the superior performance of the proposed SSLH with short-length hash codes. In addition, the proposed SSLH outperforms existing methods with long-length hash codes. To the best of our knowledge, this is the first linear-based hashing method that focuses on both short- and long-length hash codes while maintaining high precision.
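
As a rough illustration of the mutual-reconstruction idea, the following minimal numpy sketch writes a loss in which the codes are reconstructed from the features and the features are reconstructed from the codes. The projection matrices W and P, the weights alpha and beta, and the continuous surrogate for the binary codes are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def mutual_reconstruction_loss(X, B, W, P, alpha=1.0, beta=1.0):
    """Illustrative mutual-reconstruction objective (not the paper's exact model):
    the codes should be recoverable from the features (X W ~ B) and the features
    should be recoverable from the codes (B P ~ X)."""
    code_from_feat = np.linalg.norm(B - X @ W, 'fro') ** 2   # features -> codes
    feat_from_code = np.linalg.norm(X - B @ P, 'fro') ** 2   # codes -> features
    return alpha * code_from_feat + beta * feat_from_code

# Toy usage with random data: n samples, d-dim features, r-bit codes.
n, d, r = 100, 64, 8
X = np.random.randn(n, d)
B = np.sign(np.random.randn(n, r))   # surrogate binary codes in {-1, +1}
W = np.random.randn(d, r)            # hypothetical feature-to-code projection
P = np.random.randn(r, d)            # hypothetical code-to-feature projection
print(mutual_reconstruction_loss(X, B, W, P))
```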

Author(s):  
Xingbo Liu ◽  
Xiushan Nie ◽  
Yingxin Wang ◽  
Yilong Yin

Hashing can compress heterogeneous high-dimensional data into compact binary codes while preserving similarity, which facilitates efficient retrieval and storage; hashing has therefore recently received much attention from information retrieval researchers. Most existing hashing methods first predefine a fixed length (e.g., 32, 64, or 128 bits) for the hash codes and then learn them at this fixed length. However, one sample can be represented by hash codes of different lengths, and there must be associations and relationships among these codes because they represent the same sample. Harnessing these relationships should therefore boost the performance of hashing methods. Inspired by this possibility, in this study we propose a new model, jointly multiple hash learning (JMH), which can learn hash codes of multiple lengths simultaneously. In the proposed JMH method, three types of information are used for hash learning: hash codes of different lengths, the original features of the samples, and the labels. In contrast to existing hashing methods, JMH can learn hash codes of different lengths in one step, and users can select the appropriate hash codes for their retrieval tasks according to their requirements in terms of accuracy and complexity. To the best of our knowledge, JMH is one of the first attempts to learn multi-length hash codes simultaneously. In addition, discrete and closed-form solutions for the variables can be obtained by cyclic coordinate descent, which makes the proposed model much faster to train. Extensive experiments were performed on three benchmark datasets, and the results demonstrated the superior performance of the proposed method.
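
Closed-form discrete updates of this kind are typically obtained with a discrete cyclic coordinate descent scheme that fixes all bits except one and solves for that bit in closed form. The sketch below illustrates this general idea on a generic least-squares subproblem; the objective ||M - BV||_F^2, the matrices M and V, and the sweep count are assumptions, not JMH's exact subproblem.

```python
import numpy as np

def dcc_update_codes(M, V, B, n_sweeps=3):
    """Illustrative discrete cyclic coordinate descent (DCC):
    minimize ||M - B V||_F^2 over binary B in {-1,+1}^{n x r},
    updating one bit (one column of B) at a time in closed form."""
    n, r = B.shape
    for _ in range(n_sweeps):
        for k in range(r):
            # Residual with the k-th bit's contribution removed.
            R = M - B @ V + np.outer(B[:, k], V[k, :])
            # Closed-form minimizer of the k-th bit column.
            b = np.sign(R @ V[k, :])
            b[b == 0] = 1
            B[:, k] = b
    return B

# Toy usage: n samples, r-bit codes, d-dim target.
n, r, d = 200, 16, 32
M = np.random.randn(n, d)
V = np.random.randn(r, d)
B = np.sign(np.random.randn(n, r))
B = dcc_update_codes(M, V, B)
```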


2021 ◽  
Vol 8 (5) ◽  
pp. 1391-1406
Author(s):  
Yuzhi Fang ◽  
Li Liu

Online hashing methods aim to learn compact binary codes for a new data stream and to update the hash function so as to renew the codes of the existing data. However, the addition of new data streams has a vital impact on the retrieval performance of the entire system, and in particular the similarity measurement between new data streams and existing data has long been a focus of online retrieval research. In this paper, we present a novel scalable supervised online hashing method to solve these problems within a unified framework. Specifically, the similarity matrix is established from the label matrices of the existing data and the new data stream. The projection of the existing data's label matrix is then used as an intermediate term to approximate the binary codes of the existing data, which not only injects semantic information into hash code learning but also effectively alleviates the problem of data imbalance. In addition, an alternating optimization algorithm is proposed to solve the model efficiently. Extensive experiments on three widely used datasets validate its superior performance over several state-of-the-art methods in terms of both accuracy and scalability for online retrieval tasks.
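
As an illustration of how such a similarity matrix can be derived from label matrices alone, the sketch below marks a new sample and an existing sample as similar when they share at least one label. The +1/-1 encoding and the function name are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def label_similarity(L_new, L_old):
    """Illustrative pairwise semantic similarity between a new data chunk and the
    existing data, derived from their (multi-)label matrices:
    S[i, j] = +1 if new sample i and existing sample j share a label, else -1."""
    shared = L_new @ L_old.T                 # counts of shared labels per pair
    return np.where(shared > 0, 1.0, -1.0)

# Toy usage: 5 new samples and 8 existing samples over 3 classes (one-hot labels).
L_new = np.eye(3)[np.random.randint(0, 3, size=5)]
L_old = np.eye(3)[np.random.randint(0, 3, size=8)]
S = label_similarity(L_new, L_old)           # shape (5, 8)
```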


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Wenyan Pan ◽  
Meimin Wang ◽  
Jiaohua Qin ◽  
Zhili Zhou

As more and more image data are stored in encrypted form in cloud computing environments, how to efficiently retrieve images in the encrypted domain has become an urgent problem. Recently, Convolutional Neural Network (CNN) features have achieved promising performance in image retrieval, but their high dimensionality leads to low retrieval efficiency, and they are not suitable for direct application to retrieval in the encrypted domain. To address these issues, this paper proposes an improved CNN-based hashing method for encrypted image retrieval. First, the input image is enlarged before being fed into the CNN to improve the representation ability. Then, a lightweight module is introduced to replace part of the modules in the CNN to reduce the number of parameters and the computational cost. Finally, a hash layer is added to generate a compact binary hash code. In the retrieval process, the hash code is used for encrypted image retrieval, which greatly improves the retrieval efficiency. The experimental results show that the scheme allows effective and efficient retrieval of encrypted images.
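
A minimal PyTorch sketch of the hash-layer idea, written under our own assumptions: the feature dimension, bit length, tanh relaxation, and the class name HashHead are illustrative, and the enlarged input and lightweight-module substitution from the paper are not shown.

```python
import torch
import torch.nn as nn

class HashHead(nn.Module):
    """Illustrative hash layer: maps backbone features to k-bit codes.
    tanh gives a relaxed code during training; sign binarizes at retrieval time."""
    def __init__(self, feat_dim=512, n_bits=64):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_bits)

    def forward(self, feats):
        return torch.tanh(self.fc(feats))        # relaxed codes in (-1, 1)

    @torch.no_grad()
    def binary_codes(self, feats):
        return torch.sign(self.forward(feats))   # {-1, +1} codes for retrieval

# Toy usage with random "features" standing in for CNN outputs.
head = HashHead(feat_dim=512, n_bits=64)
codes = head.binary_codes(torch.randn(4, 512))   # 4 images -> 4 x 64 codes
```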


2019 ◽  
Vol 9 (15) ◽  
pp. 3097 ◽  
Author(s):  
Diego Renza ◽  
Jaime Andres Arango ◽  
Dora Maria Ballesteros

This paper addresses a problem in the field of audio forensics. With the aim of supporting Chain of Custody (CoC) processes, we propose an integrity verification system that includes capture (mobile based), hash code calculation, and cloud storage. When the audio is recorded, a hash code is generated in situ by the capture module (an application) and is sent immediately to the cloud. Later, the integrity of an audio recording given as evidence can be verified against the information stored in the cloud. To validate the properties of the proposed scheme, we conducted several tests to evaluate whether two different inputs could generate the same hash code (collision resistance) and to evaluate how much the hash code changes when small changes occur in the input (sensitivity analysis). According to the results, all selected audio signals produce different hash codes, and these values are very sensitive to small changes in the recorded audio. In terms of computational cost, less than 2 s per minute of recording is required to calculate the hash code. With these results, our system is useful for verifying the integrity of audio recordings that may be relied on as digital evidence.
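
Purely as an illustration of the capture-side workflow (hash the recording in situ, then upload), the sketch below uses SHA-256 as a stand-in for the paper's hash scheme and a hypothetical registration endpoint; none of these specifics come from the paper.

```python
import hashlib
import json
import time
import urllib.request

def audio_hash(wav_path, chunk_size=1 << 16):
    """Stand-in integrity hash (SHA-256) computed over the recorded file.
    The paper uses its own hash scheme; SHA-256 only illustrates the workflow."""
    h = hashlib.sha256()
    with open(wav_path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def register_evidence(wav_path, endpoint='https://example.org/coc/register'):
    """Send the hash (plus a timestamp) to cloud storage right after capture.
    The endpoint URL and payload format are hypothetical."""
    payload = json.dumps({'hash': audio_hash(wav_path),
                          'timestamp': time.time()}).encode('utf-8')
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={'Content-Type': 'application/json'})
    return urllib.request.urlopen(req)
```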


2021 ◽  
pp. 1-59
Author(s):  
George Cheng ◽  
G. Gary Wang ◽  
Yeong-Maw Hwang

Multi-objective optimization (MOO) problems with computationally expensive constraints are common in real-world engineering design. However, metamodel-based design optimization (MBDO) approaches for MOO are often unsuitable for high-dimensional problems and often do not support expensive constraints. In this work, the Situational Adaptive Kreisselmeier and Steinhauser (SAKS) method was combined with a new multi-objective trust region optimizer (MTRO) strategy to form the SAKS-MTRO method for MOO problems with expensive black-box constraint functions. The SAKS method hybridizes the modeling and aggregation of expensive constraints and adds an adaptive strategy to control the level of hybridization. The MTRO strategy uses a combination of objective decomposition and K-means clustering to handle MOO problems. SAKS-MTRO was benchmarked against four popular multi-objective optimizers and demonstrated superior performance on average. SAKS-MTRO was also applied to optimize the design of a semiconductor substrate and the design of an industrial recessed impeller.
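
For reference, the Kreisselmeier-Steinhauser aggregation that SAKS builds on combines many constraint values into a single smooth surrogate constraint. The sketch below shows the standard, numerically stable form of this function only; the adaptive hybridization with constraint metamodels that defines SAKS is not reproduced.

```python
import numpy as np

def ks_aggregate(g, rho=50.0):
    """Kreisselmeier-Steinhauser aggregation of constraint values g_i(x) <= 0
    into a single smooth constraint (standard form):
    KS(g) = g_max + (1/rho) * log(sum_i exp(rho * (g_i - g_max)))."""
    g = np.asarray(g, dtype=float)
    g_max = g.max()
    return g_max + np.log(np.exp(rho * (g - g_max)).sum()) / rho

# The aggregate upper-bounds the worst constraint and tightens as rho grows.
print(ks_aggregate([-0.2, 0.05, -1.0], rho=50.0))   # close to 0.05
```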


Symmetry ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1193
Author(s):  
Shaochen Jiang ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Anyu Du ◽  
Yongming Li

Existing learning-based unsupervised hashing methods usually use a pre-trained network to extract features and then use the extracted feature vectors to construct a similarity matrix, which guides the generation of hash codes through gradient descent. Existing research shows that gradient-descent-based algorithms cause the hash codes of paired images to be updated toward each other's position during training. In unsupervised training, this situation causes large fluctuations in the hash codes and limits learning efficiency. In this paper, we propose a method named Deep Unsupervised Hashing with Gradient Attention (UHGA) to solve this problem. UHGA mainly comprises the following steps: (1) use pre-trained network models to extract image features; (2) calculate the cosine distance between the features of each pair of images and construct a similarity matrix from these distances to guide the generation of hash codes; (3) add a gradient attention mechanism during hash code training to attend to the gradients. Experiments on two public datasets show that our proposed method obtains more discriminative hash codes.
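
Step (2) above amounts to building a pairwise cosine-similarity matrix from the pre-trained features. A minimal numpy sketch is given below; any thresholding or scaling applied in the paper is omitted.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Normalize each feature vector and take pairwise dot products, so that
    S[i, j] is the cosine similarity between images i and j."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.maximum(norms, 1e-12)
    return normed @ normed.T

# Toy usage: 10 images with 512-dim features from a hypothetical backbone.
F = np.random.randn(10, 512)
S = cosine_similarity_matrix(F)        # 10 x 10, values in [-1, 1]
```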


2017 ◽  
Vol 27 (03n04) ◽  
pp. 1750010 ◽  
Author(s):  
Amedeo Sapio ◽  
Mario Baldi ◽  
Fulvio Risso ◽  
Narendra Anand ◽  
Antonio Nucci

Traffic capture and analysis is key to many domains, including network management, security, and network forensics. Traditionally, it is performed by a dedicated device that accesses traffic at a specific point in the network through a link tap or a mirroring port on a node. This approach is problematic because the dedicated device must be equipped with a large amount of computation and storage resources to store and analyze the packets. Alternatively, to achieve scalability, the analysis can be performed by a cluster of hosts; however, such a cluster is normally located at a remote site with respect to the observation point, which requires a large volume of captured traffic to be moved across the network. To address this problem, this paper presents an algorithm to distribute the task of capturing, processing, and storing packets traversing a network across multiple packet-forwarding nodes (e.g., IP routers). Essentially, our solution allows individual nodes on the path of a flow to operate on subsets of the packets of that flow in a completely distributed and decentralized manner. The algorithm ensures that each packet is processed by n nodes, where n can be set to 1 to minimize overhead or to a higher value to achieve redundancy. Nodes create a distributed index that enables efficient retrieval of the packets they store (e.g., for forensics applications). Finally, the basic principles of the presented solution can also be applied, with minimal changes, to the distributed execution of generic tasks on data flowing through a network of nodes with processing and storage capabilities. This has applications in various fields ranging from Fog Computing to microservice architectures and the Internet of Things.
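
One simple way to let on-path nodes pick packet subsets without coordination is to hash a packet identifier and map it deterministically to n of the nodes on the path, so every node can decide on its own whether it is responsible for a given packet. The sketch below illustrates that general idea only; the packet identifier, hash choice, and assignment rule are assumptions, not the paper's algorithm.

```python
import hashlib

def responsible_nodes(packet_id: bytes, path_nodes: list, n: int = 1):
    """Deterministically select n of the on-path nodes for a packet by hashing a
    packet identifier (e.g., the flow 5-tuple plus a per-packet field). Every
    node computes the same result independently, so no coordination is needed."""
    digest = int.from_bytes(hashlib.sha256(packet_id).digest()[:8], 'big')
    start = digest % len(path_nodes)
    return [path_nodes[(start + i) % len(path_nodes)] for i in range(n)]

# Toy usage: a 4-router path, each packet handled by n = 2 routers for redundancy.
path = ['r1', 'r2', 'r3', 'r4']
print(responsible_nodes(b'10.0.0.1,10.0.0.2,443,51512,tcp,id=7', path, n=2))
```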


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Yanduo Ren ◽  
Jiangbo Qian ◽  
Yihong Dong ◽  
Yu Xin ◽  
Huahui Chen

Nearest neighbour search (NNS) is at the core of large-scale data retrieval. Learning to hash is an effective way to address this problem by representing high-dimensional data as compact binary codes. However, existing learning-to-hash methods need long bit encodings to ensure query accuracy, and long encodings bring large storage costs, which severely restricts their application to big data. An asymmetric learning-to-hash algorithm with variable bit encoding (AVBH) is proposed to solve this problem. The AVBH algorithm uses two types of hash mapping functions to encode the dataset and the query set into bit strings of different lengths. For the dataset, the frequencies of the hash codes obtained after random Fourier feature encoding are analysed statistically: high-frequency hash codes are compressed into longer code representations, and low-frequency hash codes are compressed into shorter code representations. A query point is quantized to a long-bit hash code and compared with the cascade-concatenated data-point codes of the same length. Experiments on public datasets show that the proposed algorithm effectively reduces the storage cost and improves the query accuracy.
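
The random Fourier feature encoding mentioned above is a standard approximation of an RBF kernel. The sketch below shows the usual construction, with AVBH's frequency analysis and variable-length compression left out; the simple median-threshold binarization at the end is our own illustrative choice.

```python
import numpy as np

def random_fourier_features(X, D=128, gamma=1.0, seed=0):
    """Standard random Fourier feature map approximating an RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2):
    z(x) = sqrt(2 / D) * cos(x W + b), with W ~ N(0, 2*gamma), b ~ U(0, 2*pi)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Toy usage: encode, then binarize by thresholding each dimension at its median.
X = np.random.randn(50, 32)
Z = random_fourier_features(X)
B = np.where(Z > np.median(Z, axis=0), 1, -1)
```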


Author(s):  
Wennan Chang ◽  
Changlin Wan ◽  
Yong Zang ◽  
Chi Zhang ◽  
Sha Cao

Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between high-dimensional genetic manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We propose a novel supervised clustering algorithm based on a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to address the challenges of studying heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm is adapted from the classification expectation-maximization algorithm, which offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces in which subsets of features are explanatory of the response variable, and that it outperforms the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the alternatives: CSMR is powerful in recapitulating the distinct subgroups hidden in a pool of cell lines with regard to their coping mechanisms for different drugs. CSMR represents a big-data analysis tool with the potential to resolve the complexity of translating the clinical presentation of a disease into the real causes underpinning it. We believe that it will bring new understanding of the molecular basis of disease and could be of special relevance to the growing field of personalized medicine.
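
As a rough sketch of the classification-EM idea underlying CSMR, the code below alternates between fitting one sparse (Lasso) regression per component and reassigning each sample to the component with the smallest residual. The penalty choice, component count, and stopping rule are assumptions, and CSMR's component-wise penalization and model selection are not reproduced.

```python
import numpy as np
from sklearn.linear_model import Lasso

def cem_mixture_regression(X, y, n_components=2, alpha=0.1, n_iter=20, seed=0):
    """Illustrative classification-EM for a mixture of sparse linear regressions:
    M-step fits a Lasso per current cluster; C-step reassigns each sample to the
    component whose fitted regression gives the smallest squared residual."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_components, size=len(y))
    models = [Lasso(alpha=alpha) for _ in range(n_components)]
    for _ in range(n_iter):
        for k in range(n_components):                     # M-step
            idx = np.where(labels == k)[0]
            if len(idx) > 1:
                models[k].fit(X[idx], y[idx])
        residuals = np.column_stack(                      # C-step
            [(y - m.predict(X)) ** 2 for m in models])
        labels = residuals.argmin(axis=1)
    return models, labels

# Toy usage: two hidden regimes with different sparse coefficient vectors.
X = np.random.randn(300, 20)
beta1 = np.zeros(20); beta1[:3] = 2.0
beta2 = np.zeros(20); beta2[5:8] = -2.0
y = np.where(np.arange(300) < 150, X @ beta1, X @ beta2) + 0.1 * np.random.randn(300)
models, labels = cem_mixture_regression(X, y)
```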

