Scalable supervised online hashing for image retrieval

Abstract Online hashing methods aim to learn compact binary codes of the new data stream, and update the hash function to renew the codes of the existing data. However, the addition of new data streams has a vital impact on the retrieval performance of the entire retrieval system, especially the similarity measurement between new data streams and existing data, which has always been one of the focuses of online retrieval research. In this paper, we present a novel scalable supervised online hashing method, to solve the above problems within a unified framework. Specifically, the similarity matrix is established by the label matrix of the existing data and the new data stream. The projection of the existing data label matrix is then used as an intermediate term to approximate the binary codes of the existing data, which not only realizes the semantic information of the hash codes learning but also effectively alleviates the problem of data imbalance. In addition, an alternate optimization algorithm is proposed to efficiently make the solution of the model. Extensive experiments on three widely used datasets validate its superior performance over several state-of-the-art methods in terms of both accuracy and scalability for online retrieval task.

Download Full-text

Discriminative Deep Hashing for Scalable Face Image Retrieval

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/315 ◽

2017 ◽

Cited By ~ 14

Author(s):

Jie Lin ◽

Zechao Li ◽

Jinhui Tang

Keyword(s):

Image Retrieval ◽

Large Scale ◽

State Of The Art ◽

Face Image ◽

Superior Performance ◽

Prediction Errors ◽

Unified Framework ◽

Multi Scale ◽

Deep Hashing ◽

Hash Codes

With the explosive growth of images containing faces, scalable face image retrieval has attracted increasing attention. Due to the amazing effectiveness, deep hashing has become a popular hashing method recently. In this work, we propose a new Discriminative Deep Hashing (DDH) network to learn discriminative and compact hash codes for large-scale face image retrieval. The proposed network incorporates the end-to-end learning, the divide-and-encode module and the desired discrete code learning into a unified framework. Specifically, a network with a stack of convolution-pooling layers is proposed to extract multi-scale and robust features by merging the outputs of the third max pooling layer and the fourth convolutional layer. To reduce the redundancy among hash codes and the network parameters simultaneously, a divide-and-encode module to generate compact hash codes. Moreover, a loss function is introduced to minimize the prediction errors of the learned hash codes, which can lead to discriminative hash codes. Extensive experiments on two datasets demonstrate that the proposed method achieves superior performance compared with some state-of-the-art hashing methods.

Download Full-text

Jointly Multiple Hash Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019981 ◽

2019 ◽

Vol 33 ◽

pp. 9981-9982

Author(s):

Xingbo Liu ◽

Xiushan Nie ◽

Yingxin Wang ◽

Yilong Yin

Keyword(s):

Superior Performance ◽

Closed Form Solutions ◽

Fixed Length ◽

Compact Binary ◽

Proposed Model ◽

Efficient Retrieval ◽

Benchmark Datasets ◽

One Step ◽

Types Of Information ◽

Hash Codes

Hashing can compress heterogeneous high-dimensional data into compact binary codes while preserving the similarity to facilitate efficient retrieval and storage, and thus hashing has recently received much attention from information retrieval researchers. Most of the existing hashing methods first predefine a fixed length (e.g., 32, 64, or 128 bit) for the hash codes before learning them with this fixed length. However, one sample can be represented by various hash codes with different lengths, and thus there must be some associations and relationships among these different hash codes because they represent the same sample. Therefore, harnessing these relationships will boost the performance of hashing methods. Inspired by this possibility, in this study, we propose a new model jointly multiple hash learning (JMH), which can learn hash codes with multiple lengths simultaneously. In the proposed JMH method, three types of information are used for hash learning, which come from hash codes with different lengths, the original features of the samples and label. In contrast to the existing hashing methods, JMH can learn hash codes with different lengths in one step. Users can select appropriate hash codes for their retrieval tasks according to the requirements in terms of accuracy and complexity. To the best of our knowledge, JMH is one of the first attempts to learn multi-length hash codes simultaneously. In addition, in the proposed model, discrete and closed-form solutions for variables can be obtained by cyclic coordinate descent, thereby making the proposed model much faster during training. Extensive experiments were performed based on three benchmark datasets and the results demonstrated the superior performance of the proposed method.

Download Full-text

Supervised Short-Length Hashing

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/420 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xingbo Liu ◽

Xiushan Nie ◽

Quan Zhou ◽

Xiaoming Xi ◽

Lei Zhu ◽

...

Keyword(s):

Short Length ◽

Superior Performance ◽

High Dimensional ◽

Robust Estimator ◽

Compact Binary ◽

Hash Code ◽

Label Information ◽

Efficient Retrieval ◽

And Storage ◽

Hash Codes

Hashing can compress high-dimensional data into compact binary codes, while preserving the similarity, to facilitate efficient retrieval and storage. However, when retrieving using an extremely short length hash code learned by the existing methods, the performance cannot be guaranteed because of severe information loss. To address this issue, in this study, we propose a novel supervised short-length hashing (SSLH). In this proposed SSLH, mutual reconstruction between the short-length hash codes and original features are performed to reduce semantic loss. Furthermore, to enhance the robustness and accuracy of the hash representation, a robust estimator term is added to fully utilize the label information. Extensive experiments conducted on four image benchmarks demonstrate the superior performance of the proposed SSLH with short-length hash codes. In addition, the proposed SSLH outperforms the existing methods, with long-length hash codes. To the best of our knowledge, this is the first linear-based hashing method that focuses on both short and long-length hash codes for maintaining high precision.

Download Full-text

Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00736-w ◽

2021 ◽

Author(s):

Ben Halstead ◽

Yun Sing Koh ◽

Patricia Riddle ◽

Russel Pears ◽

Mykola Pechenizkiy ◽

...

Keyword(s):

Data Streams ◽

Data Stream ◽

Memory Management ◽

Improve Performance ◽

Concept Evolution

Download Full-text

Semantic Search on Unstructured Data

International Journal on Semantic Web and Information Systems ◽

10.4018/jswis.2010040102 ◽

2010 ◽

Vol 6 (2) ◽

pp. 17-35 ◽

Cited By ~ 2

Author(s):

Alex Kohn ◽

François Bry ◽

Alexander Manta

Keyword(s):

Retrieval System ◽

Pharmaceutical Research ◽

Information Retrieval System ◽

Semantic Search ◽

Unstructured Data ◽

Search Performance ◽

Adaptive Search ◽

Retrieval Performance ◽

Enterprise Search ◽

Existing Data

Studies agree that searchers are often not satisfied with the performance of current enterprise search engines. As a consequence, more scientists worldwide are actively investigating new avenues for searching to improve retrieval performance. This paper contributes to YASA (Your Adaptive Search Agent), a fully implemented and thoroughly evaluated ontology-based information retrieval system for the enterprise. A salient particularity of YASA is that large parts of the ontology are automatically filled with facts by recycling and transforming existing data. YASA offers context-based personalization, faceted navigation, as well as semantic search capabilities. YASA has been deployed and evaluated in the pharmaceutical research department of Roche, Penzberg, and results show that already semantically simple ontologies suffice to considerably improve search performance.

Download Full-text

Deep Learning to Ternary Hash Codes by Continuation

10.36227/techrxiv.17083019.v1 ◽

2021 ◽

Author(s):

Mingrui Chen ◽

Weiyu Li ◽

weizhi lu

Keyword(s):

Deep Learning ◽

Image Retrieval ◽

Continuation Method ◽

Binary Codes ◽

Hard Thresholding ◽

Joint Learning ◽

Ternary Codes ◽

First Time ◽

Hash Codes ◽

Discrete Functions

Recently, it has been observed that $\{0,\pm1\}$-ternary codes which are simply generated from deep features by hard thresholding, tend to outperform $\{-1, 1\}$-binary codes in image retrieval. To obtain better ternary codes, we for the first time propose to jointly learn the features with the codes by appending a smoothed function to the networks. During training, the function could evolve into a non-smoothed ternary function by a continuation method, and then generate ternary codes. The method circumvents the difficulty of directly training discrete functions and reduces the quantization errors of ternary codes. Experiments show that the proposed joint learning indeed could produce better ternary codes.

Download Full-text

Distribution-preserving data augmentation

PeerJ Computer Science ◽

10.7717/peerj-cs.571 ◽

2021 ◽

Vol 7 ◽

pp. e571

Author(s):

Nurdan Ayse Saran ◽

Murat Saran ◽

Fatih Nar

Keyword(s):

Data Augmentation ◽

Image Data ◽

Large Data ◽

Data Availability ◽

Superior Performance ◽

Color Distribution ◽

Spatial Transformations ◽

Wide Range ◽

Dataset Size ◽

Existing Data

In the last decade, deep learning has been applied in a wide range of problems with tremendous success. This success mainly comes from large data availability, increased computational power, and theoretical improvements in the training phase. As the dataset grows, the real world is better represented, making it possible to develop a model that can generalize. However, creating a labeled dataset is expensive, time-consuming, and sometimes not likely in some domains if not challenging. Therefore, researchers proposed data augmentation methods to increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color or spatial transformations, only one or a combination. Such color transformations perform some linear or nonlinear operations in the entire image or in the patches to create variations of the original image. The current color-based augmentation methods are usually based on image processing methods that apply color transformations such as equalizing, solarizing, and posterizing. Nevertheless, these color-based data augmentation methods do not guarantee to create plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieved this by defining a regularized density decreasing direction to create paths from the original pixels’ color to the distribution tails. The proposed method provides superior performance compared to existing data augmentation methods which is shown using a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets for classification and segmentation tasks.

Download Full-text

Deep Learning to Ternary Hash Codes by Continuation

10.36227/techrxiv.17083019 ◽

2021 ◽

Author(s):

Mingrui Chen ◽

Weiyu Li ◽

weizhi lu

Keyword(s):

Deep Learning ◽

Image Retrieval ◽

Continuation Method ◽

Binary Codes ◽

Hard Thresholding ◽

Joint Learning ◽

Ternary Codes ◽

First Time ◽

Hash Codes ◽

Discrete Functions

Download Full-text

Analysis of Data Stream Processing At Edge Layer for Internet of Things

Journal of ISMAC - June 2019 ◽

10.36548/jismac.2020.1.003 ◽

2020 ◽

Vol 2 (1) ◽

pp. 26-37

Author(s):

Dr. Pasumponpandian

Keyword(s):

Internet Of Things ◽

Data Streams ◽

Data Stream ◽

Smart Cities ◽

Stream Processing ◽

Middle Layer ◽

Cloud Services ◽

Decentralized Systems ◽

Data Stream Processing ◽

Edge Layer

The progress of internet of things at a rapid pace and simultaneous development of the technologies and the processing capabilities has paved way for the development of decentralized systems that are relying on cloud services. Though the decentralized systems are founded on cloud complexities still prevail in transferring all the information’s that are been sensed through the IOT devices to the cloud. This because of the huge streams of information’s gathered by certain applications and the expectation to have a timely response, incurring minimized delay, computing energy and enhanced reliability. So this kind of decentralization has led to the development of middle layer between the cloud and the IOT, and was termed as the Edge layer, meaning bringing down the service of the cloud to the user edge. The paper puts forth the analysis of the data stream processing in the edge layer taking in the complexities involved in the computing the data streams of IOT in an edge layer and puts forth the real time analytics in the edge layer to examine the data streams of the internet of things offering a data- driven insight for parking system in the smart cities.

Download Full-text

Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch002 ◽

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data coming from different sources is referred to as data streams. Data stream mining is an online learning technique where each data point must be processed as the data arrives and discarded as the processing is completed. Progress of technologies has resulted in the monitoring these data streams in real time. Data streams has created many new challenges to the researchers in real time. The main features of this type of data are they are fast flowing, large amounts of data which are continuous and growing in nature, and characteristics of data might change in course of time which is termed as concept drift. This chapter addresses the problems in mining data streams with concept drift. Due to which, isolating the correct literature would be a grueling task for researchers and practitioners. This chapter tries to provide a solution as it would be an amalgamation of all techniques used for data stream mining with concept drift.

Download Full-text