Adaptive Hashing with Sparse Modification for Scalable Image Retrieval

Author(s):  
Lifang Zhang ◽  
Qi Shen ◽  
Defang Li ◽  
Guocan Feng ◽  
Xin Tang ◽  
...  

Approximate Nearest Neighbor (ANN) search has become a challenging problem with the explosive growth of high-dimensional, large-scale data in recent years. Promising techniques for ANN search include hashing methods, which generate compact binary codes by designing effective hash functions. However, the lack of an optimal regularization is the key limitation of most existing hash functions. To this end, a new method called Adaptive Hashing with Sparse Modification (AHSM) is proposed. In AHSM, codes consist of vertices of the hypercube, and the projection matrix is divided into two separate matrices. Data is first rotated by an orthogonal matrix and then modified by a sparse matrix. The sparse matrix is learned as a regularization term of the hash function, which is used to avoid overfitting and reduce quantization distortion. AHSM has two advantages: it improves accuracy, and it does so without any increase in time cost. Furthermore, we extend AHSM to a supervised version, called Supervised Adaptive Hashing with Sparse Modification (SAHSM), by applying Canonical Correlation Analysis (CCA) to the original data. Experiments show that AHSM consistently outperforms several state-of-the-art hashing methods on four data sets. In addition, we compare three unsupervised hashing methods with their corresponding supervised versions (including SAHSM) on three labeled data sets. Similarly, SAHSM outperforms the other methods at most hash code lengths.
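
The encoding step described in this abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the two matrices are combined additively, the orthogonal rotation is drawn at random, and the sparse matrix is a random placeholder rather than a learned one. The function names (random_orthogonal, encode, hamming_distances) are hypothetical and are not taken from the paper.

```python
import numpy as np

def random_orthogonal(dim, rng):
    # QR decomposition of a Gaussian matrix yields an orthogonal factor.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def encode(x, rotation, sparse_mod):
    # Assumed additive form: project with (rotation + sparse modification),
    # then snap each coordinate to a hypercube vertex in {-1, +1}.
    projected = x @ (rotation + sparse_mod)
    return np.where(projected >= 0, 1, -1).astype(np.int8)

def hamming_distances(query_code, db_codes):
    # Number of differing bits between the query and every database code.
    return np.count_nonzero(db_codes != query_code, axis=1)

rng = np.random.default_rng(0)
n_bits = 16
data = rng.standard_normal((100, n_bits))
data -= data.mean(axis=0)  # zero-center so the sign threshold is meaningful

rotation = random_orthogonal(n_bits, rng)
# Placeholder sparse matrix (about 5% non-zero entries); AHSM learns this one.
mask = rng.random((n_bits, n_bits)) < 0.05
sparse_mod = 0.1 * rng.standard_normal((n_bits, n_bits)) * mask

codes = encode(data, rotation, sparse_mod)
dists = hamming_distances(codes[0], codes)
nearest = np.argsort(dists)[:5]  # indices of the 5 closest codes to item 0
```

Because the two matrices can be summed into a single projection before encoding, the sparse term adds no extra cost at query time under this assumed form.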

2006 ◽  
Vol 12 (1) ◽  
pp. 44-49
Author(s):  
Sergiy Popov

Visualization of large‐scale data inherently requires dimensionality reduction to 1D, 2D, or 3D space. Autoassociative neural networks with a bottleneck layer are commonly used as a nonlinear dimensionality reduction technique. However, many real‐world problems suffer from incomplete data sets, i.e. some values can be missing. Common methods dealing with missing data include the deletion of all cases with missing values from the data set or replacement with mean or “normal” values for specific variables. Such methods are appropriate when just a few values are missing. But in the case when a substantial portion of data is missing, these methods can significantly bias the results of modeling. To overcome this difficulty, we propose a modified learning procedure for the autoassociative neural network that directly takes the missing values into account. The outputs of the trained network may be used for substitution of the missing values in the original data set.
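
One common way to make a bottleneck network take missing values into account is to mask the reconstruction error so that only observed entries contribute to the loss and its gradient. The sketch below illustrates that idea under stated assumptions: missing entries are marked with NaN and zero-filled at the input, the network has a single tanh bottleneck layer, and training uses plain gradient descent. It is not the learning procedure from the paper, and train_masked_autoencoder is a hypothetical name.

```python
import numpy as np

def train_masked_autoencoder(x, n_hidden=1, lr=0.05, n_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(x)              # True where a value is observed
    x0 = np.where(mask, x, 0.0)      # zero-fill missing inputs (assumption)
    n_obs = mask.sum()
    w_enc = 0.1 * rng.standard_normal((x.shape[1], n_hidden))
    w_dec = 0.1 * rng.standard_normal((n_hidden, x.shape[1]))
    losses = []
    for _ in range(n_steps):
        hidden = np.tanh(x0 @ w_enc)         # bottleneck activations
        recon = hidden @ w_dec
        err = (recon - x0) * mask            # missing entries give zero error
        losses.append(float((err ** 2).sum() / n_obs))
        # Backpropagate the masked mean squared error.
        grad_recon = 2.0 * err / n_obs
        grad_dec = hidden.T @ grad_recon
        grad_pre = (grad_recon @ w_dec.T) * (1.0 - hidden ** 2)
        grad_enc = x0.T @ grad_pre
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    recon = np.tanh(x0 @ w_enc) @ w_dec
    # Keep observed values; substitute network outputs for missing ones.
    imputed = np.where(mask, x, recon)
    return imputed, losses

rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 1))
x = latent @ np.array([[1.0, -0.5, 0.8]]) + 0.05 * rng.standard_normal((200, 3))
x[rng.random(x.shape) < 0.3] = np.nan        # drop about 30% of the values

imputed, losses = train_masked_autoencoder(x)
```

The final substitution step mirrors the last sentence of the abstract: the outputs of the trained network replace the missing values, while observed values are left unchanged.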


Author(s):  
Jun Huang ◽  
Linchuan Xu ◽  
Jing Wang ◽  
Lei Feng ◽  
Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.


Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Artificial intelligence (ARTINT) and information have been prominent fields for many years. One reason is that many different areas have advanced quickly on the basis of ARTINT and information, creating significant value over the years. This value has increasingly been used by national economies around the world, other sciences, companies, organizations, etc. Many massive corporations and large organizations have been established rapidly as these economies have grown strongly. Unsurprisingly, these corporations and organizations have generated large amounts of information and large-scale data sets. Processing and storing them successfully has been a major challenge for many commercial applications, studies, etc. To handle this problem, many algorithms have been proposed for processing these big data sets.


2017 ◽  
Author(s):  
Shirley M. Matteson ◽  
Sonya E. Sherrod ◽  
Sevket Ceyhun Cetin

2017 ◽  
Vol 8 (2) ◽  
pp. 30-43
Author(s):  
Mrutyunjaya Panda

Big Data, due to its complicated and diverse nature, poses many challenges for extracting meaningful observations. This calls for smart and efficient algorithms that can deal with the computational complexity and memory constraints arising from their iterative behavior. This issue may be addressed with parallel computing techniques, where a single machine or multiple machines perform the work simultaneously, dividing the problem into subproblems and assigning private memory to each subproblem. Cluster analysis has been found useful in handling such huge data in the recent past. Although many investigations in Big Data analysis are under way, Canopy and K-Means++ clustering are used here to process large-scale data in a shorter amount of time with no memory constraints. To assess the suitability of the approach, several data sets are considered, ranging from small to very large and drawn from diverse fields of application. The experimental results indicate that the proposed approach is fast and accurate.
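
The abstract does not say how Canopy and K-Means++ are combined. One plausible pairing, sketched here purely as an assumption, uses a cheap canopy pass to estimate the number of clusters and then runs k-means with K-Means++ seeding. The sketch is single-machine and unparallelized; all function names and the threshold value are hypothetical.

```python
import numpy as np

def canopy_centers(x, t2, rng):
    # Simplified canopy pass: pick a random remaining point as a center and
    # remove every point within the tight threshold t2. The loose threshold
    # T1 (canopy membership) is omitted since only the centers are used here.
    remaining = list(range(len(x)))
    centers = []
    while remaining:
        idx = remaining[rng.integers(len(remaining))]
        centers.append(x[idx])
        dist = np.linalg.norm(x[remaining] - x[idx], axis=1)
        remaining = [p for p, d in zip(remaining, dist) if d > t2]
    return np.array(centers)

def kmeans_pp_init(x, k, rng):
    # K-Means++ seeding: each new center is sampled with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        diffs = x[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        centers.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.array(centers)

def kmeans(x, k, rng, n_iter=50):
    # Standard Lloyd iterations starting from the K-Means++ seeds.
    centers = kmeans_pp_init(x, k, rng)
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([m + 0.5 * rng.standard_normal((50, 2)) for m in blob_means])

k = len(canopy_centers(points, t2=5.0, rng=rng))  # canopy count estimates k
centers, labels = kmeans(points, k, rng)
```

The canopy pass needs only one distance computation per point per canopy, which is why it is often used as an inexpensive pre-clustering step before the more costly iterative phase.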

