Improving Bag-of-Words model with spatial information

Author(s):  
Edmond Zhang ◽  
Michael Mayo
2018 ◽  
pp. 1307-1321
Author(s):  
Vinh-Tiep Nguyen ◽  
Thanh Duc Ngo ◽  
Minh-Triet Tran ◽  
Duy-Dinh Le ◽  
Duc Anh Duong

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on Inverted Indexing, with images represented using the Bag-of-Words model. However, one major limitation of both the Inverted Index and the Bag-of-Words representation is that they ignore the spatial information of visual words during image representation and comparison. As a result, retrieval accuracy is decreased. In this paper, the authors investigate an approach to integrate spatial information into the Inverted Index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K, and Paris 6K) demonstrate the effectiveness of the proposed approach.
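A rough sketch of the underlying idea follows (an illustration, not the authors' implementation; the class name, the grid quantization, and the scoring weights are all assumptions): a plain inverted index is extended so that each posting also records the visual word's quantized image position, and a cheap spatial-consistency bonus re-ranks candidates at query time.

```python
from collections import defaultdict

# Minimal inverted index over visual words. Each posting also stores the
# word's quantized (x, y) grid cell so a cheap spatial-consistency bonus can
# re-rank candidates -- a sketch of the idea, not the paper's exact scheme.
class SpatialInvertedIndex:
    def __init__(self, grid=4):
        self.grid = grid                   # quantize positions into grid x grid cells
        self.postings = defaultdict(list)  # word_id -> [(image_id, cell), ...]

    def _cell(self, x, y, w, h):
        return (min(int(x / w * self.grid), self.grid - 1),
                min(int(y / h * self.grid), self.grid - 1))

    def add_image(self, image_id, words, size):
        w, h = size
        for word_id, (x, y) in words:      # words: [(word_id, (x, y)), ...]
            self.postings[word_id].append((image_id, self._cell(x, y, w, h)))

    def query(self, words, size):
        w, h = size
        scores = defaultdict(float)
        for word_id, (x, y) in words:
            q_cell = self._cell(x, y, w, h)
            for image_id, cell in self.postings[word_id]:
                # a shared word scores 1; a matching cell earns a spatial bonus
                scores[image_id] += 1.0 + (0.5 if cell == q_cell else 0.0)
        return sorted(scores.items(), key=lambda kv: -kv[1])
```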


2019 ◽  
Vol 7 (4) ◽  
Author(s):  
Noha Elfiky

The Bag-of-Words (BoW) approach has been successfully applied in the context of category-level image classification. To incorporate spatial image information into the BoW model, Spatial Pyramids (SPs) are used. However, spatial pyramids are rigid in nature and are based on pre-defined grid configurations. As a consequence, they often fail to coincide with the underlying spatial structure of images from different categories, which may negatively affect classification accuracy. The aim of the paper is to use the 3D scene geometry to steer the layout of spatial pyramids for category-level image classification (object recognition). The proposed approach provides an image representation by inferring the constituent geometrical parts of a scene. As a result, the image representation retains the descriptive spatial information needed to yield a structural description of the image. Large-scale experiments on Pascal VOC2007 and Caltech101 show that the proposed Generic SPs outperform the standard SPs.
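For contrast, here is a minimal sketch of the standard, rigid spatial pyramid the paper improves upon, assuming fixed 1×1, 2×2, and 4×4 grids (function and variable names are illustrative):

```python
import numpy as np

# Standard (rigid) spatial pyramid: per-cell codeword histograms over fixed
# grids, concatenated and L1-normalized. This is the baseline representation,
# not the Generic SPs proposed in the paper.
def spatial_pyramid_histogram(positions, word_ids, image_size, vocab_size,
                              levels=(1, 2, 4)):
    w, h = image_size
    parts = []
    for g in levels:
        grid = np.zeros((g, g, vocab_size))
        for (x, y), wid in zip(positions, word_ids):
            i = min(int(x / w * g), g - 1)   # column of the cell holding (x, y)
            j = min(int(y / h * g), g - 1)   # row of the cell
            grid[i, j, wid] += 1
        parts.append(grid.reshape(-1))
    feat = np.concatenate(parts)
    return feat / max(feat.sum(), 1.0)       # L1-normalize the pyramid vector
```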


Author(s):  
Wangbin Chu ◽  
Yepeng Guan

Face-based identity verification is one of the fundamental topics in image processing and video analysis, and it poses many challenges. A novel approach has been developed for facial identity verification based on a facial pose pool, which is constructed by incremental clustering to capture both facial spatial information and orientation diversity. A bag-of-words model is used to extract image features from the facial pose pool with the affine-SIFT descriptor. The visual codebook is generated with k-means clustering and a Gaussian mixture model. Posterior pseudo-probabilities are used to compute the similarities between each visual word and the corresponding local features for image representation. Comparisons with several state-of-the-art methods highlight the superior performance of the proposed method.
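A hypothetical sketch of the codebook stage as described, written with scikit-learn (the vocabulary size and covariance type are assumptions, not values from the paper): k-means initializes the visual words, a Gaussian mixture refines them, and each local descriptor is soft-assigned through its posterior probabilities over the components.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def build_codebook(descriptors, n_words=64, seed=0):
    # k-means gives the initial visual words; a GMM refines them
    km = KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(descriptors)
    return GaussianMixture(n_components=n_words,
                           means_init=km.cluster_centers_,
                           covariance_type='diag',
                           random_state=seed).fit(descriptors)

def encode(gmm, descriptors):
    # posterior P(word | descriptor) per local feature (soft assignment),
    # averaged into one image-level representation
    return gmm.predict_proba(descriptors).mean(axis=0)
```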


2021 ◽  
Author(s):  
Usman Muhammad ◽  
Md. Ziaul Hoque ◽  
Weiqiang Wang ◽  
Mourad Oussalah

The bag-of-words (BoW) model is one of the most popular representation methods for image classification. However, the lack of spatial information, changes of illumination, and inter-class similarity among scene categories impair its performance in the remote-sensing domain. To alleviate these issues, this paper explores the spatial dependencies between different image regions and introduces neighborhood-based collaborative learning (NBCL) for remote-sensing scene classification. In particular, the proposed method employs multilevel feature learning based on small, medium, and large neighborhood regions to enhance the discriminative power of the image representation. To achieve this, image patches are selected through a fixed-size sliding window, and each image is represented by four independent image region sequences. Apart from multilevel learning, Gaussian pyramids are explicitly imposed to magnify the visual information of the scene images, and their position and scale parameters are optimized locally. A local descriptor is then exploited to extract multilevel and multiscale features, which are represented as codeword histograms obtained by k-means clustering. Finally, a simple fusion strategy is proposed to balance the contribution of these features, and the fused features are fed into a Bidirectional Long Short-Term Memory (BiLSTM) network to construct the final representation for classification. Experimental results on the NWPU-RESISC45, AID, UC-Merced, and WHU-RS datasets demonstrate that the proposed approach not only surpasses conventional bag-of-words approaches but also yields significantly higher classification performance than existing state-of-the-art deep learning methods.
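A minimal sketch of the multilevel patch extraction described above, assuming illustrative window, stride, and neighborhood sizes (four scales, yielding one region sequence each; the paper's exact configuration is not given here):

```python
import numpy as np

# Slide a fixed-size window over the image; around each window centre, crop
# one neighborhood per scale, yielding four independent region sequences.
# Assumes the image is larger than the largest neighborhood.
def multilevel_patches(image, patch=32, stride=32,
                       neighborhoods=(32, 48, 64, 96)):
    h, w = image.shape[:2]
    sequences = {n: [] for n in neighborhoods}
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            cy, cx = y + patch // 2, x + patch // 2
            for n in neighborhoods:
                y0 = min(max(cy - n // 2, 0), h - n)  # clamp crop inside image
                x0 = min(max(cx - n // 2, 0), w - n)
                sequences[n].append(image[y0:y0 + n, x0:x0 + n])
    return sequences  # one region sequence per neighborhood scale
```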


Author(s):  
T. A. Welton

Various authors have emphasized the spatial information resident in an electron micrograph taken with adequately coherent radiation. In view of the completion of at least one such instrument, this opportunity is taken to summarize the state of the art of processing such micrographs. We use the usual symbols for the aberration coefficients, and supplement these with ℓ and δ for the transverse coherence length and the fractional energy spread, respectively. We also assume a weak, biologically interesting sample, with principal interest lying in the molecular skeleton remaining after obvious hydrogen loss and other radiation damage has occurred.


Author(s):  
Vijay Krishnamurthi ◽  
Brent Bailey ◽  
Frederick Lanni

Excitation field synthesis (EFS) refers to the use of an interference optical system in a direct-imaging microscope to improve 3D resolution by axially-selective excitation of fluorescence within a specimen. The excitation field can be thought of as a weighting factor for the point-spread function (PSF) of the microscope, so that the optical transfer function (OTF) gets expanded by convolution with the Fourier transform of the field intensity. The simplest EFS system is the standing-wave fluorescence microscope, in which an axially-periodic excitation field is set up through the specimen by interference of a pair of collimated, coherent, s-polarized beams that enter the specimen from opposite sides at matching angles. In this case, spatial information about the object is recovered in the central OTF passband, plus two symmetric, axially-shifted sidebands. Gaps between these bands represent "lost" information about the 3D structure of the object. Because the sideband shift is equal to the spatial frequency of the standing-wave (SW) field, more complete recovery of information is possible by superposition of fields having different periods. When all of the fields have an antinode at a common plane (set to be coincident with the in-focus plane), the "synthesized" field is peaked in a narrow in-focus zone.
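A toy numerical illustration of the superposition argument (the periods and units are arbitrary choices, not instrument parameters): several standing-wave intensity profiles, each with an antinode at the in-focus plane z = 0, are summed, and the synthesized field peaks in a narrow zone around that plane.

```python
import numpy as np

# Each standing-wave intensity 1 + cos(2*pi*z/p) has an antinode at z = 0.
# Superposing profiles with different periods p concentrates the total
# intensity in a narrow in-focus zone around the common antinode.
z = np.linspace(-2.0, 2.0, 2001)          # axial position, arbitrary units
periods = [0.25, 0.35, 0.5, 0.7, 1.0]     # illustrative standing-wave periods
field = sum(1.0 + np.cos(2 * np.pi * z / p) for p in periods)
print(f"synthesized field peaks at z = {z[np.argmax(field)]:.3f}")  # -> 0.000
```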


Author(s):  
John R. Porter

New ceramic fibers, currently in various stages of commercial development, have been consolidated in intermetallic matrices such as γ-TiAl and FeAl. Fiber types include SiC, TiB2, and polycrystalline and single-crystal Al2O3. This work required the development of techniques to characterize the thermochemical stability of these fibers in different matrices. SEM/EDS elemental mapping was used for this work. To obtain qualitative compositional/spatial information, the best realistically achievable counting statistics were required. We established that 128 × 128 maps, acquired with a 20 keV accelerating voltage, 3 s live time per pixel (total mapping time, 18 h), and with the beam current adjusted to give 30% dead time, provided adequate image quality at a magnification of 800X. The maps were acquired, with backgrounds subtracted, using a Noran TN 5500 EDS system. The images and maps were transferred to a Macintosh and converted into TIFF files using either TIFF Maker or TNtoIMAGE, a Microsoft QuickBASIC program developed at the Science Center. From TIFF files, images and maps were opened in either NIH Image or Adobe Photoshop for processing and analysis, and printed from Microsoft PowerPoint on a Kodak XL7700 dye-transfer image printer.
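As a back-of-envelope check on the quoted mapping time (assuming that 30% dead time means counts are accepted during 70% of the wall-clock time):

```python
pixels = 128 * 128                       # map size
live_per_pixel = 3.0                     # s of live time per pixel
wall_per_pixel = live_per_pixel / 0.7    # wall-clock s at 30% dead time
print(f"{pixels * wall_per_pixel / 3600:.1f} h")  # ~19.5 h, the order of the quoted 18 h
```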

