Visual and Semantic Similarity Norms for a Photographic Image Stimulus Set Containing Recognizable Objects, Animals and Scenes

2021
Author(s):
Zhuohan Jiang
D. Merika W. Sanders
Rosemary Cowell

We collected visual and semantic similarity norms for a set of photographic images comprising 120 recognizable objects/animals and 120 indoor/outdoor scenes. Human observers rated the similarity of pairs of images within four categories of stimulus ‒ inanimate objects, animals, indoor scenes and outdoor scenes ‒ via Amazon's Mechanical Turk. We performed multi-dimensional scaling (MDS) on the collected similarity ratings to visualize the perceived similarity for each image category, for both visual and semantic ratings. The MDS solutions revealed the expected similarity relationships between images within each category, along with intuitively sensible differences between visual and semantic similarity relationships for each category. Stress tests performed on the MDS solutions indicated that the MDS analyses captured meaningful levels of variance in the similarity data. These stimuli, associated norms and naming data are made publicly available, and should provide a useful resource for researchers of vision, memory and conceptual knowledge wishing to run experiments using well-parameterized stimulus sets.
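The stress check mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the metric form of Kruskal's stress-1, in which the target dissimilarities are compared directly with the distances in the MDS configuration; the abstract does not say which stress variant the authors used, and the function names and example coordinates here are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points (sequence of coordinate tuples)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress1(dissimilarities, points):
    """Metric stress-1: sqrt(sum (delta_ij - d_ij)^2 / sum d_ij^2) over pairs i < j.

    dissimilarities: symmetric matrix of target dissimilarities (delta_ij),
        e.g. (max rating - mean similarity rating) for each image pair (an assumption).
    points: MDS configuration, one coordinate tuple per image.
    """
    d = pairwise_distances(points)
    n = len(points)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            numerator += (dissimilarities[i][j] - d[i][j]) ** 2
            denominator += d[i][j] ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical three-image configuration whose distances are 3, 4 and 5.
config = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
perfect_fit = pairwise_distances(config)
print(stress1(perfect_fit, config))  # a configuration that reproduces the data exactly has zero stress
```

Lower values indicate that the low-dimensional configuration reproduces the rated dissimilarities more faithfully; a non-metric analysis would first replace the dissimilarities with monotonically transformed disparities, which this sketch omits.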

2021
Vol 11 (1)
Author(s):
Sandro L. Wiesmann
Laurent Caplette
Verena Willenbockel
Frédéric Gosselin
Melissa L.-H. Võ

Human observers can quickly and accurately categorize scenes. This remarkable ability is related to the usage of information at different spatial frequencies (SFs) following a coarse-to-fine pattern: Low SFs, conveying coarse layout information, are thought to be used earlier than high SFs, representing more fine-grained information. Alternatives to this pattern have rarely been considered. Here, we probed all possible SF usage strategies randomly with high resolution in both the SF and time dimensions at two categorization levels. We show that correct basic-level categorizations of indoor scenes are linked to the sampling of relatively high SFs, whereas correct outdoor scene categorizations are predicted by an early use of high SFs and a later use of low SFs (fine-to-coarse pattern of SF usage). Superordinate-level categorizations (indoor vs. outdoor scenes) rely on lower SFs early on, followed by a shift to higher SFs and a subsequent shift back to lower SFs in late stages. In summary, our results show no consistent pattern of SF usage across tasks and only partially replicate the diagnostic SFs found in previous studies. We therefore propose that SF sampling strategies of observers differ with varying stimulus and task characteristics, thus favouring the notion of flexible SF usage.


2016
Vol 11 (1)
pp. 76-93
Author(s):  
Michael Richter ◽  
Roeland van Hout

This paper investigates set-theoretical transitive and intransitive similarity relationships in triplets of verbs that can be deduced from raters’ similarity judgments on the pairs of verbs involved. We collected similarity judgments on pairs made up of 35 German verbs and found that the concept of transitivity adds to the information obtained from collecting pair-wise semantic similarity judgments. The concept of transitive similarity enables more complex relations to be revealed in triplets of verbs. To evaluate the outcomes that we obtained by analyzing transitive similarities we used two previously developed verb classifications of the same set of 35 verbs based on the analysis of large corpora (Richter & van Hout, 2016). We applied a modified form of weak stochastic transitivity (Block & Marschak, 1960; Luce & Suppes, 1965; Tversky, 1969) and found that (1), in contrast to Rips’ claim (2011), similarity relations in raters’ judgments systematically turn out to be transitive, and (2) transitivity discloses lexical and aspectual properties of verbs relevant in distinguishing verb classes.
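A transitivity check on pairwise similarity judgments can be sketched as follows. This is only an illustration under stated assumptions: each pair is treated as "similar" when the proportion of raters endorsing it reaches 0.5, and a triplet counts as transitive when two similar pairs sharing an item imply that the third pair is similar too. The authors' modified form of weak stochastic transitivity is not specified in the abstract, and the item labels and proportions below are made up for illustration.

```python
from itertools import combinations


def is_similar(p, a, b, threshold=0.5):
    """True when the proportion of 'similar' judgments for the pair reaches the threshold."""
    return p[frozenset((a, b))] >= threshold


def triplet_is_transitive(p, a, b, c, threshold=0.5):
    """Check the implication sim(x, y) and sim(y, z) => sim(x, z) for each middle item y."""
    for x, y, z in ((a, b, c), (b, a, c), (a, c, b)):
        if (is_similar(p, x, y, threshold)
                and is_similar(p, y, z, threshold)
                and not is_similar(p, x, z, threshold)):
            return False
    return True


def intransitive_triplets(p, items, threshold=0.5):
    """List every triplet of items that violates the transitivity check."""
    return [t for t in combinations(items, 3)
            if not triplet_is_transitive(p, *t, threshold=threshold)]


# Hypothetical proportions of raters who judged each pair as similar.
proportions = {
    frozenset(("v1", "v2")): 0.8,
    frozenset(("v2", "v3")): 0.7,
    frozenset(("v1", "v3")): 0.3,
}
print(intransitive_triplets(proportions, ["v1", "v2", "v3"]))
```

Because similarity is symmetric, pairs are stored as unordered `frozenset` keys, and a triplet fails the check exactly when two of its three pairs are similar and the third is not.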


Author(s):
Xin Zhao
Zhe Liu
Ruolan Hu
Kaiqi Huang

3D object detection plays an important role in many real-world applications. It requires estimating the locations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture that uses front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is used to improve the performance of 3D segmentation: it captures information from different orientations in space and is robust to shapes of different scales. On the other hand, an SENet module retains the useful features and suppresses the less informative ones; this module reweights channel features so that the 3D bounding boxes are estimated more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results show that our method achieves better performance than state-of-the-art methods, especially when point clouds are highly sparse.


2020
Vol 12 (21)
pp. 3488
Author(s):
Zhihua Hu
Yaolin Hou
Pengjie Tao
Jie Shan

Shape-from-shading and stereo vision are two complementary methods for reconstructing 3D surfaces from images. Stereo vision can reconstruct the overall shape well but is vulnerable in texture-less and non-Lambertian areas, where shape-from-shading can recover fine details. This paper presents a novel, generic shading-based method to refine the surface generated by multi-view stereo. Unlike most shading-based surface refinement methods, the new development does not assume ideal Lambertian reflectance, known illumination, or uniform surface albedo. Instead, specular reflectance is taken into account, while the illumination can be arbitrary and the albedo can be non-uniform. Surface refinement is achieved by solving an objective function in which the imaging process is modeled with spherical harmonics illumination and specular reflectance. Our experiments are carried out using images of indoor scenes with obvious specular reflection and of outdoor scenes with a mixture of Lambertian and specular reflections. Compared with surfaces created by current multi-view stereo and shape-from-shading methods, the developed method can recover more fine details with lower omission rates (6.11% vs. 24.25%) in the scenes evaluated. The benefit is more apparent when the images are taken with low-cost, off-the-shelf cameras. It is therefore recommended that a general shading model consisting of varying albedo and specularity be used in routine surface reconstruction practice.


2019
Vol 11 (4)
pp. 446
Author(s):
Zacharias Kandylakis
Konstantinos Vasili
Konstantinos Karantzalos

Single-sensor systems and standard optical cameras (usually RGB CCTV video cameras) fail to provide adequate observations, or the amount of spectral information required to build rich, expressive, discriminative features for object detection and tracking tasks in challenging outdoor and indoor scenes under various environmental/illumination conditions. To address this, we have designed a multisensor system based on thermal, shortwave infrared, and hyperspectral video sensors, and we propose a processing pipeline able to perform object detection tasks in real time despite the large volume of concurrently acquired video streams. In particular, in order to avoid the computationally intensive coregistration of the hyperspectral data with other imaging modalities, the initially detected targets are projected through a local coordinate system onto the hypercube image plane. For object detection, a detector-agnostic procedure has been developed, integrating both unsupervised (background subtraction) and supervised (deep learning convolutional neural networks) techniques for validation purposes. The detected and verified targets are extracted through fusion and data association steps based on the temporal spectral signatures of both target and background. The promising experimental results in challenging indoor and outdoor scenes indicate the robust and efficient performance of the developed methodology under different conditions such as fog, smoke, and illumination changes.


2000
Vol 21 (4)
pp. 505-524
Author(s):  
WALTER G. CHARLES

The relation between similarity and dissimilarity of meaning and similarity of context was analyzed for synonymous nouns. New semantic similarity and dissimilarity rating tests with an empirically determined series of linguistic anchors and conventional, arbitrarily anchored semantic similarity ratings were compared. Contextual similarity was elicited by a sorting test based on substitution and yielding d-primes. The study found reliable correlations between the d-primes and the different ratings for semantic similarity and dissimilarity of the synonymous nouns across a wide continuum of meaning. The data strongly supported a contextual hypothesis of meaning. The data endorsed the claim that people abstract a contextual representation from experiencing the multiple natural linguistic contexts of a word. Semantic similarity and dissimilarity rating formats with an empirically chosen series of linguistic anchors and a sorting test of contextual similarity yielded stronger support for a contextual hypothesis than did alternative methods of eliciting lexical and contextual similarity.
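For readers unfamiliar with d-primes, the standard signal-detection definition is the difference between the z-transformed hit rate and false-alarm rate. The sketch below shows that textbook formula only; how the sorting test in this study mapped participants' responses onto hits and false alarms is not described in the abstract, so the function name and its inputs are assumptions.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; NormalDist.inv_cdf raises
    an error at exactly 0 or 1, so extreme rates need a correction beforehand.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Equal hit and false-alarm rates mean no discrimination at all.
print(d_prime(0.5, 0.5))
```

A larger d' means the two contexts (or words) were discriminated more easily, which under the contextual hypothesis corresponds to lower contextual similarity.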


2019
Vol 12 (1)
pp. 84
Author(s):
Yaxiong Chen
Xiaoqiang Lu

With the rapid progress of remote sensing (RS) observation technologies, cross-modal RS image-sound retrieval has attracted some attention in recent years. However, existing methods perform cross-modal image-sound retrieval by leveraging high-dimensional real-valued features, which can require more storage than low-dimensional binary features (i.e., hash codes). Moreover, these methods cannot directly encode relative semantic similarity relationships. To tackle these issues, we propose a new, deep, cross-modal RS image-sound hashing approach, called deep triplet-based hashing (DTBH), to integrate hash code learning and relative semantic similarity relationship learning into an end-to-end network. Specifically, the proposed DTBH method designs a triplet selection strategy to select effective triplets. Moreover, in order to encode relative semantic similarity relationships, we propose an objective function that ensures that the anchor images are more similar to the positive sounds than to the negative sounds. In addition, a triplet regularized loss term leverages an approximate l1-norm of hash-like codes and hash codes and can effectively reduce the information loss between hash-like codes and hash codes. Extensive experimental results showed that the DTBH method could achieve superior performance to other state-of-the-art cross-modal image-sound retrieval methods. For the task of retrieving RS images from a sound query, the proposed approach achieved a mean average precision (mAP) of up to 60.13% on the UCM dataset, 87.49% on the Sydney dataset, and 22.72% on the RSICD dataset. For the task of retrieving sounds from an RS image query, the proposed approach achieved a mAP of 64.27% on the UCM dataset, 92.45% on the Sydney dataset, and 23.46% on the RSICD dataset. Future work will focus on how to consider the balance property of hash codes to improve image-sound retrieval performance.
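The idea of making an anchor closer to its positive than to its negative can be sketched with a generic triplet hinge loss, followed by binarizing real-valued "hash-like" codes with the sign function. This is an illustrative stand-in, not the DTBH objective: the abstract does not give the exact loss, margin, or regularizer, and all names and vectors below are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is farther than the positive by `margin`."""
    return max(0.0, margin
               + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))


def binarize(code):
    """Turn a real-valued hash-like code into a +1/-1 hash code (sign function)."""
    return [1 if x >= 0 else -1 for x in code]


def hamming_distance(a, b):
    """Number of positions at which two hash codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical hash-like codes for an anchor image, a matching sound and a non-matching sound.
image_code = [0.9, -0.8, 0.7, 0.6]
positive_sound = [0.8, -0.9, 0.6, 0.7]
negative_sound = [-0.7, 0.9, -0.8, 0.5]

print(triplet_loss(image_code, positive_sound, negative_sound))
print(hamming_distance(binarize(image_code), binarize(positive_sound)))
print(hamming_distance(binarize(image_code), binarize(negative_sound)))
```

Storing only the binarized codes is what yields the storage saving over real-valued features; the gap between a hash-like code and its binarized version is the information loss that a regularization term such as the one described above aims to keep small.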


2019
Author(s):
Takuma Morimoto
Sho Kishigami
João M.M. Linhares
Sérgio M.C. Nascimento
Hannah E. Smithson

Objects placed in real-world scenes receive incident light from every direction, and the spectral content of this light may vary from one direction to another. In computer graphics, environmental illumination is approximated using maps that specify illumination at a point as a function of incident angle. However, to date, existing public databases of environmental illumination maps specify only three colour channels (RGB). We have captured a new set of 12 environmental illumination maps (eight outdoor scenes; four indoor scenes) using a hyperspectral imaging system with 33 spectral channels. The data reveal a striking directional variation of spectral distribution of lighting in natural environments. We discuss limitations of using daylight models to describe natural environmental illumination.


2021
Vol 25 (1)
pp. 22-38
Author(s):  
David Allen
Trevor Holster

A robust finding in psycholinguistics is that cognates and loanwords, which are words that typically share some degree of form and meaning across languages, provide the second language learner with benefits in language use when compared to words that do not share form and meaning across languages. This cognate effect has been shown to exist for Japanese learners of English; that is, words such as table are processed faster and more accurately in English because they have a loanword equivalent in Japanese (i.e., テーブル /te:buru/ ‘table’). Previous studies have also shown that the degree of phonological and semantic similarity, as measured on a numerical scale from ‘completely different’ to ‘identical’, also influences processing. However, there has been relatively little appraisal of such cross-linguistic similarity ratings themselves. Therefore, the present study investigated the structure of the similarity ratings using Rasch analysis, which is an analytic approach frequently used in the design and validation of language assessments. The findings showed that a 4-point scale may be optimal for phonological similarity ratings of cognates and a 2-point scale may be most appropriate for semantic similarity ratings. Furthermore, this study reveals that while a few raters and items misfitted the Rasch model, there was substantial agreement in ratings, especially for semantic similarity. The results validate the ratings for use in research and demonstrate the utility of Rasch analysis in the design and validation of research instruments in psychology.


Author(s):  
John Grishin
Douglas J. Gillan

Information displays should be clear and easily understood. This research examined whether principles developed by Kosslyn (1989) and Carswell and Wickens (1987) for charts, graphs, and object displays could be extended, or adapted, to another type of display, the food item package. We hypothesized that a food package on which label items had been arranged according to their similarity, or semantic relatedness, would facilitate better user performance than a package on which label items had been arranged in other ways. Participants rated the semantic relatedness of 12 label items found on a common food item package. Using multi-dimensional scaling (MDS) outputs from the ratings, we created three versions of a consumer cough drop package: 1) Similarity version: label elements that received higher similarity ratings were depicted closer together than elements with lower similarity ratings; 2) Dissimilarity version: elements that received higher similarity ratings were depicted farther apart than elements with lower similarity ratings; 3) Random version: rating values were randomly assigned to the pairs of elements. We tested user performance on search tasks and integrative tasks on each of the three versions. We hypothesized that the Similarity version would produce the best user performance and the Dissimilarity version would produce the worst. Results only partially supported the hypotheses. On the search tasks, the best performance was achieved on the Similarity and Dissimilarity versions, and the worst on the Random version. On the integrative tasks, the version made no difference in performance. Possible reasons for these results are discussed. Similar results by Fitts and Deininger (1954) and Morin and Grant (1955) suggest that performance on tasks is superior when the relationships are in an ordered structure, rather than randomly assigned, possibly because ordered structures make possible the development of search strategies, whereas random arrangements do not.

