Visual Object Recognition
Recently Published Documents


TOTAL DOCUMENTS: 343 (five years: 71)

H-INDEX: 44 (five years: 5)

Author(s): Xi Yang, Jie Yan, Wen Wang, Shaoyi Li, Bo Hu, ...

2021, pp. 51-101
Author(s): Glyn W. Humphreys, Vicki Bruce

2021, pp. 027836492110489
Author(s): Vasileios Vasilopoulos, Georgios Pavlakos, Karl Schmeckpeper, Kostas Daniilidis, Daniel E. Koditschek

This article solves the planar navigation problem by recourse to an online reactive scheme that exploits recent advances in simultaneous localization and mapping (SLAM) and visual object recognition to recast prior geometric knowledge in terms of an offline catalog of familiar objects. The resulting vector field planner guarantees convergence to an arbitrarily specified goal, avoiding collisions along the way with fixed but arbitrarily placed instances from the catalog, as well as with completely unknown fixed obstacles, so long as they are strongly convex and well separated. We illustrate the generic robustness properties of such deterministic reactive planners, as well as the relatively modest computational cost of this algorithm, by supplementing an extensive numerical study with physical implementations on both wheeled and legged platforms in different settings.
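The core idea of a reactive vector-field planner can be sketched in a few lines: at each step the robot follows a field that attracts it to the goal and repels it from nearby known obstacles. This is a toy illustration under assumed gains and disk-shaped obstacles, not the paper's provably convergent construction.

```python
import numpy as np

def planner_field(pos, goal, obstacles, influence=1.0, k_rep=1.0):
    """Illustrative reactive vector field: goal attraction plus local
    repulsion from known obstacle centres (a simplified stand-in for
    the paper's planner; gains and zones are arbitrary)."""
    v = goal - pos  # attraction toward the goal
    for c in obstacles:
        d = pos - c
        dist = np.linalg.norm(d)
        if dist < influence:  # repel only inside the influence zone
            v += k_rep * d / (dist ** 2 + 1e-9)
    return v

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 0.0])
obstacles = [np.array([2.5, 0.1])]  # one catalogued obstacle near the path

for _ in range(300):
    pos = pos + 0.05 * planner_field(pos, goal, obstacles)
    if np.linalg.norm(goal - pos) < 0.05:
        break
```

Because the field is evaluated online at the current pose, the same loop works whether obstacle positions come from an offline catalog or from SLAM-detected unknowns; the guarantees in the paper, by contrast, rest on a careful construction that this sketch does not reproduce.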


2021
Author(s): Gaurav Malhotra, Marin Dujmovic, Jeffrey S. Bowers

A central problem in vision sciences is to understand how humans recognise objects under novel viewing conditions. Recently, statistical inference models such as Convolutional Neural Networks (CNNs) seem to have reproduced this ability by incorporating some architectural constraints of biological vision systems into machine learning models. This has led to the proposal that, like CNNs, humans solve the problem of object recognition by performing a statistical inference over their observations. This hypothesis remains difficult to test, as models and humans learn in vastly different environments: any differences in performance could be attributed to the training environment rather than reflect a fundamental difference between statistical inference models and human vision. To overcome these limitations, we conducted a series of experiments and simulations in which humans and models had no prior experience with the stimuli. The stimuli contained multiple features that varied in the extent to which they predicted category membership. We observed that human participants frequently ignored features that were highly predictive and clearly visible. Instead, they learned to rely on global features such as colour or shape, even when these features were not the most predictive. When these features were absent, they failed to learn the task entirely. By contrast, ideal inference models as well as CNNs always learned to categorise objects based on the most predictive feature. This was the case even when the CNN was pre-trained to have a shape bias and the convolutional backbone was frozen.
These results highlight a fundamental difference between statistical inference models and humans: while statistical inference models such as CNNs learn the most diagnostic features with little regard for the computational cost of learning them, humans are highly constrained by their limited cognitive capacities, which results in a qualitatively different approach to object recognition.
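The contrast the authors draw can be made concrete with synthetic data: an ideal statistical learner simply adopts whichever feature best predicts the category, whereas the human pattern described above would stick with a salient global feature. The feature names and predictiveness values below are illustrative assumptions, not the authors' stimuli.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, n)  # two object categories

# feature 0: a small local patch that is perfectly diagnostic
local = labels.copy()
# feature 1: a salient global feature (e.g. colour), only ~75% diagnostic
flip = rng.random(n) < 0.25
colour = np.where(flip, 1 - labels, labels)

# an ideal statistical learner picks whichever feature predicts best
acc_local = np.mean(local == labels)    # perfectly predictive
acc_colour = np.mean(colour == labels)  # partially predictive
chosen = "local" if acc_local >= acc_colour else "colour"
```

In this setup any cost-blind learner ends up using the local patch; the paper's finding is that humans behave as if they instead chose `colour`, and fail outright when it is removed.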


2021
Author(s): Jamal Rodgers Williams, Yuri Markov, Natalia Tiurina, Viola Störmer

Visual object recognition in the real world is not performed in isolation but depends on contextual information such as the visual scene an object is found in. Nor is our perceptual experience exclusively visual: objects generate specific and unique sounds, which can readily predict which objects are outside our field of view. Here, we test whether and how naturalistic sounds influence visual object processing and demonstrate that auditory information both accelerates visual information processing and modulates the perceptual representation of visual objects. Specifically, using a visual discrimination task and a novel set of ambiguous object stimuli, we find that naturalistic sounds shift visual representations towards the object features that match the sound (Exps. 1a-1b). In a series of control experiments, we replicate the original effect and show that these effects are not driven by decision or response biases (Exps. 2a-2b) and are not due to the high-level semantic content of sounds generating explicit expectations (Exp. 3). Instead, these sound-induced effects on visual perception appear to be driven by the continuous integration of multisensory inputs during perception itself. Together, our results demonstrate that visual processing is shaped by auditory context, which provides independent, supplemental information about the entities we encounter in the world.


2021, Vol. 118 (34), pp. e2022792118
Author(s): Liron Zipora Gruber, Shimon Ullman, Ehud Ahissar

Natural vision is a dynamic and continuous process. Under natural conditions, visual object recognition typically involves continuous interactions between ocular motion and visual contrasts, resulting in dynamic retinal activations. To identify the dynamic variables that participate in this process and are relevant for image recognition, we used a set of images that are just above and below the human recognition threshold and whose recognition typically requires >2 s of viewing. We recorded the eye movements of participants while they attempted to recognize these images within trials lasting 3 s. We then assessed the activation dynamics of retinal ganglion cells resulting from ocular dynamics using a computational model. We found that while the saccadic rate was similar between recognized and unrecognized trials, the fixational ocular speed was significantly larger for unrecognized trials. Interestingly, however, the retinal activation level was significantly lower during these unrecognized trials. We used the retinal activation patterns and oculomotor parameters of each fixation to train a binary classifier to distinguish recognized from unrecognized trials. Only retinal activation patterns could predict recognition, reaching 80% correct classification on the fourth fixation (on average, ∼2.5 s from trial onset). We thus conclude that the information relevant for visual perception is embedded in the dynamic interactions between the oculomotor sequence and the image. Hence, our results suggest that ocular dynamics play an important role in recognition and that understanding the dynamics of retinal activation is crucial for understanding natural vision.
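The classification step described above can be sketched with simulated numbers: if recognized trials carry higher retinal activation, even a single-threshold rule separates the two trial types well above chance. The distributions below are invented for illustration, and the threshold rule is a stand-in for the paper's trained binary classifier.

```python
import numpy as np

rng = np.random.default_rng(1)
# simulated per-trial retinal activation (arbitrary units);
# the abstract reports lower activation on unrecognized trials
recognized = rng.normal(1.0, 0.3, 200)
unrecognized = rng.normal(0.5, 0.3, 200)

activation = np.concatenate([recognized, unrecognized])
label = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = recognized

# a simple threshold rule stands in for the paper's binary classifier
threshold = activation.mean()
prediction = (activation > threshold).astype(float)
accuracy = float(np.mean(prediction == label))
```

With the assumed separation between the two distributions, this rule lands near the ~80% accuracy the paper reports for the fourth fixation; the real classifier, of course, operates on full retinal activation patterns rather than a single scalar.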

