A Model of Shape Recognition and Categorisation

To recognise a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. It is possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Routine visual tasks, however, typically require not so much recognition as categorisation, that is making sense of objects not seen before. Despite persistent practical difficulties, theorists in computer vision and visual perception traditionally favour the structural route to categorisation, according to which forming a description of a novel shape in terms of its parts and their spatial relationships is a prerequisite to the ability to categorise it. In comparison, we demonstrate that knowledge of instances of each of several representative categories can provide the necessary computational substrate for the categorisation of their new instances, as well as for representation and processing of radically novel shapes, not belonging to any of the familiar categories. The representational scheme underlying this approach, according to which objects are encoded by their similarities to entire reference shapes (S Edelman, 1997 Behavioral and Brain Sciences in press), is computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.

Download Full-text

A model of visual recognition and categorization

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.1997.0102 ◽

1997 ◽

Vol 352 (1358) ◽

pp. 1191-1202 ◽

Cited By ~ 53

Author(s):

Shimon Edelman ◽

Sharon Duvdevani-Bar

Keyword(s):

Computer Vision ◽

Visual System ◽

Visual Recognition ◽

Daily Life ◽

Target Object ◽

Biological Vision ◽

Recognition Of Objects ◽

Physiological Studies

To recognize a previously seen object, the visual system must overcome the variability in the object's appearance caused by factors such as illumination and pose. Developments in computer vision suggest that it may be possible to counter the influence of these factors, by learning to interpolate between stored views of the target object, taken under representative combinations of viewing conditions. Daily life situations, however, typically require categorization, rather than recognition, of objects. Due to the open–ended character of both natural and artificial categories, categorization cannot rely on interpolation between stored examples. Nonetheless, knowledge of several representative members, or prototypes, of each of the categories of interest can still provide the necessary computational substrate for the categorization of new instances. The resulting representational scheme based on similarities to prototypes appears to be computationally viable, and is readily mapped onto the mechanisms of biological vision revealed by recent psychophysical and physiological studies.

Download Full-text

2D and 3D Visual Attention for Computer Vision

Innovative Research in Attention Modeling and Computer Vision Applications - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-8723-3.ch001 ◽

2016 ◽

pp. 1-44

Author(s):

Vincent Ricordel ◽

Junle Wang ◽

Matthieu Perreira Da Silva ◽

Patrick Le Callet

Keyword(s):

Computer Vision ◽

Visual Perception ◽

Visual Attention ◽

Visual System ◽

Computational Modeling ◽

Depth Perception ◽

Human Visual System ◽

Visual Importance ◽

2D And 3D ◽

The Impact

Visual attention is one of the most important mechanisms deployed in the human visual system (HVS) to reduce the amount of information that our brain needs to process. An increasing amount of efforts has been dedicated to the study of visual attention, and this chapter proposes to clarify the advances achieved in computational modeling of visual attention. First the concepts of visual attention, including the links between visual salience and visual importance, are detailed. The main characteristics of the HVS involved in the process of visual perception are also explained. Next we focus on eye-tracking, because of its role in the evaluation of the performance of the models. A complete state of the art in computational modeling of visual attention is then presented. The research works that extend some visual attention models to 3D by taking into account of the impact of depth perception are finally explained and compared.

Download Full-text

Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future

Journal of Cognitive Neuroscience ◽

10.1162/jocn_a_01544 ◽

2020 ◽

pp. 1-15 ◽

Cited By ~ 13

Author(s):

Grace W. Lindsay

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Convolutional Neural Networks ◽

Computational Neuroscience ◽

Neural Activity ◽

Vision Research ◽

State Of The Art ◽

Biological Vision ◽

Visual Tasks ◽

And Behavior

Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.

Download Full-text

2D and 3D Visual Attention for Computer Vision

3D Printing ◽

10.4018/978-1-5225-1677-4.ch005 ◽

2017 ◽

pp. 75-118

Author(s):

Vincent Ricordel ◽

Junle Wang ◽

Matthieu Perreira Da Silva ◽

Patrick Le Callet

Keyword(s):

Computer Vision ◽

Visual Perception ◽

Visual Attention ◽

Visual System ◽

Computational Modeling ◽

Depth Perception ◽

Human Visual System ◽

Visual Importance ◽

2D And 3D ◽

The Impact

Download Full-text

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Complexity ◽

10.1155/2021/5541134 ◽

2021 ◽

Vol 2021 ◽

pp. 1-30

Author(s):

R. Nandhini Abirami ◽

P. M. Durai Raj Vincent ◽

Kathiravan Srinivasan ◽

Usman Tariq ◽

Chuan-Yu Chang

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Visual Perception ◽

Generative Adversarial Networks ◽

Deep Convolutional Neural Networks ◽

Generative Adversarial Network ◽

Biological Vision ◽

Adversarial Network ◽

Adversarial Networks

Computational visual perception, also known as computer vision, is a field of artificial intelligence that enables computers to process digital images and videos in a similar way as biological vision does. It involves methods to be developed to replicate the capabilities of biological vision. The computer vision’s goal is to surpass the capabilities of biological vision in extracting useful information from visual data. The massive data generated today is one of the driving factors for the tremendous growth of computer vision. This survey incorporates an overview of existing applications of deep learning in computational visual perception. The survey explores various deep learning techniques adapted to solve computer vision problems using deep convolutional neural networks and deep generative adversarial networks. The pitfalls of deep learning and their solutions are briefly discussed. The solutions discussed were dropout and augmentation. The results show that there is a significant improvement in the accuracy using dropout and data augmentation. Deep convolutional neural networks’ applications, namely, image classification, localization and detection, document analysis, and speech recognition, are discussed in detail. In-depth analysis of deep generative adversarial network applications, namely, image-to-image translation, image denoising, face aging, and facial attribute editing, is done. The deep generative adversarial network is unsupervised learning, but adding a certain number of labels in practical applications can improve its generating ability. However, it is challenging to acquire many data labels, but a small number of data labels can be acquired. Therefore, combining semisupervised learning and generative adversarial networks is one of the future directions. This article surveys the recent developments in this direction and provides a critical review of the related significant aspects, investigates the current opportunities and future challenges in all the emerging domains, and discusses the current opportunities in many emerging fields such as handwriting recognition, semantic mapping, webcam-based eye trackers, lumen center detection, query-by-string word, intermittently closed and open lakes and lagoons, and landslides.

Download Full-text

Binary Image Classification: A Genetic Programming Approach to the Problem of Limited Training Instances

Evolutionary Computation ◽

10.1162/evco_a_00146 ◽

2016 ◽

Vol 24 (1) ◽

pp. 143-182 ◽

Cited By ~ 10

Author(s):

Harith Al-Sahaf ◽

Mengjie Zhang ◽

Mark Johnston

Keyword(s):

Computer Vision ◽

Pattern Recognition ◽

Genetic Programming ◽

Visual System ◽

Image Classification ◽

Human Visual System ◽

Binary Classification ◽

Programming Approach ◽

Data Sets ◽

New Class

In the computer vision and pattern recognition fields, image classification represents an important yet difficult task. It is a challenge to build effective computer models to replicate the remarkable ability of the human visual system, which relies on only one or a few instances to learn a completely new class or an object of a class. Recently we proposed two genetic programming (GP) methods, one-shot GP and compound-GP, that aim to evolve a program for the task of binary classification in images. The two methods are designed to use only one or a few instances per class to evolve the model. In this study, we investigate these two methods in terms of performance, robustness, and complexity of the evolved programs. We use ten data sets that vary in difficulty to evaluate these two methods. We also compare them with two other GP and six non-GP methods. The results show that one-shot GP and compound-GP outperform or achieve results comparable to competitor methods. Moreover, the features extracted by these two methods improve the performance of other classifiers with handcrafted features and those extracted by a recently developed GP-based method in most cases.

Download Full-text

Learning spatial relationships in computer vision

Proceedings of IEEE 5th International Fuzzy Systems ◽

10.1109/fuzzy.1996.551729 ◽

2002 ◽

Cited By ~ 15

Author(s):

J.M. Keller ◽

Xiaomei Wang

Keyword(s):

Computer Vision ◽

Spatial Relationships

Download Full-text

Development of Efficiency in Visual Functioning: Rationale for a Comprehensive Program

Journal of Visual Impairment & Blindness ◽

10.1177/0145482x7907300401 ◽

1979 ◽

Vol 73 (4) ◽

pp. 121-126 ◽

Cited By ~ 1

Author(s):

Natalie C. Barraga ◽

Marcia E. Collins

Keyword(s):

Visual System ◽

Visual Functioning ◽

Comprehensive Program ◽

Outdoor Environments ◽

Visual Tasks ◽

Indoor And Outdoor Environments ◽

Indoor And Outdoor

The rationale for a comprehensive program in visual functioning is based upon an assumed interaction between: (a) functions performed by the visual system, (b) developmental visual tasks organized in keeping with perceptual/cognitive milestones, and (c) a variety of indoor and outdoor environments.

Download Full-text

Inner-Distance Measurement and Shape Recognition of Target Object Using Networked Binary Sensors

2013 27th International Conference on Advanced Information Networking and Applications Workshops ◽

10.1109/waina.2013.153 ◽

2013 ◽

Cited By ~ 3

Author(s):

S. Shioda

Keyword(s):

Shape Recognition ◽

Target Object ◽

Distance Measurement

Download Full-text

Visual attention strategies for target object detection

10.26686/wgtn.17067635 ◽

2021 ◽

Author(s):

◽

Ibrahim Mohammad Hussain Rahman

Keyword(s):

Visual Attention ◽

Object Detection ◽

Target Object ◽

Detection Accuracy ◽

Estimation Model ◽

Top Down ◽

Bottom Up ◽

Feature Map ◽

Low Level ◽

Visual Tasks

<p>The human visual attention system (HVA) encompasses a set of interconnected neurological modules that are responsible for analyzing visual stimuli by attending to those regions that are salient. Two contrasting biological mechanisms exist in the HVA systems; bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection. Very few computational models have been proposed to model top-down attention, mainly due to three reasons. The first is that the functionality of top-down process involves many influential factors. The second reason is that there is a diversity in top-down responses from task to task. Finally, many biological aspects of the top-down process are not well understood yet. For the above reasons, it is difficult to come up with a generalized top-down model that could be applied to all high level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step for analyzing images to further perform complex visual tasks. Target object detection has not been investigated thoroughly when modelling top-down saliency and hence, constitutes the may domain application for this thesis. The thesis will investigate methods to model top-down attention through various high-level data acquired from images. Furthermore, the thesis will investigate different strategies to dynamically combine bottom-up and top-down processes to improve the detection accuracy, as well as the computational efficiency of the existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency: 1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor plays the role of tuning the weighting of low-level features to maximize detection accuracy. By incorporating context into the feature weighting mechanism we improve the quality of the assigned weights to these features. 2. Two modules of target features combined with contextual weighting to improve detection accuracy of the target object. In this proposed model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, a drastic increase in detection accuracy is achieved in images with complex backgrounds and a variety of target objects. 3. A top-down and bottom-up attention combination model based on feature interaction. This model provides a dynamic way for combining both processes by formulating the problem as feature selection. The feature selection exploits the interaction between these features, yielding a robust set of features that would maximize both the detection accuracy and the overall efficiency of the system. 4. A feature map quality score estimation model that is able to accurately predict the detection accuracy score of any previously novel feature map without the need of groundtruth data. The model extracts various local, global, geometrical and statistical characteristic features from a feature map. These characteristics guide a regression model to estimate the quality of a novel map. 5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model is able to predict the quality score of any novel feature map accurately, then it is possible to perform dynamic feature map integration based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with minimum number of feature maps than that achieved by combining all feature maps. The proposed works in this thesis provide new directions in modelling top-down saliency for target object detection. In addition, dynamic approaches for top-down and bottom-up combination show considerable improvements over existing approaches in both efficiency and accuracy.</p>

Download Full-text