Biologically Inspired Visual System Architecture for Object Recognition in Autonomous Systems

Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 167 ◽  
Author(s):  
Dan Malowany ◽  
Hugo Guterman

Computer vision is currently one of the most exciting and rapidly evolving fields of science, which affects numerous industries. Research and development breakthroughs, mainly in the field of convolutional neural networks (CNNs), opened the way to unprecedented sensitivity and precision in object detection and recognition tasks. Nevertheless, findings in recent years on the sensitivity of neural networks to additive noise, lighting conditions, and the completeness of the training dataset indicate that this technology still lacks the robustness needed for the autonomous robotic industry. In an attempt to bring computer vision algorithms closer to the capabilities of a human operator, the mechanisms of the human visual system were analyzed in this work. Recent studies show that the mechanisms behind the recognition process in the human brain include continuous generation of predictions based on prior knowledge of the world. These predictions enable rapid generation of contextual hypotheses that bias the outcome of the recognition process. This mechanism is especially advantageous in situations of uncertainty, when visual input is ambiguous. In addition, the human visual system continuously updates its knowledge about the world based on the gaps between its predictions and the visual feedback. CNNs are feed-forward in nature and lack such top-down contextual attenuation mechanisms. As a result, although they process massive amounts of visual information during their operation, the information is not transformed into knowledge that can be used to generate contextual predictions and improve their performance. In this work, an architecture was designed that aims to integrate the concepts behind the top-down prediction and learning processes of the human visual system with state-of-the-art bottom-up object recognition models, e.g., deep CNNs.
The work focuses on two mechanisms of the human visual system: anticipation-driven perception and reinforcement-driven learning. Imitating these top-down mechanisms, together with the state-of-the-art bottom-up feed-forward algorithms, resulted in an accurate, robust, and continuously improving target recognition model.
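The biasing of bottom-up recognition scores by top-down contextual predictions can be sketched as a simple Bayes-style reweighting. This is a toy illustration only; the function name, class labels, and prior values are hypothetical and do not represent the authors' actual architecture:

```python
import numpy as np

def contextual_posterior(bottom_up_probs, context_prior):
    """Reweight bottom-up class scores with a top-down contextual prior,
    then renormalize so the result is again a probability distribution."""
    posterior = np.asarray(bottom_up_probs) * np.asarray(context_prior)
    return posterior / posterior.sum()

# Ambiguous bottom-up evidence: "mug" and "bowl" are equally likely.
bottom_up = [0.4, 0.4, 0.2]        # mug, bowl, helmet
kitchen_prior = [0.7, 0.2, 0.1]    # contextual prediction: a kitchen scene
print(contextual_posterior(bottom_up, kitchen_prior))
```

The contextual prior breaks the tie between the two equally likely bottom-up hypotheses, mirroring how contextual hypotheses bias the recognition outcome under uncertainty.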

2019 ◽  
Author(s):  
A. Doerig ◽  
A. Bornet ◽  
O. H. Choung ◽  
M. H. Herzog

Abstract Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models both in computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. However, it is currently unclear whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address shortcomings of ffCNNs to provide better models of the human visual system.


Author(s):  
NA FAN

Occlusion handling is an old but important problem for the computer vision and pattern recognition community. Features from different objects may be intertwined with each other, and matched feature points may belong to different objects for many traditional object recognition algorithms. To recognize occlusions, we should not only match objects from different viewpoints but also match features extracted from the same object. In this paper, we propose a method to consider these two perspectives simultaneously by encoding various types of features, such as the geometry, color, and texture relationships among feature points, into a matrix and finding the best quadratic feature correlation model to fit them. Experiments on our own dataset and the publicly available PASCAL VOC dataset show that our method can robustly classify objects and handle objects under large occlusions, and its performance is among the state of the art.


Perception ◽  
1994 ◽  
Vol 23 (5) ◽  
pp. 547-561 ◽  
Author(s):  
Luc J Van Gool ◽  
Theo Moons ◽  
Eric Pauwels ◽  
Johan Wagemans

It is remarkable how well the human visual system can cope with changing viewpoints when it comes to recognising shapes. The state of the art in machine vision is still quite remote from solving such tasks. Nevertheless, a surge in invariance-based research has led to the development of methods for solving recognition problems still considered hard until recently. A nonmathematical account explains the basic philosophy and trade-offs underlying this strand of research. The principles are explained for the relatively simple case of planar-object recognition under arbitrary viewpoints. Well-known Euclidean concepts form the basis of invariance in this case. Introducing constraints in addition to that of planarity may further simplify the invariants. On the other hand, there are problems for which no invariants exist.
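A classic example of such a viewpoint invariant for planar recognition is the cross-ratio of four collinear points, which is unchanged by any projective transformation. The sketch below is illustrative and not taken from the paper; it uses 1-D coordinates and arbitrary transformation parameters:

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points (given as 1-D coordinates),
    a classic projective invariant."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projective_map(x, m=(2.0, 1.0, 0.5, 3.0)):
    """A 1-D projective transformation x -> (p*x + q) / (r*x + s)."""
    p, q, r, s = m
    return (p * x + q) / (r * x + s)

pts = (0.0, 1.0, 2.0, 4.0)
before = cross_ratio(*pts)
after = cross_ratio(*(projective_map(x) for x in pts))
assert abs(before - after) < 1e-9  # the cross-ratio survives the mapping
```

Because the cross-ratio is preserved under arbitrary projective maps, it can be measured in an image and compared against a stored model regardless of viewpoint, which is the basic trade-off-free payoff of invariance-based recognition.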


2016 ◽  
Vol 24 (1) ◽  
pp. 143-182 ◽  
Author(s):  
Harith Al-Sahaf ◽  
Mengjie Zhang ◽  
Mark Johnston

In the computer vision and pattern recognition fields, image classification represents an important yet difficult task. It is a challenge to build effective computer models to replicate the remarkable ability of the human visual system, which relies on only one or a few instances to learn a completely new class or an object of a class. Recently we proposed two genetic programming (GP) methods, one-shot GP and compound-GP, that aim to evolve a program for the task of binary classification in images. The two methods are designed to use only one or a few instances per class to evolve the model. In this study, we investigate these two methods in terms of performance, robustness, and complexity of the evolved programs. We use ten data sets that vary in difficulty to evaluate these two methods. We also compare them with two other GP and six non-GP methods. The results show that one-shot GP and compound-GP outperform or achieve results comparable to competitor methods. Moreover, the features extracted by these two methods improve the performance of other classifiers with handcrafted features and those extracted by a recently developed GP-based method in most cases.


2016 ◽  
Vol 23 (5) ◽  
pp. 529-541 ◽  
Author(s):  
Sara Ajina ◽  
Holly Bridge

Damage to the primary visual cortex removes the major input from the eyes to the brain, causing significant visual loss as patients are unable to perceive the side of the world contralateral to the damage. Some patients, however, retain the ability to detect visual information within this blind region; this is known as blindsight. By studying the visual pathways that underlie this residual vision in patients, we can uncover additional aspects of the human visual system that likely contribute to normal visual function but cannot be revealed under physiological conditions. In this review, we discuss the residual abilities and neural activity that have been described in blindsight and the implications of these findings for understanding the intact system.


2021 ◽  
Vol 11 (19) ◽  
pp. 9197 ◽  
Author(s):  
Muhammad Tahir ◽  
Saeed Anwar

Person re-identification is an essential task in computer vision, particularly in surveillance applications. The aim is to identify a person, based on an input image, among surveillance photographs from various scenarios. Most person re-ID techniques utilize Convolutional Neural Networks (CNNs); however, Vision Transformers are replacing pure CNNs for various computer vision tasks such as object recognition, classification, etc. Vision transformers capture information about local regions of the image, and current techniques exploit this to improve accuracy on the tasks at hand. We propose to use vision transformers in conjunction with vanilla CNN models to investigate the true strength of transformers in person re-identification. We employ three backbones with different combinations of vision transformers on two benchmark datasets. The overall performance of the backbones increased, showing the importance of vision transformers. We provide ablation studies and show the importance of various components of the vision transformers in re-identification tasks.
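One common way to combine a CNN backbone with a vision transformer is late feature fusion followed by nearest-neighbor retrieval in the embedding space. The sketch below is a hedged illustration of that general pattern, with made-up embeddings standing in for real backbone outputs; it is not the authors' exact method:

```python
import numpy as np

def l2_normalize(v):
    """Scale vectors to unit length so dot products become cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def fuse(cnn_feat, vit_feat):
    """Late fusion: concatenate the L2-normalized embeddings of both backbones."""
    return l2_normalize(np.concatenate(
        [l2_normalize(cnn_feat), l2_normalize(vit_feat)], axis=-1))

def rank_gallery(query_emb, gallery_embs):
    """Return gallery indices sorted by cosine similarity to the query."""
    return np.argsort(-(gallery_embs @ query_emb))

rng = np.random.default_rng(0)
# Toy setup: three distractor identities plus a near-duplicate of the query.
query = fuse(rng.standard_normal(64), rng.standard_normal(32))
gallery = np.stack([fuse(rng.standard_normal(64), rng.standard_normal(32))
                    for _ in range(3)] + [query + 0.01 * rng.standard_normal(96)])
print(rank_gallery(query, gallery)[0])  # the near-duplicate (index 3) ranks first
```

Normalizing each backbone's embedding before concatenation keeps either branch from dominating the fused distance purely through its scale.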


Author(s):  
Ritwik Chavhan ◽  
Kadir Sheikh ◽  
Rishikesh Bondade ◽  
Swaraj Dhanulkar ◽  
Aniket Ninave ◽  
...  

Plant disease is an ongoing challenge for smallholder farmers, threatening income and food security. The recent revolution in smartphone penetration and computer vision models has created an opportunity for image classification in agriculture. The project focuses on providing data on which pesticide/insecticide to use, and in what quantity, for an unhealthy crop. The user, a farmer, takes a picture of the crop and uploads it to the server via the Android application. On uploading the image, the farmer receives a unique ID on the application screen. The farmer must make note of that ID, since it is needed later to retrieve the result. The uploaded image is then processed by Convolutional Neural Networks (CNNs), which are considered state-of-the-art in image recognition and offer the ability to provide a prompt and definite diagnosis. The result, consisting of the disease name and the affected area, is then written to the message table on the server. The farmer can then retrieve the complete information in a readable format by entering the unique ID in the application.
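The upload/ID/retrieve workflow can be sketched as a minimal server-side flow. The function names, in-memory tables, and stand-in classifier below are all hypothetical, standing in for the Android app, the CNN, and the server's message table:

```python
import uuid

# In-memory stand-ins for the server's upload queue and results (message) table.
pending, results = {}, {}

def upload_image(image_bytes):
    """Farmer uploads a crop image; the server returns a unique ID."""
    ticket = uuid.uuid4().hex[:8]
    pending[ticket] = image_bytes
    return ticket

def classify_pending(classifier):
    """Run the (pluggable) classifier on all queued images and store results."""
    for ticket, img in list(pending.items()):
        results[ticket] = classifier(img)
        del pending[ticket]

def retrieve(ticket):
    """Farmer retrieves the diagnosis later using the unique ID."""
    return results.get(ticket, "still processing")

ticket = upload_image(b"<jpeg bytes>")
classify_pending(lambda img: ("leaf blight", "23% of leaf area"))  # dummy CNN
print(retrieve(ticket))
```

Decoupling upload from retrieval via the ticket lets the CNN run asynchronously while the farmer's device stays offline or on a slow connection.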


2021 ◽  
Author(s):  
Weihao Zhuang ◽  
Tristan Hascoet ◽  
Xunquan Chen ◽  
Ryoichi Takashima ◽  
Tetsuya Takiguchi ◽  
...  

Abstract Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional Neural Networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, as larger models have shown higher accuracy, recent developments have led to state-of-the-art CNN models with increasing resource consumption. This paper investigates a conceptual approach to reducing the memory consumption of CNN inference. Our method processes the input image as a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize its peak memory consumption, while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computation, which is particularly suitable for high-resolution inputs. Our experimental results show that MobileNetV2 memory consumption can be reduced by up to 5.3 times with the proposed method. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory consumption can be reduced by up to 2.3 times.
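The tiling idea can be illustrated with a single 3x3 convolution: process overlapping horizontal strips so only one strip is resident at a time, and the stitched result matches the full-image pass exactly. This is a toy NumPy sketch of the principle, not the authors' implementation:

```python
import numpy as np

def conv3x3_valid(x, k):
    """Naive 3x3 'valid' convolution: output shape is (H-2, W-2)."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def tiled_conv3x3(x, k, tile_h):
    """Process the image in horizontal strips with a 2-row halo so each
    strip is self-sufficient; peak memory holds only one strip at a time."""
    H, _ = x.shape
    rows = []
    for top in range(0, H - 2, tile_h):
        strip = x[top: min(top + tile_h + 2, H)]  # +2 rows of halo for 3x3
        rows.append(conv3x3_valid(strip, k))
    return np.vstack(rows)

rng = np.random.default_rng(0)
img, kern = rng.standard_normal((16, 16)), rng.standard_normal((3, 3))
full = conv3x3_valid(img, kern)
tiled = tiled_conv3x3(img, kern, tile_h=4)
assert np.allclose(full, tiled)  # identical output, strip-sized peak memory
```

The halo rows are recomputed for neighboring strips, which is exactly the memory-versus-computation trade-off the abstract describes; deeper lower subnetworks need wider halos.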


2021 ◽  
Vol 2042 (1) ◽  
pp. 012002 ◽  
Author(s):  
Roberto Castello ◽  
Alina Walch ◽  
Raphaël Attias ◽  
Riccardo Cadei ◽  
Shasha Jiang ◽  
...  

Abstract The integration of solar technology in the built environment is realized mainly through rooftop-installed panels. In this paper, we leverage state-of-the-art machine learning and computer vision techniques applied to overhead images to geo-localize the rooftop surfaces available for solar panel installation. We further exploit a 3D building database to associate them with the corresponding roof geometries by means of a geospatial post-processing approach. The stand-alone Convolutional Neural Network used to segment suitable rooftop areas reaches an intersection over union of 64% and an accuracy of 93%, while a post-processing step using the building database improves the rejection of false positives. The model is applied to a case study area in the canton of Geneva, and the results are compared with another recent method from the literature for deriving the realistic available area.
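For binary segmentation masks, the intersection-over-union figure quoted above is computed as the overlap between prediction and ground truth divided by their union. A minimal sketch:

```python
import numpy as np

def iou(pred, target):
    """Intersection-over-union between two boolean segmentation masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0  # two empty masks agree perfectly

pred   = np.array([[1, 1], [0, 0]], dtype=bool)
target = np.array([[1, 0], [1, 0]], dtype=bool)
print(iou(pred, target))  # 1 overlapping pixel out of 3 in the union
```

Unlike pixel accuracy, IoU is insensitive to the large background class, which is why segmentation work typically reports both: here accuracy can be high (93%) while IoU is markedly lower (64%).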


2021 ◽  
Vol 3 (4) ◽  
pp. 966-989 ◽  
Author(s):  
Vanessa Buhrmester ◽  
David Münch ◽  
Michael Arens

Deep Learning is a state-of-the-art technique for making inferences on extensive or complex data. Owing to their multilayer nonlinear structure, Deep Neural Networks are black-box models, often criticized as non-transparent, with predictions that are not traceable by humans. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to a lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between the input and output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks for computer vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.
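A minimal example of the input-output probing such explainers perform is occlusion sensitivity: replace each input feature with a baseline in turn and record how much the prediction moves. This toy sketch uses a linear model so the result is easy to check; real explainers apply the same idea to deep networks by occluding image patches:

```python
import numpy as np

def occlusion_importance(model, x, baseline=0.0):
    """Score each input feature by how much the model output changes
    when that feature alone is replaced with a baseline value."""
    base_out = model(x)
    scores = np.empty_like(x, dtype=float)
    for i in range(x.size):
        x_masked = x.copy()
        x_masked.flat[i] = baseline
        scores.flat[i] = abs(base_out - model(x_masked))
    return scores

w = np.array([2.0, 0.0, -1.0])
model = lambda x: float(x @ w)  # stand-in for a black-box predictor
x = np.array([1.0, 1.0, 1.0])
print(occlusion_importance(model, x))  # features with larger |weight| score higher
```

Because the method only queries inputs and outputs, it is model-agnostic, which is exactly the black-box setting the survey discusses; its cost grows with the number of features occluded.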

