Visual Object Detection for an Autonomous Indoor Robotic System

As automated vehicles have come to be regarded as one of the important trends in intelligent transportation systems, a range of research is being conducted to enhance their safety. In particular, technologies for the design of preventive automated driving systems, such as detection of surrounding objects and estimation of the distance between vehicles, are growing in importance. Object detection is mainly performed with cameras and LiDAR, but because of LiDAR’s cost and limited recognition distance, there is an increasing need to improve camera-based recognition, which is relatively easy to commercialize. This study trained two convolutional neural network (CNN)-based detectors, Faster Regions with CNN (Faster R-CNN) and You Only Look Once (YOLO) V2, to improve the recognition performance of vehicle-mounted monocular cameras for preventive automated driving systems. The models were used to recognize surrounding vehicles in black box highway driving videos, and the model better suited to automated driving systems was then used to estimate the distance to those vehicles. Both models were also trained on the PASCAL Visual Object Classes (VOC) dataset for comparison. Faster R-CNN showed similar accuracy, with a mean average precision (mAP) of 76.4 against 78.6 for YOLO V2, but it ran at 5 frames per second (FPS), much slower than YOLO V2 at 40 FPS, and it also had difficulty with detection. As a result, YOLO V2, which performed better in both accuracy and processing speed, was judged the more suitable model for automated driving systems and was carried forward to the estimation of distance between vehicles. For distance estimation, we converted coordinate values through camera calibration and a perspective transform, set the threshold to 0.7, and performed object detection and distance estimation, achieving more than 80% accuracy for near-distance vehicles.
This study is expected to help prevent accidents involving automated vehicles, and further research is expected to provide additional accident-prevention measures, such as calculating and securing appropriate safety distances depending on vehicle type.
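
The distance-estimation step described in this abstract (perspective transform, then a 0.7 threshold on detections) can be illustrated with a minimal sketch. It is not the authors' implementation: the point correspondences `SRC` and `DST`, the helper names, and the choice of the bottom-centre of each box as the ground contact point are all assumptions made for illustration, the threshold is assumed to apply to the detection confidence score, and lens undistortion from camera calibration is omitted.

```python
import numpy as np

# Hypothetical correspondences: four road-plane points seen in a 1280x720
# image (pixels) and their assumed ground positions in metres, with the
# camera at the origin and the y axis pointing forward.
SRC = [(300, 700), (980, 700), (560, 450), (720, 450)]
DST = [(-2.0, 5.0), (2.0, 5.0), (-2.0, 30.0), (2.0, 30.0)]

def fit_homography(src, dst):
    """Solve for the 3x3 perspective transform (with H[2, 2] = 1) from 4 point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def to_ground(H, pixel):
    """Map one image pixel to ground-plane coordinates."""
    x, y, w = H @ np.array([pixel[0], pixel[1], 1.0])
    return np.array([x / w, y / w])

def estimate_distances(H, detections, threshold=0.7):
    """Keep detections (x1, y1, x2, y2, score) at or above the threshold
    and return the ground distance to each one."""
    distances = []
    for x1, y1, x2, y2, score in detections:
        if score < threshold:
            continue
        foot = ((x1 + x2) / 2.0, y2)  # assumed contact point with the road
        gx, gy = to_ground(H, foot)
        distances.append(float(np.hypot(gx, gy)))
    return distances
```

For example, `estimate_distances(fit_homography(SRC, DST), boxes)` returns one distance per box whose score is at least 0.7. The result is only as good as the flat-road assumption and the chosen reference points.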

Revisiting knowledge distillation for light-weight visual object detection

Transactions of the Institute of Measurement and Control ◽ 10.1177/01423312211022877 ◽ 2021 ◽ Vol 43 (13) ◽ pp. 2888-2898

Author(s): Tianze Gao ◽ Yunfeng Gao ◽ Yu Li ◽ Peiyuan Qin

Keyword(s): Object Detection ◽ Essential Element ◽ Detection Algorithm ◽ Positive Sample ◽ Detection Methods ◽ Visual Object ◽ Light Weight ◽ Model Compression ◽ Novel Approach ◽ Knowledge Distillation

An essential element for intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the steady advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, networks with cumbersome structures are ill-suited to the real-time scenarios in M&RS, which necessitates model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, named positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher model and the student model is further promoted through hint-based distillation. In extensive controlled experiments, the proposed method is shown to improve the light-weight network’s performance by a large margin.
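
As background for the abstract above, the following is a minimal sketch of the classification-oriented knowledge distillation loss with a softening temperature, which is the starting point the abstract describes as incompatible with detection. It is not the paper's detection variant: positive sample retaining, early distribution softening and hint-based distillation are not implemented, and the function names and default temperature are assumptions.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def softmax(logits, temperature=1.0):
    """Softened class distribution; a higher temperature gives a flatter output."""
    return np.exp(log_softmax(logits, temperature))

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    averaged over the batch and scaled by temperature squared."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student), axis=-1)
    return float(np.mean(kl) * temperature ** 2)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. In practice it is usually combined with the ordinary task loss on ground-truth labels.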

Uncertainty for Identifying Open-Set Errors in Visual Object Detection

IEEE Robotics and Automation Letters ◽ 10.1109/lra.2021.3123374 ◽ 2021 ◽ pp. 1-1

Author(s): Dimity Miller ◽ Niko Sunderhauf ◽ Michael J Milford ◽ Feras Dayoub

Keyword(s): Object Detection ◽ Visual Object ◽ Open Set

Background Appearance Modeling with Applications to Visual Object Detection in an Open-Pit Mine

Journal of Field Robotics ◽ 10.1002/rob.21667 ◽ 2016 ◽ Vol 34 (1) ◽ pp. 53-73 ◽ Cited By ~ 3

Author(s): Alex Bewley ◽ Ben Upcroft

Keyword(s): Object Detection ◽ Open Pit Mine ◽ Open Pit ◽ Visual Object ◽ Appearance Modeling

Visual object detection with deformable part models

Communications of the ACM ◽ 10.1145/2494532 ◽ 2013 ◽ Vol 56 (9) ◽ pp. 97-105 ◽ Cited By ~ 2

Author(s): Pedro Felzenszwalb ◽ Ross Girshick ◽ David McAllester ◽ Deva Ramanan

Keyword(s): Object Detection ◽ Visual Object ◽ Deformable Part Models

Multi-Objective Neural Network Optimization for Visual Object Detection

Multi-Objective Machine Learning - Studies in Computational Intelligence ◽ 10.1007/11399346_27 ◽ 2006 ◽ pp. 629-655

Author(s): Stefan Roth ◽ Alexander Gepperth ◽ Christian Igel

Keyword(s): Neural Network ◽ Object Detection ◽ Network Optimization ◽ Visual Object ◽ Neural Network Optimization ◽ Multi Objective

Boosted Algorithms for Visual Object Detection on Graphics Processing Units

Computer Vision – ACCV 2006 - Lecture Notes in Computer Science ◽ 10.1007/11612704_26 ◽ 2006 ◽ pp. 254-263 ◽ Cited By ~ 5

Author(s): Hicham Ghorayeb ◽ Bruno Steux ◽ Claude Laurgeau

Keyword(s): Object Detection ◽ Graphics Processing Units ◽ Visual Object ◽ Graphics Processing

GC-YOLOv3: You Only Look Once with Global Context Block

Electronics ◽ 10.3390/electronics9081235 ◽ 2020 ◽ Vol 9 (8) ◽ pp. 1235

Author(s): Yang Yang ◽ Hongmin Deng

Keyword(s): Object Detection ◽ Irrelevant Information ◽ Detection Algorithm ◽ Visual Object ◽ Detection Accuracy ◽ Feature Maps ◽ Average Precision ◽ Global Context ◽ Pascal Voc ◽ Feature Pyramid

To make the classification and regression of single-stage detectors more accurate, this paper proposes an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3), based on You-Only-Look-Once (YOLO). First, a better cascading model with learnable semantic fusion between the feature extraction network and the feature pyramid network is designed to improve detection accuracy using a global context block. Second, the information to be retained is screened by combining three feature maps of different scales. Finally, a global self-attention mechanism is used to highlight the useful information in the feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum mean Average Precision (mAP)@0.5 of 55.5 on Common Objects in Context (COCO) 2017 test-dev and that its mAP is 5.1% higher than that of YOLOv3 on the Pascal Visual Object Classes (PASCAL VOC) 2007 test set. These experiments indicate that the proposed GC-YOLOv3 model exhibits optimal performance on the PASCAL VOC and COCO datasets.
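
To give a rough idea of what a global context block computes, here is a simplified sketch in NumPy. The layout assumed is the common one (attention pooling over all spatial positions, a small bottleneck transform, then a broadcast addition); it is not taken from the GC-YOLOv3 paper. The weight names and shapes are hypothetical, the 1x1 convolutions are written as plain matrix products, and the normalisation layer usually placed inside the bottleneck is omitted.

```python
import numpy as np

def global_context_block(x, w_key, w_down, w_up):
    """Simplified global context block on one feature map.

    x:      (C, H, W) feature map
    w_key:  (C,)   weights producing one attention score per position
    w_down: (R, C) bottleneck projection, R < C
    w_up:   (C, R) projection back to C channels
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # (C, N) with N = H * W
    scores = w_key @ flat                         # one score per position
    scores = scores - scores.max()                # stabilise the softmax
    attn = np.exp(scores)
    attn = attn / attn.sum()                      # attention over all positions
    context = flat @ attn                         # (C,) global context vector
    hidden = np.maximum(w_down @ context, 0.0)    # bottleneck with ReLU
    delta = w_up @ hidden                         # (C,) per-channel correction
    return x + delta[:, None, None]               # same correction everywhere
```

Because the context vector is pooled over the whole map, every position receives the same per-channel correction. That is what lets the block inject image-wide information at low cost.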

Object Detection Algorithm Based on Multiheaded Attention

Applied Sciences ◽ 10.3390/app9091829 ◽ 2019 ◽ Vol 9 (9) ◽ pp. 1829 ◽ Cited By ~ 1

Author(s): Jie Jiang ◽ Hui Xu ◽ Shichang Zhang ◽ Yujie Fang

Keyword(s): Object Detection ◽ Linear Interpolation ◽ Detection Algorithm ◽ Attention Mechanism ◽ Visual Object ◽ Single Shot ◽ Object Class ◽ Feature Information ◽ Base Network ◽ Detector Model

This study proposes a multiheaded object detection algorithm referred to as MANet. Its main purpose is to integrate feature layers of different scales using the attention mechanism and to enhance contextual connections. To achieve this, we first replaced the feed-forward base network of the single-shot detector (SSD) with ResNet–101 (inspired by the Deconvolutional Single-Shot Detector) and then applied linear interpolation and the attention mechanism. The information in the feature layers at different scales was fused to improve the accuracy of target detection. The primary contributions of this study are (a) a fusion attention mechanism and (b) a multiheaded attention fusion method. Our final MANet detector model effectively unifies the feature information across feature layers at different scales, enabling it to detect objects of different sizes with higher precision. Using MANet with a 512 × 512 input and a ResNet–101 backbone, we obtained a mean accuracy of 82.7% on the PASCAL Visual Object Classes 2007 test set. These results demonstrate that the proposed method yields better accuracy than the conventional SSD and other advanced detectors.
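
The general idea of fusing feature layers of different scales with linear interpolation and attention weights can be sketched as follows. This is a hypothetical illustration only and does not reproduce MANet's multiheaded attention fusion: the per-position weights come from a softmax over the channel-mean activations of the two maps, a scoring rule chosen purely for simplicity, and both function names are assumptions.

```python
import numpy as np

def upsample_linear(x, out_h, out_w):
    """Bilinear resize of a (C, H, W) map, sampling with aligned corners."""
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy = ys - y0                                   # vertical blend factors
    fx = xs - x0                                   # horizontal blend factors
    top = x[:, y0][:, :, x0] * (1 - fx) + x[:, y0][:, :, x1] * fx
    bottom = x[:, y1][:, :, x0] * (1 - fx) + x[:, y1][:, :, x1] * fx
    return top * (1 - fy)[:, None] + bottom * fy[:, None]

def attention_fuse(fine, coarse):
    """Fuse a fine (C, H, W) map with a coarser (C, h, w) map.

    The coarse map is upsampled to the fine resolution, then the two maps
    are blended with per-position softmax weights (assumed scoring rule).
    """
    up = upsample_linear(coarse, fine.shape[1], fine.shape[2])
    scores = np.stack([fine.mean(axis=0), up.mean(axis=0)])   # (2, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return weights[0] * fine + weights[1] * up
```

In a trained detector the attention scores would come from learned layers rather than channel means. The sketch shows only how interpolation aligns the scales so that a weighted fusion becomes possible.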

From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision

Springer Tracts in Advanced Robotics - Field and Service Robotics ◽ 10.1007/978-3-319-27702-8_33 ◽ 2016 ◽ pp. 501-514 ◽ Cited By ~ 3

Author(s): Alex Bewley ◽ Ben Upcroft

Keyword(s): Object Detection ◽ Visual Object
