From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision

Author(s):  
Alex Bewley ◽  
Ben Upcroft

Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1737
Author(s):  
Wooseop Lee ◽  
Min-Hee Kang ◽  
Jaein Song ◽  
Keeyeon Hwang

As automated vehicles are considered one of the important trends in intelligent transportation systems, various studies are being conducted to enhance their safety. In particular, technologies for the design of preventive automated driving systems, such as detection of surrounding objects and estimation of inter-vehicle distance, are of growing importance. Object detection is mainly performed with cameras and LiDAR, but because of LiDAR's cost and limited recognition distance, there is a growing need to improve camera-based recognition, which is more convenient to commercialize. To improve the recognition capability of vehicle-mounted monocular cameras for preventive automated driving systems, this study trained the convolutional neural network (CNN)-based Faster Regions with CNN features (Faster R-CNN) and You Only Look Once (YOLO) v2 detectors to recognize surrounding vehicles in black-box highway driving videos and to estimate distances to them, in order to select the model better suited to automated driving systems. The models were trained on the PASCAL Visual Object Classes (VOC) dataset for comparison. Faster R-CNN achieved a mean average precision (mAP) of 76.4, similar to YOLO v2's mAP of 78.6, but processed only 5 frames per second (FPS), far slower than YOLO v2's 40 FPS. As a result, YOLO v2, which performs better in both accuracy and processing speed, was judged the more suitable model for automated driving systems and was carried forward to the distance-estimation stage. For distance estimation, we converted coordinate values through camera calibration and a perspective transform, set the detection threshold to 0.7, and performed object detection and distance estimation, achieving more than 80% accuracy for near-distance vehicles. We believe this study can help prevent accidents involving automated vehicles, and we expect further research to provide additional accident-prevention measures, such as calculating and maintaining appropriate safety distances for different vehicle types.
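
The distance-estimation step above rests on camera calibration plus a perspective (bird's-eye-view) transform of the road plane. Below is a minimal OpenCV sketch of that idea; the pixel/metre correspondences, bounding box, and confidence score are hypothetical placeholders rather than values from the paper, and only the 0.7 threshold is taken from the abstract.

```python
import cv2
import numpy as np

# Four points on the road plane in the image (pixels) and their
# corresponding positions in a top-down ground frame (metres).
# These correspondences are illustrative placeholders; in practice they
# come from calibrating the camera against known lane geometry.
src_pts = np.float32([[560, 470], [720, 470], [1100, 700], [180, 700]])
dst_pts = np.float32([[0.0, 30.0], [3.6, 30.0], [3.6, 5.0], [0.0, 5.0]])

# Homography that maps road-plane pixels to ground-frame metres.
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

def distance_to_vehicle(bbox):
    """Estimate forward distance (m) to a detected vehicle from the
    bottom-centre of its bounding box, assumed to touch the road plane."""
    x1, y1, x2, y2 = bbox
    foot = np.float32([[[(x1 + x2) / 2.0, y2]]])      # bottom-centre pixel
    ground = cv2.perspectiveTransform(foot, H)[0, 0]  # (lateral, forward) metres
    return float(ground[1])

# Example: a detection kept only if its confidence exceeds the 0.7 threshold.
detection = {"bbox": (600, 480, 760, 620), "score": 0.83}
if detection["score"] >= 0.7:
    print(f"Estimated distance: {distance_to_vehicle(detection['bbox']):.1f} m")
```

The key assumption is that the bottom-centre of a vehicle's bounding box lies on the road plane, so the homography maps it to ground coordinates from which the forward distance can be read off directly.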


2021 ◽  
Vol 43 (13) ◽  
pp. 2888-2898
Author(s):  
Tianze Gao ◽  
Yunfeng Gao ◽  
Yu Li ◽  
Peiyuan Qin

An essential element of intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven effective. However, networks with cumbersome structures do not suit the real-time scenarios of M&RS, necessitating model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not directly compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher model and the student model is further promoted through hint-based distillation. Extensive controlled experiments show the proposed method to be effective in enhancing the light-weight network's performance by a large margin.
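
The positive-sample-retaining and early-distribution-softening strategies are specific to the paper's keypoint-based detector, but the underlying machinery, a temperature-softened teacher/student loss combined with a hint loss on intermediate features, is standard. A minimal PyTorch sketch under those generic assumptions (the temperature, weighting, and tensor shapes are illustrative, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, student_feat,
                      teacher_feat, adapter, temperature=4.0, alpha=0.5):
    """Generic distillation objective: softened KL divergence between
    teacher and student outputs, plus a hint loss matching intermediate
    features. Temperature and weighting are illustrative choices."""
    # Softened distributions (temperature > 1 spreads probability mass).
    t_soft = F.softmax(teacher_logits / temperature, dim=1)
    s_log_soft = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(s_log_soft, t_soft, reduction="batchmean") * temperature ** 2

    # Hint loss: a 1x1-conv adapter lifts the student feature map to the
    # teacher's channel width before an L2 comparison.
    hint = F.mse_loss(adapter(student_feat), teacher_feat)
    return alpha * kd + (1 - alpha) * hint

# Example with random tensors standing in for real detector activations.
adapter = torch.nn.Conv2d(64, 256, kernel_size=1)
loss = distillation_loss(torch.randn(8, 80), torch.randn(8, 80),
                         torch.randn(8, 64, 32, 32),
                         torch.randn(8, 256, 32, 32), adapter)
loss.backward()
```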


Author(s):  
Dimity Miller ◽  
Niko Sünderhauf ◽  
Michael J Milford ◽  
Feras Dayoub

2013 ◽  
Vol 56 (9) ◽  
pp. 97-105 ◽  
Author(s):  
Pedro Felzenszwalb ◽  
Ross Girshick ◽  
David McAllester ◽  
Deva Ramanan

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1235
Author(s):  
Yang Yang ◽  
Hongmin Deng

To make the classification and regression of single-stage detectors more accurate, this paper proposes an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3), based on You-Only-Look-Once (YOLO) v3. Firstly, a better cascading model with learnable semantic fusion between the feature extraction network and the feature pyramid network is designed to improve detection accuracy using a global context block. Secondly, the information to be retained is screened by combining three feature maps of different scales. Finally, a global self-attention mechanism is used to highlight the useful information in the feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum mean Average Precision (mAP)@0.5 of 55.5 on the Common Objects in Context (COCO) 2017 test-dev set, and that its mAP is 5.1% higher than that of YOLOv3 on the PASCAL Visual Object Classes (PASCAL VOC) 2007 test set. These experiments indicate that the proposed GC-YOLOv3 model performs strongly on both the PASCAL VOC and COCO datasets.
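
The abstract does not give the block's internals, but a global context block in this spirit typically pools the whole feature map with a learned spatial attention, transforms the pooled vector, and adds it back at every position. A minimal PyTorch sketch under those assumptions (channel sizes and the reduction ratio are illustrative):

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Minimal global-context block: pool the feature map with a learned
    spatial attention, transform the pooled vector, and add it back so
    every position sees image-level context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # spatial attention logits
        hidden = channels // reduction
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Softmax over all spatial positions -> one weight per pixel.
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)
        # Attention-weighted global pooling: (b, c, hw) @ (b, hw, 1).
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))
        context = context.view(b, c, 1, 1)
        return x + self.transform(context)  # broadcast context to every pixel

# Example: enrich a 256-channel feature map from a detector neck.
feat = torch.randn(2, 256, 52, 52)
print(GlobalContextBlock(256)(feat).shape)  # torch.Size([2, 256, 52, 52])
```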


2019 ◽  
Vol 9 (9) ◽  
pp. 1829 ◽  
Author(s):  
Jie Jiang ◽  
Hui Xu ◽  
Shichang Zhang ◽  
Yujie Fang

This study proposes a multiheaded object detection algorithm referred to as MANet. Its main purpose is to integrate feature layers of different scales through an attention mechanism and to enhance contextual connections. To achieve this, we first replaced the feed-forward base network of the single-shot detector with ResNet-101 (inspired by the Deconvolutional Single-Shot Detector) and then applied linear interpolation and the attention mechanism. Information from feature layers at different scales is fused to improve the accuracy of target detection. The primary contributions of this study are (a) a fusion attention mechanism and (b) a multiheaded attention fusion method. The final MANet detector effectively unifies feature information across the feature layers at different scales, enabling it to detect objects of different sizes with higher precision. With 512 × 512 inputs and a ResNet-101 backbone, MANet obtained a mean average precision of 82.7% on the PASCAL Visual Object Classes (VOC) 2007 test set. These results demonstrate that the proposed method yields better accuracy than the conventional single-shot detector (SSD) and other advanced detectors.
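
As a rough illustration of fusing feature layers at different scales with linear interpolation and attention, the PyTorch sketch below upsamples a coarse map to a finer one's resolution and blends the two with a learned per-pixel gate; the gating design and channel counts are assumptions, not MANet's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Sketch of fusing a deep, coarse feature map into a shallower one:
    upsample by bilinear interpolation, predict per-pixel attention
    weights, and blend the two maps."""
    def __init__(self, channels):
        super().__init__()
        # Predict a per-pixel gate from the concatenated feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, shallow, deep):
        # Linear (bilinear) interpolation brings the coarse map to the
        # fine resolution before fusion.
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        a = self.gate(torch.cat([shallow, deep_up], dim=1))
        return a * shallow + (1 - a) * deep_up  # attention-weighted blend

# Example: fuse a 16x16 deep map into a 32x32 shallow map, both 256-channel.
fused = AttentionFusion(256)(torch.randn(1, 256, 32, 32),
                             torch.randn(1, 256, 16, 16))
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```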


Proceedings ◽  
2020 ◽  
Vol 39 (1) ◽  
pp. 18
Author(s):  
Nenchoo ◽  
Tantrairatn

This paper presents real-time estimation of a UAV's 3D position using an Intel RealSense depth camera D435i with visual object detection as a local positioning system for indoor environments. The global positioning system (GPS) can localize a UAV outdoors, but indoors GPS cannot determine the UAV's position. Therefore, the D435i depth stereo camera, observing from the ground, is proposed to localize the UAV in indoor environments instead of GPS. Deep-learning object detection identifies the target object, and the depth camera specifies its 2D position; the depth coordinate is then estimated from the stereo camera and the known target size. In the experiment, a Parrot Bebop2 serves as the target object and is detected by YOLOv3 as a real-time object detection system. Because a trained fully convolutional neural network (FCNN) model is essential for object detection, the model was trained on the Bebop2 only. In conclusion, the proposed system can estimate the 3D position of the Bebop2 in indoor environments. In future work, this research will be extended and applied to visual navigation control of drone swarms.
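
The 3D position can be recovered by back-projecting the detected bounding-box centre through the pinhole camera model using the depth reading. A minimal sketch, assuming illustrative intrinsics and detection values rather than the D435i's calibrated parameters:

```python
import numpy as np

def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with a measured depth (metres) to a 3D
    point in the camera frame using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Hypothetical YOLOv3 detection of the Bebop2 and the depth read from the
# aligned depth frame at the bounding-box centre.
x1, y1, x2, y2 = 300, 180, 420, 260    # detected bounding box (pixels)
u, v = (x1 + x2) // 2, (y1 + y2) // 2  # box centre
depth_m = 2.35                         # depth camera reading (metres)

# Illustrative intrinsics (focal lengths and principal point in pixels).
position = pixel_to_3d(u, v, depth_m, fx=615.0, fy=615.0, cx=320.0, cy=240.0)
print(f"UAV position in camera frame (m): {position}")
```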

