A Survey of Recent Trends in Two-Stage Object Detection Methods

Author(s):  
M. F. Ansari ◽  
K. A. Lodi
Author(s):  
M. N. Favorskaya ◽  
L. C. Jain

Introduction: Saliency detection is a fundamental task in computer vision. Its ultimate aim is to localize the objects of interest that attract human visual attention relative to the rest of the image. A great variety of saliency models based on different approaches have been developed since the 1990s. In recent years, saliency detection has become one of the most actively studied topics in the theory of convolutional neural networks (CNNs). Many original CNN-based solutions have been proposed for salient object detection and even for salient event detection.
Purpose: A detailed survey of saliency detection methods in the deep learning era makes it possible to understand the current capabilities of the CNN approach for visual analysis performed through human eye tracking and digital image processing.
Results: The survey reflects recent advances in saliency detection using CNNs. Models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. Notably, automatic salient event detection in long-duration videos has become possible with recently introduced 3D CNNs combined with 2D CNNs for salient audio detection. This article also gives a short description of public image and video datasets with annotated salient objects or events, as well as the metrics commonly used to evaluate results.
Practical relevance: This survey is intended as a contribution to the study of rapidly developing deep learning methods for saliency detection in images and videos.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2894
Author(s):  
Minh-Quan Dao ◽  
Vincent Frémont

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces the trajectories of other moving objects in the scene and predicts their future motion. Thanks to recent advances in deep-learning-based 3D object detection, tracking-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system consists essentially of an object detector and a data association algorithm that establishes track-to-detection correspondences. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt to the 3D setting a two-stage data association method that was successfully applied to image-based tracking, thus providing an alternative data association approach for 3D MOT. Our method outperforms the baseline, which uses one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
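The one-stage baseline described above, bipartite matching posed as a LAP and solved with the Hungarian algorithm, can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes each track and detection is reduced to a 2D centre point, uses Euclidean distance as the matching cost, and rejects matched pairs whose cost exceeds a hypothetical gating threshold `gate`. The names `hungarian` and `associate` are illustrative only.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a list whose entry i is the column assigned to row i.
    Potential-based Hungarian algorithm, 1-indexed internally.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)   # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Walk the augmenting path back to the root, flipping matches.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def associate(tracks, detections, gate):
    """One-stage track-to-detection matching with distance gating."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = [[math.dist(t, d) for d in detections] for t in tracks]
    if len(tracks) <= len(detections):
        pairs = list(enumerate(hungarian(cost)))
    else:
        # The solver needs rows <= columns, so solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(t, d) for d, t in enumerate(hungarian(transposed))]
    # Drop matches that are too far apart to be the same object.
    matches = sorted((t, d) for t, d in pairs if cost[t][d] <= gate)
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(len(tracks)) if t not in matched_t]
    unmatched_dets = [d for d in range(len(detections)) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


tracks = [(0.0, 0.0), (10.0, 0.0)]
detections = [(10.5, 0.0), (0.2, 0.0), (50.0, 50.0)]
print(associate(tracks, detections, gate=2.0))  # ([(0, 1), (1, 0)], [], [2])
```

In a tracking-by-detection loop, unmatched detections would typically spawn new tracks and unmatched tracks would age out; those policies are left out of this sketch.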


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advances in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from a single viewpoint. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. Targeting relatively low-cost data acquisition systems without depth sensors or multi-view cameras, we first consider existing databases of 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, we categorize and summarize deep learning-based monocular 3D object detection methods that overcome significant research challenges. We present the key concepts and detailed descriptions of representative single-stage and multi-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.


2021 ◽  
Vol 11 (13) ◽  
pp. 6006
Author(s):  
Huy Le ◽  
Minh Nguyen ◽  
Wei Qi Yan ◽  
Hoa Nguyen

Augmented reality is one of the fastest-growing fields and has received increased funding over the last few years as people realise the potential benefits of rendering virtual information in the real world. Most of today's marker-based augmented reality applications use local feature detection and tracking techniques. The disadvantage of these techniques is that the markers must be modified to match the specific classification algorithm, or else detection accuracy is low. Machine learning is an ideal solution to overcome these drawbacks of image processing in augmented reality applications. However, traditional data annotation requires extensive time and labour, as it is usually done manually. This study uses deep neural networks to detect and track augmented reality marker targets in an application. We first implement an auto-generated dataset tool, which is used to prepare the machine learning dataset. The final iOS prototype application incorporates object detection, object tracking and augmented reality. The machine learning model is trained to recognise the differences between targets using YOLO, one of the best-known object detection methods. The final product makes use of ARKit, a valuable toolkit for developing augmented reality applications.
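An auto-generated dataset tool removes manual annotation by placing markers programmatically, so their bounding boxes are known exactly. The study does not describe its tool's internals, so this is a hedged sketch under stated assumptions: markers are pasted at random positions onto background images (the compositing itself is omitted), and labels use the common YOLO text format `class x_center y_center width height`, normalised to [0, 1]. The helper names and image sizes are hypothetical.

```python
import random


def place_marker(bg_w, bg_h, marker_w, marker_h, rng):
    """Pick a random top-left corner so the marker lies fully inside the background.

    Assumes the marker is no larger than the background.
    """
    x = rng.randint(0, bg_w - marker_w)
    y = rng.randint(0, bg_h - marker_h)
    return x, y


def yolo_label(class_id, x, y, marker_w, marker_h, bg_w, bg_h):
    """Convert a pixel box (top-left x, y plus size) into a YOLO label line."""
    cx = (x + marker_w / 2) / bg_w
    cy = (y + marker_h / 2) / bg_h
    w = marker_w / bg_w
    h = marker_h / bg_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def generate_labels(n, class_id, bg_size, marker_size, seed=0):
    """Generate n synthetic annotations without any manual labelling."""
    rng = random.Random(seed)
    bg_w, bg_h = bg_size
    mw, mh = marker_size
    labels = []
    for _ in range(n):
        x, y = place_marker(bg_w, bg_h, mw, mh, rng)
        labels.append(yolo_label(class_id, x, y, mw, mh, bg_w, bg_h))
    return labels


print(yolo_label(0, 100, 50, 200, 100, 640, 480))  # 0 0.312500 0.208333 0.312500 0.208333
```

A real tool would also write the composited image next to each label file and vary scale, rotation and lighting; only the geometry that yields the label is shown here.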


2021 ◽  
Vol 13 (13) ◽  
pp. 2558
Author(s):  
Lei Yu ◽  
Haoyu Wu ◽  
Zhi Zhong ◽  
Liying Zheng ◽  
Qiuyue Deng ◽  
...  

Synthetic aperture radar (SAR) is an active Earth observation system with a certain surface penetration capability that can be used for all-day, all-weather observation. Ship detection using SAR is of great significance to maritime safety and port management. With the wide application and good results of deep learning on ordinary images, an increasing number of detection algorithms have begun entering the field of remote sensing images. SAR images are characterized by small, sparse targets and high noise. Two-stage detection methods, such as faster regions with convolutional neural network features (Faster R-CNN), achieve good results when applied to ship detection in SAR images, but their efficiency is low and their structure requires many computing resources, so they are not suitable for real-time detection. One-stage detection methods, such as the single shot multibox detector (SSD), make up for the speed shortcomings of two-stage algorithms but do not effectively use information from different layers, so they are not as good as two-stage algorithms at small target detection. We propose the two-way convolution network (TWC-Net), which is based on a two-way convolution structure and uses multiscale feature mapping to process SAR images. The two-way convolution module effectively extracts features from SAR images, and the multiscale mapping module effectively processes both shallow and deep feature information. TWC-Net avoids losing small target information during feature extraction while guaranteeing good perception of large targets through the deep feature map. We tested the performance of the proposed method on SSDD, a common SAR ship dataset. The experimental results show that the proposed method achieves higher recall and precision, with an F-measure of 93.32%. It also has fewer parameters and lower memory consumption than the compared methods and outperforms them.
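The F-measure reported above summarizes precision P and recall R in one number. A minimal sketch, assuming the usual F-beta definition, F = (1 + β²)·P·R / (β²·P + R), which reduces to the balanced 2PR / (P + R) at β = 1; the abstract does not state which β is used. The counts of true positives, false positives and false negatives would come from matching detections to ground-truth ships; the numbers below are made up for illustration, not results from the paper.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Precision, recall and F-beta from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts: 80 correct detections, 20 false alarms, no missed ships.
p, r, f = precision_recall_f(tp=80, fp=20, fn=0)
print(f"P={p:.3f} R={r:.3f} F={f:.4f}")  # P=0.800 R=1.000 F=0.8889
```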


2021 ◽  
Vol 43 (13) ◽  
pp. 2888-2898
Author(s):  
Tianze Gao ◽  
Yunfeng Gao ◽  
Yu Li ◽  
Peiyuan Qin

An essential element for intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANNs), researchers have proposed numerous ANN-based visual object detection methods that have proven effective. However, networks with cumbersome structures do not suit the real-time scenarios in M&RS, which necessitates model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, named positive sample retaining and early distribution softening, are employed to achieve a natural adaptation. The mutual consistency between the teacher and student models is further promoted through hint-based distillation. Extensive controlled experiments show that the proposed method enhances the light-weight network's performance by a large margin.
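For reference, the classification-oriented distillation that the paper revisits softens teacher and student outputs with a temperature T and penalizes their divergence, while hint-based distillation matches intermediate features. This sketch shows those two standard losses only (temperature-scaled KL divergence multiplied by T², and a FitNets-style mean squared error on features); it is not the authors' detection variant, and their positive sample retaining and early distribution softening strategies are not reproduced. Plain Python lists stand in for tensors, and the temperature of 4.0 is an assumed value.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; larger temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def hint_loss(student_feat, teacher_feat):
    """Mean squared error between intermediate features.

    Assumes the student features were already projected to the teacher's shape.
    """
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(teacher_feat)


print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```

The T² factor keeps the gradient magnitude of the softened term roughly comparable across temperatures, which is why it usually accompanies the softened KL term.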


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3415 ◽  
Author(s):  
Jinpeng Zhang ◽  
Jinming Zhang ◽  
Shan Yu

In the image object detection task, a huge number of candidate boxes are generated and matched against a comparatively small number of ground-truth boxes to create training samples. In fact, however, the vast majority of candidate boxes do not contain valid object instances and must be recognized and rejected during the training and evaluation of the network. This leads to a heavy computational burden and a serious imbalance between object and non-object samples, which impedes the algorithm's performance. Here we propose a new heuristic sampling method to generate candidate boxes for two-stage detection algorithms. It is generally applicable to current two-stage detection algorithms and improves their detection performance. Experiments on the COCO dataset showed that, relative to the baseline model, the new method significantly increased detection accuracy and efficiency.
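To make the object/non-object imbalance concrete, the sketch below labels candidate boxes by their best IoU with any ground-truth box and then draws a fixed-size minibatch with a capped positive fraction, in the spirit of the conventional region proposal network sampling used in Faster R-CNN. This is a simplified version of that common baseline, not the heuristic sampling method proposed in the paper; the thresholds (0.7 and 0.3), batch size (256) and positive fraction (0.5) are assumed defaults, and the extra rule that promotes the best-matching anchor for each ground truth is omitted.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidates(candidates, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 = object, 0 = background, -1 = ignored (IoU between thresholds)."""
    labels = []
    for c in candidates:
        best = max((iou(c, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thr else 0 if best < neg_thr else -1)
    return labels


def sample_minibatch(labels, batch_size=256, pos_fraction=0.5, seed=0):
    """Sample at most pos_fraction positives; negatives fill the rest of the batch."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), int(batch_size * pos_fraction))
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


gts = [(0, 0, 10, 10)]
cands = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30), (0, 0, 10, 8)]
print(label_candidates(cands, gts))  # [1, -1, 0, 1]
```

With few positives, negatives dominate every minibatch, which is exactly the imbalance the abstract describes.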


Object detection (OD) in video is one of the relevant and critical research areas in the computer vision field. Given the widespread adoption of artificial intelligence in everyday life and its predicted exponential growth in the years to come, it is expected to transform society. Object detection has been extensively implemented in several areas, including human-machine interaction, autonomous vehicles, video surveillance for security, and various other fields mentioned later. However, this expansion of OD faces several challenges, such as occlusion, illumination variation, and object motion, not to mention real-time requirements, which can be quite problematic. This paper also covers methods that address these issues. These techniques are divided into five subcategories: point detection, segmentation, supervised classifiers, optical flow, and background modeling. This survey dissects various methods and techniques used in object detection, as well as their application domains and the problems faced. Our study discusses the crucial role of deep learning algorithms and their potential to drive future improvements in object detection within video sequences.
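Background modeling, one of the five subcategories listed above, can be illustrated with a running-average background model: each frame nudges the background estimate towards itself, and pixels that differ from the estimate by more than a threshold are flagged as foreground, that is, likely moving objects. This is a generic textbook technique chosen for illustration, not a method taken from the survey; frames are assumed to be small grayscale images stored as nested lists, and `alpha` and `threshold` are assumed values.

```python
def update_background(background, frame, alpha=0.05):
    """Exponential running average: B <- (1 - alpha) * B + alpha * F."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def process(frames, alpha=0.05, threshold=25):
    """Initialise the model from the first frame, then return one mask per later frame."""
    background = [list(map(float, row)) for row in frames[0]]
    masks = []
    for frame in frames[1:]:
        masks.append(foreground_mask(background, frame, threshold))
        background = update_background(background, frame, alpha)
    return masks


static = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
moving = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
print(process([static, moving]))  # [[[0, 0, 0], [0, 1, 0], [0, 0, 0]]]
```

A small `alpha` adapts slowly, so gradual illumination changes are absorbed while fast-moving objects still stand out; this trade-off is one reason illumination variation remains a challenge for such models.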


2021 ◽  
pp. 370-380
Author(s):  
Xuan Jiang ◽  
Zhe Wu ◽  
Yajie Zhang ◽  
Li Su ◽  
Qingming Huang