A Survey of Recent Trends in Two-Stage Object Detection Methods

Introduction: Saliency detection is a fundamental task in computer vision. Its ultimate aim is to localize the objects of interest that attract human visual attention relative to the rest of the image. A great variety of saliency models based on different approaches have been developed since the 1990s. In recent years, saliency detection has become one of the actively studied topics in research on Convolutional Neural Networks (CNNs). Many original CNN-based solutions have been proposed for salient object detection and even event detection. Purpose: A detailed survey of saliency detection methods in the deep learning era helps to clarify the current capabilities of the CNN approach for visual analysis informed by human eye tracking and digital image processing. Results: The survey reflects recent advances in saliency detection using CNNs. Different models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. It is worth noting that automatic salient event detection in long videos has become possible by combining recently introduced 3D CNNs with 2D CNNs for salient audio detection. The article also gives a short description of public image and video datasets with annotated salient objects or events, as well as the metrics most often used to evaluate results. Practical relevance: This survey is intended as a contribution to the study of rapidly developing deep learning methods for saliency detection in images and videos.

A Survey on Object Detection, Annotation and Anomaly Detection Methods for Endoscopic Videos

2020 5th International Conference on Computing, Communication and Security (ICCCS) ◽

10.1109/icccs49678.2020.9277436 ◽

2020 ◽

Author(s):

Tejas Chheda ◽

Soumya Koppaka ◽

Rithvika Iyer ◽

Dhananjay Kalbande

Keyword(s):

Anomaly Detection ◽

Object Detection ◽

Detection Methods ◽

Endoscopic Videos

A Two-Stage Data Association Approach for 3D Multi-Object Tracking

Sensors ◽

10.3390/s21092894 ◽

2021 ◽

Vol 21 (9) ◽

pp. 2894

Author(s):

Minh-Quan Dao ◽

Vincent Frémont

Keyword(s):

Object Detection ◽

Object Tracking ◽

Moving Objects ◽

Data Association ◽

Autonomous Driving ◽

Tracking Accuracy ◽

Two Stage ◽

Bipartite Matching ◽

3D Object ◽

3D Object Detection

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system essentially consists of an object detector and a data association algorithm that establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt to the 3D setting a two-stage data association method that was successfully applied to image-based tracking, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
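The track-to-detection matching described in the abstract above can be illustrated with a small sketch. It is not the authors' method: the abstract does not say how the two stages are defined, so the split by detection confidence (`score_split`), the Euclidean center distance as cost, the gating threshold `max_dist`, and all function names are assumptions made for illustration. A brute-force search over permutations stands in for the Hungarian algorithm and is only practical for a handful of objects.

```python
import math
from itertools import permutations


def optimal_assignment(cost, max_cost):
    """Solve a small linear assignment problem by brute force.

    cost[r][c] is the cost of pairing row r with column c. Returns the
    (row, col) pairs of the minimum-total-cost one-to-one matching,
    keeping only pairs whose cost does not exceed max_cost (gating).
    """
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    if n_rows == 0 or n_cols == 0:
        return []
    if n_rows <= n_cols:
        # Assign every row to a distinct column.
        best = min(permutations(range(n_cols), n_rows),
                   key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)))
        pairs = list(enumerate(best))
    else:
        # More rows than columns: assign every column to a distinct row.
        best = min(permutations(range(n_rows), n_cols),
                   key=lambda rows: sum(cost[r][c] for c, r in enumerate(rows)))
        pairs = [(r, c) for c, r in enumerate(best)]
    return [(r, c) for r, c in pairs if cost[r][c] <= max_cost]


def two_stage_associate(tracks, detections, score_split=0.5, max_dist=2.0):
    """Hypothetical two-stage association.

    tracks: list of predicted (x, y) centers.
    detections: list of ((x, y), score) tuples.
    Stage 1 matches tracks to high-score detections; stage 2 matches the
    tracks left over to the remaining low-score detections.
    """
    high = [i for i, (_, s) in enumerate(detections) if s >= score_split]
    low = [i for i, (_, s) in enumerate(detections) if s < score_split]
    matches = []
    unmatched_tracks = list(range(len(tracks)))
    for det_ids in (high, low):
        cost = [[math.dist(tracks[t], detections[d][0]) for d in det_ids]
                for t in unmatched_tracks]
        stage = optimal_assignment(cost, max_dist)
        matches += [(unmatched_tracks[r], det_ids[c]) for r, c in stage]
        matched = {unmatched_tracks[r] for r, _ in stage}
        unmatched_tracks = [t for t in unmatched_tracks if t not in matched]
    return matches, unmatched_tracks


tracks = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
detections = [((0.5, 0.0), 0.9), ((10.4, 0.0), 0.3), ((50.0, 0.0), 0.8)]
matches, unmatched = two_stage_associate(tracks, detections)
```

In this toy example the first stage pairs track 0 with the confident detection nearby and rejects the distant one through gating; the second stage recovers track 1 from the low-score detection, and track 2 stays unmatched. A one-stage matcher would run `optimal_assignment` once over all detections.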

A Survey on Deep Learning Based Methods and Datasets for Monocular 3D Object Detection

Electronics ◽

10.3390/electronics10040517 ◽

2021 ◽

Vol 10 (4) ◽

pp. 517

Author(s):

Seong-heum Kim ◽

Youngbae Hwang

Keyword(s):

Deep Learning ◽

Object Detection ◽

Low Cost ◽

Detection Methods ◽

Future Research ◽

3D Object ◽

Practical Applications ◽

Depth Sensors ◽

Significant Research ◽

3D Object Detection

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.

Augmented Reality and Machine Learning Incorporation Using YOLOv3 and ARKit

Applied Sciences ◽

10.3390/app11136006 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6006

Author(s):

Huy Le ◽

Minh Nguyen ◽

Wei Qi Yan ◽

Hoa Nguyen

Keyword(s):

Machine Learning ◽

Augmented Reality ◽

Object Detection ◽

Feature Detection ◽

Detection Methods ◽

Detection Accuracy ◽

Data Annotation ◽

Machine Learning Model ◽

Potential Benefits ◽

Feature Detection And Tracking

Augmented reality is one of the fastest growing fields, receiving increased funding over the last few years as people realise the potential benefits of rendering virtual information in the real world. Most of today’s marker-based augmented reality applications use local feature detection and tracking techniques. The disadvantage of these techniques is that the markers must be modified to match the unique classified algorithms or they suffer from low detection accuracy. Machine learning is an ideal solution to overcome the current drawbacks of image processing in augmented reality applications. However, traditional data annotation requires extensive time and labour, as it is usually done manually. This study incorporates machine learning to detect and track augmented reality marker targets in an application using deep neural networks. We first implement an auto-generated dataset tool, which is used for the machine learning dataset preparation. The final iOS prototype application incorporates object detection, object tracking and augmented reality. The machine learning model is trained to recognise the differences between targets using one of the most well-known YOLO object detection methods. The final product makes use of ARKit, a valuable toolkit for developing augmented reality applications.

TWC-Net: A SAR Ship Detection Using Two-Way Convolution and Multiscale Feature Mapping

Remote Sensing ◽

10.3390/rs13132558 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2558

Author(s):

Lei Yu ◽

Haoyu Wu ◽

Zhi Zhong ◽

Liying Zheng ◽

Qiuyue Deng ◽

...

Keyword(s):

Target Detection ◽

Detection Methods ◽

Observation System ◽

Single Shot ◽

Small Target ◽

Feature Mapping ◽

Sar Images ◽

Two Stage ◽

Ship Detection ◽

Deep Feature

Synthetic aperture radar (SAR) is an active earth observation system with a certain surface penetration capability that can be used for all-day, all-weather observation. Ship detection using SAR is of great significance to maritime safety and port management. With the wide application of deep learning to ordinary images and its good results there, an increasing number of detection algorithms have entered the field of remote sensing images. SAR images are characterized by small targets, high noise, and sparse targets. Two-stage detection methods, such as faster regions with convolutional neural network (Faster RCNN), give good results when applied to ship target detection in SAR images, but their efficiency is low and their structure requires many computing resources, so they are not suitable for real-time detection. One-stage target detection methods, such as the single shot multibox detector (SSD), make up for the speed shortcomings of two-stage algorithms but lack effective use of information from different layers, so they are not as good as two-stage algorithms at small target detection. We propose the two-way convolution network (TWC-Net), based on a two-way convolution structure, and use multiscale feature mapping to process SAR images. The two-way convolution module can effectively extract features from SAR images, and the multiscale mapping module can effectively process shallow and deep feature information. TWC-Net can avoid the loss of small target information during feature extraction while guaranteeing good perception of large targets by the deep feature map. We tested the performance of our proposed method on a common SAR ship dataset, SSDD. The experimental results show that our proposed method has a higher recall rate and precision, and the F-Measure is 93.32%. It has fewer parameters and lower memory consumption than other methods and outperforms them.
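For readers unfamiliar with the metrics quoted in the abstract above, the sketch below shows how precision, recall, and F-Measure relate. It assumes the F-Measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name and the detection counts are hypothetical and are not taken from the paper's experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and balanced F-measure from detection counts.

    tp: correctly detected ships (true positives)
    fp: false alarms (false positives)
    fn: missed ships (false negatives)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts chosen only to illustrate the arithmetic.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=5)
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-Measure by trading many missed targets for few false alarms, or the reverse.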

Revisiting knowledge distillation for light-weight visual object detection

Transactions of the Institute of Measurement and Control ◽

10.1177/01423312211022877 ◽

2021 ◽

Vol 43 (13) ◽

pp. 2888-2898

Author(s):

Tianze Gao ◽

Yunfeng Gao ◽

Yu Li ◽

Peiyuan Qin

Keyword(s):

Object Detection ◽

Essential Element ◽

Detection Algorithm ◽

Positive Sample ◽

Detection Methods ◽

Visual Object ◽

Light Weight ◽

Model Compression ◽

Novel Approach ◽

Knowledge Distillation

An essential element for intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, networks with cumbersome structures are ill-suited to the real-time scenarios in M&RS, which necessitates model compression techniques. In this paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification and are not compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies, named positive sample retaining and early distribution softening, are employed to yield a natural adaptation. The mutual consistency between the teacher model and the student model is further promoted through hint-based distillation. Through extensive controlled experiments, the proposed method is shown to be effective in enhancing the light-weight network’s performance by a large margin.
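As background for the abstract above, the following is a minimal sketch of the traditional, classification-oriented form of knowledge distillation that the authors say they revisit. It is not the paper's detection variant: the temperature-softened softmax, the KL-divergence loss scaled by the squared temperature, and the function names are assumptions based on the commonly used soft-target formulation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence from the softened teacher distribution to the student's.

    The T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


teacher = [6.0, 2.0, 1.0]   # hypothetical class logits
student = [3.0, 2.5, 0.5]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge. In practice it is usually combined with the ordinary supervised loss on ground-truth labels.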

Hot Anchors: A Heuristic Anchors Sampling Method in RCNN-Based Object Detection

Sensors ◽

10.3390/s18103415 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3415 ◽

Cited By ~ 2

Author(s):

Jinpeng Zhang ◽

Jinming Zhang ◽

Shan Yu

Keyword(s):

Object Detection ◽

Sampling Method ◽

Ground Truth ◽

Detection Accuracy ◽

Two Stage ◽

Baseline Model ◽

Detection Algorithms ◽

Imbalance Problem ◽

Image Object Detection ◽

Image Object

In the image object detection task, a huge number of candidate boxes are generated and matched against a relatively small number of ground-truth boxes to create the learning samples. In fact, the vast majority of the candidate boxes do not contain valid object instances and must be recognized and rejected during the training and evaluation of the network. This leads to an extra computational burden and a serious imbalance between object and non-object samples, thereby impeding the algorithm’s performance. Here we propose a new heuristic sampling method to generate candidate boxes for two-stage detection algorithms. It is generally applicable to current two-stage detection algorithms and improves their detection performance. Experiments on the COCO dataset showed that, relative to the baseline model, this new method could significantly increase detection accuracy and efficiency.
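The imbalance described in the abstract above can be made concrete with a small sketch of the conventional dense sampling that the paper sets out to improve. This is not the proposed heuristic: the uniform grid of fixed-size boxes, the IoU threshold of 0.5 for a positive sample, and all function names are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dense_candidates(image_size, box_size, stride):
    """Slide a fixed-size square box over the image on a regular grid."""
    positions = range(0, image_size - box_size + 1, stride)
    return [(x, y, x + box_size, y + box_size)
            for y in positions for x in positions]


def label_candidates(candidates, gt_boxes, pos_thresh=0.5):
    """Label a candidate 1 (object) if its best IoU with any ground truth
    reaches pos_thresh, else 0 (non-object)."""
    labels = []
    for box in candidates:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thresh else 0)
    return labels


candidates = dense_candidates(image_size=100, box_size=20, stride=10)
labels = label_candidates(candidates, gt_boxes=[(30, 30, 50, 50)])
num_pos, num_neg = sum(labels), len(labels) - sum(labels)
```

With one ground-truth box in a 100 by 100 image, the grid yields 81 candidates, of which only one qualifies as a positive sample. Real detectors use several scales and aspect ratios per location, which makes the ratio far more lopsided.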

Taxonomy of Object Detection methods: A Survey

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2022/041012022 ◽

2022 ◽

Vol 10 (1) ◽

pp. 20-25

Keyword(s):

Object Detection ◽

Autonomous Vehicles ◽

Real Life ◽

Object Motion ◽

Detection Methods ◽

Illumination Variation ◽

Human Machine Interaction ◽

Research Areas ◽

To Come ◽

Point Detection

Object detection (OD) within video is one of the most relevant and critical research areas in the computer vision field. Given the widespread adoption of artificial intelligence, now a basic element of everyday life, and its predicted exponential growth in the years to come, it is set to transform society. Object detection has been extensively implemented in several areas, including human-machine interaction, autonomous vehicles, security with video surveillance, and various other fields mentioned later in the paper. However, this expansion of OD faces various challenges such as occlusion, illumination variation, and object motion, as well as the real-time requirement, which can be quite problematic. This paper also covers some methods that take these issues into account. These techniques are divided into five subcategories: point detection, segmentation, supervised classifiers, optical flow, and background modeling. This survey dissects various methods and techniques used in object detection, as well as application domains and the problems faced. Our study discusses the importance of deep learning algorithms and their potential for future improvements in object detection within video sequences.

Two-Stage Polishing Network for Camouflaged Object Detection

10.1007/978-3-030-87355-4_31 ◽

2021 ◽

pp. 370-380

Author(s):

Xuan Jiang ◽

Zhe Wu ◽

Yajie Zhang ◽

Li Su ◽

Qingming Huang

Keyword(s):

Object Detection ◽

Two Stage