Deep Transfer Learning Based Multiway Feature Pyramid Network for Object Detection in Images

Object detection is widely used in many fields, so the demand for faster and more accurate detection methods keeps growing. In this paper, we propose a method for object detection in digital images that improves both accuracy and speed. The proposed model is based on the Single Shot MultiBox Detector (SSD) architecture. This method creates many anchor boxes of various aspect ratios based on the backbone network and multiscale feature network and calculates the classes and offsets of the anchor boxes to detect objects at various scales. Instead of the VGG16-based deep transfer learning model in SSD, we use a more efficient base network, EfficientNet. Detecting objects of different sizes remains a challenging task, and we use a Multiway Feature Pyramid Network (MFPN) to address it. The features from the base network are given to the MFPN, and the fused features are then passed to the bounding box prediction and class prediction networks. Softer-NMS is applied instead of the NMS used in SSD to reduce the number of bounding boxes gently. The proposed method is validated on the MSCOCO 2017, PASCAL VOC 2007, and PASCAL VOC 2012 datasets and compared with existing state-of-the-art techniques. Our method shows better detection quality in terms of mean Average Precision (mAP).
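The anchor-box step described in this abstract can be illustrated with a minimal sketch. It assumes the common SSD-style convention in which an anchor of scale s and aspect ratio r has width s·sqrt(r) and height s/sqrt(r) and is centred on each feature-map cell. The function name `make_anchors` and all numeric values are illustrative assumptions, not taken from the paper.

```python
import math

def make_anchors(fmap_size, scale, aspect_ratios):
    """Return (cx, cy, w, h) anchors in normalized [0, 1] image coordinates.

    Assumed SSD-style convention: one anchor per aspect ratio at every
    cell of a square feature map with side length `fmap_size`.
    """
    anchors = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Centre of the current feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ratio in aspect_ratios:
                # Width and height keep the area at scale**2.
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                anchors.append((cx, cy, w, h))
    return anchors

# Illustrative values only.
anchors = make_anchors(fmap_size=4, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
print(len(anchors))  # 4 * 4 cells * 3 ratios = 48
```

In a multiscale detector the same routine would typically be called once per feature level with a different `scale`, so that coarse levels cover large objects and fine levels cover small ones.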


A new multi-scale backbone network for object detection based on asymmetric convolutions

Science Progress ◽ 10.1177/00368504211011343 ◽ 2021 ◽ Vol 104 (2) ◽ pp. 003685042110113

Author(s): Xianghua Ma ◽ Zhenkun Yang

Keyword(s): Object Detection ◽ Image Features ◽ Detection Accuracy ◽ Mobile Platforms ◽ Multi Scale ◽ Backbone Network ◽ Aspect Ratios ◽ Pascal VOC ◽ Scale Characteristics ◽ Detection Speed

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their kernels have different aspect ratios, which can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by allocating learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into a state-of-the-art backbone such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO 2014 datasets. Experimental results show that detection precision can be greatly improved while a fast detection speed is maintained.
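The idea of running asymmetric convolutions in parallel and merging them with learnable weights can be sketched in plain Python. This is only an illustration under stated assumptions: the kernel shapes (1x3, 3x1, 1x5), the softmax normalization of the branch weights, and the names `conv2d_same` and `mac_fuse` are hypothetical and are not taken from the paper.

```python
import math

def conv2d_same(image, kernel):
    """Zero-padded 'same' 2D cross-correlation on nested lists."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def mac_fuse(image, kernels, branch_logits):
    """Run each kernel as a parallel branch and merge with softmax weights.

    `branch_logits` stands in for the learnable per-branch parameters.
    """
    weights = softmax(branch_logits)
    branches = [conv2d_same(image, k) for k in kernels]
    h, w = len(image), len(image[0])
    return [
        [sum(wt * b[y][x] for wt, b in zip(weights, branches)) for x in range(w)]
        for y in range(h)
    ]

# Hypothetical asymmetric averaging kernels with different aspect ratios.
kernels = [
    [[1 / 3, 1 / 3, 1 / 3]],      # 1x3, wide
    [[1 / 3], [1 / 3], [1 / 3]],  # 3x1, tall
    [[0.2] * 5],                  # 1x5, extra wide
]
image = [[1.0] * 5 for _ in range(5)]
fused = mac_fuse(image, kernels, branch_logits=[0.5, 0.2, -0.1])
```

In a trained network the kernels and the branch logits would both be learned; here they are fixed so that the merging step can be inspected in isolation.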


GC-YOLOv3: You Only Look Once with Global Context Block

Electronics ◽ 10.3390/electronics9081235 ◽ 2020 ◽ Vol 9 (8) ◽ pp. 1235

Author(s): Yang Yang ◽ Hongmin Deng

Keyword(s): Object Detection ◽ Irrelevant Information ◽ Detection Algorithm ◽ Visual Object ◽ Detection Accuracy ◽ Feature Maps ◽ Average Precision ◽ Global Context ◽ Pascal VOC ◽ Feature Pyramid

To make the classification and regression of single-stage detectors more accurate, this paper proposes an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3), based on You-Only-Look-Once (YOLO). Firstly, a better cascading model with learnable semantic fusion between a feature extraction network and a feature pyramid network is designed to improve detection accuracy using a global context block. Secondly, the information to be retained is screened by combining feature maps at three different scales. Finally, a global self-attention mechanism is used to highlight the useful information in the feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum of 55.5 object detection mean Average Precision (mAP)@0.5 on Common Objects in Context (COCO) 2017 test-dev and that its mAP is 5.1% higher than that of the YOLOv3 algorithm on the Pascal Visual Object Classes (PASCAL VOC) 2007 test set. The experiments therefore indicate that the proposed GC-YOLOv3 model exhibits optimal performance on the PASCAL VOC and COCO datasets.


ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i07.6945 ◽ 2020 ◽ Vol 34 (07) ◽ pp. 12557-12564 ◽ Cited By ~ 4

Author(s): Zhenbo Xu ◽ Wei Zhang ◽ Xiaoqing Ye ◽ Xiao Tan ◽ Wei Yang ◽ ...

Keyword(s): Object Detection ◽ Point Clouds ◽ Autonomous Driving ◽ Disparity Estimation ◽ 3D Object ◽ Detection Model ◽ Occluded Objects ◽ Bounding Boxes ◽ Detection Quality ◽ 3D Object Detection

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we learn part locations as complementary features to improve resistance against occlusion and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations such as pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
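The intrinsics adjustment that accompanies cropping and resizing a bounding box can be sketched with the standard pinhole camera relations: cropping shifts the principal point, and resizing scales the focal lengths and the principal point. This is a minimal illustration of that geometry, not the authors' implementation. The function name `zoom_intrinsics` and the numeric values are assumptions, and sub-pixel offset conventions are ignored.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of the image obtained by cropping `box` and resizing it.

    box      -- (x0, y0, x1, y1) crop rectangle in original pixels
    out_size -- (out_w, out_h) unified resolution after resizing
    """
    x0, y0, x1, y1 = box
    out_w, out_h = out_size
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    # Crop: subtract the box origin from the principal point.
    # Resize: scale focal lengths and the shifted principal point.
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy

# Illustrative numbers only.
new_k = zoom_intrinsics(700.0, 700.0, 600.0, 180.0,
                        box=(500, 150, 600, 200), out_size=(200, 100))
print(new_k)  # (1400.0, 1400.0, 200.0, 60.0)
```

With the adjusted intrinsics, a 3D point projects to the same location in the zoomed crop as its original projection after cropping and scaling, which is why depth recovered from the zoomed images remains geometrically consistent.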


Object Detection in Autonomous Driving Scenarios Based on an Improved Faster-RCNN

Applied Sciences ◽ 10.3390/app112411630 ◽ 2021 ◽ Vol 11 (24) ◽ pp. 11630

Author(s): Yan Zhou ◽ Sijie Wen ◽ Dongli Wang ◽ Jinzhen Mu ◽ Irampaye Richard

Keyword(s): Object Detection ◽ Autonomous Driving ◽ Detection Algorithm ◽ Data Sets ◽ False Detection ◽ Automatic Driving ◽ Occluded Objects ◽ Pyramid Structure ◽ Feature Pyramid ◽ Bounding Boxes

Object detection is one of the key algorithms in automatic driving systems. To address the false detection and missed detection of small and occluded objects in automatic driving scenarios, an improved Faster-RCNN object detection algorithm is proposed. First, deformable convolution and a spatial attention mechanism are used to improve the ResNet-50 backbone network and enhance the feature extraction of small objects; then, an improved feature pyramid structure is introduced to reduce the loss of features in the fusion process. Three cascade detectors are introduced to solve the problem of IoU (Intersection over Union) threshold mismatch, and side-aware boundary localization is applied for frame regression. Finally, Soft-NMS (Soft Non-Maximum Suppression) is used to remove bounding boxes to obtain the best results. The experimental results show that the improved Faster-RCNN detects small and occluded objects better, and its accuracy is 7.7% and 4.1% higher than that of the baseline, respectively, in the eight categories selected from the COCO2017 and BDD100k data sets.
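The Soft-NMS step can be illustrated with a short sketch of the Gaussian variant, in which the score of each remaining box is multiplied by exp(-IoU² / sigma) instead of the box being discarded outright. The values of `sigma` and `score_thresh` and the example boxes are assumptions chosen for illustration, not settings reported in the paper.

```python
import math

def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores rather than removing boxes."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Pick the highest-scoring box still under consideration.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        rescored = []
        for box, score in remaining:
            # Heavier overlap with the selected box gives a stronger decay.
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_thresh:
                rescored.append((box, score))
        remaining = rescored
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first heavily, so it survives with a reduced score and drops below the third, non-overlapping box in the final ranking. This is the behaviour that helps with occluded objects, where a hard cut-off could delete a true detection.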


Mask OBB: A Semantic Attention-Based Mask Oriented Bounding Box Representation for Multi-Category Object Detection in Aerial Images

Remote Sensing ◽ 10.3390/rs11242930 ◽ 2019 ◽ Vol 11 (24) ◽ pp. 2930 ◽ Cited By ~ 6

Author(s): Jinwang Wang ◽ Jian Ding ◽ Haowen Guo ◽ Wensheng Cheng ◽ Ting Pan ◽ ...

Keyword(s): Object Detection ◽ Empirical Studies ◽ Classification Problem ◽ Aerial Images ◽ Detection Methods ◽ Detection Accuracy ◽ Lateral Connection ◽ Feature Pyramid ◽ Bounding Boxes ◽ Definition Of

Object detection in aerial images is a fundamental yet challenging task in the remote sensing field. As most objects in aerial images appear in arbitrary orientations, oriented bounding boxes (OBBs) have a clear advantage over traditional horizontal bounding boxes (HBBs). However, regression-based OBB detection methods suffer from ambiguity in the definition of learning targets, which decreases detection accuracy. In this paper, we provide a comprehensive analysis of OBB representations and cast OBB regression as a pixel-level classification problem, which largely eliminates the ambiguity. The predicted masks are subsequently used to generate OBBs. To handle the huge scale changes of objects in aerial images, an Inception Lateral Connection Network (ILCN) is used to enhance the Feature Pyramid Network (FPN). Furthermore, a Semantic Attention Network (SAN) is adopted to provide semantic features, which help distinguish the object of interest from the cluttered background. Empirical studies show that the entire method is simple yet efficient. Experimental results on two widely used datasets, DOTA and HRSC2016, demonstrate that the proposed method outperforms state-of-the-art methods.


Self-Adaptive Aspect Ratio Anchor for Oriented Object Detection in Remote Sensing Images

Remote Sensing ◽ 10.3390/rs13071318 ◽ 2021 ◽ Vol 13 (7) ◽ pp. 1318

Author(s): Jie-Bo Hou ◽ Xiaobin Zhu ◽ Xu-Cheng Yin

Keyword(s): Remote Sensing ◽ Aspect Ratio ◽ Object Detection ◽ Detection Methods ◽ Remote Sensing Images ◽ Feature Maps ◽ Aspect Ratios ◽ Feature Pyramid ◽ Oriented Object ◽ Self Adaptive

Object detection is a significant and challenging problem in the study of remote sensing. Since remote sensing images are typically captured from a bird's-eye view, the aspect ratios of objects in the same category may follow a Gaussian distribution. Existing object detection methods generally do not exploit the distribution of aspect ratios to improve performance in remote sensing tasks. In this paper, we propose a novel Self-Adaptive Aspect Ratio Anchor (SARA) to explicitly explore aspect ratio variations of objects in remote sensing images. Concretely, SARA self-adaptively learns an appropriate aspect ratio for each category. In this way, we need only a simple square anchor (related to the strides of feature maps in Feature Pyramid Networks) to regress objects with various aspect ratios. Finally, we adopt an Oriented Box Decoder (OBD) to align the feature maps and encode the orientation information of oriented objects. Our method achieves a promising mAP value of 79.91% on the DOTA dataset.


Object Detection Using Stacked YOLOv3

Ingénierie des systèmes d'information ◽ 10.18280/isi.250517 ◽ 2020 ◽ Vol 25 (5) ◽ pp. 691-697

Author(s): Sai Shilpa Padmanabula ◽ Ramya Chowdary Puvvada ◽ Venkatramaphanikumar Sistla ◽ Venkata Krishna Kishore Kolli

Keyword(s): Object Detection ◽ Detection Probability ◽ Learning Approaches ◽ Proposed Model ◽ Detection Of Objects ◽ Bounding Boxes ◽ Probability Bounding ◽ Range Of Values ◽ Maximal Suppression ◽ Better Than

Object detection is a stimulating task in computer vision applications. It is gaining a lot of attention in many real-time applications, such as detecting the number plates of suspect cars, identifying trespassers in areas under surveillance, and detecting unmasked faces at security gates during the COVID-19 period. Deep learning approaches include Region-based Convolutional Neural Networks (R-CNN) and You Only Look Once (YOLO) based CNNs. In this work, an improved stacked YOLOv3 model is designed to detect objects with bounding boxes. Hyperparameters are tuned to obtain optimum performance. The proposed model is evaluated on the COCO dataset, and its performance is better than that of other existing object detection models. Anchor boxes are used for overlapping objects. After all predicted bounding boxes with a low detection probability are removed, the bounding box with the highest detection probability is selected, and all bounding boxes whose Intersection over Union (IoU) with it is higher than 0.4 are eliminated. Non-Maximal Suppression (NMS) is used to keep only the best bounding box. In our experiments, we tried a range of threshold values and obtained a better result at a threshold of 0.5.
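The filtering procedure described above (drop low-probability boxes, take the highest-scoring box, discard boxes that overlap it too much, repeat) is standard greedy NMS and can be sketched as follows. The abstract mentions IoU thresholds of both 0.4 and 0.5; this sketch defaults to 0.5, and the score threshold of 0.3 and the example boxes are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, score_thresh=0.3, iou_thresh=0.5):
    """Greedy NMS; returns the indices of the boxes that are kept."""
    # Step 1: drop low-probability boxes, then sort by descending score.
    candidates = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in candidates:
        # Step 2: keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed because its IoU with box 0 is about 0.68, and box 3 is removed earlier by the score threshold. Lowering `iou_thresh` makes suppression more aggressive, which is the trade-off the authors explore when comparing threshold values.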


Multi Angle Rotation Object Detection for Remote Sensing Image Based on Modified Feature Pyramid Networks

International Journal of Remote Sensing ◽ 10.1080/01431161.2021.1910371 ◽ 2021 ◽ Vol 42 (14) ◽ pp. 5257-5280

Author(s): Lianyu Cao ◽ Xiaolu Zhang ◽ Zhaoshun Wang ◽ Guangyu Ding

Keyword(s): Remote Sensing ◽ Object Detection ◽ Remote Sensing Image ◽ Angle Rotation ◽ Feature Pyramid


A Two-Phase Fashion Apparel Detection Method Based on YOLOv4

Applied Sciences ◽ 10.3390/app11093782 ◽ 2021 ◽ Vol 11 (9) ◽ pp. 3782

Author(s): Chu-Hui Lee ◽ Chen-Wei Lin

Keyword(s): Object Detection ◽ Transfer Learning ◽ Detection Method ◽ Phase Transfer ◽ Recognition Task ◽ Phase Detection ◽ Target Domain ◽ Two Phase ◽ Detection Technology ◽ Fashion Apparel

Object detection is one of the important technologies in the field of computer vision. In the area of fashion apparel, object detection technology has various applications, such as apparel recognition, apparel detection, fashion recommendation, and online search. The recognition task is difficult for a computer because fashion apparel images vary in clothing appearance and material. Currently, fast and accurate object detection is the most important goal in this field. In this study, we propose a two-phase fashion apparel detection method named YOLOv4-TPD (YOLOv4 Two-Phase Detection), based on the YOLOv4 algorithm, to address this challenge. The target categories for model detection are jacket, top, pants, skirt, and bag. According to the definition of inductive transfer learning, the purpose is to transfer knowledge from the source domain to the target domain in order to improve the performance of tasks in the target domain. Therefore, we use a two-phase training method to implement the transfer learning. The experimental results show that, through two-phase transfer learning, the mAP of our model is better than that of the original YOLOv4 model. The proposed model has multiple potential applications, such as an automatic labeling system, style retrieval, and similarity detection.


Multiscale Object Detection from Drone Imagery Using Ensemble Transfer Learning

Drones ◽ 10.3390/drones5030066 ◽ 2021 ◽ Vol 5 (3) ◽ pp. 66

Author(s): Rahee Walambe ◽ Aboli Marathe ◽ Ketan Kotecha

Keyword(s): Object Detection ◽ Transfer Learning ◽ Data Augmentation ◽ Test Time ◽ Complex Task ◽ Open Domain ◽ End User ◽ Aerial Vehicle ◽ UAV Images ◽ Voting Strategy

Object detection in uncrewed aerial vehicle (UAV) images has been a longstanding challenge in the field of computer vision. Specifically, object detection in drone images is a complex task because the objects, such as humans, buildings, water bodies, and hills, appear at various scales. In this paper, we present an implementation of ensemble transfer learning to enhance the performance of the base models for multiscale object detection in drone imagery. Combined with a test-time augmentation pipeline, the algorithm combines different models and applies voting strategies to detect objects of various scales in UAV images. The data augmentation also offers a solution to the scarcity of drone image datasets. We experimented with two datasets in the open domain: the VisDrone dataset and the AU-AIR dataset. Our approach is more practical and efficient because it uses transfer learning and a two-level voting strategy ensemble instead of training custom models on entire datasets. The experiments show a significant improvement in mAP on both the VisDrone and AU-AIR datasets when the ensemble transfer learning method is employed. Furthermore, the use of voting strategies increases the reliability of the ensemble, as the end user can select and trace the effects of the mechanism for bounding box predictions.
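One way to picture a voting ensemble over detectors is the sketch below. The abstract does not specify the voting rules, so the three strategies used here ("affirmative", "consensus", "unanimous"), the grouping of boxes by label and IoU, and the function name `vote` are all assumptions made for illustration rather than the authors' method.

```python
def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def vote(detections_per_model, strategy="consensus", iou_thresh=0.5):
    """Merge detections from several models with a simple voting rule.

    detections_per_model -- one list per model of (box, label, score) tuples
    """
    n_models = len(detections_per_model)
    groups = []  # each group holds (model_index, box, label, score) entries
    for m, dets in enumerate(detections_per_model):
        for box, label, score in dets:
            for group in groups:
                _, ref_box, ref_label, _ = group[0]
                if label == ref_label and iou(box, ref_box) >= iou_thresh:
                    group.append((m, box, label, score))
                    break
            else:
                groups.append([(m, box, label, score)])
    # Number of distinct models that must agree on an object.
    required = {
        "affirmative": 1,
        "consensus": n_models // 2 + 1,
        "unanimous": n_models,
    }[strategy]
    results = []
    for group in groups:
        if len({m for m, _, _, _ in group}) >= required:
            # Represent the group by its highest-scoring member.
            _, box, label, score = max(group, key=lambda d: d[3])
            results.append((box, label, score))
    return results

model_a = [((0, 0, 10, 10), "car", 0.9)]
model_b = [((1, 1, 11, 11), "car", 0.8), ((20, 20, 30, 30), "person", 0.6)]
model_c = [((0, 1, 10, 11), "car", 0.7)]
merged = vote([model_a, model_b, model_c], strategy="consensus")
print(merged)  # [((0, 0, 10, 10), 'car', 0.9)]
```

Because the strategy is an explicit parameter, a user can rerun the merge with a stricter or looser rule and see exactly which boxes appear or disappear, which matches the traceability the abstract highlights.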
