Object Detection Based on Region Decomposition and Assembly

Author(s):  
Seung-Hwan Bae

Region-based object detection infers object regions for one or more categories in an image. Owing to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and produced promising detection results. However, detection accuracy is often degraded by the low discriminability of object CNN features, which is caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To capture the entire appearance and the part details of the object jointly, we extract CNN features within both the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the resulting high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals of various scales. We integrate R-DAD into several feature extractors and demonstrate distinct performance improvements on PASCAL07/12 and MSCOCO18 compared to recent convolutional detectors.
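As a rough illustration of the decompose-then-assemble idea, the following Python sketch splits a box into the whole region plus four half-regions and then merges per-region feature vectors stage by stage. The half-region split, the function names `decompose_region` and `assemble`, and the use of element-wise max as the merge operation are all assumptions for illustration; the abstract does not specify how R-DAD decomposes regions or what its learned region assembly blocks compute.

```python
from typing import Dict, List, Tuple

# (x1, y1, x2, y2) in pixel coordinates
Box = Tuple[float, float, float, float]


def decompose_region(box: Box) -> Dict[str, Box]:
    """Return the whole box plus four half-regions (left/right/top/bottom).

    The half-region split is an illustrative assumption, not
    necessarily the exact decomposition used by R-DAD.
    """
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    cx = (x1 + x2) / 2.0  # vertical split line
    cy = (y1 + y2) / 2.0  # horizontal split line
    return {
        "whole": (x1, y1, x2, y2),
        "left": (x1, y1, cx, y2),
        "right": (cx, y1, x2, y2),
        "top": (x1, y1, x2, cy),
        "bottom": (x1, cy, x2, y2),
    }


def _elementwise_max(a: List[float], b: List[float]) -> List[float]:
    return [max(u, v) for u, v in zip(a, b)]


def assemble(features: Dict[str, List[float]]) -> List[float]:
    """Hypothetical stage-by-stage assembly of per-region feature vectors.

    Element-wise max stands in for a learned region assembly block.
    """
    left_right = _elementwise_max(features["left"], features["right"])  # stage 1
    top_bottom = _elementwise_max(features["top"], features["bottom"])  # stage 1
    parts = _elementwise_max(left_right, top_bottom)  # stage 2: part-level summary
    return _elementwise_max(parts, features["whole"])  # stage 3: add whole-object context


if __name__ == "__main__":
    for name, region in decompose_region((10, 20, 110, 220)).items():
        print(name, region)
```

In a real detector, each region would be pooled from a CNN feature map (for example with RoI pooling) before assembly; plain lists keep the sketch dependency-free.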

2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have a high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their kernels have different aspect ratios, which can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by assigning learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that detection precision can be greatly improved while a fast detection speed is maintained.
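To make the "parallel asymmetric branches with learnable merge weights" idea concrete, here is a minimal single-channel sketch in plain Python. The kernel shapes used in the example (1×3, 3×1, 1×5), the all-ones kernel values, and the softmax normalisation of the branch weights are assumptions; the abstract does not state the exact kernel sizes or how the weights are normalised, and a real MAC block would operate on multi-channel tensors with trained kernels.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # single-channel feature map, [row][col]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """Single-channel 2-D cross-correlation with zero padding ('same' output size).

    Assumes odd kernel height and width so the kernel has a centre.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    r, c = i + u - ph, j + v - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += x[r][c] * k[u][v]
            out[i][j] = acc
    return out


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def mac_block(x: Grid, kernels: Sequence[Grid], logits: Sequence[float]) -> Grid:
    """Parallel asymmetric-convolution branches merged by learnable weights.

    Softmax-normalising the branch weights is an assumption; the paper's
    weighting scheme may differ.
    """
    weights = softmax(logits)
    branches = [conv2d_same(x, k) for k in kernels]
    h, w = len(x), len(x[0])
    return [[sum(a * b[i][j] for a, b in zip(weights, branches)) for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    impulse = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    # Hypothetical kernel shapes 1x3, 3x1 and 1x5, all ones for readability
    kernels = [[[1.0, 1.0, 1.0]], [[1.0], [1.0], [1.0]], [[1.0] * 5]]
    print(mac_block(impulse, kernels, [0.0, 0.0, 0.0]))
```

Feeding a single impulse makes the differing receptive-field shapes visible: the horizontal kernels spread the response along the row, the vertical kernel along the column, and the weighted sum blends them.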


2019 ◽  
Vol 11 (18) ◽  
pp. 2176 ◽  
Author(s):  
Chen ◽  
Zhong ◽  
Tan

Detecting objects in aerial images is a challenging task due to the multiple orientations and relatively small size of the objects. Although many traditional detection models have demonstrated acceptable performance by using an image pyramid and multiple templates in a sliding-window manner, such techniques are inefficient and costly. Recently, convolutional neural networks (CNNs) have been used successfully for object detection and have demonstrated considerably better performance than traditional detection methods; however, this success has not been extended to aerial images. To overcome this problem, we propose a detection model based on two CNNs. The first CNN is designed to propose many object-like regions, generated from feature maps at multiple scales and hierarchies, together with orientation information. With this design, small objects are localized more accurately, and the generated regions with orientation information are better suited to objects arranged at arbitrary orientations. The second CNN is designed for object recognition; it first extracts the features of each generated region and then makes the final decisions. The results of extensive experiments on the Vehicle Detection in Aerial Imagery (VEDAI) and Overhead Imagery Research Data Set (OIRDS) datasets indicate that the proposed model performs well in terms of both detection accuracy and detection speed.
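One common way to attach orientation information to region proposals is a five-parameter box (centre, width, height, angle). The sketch below assumes that parameterization and a hypothetical set of sizes and angles; it enumerates oriented anchors at one location and computes the rotated corners of a box. It is not the paper's proposal network, only an illustration of what an oriented region looks like geometrically.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# (cx, cy, w, h, theta_deg): centre, size, and rotation in degrees (assumed parameterization)
OrientedBox = Tuple[float, float, float, float, float]


def rotated_corners(box: OrientedBox) -> List[Point]:
    """Corners of an oriented box, rotating the axis-aligned corners about the centre."""
    cx, cy, w, h, theta_deg = box
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Standard 2-D rotation: x' = x cos t - y sin t, y' = x sin t + y cos t
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def oriented_anchors(cx: float, cy: float,
                     sizes: Sequence[Tuple[float, float]],
                     angles: Sequence[float]) -> List[OrientedBox]:
    """Enumerate one anchor per (size, angle) pair at a single feature-map location."""
    return [(cx, cy, w, h, a) for (w, h) in sizes for a in angles]


if __name__ == "__main__":
    # Hypothetical sizes and angles, chosen only for the example
    anchors = oriented_anchors(64.0, 64.0, [(16, 8), (32, 16)], [0, 45, 90, 135])
    print(len(anchors), rotated_corners(anchors[1]))
```

Angles from 0° to 135° in 45° steps cover all orientations of a rectangle, because a box rotated by 180° is the same box.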


2020 ◽  
Vol 34 (07) ◽  
pp. 10599-10606 ◽  
Author(s):  
Zuyao Chen ◽  
Qianqian Xu ◽  
Runmin Cong ◽  
Qingming Huang

Deep convolutional neural networks have achieved competitive performance in salient object detection, in which learning effective and comprehensive features plays a critical role. Most previous works adopted multi-level feature integration but ignored the gap between different features. In addition, high-level features are diluted as they are passed along the top-down pathway. To remedy these issues, we propose a novel network named GCPANet that effectively integrates low-level appearance features, high-level semantic features, and global context features through progressive context-aware Feature Interweaved Aggregation (FIA) modules, and generates the saliency map in a supervised way. Moreover, a Head Attention (HA) module is used to reduce information redundancy and enhance the top-layer features by leveraging spatial and channel-wise attention, and a Self Refinement (SR) module is used to further refine and strengthen the input features. Furthermore, we design a Global Context Flow (GCF) module to generate global context information at different stages, which aims to learn the relationships among different salient regions and alleviate the dilution of high-level features. Experimental results on six benchmark datasets demonstrate that the proposed approach outperforms state-of-the-art methods both quantitatively and qualitatively.


Author(s):  
Kai Zhao ◽  
Wei Shen ◽  
Shanghua Gao ◽  
Dandan Li ◽  
Ming-Ming Cheng

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts. Robust skeleton detection therefore requires a powerful multi-scale feature integration ability. To address this issue, we present a new convolutional neural network (CNN) architecture for object skeleton detection, built on a novel hierarchical feature integration mechanism named Hi-Fi. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) has a strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms state-of-the-art methods at fusing features from very different scales, as evidenced by considerable performance improvements on several benchmarks.


Author(s):  
Yuxia Wang ◽  
Wenzhu Yang ◽  
Tongtong Yuan ◽  
Qian Li

Low detection accuracy and insufficient ability to detect small objects are the main problems of region-free object detection algorithms. To solve these problems, an improved object detection method using feature map refinement and anchor optimization is proposed. First, a reverse fusion operation is performed on each object detection layer, which provides the lower layers with more semantic information by fusing detection features at different levels. Second, a self-attention module is used to refine each detection feature map, calibrating the features across channels and enhancing the expressive ability of local features. In addition, an anchor optimization model is introduced on each feature layer associated with anchors, yielding anchors that have a higher probability of containing an object and that more closely match the object's location and size. In this model, semantic features are used to identify and remove negative anchors, reducing the search space for objects, and preliminary adjustments are made to the locations and sizes of the anchors. Comprehensive experimental results on the PASCAL VOC detection dataset demonstrate the effectiveness of the proposed method. In particular, with VGG-16 and a low-resolution 300×300 input, the proposed method achieves an mAP of 79.1% on the VOC 2007 test set with an inference time of 24.7 milliseconds per image.
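The anchor optimization step (discard likely negatives, then make preliminary adjustments to the survivors) can be sketched as follows. The objectness threshold `min_score` and the delta encoding (the common Faster R-CNN style centre offsets scaled by anchor size, with log-space width and height scaling) are assumptions; the abstract does not give the paper's exact rule for removing negative anchors or its regression parameterization.

```python
import math
from typing import List, Sequence, Tuple

Anchor = Tuple[float, float, float, float]  # (cx, cy, w, h)
Delta = Tuple[float, float, float, float]   # (dx, dy, dw, dh)


def refine_anchors(anchors: Sequence[Anchor],
                   objectness: Sequence[float],
                   deltas: Sequence[Delta],
                   min_score: float = 0.01) -> List[Anchor]:
    """Drop likely-negative anchors, then coarsely adjust the survivors.

    min_score and the delta encoding (the common Faster R-CNN style
    offsets) are assumptions for illustration.
    """
    refined = []
    for (cx, cy, w, h), p, (dx, dy, dw, dh) in zip(anchors, objectness, deltas):
        if p < min_score:
            continue  # confidently background: removed from the search space
        refined.append((
            cx + dx * w,        # shift centre by a fraction of the anchor width
            cy + dy * h,        # ... and of the anchor height
            w * math.exp(dw),   # scale width in log space
            h * math.exp(dh),   # scale height in log space
        ))
    return refined


if __name__ == "__main__":
    print(refine_anchors([(50, 50, 20, 20), (10, 10, 8, 8)],
                         [0.9, 0.001],
                         [(0.1, -0.1, 0.0, math.log(2)), (0, 0, 0, 0)]))
```

Filtering before regression means that the later detection head only scores anchors that survived this coarse stage, which is how removing negatives shrinks the search space.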


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3523 ◽  
Author(s):  
Lili Zhang ◽  
Yi Zhang ◽  
Zhen Zhang ◽  
Jie Shen ◽  
Huibin Wang

In this paper, we consider water surface object detection in natural scenes. Background subtraction and image segmentation are the classical object detection methods. The former is highly susceptible to variable scenes, so its accuracy drops sharply when detecting water surface objects because of changing sunlight and waves. The latter is sensitive to the choice of object features, which leads to poor generalization, so it cannot be applied widely. Consequently, methods based on deep learning have recently been proposed. China has recently implemented the River Chief System, one of whose important requirements is to detect and deal with floating objects on the water surface in a timely fashion. In response, we propose a real-time water surface object detection method based on Faster R-CNN. The proposed network model includes two modules and integrates low-level features with high-level features to improve detection accuracy. Moreover, we set the scales and aspect ratios of the anchors by analyzing the distribution of object scales in our dataset, so our method is robust and highly accurate for multi-scale objects in complex natural scenes. We applied the proposed method to detect floating objects on the water surface in a three-day video surveillance stream of the North Canal in Beijing and validated its performance. The experiments show that the mean average precision (mAP) of the proposed method was 83.7% at a detection speed of 13 frames per second. Therefore, our method can be applied in complex natural scenes and largely meets the accuracy and speed requirements of online water surface object detection.
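Setting anchor scales and aspect ratios from the dataset can be sketched in two steps: pick scales from the distribution of ground-truth box sizes, then expand each scale into anchors of several aspect ratios with constant area. The quantile rule in `scales_from_boxes` and the convention ratio = h / w are assumptions for illustration; the abstract does not say how the paper derived its anchor settings from the scale distribution.

```python
import math
from typing import List, Sequence, Tuple


def scales_from_boxes(boxes: Sequence[Tuple[float, float]], n_scales: int) -> List[float]:
    """Pick anchor scales at evenly spaced quantiles of ground-truth box sizes.

    Box size is taken as sqrt(w * h); the quantile rule is an assumption.
    """
    sizes = sorted(math.sqrt(w * h) for w, h in boxes)
    last = len(sizes) - 1
    return [sizes[round((i + 0.5) / n_scales * last)] for i in range(n_scales)]


def make_anchors(scales: Sequence[float], ratios: Sequence[float]) -> List[Tuple[float, float]]:
    """One (w, h) anchor per scale/ratio pair, with ratio = h / w and area = scale**2."""
    anchors = []
    for s in scales:
        for r in ratios:
            anchors.append((s / math.sqrt(r), s * math.sqrt(r)))
    return anchors


if __name__ == "__main__":
    # Hypothetical ground-truth (w, h) pairs
    gt = [(4, 4), (8, 8), (16, 16), (32, 32), (64, 64)]
    print(make_anchors(scales_from_boxes(gt, 2), [0.5, 1.0, 2.0]))
```

Keeping the area fixed at `scale**2` while varying the ratio means each scale covers objects of roughly the same size but different shapes.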


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3031
Author(s):  
Jing Lian ◽  
Yuhang Yin ◽  
Linhui Li ◽  
Zhenghao Wang ◽  
Yafu Zhou

There are many small objects in traffic scenes, but because of their low resolution and limited information, detecting them is still a challenge. Small object detection is very important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which better integrates contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared with the baseline model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. On the validation set of the traffic scene dataset BDD100K, it increases the mAP of all objects by 0.9 percentage points and the mAP of small objects by 3.5%.
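A minimal sketch of attention-based fusion, as opposed to linear fusion (plain addition), follows. It computes attention weights M from a local context (each position) and a global context (the channel's spatial mean), then forms a soft selection Z = M·X + (1 − M)·Y between two input feature maps. This follows a common attentional-feature-fusion pattern; the identity maps standing in for learned point-wise convolutions, and the exact formulas, are assumptions, and the paper's MS-CAB and AFFB may differ.

```python
import math
from typing import List

Feature = List[List[float]]  # [channel][flattened spatial position]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def ms_cab(x: Feature) -> Feature:
    """Channel attention weights from a local and a global context.

    Local: the value at each position. Global: the channel's spatial mean.
    Identity maps replace the learned point-wise convolutions (assumption).
    """
    weights = []
    for channel in x:
        global_ctx = sum(channel) / len(channel)
        weights.append([_sigmoid(local + global_ctx) for local in channel])
    return weights


def attention_fuse(x: Feature, y: Feature) -> Feature:
    """Soft selection between x and y: Z = M * X + (1 - M) * Y, with M = ms_cab(X + Y)."""
    summed = [[a + b for a, b in zip(cx, cy)] for cx, cy in zip(x, y)]
    m = ms_cab(summed)
    return [
        [w * a + (1.0 - w) * b for w, a, b in zip(cm, cx, cy)]
        for cm, cx, cy in zip(m, x, y)
    ]


if __name__ == "__main__":
    # Two hypothetical 2-channel maps with 3 spatial positions each
    shallow = [[0.2, 0.8, 0.1], [0.0, 0.5, 0.9]]
    deep = [[0.6, 0.1, 0.4], [0.3, 0.3, 0.3]]
    print(attention_fuse(shallow, deep))
```

Unlike plain addition, the weights here vary per channel and per position, so the fused map can lean towards whichever input is more informative at each location.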


Author(s):  
Seokyong Shin ◽  
Hyunho Han ◽  
Sang Hun Lee

YOLOv3 is a deep learning-based real-time object detector used mainly in applications such as video surveillance and autonomous vehicles. In this paper, we propose an improved YOLOv3 (You Only Look Once version 3) with a Duplex FPN, which enhances large object detection by utilizing low-level feature information. The conventional YOLOv3 improved small object detection over YOLOv2 by adopting an FPN (Feature Pyramid Network) structure. However, because YOLOv3 with an FPN structure is specialized for detecting small objects, it has difficulty detecting large objects. Therefore, this paper replaces the existing FPN structure of YOLOv3 with a Duplex FPN, which can utilize low-level location information in high-level feature maps. This improves the detection accuracy for large objects. In addition, an extra detection layer is added to the top-level feature map to prevent detection failures on parts of large objects. Further, the dimension clusters of each detection layer are reassigned so that the network quickly learns to detect objects accurately. The proposed method was compared and analyzed on the PASCAL VOC dataset. The experimental results show that the bounding box accuracy for large objects improved owing to the Duplex FPN and the extra detection layer, and that the proposed method succeeded in detecting large objects that the existing YOLOv3 missed.
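In the YOLO family, "dimension clusters" are anchor (width, height) priors obtained by k-means over ground-truth box sizes, using 1 − IoU rather than Euclidean distance so that large boxes do not dominate. The sketch below shows that clustering; it assumes boxes are compared as if they shared a centre and that cluster centres are updated with the mean. How the paper reassigns the clusters to each detection layer is not described in the abstract, so that step is not modelled here.

```python
import random
from typing import List, Sequence, Tuple

WH = Tuple[float, float]  # (width, height) of a ground-truth box


def iou_wh(a: WH, b: WH) -> float:
    """IoU of two boxes aligned at a common centre, so only sizes matter."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: Sequence[WH], k: int, iters: int = 100, seed: int = 0) -> List[WH]:
    """k-means on box sizes with distance 1 - IoU; returns centres sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(list(boxes), k)
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                new_centers.append((sum(w for w, _ in members) / len(members),
                                    sum(h for _, h in members) / len(members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's old centre
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return sorted(centers, key=lambda wh: wh[0] * wh[1])


if __name__ == "__main__":
    gt = [(10, 10), (11, 11), (9, 9), (100, 100), (105, 95), (98, 102)]
    print(kmeans_anchors(gt, 2))
```

Sorting the centres by area makes it straightforward to hand the smallest priors to the highest-resolution detection layer and the largest to the top-level layer.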

