Can Cosegmentation Improve the Object Detection Quality?

Author(s):  
Timo Lüddecke
2020 ◽  
Vol 34 (07) ◽  
pp. 12557-12564 ◽  
Author(s):  
Zhenbo Xu ◽  
Wei Zhang ◽  
Xiaoqing Ye ◽  
Xiao Tan ◽  
Wei Yang ◽  
...  

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve robustness against occlusion, and we put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
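The intrinsics adjustment mentioned in the abstract can be illustrated with the standard pinhole relation for a crop followed by a resize. The following is a minimal sketch under that assumption, not the ZoomNet implementation: the names `Intrinsics`, `zoom_intrinsics` and `project` are hypothetical, and pixel-center conventions are ignored.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x
    cy: float  # principal point y


def zoom_intrinsics(k, box, out_w, out_h):
    """Intrinsics of the virtual camera that sees `box` resized to out_w x out_h.

    box is (x0, y0, x1, y1) in pixels of the original image.
    """
    x0, y0, x1, y1 = box
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    return Intrinsics(
        fx=k.fx * sx,            # focal lengths scale with the zoom
        fy=k.fy * sy,
        cx=(k.cx - x0) * sx,     # principal point shifts by the crop, then scales
        cy=(k.cy - y0) * sy,
    )


def project(k, point):
    """Pinhole projection of a 3D point (X, Y, Z) given in camera coordinates."""
    X, Y, Z = point
    return (k.fx * X / Z + k.cx, k.fy * Y / Z + k.cy)
```

With these definitions, projecting a 3D point with the adjusted intrinsics gives the same pixel as projecting it with the original intrinsics and then applying the crop and resize, which is why geometry estimated on the resized box images remains metrically consistent.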


Author(s):  
Robert Manthey ◽  
Falk Schmidsberger ◽  
Rico Thomanek ◽  
Christian Roschke ◽  
Tony Rolletschke ◽  
...  

2020 ◽  
Vol 15 (90) ◽  
pp. 42-57
Author(s):  
Anna A. Kuznetsova ◽  

Average precision (AP), defined as the area under the Precision-Recall curve, is the de facto standard for comparing the quality of algorithms for classification, information retrieval, object detection, etc. However, traditional Precision-Recall curves usually have a zigzag shape, which makes it difficult to calculate the average precision and to compare algorithms. This paper proposes a statistical approach to the construction of Precision-Recall curves when assessing the quality of algorithms for object detection in images. This approach is based on calculating Statistical Precision and Statistical Recall. Instead of the traditional confidence level, a statistical confidence level is calculated for each image as the percentage of objects detected. For each threshold value of the statistical confidence level, the total number of correctly detected objects (Integral TP) and the total number of background objects mistakenly assigned by the algorithm to one of the classes (Integral FP) are calculated for each image. Next, the values of Precision and Recall are calculated. Statistical Precision-Recall curves, unlike traditional curves, are guaranteed to be monotonically non-increasing. At the same time, the Statistical Average Precision of object detection algorithms on small test datasets turns out to be lower than the traditional Average Precision. On relatively large test image datasets, these differences are smoothed out. A comparison of conventional and statistical Precision-Recall curves is given for a specific example.
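To make the zigzag shape concrete, here is a minimal sketch of the traditional computation only: detections are ranked by confidence, precision and recall are accumulated, and AP is taken as a sum of rectangles. It does not implement the statistical curves the paper proposes. The function names and the optional monotone interpolation step are illustrative assumptions.

```python
def precision_recall_points(scored, num_gt):
    """scored: list of (confidence, is_true_positive); num_gt: ground-truth count.

    Returns one (recall, precision) point per detection, in ranked order.
    """
    ranked = sorted(scored, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    points = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    return points


def average_precision(points, interpolate=True):
    """Area under the precision-recall curve as a sum of rectangles."""
    recalls = [r for r, _ in points]
    precisions = [p for _, p in points]
    if interpolate:
        # Replace each precision by the maximum precision at any later point,
        # which removes the zigzag and makes the curve non-increasing.
        for i in range(len(precisions) - 2, -1, -1):
            precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Five detections against four ground-truth objects.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, True), (0.5, False)]
points = precision_recall_points(detections, num_gt=4)
```

In this example the raw precision sequence is 1, 1/2, 2/3, 3/4, 3/5: it drops after the false positive and then rises again, which is the zigzag the abstract refers to.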


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Parvinder Kaur ◽  
Baljit Singh Khehra ◽  
Amar Partap Singh Pharwaha

Object detection is widely used in many fields, and the demand for more accurate and faster object detection methods is increasing accordingly. In this paper, we propose a method for object detection in digital images that is both more accurate and faster. The proposed model is based on the Single-Stage Multibox Detector (SSD) architecture. This method creates many anchor boxes of various aspect ratios based on the backbone network and the multiscale feature network, and calculates the classes and offsets of the anchor boxes to detect objects at various scales. Instead of the VGG16-based deep transfer learning model in SSD, we use a more efficient base network, EfficientNet. Detecting objects of different sizes is still a challenging task, and we use a Multiway Feature Pyramid Network (MFPN) to address it. The input to the base network is given to the MFPN, and the fused features are then passed to the bounding box prediction and class prediction networks. Softer-NMS is applied instead of the NMS used in SSD to reduce the number of bounding boxes gently. The proposed method is validated on the MSCOCO 2017, PASCAL VOC 2007, and PASCAL VOC 2012 datasets and compared to existing state-of-the-art techniques. Our method shows better detection quality in terms of mean Average Precision (mAP).
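As an illustration of anchor boxes with various aspect ratios, the sketch below tiles a square feature map with boxes of fixed area whose width-to-height ratio varies, following the usual SSD-style parameterization (width = scale * sqrt(ratio), height = scale / sqrt(ratio)). The function names, the normalized coordinates and the example values are assumptions for illustration, not the configuration used in the paper.

```python
import math


def anchors_for_cell(cx, cy, scale, aspect_ratios):
    """Return (xmin, ymin, xmax, ymax) anchors centred at (cx, cy).

    Each anchor has area scale**2 and width/height equal to its aspect ratio.
    """
    boxes = []
    for ar in aspect_ratios:
        w = scale * math.sqrt(ar)
        h = scale / math.sqrt(ar)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def anchors_for_feature_map(fmap_size, scale, aspect_ratios):
    """Tile anchors over a square fmap_size x fmap_size grid in [0, 1] coordinates."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Anchor centres sit in the middle of each feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            boxes.extend(anchors_for_cell(cx, cy, scale, aspect_ratios))
    return boxes


# A 4 x 4 map with three aspect ratios yields 4 * 4 * 3 anchors.
anchors = anchors_for_feature_map(4, scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
```

A detector of this kind would repeat the tiling for several feature maps with different scales, so that coarse maps cover large objects and fine maps cover small ones; the prediction heads then output a class score and box offsets for every anchor.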


Electronics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 90
Author(s):  
Donghyeon Lee ◽  
Joonyoung Kim ◽  
Kyomin Jung

Fully convolutional structures produce feature maps that capture the local context of an image simply by stacking numerous convolutional layers. These structures are known to be effective in modern state-of-the-art object detectors such as Faster R-CNN and SSD for finding objects from local context. However, the quality of object detectors can be further improved by incorporating global context when ambiguous objects must be identified from surrounding objects or the background. In this paper, we introduce a self-attention module for object detectors to incorporate global context. More specifically, our self-attention module allows the feature extractor to compute feature maps with global context through the self-attention mechanism. The module computes relationships among all elements in the feature maps and then blends the feature maps according to the computed relationships. It can therefore capture long-range relationships among objects or backgrounds, which is difficult for fully convolutional structures. Furthermore, the proposed module is not limited to any specific object detector and can be applied to any CNN-based model for any computer vision task. In the experimental results on the object detection task, our method shows remarkable gains in average precision (AP) compared to popular models that have fully convolutional structures. In particular, compared to Faster R-CNN with the ResNet-50 backbone, our module applied to the same backbone achieved a gain of +4.0 AP without bells and whistles. In image semantic segmentation and panoptic segmentation tasks, our module improved the performance in all metrics used for each task.
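The two steps described in the abstract, computing relationships among all elements and then blending the feature maps, correspond to generic scaled dot-product self-attention. The sketch below shows that generic mechanism on a flattened feature map in plain Python. It is not the authors' module: the learned query, key and value projections are replaced by the identity to keep it short, and all names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten(feature_map):
    """Turn an H x W x C nested list into a list of H*W feature vectors."""
    return [pixel for row in feature_map for pixel in row]


def self_attention(features):
    """features: list of N position vectors, each a list of C floats.

    Returns N blended vectors. Queries, keys and values are the features
    themselves (identity projections), an assumption made for brevity.
    """
    dim = len(features[0])
    out = []
    for q in features:
        # Relationship of this position to every position in the map.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in features]
        weights = softmax(scores)
        # Blend all positions according to those relationships.
        out.append([sum(w * v[c] for w, v in zip(weights, features)) for c in range(dim)])
    return out


# A 2 x 2 feature map with 2 channels.
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]]]
blended = self_attention(flatten(fmap))
```

Because every output vector is a weighted average over all positions, a single such layer already mixes information from the whole map, whereas a stack of convolutions has to grow its receptive field layer by layer.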


Author(s):  
Konstantin A. Elshin ◽  
Elena I. Molchanova ◽  
Marina V. Usoltseva ◽  
Yelena V. Likhoshway

Using the TensorFlow Object Detection API, an approach to identifying and registering the Baikal diatom species Synedra acus subsp. radians was tested. A set of images was assembled and a model was trained on it. After 15,000 training iterations, the total value of the loss function was 0.04. The classification accuracy was 95%, and the accuracy of the bounding-box construction was also 95%.

