Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression

2020 ◽  
Vol 34 (07) ◽  
pp. 12993-13000 ◽  
Author(s):  
Zhaohui Zheng ◽  
Ping Wang ◽  
Wei Liu ◽  
Jinze Li ◽  
Rongguang Ye ◽  
...  

Bounding box regression is the crucial step in object detection. In existing methods, while ℓn-norm loss is widely adopted for bounding box regression, it is not tailored to the evaluation metric, i.e., Intersection over Union (IoU). Recently, IoU loss and generalized IoU (GIoU) loss have been proposed to benefit the IoU metric, but still suffer from the problems of slow convergence and inaccurate regression. In this paper, we propose a Distance-IoU (DIoU) loss by incorporating the normalized distance between the predicted box and the target box, which converges much faster in training than IoU and GIoU losses. Furthermore, this paper summarizes three geometric factors in bounding box regression, i.e., overlap area, central point distance and aspect ratio, based on which a Complete IoU (CIoU) loss is proposed, thereby leading to faster convergence and better performance. By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster R-CNN, we achieve notable performance gains in terms of not only the IoU metric but also the GIoU metric. Moreover, DIoU can easily be adopted into non-maximum suppression (NMS) to act as the criterion, further boosting performance. The source code and trained models are available at https://github.com/Zzh-tju/DIoU.
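The DIoU penalty described in this abstract is simple to state: 1 − IoU plus the squared center distance normalized by the squared diagonal of the smallest enclosing box. A minimal sketch follows; the function name and the (x1, y1, x2, y2) box convention are our own, not taken from the released code.

```python
def diou_loss(pred, target):
    """Distance-IoU loss for axis-aligned boxes given as (x1, y1, x2, y2).

    L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2, where rho is the distance
    between the box centers and c is the diagonal length of the smallest
    box enclosing both.
    """
    # Intersection area
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter)

    # Squared distance between box centers
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Squared diagonal of the smallest enclosing box
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return 1.0 - iou + rho2 / c2
```

Unlike GIoU, the distance term stays informative even when the boxes do not overlap, which is why the abstract reports faster convergence.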

2020 ◽  
Vol 17 (2) ◽  
pp. 172988142090257
Author(s):  
Dan Xiong ◽  
Huimin Lu ◽  
Qinghua Yu ◽  
Junhao Xiao ◽  
Wei Han ◽  
...  

High tracking frame rates have been achieved by traditional tracking methods, which nevertheless fail due to drift of the object template or model, especially when the object disappears from the camera's field of view. To deal with this, combining tracking and detection has become increasingly popular for long-term tracking of unknown objects: the detector hardly drifts and can regain the object when it reappears. However, online machine learning and multiscale object detection require expensive computing resources and time, so combining tracking and detection sequentially, as in the Tracking-Learning-Detection algorithm, is not a good strategy. Inspired by parallel tracking and mapping, this article proposes a framework of parallel tracking and detection for unknown object tracking. The object tracking algorithm is split into two separate tasks, tracking and detection, which are processed in two different threads. One thread handles tracking between consecutive frames at a high processing speed; the other runs online learning algorithms to construct a discriminative model for object detection. Using our proposed framework, high tracking frame rates and the ability to correct and recover a failed tracker can be combined effectively. Furthermore, our framework provides open interfaces for integrating state-of-the-art object tracking and detection algorithms. We evaluate several popular tracking and detection algorithms within the proposed framework. The experimental results show that different tracking and detection algorithms can be integrated and compared effectively by our framework, and that robust and fast long-term object tracking can be realized.
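The two-thread split described above can be sketched roughly as follows. `run_parallel`, `track`, and `detect` are hypothetical names standing in for the framework's open interfaces: a fast tracker consumes frames on the main thread while a slower detector/learner runs concurrently and records corrections the tracker can consult.

```python
import queue
import threading

def run_parallel(frames, track, detect):
    """Illustrative sketch: `track` and `detect` are user-supplied
    callables; `detect` is assumed slow, so it runs in its own thread."""
    detect_q = queue.Queue()
    corrections = {}  # frame index -> detection result

    def detector():
        while True:
            item = detect_q.get()
            if item is None:            # poison pill: shut down
                break
            idx, frame = item
            corrections[idx] = detect(frame)   # slow, concurrent work

    worker = threading.Thread(target=detector)
    worker.start()

    results = []
    for idx, frame in enumerate(frames):
        detect_q.put((idx, frame))      # hand the frame to the slow thread
        results.append(track(frame))    # fast frame-to-frame tracking
    detect_q.put(None)
    worker.join()
    return results, corrections
```

In a real tracker the main loop would also read `corrections` to re-initialize itself after drift; that feedback path is omitted here for brevity.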


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1926
Author(s):  
Kai Yin ◽  
Juncheng Jia ◽  
Xing Gao ◽  
Tianrui Sun ◽  
Zhengyin Zhou

A series of sky surveys launched in search of supernovae have generated a tremendous amount of data, pushing astronomy into a new era of big data. However, manually identifying and reporting supernovae is a disastrous burden, because the data are enormous in quantity and sparse in positives. While traditional machine learning methods can be used on such data, deep learning methods such as Convolutional Neural Networks demonstrate more powerful adaptability in this area. However, most data in the existing works are either simulated or lack generality. How state-of-the-art object detection algorithms perform on real supernova data is largely unknown, which greatly hinders the development of this field. Furthermore, existing works on supernova classification usually assume the input images are properly cropped with a single candidate located in the center, which is not true for our dataset. Besides, the performance of existing detection algorithms can still be improved for the supernova detection task. To address these problems, we collected and organized all the known objects of the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) and the Popular Supernova Project (PSP), resulting in two datasets, and then compared several detection algorithms on them. After that, the selected Fully Convolutional One-Stage (FCOS) method is used as the baseline and further improved with data augmentation, an attention mechanism, and a small object detection technique. Extensive experiments demonstrate the great performance enhancement of our detection algorithm on the new datasets.


2019 ◽  
Vol 77 (4) ◽  
pp. 1427-1439 ◽  
Author(s):  
Qiong Li ◽  
Xin Sun ◽  
Junyu Dong ◽  
Shuqun Song ◽  
Tongtong Zhang ◽  
...  

Phytoplankton plays an important role in the marine ecological environment and aquaculture. However, the recognition and detection of phytoplankton still rely on manual operations. As a foundation for achieving intelligence and releasing human labour, a phytoplankton microscopic image dataset, PMID2019, for automated phytoplankton detection is presented. The PMID2019 dataset contains 10,819 phytoplankton microscopic images of 24 different categories. We used microscopes to collect images of phytoplankton in the laboratory environment. Each object in the images is manually labelled with a bounding box and a ground-truth category. In addition, living cells move quickly, making it difficult to capture images of them. In order to generalize the dataset for in situ applications, we further utilize Cycle-GAN to achieve domain migration between dead and living cell samples, building a synthetic dataset that generates the corresponding living cell samples from the original dead ones. The PMID2019 dataset will not only benefit the future development of phytoplankton microscopic vision technology, but can also be widely used to assess the performance of state-of-the-art object detection algorithms for phytoplankton recognition. Finally, we illustrate the performance of some state-of-the-art object detection algorithms, which may provide new ideas for monitoring marine ecosystems.


2020 ◽  
Vol 9 (1) ◽  
pp. 2526-2534

This paper principally combines ideas from computer vision, machine learning and deep learning for accurate detection of traffic lights and their classification, covering both circular and arrow light cases. Color filtering and blob detection are mainly used to detect the candidates (traffic lights) [6]. Then, a PCA network is employed as a multiclass classifier, which provides results periodically. Multi-object tracking (MOT) can be used for further tracking, and prediction filters out false positives; sometimes a voting scheme can be used instead of MOT. This method can easily be fitted into ADAS vehicles as far as hardware is concerned. Recognition is as vital as detecting the traffic lights: without recognition, no complete information can be transmitted [2]. More complex TLRs can provide advanced functions such as identifying the light relevant to a specific route (when there is more than one) and how far it is from the driver [3]. Deep learning is also one of the emerging techniques in this research area [7]. Object detection is an integral part of computer vision, and is best utilized in pose estimation, vehicle detection, surveillance, etc. In detection algorithms, we draw a bounding box around the object of interest to locate it within the image; the drawing of the bounding box is not unique and may vary depending on the requirement [9].
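The color-filtering stage mentioned above typically thresholds pixels in HSV space. A crude per-pixel sketch is shown below; the thresholds and function name are illustrative assumptions, not values from the paper.

```python
import colorsys

def is_red_light_pixel(r, g, b):
    """Crude color filter: map an RGB pixel (0-255 channels) to HSV and
    test whether it falls in a red hue band with strong saturation and
    brightness. Thresholds are illustrative only."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    red_hue = h < 0.05 or h > 0.95   # hue wraps around at 0/1
    return red_hue and s > 0.6 and v > 0.5
```

A real pipeline would apply such a mask to the whole frame and then group the surviving pixels into blobs, which become the traffic-light candidates passed to the classifier.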


Agriculture ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1003
Author(s):  
Shenglian Lu ◽  
Zhen Song ◽  
Wenkang Chen ◽  
Tingting Qian ◽  
Yingyu Zhang ◽  
...  

The leaf is the organ crucial for photosynthesis and the production of nutrients in plants; as such, the number of leaves is one of the key indicators for describing the development and growth of a canopy. The irregular shape and distribution of the leaf blades, as well as the effect of natural light, make their segmentation and detection difficult, and inaccurate acquisition of plant phenotypic parameters may affect subsequent judgments of crop growth status and crop yield. To address the challenge of counting dense and overlapping plant leaves in natural environments, we propose an improved deep-learning-based object detection algorithm that merges a space-to-depth module, a Convolutional Block Attention Module (CBAM) and Atrous Spatial Pyramid Pooling (ASPP) into the network, and applies the smooth-L1 function to improve the loss function for object prediction. We evaluated our method on images of five different plant species collected in indoor and outdoor environments. The experimental results demonstrate that our algorithm improved the average detection accuracy for counting dense leaves from 85% to 96%. Our algorithm also showed better performance in both detection accuracy and time consumption compared to other state-of-the-art object detection algorithms.
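The smooth-L1 function referenced above is a standard robust regression loss: quadratic near zero, linear for large residuals, so badly mislocalized boxes do not dominate the gradient. A minimal scalar version, with the usual transition point β defaulting to 1.0:

```python
def smooth_l1(x, beta=1.0):
    """Smooth-L1 (Huber-style) loss on a scalar residual x.

    Quadratic for |x| < beta, linear beyond, and continuous (with
    continuous gradient) at the transition point.
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta
```

In box regression this is applied per coordinate of the predicted offsets and summed.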


Author(s):  
Karen Gishyan

Ground-image-based object detection algorithms have improved greatly over the years and provide good results on challenging image datasets such as COCO and PASCAL VOC. These models, however, are not as successful for unmanned aerial vehicle (UAV)-based object detection, where performance deterioration is commonly observed. This is because detecting and classifying small objects is a much harder task than detecting medium- or large-size objects, and drone imagery is prone to variance caused by different flying altitudes, weather conditions, camera angles and image quality. This work explores the performance of two state-of-the-art object detection algorithms on the drone object detection task and proposes image augmentation procedures to improve model performance. We compose three image augmentation sequences, propose two new image augmentation techniques, and further explore different combinations of them and their effect on model performance. The augmenters are evaluated for two deep learning models, including model training with high-resolution images (1056×1056 pixels), to observe their overall effectiveness. We provide a comparison of augmentation techniques across each model. We identify two augmentation procedures that increase object detection accuracy more effectively than the others, and obtain our best model using a transfer learning approach, where the weights for the transfer are obtained from training the model with our proposed augmentation technique. At the end of the experiments, we achieve robust model performance and accuracy, and identify aspects for improvement as part of our future work.
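Geometric augmentations of the kind composed above must transform the bounding boxes together with the image. A horizontal flip is the simplest case; this sketch (names are ours, not the paper's) shows only the coordinate update for an (x1, y1, x2, y2) box.

```python
def hflip_box(box, img_w):
    """Horizontally flip a bounding box (x1, y1, x2, y2) inside an image
    of width img_w. The x-coordinates mirror and swap roles so that
    x1 < x2 still holds; y-coordinates are unchanged."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)
```

The same pattern (apply the image transform, then the matching coordinate transform to every box) extends to rotations, crops, and the augmentation sequences evaluated in the paper.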


2021 ◽  
Vol 15 ◽  
Author(s):  
Zhiguo Zhou ◽  
Jiaen Sun ◽  
Jiabao Yu ◽  
Kaiyuan Liu ◽  
Junwei Duan ◽  
...  

Water surface object detection is one of the most significant tasks in autonomous driving and water surface vision applications. To date, existing public large-scale datasets collected from websites do not focus on specific scenarios, and the quantity of images and instances in these datasets remains low. To accelerate the development of water surface autonomous driving, this paper proposes a large-scale, high-quality annotated benchmark dataset, named the Water Surface Object Detection Dataset (WSODD), to benchmark different water surface object detection algorithms. The proposed dataset consists of 7,467 water surface images covering different water environments, climate conditions, and shooting times. In addition, the dataset comprises 14 common object categories and 21,911 instances in total, and focuses on more specific scenarios. In order to find a straightforward architecture that provides good performance on WSODD, a new object detector, named CRB-Net, is proposed to serve as a baseline. In experiments, CRB-Net was compared with 16 state-of-the-art object detection methods and outperformed all of them in terms of detection precision. We further discuss the effect of dataset diversity (e.g., instance size, lighting conditions), training set size, and dataset details (e.g., method of categorization). Cross-dataset validation shows that WSODD significantly outperforms other relevant datasets and that the adaptability of CRB-Net is excellent.


Author(s):  
Konstantin A. Elshin ◽  
Elena I. Molchanova ◽  
Marina V. Usoltseva ◽  
Yelena V. Likhoshway

Using the TensorFlow Object Detection API, an approach to identifying and registering the Baikal diatom species Synedra acus subsp. radians has been tested. A set of images was formed and training was conducted. It is shown that after 15,000 training iterations the total value of the loss function reached 0.04, while the classification accuracy and the bounding-box localization accuracy were both equal to 95%.


Author(s):  
Zhihui Huang ◽  
Huimin Zhao ◽  
Jin Zhan ◽  
Huakang Li

The SiamRPN algorithm performs well in visual tracking, but it drifts easily under occlusion and fast motion because it uses the ℓ1-smooth loss function to measure the regression location of the bounding box. In this paper, we propose a multivariate intersection over union (MIOU) loss within the SiamRPN tracking framework. Firstly, the MIOU loss includes three geometric factors in regression: the overlap area ratio, the center distance ratio, and the aspect ratio, which better reflect the degree of coincidence between the target box and the prediction box. Secondly, we improve the definition of the aspect ratio loss to avoid gradient explosion and improve the optimization of the prediction box. Finally, based on the SiamRPN tracker, we compared the tracking performance of the ℓ1-smooth, IOU, GIOU, DIOU, and MIOU losses. Experimental results show that the MIOU loss achieves better target location regression than the other loss functions on the OTB2015 and VOT2016 benchmarks, especially under the challenges of occlusion, illumination change and fast motion.
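The aspect-ratio factor that MIOU shares with CIoU-style losses is conventionally written as v = (4/π²)(arctan(w_gt/h_gt) − arctan(w/h))². The sketch below shows that standard term only; the abstract says MIOU redefines it to avoid gradient explosion, and that redefinition is not reproduced here.

```python
import math

def aspect_ratio_term(pred_w, pred_h, gt_w, gt_h):
    """Aspect-ratio consistency term used by CIoU-style losses:

        v = (4 / pi^2) * (atan(gt_w / gt_h) - atan(pred_w / pred_h))^2

    Zero when the two boxes have the same width/height ratio, and
    bounded in [0, 1] because each arctan lies in (0, pi/2)."""
    return (4 / math.pi ** 2) * (
        math.atan(gt_w / gt_h) - math.atan(pred_w / pred_h)
    ) ** 2
```

Because v measures only shape (not scale or position), it is combined with the overlap and center-distance terms to cover all three geometric factors listed in the abstract.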

