A New Bounding Box based Pseudo Annotation Generation Method for Semantic Segmentation

Author(s):  
Xiaolong Xu ◽  
Fanman Meng ◽  
Hongliang Li ◽  
Qingbo Wu ◽  
King Ngi Ngan ◽  
...  
2020 ◽  
Vol 12 (21) ◽  
pp. 3630
Author(s):  
Jin Liu ◽  
Haokun Zheng

Object detection and recognition in aerial and remote-sensing images has become a hot topic in computer vision in recent years. Because these images are usually taken from a bird's-eye view, the targets often vary in shape and are densely arranged, so marking targets with oriented bounding boxes is the mainstream choice. However, general detectors are designed for horizontal box annotations, and methods adapted to detect oriented bounding boxes incur high computational complexity. In this paper, we propose a method called the ellipse field network (EFN) to organically integrate semantic segmentation and object detection. It predicts the probability distribution of the target and obtains accurate oriented bounding boxes through a post-processing step. We tested our method on the HRSC2016 and DOTA data sets, achieving mAP values of 0.863 and 0.701, respectively. We also evaluated EFN on natural images, obtaining an mAP of 84.7 on the VOC2012 data set. These extensive experiments demonstrate that EFN achieves state-of-the-art results on aerial images and a good score on natural images.
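The abstract does not specify EFN's post-processing step, but one common way to turn a predicted probability map into an oriented box is via image moments: the weighted mean gives the centre, and the covariance eigenvectors give the orientation. A minimal sketch of that idea (the moment-based fit is an assumption, not the paper's exact procedure):

```python
import numpy as np

def oriented_box_from_prob(prob):
    """Fit an oriented bounding box to a 2D probability map via image moments.

    The map is treated as a weight field: its weighted mean gives the box
    centre, the covariance eigenvectors give the orientation, and the
    eigenvalues (standard deviations) set the half-extents.
    """
    ys, xs = np.nonzero(prob > 0)
    w = prob[ys, xs].astype(float)
    w /= w.sum()
    cx, cy = (xs * w).sum(), (ys * w).sum()
    dx, dy = xs - cx, ys - cy
    cov = np.array([[(dx * dx * w).sum(), (dx * dy * w).sum()],
                    [(dx * dy * w).sum(), (dy * dy * w).sum()]])
    evals, evecs = np.linalg.eigh(cov)          # ascending eigenvalues
    angle = np.degrees(np.arctan2(evecs[1, 1], evecs[0, 1]))  # major axis
    # scale the standard deviations so the box covers the bulk of the mass
    half_w, half_h = 2.0 * np.sqrt(evals[::-1])
    return (cx, cy), (2 * half_w, 2 * half_h), angle
```

For a symmetric blob the recovered centre matches the blob centre; the angle is only meaningful when the distribution is elongated.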


Symmetry ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1081
Author(s):  
Chaochao Meng ◽  
Hong Bao ◽  
Yan Ma ◽  
Xinkai Xu ◽  
Yuqing Li

The gradual application of deep learning in computer vision and image processing has produced great breakthroughs, improving applications such as object detection, recognition, and image semantic segmentation. In this study, a preceding-vehicle ranging system based on a fitting method was designed to measure the distance to the vehicle ahead. First, to obtain an accurate bounding box in vehicle detection, the Mask R-CNN (region-based convolutional neural network) algorithm was improved and tested on the BDD100K (Berkeley DeepDrive) dataset; this shortens vehicle detection time by 33% without reducing accuracy. Then, using the pixel coordinates of the bounding box in the image, the fitting method was applied to monocular camera ranging. Experimental results demonstrate that the method measures the distance to the preceding vehicle effectively, with a ranging error of less than 10%. The accuracy of the measurement meets the requirements of collision warning for safe driving.
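The fitting step is not spelled out in the abstract, but under a pinhole-camera model the distance to a vehicle is inversely proportional to its bounding-box pixel height, so a one-parameter least-squares fit d ≈ k / h against calibration pairs is a plausible minimal sketch (the model form and function names are assumptions):

```python
import numpy as np

def fit_ranging_model(pixel_heights, distances):
    """Least-squares fit of d ~ k / h: under a pinhole model, distance is
    inversely proportional to the bounding-box pixel height. Returns k."""
    h = np.asarray(pixel_heights, dtype=float)
    d = np.asarray(distances, dtype=float)
    x = 1.0 / h
    # closed-form least squares for a single scale factor: k = <x, d> / <x, x>
    return (x @ d) / (x @ x)

def estimate_distance(k, pixel_height):
    """Predict distance for a new detection from its box height in pixels."""
    return k / pixel_height
```

In practice the calibration pairs would come from detections with ground-truth ranges; a richer model (e.g., a polynomial in 1/h, or using the box's bottom edge y-coordinate) can absorb camera pitch and mounting height.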


Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4092 ◽  
Author(s):  
Li Wang ◽  
Ruifeng Li ◽  
Jingwen Sun ◽  
Xingxing Liu ◽  
Lijun Zhao ◽  
...  

To autonomously move and operate objects in cluttered indoor environments, a service robot requires the ability of 3D scene perception. Though 3D object detection can provide an object-level environmental description to fill this gap, a robot continuously detecting in a cluttered room always encounters incomplete object observations, recurring detections of the same object, detection errors, and intersections between objects. To solve these problems, we propose a two-stage 3D object detection algorithm that fuses multiple views of 3D object point clouds in the first stage and eliminates unreasonable and intersecting detections in the second stage. For each view, the robot performs 2D object semantic segmentation and obtains 3D object point clouds. Then, an unsupervised segmentation method called Locally Convex Connected Patches (LCCP) is utilized to segment the object accurately from the background. Subsequently, Manhattan Frame estimation is applied to calculate the main orientation of the object, from which the 3D object bounding box is obtained. To deal with the objects detected in multiple views, we construct an object database and propose an object fusion criterion to maintain it automatically. Thus, the same object observed in multiple views is fused together and a more accurate bounding box can be calculated. Finally, we propose an object filtering approach based on prior knowledge to remove incorrect and intersecting objects from the object database. Experiments are carried out on both the SceneNN dataset and a real indoor environment to verify the stability and accuracy of 3D semantic segmentation and bounding box detection with multi-view fusion.
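The paper's fusion criterion is not given in the abstract; a minimal sketch of the general idea is an IoU-gated merge into an object database, where a new detection either extends an existing object of the same label or registers a new one (the threshold, union-box merge rule, and field names are assumptions for illustration):

```python
def iou3d(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax)."""
    inter = 1.0
    for i in range(3):
        lo, hi = max(a[i], b[i]), min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0
        inter *= hi - lo
    vol = lambda r: (r[3] - r[0]) * (r[4] - r[1]) * (r[5] - r[2])
    return inter / (vol(a) + vol(b) - inter)

def fuse_into_database(db, det, label, thresh=0.25):
    """Merge a new detection into the object database if it overlaps an
    existing object of the same label; otherwise register a new object."""
    for obj in db:
        if obj["label"] == label and iou3d(obj["box"], det) > thresh:
            # union box: each additional view can extend the object's extent
            obj["box"] = tuple(min(obj["box"][i], det[i]) for i in range(3)) + \
                         tuple(max(obj["box"][i + 3], det[i + 3]) for i in range(3))
            obj["views"] += 1
            return db
    db.append({"label": label, "box": det, "views": 1})
    return db
```

A view counter like `views` also supports the second-stage filtering: objects seen from only one view, or with implausible extents, are candidates for removal.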


2021 ◽  
pp. 147592172098543
Author(s):  
Chaobo Zhang ◽  
Chih-chen Chang ◽  
Maziar Jamshidi

Deep learning techniques have recently attracted significant attention in the field of visual inspection of civil infrastructure systems. Currently, most deep learning-based visual inspection techniques use a convolutional neural network to recognize surface defects either by detecting a bounding box for each defect or by classifying all pixels in an image without distinguishing between defect instances. These outputs cannot be used directly to acquire the geometric properties of each individual defect in an image, hindering the development of fully automated structural assessment techniques. In this study, a novel fully convolutional model is proposed for simultaneously detecting and grouping the image pixels of each individual defect in an image. The proposed model integrates an optimized mask subnet with a box-level detection network: the former outputs a set of position-sensitive score maps for pixel-level defect detection, and the latter predicts a bounding box for each defect to group the detected pixels. An image dataset containing three common types of concrete defects (cracks, spalling, and exposed rebar) is used for training and testing. Results demonstrate that the proposed model is robust to various defect sizes and shapes and achieves a mask-level mean average precision (mAP) of 82.4% and a mean intersection over union (mIoU) of 75.5%, with a processing speed of about 10 FPS at an input image size of 576 × 576 when tested on an NVIDIA GeForce GTX 1060 GPU. Its performance is compared with the state-of-the-art instance segmentation network Mask R-CNN and the semantic segmentation network U-Net. The comparative studies show that the proposed model has a distinct defect-boundary delineation capability and outperforms both Mask R-CNN and U-Net in accuracy and speed.
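The core idea of the abstract — using box-level detections to group pixel-level predictions into instances — can be sketched very simply: each detected box claims the positive mask pixels it contains. This is a simplified illustration, not the paper's position-sensitive score-map mechanism:

```python
import numpy as np

def group_pixels_by_box(mask, boxes):
    """Group pixel-level defect predictions into instances using box-level
    detections: each box (x0, y0, x1, y1) claims the positive pixels inside it.

    mask  : 2D array, nonzero where a defect pixel was predicted
    boxes : list of (x0, y0, x1, y1) detections in pixel coordinates
    Returns one boolean instance mask per box.
    """
    instances = []
    for (x0, y0, x1, y1) in boxes:
        inst = np.zeros(mask.shape, dtype=bool)
        inst[y0:y1, x0:x1] = mask[y0:y1, x0:x1] > 0
        instances.append(inst)
    return instances
```

The per-instance masks are what make geometric measurements (crack length, spalling area) possible, which a plain semantic segmentation output cannot provide when defects touch or overlap.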


2019 ◽  
Vol 9 (6) ◽  
pp. 1054 ◽  
Author(s):  
Hongbo Qin ◽  
Haodi Zhang ◽  
Hai Wang ◽  
Yujin Yan ◽  
Min Zhang ◽  
...  

An outside mutual correction (OMC) algorithm for natural scene text detection using multibox and semantic segmentation was developed. In the OMC algorithm, semantic segmentation and multibox are processed in parallel, and the two sets of text detection results mutually correct each other. The mutual correction process has two steps: (1) the semantic segmentation results are employed in the bounding box enhancement module (BEM) to correct the multibox results, and (2) the semantic bounding box module (SBM) is used to optimize the adhesion text boundary of the semantic segmentation results. Non-maximum suppression (NMS) is adopted to merge the SBM and BEM results. Our algorithm was evaluated on the ICDAR2013 and SVT datasets. The experimental results show that the developed algorithm achieved a maximum F-measure improvement of 13.62%, with a highest F-measure score of 81.38%.
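The NMS step that merges the SBM and BEM outputs is the standard greedy algorithm: keep the highest-scoring box, drop overlapping lower-scoring boxes, and repeat. A minimal self-contained version for axis-aligned boxes:

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over axis-aligned boxes (x0, y0, x1, y1).

    Returns the indices of the kept boxes, highest score first.
    """
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        if inter == 0.0:
            return 0.0
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        return inter / (area(a) + area(b) - inter)

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)               # best remaining box survives
        keep.append(i)
        # suppress every remaining box that overlaps it too strongly
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

In the OMC setting, the BEM and SBM boxes would be pooled into one list (with their confidence scores) before this merge.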


2019 ◽  
Vol 11 (21) ◽  
pp. 2506 ◽  
Author(s):  
Xiaowu Xiao ◽  
Zhiqiang Zhou ◽  
Bo Wang ◽  
Linhao Li ◽  
Lingjuan Miao

It is still challenging to effectively detect ship objects in optical remote-sensing images with complex backgrounds. Many current CNN-based one-stage and two-stage detection methods first predefine a series of anchors with various scales, aspect ratios, and angles, and then output detection results by performing classification and bounding-box regression once or twice on these predefined anchors. However, most of the predefined anchors have relatively low accuracy and are useless for the subsequent classification and regression, and preset anchors do not transfer robustly to other detection datasets. To avoid these problems, in this paper we design a paired semantic segmentation network to generate fewer, more accurate rotated anchors. Specifically, the paired segmentation network predicts four parts of each ship (the top-left, bottom-right, top-right, and bottom-left parts). By combining paired top-left and bottom-right parts (or top-right and bottom-left parts), we take the minimum bounding box of the two parts as the rotated anchor. This approach is more robust across ship datasets, and the generated anchors are more accurate and fewer in number. Furthermore, to effectively use fine-scale detail information and coarse-scale semantic information, we use the magnified convolutional features to classify and regress the generated rotated anchors. Meanwhile, the horizontal minimum bounding box of each rotated anchor is also used to incorporate more context information. We compare the proposed algorithm with state-of-the-art object-detection methods for natural images and with ship-detection methods, and demonstrate the superiority of our method.
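The anchor-generation step described above — combining two paired part masks and taking a minimum bounding box of their union — can be sketched as follows. For simplicity this sketch computes the horizontal minimum bounding box (which the paper also uses for context); a rotated rectangle could be fitted to the same union of points with, e.g., OpenCV's `cv2.minAreaRect`. Function and variable names are illustrative:

```python
import numpy as np

def anchor_from_paired_parts(part_a, part_b):
    """Combine two paired part masks (e.g. top-left and bottom-right parts of
    a ship) and return the horizontal minimum bounding box of their union
    as (x0, y0, x1, y1)."""
    pts = np.vstack([np.argwhere(part_a), np.argwhere(part_b)])  # (row, col)
    (y0, x0), (y1, x1) = pts.min(axis=0), pts.max(axis=0)
    return x0, y0, x1, y1
```

Because each anchor is derived from predicted ship parts rather than a dense predefined grid, only a handful of anchors per image reach the classification and regression heads.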


2019 ◽  
Vol 28 (02) ◽  
pp. 1 ◽  
Author(s):  
Hao Zhou ◽  
Jun Lei ◽  
Fenglei Wang ◽  
Jun Zhang
