Multi-peak Graph-based Multi-instance Learning for Weakly Supervised Object Detection

Author(s):  
Ruyi Ji ◽  
Zeyu Liu ◽  
Libo Zhang ◽  
Jianwei Liu ◽  
Xin Zuo ◽  
...  

Weakly supervised object detection (WSOD), which aims to detect objects with only image-level annotations, has become a research hotspot over the past few years. Recently, much effort has been devoted to WSOD for its simple yet effective architecture, and remarkable improvements have been achieved. Existing approaches based on multiple-instance learning (MIL) usually treat proposals individually, ignoring relation information between proposals. Besides, to obtain pseudo-ground-truth boxes for WSOD, MIL-based methods tend to select the region with the highest confidence score and regard those with small overlap as the background category, which leads to mislabeled instances. As a result, these methods suffer from mislabeled instances and a lack of relations between proposals, degrading the performance of WSOD. To tackle these issues, this article introduces a multi-peak graph-based model for WSOD. Specifically, we use an instance graph to model the relations between proposals, which reinforces the multiple-instance learning process. In addition, a multi-peak discovery strategy is designed to avert mislabeling instances. The proposed model is trained end to end by a stochastic gradient descent optimizer using back-propagation. Extensive quantitative and qualitative evaluations on two challenging public benchmarks, PASCAL VOC 2007 and PASCAL VOC 2012, demonstrate the superiority and effectiveness of the proposed approach.
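The abstract does not spell out the multi-peak discovery strategy, so the following is only a minimal sketch of the general idea: keep several high-scoring, mutually non-overlapping proposals as pseudo-ground-truth boxes instead of the single top-scoring one. The function names and the `score_ratio` and `max_iou` thresholds are assumptions made for illustration, not values from the article.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_peaks(boxes, scores, score_ratio=0.5, max_iou=0.3):
    """Return indices of proposals kept as pseudo-ground-truth 'peaks'.

    Hypothetical rule: a proposal is a peak if its score is at least
    score_ratio times the best score and it overlaps no earlier peak
    by more than max_iou.
    """
    if not boxes:
        return []
    # Visit proposals from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    top = scores[order[0]]
    peaks = []
    for i in order:
        if scores[i] < score_ratio * top:
            break  # every remaining proposal scores even lower
        if all(iou(boxes[i], boxes[j]) <= max_iou for j in peaks):
            peaks.append(i)
    return peaks


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (100, 100, 110, 110)]
scores = [0.9, 0.8, 0.7, 0.1]
peaks = select_peaks(boxes, scores)
print(peaks)
```

In this toy input the second box overlaps the first too strongly to count as a separate peak, while the third box is far away and scores highly enough to be kept as a second instance rather than being treated as background.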

Author(s):  
Wanqing Zhao ◽  
Ziyu Guan ◽  
Hangzai Luo ◽  
Jinye Peng ◽  
Jianping Fan

Multi-keyword queries are widely supported in text search engines. However, the analogue in image retrieval systems, the multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple separate steps and need expensive location labeling for detecting objects. In this work, we propose a weakly supervised Deep Multiple Instance Hashing (DMIH) framework for object-based image retrieval. DMIH integrates object detection and hashing learning on the basis of a popular CNN model to build an end-to-end relation between a raw image and the binary hash codes of the multiple objects in it. Specifically, we cast the detection of each object class as a binary multiple instance learning problem in which the instances are object proposals extracted from multi-scale convolutional feature maps. For hashing training, we sample image pairs to learn their semantic relationships in terms of the hash codes of the most probable proposals for the labels each image owns, as guided by the object predictors. The two objectives benefit each other in learning. DMIH outperforms state-of-the-art methods on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
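As a rough illustration of the two ingredients named in this abstract, the sketch below aggregates proposal scores into an image-level score with the standard MIL max rule and takes the binary code of the most probable proposal by thresholding its features at zero. DMIH learns both parts jointly inside a CNN; the function names, the zero threshold, and the toy numbers here are assumptions for illustration only.

```python
def image_level_score(proposal_scores):
    """MIL aggregation: an image is positive for a class if any proposal is."""
    return max(proposal_scores)


def binarize(features):
    """Turn real-valued features into a binary code by thresholding at zero."""
    return [1 if f >= 0 else 0 for f in features]


def code_for_label(proposal_scores, proposal_features):
    """Binary code of the proposal the class predictor finds most probable."""
    best = max(range(len(proposal_scores)), key=lambda i: proposal_scores[i])
    return binarize(proposal_features[best])


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy image with three proposals scored for one class (made-up numbers).
scores = [0.1, 0.8, 0.3]
features = [[-1.0, 0.5, 0.2], [0.7, -0.2, 0.9], [0.1, 0.1, -0.4]]

code = code_for_label(scores, features)
print(image_level_score(scores), code, hamming(code, [1, 1, 1]))
```

Retrieval would then rank database images by the Hamming distance between the query object's code and the codes stored for each image.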


2018 ◽  
Vol 84 ◽  
pp. 68-81 ◽  
Author(s):  
Yongqiang Zhang ◽  
Yaicheng Bai ◽  
Mingli Ding ◽  
Yongqiang Li ◽  
Bernard Ghanem

Author(s):  
Se-Hun Kim ◽  
Min-Seok Seo ◽  
Chun-Myung Park ◽  
Kyujoong Lee ◽  
Hyuk-Jae Lee

2021 ◽  
pp. 108233
Author(s):  
Wei Gao ◽  
Fang Wan ◽  
Jun Yue ◽  
Songcen Xu ◽  
Qixiang Ye

2020 ◽  
Vol 34 (07) ◽  
pp. 11482-11489
Author(s):  
Chenhao Lin ◽  
Siwen Wang ◽  
Dongqi Xu ◽  
Yu Lu ◽  
Wayne Zhang

Weakly supervised object detection (WSOD) using only image-level annotations has attracted growing attention over the past few years. Existing approaches using multiple instance learning easily fall into local optima, because this mechanism tends to learn from the most discriminative object in an image for each category. Therefore, these methods miss object instances, which degrades the performance of WSOD. To address this problem, this paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection. OIM attempts to detect all possible object instances in each image by introducing information propagation on spatial and appearance graphs, without any additional annotations. During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and used for training. In addition, we design an object instance reweighted loss to learn a larger portion of each object instance and further improve the performance. The experimental results on two publicly available databases, VOC 2007 and 2012, demonstrate the efficacy of the proposed approach.
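The abstract gives no details of the graph construction, so the following toy sketch covers only the appearance side and rests on an assumption: the top-scoring proposal acts as a seed, and any other proposal whose feature vector is close to the seed's in cosine similarity is mined as a further instance of the same class. The similarity threshold and all names are hypothetical, and the spatial graph is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def mine_instances(scores, features, sim_threshold=0.9):
    """Return (seed index, indices of proposals that look like the seed)."""
    # The seed is the most confident proposal for the class.
    seed = max(range(len(scores)), key=lambda i: scores[i])
    mined = [
        i
        for i in range(len(scores))
        if i != seed and cosine(features[seed], features[i]) >= sim_threshold
    ]
    return seed, mined


# Made-up scores and 2-D features for four proposals.
scores = [0.2, 0.9, 0.4, 0.3]
features = [[1.0, 0.0], [0.0, 1.0], [0.1, 1.0], [1.0, 0.1]]

seed, mined = mine_instances(scores, features)
print(seed, mined)
```

Here the third proposal has a modest score but nearly the same appearance as the seed, so it is picked up as an extra training instance instead of being ignored.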


2020 ◽  
Vol 34 (07) ◽  
pp. 12765-12772
Author(s):  
Bingfeng Zhang ◽  
Jimin Xiao ◽  
Yunchao Wei ◽  
Mingjie Sun ◽  
Kaizhu Huang

Weakly supervised semantic segmentation is a challenging task, as it takes only image-level information as supervision for training but must produce pixel-level predictions at test time. To address this task, most recent state-of-the-art approaches adopt two-step solutions, i.e. 1) learn to generate pseudo pixel-level masks, and 2) engage FCNs to train the semantic segmentation networks with the pseudo masks. However, the two-step solutions usually employ many bells and whistles to produce high-quality pseudo masks, making such methods complicated and inelegant. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. These reliable regions then serve directly as ground-truth labels for the parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with two-step state-of-the-art methods. By extending our one-step method to two steps, we obtain new state-of-the-art performance on Pascal VOC (val: 66.3, test: 66.5).
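One simple way to picture the pruning of a class activation map into confident regions is a pair of thresholds: strongly activated pixels become the object class, weakly activated pixels become background, and everything in between is left unlabeled. The sketch below assumes min-max normalization, hypothetical thresholds of 0.7 and 0.2, background id 0, and 255 as the ignore value; none of these details come from the abstract.

```python
IGNORE = 255  # assumed marker for pixels that receive no label


def prune_cam(cam, class_id, fg_thresh=0.7, bg_thresh=0.2):
    """Convert a 2-D activation map into a sparse label map.

    Pixels whose normalized activation is >= fg_thresh get class_id,
    pixels <= bg_thresh get background (0), the rest get IGNORE.
    """
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    labels = []
    for row in cam:
        out = []
        for v in row:
            # A constant map normalizes to all zeros, i.e. all background.
            norm = (v - lo) / span if span > 0 else 0.0
            if norm >= fg_thresh:
                out.append(class_id)
            elif norm <= bg_thresh:
                out.append(0)
            else:
                out.append(IGNORE)
        labels.append(out)
    return labels


# Toy 2x3 activation map for class 3 (made-up values).
cam = [[0.0, 1.0, 1.5], [4.0, 9.0, 10.0]]
labels = prune_cam(cam, class_id=3)
print(labels)
```

A segmentation loss would then be computed only on the pixels that are not marked `IGNORE`, which is why the confident regions can stay small.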


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5269
Author(s):  
Guan-Ting Lin ◽  
Vinay Malligere Shivanna ◽  
Jiun-In Guo

This paper proposes a deep-learning model with task-specific bounding box regressors (TSBBRs) and conditional back-propagation mechanisms for detecting objects in motion in advanced driver assistance system (ADAS) applications. The proposed model separates the object detection networks for objects of different sizes and applies the proposed algorithm to achieve better detection results for both larger and tinier objects. For larger objects, a neural network with a larger visual receptive field is used to acquire information from larger areas. For tinier objects, a network with a smaller receptive field uses fine-grained features. A conditional back-propagation mechanism yields different types of TSBBRs that perform data-driven learning for the set criterion and learn the representations of different object sizes without degrading each other. The dual-path design of the object bounding box regressors can simultaneously detect objects across dissimilar scales and aspect ratios. Only a single inference of the neural network is needed per frame to detect multiple types of objects, such as bicycles, motorbikes, cars, buses, trucks, and pedestrians, and to locate their exact positions. The proposed model was developed and implemented on different NVIDIA devices (1080 Ti, DRIVE-PX2, and Jetson TX-2), reaching 67 frames per second (fps), 19.4 fps, and 8.9 fps, respectively, for 448 × 448 video input. The proposed model can detect objects as small as 13 × 13 pixels and achieves 86.54% accuracy on a publicly available Pascal Visual Object Class (VOC) car database and 82.4% mean average precision (mAP) on a large database of common real road scenes (iVS database).
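The idea of letting size-specific regressors learn without degrading each other can be pictured as routing: each ground-truth box contributes loss to only one branch, so gradients from large objects never reach the small-object regressor and vice versa. The sketch below is a toy version under assumed details (an area threshold of 32 × 32 pixels, an L1 box loss, and the branch names `small` and `large`); the paper's actual criterion and loss are not given in the abstract.

```python
def route_by_size(box, area_threshold=32 * 32):
    """Pick the branch responsible for a box given as (x1, y1, x2, y2)."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    return "large" if width * height >= area_threshold else "small"


def conditional_losses(targets, preds_small, preds_large, area_threshold=32 * 32):
    """Sum the L1 box error per branch, counting each box in one branch only.

    The prediction of the branch that is not responsible for a box is
    skipped, so it would receive no gradient from that box.
    """
    losses = {"small": 0.0, "large": 0.0}
    for gt, p_small, p_large in zip(targets, preds_small, preds_large):
        branch = route_by_size(gt, area_threshold)
        pred = p_small if branch == "small" else p_large
        losses[branch] += sum(abs(p - g) for p, g in zip(pred, gt))
    return losses


# One 13x13 box and one 100x100 box with made-up branch predictions.
targets = [(0, 0, 13, 13), (0, 0, 100, 100)]
preds_small = [(1, 0, 13, 13), (0, 0, 0, 0)]
preds_large = [(0, 0, 0, 0), (0, 0, 98, 100)]

losses = conditional_losses(targets, preds_small, preds_large)
print(losses)
```

The poor predictions each branch makes for the box it is not responsible for add nothing to either loss, which is the point of the conditional routing.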

