Tell Me What They're Holding: Weakly-Supervised Object Detection with Transferable Knowledge from Human-Object Interaction

2020 ◽  
Vol 34 (07) ◽  
pp. 11246-11253
Author(s):  
Daesik Kim ◽  
Gyujeong Lee ◽  
Jisoo Jeong ◽  
Nojun Kwak

In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm that detects objects of rare classes with few examples, using transferable knowledge from human-object interactions (HOI). Although WSOD generally underperforms full supervision, we focus on HOI as a context that can strongly supervise complex semantics in images. We therefore propose a novel module called RRPN (relational region proposal network), which outputs an object-localizing attention map using only human poses and action verbs. In the source domain, we fully train an object detector and the RRPN under full supervision of HOI. With the localization knowledge transferred from the trained RRPN, a new object detector can learn unseen objects in the target domain from weak verbal supervision of HOI, without bounding box annotations. Because the RRPN is designed as an add-on module, it can be applied not only to object detection but also to other tasks such as semantic segmentation. Experimental results on the HICO-DET dataset show that the proposed method can be a cheap alternative to the current supervised object detection paradigm. Moreover, qualitative results demonstrate that our model properly localizes unseen objects on the HICO-DET and V-COCO datasets.
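As a rough illustration of the add-on idea, the sketch below shows how an RRPN-style module might map human pose keypoints and an action-verb embedding to a spatial attention map over a detector's feature map. All module names and dimensions here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an RRPN-style add-on module (not the authors' code):
# human pose keypoints and a verb embedding are encoded into a channel-wise
# context vector, which modulates the detector features into an attention map.
import torch
import torch.nn as nn

class RelationalRegionProposal(nn.Module):
    def __init__(self, num_keypoints=17, verb_dim=300, feat_channels=256):
        super().__init__()
        # Encode (x, y, visibility) per keypoint plus the verb vector.
        self.encoder = nn.Sequential(
            nn.Linear(num_keypoints * 3 + verb_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, feat_channels),
        )
        # A 1x1 conv collapses the modulated features to a one-channel map.
        self.to_attention = nn.Conv2d(feat_channels, 1, kernel_size=1)

    def forward(self, features, pose, verb_embedding):
        # features: (B, C, H, W); pose: (B, K*3); verb_embedding: (B, verb_dim)
        context = self.encoder(torch.cat([pose, verb_embedding], dim=1))
        context = context[:, :, None, None]        # broadcast over H and W
        attention = torch.sigmoid(self.to_attention(features * context))
        return attention                           # (B, 1, H, W) localization map

# Usage: attn = rrpn(backbone_feats, pose, verb_vec); feats = backbone_feats * attn
```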

2020 ◽  
Vol 34 (07) ◽  
pp. 10460-10469 ◽  
Author(s):  
Ankan Bansal ◽  
Sai Saketh Rambhatla ◽  
Abhinav Shrivastava ◽  
Rama Chellappa

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and makes efficient use of the data: it combines visual features of the human, the relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation of our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset, our method achieves a gain of over 2.5 absolute mAP points over the state of the art. We also show that our approach leads to significant performance gains for zero-shot HOI detection in the seen-object setting. We further demonstrate that, using a generic object detector, our model can generalize to interactions involving previously unseen objects.
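To make the factorization concrete, here is a minimal sketch (assumed names and dimensions, not the authors' release) of an HOI score split into a human-appearance term, a human-object spatial term, and an object term that shares evidence across functionally similar classes. The shapes use HICO-DET's 117 verbs and 80 object classes.

```python
# Illustrative sketch of a factored HOI scorer under stated assumptions.
import torch
import torch.nn as nn

class FactoredHOIScorer(nn.Module):
    def __init__(self, human_dim=2048, spatial_dim=8, num_verbs=117, num_objects=80):
        super().__init__()
        self.human_branch = nn.Linear(human_dim, num_verbs)
        self.spatial_branch = nn.Sequential(
            nn.Linear(spatial_dim, 64), nn.ReLU(inplace=True), nn.Linear(64, num_verbs)
        )
        # Placeholder identity; in practice this would be a row-normalized
        # similarity matrix (e.g. from word embeddings) so that functionally
        # similar objects (horse/elephant for "ride") share verb evidence.
        self.register_buffer("obj_similarity", torch.eye(num_objects))

    def forward(self, human_feat, spatial_feat, obj_class_probs):
        verb_logits = self.human_branch(human_feat) + self.spatial_branch(spatial_feat)
        # Smooth object evidence over similar classes before combining.
        obj_evidence = obj_class_probs @ self.obj_similarity   # (B, num_objects)
        # Joint score for every (verb, object) pair: (B, num_verbs, num_objects).
        return verb_logits[:, :, None] + torch.log(obj_evidence + 1e-6)[:, None, :]
```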


Author(s):  
R. P. A. Bormans ◽  
R. C. Lindenbergh ◽  
F. Karimi Nejadasl

One of the biggest challenges for an autonomous vehicle (and hence the WEpod) is to see the world as humans would see it. This understanding is the basis for a successful and reliable future of autonomous vehicles. Real-world data and semantic segmentation are generally used to achieve full understanding of the vehicle's surroundings. However, a pretrained segmentation network deployed to a new, previously unseen domain will not attain the performance it reaches on the domain it was trained on, due to differences between the domains. Although research has addressed mitigating this domain shift, the factors that cause these differences are not yet fully explored. We fill this gap by investigating several such factors. A base network was created by a two-step fine-tuning procedure on a convolutional neural network (SegNet) pretrained on CityScapes (a dataset for semantic segmentation). The first tuning step uses RobotCar (a road scenery dataset recorded in Oxford, UK); afterwards, the network is fine-tuned a second time on KITTI (a road scenery dataset recorded in Germany). With this base network, experiments assess the importance of factors such as the horizon line, colour, and training order for successful domain adaptation, in this case from the KITTI and RobotCar domains to the WEpod domain. For evaluation, ground-truth labels are created in a weakly-supervised setting. Training on greyscale images instead of RGB images had a negative influence, resulting in IoU drops of up to 23.9% on WEpod test images. Training order is a main contributor to domain adaptation, with an IoU increase of 4.7%. This shows that the target domain (WEpod) is more closely related to RobotCar than to KITTI.
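A minimal sketch of such a two-step fine-tuning schedule is shown below; the SegNet constructor and dataset loaders are assumed to exist elsewhere, and the hyperparameters are placeholders rather than the authors' settings.

```python
# Sketch of the two-step fine-tuning described above: CityScapes-pretrained
# weights -> fine-tune on RobotCar -> fine-tune on KITTI.
import torch

def finetune(model, loader, epochs, lr, device="cuda"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # ignore void label
    model.train().to(device)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
    return model

# model = segnet_pretrained_on_cityscapes()                      # assumed helper
# model = finetune(model, robotcar_loader, epochs=20, lr=1e-3)   # tuning step 1
# model = finetune(model, kitti_loader, epochs=20, lr=1e-4)      # tuning step 2
```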


Author(s):  
Mhafuzul Islam ◽  
Mashrur Chowdhury ◽  
Hongda Li ◽  
Hongxin Hu

Vision-based navigation of autonomous vehicles primarily depends on deep neural network (DNN) based systems, in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle, to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems in the autonomous vehicle are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbations or adversarial inputs. Such perturbations can also be introduced into the DNN-based systems of autonomous vehicles by unexpected roadway hazards, such as debris or roadblocks. In this study, we first introduce a hazardous roadway environment that can compromise the DNN-based navigational system of an autonomous vehicle, causing it to produce an incorrect steering wheel angle that could lead to crashes resulting in fatalities or injuries. Then, we develop a DNN-based autonomous vehicle driving system that uses object detection and semantic segmentation to mitigate the adverse effect of this type of hazard, helping the autonomous vehicle navigate safely around such hazards. We find that our developed system, including hazardous object detection and semantic segmentation, improves the ability of an autonomous vehicle to avoid a potential hazard by 21% compared with the traditional DNN-based autonomous vehicle driving system.
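The sketch below illustrates, under stated assumptions, how a hazard detector and a drivable-area segmenter might be fused to override an end-to-end steering network near hazards. The helpers projected_path_mask and replan_around are hypothetical, and the whole pipeline is illustrative rather than the authors' system.

```python
# Hedged sketch of the mitigation idea: trust the end-to-end steering network
# only while its projected path is clear of detected hazards.
import numpy as np

def safe_steering(frame, steer_net, hazard_detector, segmenter,
                  overlap_threshold=0.1):
    angle = steer_net(frame)                   # raw end-to-end prediction
    hazards = hazard_detector(frame)           # list of (x1, y1, x2, y2) boxes
    drivable = segmenter(frame)                # boolean HxW drivable-area mask
    path_mask = projected_path_mask(drivable.shape, angle)   # assumed helper
    for x1, y1, x2, y2 in hazards:
        box = np.zeros_like(drivable)
        box[y1:y2, x1:x2] = True
        overlap = (box & path_mask).sum() / max(path_mask.sum(), 1)
        if overlap > overlap_threshold:
            # Re-aim toward drivable space that excludes the hazard box.
            angle = replan_around(drivable & ~box, angle)    # assumed helper
    return angle
```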


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 437
Author(s):  
Yuya Onozuka ◽  
Ryosuke Matsumi ◽  
Motoki Shino

Detection of traversable areas is essential for the navigation of autonomous personal mobility systems in unknown pedestrian environments. However, traffic rules may recommend or require driving in specified areas, such as sidewalks, in environments where roadways and sidewalks coexist. Therefore, such autonomous mobility systems must estimate the areas that are both mechanically traversable and recommended by traffic rules, and navigate based on this estimation. In this paper, we propose a method for weakly-supervised segmentation of recommended traversable areas in environments with no edges, using images automatically labeled from paths selected by humans. This approach is based on the idea that a human-selected driving path accurately reflects both mechanical traversability and human understanding of traffic rules and visual information. In addition, we propose a data augmentation method and a loss weighting method for detecting the appropriate recommended traversable area from a single human-selected path. Evaluation showed that the proposed learning methods are effective for recommended traversable area detection, and that weakly-supervised semantic segmentation using human-selected path information is useful for recommended area detection in environments with no edges.
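As an illustration of the loss-weighting idea, the sketch below widens a single human-selected path into a soft positive band whose weights decay away from the path, so one driven trajectory can supervise a whole traversable region. The Gaussian-band construction and all thresholds are assumptions, not the paper's exact formulation.

```python
# Sketch of a path-based loss weighting under stated assumptions.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=31, sigma=8.0):
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return (kernel / kernel.sum())[None, None]     # (1, 1, size, size)

def path_weighted_loss(logits, path_mask, neg_weight=0.1):
    # logits: (B, 2, H, W); path_mask: (B, H, W), 1 on the human-selected path.
    kernel = gaussian_kernel().to(logits.device)
    band = F.conv2d(path_mask[:, None].float(), kernel, padding=15)[:, 0]
    band = band / band.amax(dim=(1, 2), keepdim=True).clamp(min=1e-6)
    target = (band > 0.05).long()                  # widened positive labels
    # Positives weighted by proximity to the path; negatives down-weighted.
    weight = torch.where(target.bool(), band, torch.full_like(band, neg_weight))
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (weight * per_pixel).mean()
```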


2020 ◽  
Author(s):  
Jiahui Liu ◽  
Changqian Yu ◽  
Beibei Yang ◽  
Changxin Gao ◽  
Nong Sang

2021 ◽  
Vol 11 (9) ◽  
pp. 3782
Author(s):  
Chu-Hui Lee ◽  
Chen-Wei Lin

Object detection is one of the important technologies in the field of computer vision. In the area of fashion apparel, object detection has various applications, such as apparel recognition, apparel detection, fashion recommendation, and online search. The recognition task is difficult for a computer because fashion apparel images vary widely in clothing appearance and material. Currently, fast and accurate object detection is the most important goal in this field. In this study, we propose a two-phase fashion apparel detection method named YOLOv4-TPD (YOLOv4 Two-Phase Detection), based on the YOLOv4 algorithm, to address this challenge. The target categories for detection are jacket, top, pants, skirt, and bag. Following the definition of inductive transfer learning, the aim is to transfer knowledge from the source domain to the target domain so as to improve performance on tasks in the target domain; we therefore use a two-phase training method to implement the transfer learning. The experimental results show that, with two-phase transfer learning, the mAP of our model exceeds that of the original YOLOv4 model. The proposed model has multiple potential applications, such as automatic labeling systems, style retrieval, and similarity detection.
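The following sketch shows one generic way a two-phase transfer-learning schedule like this could be organized: a frozen-backbone phase that adapts the detection head to the apparel classes, followed by end-to-end fine-tuning at a lower learning rate. The model attributes and the loss-returning call are assumptions, not the YOLOv4-TPD code.

```python
# Generic two-phase transfer-learning schedule (illustrative assumptions).
import torch

CLASSES = ["jacket", "top", "pants", "skirt", "bag"]

def train_phase(model, loader, lr, freeze_backbone, epochs, device="cuda"):
    for p in model.backbone.parameters():          # assumed .backbone attribute
        p.requires_grad = not freeze_backbone
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    model.train().to(device)
    for _ in range(epochs):
        for images, targets in loader:
            loss = model(images.to(device), targets)   # assumed loss-returning API
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Phase 1: adapt the head to the 5 apparel classes with the backbone frozen.
# train_phase(model, loader, lr=1e-3, freeze_backbone=True, epochs=30)
# Phase 2: fine-tune everything end-to-end at a lower learning rate.
# train_phase(model, loader, lr=1e-4, freeze_backbone=False, epochs=30)
```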

