Tell Me What They're Holding: Weakly-Supervised Object Detection with Transferable Knowledge from Human-Object Interaction

2020 ◽  
Vol 34 (07) ◽  
pp. 11246-11253
Author(s):  
Daesik Kim ◽  
Gyujeong Lee ◽  
Jisoo Jeong ◽  
Nojun Kwak

In this work, we introduce a novel weakly supervised object detection (WSOD) paradigm that detects objects of rare classes with few examples, using transferable knowledge from human-object interactions (HOI). Although WSOD generally underperforms full supervision, we focus on HOI as a context that can strongly supervise complex semantics in images. We therefore propose a novel module called RRPN (relational region proposal network), which outputs an object-localizing attention map using only human poses and action verbs. In the source domain, we fully train an object detector and the RRPN under full supervision of HOI. With the localization knowledge transferred from the trained RRPN, a new object detector can learn unseen objects in the target domain from weak verbal supervision of HOI, without bounding box annotations. Because the RRPN is designed as an add-on module, it can be applied not only to object detection but also to other tasks such as semantic segmentation. Experimental results on the HICO-DET dataset show that the proposed method can be a cheap alternative to the current supervised object detection paradigm. Moreover, qualitative results demonstrate that our model properly localizes unseen objects on the HICO-DET and V-COCO datasets.
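As a rough illustration of the add-on idea, the sketch below shows how an RRPN-style module might map human pose keypoints and an action-verb embedding to a spatial attention map over a detector's feature map. All module names and dimensions here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an RRPN-style add-on module (not the authors' code):
# human pose keypoints and a verb embedding are encoded into a channel-wise
# context vector, which modulates the detector features into an attention map.
import torch
import torch.nn as nn

class RelationalRegionProposal(nn.Module):
    def __init__(self, num_keypoints=17, verb_dim=300, feat_channels=256):
        super().__init__()
        # Encode (x, y, visibility) per keypoint plus the verb vector.
        self.encoder = nn.Sequential(
            nn.Linear(num_keypoints * 3 + verb_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, feat_channels),
        )
        # A 1x1 conv collapses the modulated features to a one-channel map.
        self.to_attention = nn.Conv2d(feat_channels, 1, kernel_size=1)

    def forward(self, features, pose, verb_embedding):
        # features: (B, C, H, W); pose: (B, K*3); verb_embedding: (B, verb_dim)
        context = self.encoder(torch.cat([pose, verb_embedding], dim=1))
        context = context[:, :, None, None]        # broadcast over H and W
        attention = torch.sigmoid(self.to_attention(features * context))
        return attention                           # (B, 1, H, W) localization map

# Usage: attn = rrpn(backbone_feats, pose, verb_vec); feats = backbone_feats * attn
```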

2020 ◽  
Vol 34 (07) ◽  
pp. 10460-10469 ◽  
Author(s):  
Ankan Bansal ◽  
Sai Saketh Rambhatla ◽  
Abhinav Shrivastava ◽  
Rama Chellappa

We present an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner. The proposed model is simple and makes efficient use of the data: it combines visual features of the human, the relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation of our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-Det dataset, our method achieves a gain of over 2.5 absolute mAP points over the state of the art. We also show that our approach leads to significant performance gains for zero-shot HOI detection in the seen-object setting. We further demonstrate that, using a generic object detector, our model can generalize to interactions involving previously unseen objects.
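To make the factorization concrete, here is a minimal sketch (assumed names and dimensions, not the authors' release) of an HOI score split into a human-appearance term, a human-object spatial term, and an object term that shares evidence across functionally similar classes. The shapes use HICO-DET's 117 verbs and 80 object classes.

```python
# Illustrative sketch of a factored HOI scorer under stated assumptions.
import torch
import torch.nn as nn

class FactoredHOIScorer(nn.Module):
    def __init__(self, human_dim=2048, spatial_dim=8, num_verbs=117, num_objects=80):
        super().__init__()
        self.human_branch = nn.Linear(human_dim, num_verbs)
        self.spatial_branch = nn.Sequential(
            nn.Linear(spatial_dim, 64), nn.ReLU(inplace=True), nn.Linear(64, num_verbs)
        )
        # Placeholder identity; in practice this would be a row-normalized
        # similarity matrix (e.g. from word embeddings) so that functionally
        # similar objects (horse/elephant for "ride") share verb evidence.
        self.register_buffer("obj_similarity", torch.eye(num_objects))

    def forward(self, human_feat, spatial_feat, obj_class_probs):
        verb_logits = self.human_branch(human_feat) + self.spatial_branch(spatial_feat)
        # Smooth object evidence over similar classes before combining.
        obj_evidence = obj_class_probs @ self.obj_similarity   # (B, num_objects)
        # Joint score for every (verb, object) pair: (B, num_verbs, num_objects).
        return verb_logits[:, :, None] + torch.log(obj_evidence + 1e-6)[:, None, :]
```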


Author(s):  
R. P. A. Bormans ◽  
R. C. Lindenbergh ◽  
F. Karimi Nejadasl

One of the biggest challenges for an autonomous vehicle (and hence the WEpod) is to see the world as humans would see it. This understanding is the basis for a successful and reliable future of autonomous vehicles. Real-world data and semantic segmentation are generally used to achieve full understanding of the vehicle's surroundings. However, a pretrained segmentation network deployed to a new, previously unseen domain will not attain the performance it reaches on the domain it was trained on, due to differences between the domains. Although research has addressed mitigating this domain shift, the factors that cause these differences are not yet fully explored. We fill this gap by investigating several such factors. A base network was created by a two-step fine-tuning procedure on a convolutional neural network (SegNet) pretrained on CityScapes (a dataset for semantic segmentation). The first tuning step uses RobotCar (a road scenery dataset recorded in Oxford, UK); afterwards, the network is fine-tuned a second time on KITTI (a road scenery dataset recorded in Germany). With this base network, experiments assess the importance of factors such as the horizon line, colour, and training order for successful domain adaptation, in this case from the KITTI and RobotCar domains to the WEpod domain. For evaluation, ground-truth labels are created in a weakly-supervised setting. Training on greyscale images instead of RGB images had a negative influence, resulting in IoU drops of up to 23.9% on WEpod test images. Training order is a main contributor to domain adaptation, with an IoU increase of 4.7%. This shows that the target domain (WEpod) is more closely related to RobotCar than to KITTI.
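A minimal sketch of such a two-step fine-tuning schedule is shown below; the SegNet constructor and dataset loaders are assumed to exist elsewhere, and the hyperparameters are placeholders rather than the authors' settings.

```python
# Sketch of the two-step fine-tuning described above: CityScapes-pretrained
# weights -> fine-tune on RobotCar -> fine-tune on KITTI.
import torch

def finetune(model, loader, epochs, lr, device="cuda"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # ignore void label
    model.train().to(device)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
    return model

# model = segnet_pretrained_on_cityscapes()                      # assumed helper
# model = finetune(model, robotcar_loader, epochs=20, lr=1e-3)   # tuning step 1
# model = finetune(model, kitti_loader, epochs=20, lr=1e-4)      # tuning step 2
```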


Author(s):  
Mhafuzul Islam ◽  
Mashrur Chowdhury ◽  
Hongda Li ◽  
Hongxin Hu

Vision-based navigation of autonomous vehicles primarily depends on deep neural network (DNN) based systems, in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle, to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems in the autonomous vehicle are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbations or adversarial inputs. Such perturbations can also be introduced into the DNN-based systems of autonomous vehicles by unexpected roadway hazards, such as debris or roadblocks. In this study, we first introduce a hazardous roadway environment that can compromise the DNN-based navigational system of an autonomous vehicle, causing it to produce an incorrect steering wheel angle that could lead to crashes resulting in fatalities or injuries. Then, we develop a DNN-based autonomous vehicle driving system that uses object detection and semantic segmentation to mitigate the adverse effect of this type of hazard, helping the autonomous vehicle navigate safely around such hazards. We find that our developed system, including hazardous object detection and semantic segmentation, improves the ability of an autonomous vehicle to avoid a potential hazard by 21% compared with the traditional DNN-based autonomous vehicle driving system.
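The sketch below illustrates, under stated assumptions, how a hazard detector and a drivable-area segmenter might be fused to override an end-to-end steering network near hazards. The helpers projected_path_mask and replan_around are hypothetical, and the whole pipeline is illustrative rather than the authors' system.

```python
# Hedged sketch of the mitigation idea: trust the end-to-end steering network
# only while its projected path is clear of detected hazards.
import numpy as np

def safe_steering(frame, steer_net, hazard_detector, segmenter,
                  overlap_threshold=0.1):
    angle = steer_net(frame)                   # raw end-to-end prediction
    hazards = hazard_detector(frame)           # list of (x1, y1, x2, y2) boxes
    drivable = segmenter(frame)                # boolean HxW drivable-area mask
    path_mask = projected_path_mask(drivable.shape, angle)   # assumed helper
    for x1, y1, x2, y2 in hazards:
        box = np.zeros_like(drivable)
        box[y1:y2, x1:x2] = True
        overlap = (box & path_mask).sum() / max(path_mask.sum(), 1)
        if overlap > overlap_threshold:
            # Re-aim toward drivable space that excludes the hazard box.
            angle = replan_around(drivable & ~box, angle)    # assumed helper
    return angle
```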


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 437
Author(s):  
Yuya Onozuka ◽  
Ryosuke Matsumi ◽  
Motoki Shino

Detection of traversable areas is essential for the navigation of autonomous personal mobility systems in unknown pedestrian environments. However, traffic rules may recommend or require driving in specified areas, such as sidewalks, in environments where roadways and sidewalks coexist. Therefore, such autonomous mobility systems must estimate the areas that are both mechanically traversable and recommended by traffic rules, and navigate based on this estimation. In this paper, we propose a method for weakly-supervised segmentation of recommended traversable areas in environments with no edges, using images automatically labeled from paths selected by humans. This approach is based on the idea that a human-selected driving path accurately reflects both mechanical traversability and human understanding of traffic rules and visual information. In addition, we propose a data augmentation method and a loss weighting method for detecting the appropriate recommended traversable area from a single human-selected path. Evaluation showed that the proposed learning methods are effective for recommended traversable area detection, and that weakly-supervised semantic segmentation using human-selected path information is useful for recommended area detection in environments with no edges.
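As an illustration of the loss-weighting idea, the sketch below widens a single human-selected path into a soft positive band whose weights decay away from the path, so one driven trajectory can supervise a whole traversable region. The Gaussian-band construction and all thresholds are assumptions, not the paper's exact formulation.

```python
# Sketch of a path-based loss weighting under stated assumptions.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=31, sigma=8.0):
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    return (kernel / kernel.sum())[None, None]     # (1, 1, size, size)

def path_weighted_loss(logits, path_mask, neg_weight=0.1):
    # logits: (B, 2, H, W); path_mask: (B, H, W), 1 on the human-selected path.
    kernel = gaussian_kernel().to(logits.device)
    band = F.conv2d(path_mask[:, None].float(), kernel, padding=15)[:, 0]
    band = band / band.amax(dim=(1, 2), keepdim=True).clamp(min=1e-6)
    target = (band > 0.05).long()                  # widened positive labels
    # Positives weighted by proximity to the path; negatives down-weighted.
    weight = torch.where(target.bool(), band, torch.full_like(band, neg_weight))
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (weight * per_pixel).mean()
```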


2020 ◽  
Author(s):  
Jiahui Liu ◽  
Changqian Yu ◽  
Beibei Yang ◽  
Changxin Gao ◽  
Nong Sang

2021 ◽  
Vol 11 (9) ◽  
pp. 3782
Author(s):  
Chu-Hui Lee ◽  
Chen-Wei Lin

Object detection is one of the important technologies in the field of computer vision. In the area of fashion apparel, object detection has various applications, such as apparel recognition, apparel detection, fashion recommendation, and online search. The recognition task is difficult for a computer because fashion apparel images vary widely in clothing appearance and material. Currently, fast and accurate object detection is the most important goal in this field. In this study, we propose a two-phase fashion apparel detection method named YOLOv4-TPD (YOLOv4 Two-Phase Detection), based on the YOLOv4 algorithm, to address this challenge. The target categories for detection are jacket, top, pants, skirt, and bag. Following the definition of inductive transfer learning, the aim is to transfer knowledge from the source domain to the target domain so as to improve performance on tasks in the target domain; we therefore use a two-phase training method to implement the transfer learning. The experimental results show that, with two-phase transfer learning, the mAP of our model exceeds that of the original YOLOv4 model. The proposed model has multiple potential applications, such as automatic labeling systems, style retrieval, and similarity detection.
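The following sketch shows one generic way a two-phase transfer-learning schedule like this could be organized: a frozen-backbone phase that adapts the detection head to the apparel classes, followed by end-to-end fine-tuning at a lower learning rate. The model attributes and the loss-returning call are assumptions, not the YOLOv4-TPD code.

```python
# Generic two-phase transfer-learning schedule (illustrative assumptions).
import torch

CLASSES = ["jacket", "top", "pants", "skirt", "bag"]

def train_phase(model, loader, lr, freeze_backbone, epochs, device="cuda"):
    for p in model.backbone.parameters():          # assumed .backbone attribute
        p.requires_grad = not freeze_backbone
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    model.train().to(device)
    for _ in range(epochs):
        for images, targets in loader:
            loss = model(images.to(device), targets)   # assumed loss-returning API
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Phase 1: adapt the head to the 5 apparel classes with the backbone frozen.
# train_phase(model, loader, lr=1e-3, freeze_backbone=True, epochs=30)
# Phase 2: fine-tune everything end-to-end at a lower learning rate.
# train_phase(model, loader, lr=1e-4, freeze_backbone=False, epochs=30)
```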

