Collaborative Learning for Weakly Supervised Object Detection

Author(s):  
Jiajie Wang ◽  
Jiangchao Yao ◽  
Ya Zhang ◽  
Rui Zhang

Weakly supervised object detection has recently received much attention, since it requires only image-level labels instead of the bounding-box labels consumed in strongly supervised learning. Nevertheless, the savings in labeling expense usually come at the cost of model accuracy. In this paper, we propose a simple but effective weakly supervised collaborative learning framework to resolve this problem, which trains a weakly supervised learner and a strongly supervised learner jointly by enforcing partial feature sharing and prediction consistency. For object detection, taking a WSDDN-like architecture as the weakly supervised detector sub-network and a Faster-RCNN-like architecture as the strongly supervised detector sub-network, we propose an end-to-end Weakly Supervised Collaborative Detection Network. As there is no strong supervision available to train the Faster-RCNN-like sub-network, a new prediction consistency loss is defined to enforce consistency of predictions between the two sub-networks as well as within the Faster-RCNN-like sub-network itself. At the same time, the two detectors are designed to partially share features to further guarantee model consistency at the perceptual level. Extensive experiments on the PASCAL VOC 2007 and 2012 data sets demonstrate the effectiveness of the proposed framework.
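The prediction consistency idea above can be sketched in a few lines. The function below is a deliberately simplified stand-in (the paper's actual loss is more involved): it penalizes the mean squared difference between the two detectors' per-proposal class scores for matched proposals. The function name and score layout are assumptions for illustration only.

```python
import numpy as np

def consistency_loss(weak_scores, strong_scores):
    """Mean squared difference between the weakly and strongly supervised
    detectors' class scores over matched proposals -- a toy rendition of a
    prediction consistency loss, not the paper's exact formulation."""
    weak = np.asarray(weak_scores, dtype=float)
    strong = np.asarray(strong_scores, dtype=float)
    return float(np.mean((weak - strong) ** 2))
```

When the two sub-networks agree exactly, the loss is zero; any divergence in their per-class scores pushes the loss up, which is the gradient signal that lets the strongly supervised branch train without box labels.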

2019 ◽  
Vol 10 (1) ◽  
pp. 64
Author(s):  
Yi Lin ◽  
Honggang Zhang

In the era of Big Data, multi-instance learning, as a weakly supervised learning framework, has various applications, since it helps reduce the cost of the data-labeling process. Due to this weakly supervised setting, learning effective instance representations/embeddings is challenging. To address this issue, we propose an instance-embedding regularizer that can boost the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the instance-embedding regularizer is to maximize the correlation between instance-embedding similarities and underlying instance-label similarities. The embedding-learning framework was implemented using a neural network and optimized in an end-to-end manner using stochastic gradient descent. In experiments, various applications were studied, and the results show that the proposed instance-embedding-regularization method is highly effective, achieving state-of-the-art performance.
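The correlation objective at the heart of the regularizer can be illustrated concretely. The sketch below, under assumed names and a cosine-similarity choice that the abstract does not specify, computes the Pearson correlation between pairwise embedding similarities and pairwise label agreement; the regularizer would maximize this quantity.

```python
import numpy as np

def embedding_label_correlation(embeddings, labels):
    """Pearson correlation between pairwise cosine similarities of instance
    embeddings and pairwise instance-label agreement (1 if same label,
    else 0), computed over all distinct instance pairs. Illustrative only."""
    E = np.asarray(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # unit-normalize rows
    sim_e = E @ E.T                                    # cosine similarities
    y = np.asarray(labels)
    sim_y = (y[:, None] == y[None, :]).astype(float)   # label agreement
    iu = np.triu_indices(len(y), k=1)                  # off-diagonal pairs
    a, b = sim_e[iu], sim_y[iu]
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

When embeddings of same-label instances cluster together, this correlation approaches 1, which is exactly the behavior the regularizer rewards during training.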


2020 ◽  
Vol 12 (2) ◽  
pp. 297 ◽  
Author(s):  
Nasehe Jamshidpour ◽  
Abdolreza Safari ◽  
Saeid Homayouni

This paper introduces a novel multi-view multi-learner (MVML) active learning method, in which the different views are generated by a genetic algorithm (GA). The GA-based view generation method attempts to construct diverse, sufficient, and independent views by considering both inter- and intra-view confidences. Hyperspectral data is inherently high-dimensional, which makes it suitable for multi-view learning algorithms. Furthermore, by employing multiple learners at each view, a more accurate estimation of the underlying data distribution can be obtained. We also implemented a spectral-spatial graph-based semi-supervised learning (SSL) method as the classifier, which improved the performance of the classification task in comparison with supervised learning. The evaluation of the proposed method was based on three different benchmark hyperspectral data sets. The results were also compared with other state-of-the-art active learning (AL)-SSL methods. The experimental results demonstrated the efficiency and statistically significant superiority of the proposed method. The GA-MVML AL method improved the classification performance by 16.68%, 18.37%, and 15.1% on the different data sets after 40 iterations.
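A common way such multi-view, multi-learner setups pick samples to label is committee disagreement. The sketch below shows a generic vote-entropy query heuristic, not the paper's exact GA-MVML criterion; the function name and the (views × samples) prediction layout are assumptions for illustration.

```python
import numpy as np

def query_by_disagreement(view_predictions, n_queries=1):
    """Select the unlabeled samples on which per-view learners disagree
    most, measured by vote entropy. A generic active-learning committee
    heuristic, shown here only to illustrate the multi-learner idea."""
    P = np.asarray(view_predictions)        # shape: (n_views, n_samples)
    n_views = P.shape[0]
    entropies = []
    for j in range(P.shape[1]):
        _, counts = np.unique(P[:, j], return_counts=True)
        p = counts / n_views                # empirical vote distribution
        entropies.append(-np.sum(p * np.log(p + 1e-12)))
    # highest-entropy (most contested) samples first
    return list(np.argsort(entropies)[::-1][:n_queries])
```

Samples on which all views agree have zero vote entropy and are never queried first, so labeling effort concentrates where the views genuinely conflict.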


2017 ◽  
Vol 19 (2) ◽  
pp. 393-407 ◽  
Author(s):  
Yuxing Tang ◽  
Xiaofang Wang ◽  
Emmanuel Dellandrea ◽  
Liming Chen

Author(s):  
ZHE WANG ◽  
MINGZHE LU ◽  
ZENGXIN NIU ◽  
XIANGYANG XUE ◽  
DAQI GAO

Multi-view learning aims to effectively learn from data represented by multiple independent sets of attributes, where each set is taken as one view of the original data. In real-world applications, each view may be acquired at unequal cost. Taking web-page classification as an example, it is cheaper to get the words on the page itself (view one) than to get the words contained in the anchor texts of inbound hyperlinks (view two). However, almost all existing multi-view learning does not consider the cost of acquiring the views or the cost of evaluating them. In this paper, we argue that different views adopt different representations and lead to different acquisition costs. Thus we develop a new view-dependent cost, distinct from both the existing class-dependent cost and the example-dependent cost. To this end, we generalize the framework of multi-view learning with the cost-sensitive technique and further propose a Cost-sensitive Multi-View Learning Machine, named CMVLM for short. In implementation, we take into account and measure both the acquisition cost and the discriminant scatter of each view. Then, after eliminating the useless views with a predefined threshold, we use the reserved views to train the final classifier. The experimental results on a broad range of data sets, including benchmark UCI, image, and bioinformatics data sets, validate that the proposed algorithm can effectively reduce the total cost and achieve competitive or even better classification performance. The contributions of this paper are: (1) first proposing a view-dependent cost; (2) establishing a cost-sensitive multi-view learning framework; (3) developing a wrapper technique that is universal to most multiple-kernel-based classifiers.
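The view-elimination step can be sketched as a simple cost-benefit filter. This is a simplified reading of CMVLM's selection, assuming each view is scored by its discriminant scatter per unit acquisition cost; the actual scoring in the paper is more involved, and the function name and threshold semantics are illustrative assumptions.

```python
import numpy as np

def select_views(scatters, costs, threshold):
    """Keep the views whose discriminant scatter per unit acquisition cost
    exceeds a predefined threshold -- a toy version of cost-sensitive view
    elimination, not CMVLM's exact criterion."""
    scores = np.asarray(scatters, dtype=float) / np.asarray(costs, dtype=float)
    return [i for i, s in enumerate(scores) if s > threshold]
```

A cheap, highly discriminative view (high scatter, low cost) survives the filter, while an expensive view has to be proportionally more discriminative to be retained; only the surviving views feed the final classifier.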


Author(s):  
Asif Salekin ◽  
Jeremy W. Eberle ◽  
Jeffrey J. Glenn ◽  
Bethany A. Teachman ◽  
John A. Stankovic

Author(s):  
Jiapeng Wang ◽  
Tianwei Wang ◽  
Guozhi Tang ◽  
Lianwen Jin ◽  
Weihong Ma ◽  
...  

Visual information extraction (VIE) has attracted increasing attention in recent years. Existing methods usually first organize optical character recognition (OCR) results into plain text and then utilize token-level category annotations as supervision to train a sequence-tagging model. However, this incurs high annotation costs and may be exposed to label confusion, and OCR errors also significantly affect the final performance. In this paper, we propose a unified weakly supervised learning framework called TCPNet (Tag, Copy or Predict Network), which introduces 1) an efficient encoder that simultaneously models the semantic and layout information in 2D OCR results; 2) a weakly supervised training method that utilizes only sequence-level supervision; and 3) a flexible and switchable decoder with two inference modes: one (Copy or Predict Mode) outputs key information sequences of different categories by copying a token from the input or predicting one at each time step, and the other (Tag Mode) directly tags the input sequence in a single forward pass. Our method achieves new state-of-the-art performance on several public benchmarks, which fully demonstrates its effectiveness.
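The copy-or-predict decision at a single decoding step can be illustrated with a toy gate. The sketch below is not the real TCPNet decoder: the gate, distributions, and function name are assumptions, and a hard 0.5 gate stands in for whatever learned mixing the model actually uses.

```python
import numpy as np

def copy_or_predict_step(p_copy, copy_dist, vocab_dist, input_tokens, vocab):
    """One toy decoding step: if the copy gate fires, copy the most likely
    input token; otherwise predict the most likely vocabulary token.
    Illustrative stand-in for TCPNet's Copy or Predict Mode."""
    if p_copy >= 0.5:
        return input_tokens[int(np.argmax(copy_dist))]   # copy from input
    return vocab[int(np.argmax(vocab_dist))]             # predict from vocab
```

Copying lets the decoder reproduce OCR tokens verbatim (robust for names, dates, amounts), while predicting lets it emit tokens the OCR missed or garbled, which is why the two paths complement each other.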

