CIAN: Cross-Image Affinity Net for Weakly Supervised Semantic Segmentation

2020 ◽  
Vol 34 (07) ◽  
pp. 10762-10769
Author(s):  
Junsong Fan ◽  
Zhaoxiang Zhang ◽  
Tieniu Tan ◽  
Chunfeng Song ◽  
Jun Xiao

Weakly supervised semantic segmentation with only image-level labels saves substantial human effort in annotating pixel-level labels. Cutting-edge approaches rely on various innovative constraints and heuristic rules to generate the masks for every single image. Although these methods have achieved great progress, they treat each image independently and do not take the relationships across different images into account. In this paper, we argue that the cross-image relationship is vital for weakly supervised segmentation, because it connects related regions across images, allowing supplementary representations to be propagated to obtain more consistent and integral regions. To leverage this information, we propose an end-to-end cross-image affinity module, which exploits pixel-level cross-image relationships with only image-level labels. By this means, our approach achieves 64.3% and 65.3% mIoU on the Pascal VOC 2012 validation and test sets respectively, a new state-of-the-art result using only image-level labels for weakly supervised semantic segmentation, demonstrating the superiority of our approach.
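The core idea of propagating representations through pixel-level cross-image affinity can be sketched roughly as follows. This is an illustrative NumPy sketch of the general mechanism, not the authors' implementation; the function name `cross_image_affinity` is invented for illustration.

```python
import numpy as np

def cross_image_affinity(feat_a, feat_b):
    """Propagate pixel features from image B to image A through a
    pixel-level affinity matrix (spatial dimensions flattened).

    feat_a: (Na, C), feat_b: (Nb, C) pixel feature matrices.
    Returns A's features augmented with affinity-weighted features from B.
    """
    # L2-normalize so the dot product is a cosine similarity
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                                   # (Na, Nb) affinity logits
    # Row-wise softmax turns similarities into propagation weights
    w = np.exp(sim - sim.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return feat_a + w @ feat_b                      # message passing from B to A
```

Each pixel in one image aggregates features from related pixels in the other image, so partial activations on one instance can be completed by cleaner instances elsewhere.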

Author(s):  
Zaid Al-Huda ◽  
Donghai Zhai ◽  
Yan Yang ◽  
Riyadh Nazar Ali Algburi

Deep convolutional neural networks (DCNNs) trained on pixel-level annotated images have achieved strong semantic segmentation results, but the high cost of labeling training data greatly limits their applicability. Weakly supervised segmentation approaches, by contrast, can significantly reduce human labeling effort. In this paper, we introduce a new framework to generate high-quality initial pixel-level annotations. Using a hierarchical image segmentation algorithm to predict the boundary map, we select the optimal scale of high-quality hierarchies. In the initialization step, scribble annotations and the saliency map are combined to construct a graphical model over the optimal-scale segmentation; solving the minimal cut problem spreads information from the scribbles to unmarked regions. In the training process, the segmentation network is trained on the initial pixel-level annotations. To iteratively optimize the segmentation, we use a graphical model to refine the segmentation masks and retrain the segmentation network to obtain more precise pixel-level annotations. Experimental results on the Pascal VOC 2012 dataset demonstrate that the proposed framework outperforms most weakly supervised semantic segmentation methods and achieves state-of-the-art performance, which is [Formula: see text] mIoU.
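The min-cut step that spreads scribble information to unmarked regions can be illustrated on a toy problem. The sketch below, a simplification and not the paper's graphical model, labels a 1-D chain of regions: terminal capacities play the role of scribble/saliency unaries, neighbor capacities play the role of smoothness terms, and a plain Edmonds-Karp max-flow finds the cut. The function name `min_cut_labels` is invented.

```python
from collections import deque

def min_cut_labels(unary_fg, unary_bg, pair_w):
    """Label a 1-D chain of regions by an s-t minimum cut (Edmonds-Karp).

    unary_fg[i]: capacity source->region i (e.g. saliency / fg scribble).
    unary_bg[i]: capacity region i->sink (background evidence).
    pair_w[i]:   smoothness capacity between regions i and i+1.
    Returns a list of 1 (foreground) / 0 (background) labels.
    """
    n = len(unary_fg)
    S, T = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[S][i] = unary_fg[i]
        cap[i][T] = unary_bg[i]
    for i in range(n - 1):
        cap[i][i + 1] = cap[i + 1][i] = pair_w[i]
    # Edmonds-Karp: augment along shortest residual paths until none remain
    while True:
        parent = [-1] * (n + 2)
        parent[S] = S
        q = deque([S])
        while q:
            u = q.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-9:
                    parent[v] = u
                    q.append(v)
        if parent[T] == -1:
            break
        bott, v = float("inf"), T
        while v != S:                      # bottleneck capacity on the path
            bott = min(bott, cap[parent[v]][v]); v = parent[v]
        v = T
        while v != S:                      # push flow, update residuals
            cap[parent[v]][v] -= bott; cap[v][parent[v]] += bott; v = parent[v]
    # regions still reachable from the source side of the cut are foreground
    seen = [False] * (n + 2); seen[S] = True; q = deque([S])
    while q:
        u = q.popleft()
        for v in range(n + 2):
            if not seen[v] and cap[u][v] > 1e-9:
                seen[v] = True; q.append(v)
    return [1 if seen[i] else 0 for i in range(n)]
```

With a strong foreground scribble at one end and strong background at the other, the cut falls on the weakest pairwise link, propagating the scribble labels to the unmarked middle regions.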


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Zhongmin Liu ◽  
Zhicai Chen ◽  
Zhanming Li ◽  
Wenjin Hu

In recent years, techniques based on deep detection models have achieved overwhelming improvements in detection accuracy, which makes them well suited to applications such as pedestrian detection. However, speed and accuracy are conflicting objectives that have long puzzled researchers, and achieving a good trade-off between them is a problem that must be considered when designing detectors. To this end, we employ the general-purpose detector YOLOv2, a state-of-the-art method for general detection tasks, for pedestrian detection, and modify the network parameters and structure according to the characteristics of pedestrians, making the method more suitable for detecting them. Experimental results on the INRIA pedestrian detection dataset show that it attains a fairly high detection speed with a small precision gap compared with state-of-the-art pedestrian detection methods. Furthermore, we add weak semantic segmentation networks after the shared convolution layers to illuminate pedestrians, and employ a scale-aware structure in our model to handle the wide size range in the Caltech pedestrian detection dataset, which brings further gains over the original improvement.
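One standard way to adapt YOLOv2's parameters to pedestrians (who are tall and narrow) is to re-cluster the anchor boxes on the target dataset with the 1-IoU distance that YOLOv2 itself uses. The sketch below shows that dimension clustering step; it is an assumption about how the adaptation could be done, not the paper's exact modification, and `kmeans_anchors` is an invented name.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as co-centered."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) box sizes with the 1 - IoU distance of YOLOv2."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest anchor
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors
```

Run on pedestrian boxes, the resulting anchors skew toward tall, narrow aspect ratios, which is exactly the kind of prior the abstract alludes to.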


Author(s):  
Tong Wu ◽  
Bicheng Dai ◽  
Shuxin Chen ◽  
Yanyun Qu ◽  
Yuan Xie

Despite recent great progress on semantic segmentation, huge challenges remain in medical ultra-resolution image segmentation. Methods based on multi-branch structures can strike a good balance between computational burden and segmentation accuracy. However, the fusion structure in these methods must be designed elaborately to achieve desirable results, which leads to model redundancy. In this paper, we propose the Meta Segmentation Network (MSN) to solve this challenging problem. With the help of meta-learning, the fusion module of MSN is quite simple but effective: MSN can quickly generate the weights of the fusion layers through a simple meta-learner, requiring only a few training samples and epochs to converge. In addition, to avoid learning all branches from scratch, we introduce a particular weight-sharing mechanism that realizes fast knowledge adaptation and shares weights among multiple branches, resulting in improved performance and a significant reduction in parameters. Experimental results on two challenging ultra-resolution medical datasets, BACH and ISIC, show that MSN achieves the best performance compared with state-of-the-art approaches.
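The notion of a meta-learner that generates fusion weights for multiple branches can be sketched minimally: a tiny MLP reads a cheap descriptor of each branch's logits and emits one softmax weight per branch. This is a hypothetical illustration of the idea, not MSN's actual fusion module, and `meta_fuse` is an invented name.

```python
import numpy as np

def meta_fuse(branch_logits, W1, b1, W2, b2):
    """Fuse multi-branch segmentation logits with weights produced
    by a tiny meta-learner (one-hidden-layer MLP).

    branch_logits: (B, H, W, C) logits from B branches.
    The meta-learner reads each branch's mean activation per class
    and emits one fusion weight per branch.
    """
    desc = branch_logits.mean(axis=(1, 2))        # (B, C) branch descriptor
    h = np.maximum(desc @ W1 + b1, 0)             # (B, hidden) ReLU layer
    s = (h @ W2 + b2).ravel()                     # (B,) one score per branch
    w = np.exp(s - s.max()); w /= w.sum()         # softmax fusion weights
    return np.tensordot(w, branch_logits, axes=1) # (H, W, C) fused logits
```

Because the fusion weights come from a learned function of the branches rather than a hand-designed fusion structure, only the small meta-learner needs to be trained, which matches the abstract's claim of a simple but effective fusion module.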


Author(s):  
Lixiang Ru ◽  
Bo Du ◽  
Chen Wu

Current weakly-supervised semantic segmentation (WSSS) methods with image-level labels mainly adopt class activation maps (CAM) to generate the initial pseudo labels. However, CAM usually identifies only the most discriminative object extents, because the network does not need to discover the integral object to recognize image-level labels. In this work, to tackle this problem, we propose to simultaneously learn image-level labels and local visual-word labels. Specifically, in each forward propagation, the feature maps of the input image are encoded into visual words with a learnable codebook. By forcing the network to classify the encoded fine-grained visual words, the generated CAM can cover more semantic regions. In addition, we propose a hybrid spatial pyramid pooling module that preserves the local maximum and global average values of feature maps, so that more object details and less background are considered. Based on the proposed methods, we conducted experiments on the PASCAL VOC 2012 dataset. Our method achieved 67.2% mIoU on the val set and 67.3% mIoU on the test set, outperforming recent state-of-the-art methods.
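The hybrid pooling idea, keeping local maxima (object details) alongside the global average (background suppression), can be sketched as follows. This is a rough interpretation of the described module, not the authors' exact design; `hybrid_spp` and the grid layout are assumptions.

```python
import numpy as np

def hybrid_spp(feat, grid=2):
    """Hybrid spatial pyramid pooling sketch: max-pool over a grid of
    local cells, then append the global average of the feature map.

    feat: (H, W, C) feature map; returns a 1-D descriptor of length
    (grid*grid + 1) * C.
    """
    H, W, C = feat.shape
    hs, ws = H // grid, W // grid
    cells = []
    for i in range(grid):
        for j in range(grid):
            cell = feat[i*hs:(i+1)*hs, j*ws:(j+1)*ws]
            cells.append(cell.max(axis=(0, 1)))   # local maximum responses
    cells.append(feat.mean(axis=(0, 1)))          # global average response
    return np.concatenate(cells)
```

The max terms keep strong localized activations from being washed out, while the average term keeps the classifier from latching onto isolated background peaks.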


2020 ◽  
Vol 34 (05) ◽  
pp. 9644-9651
Author(s):  
Chao Zhao ◽  
Snigdha Chaturvedi

Opinion summarization from online product reviews is a challenging task that involves identifying opinions related to various aspects of the product being reviewed. While previous works require additional human effort to identify relevant aspects, we instead apply domain knowledge from external sources to achieve the same goal automatically. This work proposes AspMem, a generative method that contains an array of memory cells to store aspect-related knowledge. This explicit memory helps obtain a better opinion representation and infer the aspect information more precisely. We evaluate the method on both aspect identification and opinion summarization tasks. Our experiments show that AspMem outperforms state-of-the-art methods even though, unlike the baselines, it does not rely on carefully handcrafted human supervision for the given tasks.
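A common way to read from an array of aspect memory cells is softmax attention: score each cell against the review representation, then take the attention-weighted sum of the cells. The sketch below shows that generic mechanism only; it is not AspMem's actual model, and `aspect_attention` is an invented name.

```python
import numpy as np

def aspect_attention(review_vec, memory):
    """Attend over aspect memory cells.

    review_vec: (d,) review/opinion embedding.
    memory: (K, d) matrix, one row per aspect-related memory cell.
    Returns (softmax attention over aspects, memory-based representation).
    """
    scores = memory @ review_vec              # (K,) similarity per aspect
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                        # softmax over the K cells
    rep = attn @ memory                       # weighted aspect summary
    return attn, rep
```

The attention vector itself can serve as a soft aspect prediction, while the returned representation injects the stored aspect knowledge into the opinion representation.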


2020 ◽  
Vol 34 (07) ◽  
pp. 12765-12772
Author(s):  
Bingfeng Zhang ◽  
Jimin Xiao ◽  
Yunchao Wei ◽  
Mingjie Sun ◽  
Kaizhu Huang

Weakly supervised semantic segmentation is a challenging task, as it takes only image-level information as supervision for training but produces pixel-level predictions for testing. To address it, most recent state-of-the-art approaches adopt two-step solutions: 1) learn to generate pseudo pixel-level masks, and 2) engage FCNs to train the semantic segmentation networks with the pseudo masks. However, two-step solutions usually employ many bells and whistles to produce high-quality pseudo masks, making such methods complicated and inelegant. In this work, we harness the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. Such reliable regions then serve directly as ground-truth labels for the parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with two-step state-of-the-art methods. By extending our one-step method to two steps, we obtain a new state-of-the-art performance on Pascal VOC (val: 66.3, test: 66.5).
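Pruning CAMs into confident object/background regions is typically done with a double threshold, with everything in between ignored by the loss. The sketch below illustrates that step under assumed thresholds; the exact pruning rule and values are not taken from the paper, and `cam_to_pseudo_label` is an invented name.

```python
import numpy as np

def cam_to_pseudo_label(cam, fg_thresh=0.7, bg_thresh=0.1, ignore=255):
    """Prune class activation maps into confident pseudo labels.

    cam: (K, H, W) activation maps for K annotated classes, in [0, 1].
    Pixels whose best activation exceeds fg_thresh get that class id,
    pixels below bg_thresh on every class become background (0), and
    everything else is marked `ignore` and excluded from the loss.
    """
    best = cam.max(axis=0)                 # (H, W) strongest activation
    cls = cam.argmax(axis=0) + 1           # class ids start at 1 (0 = bg)
    label = np.full(best.shape, ignore, dtype=np.int64)
    label[best > fg_thresh] = cls[best > fg_thresh]
    label[best < bg_thresh] = 0            # confidently background
    return label
```

The resulting sparse-but-reliable label map is exactly the kind of supervision the parallel segmentation branch can consume directly, with a smoothness term (here, the paper's dense energy loss) filling in the ignored regions.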


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 437
Author(s):  
Yuya Onozuka ◽  
Ryosuke Matsumi ◽  
Motoki Shino

Detection of traversable areas is essential to the navigation of autonomous personal mobility systems in unknown pedestrian environments. However, traffic rules may recommend or require driving in specified areas, such as sidewalks, in environments where roadways and sidewalks coexist. It is therefore necessary for such autonomous mobility systems to estimate the areas that are both mechanically traversable and recommended by traffic rules, and to navigate based on this estimation. In this paper, we propose a method for weakly-supervised recommended traversable area segmentation in environments with no edges, using images automatically labeled from paths selected by humans. This approach is based on the idea that a human-selected driving path reflects both mechanical traversability and human understanding of traffic rules and visual information. In addition, we propose a data augmentation method and a loss weighting method for detecting the appropriate recommended traversable area from a single human-selected path. Evaluation showed that the proposed learning methods are effective for recommended traversable area detection, and that weakly-supervised semantic segmentation using human-selected path information is useful for recommended area detection in environments with no edges.
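One plausible form of loss weighting from a single human-selected path is to trust the automatic labels most near the path and let the weight decay with distance. The sketch below shows that idea with a Gaussian falloff; it is an assumption about the weighting scheme, not the paper's method, and `path_loss_weights` is an invented name.

```python
import numpy as np

def path_loss_weights(path_mask, sigma=2.0):
    """Per-pixel loss weights derived from a human-selected path.

    path_mask: (H, W) binary mask of the driven path.
    Pixels on the path get weight 1; the weight decays with squared
    distance to the nearest path pixel (Gaussian falloff).
    """
    H, W = path_mask.shape
    ys, xs = np.nonzero(path_mask)
    gy, gx = np.mgrid[0:H, 0:W]
    # squared distance from every pixel to its nearest path pixel
    # (brute force; fine for a sketch, too slow for full images)
    d2 = ((gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2).min(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))
```

Weighting the segmentation loss this way keeps a single noisy path label from dominating training far away from where the human actually drove.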


2020 ◽  
Author(s):  
Jiahui Liu ◽  
Changqian Yu ◽  
Beibei Yang ◽  
Changxin Gao ◽  
Nong Sang
