Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings

Author(s): Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Significant progress has been made recently in developing few-shot object segmentation methods. Learning has been shown to be successful in few-shot segmentation settings using pixel-level, scribble, and bounding box supervision. This paper takes another approach, requiring only image-level labels for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot object segmentation that uses a co-attention mechanism over both visual and word embeddings. Using image-level labels, our model achieves a 4.8% improvement over previously proposed image-level few-shot object segmentation methods. It also outperforms state-of-the-art methods that use weak bounding box supervision on PASCAL-5^i. Our results show that few-shot segmentation benefits from utilizing word embeddings, and that stacked joint visual-semantic processing enables few-shot segmentation with weak image-level labels. We further propose a novel setup, Temporal Object Segmentation for Few-shot Learning (TOSFL), for videos. TOSFL can be applied to a variety of public video datasets such as YouTube-VOS, as demonstrated in both instance-level and category-level TOSFL experiments.
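As a rough illustration of the co-attention idea described in this abstract, the sketch below (PyTorch assumed; the tensor shapes, the linear projection of the word embedding, and the fusion by concatenation are illustrative assumptions, not the authors' exact architecture) shows how support and query feature maps can attend to each other after both are conditioned on a word embedding of the image-level label:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiModalCoAttention(nn.Module):
    """Minimal sketch of visual-semantic co-attention (illustrative only).

    Support and query feature maps attend to each other through an
    affinity matrix; a word embedding of the image-level label is
    projected and broadcast over space before the attention step.
    """
    def __init__(self, channels=256, word_dim=300):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, channels)      # project word embedding
        self.affinity = nn.Linear(channels, channels, bias=False)

    def forward(self, f_query, f_support, word_emb):
        # f_query, f_support: (B, C, H, W); word_emb: (B, word_dim)
        B, C, H, W = f_query.shape
        w = self.word_proj(word_emb).view(B, C, 1, 1)
        # inject the semantic embedding into both visual streams
        q = (f_query + w).view(B, C, -1)                    # (B, C, HW)
        s = (f_support + w).view(B, C, -1)                  # (B, C, HW)
        # affinity between every query/support spatial position pair
        A = torch.bmm(self.affinity(q.transpose(1, 2)), s)  # (B, HW, HW)
        # normalize along each axis to get one attention map per direction
        attn_q = torch.bmm(s, F.softmax(A, dim=2).transpose(1, 2)).view(B, C, H, W)
        attn_s = torch.bmm(q, F.softmax(A, dim=1)).view(B, C, H, W)
        # fuse attended context with the original features (B, 2C, H, W)
        return torch.cat([f_query, attn_q], 1), torch.cat([f_support, attn_s], 1)
```

Normalizing the affinity matrix along each axis yields a separate attention map per direction, so the query features gather support context and vice versa; stacking such modules would give the joint visual-semantic processing the abstract refers to.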

2020
Author(s): XiaoQing Bu, YuKuan Sun, JianMing Wang, KunLiang Liu, JiaYu Liang, ...

2021, Vol 22 (1)
Author(s): Changyong Li, Yongxian Fan, Xiaodong Cai

Abstract
Background: With the development of deep learning (DL), more and more DL-based methods have been proposed and achieve state-of-the-art performance in biomedical image segmentation. However, these methods are usually complex and require powerful computing resources, which is impractical in clinical settings. It is therefore important to develop accurate DL-based biomedical image segmentation methods that work under resource-constrained computing.
Results: A lightweight, multiscale network called PyConvU-Net is proposed to work with low-resource computing. In strictly controlled experiments, PyConvU-Net performs well on three biomedical image segmentation tasks while using the fewest parameters.
Conclusions: Our experimental results preliminarily demonstrate the potential of the proposed PyConvU-Net for biomedical image segmentation under resource-constrained computing.
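PyConvU-Net builds on pyramidal convolutions (PyConv), which replace a standard convolution with parallel branches of growing kernel size and use grouped convolutions to keep the parameter count low. Below is a minimal sketch of such a block (PyTorch assumed; the kernel sizes and group counts are illustrative defaults, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class PyConvBlock(nn.Module):
    """Sketch of a pyramidal convolution: parallel branches with growing
    kernel sizes and grouped convolutions, so multiscale context comes at
    a small parameter cost. Settings are illustrative; note that in_ch and
    out_ch // len(kernels) must be divisible by each group count.
    """
    def __init__(self, in_ch, out_ch, kernels=(3, 5, 7, 9), groups=(1, 4, 8, 16)):
        super().__init__()
        assert out_ch % len(kernels) == 0
        branch_ch = out_ch // len(kernels)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2, groups=g, bias=False)
            for k, g in zip(kernels, groups)
        )
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # each branch sees the same input at a different receptive field
        return self.act(self.bn(torch.cat([b(x) for b in self.branches], dim=1)))

# usage: drop-in replacement for a 3x3 conv block inside a U-Net encoder/decoder
block = PyConvBlock(64, 64)
out = block(torch.randn(1, 64, 128, 128))  # (1, 64, 128, 128)
```

Replacing U-Net's plain convolutions with blocks like this is a plausible way to get the multiscale, low-parameter behavior the abstract describes.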


2021, Vol 11 (9), pp. 4248
Author(s): Hong Hai Hoang, Bao Long Tran

With the rapid development of cameras and deep learning technologies, computer vision tasks such as object detection, object segmentation and object tracking are being widely applied in many fields of life. For robot grasping tasks, object segmentation aims to classify and localize objects, which helps robots pick objects accurately. The state-of-the-art instance segmentation framework, Mask Region-based Convolutional Neural Network (Mask R-CNN), does not always segment accurately at the edges or borders of objects. An approach using a 3D camera, by contrast, can easily extract entire (foreground) objects but has difficulty, or requires considerable computation, classifying them. We propose a novel approach that combines Mask R-CNN with 3D algorithms by adding a 3D processing branch for instance segmentation. The outcomes of the two branches are used jointly to classify the pixels at object edges by considering the spatial relationship between the edge region and the mask region. We analyze the effectiveness of the method on hard cases of object placement, for example objects that are close together, overlapping, or occluding one another, to focus on edge and border segmentation. Our proposed method is about 4 to 7% higher and more stable in IoU (intersection over union), reaching 46% mAP (mean Average Precision), a higher accuracy than its counterpart. The feasibility experiments show that our method could be a promising contribution to grasping-robot research.
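A minimal sketch of the fusion idea, assuming binary masks from both branches (the band width and the morphological operations are illustrative choices, not the paper's exact procedure): keep the Mask R-CNN interior, and re-decide pixels in a thin band around the predicted border using the depth-based foreground mask.

```python
import numpy as np
from scipy import ndimage

def refine_mask_with_depth(rcnn_mask, depth_fg, band=5):
    """Sketch of the two-branch fusion: trust Mask R-CNN in the mask
    interior, but re-decide pixels in a thin band around the predicted
    edge using the depth-based foreground mask. `band` (in pixels) is
    an assumed hyperparameter, not taken from the paper.
    """
    rcnn_mask = rcnn_mask.astype(bool)
    depth_fg = depth_fg.astype(bool)
    dilated = ndimage.binary_dilation(rcnn_mask, iterations=band)
    eroded = ndimage.binary_erosion(rcnn_mask, iterations=band)
    edge_band = dilated & ~eroded        # uncertain region around the border
    refined = eroded.copy()              # keep the confident interior
    refined |= edge_band & depth_fg      # edge pixels follow the 3D branch
    return refined

def iou(a, b):
    """Intersection over union between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = (a | b).sum()
    return (a & b).sum() / union if union else 0.0
```

The IoU helper is the metric the abstract reports; the refinement function shows one way the 3D branch could override the CNN mask exactly where Mask R-CNN is weakest, at object borders.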


Author(s): Xiawu Zheng, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Feiyue Huang, ...

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, deep features are coarsely extracted at the image level rather than precisely at the object level, so they are corrupted by background clutter. On the other hand, training CNN features with a standard triplet loss is time-consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that addresses these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (a 1,000× training speedup compared to the triplet loss) and discriminative feature learning through a "centralized" global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. The contours are then integrated into the CNN response map to precisely extract features "within" the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196, and report significant gains over state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196 and 3.7% on CUB200-2011.
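The paper's exact CRL formulation is not reproduced here, but a plausible minimal sketch of a "centralized" ranking loss, ranking each embedding against per-class batch centers rather than against all triplets (which is where the speedup over the triplet loss would come from), might look as follows; the margin, the normalization, and the nearest-wrong-center choice are all assumptions:

```python
import torch
import torch.nn.functional as F

def centralized_ranking_loss(features, labels, margin=0.5):
    """Illustrative sketch: pull each embedding toward its own class
    center and push it away from the nearest wrong center. Assumes the
    batch contains at least two distinct classes. Not the authors'
    exact formulation.
    """
    features = F.normalize(features, dim=1)
    classes = labels.unique()
    centers = torch.stack([features[labels == c].mean(0) for c in classes])
    centers = F.normalize(centers, dim=1)
    d = torch.cdist(features, centers)                    # (N, K) distances
    own = labels.unsqueeze(1) == classes.unsqueeze(0)     # (N, K) one-hot mask
    pos = d[own]                                          # own-center distance
    neg = d.masked_fill(own, float("inf")).min(dim=1).values
    return F.relu(pos - neg + margin).mean()
```

Because the number of class centers per batch is far smaller than the number of possible triplets, a loss of this shape scales roughly linearly in batch size, consistent with the large speedup the abstract reports.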


2021
Author(s): Dmitri Ignakov

A vision system is an integral component of many autonomous robots. It enables the robot to perform essential tasks such as mapping, localization, or path planning, and assists with guiding the robot's grasping and manipulation tasks. As increasing demands are placed on service robots to operate in uncontrolled environments, advanced vision systems must be created that can function effectively in visually complex and cluttered settings. This thesis presents the development of segmentation algorithms to assist in online model acquisition for guiding robotic manipulation tasks. Specifically, the focus is placed on localizing door handles to assist in robotic door opening, and on acquiring partial object models to guide robotic grasping. First, a method for localizing a door handle of unknown geometry is presented, based on a proposed 3D segmentation method. Following segmentation, localization is performed by fitting a simple box model to the segmented handle. The proposed method functions without assumptions about the appearance of the handle or the door, and without a geometric model of the handle. Next, an object segmentation algorithm is developed that combines multiple appearance (intensity and texture) and geometric (depth and curvature) cues. The algorithm segments objects in visually complex and cluttered environments without any a priori appearance or geometric information. The segmentation method is based on the Conditional Random Field (CRF) framework and the graph cuts energy minimization technique. A simple and efficient method for initializing the proposed algorithm, which overcomes graph cuts' reliance on user interaction, is also developed. Finally, an improved segmentation algorithm is developed that incorporates a distance metric learning (DML) step to weigh the various appearance and geometric segmentation cues, allowing the method to better adapt to the available data. The improved method also models the distribution of 3D points in space as a distribution of algebraic distances from an ellipsoid fitted to the object, improving its ability to predict which points are likely to belong to the object and which to the background. All methods are validated experimentally in realistic settings, using scenarios of various complexities. The experimental results demonstrate the effectiveness of the handle localization method and of the object segmentation methods.
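The ellipsoid-based likelihood mentioned above can be illustrated with a small sketch: fit a general quadric to the 3D points by least squares and score each point by its algebraic distance from the fitted surface. This is a simplification; the thesis presumably constrains the fit to a proper ellipsoid, which this plain least-squares fit does not enforce.

```python
import numpy as np

def fit_quadric(points):
    """Least-squares fit of a general quadric surface to 3D points (N, 3).
    Returns the 10 coefficients of
        a x^2 + b y^2 + c z^2 + d xy + e xz + f yz + g x + h y + i z + j = 0,
    found as the smallest right singular vector of the design matrix.
    """
    x, y, z = points.T
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[-1]  # coefficients minimizing ||D v|| subject to ||v|| = 1

def algebraic_distances(points, coeffs):
    """Algebraic distance |Q(p)| of each point from the fitted quadric.
    Points with small values lie on or near the fitted surface; the
    distribution of these values can feed an object/background likelihood,
    as the segmentation cue described in the thesis suggests.
    """
    x, y, z = points.T
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z, x, y, z, np.ones_like(x)])
    return np.abs(D @ coeffs)
```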


2021
Author(s): Guankai Li, Chi Zhang, Guosheng Lin, Qingyao Wu, Rui Yao
