Self-Supervised Tuning for Few-Shot Segmentation

Few-shot segmentation aims to assign a category label to each image pixel given only a few annotated samples. It is a challenging task because the dense prediction can only be achieved under the guidance of latent features defined by sparse annotations. Existing meta-learning-based methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in the embedding space. To address this issue, this paper presents an adaptive tuning framework in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme, augmenting category-specific descriptors for label prediction. Specifically, a novel self-supervised inner loop is first devised as the base learner to extract the underlying semantic features from the support image. Then, gradient maps are calculated by back-propagating the self-supervised loss through the obtained features and are leveraged as guidance for augmenting the corresponding elements in the embedding space. Finally, with the ability to continuously learn from different episodes, an optimization-based meta-learner is adopted as the outer loop of the proposed framework to gradually refine the segmentation results. Extensive experiments on the benchmark PASCAL-5i and COCO-20i datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.

Zero-Shot Learning from Adversarial Feature Residual to Compact Visual Feature

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6821 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11547-11554

Author(s):

Bo Liu ◽

Qiulei Dong ◽

Zhanyi Hu

Keyword(s):

State Of The Art ◽

Feature Space ◽

Visual Features ◽

Selection Strategy ◽

Semantic Features ◽

Visual Feature ◽

Adversarial Network ◽

Benchmark Datasets ◽

Residual Generator ◽

Object Features

Recently, many zero-shot learning (ZSL) methods have focused on learning discriminative object features in an embedding feature space. However, the distributions of the unseen-class features learned by these methods are prone to partial overlap, resulting in inaccurate object recognition. To address this problem, we propose a novel adversarial network to synthesize compact semantic visual features for ZSL, consisting of a residual generator, a prototype predictor, and a discriminator. The residual generator generates the visual feature residual, which is integrated with a visual prototype predicted by the prototype predictor to synthesize the visual feature. The discriminator distinguishes the synthetic visual features from the real ones extracted from an existing categorization CNN. Since the generated residuals are generally numerically much smaller than the distances among the prototypes, the distributions of the unseen-class features synthesized by the proposed network overlap less. In addition, considering that the visual features from categorization CNNs are generally inconsistent with their semantic features, a simple feature selection strategy is introduced for extracting more compact semantic visual features. Extensive experimental results on six benchmark datasets demonstrate that our method achieves significantly better performance than existing state-of-the-art methods, by ∼1.2-13.2% in most cases.
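
The compactness argument in this abstract (small residuals added to well-separated prototypes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the prototype values, the class names, and the `small_residual` function, which stands in for a trained residual generator by drawing bounded random noise. It is not the authors' network.

```python
import random

def synthesize_feature(prototype, residual):
    """Synthetic feature = predicted prototype + generated residual (element-wise)."""
    return [p + r for p, r in zip(prototype, residual)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical visual prototypes for two unseen classes (illustrative values).
prototypes = {"zebra": [4.0, 0.0], "whale": [0.0, 4.0]}

rng = random.Random(0)

def small_residual(dim, scale=0.1):
    """Stand-in for a residual generator: noise bounded by `scale` per dimension."""
    return [rng.uniform(-scale, scale) for _ in range(dim)]

# Five synthetic features per class.
synthetic = {
    name: [synthesize_feature(p, small_residual(len(p))) for _ in range(5)]
    for name, p in prototypes.items()
}

def nearest_prototype(feature):
    """Classify a feature by its closest prototype."""
    return min(prototypes, key=lambda name: euclidean(feature, prototypes[name]))
```

Because each residual is bounded by 0.1 per dimension while the prototypes are more than 5 units apart, every synthetic feature stays closest to its own prototype, which is the sense in which the synthesized class distributions overlap less.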

Differentiable Meta-Learning Model for Few-Shot Semantic Segmentation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6887 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12087-12094

Author(s):

Pinzhuo Tian ◽

Zhangkai Wu ◽

Lei Qi ◽

Lei Wang ◽

Yinghuan Shi ◽

...

Keyword(s):

Object Segmentation ◽

Semantic Segmentation ◽

Classification Problem ◽

Single Object ◽

Pixel Classification ◽

Shot Segmentation ◽

Meta Learning ◽

Base Learner ◽

Segmentation Task ◽

Global And Local

To address the annotation scarcity issue in some cases of semantic segmentation, there have been a few attempts to develop segmentation models in the few-shot learning paradigm. However, most existing methods focus only on the traditional 1-way segmentation setting (i.e., one image contains only a single object). This is far from practical semantic segmentation tasks, where the K-way setting (K > 1) is usually required to perform accurate multi-object segmentation. To deal with this issue, we formulate the few-shot semantic segmentation task as a learning-based pixel classification problem and propose a novel meta-learning framework called MetaSegNet. In MetaSegNet, an embedding module consisting of global and local feature branches is developed to extract the appropriate meta-knowledge for few-shot segmentation. Moreover, we incorporate a linear model into MetaSegNet as a base learner to directly predict the label of each pixel for multi-object segmentation. Furthermore, MetaSegNet can be trained from scratch in an end-to-end manner with the episodic training mechanism. Experiments on two popular semantic segmentation datasets, i.e., PASCAL VOC and COCO, reveal the effectiveness of the proposed MetaSegNet in the K-way few-shot semantic segmentation task.
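
The idea of a linear base learner that classifies each pixel can be sketched as follows. The abstract does not say which linear model is used, so ridge regression onto one-hot labels is an assumption here, as are the function names and the toy 3-dimensional pixel embeddings (a real system would obtain embeddings from the embedding module).

```python
def solve(a, b):
    """Solve A W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ridge_fit(features, labels, num_classes, lam=0.1):
    """Closed-form ridge regression: W = (X^T X + lam * I)^-1 X^T Y, Y one-hot."""
    d = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [[sum(f[i] for f, y in zip(features, labels) if y == c)
          for c in range(num_classes)] for i in range(d)]
    return solve(a, b)

def predict(weights, feature):
    """Return the class whose linear score is highest for one pixel embedding."""
    num_classes = len(weights[0])
    scores = [sum(feature[i] * weights[i][c] for i in range(len(feature)))
              for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: scores[c])

# Toy support set: six labeled pixel embeddings from a 3-way episode (assumed values).
support_pixels = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
                  [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
support_labels = [0, 0, 1, 1, 2, 2]

weights = ridge_fit(support_pixels, support_labels, num_classes=3)

# Unlabeled query pixels are classified independently with the fitted weights.
query_pixels = [[0.8, 0.2, 0.1], [0.1, 0.1, 0.7], [0.2, 0.9, 0.1]]
predicted = [predict(weights, q) for q in query_pixels]
```

A closed-form learner of this kind is fit afresh on the support pixels of each episode, and it handles K > 1 classes simply by adding columns to the weight matrix.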

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
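
The two steps named in this abstract, a circular convolution of entity and type embeddings followed by a translation-based score, can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, the L1 distance is one common choice for translation-based scores, and the exact form of TransET's score function is not given in the abstract.

```python
def circular_convolution(a, b):
    """Circular convolution: result[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def score(head, head_type, relation, tail, tail_type):
    """Translation-style score ||h' + r - t'||_1 on type-specific representations.

    Lower values mean the triple is considered more plausible.
    """
    h = circular_convolution(head, head_type)
    t = circular_convolution(tail, tail_type)
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, relation, t))

# With the type embedding [1, 0, 0], circular convolution leaves a vector unchanged,
# so the example reduces to the plain translation check h + r = t.
identity_type = [1, 0, 0]
plausible = score([1, 2, 3], identity_type, [1, 1, 1], [2, 3, 4], identity_type)
implausible = score([1, 2, 3], identity_type, [1, 1, 1], [0, 0, 0], identity_type)
```

In this toy case `plausible` is 0 because head plus relation equals tail exactly, while `implausible` is 9. In training, the embeddings would be learned so that observed triples score lower than corrupted ones.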

Efficient Rank-Based Diffusion Process with Assured Convergence

Journal of Imaging ◽

10.3390/jimaging7030049 ◽

2021 ◽

Vol 7 (3) ◽

pp. 49

Author(s):

Daniel Carlos Guimarães Pedronette ◽

Lucas Pascotti Valem ◽

Longin Jan Latecki

Keyword(s):

Diffusion Process ◽

Learning Strategies ◽

State Of The Art ◽

Representation Learning ◽

Theoretical Background ◽

High Dimensional ◽

Visual Features ◽

Learning Approaches ◽

Previous Decade ◽

Asymptotic Complexity

Visual features and representation learning strategies have advanced greatly in the previous decade, mainly supported by deep learning approaches. However, retrieval tasks are still performed mainly with traditional pairwise dissimilarity measures, while the learned representations lie on high-dimensional manifolds. With the aim of going beyond pairwise analysis, post-processing methods have been proposed to replace pairwise measures with globally defined measures capable of analyzing collections in terms of the underlying data manifold. The most representative approaches are diffusion and rank-based methods. While diffusion approaches can be computationally expensive, rank-based methods lack theoretical background. In this paper, we propose an efficient Rank-based Diffusion Process that combines both approaches and avoids the drawbacks of each. The resulting method is capable of efficiently approximating a diffusion process by exploiting rank-based information while assuring its convergence. The algorithm exhibits very low asymptotic complexity and can be computed regionally, making it suitable for out-of-dataset queries. An experimental evaluation conducted for image retrieval and person re-ID tasks on diverse datasets demonstrates the effectiveness of the proposed approach, with results comparable to the state of the art.

Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection

Signal Processing Image Communication ◽

10.1016/j.image.2011.04.001 ◽

2011 ◽

Vol 26 (10) ◽

pp. 612-627 ◽

Cited By ~ 2

Author(s):

Hyun-seok Min ◽

Jae Young Choi ◽

Wesley De Neve ◽

Yong Man Ro

Keyword(s):

Video Clip ◽

Visual Features ◽

Semantic Features ◽

Low Level ◽

High Level ◽

Duplicate Video

Color Based Image Retrieval by Combining Various Features

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3163.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 454-460

Keyword(s):

Image Retrieval ◽

Retrieval System ◽

Visual Features ◽

Semantic Features ◽

Retrieval Rate ◽

Image Retrieval System ◽

Mathematical Formulas ◽

Image Set ◽

Different Color ◽

Image Collection

Content-based image retrieval systems retrieve images according to strong features of interest, such as the color, texture, and shape of an image. Although visual features cannot be completely determined by semantic features, semantic features can still be integrated easily into mathematical formulas. This paper focuses on retrieving images from a large image collection based on color projection, applying segmentation and quantification to different color models and comparing the results. The method is applied to different categories of image sets, and its retrieval rate is evaluated under the different models.

Ensemble-based deep meta learning for medical image segmentation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219221 ◽

2021 ◽

pp. 1-7

Author(s):

Usman Ahmed ◽

Jerry Chun-Wei Lin ◽

Gautam Srivastava

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Learning Algorithm ◽

State Of The Art ◽

Medical Image Segmentation ◽

The State ◽

Data Set ◽

Deep Learning Algorithm ◽

Meta Learning ◽

Small Set

Deep learning methods have led to state-of-the-art medical applications, such as image classification and segmentation. Data-driven deep learning applications can help stakeholders to collaborate. However, limited labelled data restricts the ability of deep learning algorithms to generalize from one domain to another. To handle this problem, meta-learning helps to learn from a small set of data. We propose a meta-learning-based image segmentation model that combines the learning of state-of-the-art models and then uses it to achieve domain adaptation and high accuracy. We also propose a preprocessing algorithm to increase the usability of the segmented parts and remove noise from the new test image. The proposed model achieves 0.94 precision and 0.92 recall, an improvement of 3.3% over state-of-the-art algorithms.

Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6803 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11402-11409

Author(s):

Siqi Li ◽

Changqing Zou ◽

Yipeng Li ◽

Xibin Zhao ◽

Yue Gao

Keyword(s):

State Of The Art ◽

Semantic Segmentation ◽

Spatial Dimension ◽

Semantic Features ◽

Convolutional Network ◽

The Real ◽

Single View ◽

Depth Cues ◽

Semantic Scene ◽

3D Scene

This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods, which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously by leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in the spatial dimension. This is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset; the results show gains of 2.5% and 2.6%, respectively, over the state-of-the-art method.

Meta-Learning PAC-Bayes Priors in Model Averaging

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5841 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4198-4205

Author(s):

Yimin Huang ◽

Weiran Huang ◽

Liang Li ◽

Zhenguo Li

Keyword(s):

Model Averaging ◽

Selection Procedure ◽

Real Data ◽

Poor Quality ◽

Quality Data ◽

Main Challenge ◽

Meta Learning ◽

Model Set ◽

Base Learner ◽

Proper Priors

Model uncertainty has become one of the most important problems in both academia and industry. In this paper, we mainly consider the scenario in which a common model set is used for model averaging, instead of selecting a single final model via a model selection procedure, to account for model uncertainty and thereby improve the reliability and accuracy of inferences. Here, one main challenge is to learn the prior over the model set. To tackle this problem, we propose two data-based algorithms to obtain proper priors for model averaging. One is for the meta-learner: analysts use historical similar tasks to extract information about the prior. The other is for the base learner: a subsampling method is used to process the data step by step. Theoretically, an upper bound on the risk of our algorithm is presented to guarantee worst-case performance. In practice, both methods perform well in simulations and real data studies, especially with poor-quality data.
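
To make the role of the prior concrete, the sketch below shows generic Bayesian model averaging, in which each model's weight is proportional to its prior times its likelihood. This is a textbook formulation used here as an assumption. It is not the paper's PAC-Bayes procedure for learning the prior, and the function names are hypothetical.

```python
import math

def posterior_weights(prior, log_likelihoods):
    """Weights proportional to prior * likelihood, computed stably in log space."""
    log_post = [math.log(p) + ll for p, ll in zip(prior, log_likelihoods)]
    shift = max(log_post)  # subtract the maximum to avoid underflow in exp
    unnorm = [math.exp(v - shift) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def averaged_prediction(weights, predictions):
    """Combine per-model predictions using the model weights."""
    return sum(w * p for w, p in zip(weights, predictions))

# Two candidate models that fit the data equally well (equal log-likelihoods):
# the averaged prediction is then driven entirely by the prior.
uniform = posterior_weights([0.5, 0.5], [-3.0, -3.0])
informed = posterior_weights([0.8, 0.2], [-3.0, -3.0])
```

When the data are scarce or of poor quality, the likelihood terms discriminate weakly between models, so the choice of prior dominates the averaged prediction. This is one way to read the abstract's emphasis on obtaining proper priors.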

Pattern Based Feature Construction in Semantic Data Mining

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2014010102 ◽

2014 ◽

Vol 10 (1) ◽

pp. 27-65 ◽

Cited By ~ 11

Author(s):

Agnieszka Ławrynowicz ◽

Jędrzej Potoniec

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Semantic Features ◽

Semantic Data ◽

Data Mining Approach ◽

Meta Learning ◽

New Type ◽

Domain Ontologies ◽

Semantic Data Mining

The authors propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach in which domain ontologies are used as background knowledge and the new challenge is to mine knowledge encoded in domain ontologies rather than purely empirical data. The authors have developed a tool that implements this approach. Using this tool, the authors have conducted an experimental evaluation, including a comparison of the method to state-of-the-art approaches to classification of semantic data and an experimental study within the emerging subfield of meta-learning called semantic meta-mining. The most important research contributions of the paper to the state of the art are as follows. For pattern mining research, or relational learning in general, the paper contributes a new algorithm for the discovery of a new type of pattern. For Semantic Web research, it theoretically and empirically illustrates how semantic, structured data can be used in traditional machine learning methods through a pattern-based approach for constructing semantic features.
