One-Shot Learning for Long-Tail Visual Relation Detection

2020 ◽  
Vol 34 (07) ◽  
pp. 12225-12232
Author(s):  
Weitao Wang ◽  
Meng Wang ◽  
Sen Wang ◽  
Guodong Long ◽  
Lina Yao ◽  
...  

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems of feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to focus on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improves performance on the two tasks PredCls and SGCls, by 2.8% to 12.2% over state-of-the-art baselines.
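As a minimal sketch of the kind of feature-level attention described above (not the authors' implementation; the layer sizes and the sigmoid gating below are illustrative assumptions), each feature channel of an object or predicate embedding can be gated by a learned weight before it enters the dual graph network:

import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel-wise attention that re-weights sparse object/predicate features."""
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, x):                          # x: (batch, feat_dim)
        weights = torch.sigmoid(self.scorer(x))    # per-channel weights in (0, 1)
        return x * weights                         # emphasise discriminative channels

# Attended embeddings would then feed the dual graph neural network.
att = FeatureAttention()
obj_embeddings = att(torch.randn(8, 512))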

Algorithmica ◽  
2021 ◽  
Author(s):  
Giordano Da Lozzo ◽  
David Eppstein ◽  
Michael T. Goodrich ◽  
Siddharth Gupta

For a clustered graph, i.e., a graph whose vertex set is recursively partitioned into clusters, the C-Planarity Testing problem asks whether it is possible to find a planar embedding of the graph and a representation of each cluster as a region homeomorphic to a closed disk such that (1) the subgraph induced by each cluster is drawn in the interior of the corresponding disk, (2) each edge intersects any disk at most once, and (3) the nesting between clusters is reflected by the representation, i.e., child clusters are properly contained in their parent cluster. The computational complexity of this problem, whose study has been central to the theory of graph visualization since its introduction in 1995 [Feng, Cohen, and Eades, Planarity for clustered graphs, ESA’95], has only recently been settled [Fulek and Tóth, Atomic Embeddability, Clustered Planarity, and Thickenability, to appear at SODA’20]. Before such a breakthrough, the complexity question was still unsolved even when the graph has a prescribed planar embedding, i.e., for embedded clustered graphs. We show that the C-Planarity Testing problem admits a single-exponential single-parameter FPT (resp., XP) algorithm for embedded flat (resp., non-flat) clustered graphs, when parameterized by the carving-width of the dual graph of the input. These are the first FPT and XP algorithms for this long-standing open problem with respect to a single notable graph-width parameter. Moreover, the polynomial dependency of our FPT algorithm is smaller than that of the algorithm by Fulek and Tóth. In particular, our algorithm runs in quadratic time for flat instances of bounded treewidth and bounded face size. To further strengthen the relevance of this result, we show that an algorithm with running time O(r(n)) for flat instances whose underlying graph has pathwidth 1 would result in an algorithm with running time O(r(n)) for flat instances and with running time O(r(n^2) + n^2) for general, possibly non-flat, instances.
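Restating the last claim as a single implication (no new content, just the abstract's bounds written out):

\[
\text{flat, pathwidth-1 instances solvable in } O(r(n)) \;\Longrightarrow\;
\begin{cases}
O(r(n)) & \text{for all flat instances},\\
O\!\left(r(n^{2}) + n^{2}\right) & \text{for general (possibly non-flat) instances.}
\end{cases}
\]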


2020 ◽  
Vol 29 (03) ◽  
pp. 2050004
Author(s):  
Hery Randriamaro

The Tutte polynomial is originally a bivariate polynomial which enumerates the colorings of a graph and of its dual graph. In 2007, Ardila extended the definition of the Tutte polynomial to real hyperplane arrangements. In particular, he computed the Tutte polynomials of the hyperplane arrangements associated to the classical Weyl groups. Those associated to the exceptional Weyl groups were computed by De Concini and Procesi one year later. This paper has two objectives: on the one hand, we extend the computation of the Tutte polynomial to complex hyperplane arrangements; on the other hand, we introduce a wider class of hyperplane arrangements, that of the symmetric hyperplane arrangements. Computing the Tutte polynomial of a symmetric hyperplane arrangement permits us to deduce the Tutte polynomials of some hyperplane arrangements, particularly of those associated to the imprimitive reflection groups.
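For reference (this is the standard graph-theoretic definition being generalized, not a formula taken from the paper), the Tutte polynomial of a graph G = (V, E) with rank function r can be written in corank-nullity form as

\[
T_G(x, y) \;=\; \sum_{A \subseteq E} (x - 1)^{\,r(E) - r(A)} \, (y - 1)^{\,|A| - r(A)},
\]

where r(A) is the number of vertices minus the number of connected components of the spanning subgraph (V, A). Ardila's arrangement version follows essentially the same corank-nullity pattern, with subsets of hyperplanes and the rank of their intersections in place of edge subsets.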


2018 ◽  
Vol 10 (9) ◽  
pp. 3245 ◽  
Author(s):  
Tianxing Wu ◽  
Guilin Qi ◽  
Cheng Li ◽  
Meng Wang

With the continuous development of intelligent technologies, the knowledge graph, the backbone of artificial intelligence, has attracted much attention from both the academic and industrial communities due to its powerful capability for knowledge representation and reasoning. In recent years, knowledge graphs have been widely applied in different kinds of applications, such as semantic search, question answering, knowledge management and so on. Techniques for building Chinese knowledge graphs are also developing rapidly, and different Chinese knowledge graphs have been constructed to support various applications. Against the background of the “One Belt One Road (OBOR)” initiative, cooperating with the countries along OBOR on studying knowledge graph techniques and applications will greatly promote the development of artificial intelligence. At the same time, China's accumulated experience in developing knowledge graphs is also a valuable reference for developing non-English knowledge graphs. In this paper, we aim to introduce the techniques for constructing Chinese knowledge graphs and their applications, as well as to analyse the impact of knowledge graphs on OBOR. We first describe the background of OBOR, and then introduce the concept and development history of the knowledge graph and typical Chinese knowledge graphs. Afterwards, we present the details of techniques for constructing Chinese knowledge graphs, and demonstrate several applications of Chinese knowledge graphs. Finally, we list some examples to explain the potential impacts of knowledge graphs on OBOR.
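As a toy illustration of the triple-based representation underlying such knowledge graphs (the facts and the query helper below are made-up examples for illustration, not taken from any of the graphs discussed):

# A toy knowledge graph stored as subject-predicate-object triples.
triples = [
    ("Beijing", "capital_of", "China"),
    ("China", "participates_in", "One Belt One Road"),
]

def subjects_of(predicate, obj):
    """Return all subjects linked to `obj` by `predicate` (a minimal semantic-search step)."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(subjects_of("capital_of", "China"))  # ['Beijing']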


AI Magazine ◽  
2016 ◽  
Vol 37 (1) ◽  
pp. 63-72 ◽  
Author(s):  
C. Lawrence Zitnick ◽  
Aishwarya Agrawal ◽  
Stanislaw Antol ◽  
Margaret Mitchell ◽  
Dhruv Batra ◽  
...  

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine’s ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human-generated questions about images. With around 10 million human-generated answers, machines can be evaluated easily.
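A minimal sketch of the consensus-style accuracy commonly used with this dataset, where each question comes with ten human answers (an illustration of the metric, not the article's released evaluation code):

def vqa_accuracy(machine_answer, human_answers):
    """Accuracy = min(#humans who gave the same answer / 3, 1)."""
    matches = sum(1 for a in human_answers if a == machine_answer)
    return min(matches / 3.0, 1.0)

# Example: four of the ten annotators agree with the machine, so it gets full credit.
humans = ["red"] * 4 + ["maroon"] * 3 + ["dark red"] * 3
print(vqa_accuracy("red", humans))  # 1.0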


2020 ◽  
Vol 34 (07) ◽  
pp. 13041-13049 ◽  
Author(s):  
Luowei Zhou ◽  
Hamid Palangi ◽  
Lei Zhang ◽  
Houdong Hu ◽  
Jason Corso ◽  
...  

This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large number of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in the context on which the prediction conditions. This is controlled by using task-specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
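A minimal sketch of how the two self-attention masks can be constructed for the shared transformer (the helper below and its shapes are assumptions for illustration; the released code at the URL above is the authoritative reference):

import torch

def build_attention_mask(num_image_tokens, num_text_tokens, mode="seq2seq"):
    """1 = may attend, 0 = blocked; rows are queries, columns are keys."""
    n = num_image_tokens + num_text_tokens
    if mode == "bidirectional":
        return torch.ones(n, n)                    # every token sees every token
    mask = torch.zeros(n, n)
    mask[:, :num_image_tokens] = 1                 # all tokens see the image context
    causal = torch.tril(torch.ones(num_text_tokens, num_text_tokens))
    mask[num_image_tokens:, num_image_tokens:] = causal   # text attends left-to-right
    return mask

print(build_attention_mask(2, 3, mode="seq2seq"))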


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1652 ◽  
Author(s):  
Peida Wu ◽  
Ziguan Cui ◽  
Zongliang Gan ◽  
Feng Liu

In recent years, deep learning methods have been widely used in hyperspectral image (HSI) classification tasks. Among them, spectral-spatial combined methods based on three-dimensional (3-D) convolution have shown good performance. However, because of the three-dimensional convolution, increasing the network depth results in a dramatic rise in the number of parameters. In addition, previous methods do not make full use of spectral information: they mostly use the data after dimensionality reduction directly as the input of the networks, which results in poor classification ability for some categories with small numbers of samples. To address these two issues, in this paper we designed an end-to-end 3D-ResNeXt network which further adopts feature fusion and a label smoothing strategy. On the one hand, the residual connections and the split-transform-merge strategy can alleviate the accuracy-degradation phenomenon and decrease the number of parameters. We can adjust the hyperparameter cardinality, instead of the network depth, to extract more discriminative features of HSIs and improve the classification accuracy. On the other hand, in order to improve the classification accuracies of classes with small numbers of samples, we enrich the input of the 3D-ResNeXt spectral-spatial feature learning network with additional spectral feature learning, and finally use a loss function modified by the label smoothing strategy to address the class imbalance. The experimental results on three popular HSI datasets demonstrate the superiority of our proposed network and an effective improvement in accuracy, especially for classes with small numbers of training samples.
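A minimal sketch of a label-smoothing cross-entropy loss of the kind described above (the smoothing factor eps and the tensor shapes are illustrative assumptions, not the paper's settings):

import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy against soft targets: 1 - eps on the true class, eps spread elsewhere."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()

# Dummy HSI class scores for a batch of 4 pixels and 9 land-cover classes.
loss = label_smoothing_ce(torch.randn(4, 9), torch.tensor([0, 3, 3, 8]))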


2013 ◽  
Vol 441 ◽  
pp. 738-741 ◽  
Author(s):  
Shuo Ding ◽  
Xiao Heng Chang ◽  
Qing Hui Wu

This paper first introduces the network model of the probabilistic neural network and its method of pattern classification and discrimination. Then, a probabilistic neural network and three commonly used back-propagation neural networks are built in MATLAB 7.0. Taking the pattern classification of dots on a two-dimensional plane as an example, the probabilistic neural network and the improved back-propagation neural networks are used to classify these dots, and their classification results are compared with each other. The simulation results show that, compared with back-propagation neural networks, the probabilistic neural network has simpler learning rules, trains faster, and needs fewer training samples; the pattern classification method based on the probabilistic neural network is very effective and is superior to the one based on back-propagation neural networks in classification speed, accuracy, and generalization ability.
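A minimal NumPy re-creation of probabilistic-neural-network classification for dots on a plane (the paper works in MATLAB 7.0; this sketch and the smoothing parameter sigma are illustrative assumptions):

import numpy as np

def pnn_predict(train_x, train_y, test_x, sigma=0.3):
    """Classify each test point by the class whose Gaussian (Parzen) kernel response is largest."""
    classes = np.unique(train_y)
    preds = []
    for x in test_x:
        k = np.exp(-np.sum((train_x - x) ** 2, axis=1) / (2 * sigma ** 2))  # pattern layer
        scores = [k[train_y == c].mean() for c in classes]                  # summation layer
        preds.append(classes[int(np.argmax(scores))])                       # decision layer
    return np.array(preds)

# Two clusters of dots on a 2-D plane, one per class.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal([0, 0], 0.2, (20, 2)), rng.normal([1, 1], 0.2, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)
print(pnn_predict(train_x, train_y, np.array([[0.1, 0.0], [0.9, 1.1]])))  # [0 1]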

