One-Shot Learning for Long-Tail Visual Relation Detection

2020 ◽  
Vol 34 (07) ◽  
pp. 12225-12232
Author(s):  
Weitao Wang ◽  
Meng Wang ◽  
Sen Wang ◽  
Guodong Long ◽  
Lina Yao ◽  
...  

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems of feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to focus on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improves performance on the two tasks PredCls and SGCls, by 2.8% to 12.2% over state-of-the-art baselines.
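As a minimal sketch of the kind of feature-level attention described above (not the authors' implementation; the layer sizes and the sigmoid gating below are illustrative assumptions), each feature channel of an object or predicate embedding can be gated by a learned weight before it enters the dual graph network:

import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel-wise attention that re-weights sparse object/predicate features."""
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, x):                          # x: (batch, feat_dim)
        weights = torch.sigmoid(self.scorer(x))    # per-channel weights in (0, 1)
        return x * weights                         # emphasise discriminative channels

# Attended embeddings would then feed the dual graph neural network.
att = FeatureAttention()
obj_embeddings = att(torch.randn(8, 512))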

Algorithmica ◽  
2021 ◽  
Author(s):  
Giordano Da Lozzo ◽  
David Eppstein ◽  
Michael T. Goodrich ◽  
Siddharth Gupta

For a clustered graph, i.e., a graph whose vertex set is recursively partitioned into clusters, the C-Planarity Testing problem asks whether it is possible to find a planar embedding of the graph and a representation of each cluster as a region homeomorphic to a closed disk such that (1) the subgraph induced by each cluster is drawn in the interior of the corresponding disk, (2) each edge intersects any disk at most once, and (3) the nesting between clusters is reflected by the representation, i.e., child clusters are properly contained in their parent cluster. The computational complexity of this problem, whose study has been central to the theory of graph visualization since its introduction in 1995 [Feng, Cohen, and Eades, Planarity for clustered graphs, ESA’95], has only recently been settled [Fulek and Tóth, Atomic Embeddability, Clustered Planarity, and Thickenability, to appear at SODA’20]. Before such a breakthrough, the complexity question was still unsolved even when the graph has a prescribed planar embedding, i.e., for embedded clustered graphs. We show that the C-Planarity Testing problem admits a single-exponential single-parameter FPT (resp., XP) algorithm for embedded flat (resp., non-flat) clustered graphs, when parameterized by the carving-width of the dual graph of the input. These are the first FPT and XP algorithms for this long-standing open problem with respect to a single notable graph-width parameter. Moreover, the polynomial dependency of our FPT algorithm is smaller than that of the algorithm by Fulek and Tóth. In particular, our algorithm runs in quadratic time for flat instances of bounded treewidth and bounded face size. To further strengthen the relevance of this result, we show that an algorithm with running time O(r(n)) for flat instances whose underlying graph has pathwidth 1 would result in an algorithm with running time O(r(n)) for flat instances and with running time O(r(n^2) + n^2) for general, possibly non-flat, instances.
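Restating the last claim as a single implication (no new content, just the abstract's bounds written out):

\[
\text{flat, pathwidth-1 instances solvable in } O(r(n)) \;\Longrightarrow\;
\begin{cases}
O(r(n)) & \text{for all flat instances},\\
O\!\left(r(n^{2}) + n^{2}\right) & \text{for general (possibly non-flat) instances.}
\end{cases}
\]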


2020 ◽  
Vol 29 (03) ◽  
pp. 2050004
Author(s):  
Hery Randriamaro

The Tutte polynomial is originally a bivariate polynomial which enumerates the colorings of a graph and of its dual graph. In 2007, Ardila extended the definition of the Tutte polynomial to real hyperplane arrangements. In particular, he computed the Tutte polynomials of the hyperplane arrangements associated to the classical Weyl groups. Those associated to the exceptional Weyl groups were computed by De Concini and Procesi one year later. This paper has two objectives: on the one hand, we extend the computation of the Tutte polynomial to complex hyperplane arrangements; on the other hand, we introduce a wider class of hyperplane arrangements, that of the symmetric hyperplane arrangements. Computing the Tutte polynomial of a symmetric hyperplane arrangement permits us to deduce the Tutte polynomials of some hyperplane arrangements, particularly of those associated to the imprimitive reflection groups.
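For reference (this is the standard graph-theoretic definition being generalized, not a formula taken from the paper), the Tutte polynomial of a graph G = (V, E) with rank function r can be written in corank-nullity form as

\[
T_G(x, y) \;=\; \sum_{A \subseteq E} (x - 1)^{\,r(E) - r(A)} \, (y - 1)^{\,|A| - r(A)},
\]

where r(A) is the number of vertices minus the number of connected components of the spanning subgraph (V, A). Ardila's arrangement version follows essentially the same corank-nullity pattern, with subsets of hyperplanes and the rank of their intersections in place of edge subsets.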


2018 ◽  
Vol 10 (9) ◽  
pp. 3245 ◽  
Author(s):  
Tianxing Wu ◽  
Guilin Qi ◽  
Cheng Li ◽  
Meng Wang

With the continuous development of intelligent technologies, the knowledge graph, the backbone of artificial intelligence, has attracted much attention from both the academic and industrial communities due to its powerful capability for knowledge representation and reasoning. In recent years, knowledge graphs have been widely applied in different kinds of applications, such as semantic search, question answering, knowledge management and so on. Techniques for building Chinese knowledge graphs are also developing rapidly, and different Chinese knowledge graphs have been constructed to support various applications. Against the background of the “One Belt One Road (OBOR)” initiative, cooperating with the countries along OBOR on studying knowledge graph techniques and applications will greatly promote the development of artificial intelligence. At the same time, China's accumulated experience in developing knowledge graphs is also a valuable reference for developing non-English knowledge graphs. In this paper, we aim to introduce the techniques for constructing Chinese knowledge graphs and their applications, as well as to analyse the impact of knowledge graphs on OBOR. We first describe the background of OBOR, and then introduce the concept and development history of the knowledge graph and typical Chinese knowledge graphs. Afterwards, we present the details of techniques for constructing Chinese knowledge graphs, and demonstrate several applications of Chinese knowledge graphs. Finally, we list some examples to explain the potential impacts of knowledge graphs on OBOR.
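As a toy illustration of the triple-based representation underlying such knowledge graphs (the facts and the query helper below are made-up examples for illustration, not taken from any of the graphs discussed):

# A toy knowledge graph stored as subject-predicate-object triples.
triples = [
    ("Beijing", "capital_of", "China"),
    ("China", "participates_in", "One Belt One Road"),
]

def subjects_of(predicate, obj):
    """Return all subjects linked to `obj` by `predicate` (a minimal semantic-search step)."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(subjects_of("capital_of", "China"))  # ['Beijing']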


AI Magazine ◽  
2016 ◽  
Vol 37 (1) ◽  
pp. 63-72 ◽  
Author(s):  
C. Lawrence Zitnick ◽  
Aishwarya Agrawal ◽  
Stanislaw Antol ◽  
Margaret Mitchell ◽  
Dhruv Batra ◽  
...  

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which humans excel but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine’s ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human-generated questions about images. With around 10 million human-generated answers, machines can be evaluated easily.
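A minimal sketch of the consensus-style accuracy commonly used with this dataset, where each question comes with ten human answers (an illustration of the metric, not the article's released evaluation code):

def vqa_accuracy(machine_answer, human_answers):
    """Accuracy = min(#humans who gave the same answer / 3, 1)."""
    matches = sum(1 for a in human_answers if a == machine_answer)
    return min(matches / 3.0, 1.0)

# Example: four of the ten annotators agree with the machine, so it gets full credit.
humans = ["red"] * 4 + ["maroon"] * 3 + ["dark red"] * 3
print(vqa_accuracy("red", humans))  # 1.0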


2020 ◽  
Vol 34 (07) ◽  
pp. 13041-13049 ◽  
Author(s):  
Luowei Zhou ◽  
Hamid Palangi ◽  
Lei Zhang ◽  
Houdong Hu ◽  
Jason Corso ◽  
...  

This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large number of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in the context on which the prediction conditions. This is controlled by using task-specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
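A minimal sketch of how the two self-attention masks can be constructed for the shared transformer (the helper below and its shapes are assumptions for illustration; the released code at the URL above is the authoritative reference):

import torch

def build_attention_mask(num_image_tokens, num_text_tokens, mode="seq2seq"):
    """1 = may attend, 0 = blocked; rows are queries, columns are keys."""
    n = num_image_tokens + num_text_tokens
    if mode == "bidirectional":
        return torch.ones(n, n)                    # every token sees every token
    mask = torch.zeros(n, n)
    mask[:, :num_image_tokens] = 1                 # all tokens see the image context
    causal = torch.tril(torch.ones(num_text_tokens, num_text_tokens))
    mask[num_image_tokens:, num_image_tokens:] = causal   # text attends left-to-right
    return mask

print(build_attention_mask(2, 3, mode="seq2seq"))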


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1652 ◽  
Author(s):  
Peida Wu ◽  
Ziguan Cui ◽  
Zongliang Gan ◽  
Feng Liu

In recent years, deep learning methods have been widely used in hyperspectral image (HSI) classification tasks. Among them, spectral-spatial combined methods based on three-dimensional (3-D) convolution have shown good performance. However, because of the three-dimensional convolution, increasing the network depth results in a dramatic rise in the number of parameters. In addition, previous methods do not make full use of spectral information: they mostly use the data after dimensionality reduction directly as the input of the networks, which results in poor classification ability for some categories with small numbers of samples. To address these two issues, in this paper we designed an end-to-end 3D-ResNeXt network which further adopts feature fusion and a label smoothing strategy. On the one hand, the residual connections and the split-transform-merge strategy can alleviate the accuracy-degradation phenomenon and decrease the number of parameters. We can adjust the hyperparameter cardinality, instead of the network depth, to extract more discriminative features of HSIs and improve the classification accuracy. On the other hand, in order to improve the classification accuracies of classes with small numbers of samples, we enrich the input of the 3D-ResNeXt spectral-spatial feature learning network with additional spectral feature learning, and finally use a loss function modified by the label smoothing strategy to address the class imbalance. The experimental results on three popular HSI datasets demonstrate the superiority of our proposed network and an effective improvement in accuracy, especially for classes with small numbers of training samples.
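A minimal sketch of a label-smoothing cross-entropy loss of the kind described above (the smoothing factor eps and the tensor shapes are illustrative assumptions, not the paper's settings):

import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy against soft targets: 1 - eps on the true class, eps spread elsewhere."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()

# Dummy HSI class scores for a batch of 4 pixels and 9 land-cover classes.
loss = label_smoothing_ce(torch.randn(4, 9), torch.tensor([0, 3, 3, 8]))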


2013 ◽  
Vol 441 ◽  
pp. 738-741 ◽  
Author(s):  
Shuo Ding ◽  
Xiao Heng Chang ◽  
Qing Hui Wu

This paper first introduces the network model of the probabilistic neural network and its method of pattern classification and discrimination. Then, a probabilistic neural network and three commonly used back-propagation neural networks are built in MATLAB 7.0. Taking the pattern classification of dots on a two-dimensional plane as an example, the probabilistic neural network and the improved back-propagation neural networks are used to classify these dots, and their classification results are compared with each other. The simulation results show that, compared with back-propagation neural networks, the probabilistic neural network has simpler learning rules, trains faster, and needs fewer training samples; the pattern classification method based on the probabilistic neural network is very effective and is superior to the one based on back-propagation neural networks in classification speed, accuracy, and generalization ability.
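A minimal NumPy re-creation of probabilistic-neural-network classification for dots on a plane (the paper works in MATLAB 7.0; this sketch and the smoothing parameter sigma are illustrative assumptions):

import numpy as np

def pnn_predict(train_x, train_y, test_x, sigma=0.3):
    """Classify each test point by the class whose Gaussian (Parzen) kernel response is largest."""
    classes = np.unique(train_y)
    preds = []
    for x in test_x:
        k = np.exp(-np.sum((train_x - x) ** 2, axis=1) / (2 * sigma ** 2))  # pattern layer
        scores = [k[train_y == c].mean() for c in classes]                  # summation layer
        preds.append(classes[int(np.argmax(scores))])                       # decision layer
    return np.array(preds)

# Two clusters of dots on a 2-D plane, one per class.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal([0, 0], 0.2, (20, 2)), rng.normal([1, 1], 0.2, (20, 2))])
train_y = np.array([0] * 20 + [1] * 20)
print(pnn_predict(train_x, train_y, np.array([[0.1, 0.0], [0.9, 1.1]])))  # [0 1]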

