scholarly journals Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries

2020 ◽  
Vol 34 (07) ◽  
pp. 12152-12159
Author(s):  
Hao Wang ◽  
Cheng Deng ◽  
Fan Ma ◽  
Yi Yang

Actor and action video segmentation with language queries aims to segment out the expression referred objects in the video. This process requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly leverage dynamic convolutional networks to match visual and semantic representations. However, the dynamic convolution neglects spatial context when processing each region in the frame and is thus challenging to segment similar objects in the complex scenarios. To address such limitation, we construct a context modulated dynamic convolutional network. Specifically, we propose a context modulated dynamic convolutional operation in the proposed framework. The kernels for the specific region are generated from both language sentences and surrounding context features. Moreover, we devise a temporal encoder to incorporate motions into the visual features to further match the query descriptions. Extensive experiments on two benchmark datasets, Actor-Action Dataset Sentences (A2D Sentences) and J-HMDB Sentences, demonstrate that our proposed approach notably outperforms state-of-the-art methods.

2020 ◽  
Vol 34 (07) ◽  
pp. 11045-11052
Author(s):  
Linjiang Huang ◽  
Yan Huang ◽  
Wanli Ouyang ◽  
Liang Wang

Recently, graph convolutional networks have achieved remarkable performance for skeleton-based action recognition. In this work, we identify a problem posed by the GCNs for skeleton-based action recognition, namely part-level action modeling. To address this problem, a novel Part-Level Graph Convolutional Network (PL-GCN) is proposed to capture part-level information of skeletons. Different from previous methods, the partition of body parts is learnable rather than manually defined. We propose two part-level blocks, namely Part Relation block (PR block) and Part Attention block (PA block), which are achieved by two differentiable operations, namely graph pooling operation and graph unpooling operation. The PR block aims at learning high-level relations between body parts while the PA block aims at highlighting the important body parts in the action. Integrating the original GCN with the two blocks, the PL-GCN can learn both part-level and joint-level information of the action. Extensive experiments on two benchmark datasets show the state-of-the-art performance on skeleton-based action recognition and demonstrate the effectiveness of the proposed method.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.


Author(s):  
Zhichao Huang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most of existing GCN approaches are designed for the homogenous graphs with a single type of relation. However, heterogeneous graphs of multiple types of relations are also ubiquitous and there is a lack of methodologies to tackle such graphs. Some previous studies address the issue by performing conventional GCN on each single relation and then blending their results. However, as the convolutional kernels neglect the correlations across relations, the strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimension convolution operator extends the graph spectral analysis into the eigen-decomposition of a Laplacian tensor. And the eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform instead of limited merely to Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN against the state-of-the-art competitors.


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3818
Author(s):  
Ye Zhang ◽  
Yi Hou ◽  
Shilin Zhou ◽  
Kewei Ouyang

Recent advances in time series classification (TSC) have exploited deep neural networks (DNN) to improve the performance. One promising approach encodes time series as recurrence plot (RP) images for the sake of leveraging the state-of-the-art DNN to achieve accuracy. Such an approach has been shown to achieve impressive results, raising the interest of the community in it. However, it remains unsolved how to handle not only the variability in the distinctive region scale and the length of sequences but also the tendency confusion problem. In this paper, we tackle the problem using Multi-scale Signed Recurrence Plots (MS-RP), an improvement of RP, and propose a novel method based on MS-RP images and Fully Convolutional Networks (FCN) for TSC. This method first introduces phase space dimension and time delay embedding of RP to produce multi-scale RP images; then, with the use of asymmetrical structure, constructed RP images can represent very long sequences (>700 points). Next, MS-RP images are obtained by multiplying designed sign masks in order to remove the tendency confusion. Finally, FCN is trained with MS-RP images to perform classification. Experimental results on 45 benchmark datasets demonstrate that our method improves the state-of-the-art in terms of classification accuracy and visualization evaluation.


Algorithms ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 60 ◽  
Author(s):  
Wen Liu ◽  
Yankui Sun ◽  
Qingge Ji

Optical coherence tomography (OCT) is an optical high-resolution imaging technique for ophthalmic diagnosis. In this paper, we take advantages of multi-scale input, multi-scale side output and dual attention mechanism and present an enhanced nested U-Net architecture (MDAN-UNet), a new powerful fully convolutional network for automatic end-to-end segmentation of OCT images. We have evaluated two versions of MDAN-UNet (MDAN-UNet-16 and MDAN-UNet-32) on two publicly available benchmark datasets which are the Duke Diabetic Macular Edema (DME) dataset and the RETOUCH dataset, in comparison with other state-of-the-art segmentation methods. Our experiment demonstrates that MDAN-UNet-32 achieved the best performance, followed by MDAN-UNet-16 with smaller parameter, for multi-layer segmentation and multi-fluid segmentation respectively.


2020 ◽  
Vol 34 (07) ◽  
pp. 11547-11554
Author(s):  
Bo Liu ◽  
Qiulei Dong ◽  
Zhanyi Hu

Recently, many zero-shot learning (ZSL) methods focused on learning discriminative object features in an embedding feature space, however, the distributions of the unseen-class features learned by these methods are prone to be partly overlapped, resulting in inaccurate object recognition. Addressing this problem, we propose a novel adversarial network to synthesize compact semantic visual features for ZSL, consisting of a residual generator, a prototype predictor, and a discriminator. The residual generator is to generate the visual feature residual, which is integrated with a visual prototype predicted via the prototype predictor for synthesizing the visual feature. The discriminator is to distinguish the synthetic visual features from the real ones extracted from an existing categorization CNN. Since the generated residuals are generally numerically much smaller than the distances among all the prototypes, the distributions of the unseen-class features synthesized by the proposed network are less overlapped. In addition, considering that the visual features from categorization CNNs are generally inconsistent with their semantic features, a simple feature selection strategy is introduced for extracting more compact semantic visual features. Extensive experimental results on six benchmark datasets demonstrate that our method could achieve a significantly better performance than existing state-of-the-art methods by ∼1.2-13.2% in most cases.


Author(s):  
Liang Yang ◽  
Zesheng Kang ◽  
Xiaochun Cao ◽  
Di Jin ◽  
Bo Yang ◽  
...  

In the past few years, semi-supervised node classification in attributed network has been developed rapidly. Inspired by the success of deep learning, researchers adopt the convolutional neural network to develop the Graph Convolutional Networks (GCN), and they have achieved surprising classification accuracy by considering the topological information and employing the fully connected network (FCN). However, the given network topology may also induce a performance degradation if it is directly employed in classification, because it may possess high sparsity and certain noises. Besides, the lack of learnable filters in GCN also limits the performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Networks (TO-GCN) to fully utilize the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN against the state-of-the-art approaches.


Author(s):  
Xiaotong Zhang ◽  
Han Liu ◽  
Qimai Li ◽  
Xiao-Ming Wu

Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.


Author(s):  
Shengqiong Wu ◽  
Hao Fei ◽  
Yafeng Ren ◽  
Donghong Ji ◽  
Jingye Li

In this paper, we propose to enhance the pair-wise aspect and opinion terms extraction (PAOTE) task by incorporating rich syntactic knowledge. We first build a syntax fusion encoder for encoding syntactic features, including a label-aware graph convolutional network (LAGCN) for modeling the dependency edges and labels, as well as the POS tags unifiedly, and a local-attention module encoding POS tags for better term boundary detection. During pairing, we then adopt Biaffine and Triaffine scoring for high-order aspect-opinion term pairing, in the meantime re-harnessing the syntax-enriched representations in LAGCN for syntactic-aware scoring. Experimental results on four benchmark datasets demonstrate that our model outperforms current state-of-the-art baselines, meanwhile yielding explainable predictions with syntactic knowledge.


2020 ◽  
Vol 10 (12) ◽  
pp. 4081
Author(s):  
Zhe Wang ◽  
Chun-Hua Wu ◽  
Qing-Biao Li ◽  
Bo Yan ◽  
Kang-Feng Zheng

Personality recognition is a classic and important problem in social engineering. Due to the small number and particularity of personality recognition databases, only limited research has explored convolutional neural networks for this task. In this paper, we explore the use of graph convolutional network techniques for inferring a user’s personality traits from their Facebook status updates or essay information. Since the basic five personality traits (such as openness) and their aspects (such as status information) are related to a wide range of text features, this work takes the Big Five personality model as the core of the study. We construct a single user personality graph for the corpus based on user-document relations, document-word relations, and word co-occurrence and then learn the personality graph convolutional networks (personality GCN) for the user. The parameters or the inputs of our personality GCN are initialized with a one-hot representation for users, words and documents; then, under the supervision of users and documents with known class labels, it jointly learns the embeddings for users, words, and documents. We used feature information sharing to incorporate the correlation between the five personality traits into personality recognition to perfect the personality GCN. Our experimental results on two public and authoritative benchmark datasets show that the general personality GCN without any external word embeddings or knowledge is superior to the state-of-the-art methods for personality recognition. The personality GCN method is efficient on small datasets, and the average F1-score and accuracy of personality recognition are improved by up to approximately 3.6% and 2.4–2.57%, respectively.


Sign in / Sign up

Export Citation Format

Share Document