Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal

Facial action unit (AU) recognition is a crucial task for facial expression analysis and has attracted extensive attention in the fields of artificial intelligence and computer vision. Existing works have either focused on designing or learning complex regional feature representations, or delved into various types of AU relationship modeling. Albeit with varying degrees of progress, it is still arduous for existing methods to handle complex situations. In this paper, we investigate how to integrate the semantic relationship propagation between AUs in a deep neural network framework to enhance the feature representation of facial regions, and propose an AU semantic relationship embedded representation learning (SRERL) framework. Specifically, by analyzing the symbiosis and mutual exclusion of AUs in various facial expressions, we organize the facial AUs in the form of a structured knowledge graph and integrate a Gated Graph Neural Network (GGNN) in a multi-scale CNN framework to propagate node information through the graph and generate enhanced AU representations. As the learned features involve both appearance characteristics and AU relationship reasoning, the proposed model is more robust and can cope with more challenging cases, e.g., illumination change and partial occlusion. Extensive experiments on two public benchmarks demonstrate that our method outperforms previous work and achieves state-of-the-art performance.
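The gated propagation over the AU graph can be illustrated with a small sketch. It follows the standard GGNN update (update gate, reset gate, candidate state, then interpolation between old and candidate state). The signed adjacency encoding (+1 for co-occurring AUs, -1 for mutually exclusive ones), the name `ggnn_step`, and the toy weights are assumptions for illustration only; this is not the SRERL implementation, which operates on multi-scale CNN features and may encode relation types differently.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # dense matrix-vector product on nested lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def ggnn_step(h, adj, Wz, Uz, Wr, Ur, W, U):
    """One gated propagation step over an AU relationship graph (illustrative sketch).

    h   -- list of node state vectors, one per AU
    adj -- adj[i][j] is the (assumed signed) edge weight from AU j to AU i
    """
    n, d = len(h), len(h[0])
    # message aggregation: a_i = sum_j adj[i][j] * h_j
    a = [[sum(adj[i][j] * h[j][k] for j in range(n)) for k in range(d)] for i in range(n)]
    new_h = []
    for i in range(n):
        z = [sigmoid(x) for x in vec_add(matvec(Wz, a[i]), matvec(Uz, h[i]))]  # update gate
        r = [sigmoid(x) for x in vec_add(matvec(Wr, a[i]), matvec(Ur, h[i]))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h[i])]
        cand = [math.tanh(x) for x in vec_add(matvec(W, a[i]), matvec(U, rh))]  # candidate state
        # interpolate between the previous state and the candidate
        new_h.append([(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h[i], cand)])
    return new_h


if __name__ == "__main__":
    # toy graph of three hypothetical AUs: AU0/AU1 co-occur, AU0/AU2 exclude each other
    adj = [[0, 1, -1], [1, 0, 0], [-1, 0, 0]]
    h = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(3):  # a few propagation steps
        h = ggnn_step(h, adj, eye, eye, eye, eye, eye, eye)
    print(h)
```

In a trained model the gate and candidate matrices are learned, and the propagated node states would be combined with the regional appearance features before AU classification.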

Unsupervised Representation Learning with Deep Convolutional Neural Network for Remote Sensing Images

Lecture Notes in Computer Science - Image and Graphics; DOI 10.1007/978-3-319-71589-6_9; 2017; pp. 97-108; Cited by ~17

Author(s): Yang Yu, Zhiqiang Gong, Ping Zhong, Jiaxin Shan

Keyword(s): Neural Network, Remote Sensing, Convolutional Neural Network, Representation Learning, Deep Convolutional Neural Network, Remote Sensing Images

Generalized Sparse Convolutional Neural Networks for Semantic Segmentation of Point Clouds Derived from Tri-Stereo Satellite Imagery

Remote Sensing; DOI 10.3390/rs12081289; 2020; Vol 12 (8); pp. 1289

Author(s): Stefan Bachhofner, Ana-Maria Loghin, Johannes Otepka, Norbert Pfeifer, Michael Hornacek, ...

Keyword(s): Neural Network, Decision Tree, Convolutional Neural Network, Semantic Segmentation, Point Clouds, Representation Learning, Geometric Features, Color Information, Geometric Information, Learning Technique

We studied the applicability of point clouds derived from tri-stereo satellite imagery to semantic segmentation with generalized sparse convolutional neural networks, using an Austrian study area as an example. In particular, we examined whether the distorted geometric information, in addition to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. To this end, we trained a fully convolutional neural network that uses generalized sparse convolution once solely on 3D geometric information (i.e., the 3D point cloud derived by dense image matching), and twice on both 3D geometric and color information. In the first experiment, we did not use class weights, whereas in the second we did. We compared the results with a fully convolutional neural network trained on a 2D orthophoto, and with a decision tree trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric and color features. The decision tree using hand-crafted features has been successfully applied to aerial laser scanning data in the literature. Hence, we compared our main subject of study, a representation learning technique, with another representation learning technique and with a non-representation learning technique. Our study area is located in Waldviertel, a region in Lower Austria. The territory is hilly and covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily unbalanced. However, we did not use any data augmentation techniques to counter overfitting. For our study area, we report that using geometric and color information together improves the performance of the Generalized Sparse Convolutional Neural Network (GSCNN) only on the dominant class, which leads to a higher overall performance in our case. We also found that training the network with median class weighting partially reverts the effects of adding color.
The network also started to learn the less frequent classes. The fully convolutional neural network trained on the 2D orthophoto generally outperforms the other two, with a kappa score of over 90% and an average per-class accuracy of 61%. However, the decision tree trained on colors and hand-crafted geometric features has a 2% higher accuracy for roads.
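The class weighting and the two reported metrics can be made concrete with a short sketch. It assumes a simplified median-frequency scheme (weight of class c = median class frequency / frequency of c, with frequency = share of all labelled points), Cohen's kappa computed from a confusion matrix, and "average per-class accuracy" read as the mean of per-class recall. These definitions, the function names, and the toy counts are assumptions; the study's exact weighting formula is not given in the abstract.

```python
from statistics import median


def median_class_weights(counts):
    """Weight for class c = median(freq) / freq(c), with freq(c) = count(c) / total.

    Simplified median-frequency balancing; classes with zero count are skipped.
    """
    total = sum(counts.values())
    freq = {c: n / total for c, n in counts.items() if n > 0}
    med = median(freq.values())
    return {c: med / f for c, f in freq.items()}


def cohen_kappa(cm):
    """Cohen's kappa from a confusion matrix cm[true][predicted]."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def average_per_class_accuracy(cm):
    """Mean of per-class recall over classes present in the ground truth."""
    accs = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(accs) / len(accs)


if __name__ == "__main__":
    # hypothetical point counts for an unbalanced three-class problem
    print(median_class_weights({"a": 80, "b": 15, "c": 5}))
    cm = [[45, 5], [10, 40]]  # hypothetical confusion matrix
    print(cohen_kappa(cm), average_per_class_accuracy(cm))
```

With these weights the dominant class is down-weighted (below 1) and rare classes are up-weighted (above 1), which is consistent with the observation that weighting lets the network start learning the less frequent classes.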

An Attention-Based Graph Neural Network for Heterogeneous Structural Learning

Proceedings of the AAAI Conference on Artificial Intelligence; DOI 10.1609/aaai.v34i04.5833; 2020; Vol 34 (04); pp. 4132-4139

Author(s): Huiting Hong, Hantao Guo, Yucheng Lin, Xiaoqing Yang, Zang Li, ...

Keyword(s): Neural Network, Structural Information, Representation Learning, Graph Representation, Heterogeneous Information, Domain Experts, Proposed Model, Meta Path, Low Dimensional, Public Datasets

In this paper, we focus on graph representation learning of heterogeneous information networks (HINs), in which various types of vertices are connected by various types of relations. Most existing methods for HINs revise homogeneous graph embedding models via meta-paths to learn a low-dimensional vector space of the HIN. Here, we propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) that directly encodes the structural information of an HIN without meta-paths and achieves more informative representations. With this method, domain experts are no longer needed to design meta-path schemes, and the heterogeneous information can be processed automatically by the proposed model. Specifically, we implicitly represent heterogeneous information using the following two methods: 1) we model the transformation between heterogeneous vertices through a projection in low-dimensional entity spaces; 2) we then apply the graph neural network to aggregate multi-relational information of the projected neighborhood by means of an attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in the HIN, a cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning to make full use of information. Experiments conducted on three public datasets demonstrate that the proposed models achieve significant and consistent improvements over state-of-the-art solutions.
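The two steps named above (type-specific projection, then attention-weighted aggregation of the projected neighborhood) can be sketched as follows. The scoring function here is GAT-style additive attention over the concatenation of target and projected neighbor features; that choice, the `proj` dictionary keyed by (source type, target type), and the function name are assumptions for illustration, and HetSANN's actual attention and its extensions may differ.

```python
import math


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def hetero_attention_aggregate(h_target, target_type, neighbours, proj, attn):
    """
    h_target    -- feature vector of the target vertex, in its own type's space
    neighbours  -- list of (feature_vector, vertex_type) pairs
    proj        -- proj[(src_type, dst_type)] maps src_type features into dst_type space
    attn        -- attention weight vector of length 2 * len(h_target)
    Returns the aggregated vector and the attention weights.
    """
    # 1) project each neighbour into the target vertex's entity space
    projected = [matvec(proj[(t, target_type)], h) for h, t in neighbours]
    # 2) score each neighbour from the concatenation [h_target ; projected neighbour]
    scores = [leaky_relu(sum(a * x for a, x in zip(attn, h_target + p))) for p in projected]
    alpha = softmax(scores)
    # 3) attention-weighted sum of the projected neighbours
    d = len(projected[0])
    agg = [sum(alpha[k] * projected[k][i] for k in range(len(projected))) for i in range(d)]
    return agg, alpha


if __name__ == "__main__":
    # hypothetical bibliographic HIN: a "paper" vertex with two "author" and one "venue" neighbour
    proj = {
        ("author", "paper"): [[1.0, 0.0], [0.0, 1.0]],
        ("venue", "paper"): [[0.5, 0.5], [0.5, -0.5]],
    }
    neighbours = [([1.0, 0.0], "author"), ([0.0, 1.0], "author"), ([2.0, 0.0], "venue")]
    agg, alpha = hetero_attention_aggregate([1.0, 1.0], "paper", neighbours, proj, [0.1, 0.2, 0.3, -0.1])
    print(agg, alpha)
```

Because each neighbor is first mapped into the target's entity space, vertices of different types can be compared and summed without hand-designed meta-paths, which is the point the abstract emphasizes.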

A Graph Regularized Deep Neural Network for Unsupervised Image Representation Learning

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); DOI 10.1109/cvpr.2017.746; 2017; Cited by ~13

Author(s): Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, Qingming Huang

Keyword(s): Neural Network, Deep Neural Network, Image Representation, Representation Learning

Unsupervised Inductive Graph-Level Representation Learning via Graph-Graph Proximity

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence; DOI 10.24963/ijcai.2019/275; 2019; Cited by ~5

Author(s): Yunsheng Bai, Hao Ding, Yang Qiao, Agustin Marinovic, Ken Gu, ...

Keyword(s): Neural Network, Vector Space, General Framework, Representation Learning, Generation Mechanism, Graph Visualization, Graph Classification, Training Set, Multi Scale, Novel Approach

We introduce a novel approach to graph-level representation learning, which embeds an entire graph into a vector space such that the embeddings of two graphs preserve their graph-graph proximity. Our approach, UGraphEmb, is a general framework that provides a novel means of performing graph-level embedding in a completely unsupervised and inductive manner. The learned neural network can be considered a function that receives any graph as input, whether seen or unseen in the training set, and transforms it into an embedding. A novel graph-level embedding generation mechanism, called Multi-Scale Node Attention (MSNA), is proposed. Experiments on five real graph datasets show that UGraphEmb achieves competitive accuracy in the tasks of graph classification, similarity ranking, and graph visualization.
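What "embeddings preserve graph-graph proximity" means as a training signal can be sketched with a simple loss. This assumes the proximity is supplied as a target distance per graph pair (for example a graph edit distance normalized by average graph size) and penalizes the squared gap between that target and the squared Euclidean distance of the two embeddings. The normalization, the scale `lam`, and the function names are assumptions; UGraphEmb's exact objective and the MSNA embedding network are not reproduced here.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def normalized_ged(ged, n1, n2):
    # assumed normalization: edit distance divided by the average number of nodes
    return ged / ((n1 + n2) / 2)


def proximity_loss(embeddings, pairs, lam=1.0):
    """Mean of (||e_i - e_j||^2 - lam * d_ij)^2 over the given graph pairs.

    embeddings -- dict graph_id -> embedding vector (produced by some graph encoder)
    pairs      -- list of (id_i, id_j, target_distance)
    """
    total = 0.0
    for i, j, d in pairs:
        gap = squared_distance(embeddings[i], embeddings[j]) - lam * d
        total += gap * gap
    return total / len(pairs)


if __name__ == "__main__":
    # hypothetical embeddings and target distances for three graphs
    emb = {"g1": [0.0, 0.0], "g2": [1.0, 0.0], "g3": [0.0, 2.0]}
    pairs = [("g1", "g2", normalized_ged(3, 4, 2)), ("g1", "g3", 2.0)]
    print(proximity_loss(emb, pairs))
```

Since the loss only needs pairwise distances between graphs and no class labels, training stays unsupervised, and any new graph can be embedded by the trained encoder, matching the inductive setting described above.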

Domain Adaptation and Domain Generalization with Representation Learning

DOI 10.26686/wgtn.17014700; 2021

Author(s): Muhammad Ghifary

Keyword(s): Neural Network, Object Recognition, Domain Adaptation, State Of The Art, Representation Learning, Training Data, Data Representations, Source Data, Target Environment, Target Data

Machine learning has achieved great successes in computer vision, especially in object recognition or classification. One of the core factors behind these successes is the availability of massive labeled image or video data for training, collected manually by humans. Labeling source training data, however, can be expensive and time consuming. Furthermore, a large amount of labeled source data does not always guarantee that traditional machine learning techniques generalize well; there may be a bias or mismatch in the data, i.e., the training data do not represent the target environment. To mitigate such dataset bias/mismatch, one can consider domain adaptation: utilizing labeled training data and unlabeled target data to develop a well-performing classifier on the target environment. In some cases, however, unlabeled target data are nonexistent, but multiple labeled sources of data exist. Such situations can be addressed by domain generalization: using multiple source training sets to produce a classifier that generalizes to the unseen target domain. Although several domain adaptation and generalization approaches have been proposed, domain mismatch in object recognition remains a challenging, open problem; model performance has yet to reach a satisfactory level in real-world applications. The overall goal of this thesis is to progress towards solving dataset bias in visual object recognition through representation learning in the context of domain adaptation and domain generalization. Representation learning is concerned with finding proper data representations or features via learning rather than via engineering by human experts. This thesis proposes several representation learning solutions based on deep learning and kernel methods. This thesis introduces a robust-to-noise deep neural network for handwritten digit classification trained on “clean” images only, which we name Deep Hybrid Network (DHN).
DHNs are based on a particular combination of sparse autoencoders and restricted Boltzmann machines. The results show that DHN performs better than the standard deep neural network in recognizing digits with Gaussian and impulse noise and with block and border occlusions. This thesis proposes the Domain Adaptive Neural Network (DaNN), a neural-network-based domain adaptation algorithm that minimizes the classification error and the domain discrepancy between the source and target data representations. The experiments show the competitiveness of DaNN against several state-of-the-art methods on a benchmark object dataset. This thesis develops the Multi-task Autoencoder (MTAE), a domain generalization algorithm based on autoencoders trained via multi-task learning. MTAE learns to transform the original image into its analogs in multiple related domains simultaneously. The results show that MTAE’s representations provide better classification performance than some alternative autoencoder-based models as well as the current state-of-the-art domain generalization algorithms. This thesis proposes a fast kernel-based representation learning algorithm for both domain adaptation and domain generalization, Scatter Component Analysis (SCA). SCA finds a data representation that trades off between maximizing the separability of classes, minimizing the mismatch between domains, and maximizing the separability of all data points. The results show that SCA runs much faster than some competitive algorithms while providing state-of-the-art accuracy in both domain adaptation and domain generalization. Finally, this thesis presents the Deep Reconstruction-Classification Network (DRCN), a deep convolutional network for domain adaptation. DRCN learns to classify labeled source data and also to reconstruct unlabeled target data via a shared encoding representation.
The results show that DRCN provides competitive or better performance than the prior state-of-the-art model on several cross-domain object datasets.
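The DaNN objective described above (classification error plus a domain discrepancy between source and target representations) can be sketched as a single loss. Here the discrepancy is assumed to be a biased empirical estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel, and the trade-off weight `gamma`, bandwidth `sigma`, and function names are illustrative assumptions; the abstract does not specify the discrepancy measure or network details.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true labels (probs are per-class probabilities)
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def domain_adaptive_loss(source_probs, source_labels, source_feats, target_feats,
                         gamma=1.0, sigma=1.0):
    # classification error on labelled source data + weighted source/target discrepancy
    return cross_entropy(source_probs, source_labels) + gamma * mmd2(source_feats, target_feats, sigma)


if __name__ == "__main__":
    # hypothetical hidden-layer features and source predictions
    src = [[0.0, 0.1], [0.2, 0.0]]
    tgt = [[1.0, 1.1], [0.9, 1.2]]
    probs = [[0.8, 0.2], [0.3, 0.7]]
    print(domain_adaptive_loss(probs, [0, 1], src, tgt, gamma=0.5))
```

Minimizing the second term pulls the source and target feature distributions together, while the first term keeps the shared representation discriminative for the labeled source classes.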

A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge

DOI 10.21437/interspeech.2015-644; 2015

Author(s): Daniel Renshaw, Herman Kamper, Aren Jansen, Sharon Goldwater

Keyword(s): Neural Network, Representation Learning, Network Methods
