Representation Learning for Scene Graph Completion via Jointly Structural and Visual Embedding

Author(s):  
Hai Wan ◽  
Yonghao Luo ◽  
Bo Peng ◽  
Wei-Shi Zheng

This paper focuses on scene graph completion, which aims at predicting new relations between two entities using existing scene graphs and images. Comparing scene graphs with the well-known knowledge graph, we first observe that each scene graph is associated with an image, and that each entity of a visual triple in a scene graph consists of an entity type with attributes and is grounded by a bounding box in the corresponding image. We then propose an end-to-end model named Representation Learning via Jointly Structural and Visual Embedding (RLSV) to take advantage of the structural and visual information in scene graphs. In the RLSV model, we design a fully convolutional module to extract the visual embeddings of a visual triple and apply hierarchical projection to combine the structural and visual embeddings of a visual triple. In experiments, we evaluate our model on two scene graph completion tasks, link prediction and visual triple classification, and further analyze it through case studies. Experimental results demonstrate that our model outperforms all baselines on both tasks, which justifies the significance of combining structural and visual information for scene graph completion.
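To make the fusion concrete, a minimal PyTorch sketch of the general idea behind RLSV might look as follows: a TransE-style scorer in which each entity of a visual triple combines a structural embedding with a projected visual feature from its bounding box. The class name, dimensions, and the additive fusion are illustrative assumptions, not the paper's exact hierarchical projection.

```python
import torch
import torch.nn as nn

class JointTripleScorer(nn.Module):
    """Illustrative TransE-style scorer fusing structural and visual
    embeddings of a visual triple (h, r, t). A sketch of the RLSV idea,
    not the paper's exact hierarchical projection."""
    def __init__(self, n_entities, n_relations, dim=100, visual_dim=2048):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)    # structural entity embeddings
        self.rel = nn.Embedding(n_relations, dim)   # relation embeddings
        self.proj = nn.Linear(visual_dim, dim)      # map CNN box features into KG space

    def forward(self, h, r, t, h_vis, t_vis):
        # Each entity representation mixes structure with projected visual features.
        h_e = self.ent(h) + self.proj(h_vis)
        t_e = self.ent(t) + self.proj(t_vis)
        # Smaller translation distance = more plausible visual triple.
        return -torch.norm(h_e + self.rel(r) - t_e, p=1, dim=-1)
```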

Author(s):  
Ruobing Xie ◽  
Zhiyuan Liu ◽  
Huanbo Luan ◽  
Maosong Sun

Entity images could provide significant visual information for knowledge representation learning. Most conventional methods learn knowledge representations merely from structured triples, ignoring rich visual information extracted from entity images. In this paper, we propose a novel Image-embodied Knowledge Representation Learning model (IKRL), where knowledge representations are learned with both triple facts and images. More specifically, we first construct representations for all images of an entity with a neural image encoder. These image representations are then integrated into an aggregated image-based representation via an attention-based method. We evaluate our IKRL models on knowledge graph completion and triple classification. Experimental results demonstrate that our models outperform all baselines on both tasks, which indicates the significance of visual information for knowledge representations and the capability of our models in learning knowledge representations with images.
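The attention-based aggregation can be sketched compactly: weight each image embedding by its compatibility with the entity's structure-based embedding, then take the weighted sum. This is a simplified reading of IKRL's attention; the function name and shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def aggregate_images(image_embs: torch.Tensor, struct_emb: torch.Tensor) -> torch.Tensor:
    """Attention-weighted aggregation of one entity's image embeddings.
    image_embs: (n_images, dim) outputs of the neural image encoder.
    struct_emb: (dim,) the entity's structure-based embedding."""
    scores = image_embs @ struct_emb    # compatibility of each image with the entity
    weights = F.softmax(scores, dim=0)  # attention weights over the images
    return weights @ image_embs         # aggregated image-based representation
```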


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Luogeng Tian ◽  
Bailong Yang ◽  
Xinli Yin ◽  
Kai Kang ◽  
Jing Wu

Most previous embedding-based entity prediction methods lack training on local core relationships, which weakens end-to-end training. To address this problem, we propose an end-to-end knowledge graph embedding method that combines local graph convolution with global cross learning, called the TransC graph convolutional network (TransC-GCN). First, multiple local semantic spaces are partitioned according to the largest neighborhood. Second, a translation model maps local entities and relations into cross vectors, which serve as the input of the GCN. Third, by training on local semantic relations, the best entities and strongest relations are identified. The optimal entity-relation ranking is obtained by evaluating a posterior loss function based on mutual information entropy. Experiments show that our method extracts local entity features more accurately through the convolution operation of a lightweight convolutional neural network, and that max pooling captures strong signals in local features while avoiding globally redundant ones. Compared with mainstream triple prediction baselines, the proposed algorithm reduces computational complexity while remaining robust, and improves the inference accuracy of entities and relations by 8.1% and 4.4%, respectively. In short, the method not only extracts local node and relation features of the knowledge graph effectively but also supports multilayer traversal and relation derivation over a knowledge graph.
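Reading the pipeline at face value, a minimal sketch might proceed as below: translation-style (h + r ≈ t) features feed one light graph convolution over a local neighborhood, and max pooling keeps the strongest local signal. Pairing each entity with a single incident-relation embedding, and all shapes, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class LocalTransGCN(nn.Module):
    """Sketch of the TransC-GCN flavour: translated features -> light GCN
    -> max pooling. Illustrative only; the paper's exact layers differ."""
    def __init__(self, dim=64):
        super().__init__()
        self.gcn = nn.Linear(dim, dim)

    def forward(self, ent_emb, rel_emb, adj):
        # Cross vector: fuse each local entity with an incident relation (h + r).
        x = ent_emb + rel_emb                      # (n_nodes, dim)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        x = torch.relu(self.gcn(adj @ x / deg))    # one degree-normalized GCN layer
        return x.max(dim=0).values                 # max pooling over the local subgraph
```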


2022 ◽  
Vol 40 (3) ◽  
pp. 1-28
Author(s):  
Surong Yan ◽  
Kwei-Jay Lin ◽  
Xiaolin Zheng ◽  
Haosen Wang

Explicit and implicit knowledge about users and items has been used to describe the complex and heterogeneous side information available to recommender systems (RSs). Many existing methods use knowledge graph embedding (KGE) to learn the representation of a user-item knowledge graph (KG) in a low-dimensional space. In this article, we propose a lightweight end-to-end joint learning framework that fuses the tasks of KGE and RS at the model level. Our framework introduces a lightweight KG embedding method that uses bidirectional bijection relation-type modeling to scale to large graphs and self-adaptive negative sampling to optimize negative-sample generation. It further generates integrated views of users and items based on relation types to explicitly model users’ preferences and items’ features, respectively. Finally, we add virtual “recommendation” relations between the integrated views of users and items to model users’ preferences for items, seamlessly integrating the RS with the user-item KG over a unified graph. Experimental results on multiple datasets and benchmarks show that our method achieves better recommendation accuracy than existing state-of-the-art methods. Complexity and runtime analyses suggest that our method has lower time and space complexity than most existing methods and improves scalability.
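The virtual “recommendation” relation can be illustrated in a few lines: users, items, and KG entities share one embedding table, and a recommendation is scored exactly like any other triple through a dedicated relation index. This is a sketch of the unification idea only; the names and the TransE-style score are assumptions.

```python
import torch
import torch.nn as nn

class UnifiedKGRecommender(nn.Module):
    """Sketch: RS and KGE share one graph and one scoring function via a
    virtual 'recommend' relation. Not the paper's full framework."""
    def __init__(self, n_nodes, n_relations, dim=64):
        super().__init__()
        self.node = nn.Embedding(n_nodes, dim)         # users, items, and entities
        self.rel = nn.Embedding(n_relations + 1, dim)  # +1 slot for the virtual relation
        self.recommend_rel = n_relations               # index of 'recommend'

    def score(self, head, rel, tail):
        # One translation-style score covers KG triples and recommendations alike.
        return -torch.norm(self.node(head) + self.rel(rel) - self.node(tail), dim=-1)

    def recommend(self, user, item):
        rel = torch.full_like(user, self.recommend_rel)
        return self.score(user, rel, item)
```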


2020 ◽  
Author(s):  
Ellen Yi-Ge ◽  
Rui Fan ◽  
Zechun Liu ◽  
Zhiqiang Shen

Keypoints of objects reflect their concise abstractions, while the corresponding connection links (CL) build the skeleton by detecting the intrinsic relations between keypoints. Existing approaches are typically computationally intensive, inapplicable to instances belonging to multiple classes, and/or unable to simultaneously encode connection information. To address these issues, we propose an end-to-end category-implicit Keypoint and Link Prediction Network (KLPNet), the first approach for simultaneous semantic keypoint detection (for multi-class instances) and CL rejuvenation. In our KLPNet, a novel Conditional Link Prediction Graph is proposed for link prediction among keypoints contingent on a predefined category. Furthermore, a Cross-stage Keypoint Localization Module (CKLM) is introduced to explore feature aggregation for coarse-to-fine keypoint localization. Comprehensive experiments conducted on three publicly available benchmarks demonstrate that our KLPNet consistently outperforms other state-of-the-art approaches. Furthermore, the experimental results on CL prediction also show the effectiveness of our KLPNet in handling occlusions.
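A category-conditioned link head of the kind described can be sketched as a small classifier over keypoint-feature pairs concatenated with a class embedding. This is only an illustration of "link prediction contingent on a predefined category"; the fusion and dimensions are assumptions, not KLPNet's actual graph module.

```python
import torch
import torch.nn as nn

class ConditionalLinkPredictor(nn.Module):
    """Sketch: predict whether two detected keypoints are linked, conditioned
    on the instance category. Illustrative, not KLPNet's exact module."""
    def __init__(self, kp_dim=128, n_categories=20, cat_dim=32):
        super().__init__()
        self.cat_emb = nn.Embedding(n_categories, cat_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * kp_dim + cat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, kp_i, kp_j, category):
        # Concatenate the two keypoint features with the category embedding.
        x = torch.cat([kp_i, kp_j, self.cat_emb(category)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)  # link probability per pair
```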


Author(s):  
Zhihao Fan ◽  
Zhongyu Wei ◽  
Siyuan Wang ◽  
Ruize Wang ◽  
Zejun Li ◽  
...  

Existing research on image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose the Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN. On the vision side, TTN takes both scene-graph-based features and theme concepts as input for visual representation learning. On the language side, TTN takes both captions and theme concepts as input for text representation reconstruction. Both settings generate target captions with the same transformer-based decoder. During training, we further align the representations of theme concepts learned from images and their corresponding captions to enforce cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared with state-of-the-art models.
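The theme-nodes mechanism can be pictured as extra learned tokens: theme-concept memory vectors are prepended to the feature sequence so self-attention mixes them with scene-graph or caption features. A minimal sketch follows, with illustrative dimensions and layer counts rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ThemeNodeEncoder(nn.Module):
    """Sketch of Transformer with Theme Nodes (TTN): learned theme memory
    vectors are prepended as extra tokens. Illustrative configuration."""
    def __init__(self, dim=512, n_themes=16, n_layers=3, n_heads=8):
        super().__init__()
        self.themes = nn.Parameter(torch.randn(n_themes, dim))  # theme memory vectors
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, features):
        # features: (batch, seq, dim) scene-graph or caption token features.
        themes = self.themes.unsqueeze(0).expand(features.size(0), -1, -1)
        return self.encoder(torch.cat([themes, features], dim=1))
```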


Author(s):  
Xinhua Suo ◽  
Bing Guo ◽  
Yan Shen ◽  
Wei Wang ◽  
Yaosen Chen ◽  
...  

Knowledge representation learning (knowledge graph embedding) plays a critical role in the application of knowledge graph construction. Multi-source knowledge representation learning, currently one of the most promising classes of such methods, focuses on incorporating useful additional information about entities and relations into their embeddings, such as text descriptions, entity types, visual information, and graph structure. However, one simple but very common signal has been ignored: the number of relations an entity participates in, which reflects the number of its semantic types. This work proposes a multi-source knowledge representation learning model, KRL-NER, which embeds the number of an entity's relations into entity embeddings through an attention mechanism. Specifically, we first design a submodel of KRL-NER, LearnNER, which learns an embedding encoding the number of an entity's relations; we then obtain a new embedding by applying attention with this embedding to the embedding learned by a model such as TransE; finally, we perform translation based on the new embedding. Experiments on standard knowledge graph tasks, namely entity prediction, entity prediction under different relation types, and triple classification, verify our model. The results show that our model is effective on large-scale knowledge graphs such as FB15K.
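A gated-attention reading of the fusion step might look as follows: embed the relation count, then let a learned gate decide how much of it flows into the structural (e.g. TransE) embedding. The gating form is an assumption for illustration; the paper's attention may differ.

```python
import torch
import torch.nn as nn

class RelationCountFusion(nn.Module):
    """Sketch of the KRL-NER fusion: a LearnNER-style embedding of an
    entity's relation count is attended into its structural embedding."""
    def __init__(self, max_count=100, dim=100):
        super().__init__()
        self.count_emb = nn.Embedding(max_count + 1, dim)
        self.attn = nn.Linear(2 * dim, dim)

    def forward(self, struct_emb, rel_count):
        c = self.count_emb(rel_count.clamp(max=self.count_emb.num_embeddings - 1))
        gate = torch.sigmoid(self.attn(torch.cat([struct_emb, c], dim=-1)))
        return gate * struct_emb + (1 - gate) * c  # fused entity representation
```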


2020 ◽  
Vol 34 (01) ◽  
pp. 841-848
Author(s):  
Farzan Masrour ◽  
Tyler Wilson ◽  
Heng Yan ◽  
Pang-Ning Tan ◽  
Abdol Esfahanian

Link prediction is an important task in online social networking, as it can be used to infer new or previously unknown relationships in a network. However, due to the homophily principle, current algorithms are susceptible to promoting links that may increase segregation of the network, an effect known as the filter bubble. In this study, we examine the filter bubble problem from the perspective of algorithmic fairness and introduce a dyadic-level fairness criterion based on the network modularity measure. We show how the criterion can be utilized as a postprocessing step to generate more heterogeneous links and thereby overcome the filter bubble problem. In addition, we present a novel framework that combines adversarial network representation learning with supervised link prediction to alleviate the filter bubble problem. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed methods compared with baseline approaches, including conventional link prediction and fairness-aware methods for i.i.d. data.
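The postprocessing idea is easy to sketch: given scored candidate links and a community assignment, penalize intra-community pairs so heterogeneous links rise in the ranking. The penalty below is an illustrative simplification of a modularity-based criterion, not the paper's exact formula.

```python
def fair_rerank(candidates, community, alpha=0.5):
    """Re-rank (u, v, score) candidate links, down-weighting pairs whose
    endpoints share a community. 'alpha' trades accuracy for diversity."""
    adjusted = [
        (u, v, score - alpha * float(community[u] == community[v]))
        for u, v, score in candidates
    ]
    return sorted(adjusted, key=lambda x: x[2], reverse=True)

# Example: the cross-community pairs (0, 2) and (1, 3) now outrank (0, 1).
ranked = fair_rerank([(0, 1, 0.9), (0, 2, 0.8), (1, 3, 0.7)],
                     community={0: "a", 1: "a", 2: "b", 3: "b"})
```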


2021 ◽  
Vol 15 ◽  
Author(s):  
Yichen Song ◽  
Aiping Li ◽  
Hongkui Tu ◽  
Kai Chen ◽  
Chenchen Li

With the rapid development of artificial intelligence, cybernetics, and related technologies, robots are being built and used in a growing number of fields, and studies of robots have attracted growing interest from different research communities. A knowledge graph can act as the brain of a robot, providing the intelligence needed to support interaction between the robot and human beings. Although large-scale knowledge graphs contain a large amount of information, they are still incomplete compared with real-world knowledge. Most existing methods for knowledge graph completion focus on entity representation learning. However, they overlook the importance of relation representation learning, as well as the cross-interaction between entities and relations. In this paper, we propose an encoder-decoder model that embeds the interaction between entities and relations and adds a gate mechanism to control the attention mechanism. Experimental results show that our method achieves better link prediction performance than state-of-the-art embedding models on two benchmark datasets, WN18RR and FB15k-237.
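The gate-controlled attention can be sketched in a few lines: a sigmoid gate, computed from the entity representation and the attention output over entity-relation interactions, mixes the two. The exact form here is an assumption for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    """Sketch: a learned gate controls how much entity-relation interaction
    (the attention output) flows into the final representation."""
    def __init__(self, dim=200):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, entity_emb, attn_out):
        g = torch.sigmoid(self.gate(torch.cat([entity_emb, attn_out], dim=-1)))
        return g * attn_out + (1 - g) * entity_emb  # gated mixture
```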

