Adversarial Learning Based Semantic Correlation Representation for Cross-Modal Retrieval

Author(s):  
Lei Zhu ◽  
Jiayu Song ◽  
Xiangxiang Wei ◽  
Long Jun

With the rapid development of the Internet and the wide usage of smart devices, massive multimedia data are generated, collected, stored and shared online. This trend has made cross-modal retrieval a hot research issue in recent years. Many existing works focus on correlation learning to generate a common subspace for cross-modal correlation measurement, while others use adversarial learning techniques to reduce the heterogeneity of multi-modal data. However, very few works combine correlation learning and adversarial learning to bridge the inter-modal semantic gap and diminish cross-modal heterogeneity. This paper proposes a novel cross-modal retrieval method, named ALSCOR, an end-to-end framework that integrates cross-modal representation learning, correlation learning and adversarial learning. A CCA model, accompanied by two representation models, VisNet and TxtNet, is proposed to capture non-linear correlation. In addition, an intra-modal classifier and a modality classifier are used to learn intra-modal discrimination and minimize inter-modal heterogeneity. Comprehensive experiments are conducted on three benchmark datasets. The results demonstrate that the proposed ALSCOR outperforms state-of-the-art methods.

2017 ◽  
Vol 11 (02) ◽  
pp. 209-227 ◽  
Author(s):  
Yilin Yan ◽  
Mei-Ling Shyu

In the past decades, we have witnessed an explosion of multimedia data, especially with the development of social media websites and the blooming popularity of smart devices. As a result, multimedia semantic concept mining and retrieval, whose objective is to mine useful information from large amounts of multimedia data including texts, images, and videos, has become more and more important. The huge amount of multimedia data and the semantic gap between low-level features and high-level semantic concepts make this task even more challenging. To address these challenges, the correlations among classes can provide important context cues that help bridge the semantic gap. Meanwhile, many real-world datasets do not have uniform class distributions, and the minority instances actually represent the concepts of interest, such as fraud in transactions, intrusions in network security, and unusual events in surveillance. Despite extensive research efforts, imbalanced concept retrieval remains one of the most challenging research problems in multimedia data mining. Unlike existing frameworks that consider concept correlations among labels, this paper presents a novel concept correlation analysis model that uses the correlation between retrieval scores and labels. Experimental results on the TRECVID benchmark datasets demonstrate that the proposed framework can enhance imbalanced concept mining and retrieval even with trivial scores from the minority class.


2021 ◽  
Vol 15 ◽  
Author(s):  
Yichen Song ◽  
Aiping Li ◽  
Hongkui Tu ◽  
Kai Chen ◽  
Chenchen Li

With the rapid development of artificial intelligence, cybernetics, and other high-tech disciplines, robots have been built and used in an increasing number of fields, and studies on robots have attracted growing research interest from different communities. The knowledge graph can act as the brain of a robot and provide intelligence to support the interaction between the robot and human beings. Although large-scale knowledge graphs contain a large amount of information, they are still incomplete compared with real-world knowledge. Most existing methods for knowledge graph completion focus on entity representation learning. However, they ignore the importance of relation representation learning, as well as the cross-interaction between entities and relations. In this paper, we propose an encoder-decoder model that embeds the interaction between entities and relations and adds a gate mechanism to control the attention mechanism. Experimental results show that our method achieves better link prediction performance than state-of-the-art embedding models on two benchmark datasets, WN18RR and FB15k-237.


2022 ◽  
Vol 40 (2) ◽  
pp. 1-26
Author(s):  
Chengyuan Zhang ◽  
Yang Wang ◽  
Lei Zhu ◽  
Jiayu Song ◽  
Hongzhi Yin

With the rapid development of online social recommendation systems, a substantial number of methods have been proposed. Unlike traditional recommendation systems, social recommendation works by integrating social relationship features, which raises two major challenges: early summarization and data sparsity. Thus far, neither has been solved effectively. In this article, we propose a novel social recommendation approach, namely Multi-Graph Heterogeneous Interaction Fusion (MG-HIF), to solve these two problems. Our basic idea is to fuse heterogeneous interaction features from multiple graphs, i.e., the user–item bipartite graph and the social relation network, to improve vertex representation learning. A meta-path cross-fusion model is proposed to fuse multi-hop heterogeneous interaction features via discrete cross-correlations. Based on that, a social relation GAN is developed to explore the latent friendships of each user. We further fuse the representations from the two graphs with a novel multi-graph information fusion strategy that uses an attention mechanism. To the best of our knowledge, this is the first work to combine meta-paths with social relation representation. To evaluate the performance of MG-HIF, we compare it with seven state-of-the-art methods over four benchmark datasets. The experimental results show that MG-HIF achieves better performance.


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4666
Author(s):  
Zhiqiang Pan ◽  
Honghui Chen

Collaborative filtering (CF) aims to make recommendations for users by inferring their preferences from historical user–item interactions. Existing graph neural network (GNN) based methods achieve satisfactory performance by exploiting the high-order connectivity between users and items; however, they suffer from poor training efficiency and easily introduce bias during information propagation. Moreover, the widely applied Bayesian personalized ranking (BPR) loss provides insufficient supervision signals for training because the observed interactions are extremely sparse. To deal with these issues, we propose the Efficient Graph Collaborative Filtering (EGCF) method. Specifically, EGCF adopts only a one-layer graph convolution to model the collaborative signal for users and items from their first-order neighbors in the user–item interactions. Moreover, we introduce contrastive learning to enhance the representation learning of users and items by deriving self-supervision signals, which is trained jointly with the supervised learning. Extensive experiments are conducted on two benchmark datasets, i.e., Yelp2018 and Amazon-book, and the experimental results demonstrate that EGCF achieves state-of-the-art performance in terms of Recall and normalized discounted cumulative gain (NDCG), especially in ranking the target items at the right positions. In addition, EGCF shows clear advantages in training efficiency over competitive baselines, making it practicable for potential applications.
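For readers unfamiliar with the BPR loss mentioned in the abstract above, the following is a minimal sketch of its standard pairwise form, in which each observed (positive) item should score higher than a sampled unobserved (negative) item. It is not taken from the EGCF paper: the function name `bpr_loss` and the plain-list inputs are assumptions made for illustration, and regularization terms are omitted.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Mean pairwise BPR loss: -log(sigmoid(pos - neg)) averaged over pairs.

    pos_scores[i] and neg_scores[i] are the predicted scores of an observed
    item and a sampled unobserved item for the same user (hypothetical inputs).
    """
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need equally sized, non-empty score lists")
    total = 0.0
    for pos, neg in zip(pos_scores, neg_scores):
        diff = pos - neg
        # -log(sigmoid(diff)) equals softplus(-diff); this form avoids overflow.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pos_scores)
```

The loss shrinks as positive items are scored further above negative ones. Because only observed interactions supply positive pairs, a very sparse interaction matrix yields few such pairs, which is the limitation the abstract points to.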


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Wen-Jun Li ◽  
Qiang Dong ◽  
Yan Fu

With the rapid development of the mobile Internet and smart devices, more and more online content providers have begun to collect the preferences of their customers through various apps on mobile devices. These preferences are largely reflected in the ratings that users give to online items as explicit scores. Both positive and negative ratings help recommender systems provide relevant items to a target user. Based on an empirical analysis of three real-world movie-rating data sets, we observe that users’ rating criteria change over time, and that past positive and negative ratings have different influences on users’ future preferences. Given this, we propose a recommendation model on a session-based temporal graph that considers the difference between long- and short-term preferences and the different temporal effects of positive and negative ratings. Extensive experimental results validate the significant accuracy improvement of our proposed model over state-of-the-art methods.


Author(s):  
Haidi Hasan Badr ◽  
Nayer Mahmoud Wanas ◽  
Magda Fayek

Since labeled data availability differs greatly across domains, domain adaptation focuses on learning in new and unfamiliar domains by reducing distribution divergence. Recent research suggests that adversarial learning could be a promising way to achieve the domain adaptation objective. Adversarial learning is a strategy for learning domain-transferable features in robust deep networks. This paper introduces TSAL, a two-step adversarial learning framework. It addresses the real-world problem of text classification in which the source domain(s) have labeled data but the target domain(s) have only unlabeled data. TSAL utilizes joint adversarial learning with class information and a domain-alignment deep network architecture to learn both domain-invariant and domain-specific feature extractors. It consists of two training steps that resemble the fine-tuning paradigm, in which pre-trained model weights are used as initialization for training with new data. TSAL’s two training phases, however, are based on the same data, not on different data as is the case with fine-tuning. Furthermore, TSAL uses only the domain-invariant feature extractor learned in the first training step as an initialization for its counterpart in the subsequent training step. By training twice, TSAL can better exploit the small unlabeled target domain and learn effectively what to share between domains. A detailed analysis on many benchmark datasets reveals that our model consistently outperforms the prior art across a wide range of dataset distributions.


2019 ◽  
Vol 10 (1) ◽  
pp. 66-82
Author(s):  
Roy Wentas

The progress of science and technology tends to give rise to differences between the older generation and the younger generation. Therefore, studying the value orientation of young people and learners, especially their attitude toward diversity, is certainly important. Coaching youth as the next generation is a shared responsibility of families, communities and the nation state. Religious education can convey and put into practice the teachings of Hinduism so as to form noble character (budhi) and noble morals. The holy book Bhagavadgita states that two tendencies affect human character: the qualities of the devatas (daivi sampat) and the qualities of giants (asuri sampat). Both tendencies, directly or indirectly, shape human character. The rapid development of science and technology these days has influenced the character of children, who face heavy challenges. The teaching of children should therefore be directed towards strengthening their morals. This requires a neo-traditional norm that is based on traditional origins. Hindu education could become the normative agent that builds modern Indonesian character through local wisdom that motivates children. On the instrumental level, the primary values to be taught are autonomy, dignity, creativity, morality, pride, a sense of aesthetics, and democratic awareness. Children should preserve the local cultural heritage, including languages and the arts, while adapting to global trends. As educators, teachers at school as well as parents at home must be role models whose responsibility and discipline are followed.


2021 ◽  
Author(s):  
Enshuai Hou ◽  
Jie Zhu

Tibetan is a low-resource language. To alleviate the shortage of parallel corpora between Tibetan and Chinese, this paper uses two monolingual corpora and a small seed dictionary to study a semi-supervised method with seed dictionaries and a self-supervised adversarial training method, both based on similarity calculation of word clusters in different embedding spaces, and puts forward an improved self-supervised adversarial learning method that aligns Tibetan and Chinese using monolingual data only. The experimental results are as follows. First, the results for aligning Tibetan syllables with Chinese characters are poor, which reflects the weak semantic correlation between Tibetan syllables and Chinese characters. Second, the semi-supervised method with a seed dictionary achieves an accuracy over the top 10 predicted words of 66.5 (Tibetan-Chinese) and 74.8 (Chinese-Tibetan), while the improved self-supervised method reaches an accuracy of 53.5 in both language directions.


2020 ◽  
Vol 36 (4) ◽  
pp. 305-323
Author(s):  
Quan Hoang Nguyen ◽  
Ly Vu ◽  
Quang Uy Nguyen

Sentiment classification (SC) aims to determine whether a document conveys a positive or negative opinion. Due to the rapid development of the digital world, SC has become an important research topic that affects many aspects of our lives. In SC based on machine learning, the representation of the document strongly influences accuracy. Word Embedding (WE)-based techniques such as Word2vec have proved beneficial for the SC problem. However, Word2vec is often not enough to represent the semantics of Vietnamese documents with complex sentences. In this paper, we propose a new representation learning model called a two-channel vector to learn higher-level features of a document in SC. Our model uses two neural networks to learn the semantic feature, i.e., Word2vec, and the syntactic feature, i.e., the Part of Speech (POS) tag. The two features are then combined and input to a Softmax function to make the final classification. We carry out intensive experiments on 4 recent Vietnamese sentiment datasets to evaluate the performance of the proposed architecture. The experimental results demonstrate that the proposed model can significantly enhance the accuracy of SC compared to the two single models and a state-of-the-art ensemble method.
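The final step described in the abstract above, combining a semantic and a syntactic feature vector and passing the result to a softmax, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the two features are combined, so concatenation followed by a single linear layer is assumed here, and the names `classify`, `weights` and `bias` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(semantic_vec, syntactic_vec, weights, bias):
    """Concatenate the two channel features and return class probabilities.

    weights holds one row per class, each row as long as the concatenated
    feature vector; bias holds one value per class (assumed linear layer).
    """
    features = list(semantic_vec) + list(syntactic_vec)  # assumed combination
    if len(weights) != len(bias) or any(len(row) != len(features) for row in weights):
        raise ValueError("weight or bias shape does not match the features")
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)
```

For binary sentiment, `weights` would have two rows (positive and negative), and the larger of the two returned probabilities gives the predicted class.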

