An Unsupervised Aspect-Aware Recommendation Model with Explanation Text Generation

Peijie Sun; Le Wu; Kun Zhang; Yu Su; Meng Wang

doi:10.1145/3483611

ACM Transactions on Information Systems ◽

10.1145/3483611 ◽

2022 ◽

Vol 40 (3) ◽

pp. 1-29

Author(s):

Peijie Sun ◽

Le Wu ◽

Kun Zhang ◽

Yu Su ◽

Meng Wang

Keyword(s):

Data Privacy ◽

Auxiliary Information ◽

Generation Process ◽

Text Generation ◽

Generation Task ◽

Auxiliary Data ◽

Fine Grained ◽

Aspect Extraction ◽

Learning Framework ◽

Real World Datasets

Review-based recommendation utilizes both users’ rating records and the associated reviews for recommendation. Recently, with the rapidly growing demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, such auxiliary data is not available in most scenarios and may raise data privacy problems. In this article, we argue that reviews contain abundant semantic information expressing users’ feelings about various aspects of items, but that this information is not fully explored in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since the aspects are hidden and unlabeled. Besides, it is also very challenging to inject aspect information for generating explanation text with noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn the aspect-aware representation of each review sentence. Thus, users and items can be represented in the aspect space based on their historical associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model in both recommendation accuracy and explainability.
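
The sentence-weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aspect_sentence_weights`, the whitespace tokenizer, the fixed aspect vocabulary, and the softmax normalization are all assumptions made for illustration (the paper learns aspects with an unsupervised neural model).

```python
import math

def aspect_sentence_weights(sentences, aspect_words):
    """Return one weight per sentence; the weights sum to 1.

    Hypothetical helper: a sentence with a larger proportion of
    aspect words receives a larger weight.
    """
    proportions = []
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive tokenizer (assumption)
        hits = sum(1 for tok in tokens if tok in aspect_words)
        proportions.append(hits / len(tokens) if tokens else 0.0)
    # Softmax preserves the ordering of the proportions and normalizes them.
    exps = [math.exp(p) for p in proportions]
    total = sum(exps)
    return [e / total for e in exps]

sentences = ["the battery life is great", "i bought it last week"]
weights = aspect_sentence_weights(sentences, {"battery", "life", "screen"})
```

Here the first sentence contains two aspect words out of five tokens and the second contains none, so the first receives the larger weight.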

UGSD: User Generated Sentiment Dictionaries from Online Customer Reviews

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301313 ◽

2019 ◽

Vol 33 ◽

pp. 313-320 ◽

Cited By ~ 1

Author(s):

Chun-Hsiang Wang ◽

Kang-Chun Fan ◽

Chuan-Ju Wang ◽

Ming-Feng Tsai

Keyword(s):

Representation Learning ◽

Customer Reviews ◽

Fine Grained ◽

Learning Framework ◽

Domain Specific ◽

Entity Ranking ◽

Online Customer Reviews ◽

Rich Information ◽

Real World Datasets ◽

Low Dimensional

Customer reviews on platforms such as TripAdvisor and Amazon provide rich information about the ways that people convey sentiment in certain domains. Given these kinds of user reviews, this paper proposes UGSD, a representation learning framework for constructing domain-specific sentiment dictionaries from online customer reviews, in which we leverage the relationship between user-generated reviews and the ratings of the reviews to associate the reviewer sentiment with certain entities. The proposed framework has the following three main advantages. First, no additional annotations of words or external dictionaries are needed for the proposed framework; the only resources needed are the review texts and entity ratings. Second, the framework is applicable across a variety of user-generated content from different domains to construct domain-specific sentiment dictionaries. Finally, each word in the constructed dictionary is associated with a low-dimensional dense representation and a degree of relatedness to a certain rating, which enable us to obtain more fine-grained dictionaries and enhance the application scalability of the constructed dictionaries, as the word representations can be adopted for various tasks or applications, such as entity ranking and dictionary expansion. The experimental results on three real-world datasets show that the framework is effective in constructing high-quality domain-specific sentiment dictionaries from customer reviews.

Privacy-preserving Collaborative Training for Medical Image Analysis Based on Multi-Blockchain

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323666201022110616 ◽

2020 ◽

Vol 23 ◽

Author(s):

Wanlu Zhang ◽

Qigang Wang ◽

Mei Li

Keyword(s):

Medical Image ◽

Data Privacy ◽

Medical Image Analysis ◽

Auxiliary Information ◽

Training Process ◽

Private Data ◽

Medical Institutions ◽

Model Training ◽

Collaborative Training ◽

Similar Task

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially patient medical data privacy, is receiving more and more attention. Objective: To strengthen the protection of private data while still supporting the model training process, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis. In this way, researchers from different medical institutions are able to collaborate to train models without exchanging sensitive patient data. Method: A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. With the peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task in another Blockchain. In addition, after the collaborative training process, personalized models of different medical institutions are trained. Results: The experimental results show that our method achieves performance similar to that of the centralized model-training method, which collects the data sets of all participants, while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain has also been shown to effectively accelerate model convergence and improve model accuracy, especially when data is lacking. The personalization training process further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to achieve collaborative training without disclosing their private data.
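
The abstract above does not say how the partial parameter update chooses what to share. The sketch below assumes one common rule, sharing only the largest-magnitude fraction of a local update and withholding the rest; the function name `select_partial_update` and the `share_fraction` parameter are hypothetical and not taken from the paper.

```python
def select_partial_update(update, share_fraction):
    """Keep the largest-magnitude entries of `update`; zero out the rest.

    Assumed selection rule (top-k by magnitude); the paper's actual
    rule may differ.
    """
    if not 0.0 < share_fraction <= 1.0:
        raise ValueError("share_fraction must be in (0, 1]")
    keep = max(1, int(len(update) * share_fraction))
    # Indices sorted by absolute value, largest first.
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    kept = set(ranked[:keep])
    return [u if i in kept else 0.0 for i, u in enumerate(update)]

local_update = [0.5, -0.01, 0.02, -0.9, 0.1]
shared = select_partial_update(local_update, share_fraction=0.4)
# shared == [0.5, 0.0, 0.0, -0.9, 0.0]
```

Withholding most entries limits how much a peer can infer about the local data from any single propagated update, which is the motivation the abstract gives for updating only part of the parameters.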

Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6383 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8600-8607

Author(s):

Haiyun Peng ◽

Lu Xu ◽

Lidong Bing ◽

Fei Huang ◽

Wei Lu ◽

...

Keyword(s):

Sentiment Analysis ◽

State Of The Art ◽

Complete Solution ◽

Unified Model ◽

Two Stage ◽

Fine Grained ◽

Aspect Extraction ◽

Second Stage ◽

Opinion Extraction ◽

Complete Story

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, which includes but is not limited to aspect extraction, aspect sentiment classification, and opinion extraction. There exist many solvers of the above individual subtasks or a combination of two subtasks, and they can work together to tell a complete story, i.e. the discussed aspect, the sentiment on it, and the cause of the sentiment. However, no previous ABSA research tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). Particularly, a solver of this task needs to extract triplets (What, How, Why) from the inputs, which show WHAT the targeted aspects are, HOW their sentiment polarities are and WHY they have such polarities (i.e. opinion reasons). For instance, one triplet from “Waiters are very friendly and the pasta is simply average” could be (‘Waiters’, positive, ‘friendly’). We propose a two-stage framework to address this task. The first stage predicts what, how and why in a unified model, and then the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework has set a benchmark performance in this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
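
The two-stage idea above can be sketched as follows. The first-stage outputs are written by hand here, and the second stage is replaced by a simple nearest-opinion heuristic, which is an assumption for illustration rather than the pairing method of the paper. The names `Aspect`, `Opinion`, and `pair_triplets` are hypothetical, and the "neutral" label for "pasta" is assumed (the abstract only gives the "Waiters" triplet).

```python
from typing import List, NamedTuple, Tuple

class Aspect(NamedTuple):
    text: str
    position: int   # token index in the sentence
    sentiment: str

class Opinion(NamedTuple):
    text: str
    position: int   # token index in the sentence

def pair_triplets(aspects: List[Aspect], opinions: List[Opinion]) -> List[Tuple[str, str, str]]:
    """Pair each aspect with its nearest opinion term.

    Ties are broken in favor of the opinion that follows the aspect.
    """
    triplets = []
    for aspect in aspects:
        if not opinions:
            break
        nearest = min(
            opinions,
            key=lambda o: (abs(o.position - aspect.position), o.position < aspect.position),
        )
        triplets.append((aspect.text, aspect.sentiment, nearest.text))
    return triplets

# "Waiters are very friendly and the pasta is simply average"
aspects = [Aspect("Waiters", 0, "positive"), Aspect("pasta", 6, "neutral")]
opinions = [Opinion("friendly", 3), Opinion("average", 9)]
triplets = pair_triplets(aspects, opinions)
# triplets[0] == ("Waiters", "positive", "friendly")
```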

Semi-Supervised Aspect-Based Sentiment Analysis for Case-Related Microblog Reviews Using Case Knowledge Graph Embedding

International Journal of Asian Language Processing ◽

10.1142/s2717554520500125 ◽

2021 ◽

pp. 2050012

Author(s):

Peilian Zhao ◽

Cunli Mao ◽

Zhengtao Yu

Keyword(s):

Sentiment Analysis ◽

Domain Knowledge ◽

Opinion Mining ◽

Data Augmentation ◽

Training Data ◽

Knowledge Graph ◽

Fine Grained ◽

Learning Framework ◽

Proposed Model ◽

Real World Applications

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion mining task that aims to extract the sentiment toward a specific target from text, is important in many real-world applications, especially in the legal field. In this paper, we study two problems of End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited amount of labeled training data and the neglect of in-domain knowledge representation. We propose a new method under the deep learning framework, named Semi-ETEKGs, which applies the E2E framework using knowledge graph (KG) embedding in the legal field after data augmentation (DA). Specifically, we pre-train the BERT embedding and the in-domain KG embedding on unlabeled data and on labeled data with case elements after DA, and then feed the two embeddings into the E2E framework to classify the polarity of the target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to demonstrate the effectiveness of Semi-ETEKGs, and experiments on the case-related dataset of microblog comments show that our proposed model significantly outperforms the other compared methods.

Unsupervised Neural Aspect Extraction with Sememes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/712 ◽

2019 ◽

Cited By ~ 3

Author(s):

Ling Luo ◽

Xiang Ao ◽

Yan Song ◽

Jinyao Li ◽

Xiaopeng Yang ◽

...

Keyword(s):

Real World ◽

Latent Variables ◽

Lexical Semantics ◽

Word Meanings ◽

Lexical Semantic ◽

Aspect Extraction ◽

Real World Datasets ◽

Semantic Resources

Aspect extraction relies on identifying aspects by discovering coherence among words, which is challenging when word meanings are diversified and when processing short texts. Leveraging lexical semantic resources is a possible way to address this challenge and enhance aspect extraction performance. In this paper, we present an unsupervised neural framework that leverages sememes to enhance lexical semantics. The overall framework is analogous to an autoencoder, which reconstructs sentence representations and learns aspects through latent variables. Two models that form sentence representations are proposed by exploiting sememes via (1) a hierarchical attention; (2) a context-enhanced attention. Experiments on two real-world datasets demonstrate the validity and the effectiveness of our models, which significantly outperform existing baselines.
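
The autoencoder analogy above can be illustrated with a minimal sketch: a sentence vector is mapped to a distribution over latent aspects, and the sentence is then reconstructed as a weighted mix of aspect embeddings. This is an assumed simplification; it omits the sememe-based attention entirely, scores aspects with a plain dot product, and uses the hypothetical names `softmax` and `reconstruct`.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reconstruct(sentence_vec, aspect_matrix):
    """Return (aspect weights, reconstructed sentence vector).

    `aspect_matrix` holds one embedding per latent aspect, each with the
    same dimension as `sentence_vec`.
    """
    # Score each aspect by its dot product with the sentence vector.
    scores = [sum(a * z for a, z in zip(aspect, sentence_vec)) for aspect in aspect_matrix]
    weights = softmax(scores)
    # The reconstruction is the weighted sum of the aspect embeddings.
    recon = [
        sum(w * aspect[j] for w, aspect in zip(weights, aspect_matrix))
        for j in range(len(sentence_vec))
    ]
    return weights, recon

aspect_matrix = [[1.0, 0.0], [0.0, 1.0]]  # two toy aspects (assumption)
weights, recon = reconstruct([2.0, 0.5], aspect_matrix)
```

In training, such a model would minimize the distance between `recon` and the original sentence vector, so that the aspect embeddings come to capture recurring themes; the training loop is left out of this sketch.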

A Novel Hybrid Approach for Multi-Dimensional Data Anonymization for Apache Spark

ACM Transactions on Privacy and Security ◽

10.1145/3484945 ◽

2022 ◽

Vol 25 (1) ◽

pp. 1-25

Author(s):

Sibghat Ullah Bazai ◽

Julian Jang-Jaccard ◽

Hooman Alavizadeh

Keyword(s):

Critical Analysis ◽

Data Privacy ◽

High Performance ◽

Distributed Processing ◽

Hybrid Approach ◽

Relative Size ◽

Optimal Number ◽

Data Anonymization ◽

Fine Grained ◽

Message Exchange

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on different distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their scalability and parallelism support. According to our critical analysis of overheads, neither existing iteration-based nor recursion-based approaches provide effective mechanisms for creating the optimal number and relative size of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and provides high performance. Our hybrid approach provides a mechanism to create far fewer RDDs, with smaller partitions attached to each RDD, than existing approaches. This optimal RDD creation and operations approach is critical for many multi-dimensional data anonymization applications that create tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
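
For readers unfamiliar with Mondrian, the sketch below shows the basic recursive partitioning on a single machine. It is not the hybrid Spark approach of the paper and involves no RDDs. It assumes numeric quasi-identifiers, picks the split attribute by raw value range (a simplification; ranges are often normalized), and uses the hypothetical names `mondrian_partition` and `generalize`.

```python
def mondrian_partition(records, k):
    """Split records at attribute medians while both halves keep >= k records."""
    dims = range(len(records[0]))
    # Try attributes from the widest value range to the narrowest.
    by_range = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_range:
        ordered = sorted(records, key=lambda r: r[d])
        median = ordered[len(ordered) // 2][d]
        left = [r for r in ordered if r[d] < median]
        right = [r for r in ordered if r[d] >= median]
        if len(left) >= k and len(right) >= k:
            return mondrian_partition(left, k) + mondrian_partition(right, k)
    return [records]  # no allowable split: this is a final group

def generalize(group):
    """Replace each attribute by the (min, max) range observed in the group."""
    dims = range(len(group[0]))
    return [(min(r[d] for r in group), max(r[d] for r in group)) for d in dims]

# Toy (age, income) records; the values are made up for illustration.
records = [(25, 50000), (27, 52000), (31, 61000), (35, 58000),
           (42, 90000), (47, 95000), (51, 88000), (58, 99000)]
groups = mondrian_partition(records, k=2)
ranges = [generalize(g) for g in groups]
```

Each final group holds at least `k` records, so publishing only the per-group ranges gives k-anonymity over these attributes. The recursion is what generates many intermediate datasets in a distributed setting, which is the overhead the abstract says the hybrid approach reduces.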

Embedding-Based Complex Feature Value Coupling Learning for Detecting Outliers in Non-IID Categorical Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015541 ◽

2019 ◽

Vol 33 ◽

pp. 5541-5548 ◽

Cited By ~ 2

Author(s):

Hongzuo Xu ◽

Yongjun Wang ◽

Zhiyue Wu ◽

Yijie Wang

Keyword(s):

Outlier Detection ◽

Categorical Data ◽

State Of The Art ◽

High Order ◽

Detection Methods ◽

Order Complex ◽

Value Network ◽

Learning Framework ◽

A Value ◽

Real World Datasets

Non-IID categorical data is ubiquitous in real-world applications. Learning various kinds of couplings has been proved to be a reliable measure when detecting outliers in such non-IID data. However, it is a critical yet challenging problem to model, represent, and utilise high-order complex value couplings. Existing outlier detection methods normally only focus on pairwise primary value couplings and fail to uncover real relations that hide in complex couplings, resulting in suboptimal and unstable performance. This paper introduces a novel unsupervised embedding-based complex value coupling learning framework EMAC and its instance SCAN to address these issues. SCAN first models primary value couplings. Then, coupling bias is defined to capture complex value couplings with different granularities and highlight the essence of outliers. An embedding method is performed on the value network constructed via biased value couplings, which further learns high-order complex value couplings and embeds these couplings into a value representation matrix. Bidirectional selective value coupling learning is proposed to show how to estimate value and object outlierness through value couplings. Substantial experiments show that SCAN (i) significantly outperforms five state-of-the-art outlier detection methods on thirteen real-world datasets; and (ii) has much better resilience to noise than its competitors.

Fine-grained access control ensuring data privacy in OpenStack cloud

2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT) ◽

10.1109/icicict1.2017.8342822 ◽

2017 ◽

Author(s):

M Naveen Thomas John ◽

Manoj V. Thomas

Keyword(s):

Access Control ◽

Data Privacy ◽

Fine Grained

Data-to-Text Generation with Content Selection and Planning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016908 ◽

2019 ◽

Vol 33 ◽

pp. 6908-6915 ◽

Cited By ~ 5

Author(s):

Ratish Puduppully ◽

Li Dong ◽

Mirella Lapata

Keyword(s):

Neural Network ◽

Network Architecture ◽

Large Scale ◽

Network Models ◽

Text Generation ◽

Neural Network Models ◽

Generation Task ◽

Content Selection ◽

End To End ◽

Two Stages

Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines, improving the state-of-the-art on the recently released RotoWIRE dataset.

A Fine-Grained User-Divided Privacy-Preserving Access Control Protocol in Smart Watch

Sensors ◽

10.3390/s19092109 ◽

2019 ◽

Vol 19 (9) ◽

pp. 2109

Author(s):

Liming Fang ◽

Minghui Li ◽

Lu Zhou ◽

Hanyi Zhang ◽

Chunpeng Ge

Keyword(s):

Access Control ◽

Data Privacy ◽

Privacy Preservation ◽

Data Access ◽

Privacy Preserving ◽

Security And Privacy ◽

Fine Grained ◽

Data Access Control ◽

Smart Watch ◽

Performance And Evaluation

A smart watch is an emerging kind of wearable device in the Internet of Things. Security and privacy problems are the main obstacles that hinder the wide deployment of smart watches. Existing security mechanisms do not achieve a balance between privacy preservation and data access control. In this paper, we propose a fine-grained privacy-preserving access control architecture for smart watches (FPAS). In FPAS, we leverage an identity-based authentication scheme to protect the devices from malicious connections and policy-based access control for data privacy preservation. The core policy of FPAS is two-fold: (1) utilizing a homomorphic and re-encrypted scheme to ensure that the ciphertext information can be correctly calculated; (2) dividing data requesters by different attributes to avoid unauthorized access. We present a concrete scheme based on the above prototype and analyze the security of FPAS. The performance evaluation demonstrates that the FPAS scheme is efficient, practical, and extensible.
