Abstractive Text Summarization by Incorporating Reader Comments

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016399 ◽

2019 ◽

Vol 33 ◽

pp. 6399-6406 ◽

Cited By ~ 4

Author(s):

Shen Gao ◽

Xiuying Chen ◽

Piji Li ◽

Zhaochun Ren ◽

Lidong Bing ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Text Summarization ◽

Experimental Results ◽

Semantic Gap ◽

Main Aspect ◽

Adversarial Learning ◽

Large Scale Dataset ◽

Reader Comments ◽

Abstractive Summarization

In the field of neural abstractive summarization, conventional sequence-to-sequence models often summarize the wrong aspect of the document rather than its main aspect. To tackle this problem, we propose the task of reader-aware abstractive summary generation, which uses reader comments to help the model produce a better summary of the main aspect. Unlike the traditional abstractive summarization task, reader-aware summarization confronts two main challenges: (1) comments are informal and noisy; (2) jointly modeling the news document and the reader comments is challenging. To tackle these challenges, we design an adversarial learning model named reader-aware summary generator (RASG), which consists of four components: (1) a sequence-to-sequence based summary generator; (2) a reader attention module capturing the reader-focused aspects; (3) a supervisor modeling the semantic gap between the generated summary and the reader-focused aspects; (4) a goal tracker producing the goal for each generation step. The supervisor and the goal tracker are used to guide the training of our framework in an adversarial manner. Extensive experiments are conducted on our large-scale real-world text summarization dataset, and the results show that RASG achieves state-of-the-art performance in terms of both automatic metrics and human evaluations. The experimental results also demonstrate the effectiveness of each module in our framework. We release our large-scale dataset for further research.


Aspect-Aware Multimodal Summarization for Chinese E-Commerce Products

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6332 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8188-8195

Author(s):

Haoran Li ◽

Peng Yuan ◽

Song Xu ◽

Youzheng Wu ◽

Xiaodong He ◽

...

Keyword(s):

Energy Efficiency ◽

Significant Role ◽

Visual Information ◽

Large Scale ◽

Product Information ◽

Comparative Methods ◽

Text Summarization ◽

Experimental Results ◽

Summarization System ◽

Abstractive Summarization

We present an abstractive summarization system that produces summaries for Chinese e-commerce products. This task is more challenging than general text summarization. First, the appearance of a product typically plays a significant role in customers' decisions to buy the product or not, which requires that the summarization model effectively use the visual information of the product. Furthermore, different products have remarkable features in various aspects, such as “energy efficiency” and “large capacity” for refrigerators. Meanwhile, different customers may care about different aspects. Thus, the summarizer needs to capture the most attractive aspects of a product that resonate with potential purchasers. We propose an aspect-aware multimodal summarization model that can effectively incorporate the visual information and also determine the most salient aspects of a product. We construct a large-scale Chinese e-commerce product summarization dataset that contains approximately 1.4 million manually created product summaries that are paired with detailed product information, including an image, a title, and other textual descriptions for each product. The experimental results on this dataset demonstrate that our models significantly outperform the comparative methods in terms of both the ROUGE score and manual evaluations.
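
The abstract above reports results in terms of ROUGE. As a rough illustration of what such a score measures, here is a minimal sketch of ROUGE-1 (clipped unigram overlap) in Python. The function name `rouge_1`, the whitespace tokenization, and the example strings are assumptions made for illustration; the abstract does not state which ROUGE variants or tokenization the authors used, and Chinese text would need character-level or word-segmented tokens rather than whitespace splitting.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """ROUGE-1 on lowercased whitespace tokens (no stemming, illustrative only)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped matches).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical product summaries, not taken from the dataset.
scores = rouge_1(
    "energy efficient fridge with large capacity",
    "large capacity fridge with high energy efficiency",
)
print(scores)
```

Here five of the six candidate tokens also occur among the seven reference tokens, so precision is 5/6 and recall is 5/7. Published evaluations normally rely on an established ROUGE implementation rather than a hand-rolled one like this.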


Review Summary Generation in Online Systems: Frameworks for Supervised and Unsupervised Scenarios

ACM Transactions on the Web ◽

10.1145/3448015 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-33

Author(s):

Wenjun Jiang ◽

Jing Chen ◽

Xiaofei Ding ◽

Jie Wu ◽

Jiawei He ◽

...

Keyword(s):

Decision Making ◽

Real World ◽

Text Summarization ◽

Experimental Results ◽

Product Review ◽

Comprehensive Review ◽

Online Systems ◽

Real World Datasets ◽

Different Characteristics

In online systems, including e-commerce platforms, many users rely on the reviews or comments written by previous consumers for decision making, but they have limited time to read many reviews. Therefore, a review summary that contains all important features of the user-generated reviews is desirable. In this article, we study “how to generate a comprehensive review summary from a large number of user-generated reviews.” This can be implemented by text summarization, which has two main types of approach: extractive and abstractive. Both approaches can deal with supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter can avoid redundancy but usually can only deal with short sequences. Moreover, both approaches may neglect the sentiment information. To address the above issues, we propose comprehensive Review Summary Generation frameworks to deal with the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while keeping users’ sentiment in the original reviews. These sentences can be further used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets in Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.


Local Graph Edge Partitioning

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3466685 ◽

2021 ◽

Vol 12 (5) ◽

pp. 1-25

Author(s):

Shengwei Ji ◽

Chenyang Bu ◽

Lei Li ◽

Xindong Wu

Keyword(s):

Real World ◽

Graph Partitioning ◽

Large Scale ◽

Complete Information ◽

Local Information ◽

Experimental Results ◽

Two Stage ◽

Graph Computation ◽

Local Graph ◽

Edge Partitioning

Graph edge partitioning, which is essential for the efficiency of distributed graph computation systems, divides a graph into several balanced partitions within a given size to minimize the number of vertices to be cut. Existing graph partitioning models can be classified into two categories: offline and streaming graph partitioning models. The former requires global graph information during the partitioning, which is expensive in terms of time and memory for large-scale graphs. The latter creates partitions based solely on the received graph information. However, the streaming model may result in a lower partitioning quality compared with the offline model. Therefore, this study introduces a Local Graph Edge Partitioning model, which considers only the local information (i.e., a portion of a graph instead of the entire graph) during the partitioning. Considering only the local graph information is meaningful because acquiring complete information for large-scale graphs is expensive. Based on the Local Graph Edge Partitioning model, two local graph edge partitioning algorithms—Two-stage Local Partitioning and Adaptive Local Partitioning—are given. Experimental results obtained on 14 real-world graphs demonstrate that the proposed algorithms outperform rival algorithms in most tested cases. Furthermore, the proposed algorithms are proven to significantly improve the efficiency of the real graph computation system GraphX.
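
To make the objective in the abstract above concrete, the following Python sketch evaluates a given edge partition: it lists the vertices that are cut (they appear in more than one partition), computes the replication factor (the average number of partitions each vertex appears in, a metric commonly used for vertex-cut partitioning), and counts edges per partition to check balance. The function name `edge_partition_stats` and the toy graph are hypothetical. This only scores a partition; it is not either of the paper's algorithms, whose details the abstract does not give.

```python
from collections import defaultdict


def edge_partition_stats(assignment):
    """Score an edge partition given as {(u, v): partition_id} (assumed non-empty)."""
    parts_of_vertex = defaultdict(set)
    edges_per_part = defaultdict(int)
    for (u, v), part in assignment.items():
        # An edge lives in exactly one partition; both endpoints get a copy there.
        parts_of_vertex[u].add(part)
        parts_of_vertex[v].add(part)
        edges_per_part[part] += 1
    # A vertex is "cut" when its edges are spread over several partitions.
    cut_vertices = sorted(v for v, ps in parts_of_vertex.items() if len(ps) > 1)
    replication = sum(len(ps) for ps in parts_of_vertex.values()) / len(parts_of_vertex)
    return cut_vertices, replication, dict(edges_per_part)


# Toy 4-cycle a-b-c-d-a split into two partitions of two edges each.
assignment = {("a", "b"): 0, ("b", "c"): 0, ("c", "d"): 1, ("d", "a"): 1}
cut, rf, sizes = edge_partition_stats(assignment)
print(cut, rf, sizes)
```

In this toy example vertices `a` and `c` are cut, the replication factor is 1.5, and both partitions hold two edges. A lower replication factor generally means less cross-partition synchronization in a distributed graph system.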


A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/591 ◽

2018 ◽

Cited By ~ 15

Author(s):

Shuming Ma ◽

Xu Sun ◽

Junyang Lin ◽

Xuancheng Ren

Keyword(s):

Hierarchical Structure ◽

Online Reviews ◽

Text Summarization ◽

Sentiment Classification ◽

Experimental Results ◽

Joint Learning ◽

End To End ◽

Abstractive Summarization ◽

Main Ideas ◽

Different Levels

Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels. Text summarization describes the text within a few sentences, while sentiment classification can be regarded as a special type of summarization which "summarizes" the text in an even more abstract fashion, i.e., a sentiment class. Based on this idea, we propose a hierarchical end-to-end model for joint learning of text summarization and sentiment classification, where the sentiment classification label is treated as the further "summarization" of the text summarization output. Hence, the sentiment classification layer is put upon the text summarization layer, and a hierarchical structure is derived. Experimental results on Amazon online reviews datasets show that our model achieves better performance than the strong baseline systems on both abstractive summarization and sentiment classification.


Legal Judgment Prediction Based on Multiclass Information Fusion

Complexity ◽

10.1155/2020/3089189 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Kongfan Zhu ◽

Rundong Guo ◽

Weifeng Hu ◽

Zeqiang Li ◽

Yujun Li

Keyword(s):

Information Fusion ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

External Information ◽

Criminal Cases ◽

Law System ◽

Large Scale Dataset ◽

Assistant Systems ◽

Civil Law System

Legal judgment prediction (LJP), as an effective and critical application in legal assistant systems, aims to determine the judgment results according to the information based on the fact determination. In real-world scenarios, when dealing with criminal cases, judges not only take advantage of the fact description, but also consider external information, such as the basic information of the defendant and the court view. However, most existing works take the fact description as the sole input for LJP and ignore the external information. We propose a Transformer-Hierarchical-Attention-Multi-Extra (THME) Network to make full use of the information based on the fact determination. We conduct experiments on a real-world large-scale dataset of criminal cases in the civil law system. Experimental results show that our method outperforms state-of-the-art LJP methods on all judgment prediction tasks.


Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽

10.1007/s41095-021-0242-8 ◽

2021 ◽

Vol 8 (2) ◽

pp. 273-287

Author(s):

Xuewei Bian ◽

Chaoqun Wang ◽

Weize Quan ◽

Juntao Ye ◽

Xiaopeng Zhang ◽

...

Keyword(s):

Performance Improvement ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Processing Unit ◽

Final Model ◽

Scene Text ◽

End To End

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic so cannot properly measure the performance of different methods.


Modified Password Guessing Methods Based on TarGuess-I

Wireless Communications and Mobile Computing ◽

10.1155/2020/8837210 ◽

2020 ◽

Vol 2020 ◽

pp. 1-22

Author(s):

Zhijie Xie ◽

Min Zhang ◽

Yuqi Guo ◽

Zhenhan Li ◽

Hongjun Wang

Keyword(s):

Real World ◽

Relative Position ◽

Large Scale ◽

Experimental Results ◽

Demographic Information ◽

Modified Model ◽

Password Security ◽

Personally Identifiable Information

TarGuess-I is a leading online targeted password guessing model using users’ personally identifiable information (PII) proposed at ACM CCS 2016 by Wang et al. It has attracted widespread attention in password security owing to its superior guessing performance. Yet, after analyzing the users’ vulnerable behaviors of using popular passwords and constructing passwords with users’ PII, we find that this model does not take into account popular passwords, keyboard patterns, and the special strings. The special strings are the strings related to users but do not appear in the users’ demographic information. Thus, we propose TarGuess-I+KPX, a modified password guessing model with three semantic methods, including (1) identifying popular passwords by generating top-300 lists from similar websites, (2) recognizing keyboard patterns by relative position, and (3) catching the special strings by extracting continuous characters from user-generated PII. We conduct a series of evaluations on six large-scale real-world leaked password datasets. The experimental results show that our modified model outperforms TarGuess-I by 2.62% within 100 guesses.
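
As a rough illustration of method (2) above, the sketch below flags a string as a keyboard pattern when every consecutive pair of characters sits on neighbouring keys of a QWERTY layout, judged by the keys' relative row and column positions. This is a hypothetical heuristic, not the paper's actual procedure, which the abstract does not spell out; the names `ROWS`, `POS`, `is_keyboard_walk`, and the `min_len` threshold are all assumptions, and the physical stagger between keyboard rows is ignored.

```python
# Simplified QWERTY grid: each key gets a (row, column) position.
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def is_keyboard_walk(s: str, min_len: int = 4) -> bool:
    """Return True if consecutive characters are always on neighbouring keys."""
    s = s.lower()
    if len(s) < min_len or any(ch not in POS for ch in s):
        return False
    for a, b in zip(s, s[1:]):
        (r1, c1), (r2, c2) = POS[a], POS[b]
        # Repeated keys and jumps of more than one row or column break the walk.
        if a == b or abs(r1 - r2) > 1 or abs(c1 - c2) > 1:
            return False
    return True


for candidate in ["qwerty", "1qaz", "zxcv", "password"]:
    print(candidate, is_keyboard_walk(candidate))
```

With this rule, `qwerty`, `1qaz`, and `zxcv` are treated as keyboard walks, while `password` is not, because `p` and `a` are far apart on the grid.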


DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr42600.2020.00296 ◽

2020 ◽

Cited By ~ 4

Author(s):

Liming Jiang ◽

Ren Li ◽

Wayne Wu ◽

Chen Qian ◽

Chen Change Loy

Keyword(s):

Real World ◽

Large Scale ◽

Forgery Detection ◽

Large Scale Dataset


DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios

Natural Language Processing and Chinese Computing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60457-8_44 ◽

2020 ◽

pp. 534-545

Author(s):

Xinyu Li ◽

Fayuan Li ◽

Lu Pan ◽

Yuguang Chen ◽

Weihua Peng ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Event Extraction ◽

Large Scale Dataset


What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11315 ◽

2019 ◽

Vol 66 ◽

pp. 243-278

Author(s):

Shashi Narayan ◽

Shay B. Cohen ◽

Mirella Lapata

Keyword(s):

Neural Networks ◽

Long Range ◽

Convolutional Neural Networks ◽

Real World ◽

Large Scale ◽

State Of The Art ◽

British Broadcasting Corporation ◽

Document Summarization ◽

Modeling Approach ◽

Large Scale Dataset

We introduce "extreme summarization," a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?". We argue that extreme summarization, by nature, is not amenable to extractive strategies and requires an abstractive modeling approach. In the hope of driving research on this task further: (a) we collect a real-world, large scale dataset by harvesting online articles from the British Broadcasting Corporation (BBC); and (b) propose a novel abstractive model which is conditioned on the article's topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans on the extreme summarization dataset.
