Text Summarization using Extractive and Abstractive Methods

Text Summarization is a process where a huge text file is converted into summarized version which will preserve the original meaning and context. The main aim of any text summarization is to provide a accurate and precise summary. One approach is to use a sentence ranking algorithm. This comes under extractive summarization. Here, a graph based ranking algorithm is used to rank the sentences in the text and then top k-scored sentences are included in the summary. The most widely used algorithm to decide the importance of any vertex in a graph based on the information retrieved from the graph is Graph Based Ranking Algorithm. TextRank is one of the most efficient ranking algorithms which is used for Web link analysis that is for measuring the importance of website pages. Another approach is abstractive summarization where a LSTM encoder decoder model is used along with attention mechanism which focuses on some important words from the input. Encoder encodes the input sequence and decoder along with attention mechanism gives the summary as the output.

Download Full-text

Abstractive Arabic Text Summarization Based on Deep Learning

Computational Intelligence and Neuroscience ◽

10.1155/2022/1566890 ◽

2022 ◽

Vol 2022 ◽

pp. 1-14

Author(s):

Y.M. Wazery ◽

Marwa E. Saleh ◽

Abdullah Alharbi ◽

Abdelmgeid A. Ali

Keyword(s):

Short Term Memory ◽

Attention Mechanism ◽

Text Summarization ◽

Arabic Text ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory ◽

Arabic Text Summarization ◽

Abstractive Summarization ◽

Research Studies

Text summarization (TS) is considered one of the most difficult tasks in natural language processing (NLP). It is one of the most important challenges that stand against the modern computer system’s capabilities with all its new improvement. Many papers and research studies address this task in literature but are being carried out in extractive summarization, and few of them are being carried out in abstractive summarization, especially in the Arabic language due to its complexity. In this paper, an abstractive Arabic text summarization system is proposed, based on a sequence-to-sequence model. This model works through two components, encoder and decoder. Our aim is to develop the sequence-to-sequence model using several deep artificial neural networks to investigate which of them achieves the best performance. Different layers of Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), and Bidirectional Long Short-Term Memory (BiLSTM) have been used to develop the encoder and the decoder. In addition, the global attention mechanism has been used because it provides better results than the local attention mechanism. Furthermore, AraBERT preprocess has been applied in the data preprocessing stage that helps the model to understand the Arabic words and achieves state-of-the-art results. Moreover, a comparison between the skip-gram and the continuous bag of words (CBOW) word2Vec word embedding models has been made. We have built these models using the Keras library and run-on Google Colab Jupiter notebook to run seamlessly. Finally, the proposed system is evaluated through ROUGE-1, ROUGE-2, ROUGE-L, and BLEU evaluation metrics. The experimental results show that three layers of BiLSTM hidden states at the encoder achieve the best performance. In addition, our proposed system outperforms the other latest research studies. Also, the results show that abstractive summarization models that use the skip-gram word2Vec model outperform the models that use the CBOW word2Vec model.

Download Full-text

GUI Based Text Summarizing of Social Response

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1710.029420 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1773-1776

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Social Response ◽

Extractive Summarization ◽

News Stories ◽

Social Reaction ◽

Abstractive Summarization

Text Summarization is one of those utilizations of Natural Language Processing (NLP) which will undoubtedly hugy affect our lives. For the most part, Text outline can comprehensively be partitioned into two classifications, Extractive Summarization and Abstractive Summarization and the execution of seq2seq model for rundown of literary information utilizing of tensor stream/keras and showed on amazon or social reaction surveys, issues and news stories. Content rundown is a subdomain of Natural Language Processing that manages removing synopses from tremendous lumps of writings. There are two fundamental sorts of methods utilized for content rundown: NLP-based procedures and profound learning based strategies. Along these lines, our point is to look at spacy, gensim and nltk synopsis system by the info prerequisites. It will see a basic NLP-based system for content rundown. Or maybe it will basically utilize Python's NLTK library for content abridging.

Download Full-text

Unsupervised Multi-Document Abstractive Summarization Using Recursive Neural Network with Attention Mechanism

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8976 ◽

2020 ◽

Vol 17 (9) ◽

pp. 3867-3872

Author(s):

Aniv Chakravarty ◽

Jagadish S. Kallimani

Keyword(s):

Neural Network ◽

Neural Networks ◽

Attention Mechanism ◽

Text Summarization ◽

Text Generation ◽

Text Documents ◽

Current State ◽

Semantic Concepts ◽

Text Information ◽

Abstractive Summarization

Text summarization is an active field of research with a goal to provide short and meaningful gists from large amount of text documents. Extractive text summarization methods have been extensively studied where text is extracted from the documents to build summaries. There are various type of multi document ranging from different formats to domains and topics. With the recent advancement in technology and use of neural networks for text generation, interest for research in abstractive text summarization has increased significantly. The use of graph based methods which handle semantic information has shown significant results. When given a set of documents of English text files, we make use of abstractive method and predicate argument structures to retrieve necessary text information and pass it through a neural network for text generation. Recurrent neural networks are a subtype of recursive neural networks which try to predict the next sequence based on the current state and considering the information from previous states. The use of neural networks allows generation of summaries for long text sentences as well. This paper implements a semantic based filtering approach using a similarity matrix while keeping all stop-words. The similarity is calculated using semantic concepts and Jiang–Conrath similarity and making use of a recurrent neural network with an attention mechanism to generate summary. ROUGE score is used for measuring accuracy, precision and recall scores.

Download Full-text

A Study on Tools and Techniques of Big Data Analytics for Text Summarization From Multi-Documents

Advances in Marketing, Customer Relationship Management, and E-Services - Big Data Analytics for Improved Accuracy, Efficiency, and Decision Making in Digital Marketing ◽

10.4018/978-1-7998-7231-3.ch002 ◽

2021 ◽

pp. 19-32

Author(s):

Martin Aruldoss ◽

Miranda Lakshmi Travis

Keyword(s):

Big Data ◽

Search Algorithm ◽

Big Data Analytics ◽

Text Summarization ◽

Advanced Technology ◽

Document Summarization ◽

Extractive Summarization ◽

Internet Resources ◽

Abstractive Summarization ◽

Tools And Techniques

Multi-document summarization extracts and summarizes the information without affecting its original context from the different sources of documents. It has been carried out using extractive text summarization and abstractive text summarization. Extractive summarization extracts summaries from verbatim lines, and abstractive summarization extracts new lines of summary from the source documents. Abstractive summarization is an advanced technology compared to extractive summarization. This research studies extractive summarization of multi documents from internet resources using word frequency counting and with maximum coverage using K-means clustering. In an internet search, the search algorithm shows the results from different websites using crawling and indexing. However, the search and text summary take place from hundreds, thousands, maybe millions of documents. To handle and manipulate these huge amounts of information, big data and its techniques are applied widely. This research also addresses big data techniques and tools that are available for multi-document summarization.

Download Full-text

Automatic Text Summarization Using Deep Reinforcement Learning and Beyond

Information Technology And Control ◽

10.5755/j01.itc.50.3.28047 ◽

2021 ◽

Vol 50 (3) ◽

pp. 458-469

Author(s):

Gang Sun ◽

Zhongxin Wang ◽

Jia Zhao

Keyword(s):

Information Overload ◽

Optimization Method ◽

Text Summarization ◽

Original Text ◽

Baseline Model ◽

Extractive Summarization ◽

Automatic Text Summarization ◽

Text Information ◽

Abstractive Summarization ◽

Automatic Text

In the era of big data, information overload problems are becoming increasingly prominent. It is challengingfor machines to understand, compress and filter massive text information through the use of artificial intelligencetechnology. The emergence of automatic text summarization mainly aims at solving the problem ofinformation overload, and it can be divided into two types: extractive and abstractive. The former finds somekey sentences or phrases from the original text and combines them into a summarization; the latter needs acomputer to understand the content of the original text and then uses the readable language for the human tosummarize the key information of the original text. This paper presents a two-stage optimization method forautomatic text summarization that combines abstractive summarization and extractive summarization. First,a sequence-to-sequence model with the attention mechanism is trained as a baseline model to generate initialsummarization. Second, it is updated and optimized directly on the ROUGE metric by using deep reinforcementlearning (DRL). Experimental results show that compared with the baseline model, Rouge-1, Rouge-2,and Rouge-L have been increased on the LCSTS dataset and CNN/DailyMail dataset.

Download Full-text

Extractive Text Summarization using Recurrent Neural Networks with Attention Mechanism

10.5121/csit.2021.111518 ◽

2021 ◽

Author(s):

Shimirwa Aline Valerie ◽

Jian Xu

Keyword(s):

Recurrent Neural Networks ◽

State Of The Art ◽

Attention Mechanism ◽

Text Summarization ◽

Data Driven ◽

Future Research ◽

Generalization Capability ◽

Extractive Summarization ◽

Benchmark Datasets ◽

Set Up

Extractive summarization aims to select the most important sentences or words from a document to generate a summary. Traditional summarization approaches have relied extensively on features manually designed by humans. In this paper, based on the recurrent neural network equipped with the attention mechanism, we propose a data-driven technique. We set up a general framework that consists of a hierarchical sentence encoder and an attentionbased sentence extractor. The framework allows us to establish various extractive summarization models and explore them. Comprehensive experiments are conducted on two benchmark datasets, and experimental results show that training extractive models based on Reward Augmented Maximum Likelihood (RAML)can improve the model’s generalization capability. And we realize that complicated components of the state-of-the-art extractive models do not attain good performance over simpler ones. We hope that our work can give more hints for future research on extractive text summarization.

Download Full-text

SGAN4AbSum: A Semantic-Enhanced Generative Adversarial Network for Abstractive Text Summarization

10.21203/rs.3.rs-648146/v1 ◽

2021 ◽

Author(s):

Tham Vo

Keyword(s):

Ground Truth ◽

Text Summarization ◽

Generative Adversarial Network ◽

Convolutional Network ◽

Training Strategy ◽

Adversarial Network ◽

Deep Recurrent Neural Network ◽

Benchmark Datasets ◽

Latent Representations ◽

Abstractive Summarization

Abstract In abstractive summarization task, most of proposed models adopt the deep recurrent neural network (RNN)-based encoder-decoder architecture to learn and generate meaningful summary for a given input document. However, most of recent RNN-based models always suffer the challenges related to the involvement of much capturing high-frequency/reparative phrases in long documents during the training process which leads to the outcome of trivial and generic summaries are generated. Moreover, the lack of thorough analysis on the sequential and long-range dependency relationships between words within different contexts while learning the textual representation also make the generated summaries unnatural and incoherent. To deal with these challenges, in this paper we proposed a novel semantic-enhanced generative adversarial network (GAN)-based approach for abstractive text summarization task, called as: SGAN4AbSum. We use an adversarial training strategy for our text summarization model in which train the generator and discriminator to simultaneously handle the summary generation and distinguishing the generated summary with the ground-truth one. The input of generator is the jointed rich-semantic and global structural latent representations of training documents which are achieved by applying a combined BERT and graph convolutional network (GCN) textual embedding mechanism. Extensive experiments in benchmark datasets demonstrate the effectiveness of our proposed SGAN4AbSum which achieve the competitive ROUGE-based scores in comparing with state-of-the-art abstractive text summarization baselines.

Download Full-text

Research on Text Summarization Generation Based on LSTM and Attention Mechanism

10.1109/ispds54097.2021.00048 ◽

2021 ◽

Author(s):

Meiqi Ji ◽

Ruiling Fu ◽

Tongtong Xing ◽

Fulian Yin

Keyword(s):

Attention Mechanism ◽

Text Summarization

Download Full-text

A Pointer Generator Network Model to Automatic Text Summarization and Headline Generation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.e1094.0785s319 ◽

2019 ◽

Vol 8 (5S3) ◽

pp. 447-451

Keyword(s):

Neural Network ◽

Network Model ◽

Recurrent Neural Network ◽

Text Summarization ◽

Daily Mail ◽

Automatic Text Summarization ◽

Generator Model ◽

Abstractive Summarization ◽

Automatic Text

In a world where information is growing rapidly every single day, we need tools to generate summary and headlines from text which is accurate as well as short and precise. In this paper, we have described a method for generating headlines from article. This is done by using hybrid pointer-generator network with attention distribution and coverage mechanism on article which generates abstractive summarization followed by the application of encoder-decoder recurrent neural network with LSTM unit to generate headlines from the summary. Hybrid pointer generator model helps in removing inaccuracy as well as repetitions. We have used CNN / Daily Mail as our dataset.

Download Full-text

A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/591 ◽

2018 ◽

Cited By ~ 15

Author(s):

Shuming Ma ◽

Xu Sun ◽

Junyang Lin ◽

Xuancheng Ren

Keyword(s):

Hierarchical Structure ◽

Online Reviews ◽

Text Summarization ◽

Sentiment Classification ◽

Experimental Results ◽

Joint Learning ◽

End To End ◽

Abstractive Summarization ◽

Main Ideas ◽

Different Levels

Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels. Text summarization is to describe the text within a few sentences, while sentiment classification can be regarded as a special type of summarization which ``summarizes'' the text into a even more abstract fashion, i.e., a sentiment class. Based on this idea, we propose a hierarchical end-to-end model for joint learning of text summarization and sentiment classification, where the sentiment classification label is treated as the further ``summarization'' of the text summarization output. Hence, the sentiment classification layer is put upon the text summarization layer, and a hierarchical structure is derived. Experimental results on Amazon online reviews datasets show that our model achieves better performance than the strong baseline systems on both abstractive summarization and sentiment classification.

Download Full-text