A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased by circumstances when producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework that determines the salient top-n sentences of a source text as a reference summary and evaluates the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and outperforms several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
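
The limitation of n-gram overlap described in this abstract can be seen in a small sketch. The function below is a simplified ROUGE-N recall (lowercasing, whitespace tokenization, a single reference, no stemming). The names `ngrams` and `rouge_n_recall` and the example sentences are illustrative assumptions and are not part of the proposed framework.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Simplified ROUGE-N recall: clipped n-gram matches / reference n-grams."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences (not from any dataset).
reference = "the storm damaged many homes"
extractive = "the storm damaged many homes in the city"
abstractive = "numerous houses were wrecked by severe weather"

print(rouge_n_recall(reference, extractive))   # every reference unigram is matched
print(rouge_n_recall(reference, abstractive))  # no shared unigrams despite similar meaning
```

The paraphrased candidate conveys roughly the same content but shares no surface tokens with the reference, so an overlap-based score cannot reward it.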

A SYNTACTIC-BASED SENTENCE VALIDATION TECHNIQUE FOR MALAY TEXT SUMMARIZER

Journal of Information and Communication Technology ◽

10.32890/jict2021.20.3.3 ◽

2021 ◽

Vol 20 (Number 3) ◽

pp. 329-352

Author(s):

Suraya Alias ◽

Mohd Shamrie Sainin ◽

Siti Khaotijah Mohammad

Keyword(s):

Language Processing ◽

Text Summarization ◽

Compression Rate ◽

Automatic Evaluation ◽

Readability Score ◽

Automatic Text Summarization ◽

Validation Technique ◽

Automatic Text ◽

F Measure

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary ones without sacrificing the sentence's grammar. The existing development of Malay Natural Language Processing (NLP) tools is still under study with limited open access. The issue is the lack of a benchmark dataset in the Malay language to evaluate the quality of the summaries and to validate the compressed sentence produced by the summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences by referring to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that undergoes the SC procedure. We experimented using the Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced a result with an average F-measure of 0.5826 and an average Recall value of 0.5925 with an optimum compression rate of 0.5 Confidence Conf value. Furthermore, a manual summary evaluation by a group of Malay experts on the grammaticality of the compressed summary sentence produced a good result of 4.11 and a readability score of 4.12 out of 5. This depicts the reliability of the proposed technique to validate the Malay sentence with promising summary content and readability results.

A Quantum-Inspired Genetic Algorithm for Extractive Text Summarization

International Journal of Natural Computing Research ◽

10.4018/ijncr.2021040103 ◽

2021 ◽

Vol 10 (2) ◽

pp. 42-60

Author(s):

Khadidja Chettah ◽

Amer Draa

Keyword(s):

Genetic Algorithm ◽

State Of The Art ◽

Text Summarization ◽

Automated System ◽

Evaluation Metrics ◽

Document Summarization ◽

Automatic Text Summarization ◽

Reference Methods ◽

Textual Data ◽

Automatic Text

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to include in the final summary. The presented approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques. Its performance is evaluated on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The obtained results show that the proposal can compete with other state-of-the-art methods. It is ranked first out of 12, outperforming all other algorithms.

A Text Abstraction Summary Model Based on BERT Word Embedding and Reinforcement Learning

Applied Sciences ◽

10.3390/app9214701 ◽

2019 ◽

Vol 9 (21) ◽

pp. 4701 ◽

Cited By ~ 8

Author(s):

Qicai Wang ◽

Peiyu Liu ◽

Zhenfang Zhu ◽

Hongxia Yin ◽

Qiuyue Zhang ◽

...

Keyword(s):

Reinforcement Learning ◽

Language Processing ◽

Evaluation Method ◽

Ground Truth ◽

Text Summarization ◽

Word Embedding ◽

Text Representation ◽

Daily Mail ◽

Automatic Text Summarization ◽

Automatic Text

As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. There are currently two approaches to the text summarization task: abstractive and extractive. On this basis, we propose a novel hybrid extractive-abstractive model that combines BERT (Bidirectional Encoder Representations from Transformers) word embedding with reinforcement learning. First, we convert the human-written abstractive summaries to ground truth labels. Second, we use BERT word embedding as the text representation and pre-train the two sub-models separately. Finally, the extraction network and the abstraction network are bridged by reinforcement learning. To verify the performance of the model, we compare it with currently popular automatic text summarization models on the CNN/Daily Mail dataset, using the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics as the evaluation method. Extensive experimental results show that the accuracy of the model is clearly improved.

Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Due to the widespread availability of information and the diversity of its sources, there is a need to produce accurate text summaries with the least time and effort. Such a summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases the readability of the created summaries is not satisfactory, because the summaries do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper we use syntactic and semantic analysis to propose an automatic system for Arabic text summarization. This system is capable of understanding the meaning of information and retrieves only the relevant part. The effectiveness of the proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries will be compared against those produced by humans and by previous research.

A New Biomimetic Method Based on the Power Saves of Social Bees for Automatic Summaries of Texts by Extraction

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2015010102 ◽

2015 ◽

Vol 7 (1) ◽

pp. 18-38 ◽

Cited By ~ 5

Author(s):

Mohamed Amine Boudia ◽

Reda Mohamed Hamou ◽

Abdelmalek Amine ◽

Amine Rahmani

Keyword(s):

Text Summarization ◽

Second Step ◽

Original Text ◽

Simple Majority ◽

New Approach ◽

Social Bees ◽

Automatic Text Summarization ◽

Final Layer ◽

Biomimetic Method ◽

Automatic Text

In this paper, the authors propose a new approach for automatic text summarization by extraction based on a Saving Energy Function. The first step uses two extraction techniques: scoring of phrases, and similarity, which aims to eliminate redundant phrases without losing the theme of the text. The second step aims to optimize the results of the previous layer with a metaheuristic based on the Bee Algorithm. The objective function of the optimization is to maximize the sum of similarity between phrases of the candidate summary in order to keep the theme of the text, and to minimize the sum of scores in order to increase the summarization rate; this optimization also gives candidate summaries in which the order of the phrases changes compared to the original text. The third and final layer aims to choose the best summary from the candidate summaries generated by bee optimization; for this, the authors opted for voting with a simple majority.

Better Metrics to Automatically Predict the Quality of a Text Summary

Algorithms ◽

10.3390/a5040398 ◽

2012 ◽

Vol 5 (4) ◽

pp. 398-420 ◽

Cited By ~ 4

Author(s):

Peter A. Rankel ◽

John M. Conroy ◽

Judith D. Schlesinger

Keyword(s):

Least Squares ◽

Canonical Correlation ◽

Robust Regression ◽

Text Summarization ◽

Automatic Text Summarization ◽

Eigenvalue Method ◽

Automatic Text

In this paper we demonstrate a family of metrics for estimating the quality of a text summary relative to one or more human-generated summaries. The improved metrics are based on features automatically computed from the summaries to measure content and linguistic quality. The features are combined using one of three methods—robust regression, non-negative least squares, or canonical correlation, an eigenvalue method. The new metrics significantly outperform the previous standard for automatic text summarization evaluation, ROUGE.

Automatic Text Summarization by Providing Coverage, Non-Redundancy, and Novelty Using Sentence Graph

Journal of Information Technology Research ◽

10.4018/jitr.2022010108 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-18

Author(s):

Krishnaveni P. ◽

Balasundaram S. R.

Keyword(s):

Graph Algorithms ◽

Maximal Clique ◽

Text Summarization ◽

Original Text ◽

Online Information ◽

Automatic Text Summarization ◽

Global Properties ◽

Input Text ◽

Local Properties ◽

Automatic Text

The day-to-day growth of online information necessitates intensive research in automatic text summarization (ATS). ATS software produces summary text by extracting important information from the original text. With the help of summaries, users can easily read and understand the documents of interest. Most approaches to ATS use only local properties of text. Moreover, the numerous properties make sentence selection difficult and complicated. This article therefore uses graph-based summarization to exploit structural and global properties of text. It introduces the maximal clique based sentence selection (MCBSS) algorithm to select important and non-redundant sentences that cover all concepts of the input text for the summary. The MCBSS algorithm finds novel information using maximal cliques (MCs). The experimental results of Recall-Oriented Understudy for Gisting Evaluation (ROUGE) on the Timeline dataset show that the proposed work outperforms the existing graph algorithms Bushy Path (BP), Aggregate Similarity (AS), and TextRank (TR).
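
To illustrate how maximal cliques in a sentence graph can group related sentences, here is a minimal sketch. It is not the MCBSS algorithm itself: the word-overlap (Jaccard) similarity, the `threshold` value, the example sentences, and all function names are assumptions made for illustration. Clique enumeration uses the basic Bron-Kerbosch recursion.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (an illustrative choice)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def build_graph(sentences, threshold=0.2):
    """Connect sentence indices whose similarity reaches the (assumed) threshold."""
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(sentences[i], sentences[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already explored
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical input sentences.
sentences = [
    "floods hit the coastal town",
    "the coastal town was hit by floods",
    "rescue teams arrived on monday",
    "rescue teams arrived with boats",
]
print(maximal_cliques(build_graph(sentences)))  # [[0, 1], [2, 3]]
```

Each clique collects sentences that are mutually similar, so a selection step could, for example, keep one sentence per clique to limit redundancy. How MCBSS actually scores and selects sentences is not stated in the abstract.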

A Systematic Survey on Multi-document Text Summarization

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/111062021 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3148-3153

Keyword(s):

Deep Learning ◽

Text Summarization ◽

Evaluation Metrics ◽

Automatic Process ◽

Document Summarization ◽

Text Document ◽

Automatic Text Summarization ◽

As Graph ◽

Abstractive Summarization ◽

Automatic Text

Automatic text summarization is a technique for generating a short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single-document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating a relevant, informative and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based and deep learning-based summarization techniques, are discussed along with the evaluation metrics, which can provide insight to future researchers.

Automatic Text Summarization Using Deep Reinforcement Learning and Beyond

Information Technology And Control ◽

10.5755/j01.itc.50.3.28047 ◽

2021 ◽

Vol 50 (3) ◽

pp. 458-469

Author(s):

Gang Sun ◽

Zhongxin Wang ◽

Jia Zhao

Keyword(s):

Information Overload ◽

Optimization Method ◽

Text Summarization ◽

Original Text ◽

Baseline Model ◽

Extractive Summarization ◽

Automatic Text Summarization ◽

Text Information ◽

Abstractive Summarization ◽

Automatic Text

In the era of big data, information overload problems are becoming increasingly prominent. It is challenging for machines to understand, compress and filter massive text information through the use of artificial intelligence technology. The emergence of automatic text summarization mainly aims at solving the problem of information overload, and it can be divided into two types: extractive and abstractive. The former finds some key sentences or phrases from the original text and combines them into a summarization; the latter needs a computer to understand the content of the original text and then uses the readable language for the human to summarize the key information of the original text. This paper presents a two-stage optimization method for automatic text summarization that combines abstractive summarization and extractive summarization. First, a sequence-to-sequence model with the attention mechanism is trained as a baseline model to generate initial summarization. Second, it is updated and optimized directly on the ROUGE metric by using deep reinforcement learning (DRL). Experimental results show that compared with the baseline model, Rouge-1, Rouge-2, and Rouge-L have been increased on the LCSTS dataset and CNN/DailyMail dataset.
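
ROUGE-L, one of the metrics reported in this abstract, is based on the longest common subsequence (LCS) between a candidate and a reference. The following is a minimal sketch, assuming whitespace tokenization and a single reference. The function names, the `beta` default, and the example sentences are illustrative choices and are not taken from the paper.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Return (precision, recall, F-score) computed from the LCS of the tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    f_score = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return precision, recall, f_score


# Hypothetical example: the LCS is "the cat on mat" (4 tokens out of 6 on each side).
p, r, f = rouge_l("the cat sat on the mat", "the cat lay on a mat")
print(round(p, 3), round(r, 3), round(f, 3))
```

Because the score is a simple function of two token sequences, it can be computed for any generated summary, which is what allows it to serve as a reward signal when a model is optimized directly on ROUGE.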

A Long Texts Summarization Approach to Scientific Articles

10.5753/stil.2021.17797 ◽

2021 ◽

Author(s):

Cinthia M. Souza ◽

Renato Vimieiro

Keyword(s):

Deep Learning ◽

Real World ◽

Text Summarization ◽

Scientific Texts ◽

Automatic Text Summarization ◽

Single Section ◽

Recent Advances ◽

Automatic Text ◽

Entire Text

Automatic text summarization aims at condensing the contents of a text into a simple and descriptive summary. Summarization techniques drastically benefited from the recent advances in Deep Learning. Nevertheless, these techniques are still unable to properly deal with long texts. In this work, we investigate whether the combination of summaries extracted from multiple sections of long scientific texts may enhance the quality of the summary for the whole document. We conduct experiments on a real world corpus to assess the effectiveness of our proposal. The results show that our multi-section proposal is as good as summaries generated using the entire text as input and twice as good as single section.
