A Systematic Survey on Multi-document Text Summarization

Automatic text summarization is the task of generating a short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single-document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating a relevant, informative, and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based, and deep-learning-based summarization techniques, are discussed here along with the evaluation metrics, which can provide insight to future researchers.
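Of the evaluation metrics this survey covers, ROUGE-N is the most widely used. As a generic illustration (not tied to any one surveyed paper), ROUGE-1 can be computed from unigram overlap between a candidate and a reference summary:

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 recall, precision, and F1 from unigram (multiset) overlap."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

ref = "the cat sat on the mat".split()
cand = "the cat lay on the mat".split()
scores = rouge_1(cand, ref)  # recall = precision = F1 = 5/6 here
```

Counting with `Counter` rather than sets clips repeated words correctly, which matters for longer summaries.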

2021 ◽  
Vol 10 (2) ◽  
pp. 42-60
Author(s):  
Khadidja Chettah ◽  
Amer Draa

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to put in the final summary. The presented approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques. The performance of the proposed approach is evaluated on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 metrics. The obtained results show that the proposal can compete with other state-of-the-art methods: it ranks first out of 12, outperforming all the other algorithms.
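The quantum-inspired operators are specific to the paper, but the underlying search problem — optimizing a binary sentence-selection mask — can be sketched with a plain classical genetic algorithm. The coverage-based fitness and all parameters below are illustrative assumptions, not the authors' implementation:

```python
import random

def fitness(mask, sentences, doc_words, max_len=20):
    """Fraction of document vocabulary covered by the selected sentences,
    with a hard word-length cap (infeasible summaries score zero)."""
    chosen = [s for s, keep in zip(sentences, mask) if keep]
    length = sum(len(s.split()) for s in chosen)
    if length > max_len:
        return 0.0
    words = {w for s in chosen for w in s.split()}
    return len(words & doc_words) / len(doc_words)

def ga_summarize(sentences, generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    doc_words = {w for s in sentences for w in s.split()}
    n = len(sentences)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(m, sentences, doc_words),
                        reverse=True)
        pop = scored[:pop_size // 2]          # elitist truncation selection
        while len(pop) < pop_size:
            a, b = rng.sample(scored[:10], 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]         # one-point crossover
            child[rng.randrange(n)] ^= 1      # bit-flip mutation
            pop.append(child)
    best = max(pop, key=lambda m: fitness(m, sentences, doc_words))
    return [s for s, keep in zip(sentences, best) if keep]

sents = ["the cat sat", "a dog barked", "the cat sat on the mat", "birds fly high"]
summary = ga_summarize(sents)
```

The QGA replaces the bitstrings with qubit-like probability amplitudes and the mutation with a rotation-gate update, but the selection/evaluation loop has the same shape.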


In a world where information grows rapidly every single day, we need tools that generate summaries and headlines from text which are accurate as well as short and precise. In this paper, we describe a method for generating headlines from articles. This is done by applying a hybrid pointer-generator network with an attention distribution and a coverage mechanism to the article, which produces an abstractive summary, followed by an encoder-decoder recurrent neural network with LSTM units that generates a headline from the summary. The hybrid pointer-generator model helps remove inaccuracies as well as repetition. We use CNN/Daily Mail as our dataset.
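As a toy illustration of the pointer-generator idea (not this paper's network), the decoder's final output distribution mixes a generation distribution over the vocabulary with a copy distribution given by the attention over source tokens; this mixing is what lets the model emit out-of-vocabulary words that appear in the source:

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """P_final(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on w.

    p_gen         -- scalar in [0, 1], the generation probability
    vocab_dist    -- dict token -> P_vocab(token), sums to 1
    attention     -- attention weights over the source, sum to 1
    source_tokens -- source tokens aligned with the attention weights
    """
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}
    for tok, a in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final

dist = final_distribution(
    p_gen=0.7,
    vocab_dist={"the": 0.5, "cat": 0.5},
    attention=[0.9, 0.1],
    source_tokens=["zygote", "cat"],   # "zygote" is out-of-vocabulary
)
```

Because the copy term routes probability mass to `"zygote"`, the model can output it even though the generator assigns it zero probability; the coverage mechanism (not shown) additionally penalizes attending to the same source positions repeatedly, which is what curbs repetition.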


MATICS ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 111-116
Author(s):  
Muhammad Adib Zamzam

Text summarization is an approach that can be used to condense a long article into a shorter, more compact text, so that the relatively short summary can stand in for the long original. Automatic text summarization is summarization performed automatically by a computer. There are two kinds of automatic text summarization algorithms: extraction-based summarization and abstractive summarization. The TextRank algorithm is extraction-based, or extractive, where extraction means selecting text units (sentences, sentence segments, paragraphs, or passages) considered to contain the important information of the document, and arranging those units (sentences) in the right order. Experiments with 50 input articles and summaries of 12.5% of the original text length show that the system achieves a ROUGE recall of 41.659%. The highest ROUGE recall was recorded for article 48, with a value of 0.764; the lowest for article 37, with a value of 0.167.
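A minimal TextRank sketch in Python: build a sentence-similarity graph, run PageRank-style power iteration, and rank sentences by score. The `+ 1` inside the logarithms is a guard for one-word sentences and is an assumption of this sketch, not part of the original formulation:

```python
import math

def similarity(s1, s2):
    """Shared-word overlap, normalised by sentence lengths (log-scaled)."""
    w1, w2 = set(s1.split()), set(s2.split())
    overlap = len(w1 & w2)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(w1) + 1) + math.log(len(w2) + 1))

def textrank(sentences, d=0.85, iters=50):
    """Return sentence indices ranked by PageRank over the similarity graph."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0
            for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    out_sum = [sum(row) or 1.0 for row in sim]   # avoid division by zero
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n
                  + d * sum(sim[j][i] / out_sum[j] * scores[j] for j in range(n))
                  for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

ranked = textrank(["a b c", "b c d", "x y z"])
```

An extractive summary then keeps the top-k ranked sentences but emits them in their original document order.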


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 78 ◽  
Author(s):  
Tulu Tilahun Hailu ◽  
Junqing Yu ◽  
Tessfu Geteye Fantaye

Text summarization is a process of producing a concise version (summary) of text from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much meaning of the source text is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on the overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent, ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased toward certain content while producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word-embedding-based automatic text summarization and evaluation framework, which determines the salient top-n sentences of a source text as a reference summary and evaluates the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
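The core of an embedding-based evaluation of this kind can be sketched as cosine similarity between averaged word vectors, which scores paraphrases that share no surface n-grams. The 2-d embeddings below are toy values for illustration; a real system would load pre-trained vectors:

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have an embedding."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy 2-d embeddings: "car" and "automobile" point the same way.
emb = {"car": [1.0, 0.1], "automobile": [0.9, 0.2], "banana": [0.0, 1.0]}
ref = sentence_vector(["car"], emb)
cand = sentence_vector(["automobile"], emb)
score = cosine(ref, cand)
```

Unlike ROUGE, `cosine(ref, cand)` is high here despite zero n-gram overlap, which is exactly the failure mode of n-gram metrics on abstractive summaries.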


2020 ◽  
Vol 21 (2) ◽  
Author(s):  
Sheena Kurian K ◽  
Sheena Mathew

The number of scientific or research papers published every year is growing at an exponential rate, which has led to intensive research in scientific document summarization. The different methods commonly used in automatic text summarization are discussed in this paper, with their pros and cons. Commonly used evaluation techniques and datasets in this field are also discussed. ROUGE and Pyramid scores of the different methods are tabulated for easy comparison of the results.


Author(s):  
Erwin Yudi Hidayat ◽  
Fahri Firdausillah ◽  
Khafiizh Hastuti ◽  
Ika Novita Dewi ◽  
Azhari Azhari

In this paper, we present Latent Dirichlet Allocation (LDA) in automatic text summarization to improve accuracy in document clustering. The experiments involve a dataset of 398 public blog articles obtained using a Python Scrapy crawler and scraper. The steps of clustering in this research are preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The results show that automatic document summarization with LDA reaches 72% accuracy at an LDA compression level of 40%, compared to the traditional k-means method, which reaches only 66%.
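For reference, the traditional k-means baseline the LDA pipeline is compared against is Lloyd's algorithm over document feature vectors. A self-contained sketch, with toy 2-d points standing in for weighted document vectors:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; points are equal-length feature vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean.
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
cents, clus = kmeans(pts, 2)
```

In the paper's pipeline the input vectors would instead come from the word-weighting step applied to the LDA-compressed documents.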


Author(s):  
Hui Lin ◽  
Vincent Ng

The focus of automatic text summarization research has exhibited a gradual shift from extractive methods to abstractive methods in recent years, owing in part to advances in neural methods. Originally developed for machine translation, neural methods provide a viable framework for obtaining an abstract representation of the meaning of an input text and generating informative, fluent, and human-like summaries. This paper surveys existing approaches to abstractive summarization, focusing on the recently developed neural approaches.


2020 ◽  
Vol 34 (01) ◽  
pp. 11-18
Author(s):  
Yue Cao ◽  
Xiaojun Wan ◽  
Jinge Yao ◽  
Dian Yu

Automatic text summarization aims at producing a shorter version of the input text that conveys the most important information. However, multi-lingual text summarization, where the goal is to process texts in multiple languages and output summaries in the corresponding languages with a single model, has been rarely studied. In this paper, we present MultiSumm, a novel multi-lingual model for abstractive summarization. The MultiSumm model uses the following training regime: (I) multi-lingual learning that contains language model training, auto-encoder training, translation and back-translation training, and (II) joint summary generation training. We conduct experiments on summarization datasets for five rich-resource languages: English, Chinese, French, Spanish, and German, as well as two low-resource languages: Bosnian and Croatian. Experimental results show that our proposed model significantly outperforms a multi-lingual baseline model. Specifically, our model achieves comparable or even better performance than models trained separately on each language. As an additional contribution, we construct the first summarization dataset for Bosnian and Croatian, containing 177,406 and 204,748 samples, respectively.

