Better Metrics to Automatically Predict the Quality of a Text Summary

Algorithms ◽  
2012 ◽  
Vol 5 (4) ◽  
pp. 398-420 ◽  
Author(s):  
Peter A. Rankel ◽  
John M. Conroy ◽  
Judith D. Schlesinger

In this paper we demonstrate a family of metrics for estimating the quality of a text summary relative to one or more human-generated summaries. The improved metrics are based on features automatically computed from the summaries to measure content and linguistic quality. The features are combined using one of three methods—robust regression, non-negative least squares, or canonical correlation, an eigenvalue method. The new metrics significantly outperform the previous standard for automatic text summarization evaluation, ROUGE.
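
A minimal sketch of one of the three combination methods named above (non-negative least squares): automatically computed features are weighted so that their combination approximates human quality judgments, and the learned weights then score new summaries. The feature values and human scores below are illustrative placeholders, not the authors' data.

```python
import numpy as np
from scipy.optimize import nnls

# X: one row per summary, one column per automatically computed feature
# (e.g., content-coverage and linguistic-quality signals); values are made up.
X = np.array([
    [0.42, 0.31, 0.75],
    [0.55, 0.40, 0.60],
    [0.30, 0.22, 0.80],
    [0.61, 0.48, 0.55],
])
# y: human quality judgments for the same summaries (hypothetical values).
y = np.array([3.1, 3.8, 2.7, 4.0])

# Fit non-negative weights so the combined feature score approximates y.
weights, residual = nnls(X, y)
print("feature weights:", weights)

# Score a new, unseen summary from its feature vector.
new_summary_features = np.array([0.50, 0.35, 0.70])
print("predicted quality:", new_summary_features @ weights)
```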

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 78 ◽  
Author(s):  
Tulu Tilahun Hailu ◽  
Junqing Yu ◽  
Tessfu Geteye Fantaye

Text summarization is the process of producing a concise version (summary) of text from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, evaluating how much of the source text's meaning is preserved is becoming harder. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent, ideal reference summaries. Studies show that human summarizers produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased toward certain aspects when producing a summary, and even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which determines the salient top-n sentences of a source text as a reference summary and evaluates the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and outperforms several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
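
The following is a hedged sketch of the embedding-based idea described above, under simplifying assumptions: source sentences are ranked by similarity to the document centroid, the top-n become a pseudo-reference summary, and a system summary is scored by cosine similarity to that reference. The toy word vectors stand in for real pretrained embeddings (e.g., word2vec or GloVe), and the scoring is not the paper's exact metric.

```python
import numpy as np

def sentence_vector(sentence, word_vectors, dim=4):
    # Average the vectors of known words; zero vector if none are known.
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def top_n_reference(sentences, word_vectors, n=2):
    # Salience = similarity of each sentence to the document centroid.
    sent_vecs = [sentence_vector(s, word_vectors) for s in sentences]
    centroid = np.mean(sent_vecs, axis=0)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(sent_vecs[i], centroid), reverse=True)
    return [sentences[i] for i in ranked[:n]]

def evaluate(system_summary, reference_sentences, word_vectors):
    ref_vec = sentence_vector(" ".join(reference_sentences), word_vectors)
    return cosine(sentence_vector(system_summary, word_vectors), ref_vec)

# Toy embeddings and document, for illustration only.
wv = {w: np.random.RandomState(i).rand(4)
      for i, w in enumerate(["floods", "hit", "the", "region", "rain", "fell",
                             "rescue", "teams", "arrived", "quickly"])}
doc = ["Floods hit the region", "Heavy rain fell", "Rescue teams arrived quickly"]
reference = top_n_reference(doc, wv, n=2)
print("pseudo-reference:", reference)
print("system score:", evaluate("Floods hit the region after rain", reference, wv))
```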


2021 ◽  
Vol 20 (Number 3) ◽  
pp. 329-352
Author(s):  
Suraya Alias ◽  
Mohd Shamrie Sainin ◽  
Siti Khaotijah Mohammad

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to summary sentences to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary parts without sacrificing the sentence's grammar. Development of Malay Natural Language Processing (NLP) tools is still ongoing, with limited open access. The issue is the lack of a benchmark dataset in the Malay language to evaluate the quality of the summaries and to validate the compressed sentences produced by the summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences by referring to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the main Malay Word Classes to validate a Malay sentence that undergoes the SC procedure. We experimented on a Malay dataset of 100 news articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 at an optimum compression rate confidence (Conf) value of 0.5. Furthermore, a manual evaluation by a group of Malay experts gave the compressed summary sentences a grammaticality score of 4.11 and a readability score of 4.12 out of 5. This demonstrates the reliability of the proposed technique to validate Malay sentences with promising summary content and readability results.
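
As an illustration of the validation step described above, the sketch below accepts a compressed sentence only if its sequence of word-class tags matches an allowed grammar pattern. The tag inventory, patterns, and hand-made word-class lookup are simplified assumptions for demonstration, not the paper's actual Malay rule set or tagger.

```python
import re

# Hypothetical word-class lookup (KN = noun, KK = verb, KS = adjective, SN = other).
WORD_CLASS = {"banjir": "KN", "melanda": "KK", "kawasan": "KN",
              "itu": "SN", "teruk": "KS"}

# Allowed tag-sequence patterns, e.g., noun (+ particle) + verb (+ noun phrase) (+ adjective).
PATTERNS = [r"^KN( SN)? KK( KN( SN)?)?( KS)?$"]

def validate_compressed(sentence):
    # Tag each word by its main word class and test the sequence against the rules.
    tags = " ".join(WORD_CLASS.get(w, "SN") for w in sentence.lower().split())
    return any(re.match(p, tags) for p in PATTERNS)

print(validate_compressed("Banjir melanda kawasan itu"))  # True: KN KK KN SN
print(validate_compressed("Melanda itu"))                 # False: no leading noun
```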


2021 ◽  
Author(s):  
Cinthia M. Souza ◽  
Renato Vimieiro

Automatic text summarization aims at condensing the contents of a text into a simple and descriptive summary. Summarization techniques have benefited greatly from recent advances in Deep Learning. Nevertheless, these techniques are still unable to properly deal with long texts. In this work, we investigate whether combining summaries extracted from multiple sections of long scientific texts can enhance the quality of the summary for the whole document. We conduct experiments on a real-world corpus to assess the effectiveness of our proposal. The results show that our multi-section proposal is as good as summaries generated using the entire text as input and twice as good as summaries generated from a single section.
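
A minimal sketch of the multi-section idea above: summarize each section of a long document independently and concatenate the pieces into a document-level summary. The summarize_section function is a stand-in for any single-section model (e.g., a neural abstractive summarizer); here it simply keeps the first sentence so the sketch stays self-contained.

```python
def summarize_section(section_text, max_sentences=1):
    # Placeholder single-section summarizer: keep the leading sentence(s).
    sentences = [s.strip() for s in section_text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."

def summarize_document(sections, max_sentences_per_section=1):
    # Combine per-section summaries in document order.
    return " ".join(summarize_section(s, max_sentences_per_section) for s in sections)

paper_sections = [
    "We study long-document summarization. Prior work truncates the input.",
    "Our method summarizes each section separately. The pieces are then merged.",
    "Results improve over a single-section baseline. Details follow.",
]
print(summarize_document(paper_sections))
```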


2021 ◽  
Vol 10 (2) ◽  
pp. 42-60
Author(s):  
Khadidja Chettah ◽  
Amer Draa

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer to search for the best combination of sentences to be put in the final summary. The presented approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques. The authors evaluated the performance of the proposed approach on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The obtained results show that the proposal can compete with other state-of-the-art methods, ranking first out of 12 and outperforming all other algorithms.
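
A rough sketch of a quantum-inspired selection loop in the spirit of the QGA described above: each sentence carries an inclusion probability (playing the role of a q-bit amplitude), candidate summaries are "observed" from these probabilities, scored with a fitness function, and the probabilities are nudged toward the best summary found. The salience scores, fitness function, and update rule are illustrative assumptions, not the authors' formulation.

```python
import random

def fitness(selection, scores, max_sentences=3):
    # Placeholder fitness: total salience, penalized when over the length budget.
    chosen = [s for s, bit in zip(scores, selection) if bit]
    return sum(chosen) - 10 * max(0, len(chosen) - max_sentences)

def qga_select(scores, generations=50, delta=0.05):
    n = len(scores)
    prob = [0.5] * n                          # inclusion probabilities per sentence
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # "Observation": collapse probabilities into a binary sentence selection.
        selection = [1 if random.random() < p else 0 for p in prob]
        f = fitness(selection, scores)
        if f > best_fit:
            best, best_fit = selection, f
        # Rotation-like update: move probabilities toward the best selection so far.
        prob = [min(0.95, p + delta) if b else max(0.05, p - delta)
                for p, b in zip(prob, best)]
    return best

sentence_scores = [0.9, 0.2, 0.7, 0.4, 0.8]   # hypothetical salience scores
print(qga_select(sentence_scores))            # binary mask over the sentences
```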


2020 ◽  
Vol 8 (6) ◽  
pp. 3281-3287

Text is an extremely rich resource of information. Every second, people send and receive hundreds of millions of pieces of textual data. The various tasks involved in NLP include machine learning, information extraction, information retrieval, automatic text summarization, question-answering systems, parsing, sentiment analysis, natural language understanding, and natural language generation. Information extraction is an important task used to find structured information in unstructured or semi-structured text. The paper presents a methodology for extracting the relations of biomedical entities using spaCy. The framework consists of the following phases: data creation, loading and converting the data into spaCy objects, preprocessing, defining the patterns, and extracting the relations. The dataset is downloaded from the NCBI database and contains only sentences. The resulting model is evaluated with performance measures such as precision, recall, and F-measure. The model achieved 87% accuracy in retrieving entity relations.
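
Below is a hedged sketch of a pattern-based relation extraction step of the kind described above, using spaCy's Matcher. The trigger verbs, token pattern, and example sentence are assumptions for illustration; the paper's actual patterns, NCBI data handling, and preprocessing are not reproduced here. It requires spaCy and the en_core_web_sm model to be installed.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Surface pattern: <entity-like noun(s)> <trigger verb> <entity-like noun phrase>.
pattern = [
    {"POS": {"IN": ["NOUN", "PROPN"]}, "OP": "+"},
    {"LEMMA": {"IN": ["cause", "inhibit", "treat", "induce"]}},
    {"POS": {"IN": ["NOUN", "PROPN", "ADJ"]}, "OP": "+"},
]
matcher.add("BIOMED_RELATION", [pattern])

doc = nlp("Mutations in BRCA1 cause hereditary breast cancer.")
for match_id, start, end in matcher(doc):
    # Each match is a candidate relation span between two entity mentions.
    print("relation span:", doc[start:end].text)
```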

