scholarly journals A Long Texts Summarization Approach to Scientific Articles

2021 ◽  
Author(s):  
Cinthia M. Souza ◽  
Renato Vimieiro

Automatic text summarization aims at condensing the contents of a text into a simple and descriptive summary. Summarization techniques drastically benefited from the recent advances in Deep Learning. Nevertheless, these techniques are still unable to properly deal with long texts. In this work, we investigate whether the combination of summaries extracted from multiple sections of long scientific texts may enhance the quality of the summary for the whole document. We conduct experiments on a real world corpus to assess the effectiveness of our proposal. The results show that our multi-section proposal is as good as summaries generated using the entire text as input and twice as good as single section.

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 78 ◽  
Author(s):  
Tulu Tilahun Hailu ◽  
Junqing Yu ◽  
Tessfu Geteye Fantaye

Text summarization is a process of producing a concise version of text (summary) from one or more information sources. If the generated summary preserves meaning of the original text, it will help the users to make fast and effective decision. However, how much meaning of the source text can be preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics like Recall-Oriented Understudy for Gisting Evaluation (ROUGE) strictly rely on the overlapping n-gram units between reference and candidate summaries, which are not suitable to measure the quality of abstractive summaries. Another major challenge to evaluate text summarization systems is lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source that can significantly affect automatic evaluation metrics scores of summarization systems. Humans are biased to certain situation while producing summary, even the same person perhaps produces substantially different summaries of the same source at different time. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can successfully determine salient top-n sentences of a source text as a reference summary, and evaluate the quality of systems summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods with regard to both text summarization systems and automatic evaluation metrics when tested on a publicly available dataset.


2021 ◽  
Vol 20 (Number 3) ◽  
pp. 329-352
Author(s):  
Suraya Alias ◽  
Mohd Shamrie Sainin ◽  
Siti Khaotijah Mohammad

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary ones without sacrificing the sentence's grammar. The existing development of Malay Natural Language Processing (NLP) tools is still under study with limited open access. The issue is the lack of a benchmark dataset in the Malay language to evaluate the quality of the summaries and to validate the compressed sentence produced by the summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences by referring to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that undergoes the SC procedure. We experimented using the Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced a result with an average F-measure of 0.5826 and an average Recall value of 0.5925 with an optimum compression rate of 0.5 Confidence Conf value. Furthermore, a manual summary evaluation by a group of Malay experts on the grammaticality of the compressed summary sentence produced a good result of 4.11 and a readability score of 4.12 out of 5. This depicts the reliability of the proposed technique to validate the Malay sentence with promising summary content and readability results.


Algorithms ◽  
2012 ◽  
Vol 5 (4) ◽  
pp. 398-420 ◽  
Author(s):  
Peter A. Rankel ◽  
John M. Conroy ◽  
Judith D. Schlesinger

In this paper we demonstrate a family of metrics for estimating the quality of a text summary relative to one or more human-generated summaries. The improved metrics are based on features automatically computed from the summaries to measure content and linguistic quality. The features are combined using one of three methods—robust regression, non-negative least squares, or canonical correlation, an eigenvalue method. The new metrics significantly outperform the previous standard for automatic text summarization evaluation, ROUGE.


Automatic text summarization is a technique of generating short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating relevant, informative and concise summary from a cluster of related documents. This paper does a detailed survey on the existing literature on the various approaches for text summarization. Few of the most popular approaches such as graph based, cluster based and deep learning-based summarization techniques are discussed here along with the evaluation metrics, which can provide an insight to the future researchers.


Author(s):  
Jovi D’Silva ◽  
Uzzal Sharma

<span lang="EN-US">Automatic text summarization has gained immense popularity in research. Previously, several methods have been explored for obtaining effective text summarization outcomes. However, most of the work pertains to the most popular languages spoken in the world. Through this paper, we explore the area of extractive automatic text summarization using deep learning approach and apply it to Konkani language, which is a low-resource language as there are limited resources, such as data, tools, speakers and/or experts in Konkani. In the proposed technique, Facebook’s fastText <br /> pre-trained word embeddings are used to get a vector representation for sentences. Thereafter, deep multi-layer perceptron technique is employed, as a supervised binary classification task for auto-generating summaries using the feature vectors. Using pre-trained fastText word embeddings eliminated the requirement of a large training set and reduced training time. The system generated summaries were evaluated against the ‘gold-standard’ human generated summaries with recall-oriented understudy for gisting evaluation (ROUGE) toolkit. The results thus obtained showed that performance of the proposed system matched closely to the performance of the human annotators in generating summaries.</span>


Sign in / Sign up

Export Citation Format

Share Document