Extractive Summarization
Recently Published Documents


TOTAL DOCUMENTS

238
(FIVE YEARS 134)

H-INDEX

15
(FIVE YEARS 4)

Author(s):  
Manju Lata Joshi ◽  
Nisheeth Joshi ◽  
Namita Mittal

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various automatic text summarization techniques have been developed for both abstractive and extractive summarization. This study focuses on extractive summarization, a process that selects representative sentences or paragraphs from the original text and combines them into a summary shorter than the source document(s). Methods used for extractive summarization are based on graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between its sentences, using Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretic measures are applied to rank the document sentences by their semantic scores. The proposed approach is applied to two data sets from the Tourism and Health domains. Its performance is compared with the state-of-the-art TextRank algorithm and a human-annotated summary, and is evaluated using the widely accepted ROUGE measures. The outcomes show that the proposed system produces better results than TextRank on the health corpus and comparable results on the tourism corpus. Further, correlation coefficient methods are applied to find correlations among eight different graph measures, and most of them are observed to be highly correlated.
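The graph-ranking idea behind approaches like SGATS can be sketched in a few lines. This is not the paper's method: lexical overlap stands in for the Hindi WordNet semantic relations, and weighted degree stands in for the fourteen graph-theoretic measures; only the sentences-as-nodes, centrality-as-importance workflow is illustrated.

```python
# Hedged sketch of graph-based extractive ranking: sentences are nodes,
# edge weights come from lexical overlap (a stand-in for semantic
# relations), and a simple centrality score (weighted degree) ranks
# sentences for inclusion in the summary.
from itertools import combinations

def rank_sentences(sentences, top_k=2):
    tokens = [set(s.lower().split()) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        overlap = len(tokens[i] & tokens[j])
        scores[i] += overlap
        scores[j] += overlap
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]  # preserve document order
```

Replacing the weighted-degree score with PageRank, betweenness, or other centralities yields the family of measures the paper compares.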


2021 ◽  
Author(s):  
Yash Agrawal ◽  
Vivek Anand ◽  
Manish Gupta ◽  
S Arunachalam ◽  
Vasudeva Varma

2021 ◽  
Author(s):  
Shimirwa Aline Valerie ◽  
Jian Xu

Extractive summarization aims to select the most important sentences or words from a document to generate a summary. Traditional summarization approaches have relied extensively on features manually designed by humans. In this paper, based on a recurrent neural network equipped with an attention mechanism, we propose a data-driven technique. We set up a general framework that consists of a hierarchical sentence encoder and an attention-based sentence extractor. The framework allows us to establish and explore various extractive summarization models. Comprehensive experiments are conducted on two benchmark datasets, and the results show that training extractive models with Reward Augmented Maximum Likelihood (RAML) can improve their generalization capability. We also find that the complicated components of state-of-the-art extractive models do not perform better than simpler ones. We hope that our work offers useful hints for future research on extractive text summarization.
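The attention-based extraction step can be illustrated with a toy numeric sketch. Everything here is an assumption for illustration: sentence vectors would really come from a learned hierarchical encoder, not be handed in; here softmax-normalized dot products against the mean document vector act as attention scores over sentences.

```python
# Toy attention-based sentence scorer: the document vector is the mean of
# the sentence vectors, and softmax over sentence-document dot products
# gives each sentence an attention weight for extraction.
import math

def attention_scores(sentence_vectors):
    dim = len(sentence_vectors[0])
    doc = [sum(v[d] for v in sentence_vectors) / len(sentence_vectors)
           for d in range(dim)]
    logits = [sum(a * b for a, b in zip(v, doc)) for v in sentence_vectors]
    peak = max(logits)                      # subtract max for numeric stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A trained extractor would learn both the sentence representations and the scoring function; the softmax normalization shown here is the part shared with attention mechanisms generally.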


2021 ◽  
Vol 50 (3) ◽  
pp. 458-469
Author(s):  
Gang Sun ◽  
Zhongxin Wang ◽  
Jia Zhao

In the era of big data, information overload problems are becoming increasingly prominent. It is challenging for machines to understand, compress and filter massive amounts of text information through the use of artificial intelligence technology. Automatic text summarization emerged mainly to solve the problem of information overload, and it can be divided into two types: extractive and abstractive. The former finds key sentences or phrases in the original text and combines them into a summary; the latter requires a computer to understand the content of the original text and then use human-readable language to summarize its key information. This paper presents a two-stage optimization method for automatic text summarization that combines abstractive and extractive summarization. First, a sequence-to-sequence model with an attention mechanism is trained as a baseline model to generate an initial summary. Second, it is updated and optimized directly on the ROUGE metric using deep reinforcement learning (DRL). Experimental results show that, compared with the baseline model, ROUGE-1, ROUGE-2, and ROUGE-L scores increase on both the LCSTS and CNN/DailyMail datasets.
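The reward that the DRL stage optimizes is a ROUGE score; its core is simple enough to sketch. This is only the unigram-overlap F1 at the heart of ROUGE-1; the published metric adds stemming, stopword options, and the N-gram and longest-common-subsequence variants.

```python
# Minimal ROUGE-1 F1 between a candidate summary and a reference:
# clipped unigram overlap, then the harmonic mean of precision and recall.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In the two-stage setup described above, a score like this is computed per generated summary and used as the reward signal that the policy gradient pushes the baseline model toward.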


2021 ◽  
Author(s):  
Raunak Kolle ◽  
S Sanjana ◽  
Merin Meleet

Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2195
Author(s):  
Luca Bacco ◽  
Andrea Cimino ◽  
Felice Dell’Orletta ◽  
Mario Merone

In recent years, the explainable artificial intelligence (XAI) paradigm has gained wide research interest. The natural language processing (NLP) community is also approaching this shift of paradigm: building a suite of models that provide an explanation of the decision on some main task without affecting performance. This is certainly not an easy job, especially when poorly interpretable models are involved, like the almost ubiquitous (at least in the recent NLP literature) transformers. Here, we propose two different transformer-based methodologies that exploit the inner hierarchy of documents to perform a sentiment analysis task while extracting the sentences most important to the model's decision, building a summary that serves as the explanation of the output. In the first architecture, we place two transformers in cascade and leverage the attention weights of the second one to build the summary. In the other architecture, we employ a single transformer to classify the individual sentences in the document and then combine their probability scores both to perform the classification and to build the summary. We compare the two methodologies on the IMDB dataset, in terms of both classification and explainability performance. To assess the explainability part, we propose two kinds of metrics based on benchmarking the models' summaries against human annotations, for which we recruited four independent operators to annotate a few documents retrieved from the original dataset. Furthermore, we conduct an ablation study highlighting how certain strategies lead to important improvements in the explainability performance of the cascade transformers model.
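The second architecture's score-combination step can be sketched under assumed interfaces. The per-sentence probabilities would come from a transformer classifier (not modeled here); averaging them yields the document label, and the sentences whose scores deviate most from 0.5, a heuristic aggregation assumed for illustration, form the explanatory summary.

```python
# Sketch of sentence-level classification with score aggregation:
# document label from the mean P(positive), explanation summary from the
# sentences the classifier was most confident about (furthest from 0.5).
def classify_and_explain(sentence_probs, sentences, top_k=2):
    doc_prob = sum(sentence_probs) / len(sentence_probs)
    label = "positive" if doc_prob >= 0.5 else "negative"
    ranked = sorted(range(len(sentences)),
                    key=lambda i: abs(sentence_probs[i] - 0.5),
                    reverse=True)[:top_k]
    summary = [sentences[i] for i in sorted(ranked)]  # keep document order
    return label, summary
```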


2021 ◽  
Vol 11 (2) ◽  
pp. 303-312
Author(s):  
Nnaemeka M Oparauwah ◽  
Juliet N Odii ◽  
Ikechukwu I Ayogu ◽  
Vitalis C Iwuchukwu

The need to extract and manage vital information contained in copious volumes of text documents has given birth to several automatic text summarization (ATS) approaches. ATS has found application in academic research, medical health record analysis, content creation and search engine optimization, finance, and media. This study presents a boundary-based tokenization method for extractive text summarization. The proposed method performs word tokenization by defining word boundaries in place of specific delimiters. An extractive summarization algorithm was further developed based on the proposed boundary-based tokenization method, together with word-length consideration to control redundancy in the summary output. Experimental results showed that the proposed approach improved word tokenization and thereby the selection of appropriate keywords from the text document to be used for summarization.
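A minimal reading of boundary-based tokenization is to scan for word/non-word transitions rather than split on a fixed delimiter list. The paper's exact boundary rules are not reproduced here; this regex formulation is an assumption that captures the general idea.

```python
# Illustrative boundary-based tokenizer: \w+ matches maximal runs of word
# characters, so tokens are delimited by word/non-word boundaries instead
# of a hard-coded set of separator characters.
import re

def boundary_tokenize(text):
    return re.findall(r"\w+", text)
```

Compared with `text.split(",")`-style delimiter splitting, this handles punctuation attached to words without enumerating every possible separator.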


2021 ◽  
Author(s):  
Keshav Balachandar ◽  
Anam Saatvik Reddy ◽  
A. Shahina ◽  
Nayeemulla Khan

In this paper, we propose a novel system for summarizing commercial contracts such as Non-Disclosure Agreements (NDAs) and employment agreements, enabling reviewers to spend less time on such reviews while improving their understanding. Since a majority of such commercial documents are paragraphed and contain headings/topics followed by their respective content and context, we extract those topics and summarize them as per the user's need. We propose that summarizing such paragraphs/topics on demand is a more viable approach than summarizing the whole document. We use extractive summarization approaches for this task and compare their performance with human-written summaries. We conclude that the results of the extractive techniques are satisfactory and could be improved with a larger corpus of data and supervised abstractive summarization methods.
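The topic-wise workflow described above can be sketched with heavily simplified assumptions: headings are assumed to be lines ending with ":", and the first sentence of each section stands in for its extractive summary. The paper's actual heading detection and extractive scorers are not specified here.

```python
# Hedged sketch of per-topic contract summarization: split the document on
# assumed heading lines, then keep the first sentence of each section as a
# placeholder extractive summary for that topic.
def summarize_by_topic(document):
    summaries = {}
    heading = None
    for line in document.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            heading = line.rstrip(":")
        elif heading and heading not in summaries:
            summaries[heading] = line.split(". ")[0].rstrip(".")
    return summaries
```

Swapping the first-sentence placeholder for a scorer like the graph-based or attention-based rankers discussed earlier in this listing would give a full per-topic extractive pipeline.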

