Document vector embedding based extractive text summarization system for Hindi and English text

Author(s):  
Ruby Rani ◽  
D. K. Lobiyal
Author(s):  
Mahsa Afsharizadeh ◽  
Hossein Ebrahimpour-Komleh ◽  
Ayoub Bagheri

Purpose: The COVID-19 pandemic has created an emergency for the medical community. Researchers must study the scientific literature extensively in order to discover drugs and vaccines. In a situation where every minute counts toward saving hundreds of lives, a quick understanding of scientific articles helps the medical community. Automatic text summarization makes this possible. Materials and Methods: In this study, a recurrent neural network (RNN)-based extractive summarization method is proposed. The extractive method identifies the informative parts of the text. Recurrent neural networks are well suited to analyzing sequences such as text. The proposed method has three phases: sentence encoding, sentence ranking, and summary generation. To improve the performance of the summarization system, a coreference resolution procedure is used. Coreference resolution identifies the mentions in the text that refer to the same real-world entity. This procedure helps the summarization process by discovering the central subject of the text. Results: The proposed method is evaluated on COVID-19 research articles extracted from the CORD-19 dataset. The results show that combining a recurrent neural network with coreference resolution embedding vectors improves the performance of the summarization system. The proposed method achieves a ROUGE-1 recall of 0.53, demonstrating the improvement gained by using coreference resolution embedding vectors in the RNN-based summarization system. Conclusion: In this study, coreference information is stored in the form of coreference embedding vectors. The joint use of a recurrent neural network and coreference resolution results in an efficient summarization system.
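
The abstract above reports its result as ROUGE-1 recall. As a rough illustration of what that metric measures, the sketch below computes the fraction of reference-summary unigrams that also appear in the system summary, with counts clipped. Lowercasing and whitespace tokenization are simplifying assumptions, and the function name rouge1_recall is hypothetical; neither is taken from the paper, whose exact evaluation setup is not described here.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    total_ref = sum(ref_counts.values())
    if total_ref == 0:
        return 0.0
    # Each reference token is matched at most as often as it occurs
    # in the candidate summary (clipping).
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / total_ref


if __name__ == "__main__":
    system_summary = "the cat sat"
    reference_summary = "the cat sat on the mat"
    print(rouge1_recall(system_summary, reference_summary))
```

In the toy usage, three of the six reference tokens are matched, so the score is 0.5. Published ROUGE scores usually come from a standard toolkit with its own tokenization and stemming, so this sketch would not reproduce them exactly.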


Author(s):  
Pedro Paulo Balage Filho ◽  
Vinícius Rodrigues de Uzêda ◽  
Thiago Alexandre Salgueiro Pardo ◽  
Maria das Graças Volpe Nunes

2016 ◽  
Vol 64 ◽  
pp. 265-272 ◽  
Author(s):  
Duy Duc An Bui ◽  
Guilherme Del Fiol ◽  
John F. Hurdle ◽  
Siddhartha Jonnalagadda

2020 ◽  
Vol 11 (4) ◽  
pp. 67-83
Author(s):  
Md. Majharul Haque ◽  
Suraiya Pervin ◽  
Anowar Hossain ◽  
Zerina Begum

As the number of internet users increases, online electronic content grows proportionally, irrespective of language. Much research on English text summarization has emerged to deal with this gigantic body of online text. Unfortunately, only a few works have been completed for Bangla, even though a huge number of people use this language. This article explores the trend of research on Bangla text summarization. Fourteen approaches are briefly described, along with their pros and cons and some scope for improvement. They are also compared on the basis of their incorporated features and evaluation results. It is expected that this article will draw the attention of more researchers to Bangla text summarization and give the next generation a clear message about the opportunities. An integrated account of all the existing methods is presented here to reveal the importance of Bangla text summarization. To the best of the authors' knowledge, this is the first review study in this area.


2015 ◽  
Vol 8 (2) ◽  
pp. 261-277 ◽  
Author(s):  
Vishal Gupta ◽  
Narvinder Kaur

2021 ◽  
Vol 37 (2) ◽  
pp. 123-143
Author(s):  
Tuan Minh Luu ◽  
Huong Thanh Le ◽  
Tan Minh Hoang

Deep neural networks have been applied successfully to extractive text summarization tasks when accompanied by large training datasets. However, when the training dataset is not large enough, these models reveal certain limitations that affect the quality of the system’s summary. In this paper, we propose an extractive summarization system based on a Convolutional Neural Network and a Fully Connected network for sentence selection. The pretrained multilingual BERT model is used to generate embedding vectors from the input text. These vectors are combined with TF-IDF values to produce the input of the text summarization system. Redundant sentences are eliminated from the output summary by the Maximal Marginal Relevance method. Our system is evaluated on both English and Vietnamese using the CNN and Baomoi datasets, respectively. Experimental results show that our system achieves better results than existing works using the same datasets. This confirms that our approach can be effectively applied to summarize both English and Vietnamese text.
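
The abstract above names Maximal Marginal Relevance (MMR) as the step that removes redundant sentences. A minimal sketch of greedy MMR selection follows, assuming each sentence already has a vector and a relevance score. The function names, the trade-off parameter lam, and the toy vectors are all hypothetical illustrations; the paper's actual scoring model and parameter values are not given here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(vectors, relevance, k, lam=0.7):
    """Greedily pick up to k sentence indices by MMR.

    Score = lam * relevance - (1 - lam) * max similarity to already
    selected sentences. lam is an assumed trade-off weight.
    """
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Sentences 0 and 1 are near-duplicates; sentence 2 is different.
    toy_vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    toy_relevance = [0.9, 0.85, 0.5]
    print(mmr_select(toy_vectors, toy_relevance, k=2, lam=0.5))
```

In the toy usage, sentence 1 is more relevant than sentence 2 but nearly identical to the already selected sentence 0, so the redundancy penalty pushes the selection to sentences 0 and 2.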

