Semantic Graph Based Automatic Text Summarization for Hindi Documents Using Particle Swarm Optimization

Author(s):  
Vipul Dalal ◽  
Latesh Malik
Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 617
Author(s):  
Augusto Villa-Monte ◽  
Laura Lanzarini ◽  
Aurelio F. Bariviera ◽  
José A. Olivas

Automatic text summarization tools have a great impact on many fields, such as medicine, law, and scientific research in general. As information overload increases, automatic summaries make it possible to handle the growing volume of documents, usually by assigning weights to the extracted phrases based on their significance in the expected summary. Obtaining the main contents of any given document in less time than it would take to do so manually is still an issue of interest. In this article, a new method is presented that automatically generates extractive summaries from documents by adequately weighting sentence scoring features using Particle Swarm Optimization. The key feature of the proposed method is the identification of those features that are closest to the criterion used by the individual when summarizing. The proposed method combines a binary representation and a continuous one, using an original variation of the technique developed by the authors of this paper. Our paper shows that using user-labeled information in the training set helps to find better metrics and weights. The empirical results show improved accuracy compared to previous methods used in this field.
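The idea of weighting sentence scoring features with Particle Swarm Optimization can be illustrated with a minimal sketch. It is not the authors' method: it uses only a plain continuous PSO (the combined binary and continuous representation described above is not reproduced), and the fitness function, the toy feature values, the labels, and every name in the code (fitness, pso_weights, inertia, c1, c2) are assumptions made for illustration.

```python
import random

def fitness(weights, features, labels, k):
    """Fraction of the k top-scored sentences that the user also selected."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k

def pso_weights(features, labels, k, n_particles=20, n_iter=50, seed=0,
                inertia=0.7, c1=1.5, c2=1.5):
    """Search for feature weights in [0, 1] with a basic continuous PSO."""
    rng = random.Random(seed)
    dim = len(features[0])
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_fit = [fitness(p, features, labels, k) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]     # global best so far
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep each weight inside [0, 1].
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i], features, labels, k)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Hypothetical data: one row per sentence, one column per scoring feature.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]  # 1 = the user placed this sentence in the summary
weights, score = pso_weights(features, labels, k=2)
print(weights, score)
```

In this toy setting the first feature agrees with the user's choices and the second does not, so a good weight vector gives the first feature at least as much weight as the second.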


Author(s):  
Rasmita Rautray ◽  
Rakesh Chandra Balabantaray

In the last few decades, bio-inspired algorithms (BIAs) have gained significant popularity for handling hard real-world and complex optimization problems. The scope and growth of bio-inspired algorithms open up new application areas and computing opportunities. This paper presents a review whose objective is to bring a better understanding of BIA-based text summarization and to motivate research on it. The techniques that have been used for text summarization include the genetic algorithm (GA), particle swarm optimization (PSO), differential evolution (DE), and harmony search (HS).


Author(s):  
Manju Lata Joshi ◽  
Nisheeth Joshi ◽  
Namita Mittal

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various Automatic Text Summarization techniques have been developed for abstractive as well as extractive summarization. This study focuses on extractive summarization, which selects descriptive paragraphs or sentences from the original text and combines them into a form shorter than the document(s) to generate a summary. The methods that have been used for extractive summarization are based on graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between the sentences of the document, using the Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph theoretical measures are applied to rank the document sentences depending on their semantic scores. The proposed approach is applied to two data sets from different domains, Tourism and Health. Its performance is compared with the state-of-the-art TextRank algorithm and with human-annotated summaries, and is evaluated using the widely accepted ROUGE measures. The outcomes show that the proposed system produces better results than TextRank for the health domain corpus and comparable results for the tourism corpus. Further, correlation coefficient methods are applied to find the correlation between eight different graphical measures, and it is observed that most of the graphical measures are highly correlated.
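The pipeline described above (build a graph over sentences, score the nodes with a graph measure, keep the top-ranked sentences) can be sketched as follows. This is only an illustration under stated assumptions, not the SGATS implementation: word-overlap (Jaccard) similarity stands in for the Hindi WordNet relations, weighted degree stands in for the graph measures (which are not listed here), short English sentences stand in for Hindi text, and all function names and the threshold value are hypothetical.

```python
from itertools import combinations

def similarity(a, b):
    """Jaccard overlap of word sets: a stand-in for a WordNet-based relation."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def build_graph(sentences, threshold=0.1):
    """Connect two sentences when their similarity reaches the threshold."""
    graph = {i: {} for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        w = similarity(sentences[i], sentences[j])
        if w >= threshold:
            graph[i][j] = w
            graph[j][i] = w
    return graph

def rank_by_weighted_degree(graph):
    """Order nodes by the sum of their edge weights, highest first."""
    strength = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    return sorted(strength, key=lambda n: strength[n], reverse=True)

def summarize(sentences, k):
    """Return the k best-ranked sentences in their original order."""
    top = rank_by_weighted_degree(build_graph(sentences))[:k]
    return [sentences[i] for i in sorted(top)]

# Placeholder sentences for illustration only.
sentences = [
    "yoga improves health",
    "yoga improves health and sleep",
    "regular sleep improves health",
    "the museum opens at nine",
]
print(summarize(sentences, 2))
```

The last sentence shares no words with the others, so it ends up as an isolated node and is ranked last. Swapping in a different graph measure only requires replacing rank_by_weighted_degree.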


Author(s):  
Arti Jain ◽  
Divakar Yadav ◽  
Anuja Arora

A particle swarm optimization (PSO) algorithm is proposed to deal with text summarization for the Punjabi language. PSO is based on swarm intelligence that predicts which of a given set of solutions is the best. The search is carried out by extremely high-speed particles. Each particle's position and velocity are updated at the end of each iteration, so that over successive generations the personal best solution and the global best solution are updated. Calculation within PSO is performed using a fitness function that looks into various statistical and linguistic features of the Punjabi datasets. Two Punjabi datasets are considered: a monolingual Punjabi corpus from the Indian Languages Corpora Initiative Phase-II and a Punjabi-Hindi parallel corpus. The parallel corpus comprises 1,000 Punjabi sentences from the tourism domain, while the monolingual corpus contains 30,000 Punjabi sentences from the general domain. The summaries are evaluated with ROUGE measures; the highest measure, ROUGE-1, is achieved for the parallel corpus, with precision, recall, and F-measure of 0.7836, 0.7957, and 0.7896, respectively.
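For readers unfamiliar with ROUGE-1, the following minimal sketch shows how precision, recall, and F-measure can be computed from unigram overlap. It assumes whitespace tokenization and a balanced F-measure (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function names are hypothetical.

```python
from collections import Counter

def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F-measure of a candidate summary."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    # Clipped overlap: a word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall, f_measure(precision, recall)

# Toy strings: 2 of 4 candidate words match, 2 of 3 reference words are covered.
print(rouge_1("a b c d", "a b e"))

# Consistency check against the figures reported above.
print(round(f_measure(0.7836, 0.7957), 4))
```

Under the balanced F-measure assumption, the reported precision of 0.7836 and recall of 0.7957 give 0.7896, which matches the reported F-measure.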

