Empirical Analysis of Single and Multi Document Summarization using Clustering Algorithms

2018
Vol 8 (1)
pp. 2562-2567
Author(s):
M. S. Bewoor
S. H. Patil

The availability of various digital sources has created a demand for text mining mechanisms. Effective summary generation mechanisms are needed in order to utilize relevant information from often overwhelming digital data sources. With this in view, this paper surveys various single- as well as multi-document text summarization techniques. It also analyzes query-based summarization, in which the query sentence is treated like any other sentence segmented from the documents. Experimental results show how effective text summarization is across different clustering algorithms.
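
As an illustration of the clustering-based extractive approach surveyed above, the sketch below clusters sentences by their TF-IDF vectors and keeps the sentence closest to each cluster centroid. The function name `cluster_summarize` and the choice of k-means with scikit-learn are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of clustering-based extractive summarization
# (k-means over TF-IDF sentence vectors; illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def cluster_summarize(sentences, n_clusters=3):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(sentences)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # Pick the sentence closest to each centroid as the cluster representative.
    closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return [sentences[i] for i in sorted(set(closest))]
```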

Author(s):  
Jiwei Tan
Xiaojun Wan
Jianguo Xiao

Headline generation is an abstractive text summarization task that previously suffered from the immaturity of natural language generation techniques. The recent success of neural sentence summarization models shows their capacity to generate informative, fluent headlines conditioned on selected recapitulative sentences. In this paper, we investigate the extension of sentence summarization models to the document headline generation task. The challenge is that extending a sentence summarization model to consider more document information tends to confuse the model and hurt performance. We therefore propose a coarse-to-fine approach, which first identifies the important sentences of a document using document summarization techniques, and then exploits a multi-sentence summarization model with hierarchical attention to leverage the important sentences for headline generation. Experimental results on a large real-world dataset demonstrate that the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task.
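
A rough sketch of the coarse-to-fine idea: first select the most important sentences extractively, then let a neural summarizer compress them into a headline. The paper's hierarchical-attention model is replaced here with a simple TF-IDF salience score plus an off-the-shelf seq2seq summarizer (`facebook/bart-large-cnn` via Hugging Face `transformers`), purely for illustration.

```python
# Coarse-to-fine headline sketch: extract salient sentences, then abstractively compress.
# The hierarchical-attention model from the paper is approximated by a generic summarizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

def coarse_select(sentences, k=3):
    # Coarse step: score sentences by the sum of their TF-IDF weights (a simple salience proxy).
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    scores = X.sum(axis=1).A1
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

def generate_headline(sentences):
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    text = " ".join(coarse_select(sentences))
    # Fine step: abstractive compression of the selected sentences into a short headline.
    return summarizer(text, max_length=20, min_length=5, do_sample=False)[0]["summary_text"]
```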


2022
Vol 12 (1)
pp. 0-0

The traditional frequency-based approach to creating multi-document extractive summaries ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF (term frequency) is calculated from how frequently a term (word) occurs in the input, and TF calculated in this way does not take into account the semantic relations among terms. In this paper, we propose methods that exploit semantic term relations to improve the sentence ranking and redundancy removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when the distributional term similarity measure is used to find semantic term relations. Our multi-document text summarizer also outperforms some well-known summarization baselines to which it is compared.
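
The frequency-based ranking described above can be sketched as follows: each sentence is scored by summing the TF*IDF weights of its terms, and (as a stand-in for the paper's distributional term similarity) a term's frequency can optionally be augmented with the frequencies of semantically related terms. The similarity source used here, a user-supplied `related_terms` mapping, is an assumption for illustration.

```python
# Sketch of TF*IDF sentence ranking with optional semantic expansion of term frequency.
import math
from collections import Counter

def rank_sentences(sentences, related_terms=None):
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / df[t]) for t in df}
    scores = []
    for d in docs:
        tf = Counter(d)
        if related_terms:  # expand TF with counts of semantically related terms (assumed mapping)
            tf = Counter({t: tf[t] + sum(tf[r] for r in related_terms.get(t, ())) for t in tf})
        scores.append(sum(tf[t] * idf[t] for t in tf))
    # Higher score = more salient sentence.
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```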


2021
Vol 2021
pp. 1-7
Author(s):  
O. G. El Barbary
Radwan Abu Gdairi

Nowadays, a rich quantity of information is offered on the Web, which makes it hard for users to locate the information they need. Automated techniques are needed to effectively filter and search for useful data on the Web. The purpose of automatic text summarization is to handle content satisfactorily despite this variety of information. A key factor in document summarization is extracting useful features. In this paper, we extract word features in three groups, called important words, and we also extract sentence features based on the extracted words. With the growth of knowledge on the Internet, it has become an extremely time-consuming, exhausting, and tedious task to read entire documents and papers to find the relevant information on specific topics.
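
Since the abstract does not spell out the three word-feature groups, the sketch below only illustrates the general idea of scoring sentences by their overlap with a set of "important words"; the three groups used here (title words, frequent words, capitalized words) are hypothetical stand-ins.

```python
# Illustrative sketch: build "important word" groups and score sentences by overlap.
# The three groups used here (title, frequent, capitalized words) are assumptions.
from collections import Counter

def important_words(title, sentences, top_n=20):
    tokens = [w for s in sentences for w in s.split()]
    frequent = {w for w, _ in Counter(t.lower() for t in tokens).most_common(top_n)}
    title_words = {w.lower() for w in title.split()}
    capitalized = {w.lower() for w in tokens if w[:1].isupper()}
    return title_words | frequent | capitalized

def score_sentences(title, sentences):
    important = important_words(title, sentences)
    # Sentence feature: fraction of its words that are "important".
    return [sum(w.lower() in important for w in s.split()) / max(len(s.split()), 1)
            for s in sentences]
```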


2019
Author(s):
Laerth Gomes
Hilário Oliveira

Automatic Text Summarization (ATS) has attracted intense research in recent years. Its importance stems from the fact that ATS systems can aid in the processing of large amounts of textual documents. The ATS task aims to create a summary of one or more documents by extracting their most relevant information. Despite the existence of several works, research on the development of ATS systems for documents written in Brazilian Portuguese is still scarce. In this paper, we propose a multi-document summarization system following a concept-based approach that uses Integer Linear Programming to generate summaries from news articles written in Portuguese. Experiments using the CSTNews corpus were performed to evaluate different aspects of the proposed system. The experimental results obtained with the ROUGE measures demonstrate that the developed system presents encouraging results, outperforming other works in the literature.
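
A concept-based ILP summarizer of the kind described above typically maximizes the total weight of covered concepts subject to a length budget. The sketch below, using the PuLP solver, is an illustrative reconstruction under that assumption, not the authors' implementation; concept extraction and concept weights are assumed to be given.

```python
# Concept-based ILP sentence selection sketch (PuLP); concepts/weights assumed precomputed.
from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpBinary

def ilp_summarize(sentences, sent_concepts, concept_weight, max_words=100):
    # sent_concepts[i]: set of concepts in sentence i; concept_weight[c]: weight of concept c.
    concepts = sorted({c for cs in sent_concepts for c in cs})
    x = [LpVariable(f"x_{i}", cat=LpBinary) for i in range(len(sentences))]      # sentence chosen?
    y = {c: LpVariable(f"y_{j}", cat=LpBinary) for j, c in enumerate(concepts)}  # concept covered?
    prob = LpProblem("concept_summary", LpMaximize)
    prob += lpSum(concept_weight[c] * y[c] for c in concepts)                    # maximize coverage
    prob += lpSum(len(s.split()) * x[i] for i, s in enumerate(sentences)) <= max_words
    for c in concepts:  # a concept counts as covered only if a sentence containing it is selected
        prob += y[c] <= lpSum(x[i] for i, cs in enumerate(sent_concepts) if c in cs)
    prob.solve()
    return [s for i, s in enumerate(sentences) if x[i].value() > 0.5]
```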


Author(s):  
Giuliano Armano
Alessandro Giuliani

Recently, there has been renewed interest in automatic text summarization techniques. The Internet has caused a continuous growth of information overload, focusing attention on retrieval and filtering needs. Since digitally stored information is more and more available, users need suitable tools able to select, filter, and extract only relevant information. This chapter concentrates on studying and developing techniques for summarizing Web pages. In particular, the focus is on the field of contextual advertising, the task of automatically suggesting ads within the content of a generic Web page. Several novel text summarization techniques are proposed, compared with state-of-the-art techniques, and assessed as to whether they can be successfully applied to contextual advertising. Comparative experimental results are also reported and discussed. The results highlight the improvements of the proposals with respect to well-known text summarization techniques.
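
To make the contextual-advertising setting concrete, a page summary can be matched against candidate ad texts, for example by cosine similarity of TF-IDF vectors. This small sketch is only a generic illustration of that matching step, not one of the chapter's proposed techniques.

```python
# Matching a Web page summary against candidate ads via TF-IDF cosine similarity (illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_ads(page_summary, ads, top_k=3):
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform([page_summary] + ads)
    # Similarity of the page summary (row 0) to each ad (remaining rows).
    sims = cosine_similarity(vectors[0], vectors[1:]).ravel()
    return sorted(range(len(ads)), key=lambda i: sims[i], reverse=True)[:top_k]
```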


2010
Vol 19 (05)
pp. 597-626
Author(s):
C. RAVINDRANATH CHOWDARY
M. SRAVANTHI
P. SREENIVASA KUMAR

In this paper, we present a system called QueSTS, which generates a query-specific extractive summary of a selected set of documents. We propose an integrated graph approach to represent the contextual relationships among the sentences of all the input documents. These relationships are exploited and several sub-graphs of the integrated graph are constructed. These sub-graphs consist of sentences that are highly relevant to the query and highly related to each other. The sub-graphs are ranked by a scoring model, and the highest-ranked sub-graph, which is rich in query-relevant information, is selected as the query-specific summary. We also propose a sentence ordering strategy to improve the coherence of the summary; sentences in the selected summary are sequenced according to this strategy. Experimental results show that the summaries generated by the QueSTS system are significantly better than those of other systems in terms of user satisfaction.
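
A much-reduced sketch of the graph idea: build a sentence graph whose edge weights are inter-sentence similarities, score each sentence by combining query relevance with the weight of its neighbourhood, and return the top-scoring sentences. The scoring formula and the use of networkx here are assumptions for illustration; QueSTS's actual sub-graph construction and ranking model are richer.

```python
# Reduced sketch of a query-specific sentence graph (not the full QueSTS model).
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def query_summary(sentences, query, k=5, edge_threshold=0.1):
    X = TfidfVectorizer(stop_words="english").fit_transform(sentences + [query])
    sim = cosine_similarity(X)  # last row/column corresponds to the query
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sim[i, j] > edge_threshold:
                g.add_edge(i, j, weight=sim[i, j])
    # Score: query relevance plus similarity to graph neighbours (assumed combination).
    scores = {i: sim[i, -1] + sum(d["weight"] for _, _, d in g.edges(i, data=True))
              for i in g.nodes}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```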


2020
Vol 13 (5)
pp. 977-986
Author(s):
Srinivasa Rao Kongara
Dasika Sree Rama Chandra Murthy
Gangadhara Rao Kancherla

Background: Text summarization is the process of generating a short description of an entire document that would otherwise be difficult to read in full. It provides a convenient way of extracting the most useful information along with a short summary of the documents. Existing research has addressed this with the Fuzzy Rule-based Automated Summarization Method (FRASM), which has limitations that restrict its applicability to real-world applications: it is only suitable for single-document summarization, whereas many applications, such as research industries, need to summarize information from multiple documents. Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework that produces an accurate summarized outcome from multiple documents, whereas the existing system performed only single-document summarization. Initially, document clustering is performed using a modified k-means clustering algorithm to group documents that convey similar meaning, identified through frequent term measurement. After clustering, preprocessing is performed using a hybrid TF-IDF and singular value decomposition technique, which eliminates irrelevant content and retains the required content. Sentence scoring is then done by introducing an additional metric, namely title measurement, alongside the metrics of the existing work, to retrieve the most similar sentences accurately. Finally, a fuzzy rule system is applied to perform text summarization. Results: The overall evaluation was conducted in the MATLAB simulation environment, which shows that the proposed method ensures a better outcome than the existing method in terms of summarization accuracy. MDASM produces 89.28% increased accuracy, 89.28% increased precision, 89.36% increased recall, and 70% increased F-measure, performing better than FRASM. Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
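
The clustering and preprocessing stages described in the Methods can be pictured roughly as below: documents are clustered with k-means over TF-IDF vectors, and each cluster's term matrix is reduced with SVD before sentence scoring. This is a generic sketch under those assumptions, not the paper's modified k-means, hybrid TF-IDF, or fuzzy rule system.

```python
# Rough sketch of the MDASM preprocessing stages: document clustering + SVD reduction.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

def cluster_and_reduce(documents, n_clusters=3, n_components=50):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(documents)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    reduced = {}
    for c in range(n_clusters):
        idx = [i for i, label in enumerate(labels) if label == c]
        Xc = X[idx]
        # SVD keeps the dominant latent topics of each cluster for later sentence scoring.
        k = max(1, min(n_components, min(Xc.shape) - 1))
        reduced[c] = (idx, TruncatedSVD(n_components=k).fit_transform(Xc))
    return labels, reduced
```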


2021
Vol 15 (3)
pp. 1-33
Author(s):
Wenjun Jiang
Jing Chen
Xiaofei Ding
Jie Wu
Jiawei He
...  

In online systems, including e-commerce platforms, many users rely on the reviews or comments generated by previous consumers for decision making, but they have limited time to deal with many reviews. Therefore, a review summary that contains all the important features in user-generated reviews is desirable. In this article, we study how to generate a comprehensive review summary from a large number of user-generated reviews. This can be implemented with text summarization, which mainly has two types of approach: extractive and abstractive. Both can handle supervised and unsupervised scenarios, but the former may generate redundant and incoherent summaries, while the latter avoids redundancy but usually can deal only with short sequences. Moreover, both approaches may neglect sentiment information. To address these issues, we propose comprehensive Review Summary Generation frameworks for the supervised and unsupervised scenarios. We design two different preprocessing models, re-ranking and selecting, to identify the important sentences while keeping users' sentiment from the original reviews. These sentences can then be used to generate review summaries with text summarization methods. Experimental results on seven real-world datasets (Idebate, Rotten Tomatoes, Amazon, Yelp, and three unlabelled product review datasets from Amazon) demonstrate that our work performs well in review summary generation. Moreover, the re-ranking and selecting models show different characteristics.
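
One way to picture the "selecting" preprocessing step is to choose review sentences whose sentiment stays close to that of the full review set while favouring more informative sentences. The sketch below uses NLTK's VADER analyzer as a stand-in sentiment model and a simple length bonus; both are illustrative assumptions, not the frameworks proposed in the article.

```python
# Illustrative sentiment-preserving sentence selection (not the article's RSG frameworks).
from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

def select_sentences(review_sentences, k=10):
    sia = SentimentIntensityAnalyzer()
    scored = [(s, sia.polarity_scores(s)["compound"]) for s in review_sentences]
    target = sum(p for _, p in scored) / max(len(scored), 1)  # average sentiment of all reviews
    # Prefer sentences whose sentiment is close to the overall average, with a small length bonus.
    ranked = sorted(scored, key=lambda sp: abs(sp[1] - target) - 0.01 * len(sp[0].split()))
    return [s for s, _ in ranked[:k]]
```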


2021
Vol 71 (1)
pp. 18-33
Author(s):
I.F. Kuzminov
P.A. Lobanova

The authors show the need for, and some existing opportunities of, analyzing non-traditional data sources to obtain a more complete and relevant picture of the spatial development of industries. The research methodology includes the use of text mining for economic and geographical studies. The relevance of the research stems from the insufficient completeness of official statistical data, the falling cost of relevant information-processing technologies, and the abundance of large openly accessible text data sources. The article discusses the role of the pulp and paper industry (a key part of the timber industry) in the economic and spatial development of modern Russia. The authors identify the main trends in the economic and spatial development of the pulp and paper industry of European Russia, draw conclusions on expected industry trends, and give recommendations for strategic management decisions to respond to industry challenges. The authors claim that the industry needs liberalization and stabilization, primarily through moratoriums on policy changes. The article emphasizes the role of big data, and of text mining in particular, in economic and geographical research for forming reasoned and objective conclusions that can be used to make timely and balanced management decisions in the timber industry and the pulp and paper industry.

