Advances in Data Mining and Database Management - Innovative Document Summarization Techniques
Latest Publications


TOTAL DOCUMENTS: 13 (FIVE YEARS 0)

H-INDEX: 2 (FIVE YEARS 0)

Published By IGI Global

9781466650190, 9781466650206

Author(s):  
Firas Hmida

In this chapter, the authors introduce monolingual and multilingual summarization and present the problem that the summarization process depends on language and linguistic knowledge. They then describe the most influential works and techniques in the field of automatic multilingual and language-independent summarization, which are presented as a solution to this problem. The authors present several language-independent approaches and the techniques they use. Finally, they study the behavior of these methods by discussing their limitations and perspectives.


Author(s):  
Josef Steinberger
Ralf Steinberger
Hristo Tanev
Vanni Zavarella
Marco Turchi

In this chapter, the authors discuss several pertinent aspects of an automatic system that generates summaries in multiple languages for sets of topic-related news articles (multilingual multi-document summarisation), gathered by news aggregation systems. The discussion follows a framework based on Latent Semantic Analysis (LSA) because LSA was shown to be a high-performing method across many different languages. Starting from a sentence-extractive approach, the authors show how domain-specific aspects can be used and how a compression and paraphrasing method can be plugged in. They also discuss the challenging problem of summarisation evaluation in different languages. In particular, the authors describe two approaches: the first uses a parallel corpus and the second uses statistical machine translation.


Author(s):  
Atefeh Farzindar

In this chapter, the author presents the new role of summarization in the dynamic network of social media and its importance in the semantic analysis of social media and large data. The author introduces how summarization tasks can improve social media retrieval and event detection, and discusses the challenges posed by social media data compared with traditional documents. The author presents approaches to social media summarization and methods for update summarization, network activity summarization, event-based summarization, and opinion summarization. The author reviews the existing evaluation metrics for summarization and the evaluation shared tasks in social-data-related tracks organized by ACL, TREC, TAC, and SemEval. In conclusion, the author discusses the importance of this dynamic discipline and the great potential of automatic summarization in the coming decade, in the context of changes in mobile technology, cloud computing, and social networking.


Author(s):  
Marina Litvak
Natalia Vanetik

The problem of extractive summarization for a collection of documents is defined as the problem of selecting a small subset of sentences so that the contents and meaning of the original document set are preserved in the extract in the best possible way. In this chapter, the authors present a linear model for the problem of extractive text summarization, where they strive to obtain a summary that preserves as much of the information coverage of the original document set as possible. The authors measure information coverage by terms and reduce the summarization task to the maximum coverage problem. They construct a system of linear inequalities that describes the given document set and its possible summaries, and translate the problem of finding the best summary into the problem of finding the point on a convex polytope closest to a given hyperplane. This reformulated problem can be solved efficiently with the help of linear programming. The experimental results show the partial superiority of the proposed approach over other systems that participated in the generic multi-document summarization tasks of the DUC 2002 and the MultiLing 2013 competitions.
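
To make the maximum coverage view concrete, here is a minimal sketch in Python. It is not the authors' linear-programming formulation: it swaps in the well-known greedy heuristic for maximum coverage, and it assumes that a "term" is simply a lowercased whitespace-separated token. The function name `greedy_coverage_summary` and its parameters are hypothetical.

```python
def greedy_coverage_summary(sentences, max_sentences):
    """Greedy heuristic for maximum coverage (an illustration only,
    not the linear-programming method described in the chapter)."""
    # Assumption: terms are lowercased whitespace-separated tokens.
    term_sets = [set(s.lower().split()) for s in sentences]
    covered = set()   # terms already covered by the chosen sentences
    chosen = []       # indices of selected sentences

    for _ in range(max_sentences):
        best_idx, best_gain = None, 0
        for i, terms in enumerate(term_sets):
            if i in chosen:
                continue
            gain = len(terms - covered)  # number of newly covered terms
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # no remaining sentence adds a new term
            break
        chosen.append(best_idx)
        covered |= term_sets[best_idx]

    # Return indices in document order so the extract reads naturally.
    return sorted(chosen)
```

Calling `greedy_coverage_summary(["a b c", "a b", "d e", "c"], 2)` selects the first and third sentences, since together they cover the most distinct terms under this simplified notion of coverage.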


Author(s):  
Uri Mirchev
Mark Last

Automatic multi-document summarization is aimed at recognizing important text content in a collection of topic-related documents and representing it in the form of a short abstract or extract. This chapter presents a novel approach to the multi-document summarization problem, focusing on the generic summarization task. The proposed SentRel (Sentence Relations) multi-document summarization algorithm assigns importance scores to documents and sentences in a collection based on two aspects: static and dynamic. In the static aspect, the significance score is recursively inferred from a novel, tripartite graph representation of the text corpus. In the dynamic aspect, the significance score is continuously refined with respect to the current summary content. The resulting summary is generated in the form of complete sentences exactly as they appear in the summarized documents, ensuring the summary's grammatical correctness. The proposed algorithm is evaluated on the TAC 2011 dataset using DUC 2001 for training and DUC 2004 for parameter tuning. The SentRel ROUGE-1 and ROUGE-2 scores are comparable to state-of-the-art summarization systems, which require a different set of textual entities.
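
For readers unfamiliar with the ROUGE-1 and ROUGE-2 scores mentioned above, the following is a simplified sketch of ROUGE-N recall in Python. It assumes a single reference summary, whitespace tokenization, and no stemming or stopword removal, so its numbers will not match the official ROUGE toolkit. The function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified ROUGE-N recall: the fraction of reference n-grams that
    also appear in the candidate, with clipped counts.
    Assumption: whitespace tokenization, one reference, no stemming."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs
    # in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

With the candidate "the cat sat" and the reference "the cat sat on the mat", this sketch gives a ROUGE-1 recall of 0.5 (3 of 6 reference unigrams matched) and a ROUGE-2 recall of 0.4 (2 of 5 reference bigrams matched).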


Author(s):  
Sean Sovine
Hyoil Han

Modern information technology allows text information to be produced and disseminated at a very rapid pace. This situation leads to the problem of information overload, in which users are faced with a very large body of text that is relevant to an information need and have no efficient and effective way to locate the specific information they need within it. In one example of such a scenario, a user might be given a collection of digital news articles relevant to a particular current event and may need to rapidly generate a summary of the essential information about the event contained in those articles. In extractive multi-document summarization (MDS), the most fundamental task is to select a subset of the sentences in the input document set in order to form a summary of the document set. An essential component of this task is sentence ranking, in which sentences from the original document set are ranked in order of importance for inclusion in a summary. The purpose of this chapter is to give an analysis of the most successful methods for sentence ranking that have been employed in recent MDS work. To this end, the authors classify sentence ranking methods into six classes and discuss specific approaches within each class.


Author(s):  
Bettina Berendt
Mark Last
Ilija Subašić
Mathias Verbeke

News production, delivery, and consumption are increasing in ubiquity and speed, spreading over more software and hardware platforms, in particular mobile devices. This has led to an increasing interest in automated methods for multi-document summarization. The authors start this chapter by discussing several new alternatives for automated news summarization, with a particular focus on temporal text mining, graph-based methods, and graphical interfaces. Then they present automated and user-centric frameworks for cross-evaluating summarization methods that output different summary formats and describe the challenges associated with each evaluation framework. Based on the results of the user studies, the authors argue that it is crucial for effective summarization to integrate the user into sense-making through usable, entertaining, and ultimately useful interactive summarization-plus-document-search interfaces. In particular, graph-based methods and interfaces may better prepare people to concentrate on what is essential in a collection of texts, and thus may be a key to enhancing the summary evaluation process by replacing the “one gold standard fits all” approach with carefully designed user studies built upon a variety of summary representation formats.


Author(s):  
Kamal Sarkar

As the amount of online information in languages other than English (such as Chinese, Japanese, German, French, and Hindi) increases, systems that can automatically summarize multilingual documents are becoming increasingly desirable for managing the information overload problem on the Web. This chapter presents an overview of automatic text summarization with special emphasis on multilingual text summarization. The various state-of-the-art multilingual summarization approaches are grouped by their characteristics and presented in this chapter.


Author(s):  
William Darling

This chapter discusses approaches to applying text summarization research to the real-world problem of opinion summarization of user comments. Following a brief overview of the history of research in text summarization, the authors consider large-scale user opinion summarization on the Web, a summarization problem that is distinct from the traditional domain on which the research focused until very recently. More specifically, they consider opinion summarization of large datasets that generally include large degrees of noise and little editorial structure. To deal with this kind of real-world problem, the chapter addresses three major areas that must be considered when designing systems for it: simple techniques, domain knowledge, and evaluative testing. Each area is covered in detail, and throughout the chapter, the lessons are applied to a case study on designing a real-world opinion summarization system for a fictional book publisher.


Author(s):  
George Giannakopoulos
George Kiomourtzis
Vangelis Karkaletsis

This chapter describes a real, multi-document, multilingual news summarization application, named NewSum, the research problems behind it, and the novel methods proposed and tested to solve these problems. The system uses the representation of n-gram graphs in a novel manner to perform sentence selection and redundancy removal for the summaries, and it faces problems related to topic and subtopic detection (via clustering) and multilingual applicability, which are caused by the nature of real-world news summarization sources.
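
As a rough illustration of the n-gram graph representation mentioned above, the sketch below builds a character n-gram graph in Python and compares two graphs by edge overlap. This is not NewSum's implementation: the window size, the use of directed edges, and the `edge_overlap` measure are assumptions chosen for brevity, and both function names are hypothetical. Because it works on characters rather than words, it needs no language-specific tokenizer.

```python
from collections import Counter


def ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph as a Counter of weighted edges.
    Nodes are character n-grams; an edge links each n-gram to the
    n-grams that start within `window` positions after it."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = Counter()
    for i, gram in enumerate(grams):
        for j in range(i + 1, min(i + 1 + window, len(grams))):
            edges[(gram, grams[j])] += 1  # weight = co-occurrence count
    return edges


def edge_overlap(graph_a, graph_b):
    """Share of edges in the smaller graph that also occur in the other.
    Assumption: a simple stand-in for a graph similarity measure."""
    if not graph_a or not graph_b:
        return 0.0
    shared = sum(1 for edge in graph_a if edge in graph_b)
    return shared / min(len(graph_a), len(graph_b))
```

A redundancy check could then compare the graph of a candidate sentence with the graph of the summary built so far, and skip the sentence when the overlap exceeds a threshold chosen by the developer.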

