Multi-Document Summarization by Extended Graph Text Representation and Importance Refinement
Automatic multi-document summarization is aimed at recognizing important text content in a collection of topic-related documents and representing it in the form of a short abstract or extract. This chapter presents a novel approach to the multi-document summarization problem, focusing on the generic summarization task. The proposed SentRel (Sentence Relations) multi-document summarization algorithm assigns importance scores to documents and sentences in a collection based on two aspects: static and dynamic. In the static aspect, the significance score is recursively inferred from a novel, tripartite graph representation of the text corpus. In the dynamic aspect, the significance score is continuously refined with respect to the current summary content. The resulting summary is generated in the form of complete sentences exactly as they appear in the summarized documents, ensuring the summary's grammatical correctness. The proposed algorithm is evaluated on the TAC 2011 dataset using DUC 2001 for training and DUC 2004 for parameter tuning. The SentRel ROUGE-1 and ROUGE-2 scores are comparable to state-of-the-art summarization systems, which require a different set of textual entities.