Peringkasan Multi Dokumen Berita Dengan Pemilihan Kalimat Utama Berbasis Algoritma Cluster Importance Dengan Mempertimbangkan Posisi Kalimat

Syadza Anggraini; Nur Hayatin; Gita Indah Marthasari

doi:10.22219/repositor.v2i1.161

Repositor ◽

10.22219/repositor.v2i1.161 ◽

2020 ◽

Vol 2 (1) ◽

pp. 107

Author(s):

Syadza Anggraini ◽

Nur Hayatin ◽

Gita Indah Marthasari

Keyword(s):

Text Summarization ◽

Document Summarization ◽

Sentence Position

Text summarization is one way to reduce the size of a large document in order to obtain its important information. News on a single topic usually spans several sub-topics. To obtain the important information on a topic quickly, multi-document news summarization can be a solution. However, multi-document summarization can introduce redundancy. This study therefore applies the cluster importance algorithm with sentence position taken into account to address that redundancy. The study used 30 Indonesian-language news topics, each consisting of 5 sub-topic news articles. Of the 30 topics evaluated with ROUGE-1, only 2 had ROUGE-1 scores that differed between cluster importance with sentence position and cluster importance alone; for both of those topics, the score was higher with sentence position. Sentence position affected the ordering of sentence weights in every topic, but it changed the resulting summary for only 2 topics.
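
The finding that sentence position reorders sentence weights without always changing the summary can be illustrated with a small sketch. The linear position bonus, the `alpha` parameter, and the function names below are assumptions made for illustration; the abstract does not give the actual weighting formula, and the base scores merely stand in for cluster importance scores.

```python
def rank_sentences(base_scores, alpha=0.0):
    """Return sentence indices sorted from highest to lowest weight.

    base_scores: one importance score per sentence, in document order
                 (a stand-in for a cluster importance score).
    alpha:       strength of the hypothetical position bonus; 0 disables it.
    """
    n = len(base_scores)
    weighted = []
    for i, score in enumerate(base_scores):
        # Hypothetical position weight: 1.0 for the first sentence,
        # decreasing linearly to 1/n for the last one.
        position = (n - i) / n
        weighted.append(score * (1.0 + alpha * position))
    # Sort by descending weight, breaking ties by original order.
    return sorted(range(n), key=lambda i: (-weighted[i], i))


def select_summary(base_scores, k, alpha=0.0):
    """Pick the k top-ranked sentences and return them in document order."""
    return sorted(rank_sentences(base_scores, alpha)[:k])


scores = [0.90, 0.50, 0.20, 0.21]
print(rank_sentences(scores))             # ranking without position
print(rank_sentences(scores, alpha=0.5))  # ranking with the position bonus
print(select_summary(scores, k=2), select_summary(scores, k=2, alpha=0.5))
```

In this toy input the position bonus swaps the two lowest-ranked sentences, yet the two selected sentences stay the same, which mirrors the situation the abstract describes for most topics.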

An Automatic Text Summarization Method with the Concern of Covering Complete Formation

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190716105347 ◽

2020 ◽

Vol 13 (5) ◽

pp. 977-986

Author(s):

Srinivasa Rao Kongara ◽

Dasika Sree Rama Chandra Murthy ◽

Gangadhara Rao Kancherla

Keyword(s):

Research Method ◽

Research Work ◽

Fuzzy Rule ◽

Text Summarization ◽

Document Summarization ◽

Summarization Method ◽

Overall Evaluation ◽

Multiple Documents ◽

Rule System ◽

Value Decomposition

Background: Text summarization is the process of generating a short description of an entire document that would otherwise be more difficult to read. It provides a convenient way of extracting the most useful information and a short summary of the documents. Existing research work addresses this with the Fuzzy Rule-based Automated Summarization Method (FRASM). That work has various limitations which might restrict its applicability to real-world applications: it is suitable only for single-document summarization, whereas applications such as research industries tend to summarize information from multiple documents. Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework intended to produce an accurate summary from multiple documents. First, document clustering is performed using a modified k-means clustering algorithm to group similar documents that convey the same meaning; similarity is identified through frequent-term measurement. After clustering, pre-processing is performed with a hybrid TF-IDF and singular value decomposition technique, which eliminates irrelevant content and retains the required content. Sentence measurement is then done by introducing an additional metric, title measurement, alongside the metrics of the existing work, to retrieve the sentences with greater similarity more accurately. Finally, a fuzzy rule system is applied to perform text summarization. Results: The overall evaluation is conducted in the MATLAB simulation environment, which shows that the proposed method gives a better outcome than the existing method in terms of accurate summarization. MDASM produces 89.28% increased accuracy, 89.28% increased precision, 89.36% increased recall, and 70% increased F-measure, performing better than FRASM. Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
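
Two of the sentence-level measurements named in the abstract, TF-IDF weighting and title measurement, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MDASM itself: it uses an ordinary TF-IDF score rather than the hybrid TF-IDF and SVD technique, treats each sentence as a document for the IDF statistics, and the function names `tokenize`, `tfidf_sentence_scores`, and `title_overlap` are hypothetical.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean TF-IDF weight of its distinct terms.

    Each sentence is treated as a 'document' for the IDF statistics.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum((count / len(tokens)) * math.log(n / df[term])
                    for term, count in tf.items())
        scores.append(total / len(tf))
    return scores


def title_overlap(sentence, title):
    """Fraction of title terms that also appear in the sentence."""
    title_terms = set(tokenize(title))
    if not title_terms:
        return 0.0
    return len(title_terms & set(tokenize(sentence))) / len(title_terms)
```

A term that occurs in every sentence gets an IDF of zero here, so sentences made up only of such terms score 0.0; a real system would usually smooth the IDF and remove stop words first.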

A Quantum-Inspired Genetic Algorithm for Extractive Text Summarization

International Journal of Natural Computing Research ◽

10.4018/ijncr.2021040103 ◽

2021 ◽

Vol 10 (2) ◽

pp. 42-60

Author(s):

Khadidja Chettah ◽

Amer Draa

Keyword(s):

Genetic Algorithm ◽

State Of The Art ◽

Text Summarization ◽

Automated System ◽

Evaluation Metrics ◽

Document Summarization ◽

Automatic Text Summarization ◽

Reference Methods ◽

Textual Data ◽

Automatic Text

Automatic text summarization has recently become a key instrument for reducing huge quantities of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to include in the final summary. The approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques, and evaluated on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 metrics. The results show that the proposal can compete with other state-of-the-art methods: it ranks first out of 12, outperforming all other algorithms.
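
Since several entries on this page report ROUGE-1 and ROUGE-2, a short sketch of how ROUGE-N is computed may help. It is a simplified illustration that assumes whitespace tokenization, lowercasing, and a single reference summary; the official ROUGE toolkit offers further options such as stemming and stop-word removal that are not reproduced here, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) of n-gram overlap for two strings."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: an n-gram counts only as often as it occurs in both.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # unigram overlap (ROUGE-1)
print(rouge_n(candidate, reference, n=2))  # bigram overlap (ROUGE-2)
```

For this pair, five of the six reference unigrams and three of the five reference bigrams are matched, so ROUGE-2 is noticeably stricter than ROUGE-1.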

Text Summarization

10.1093/oxfordhb/9780199276349.013.0032 ◽

2012 ◽

Cited By ~ 11

Author(s):

Eduard Hovy

Keyword(s):

Research And Development ◽

Evaluation Studies ◽

Text Summarization ◽

Single Measurement ◽

Document Summarization ◽

Topic Identification ◽

Evaluation Strategies ◽

And Performance

This article describes research and development on the automated creation of summaries of one or more texts. It defines the concept of a summary and presents an overview of the principal approaches to summarization. It describes the design, implementation, and performance of various summarization systems. The stages of automated text summarization are topic identification, interpretation, and summary generation, each with its own sub-stages. Owing to the challenges involved, multi-document summarization is much less developed than single-document summarization. The article reviews particular techniques used in several summarization systems. Finally, it assesses methods of evaluating summaries, reviewing evaluation strategies from previous evaluation studies through to the two-basic-measures method. Because summaries are so task- and genre-specific, no single measurement covers all cases of evaluation.

From Neural Sentence Summarization to Headline Generation: A Coarse-to-Fine Approach

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/574 ◽

2017 ◽

Cited By ~ 12

Author(s):

Jiwei Tan ◽

Xiaojun Wan ◽

Jianguo Xiao

Keyword(s):

Natural Language ◽

Natural Language Generation ◽

Text Summarization ◽

Experimental Results ◽

Generation Task ◽

Document Summarization ◽

Real Dataset ◽

Language Generation ◽

Recent Success ◽

Coarse To Fine

Headline generation is an abstractive text summarization task that has previously suffered from the immaturity of natural language generation techniques. The recent success of neural sentence summarization models shows their capacity to generate informative, fluent headlines conditioned on selected recapitulative sentences. In this paper, we investigate extending sentence summarization models to the document headline generation task. The challenge is that extending the sentence summarization model to consider more document information will mostly confuse the model and hurt performance. We propose a coarse-to-fine approach, which first identifies the important sentences of a document using document summarization techniques, and then exploits a multi-sentence summarization model with hierarchical attention to leverage the important sentences for headline generation. Experimental results on a large real dataset demonstrate that the proposed approach significantly improves the performance of neural sentence summarization models on the headline generation task.

MHLM Majority Voting Based Hybrid Learning Model for Multi-Document Summarization

International Journal of Artificial Intelligence and Machine Learning ◽

10.4018/ijaiml.2019010104 ◽

2019 ◽

Vol 9 (1) ◽

pp. 67-81

Author(s):

Suneetha S. ◽

Venugopal Reddy A.

Keyword(s):

Numerical Data ◽

Hybrid Learning ◽

Learning Model ◽

Text Summarization ◽

Majority Voting ◽

Sentence Length ◽

Support Vector ◽

Data Set ◽

Document Summarization ◽

Multiple Documents

Text summarization from multiple documents is an active research area because data on the World Wide Web (WWW) is abundant. Retrieving the relevant content from this mass of data is time-consuming and tedious for users. Numerous techniques have been proposed to provide users with the relevant information in the form of a summary. Accordingly, this article presents the majority voting based hybrid learning model (MHLM) for multi-document summarization. First, the multiple documents are pre-processed, and four features (title-based, sentence length, numerical data, and TF-IDF) are extracted for each sentence of the document. The feature set is then passed to the proposed MHLM classifier, which combines Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Neural Network (NN) classifiers to evaluate the significance of the sentences in the document. These classifiers produce significance scores based on the four extracted features. The majority voting model then decides which texts are significant based on those scores and builds the summary for the user, thereby reducing redundancy and increasing the quality of the summary so that it stays close to the original document. In an experiment on the DUC 2002 data set, the proposed MHLM attains precision and recall of 0.94, an F-measure of 0.93, and a ROUGE-1 score of 0.6324.
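
The majority voting step can be sketched as follows. This is only an illustration of the voting idea, assuming each classifier has already reduced its significance score to a binary label (1 for significant, 0 otherwise); the abstract does not say how scores are thresholded, and the function names and example labels are hypothetical.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most classifiers for one sentence."""
    label, _ = Counter(votes).most_common(1)[0]
    return label


def select_sentences(sentences, per_classifier_labels):
    """Keep sentences that a majority of classifiers marked significant (1)."""
    # zip(*...) regroups the labels so each tuple holds all votes
    # cast for a single sentence.
    votes_per_sentence = zip(*per_classifier_labels)
    return [s for s, votes in zip(sentences, votes_per_sentence)
            if majority_vote(votes) == 1]


sentences = ["s1", "s2", "s3"]
svm_labels = [1, 0, 1]  # hypothetical outputs of an SVM
knn_labels = [1, 1, 0]  # hypothetical outputs of a KNN classifier
nn_labels = [0, 0, 1]   # hypothetical outputs of a neural network
print(select_sentences(sentences, [svm_labels, knn_labels, nn_labels]))
```

With an odd number of classifiers and binary labels there can be no tie, which is one practical reason to combine exactly three voters.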

Determining the importance of sentence position for automatic text summarization

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-179902 ◽

2020 ◽

pp. 1-11

Author(s):

Griselda Areli Matias Mendoza ◽

Yulia Ledeneva ◽

Rene Arnulfo García-Hernández

Keyword(s):

Text Summarization ◽

Automatic Text Summarization ◽

Sentence Position ◽

Automatic Text

Extractive Multi-document Summarization using K-means, Centroid-based Method, MMR, and Sentence Position

Proceedings of the Tenth International Symposium on Information and Communication Technology - SoICT 2019 ◽

10.1145/3368926.3369688 ◽

2019 ◽

Author(s):

Hai Cao Manh ◽

Huong Le Thanh ◽

Tuan Luu Minh

Keyword(s):

Document Summarization ◽

Sentence Position

Survey of Scientific Document Summarization Techniques

Computer Science ◽

10.7494/csci.2020.21.2.3356 ◽

2020 ◽

Vol 21 (2) ◽

Author(s):

Sheena Kurian K ◽

Sheena Mathew

Keyword(s):

Text Summarization ◽

Exponential Rate ◽

Research Papers ◽

Document Summarization ◽

Automatic Text Summarization ◽

Scientific Document Summarization ◽

Pros And Cons ◽

Comparison Of The Results ◽

Evaluation Techniques ◽

Automatic Text

The number of scientific or research papers published every year is growing at an exponential rate, which has led to intensive research in scientific document summarization. This paper discusses the methods commonly used in automatic text summarization, with their pros and cons. Commonly used evaluation techniques and datasets in this field are also discussed. ROUGE and Pyramid scores of the different methods are tabulated for easy comparison of the results.

Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v1i3.43 ◽

2015 ◽

Vol 1 (3) ◽

pp. 132 ◽

Cited By ~ 5

Author(s):

Erwin Yudi Hidayat ◽

Fahri Firdausillah ◽

Khafiizh Hastuti ◽

Ika Novita Dewi ◽

Azhari Azhari

Keyword(s):

Clustering Algorithm ◽

Document Clustering ◽

Text Summarization ◽

Data Set ◽

Document Summarization ◽

Automatic Text Summarization ◽

Improve Accuracy ◽

Automatic Document Summarization ◽

Document Compression ◽

Automatic Text

In this paper, we present Latent Dirichlet Allocation (LDA) in automatic text summarization to improve accuracy in document clustering. The experiments involve 398 data items from public blog articles, obtained using the Python Scrapy crawler and scraper. The clustering steps in this research are preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The results show that automatic document summarization with LDA reaches 72% in LDA 40%, compared to the traditional k-means method, which only reaches 66%.

Modern Multi-Document Text Summarization Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1945.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 654-670

Keyword(s):

Machine Learning ◽

Game Theory ◽

Text Summarization ◽

Significant Progress ◽

Important Data ◽

Document Summarization ◽

Hybrid Techniques ◽

Information Richness ◽

Benchmark Datasets ◽

Comprehensive Manner

Text summarization is the technique in which the source document is simplified, valuable information is distilled, and an abridged version is produced. Over the last decade, the focus has shifted from single-document to multi-document summarization, and despite significant progress in the domain, challenges such as sentence ordering and fluency remain. This paper presents a thorough comparison of several multi-document text summarization techniques, including machine learning based, graph based, and game theory based approaches. It condenses and interprets the numerous approaches, merits, and limitations of these techniques. The benchmark datasets of this domain and their features are also examined. The survey aims to distinguish the various summarization algorithms based on properties that prove valuable in generating highly consistent, rational summaries with reduced redundancy and rich information. Its conclusions can be used to identify the advantages of these techniques, which will help future researchers in their study of this domain and provide important data for further analysis in a more systematic and comprehensive manner. With the aid of this paper, researchers can identify areas that present scope for improvement and then devise novel or possibly hybrid techniques in multi-document summarization.
