Latent Semantic Analysis in Automatic Text Summarization: A state of the art analysis

Author(s):  
Mehala N ◽  
Tapas Guha
2021 ◽  
Vol 7 (2) ◽  
pp. 153
Author(s):  
Yunita Maulidia Sari ◽  
Nenden Siti Fatonah

Perkembangan teknologi yang pesat membuat kita lebih mudah dalam menemukan informasi-informasi yang dibutuhkan. Permasalahan muncul ketika informasi tersebut sangat banyak. Semakin banyak informasi dalam sebuah modul maka akan semakin panjang isi teks dalam modul tersebut. Hal tersebut akan memakan waktu yang cukup lama untuk memahami inti informasi dari modul tersebut. Salah satu solusi untuk mendapatkan inti informasi dari keseluruhan modul dengan cepat dan menghemat waktu adalah dengan membaca ringkasannya. Cara cepat untuk mendapatkan ringkasan sebuah dokumen adalah dengan cara peringkasan teks otomatis. Peringkasan teks otomatis (Automatic Text Summarization) merupakan teks yang dihasilkan dari satu atau lebih dokumen, yang mana hasil teks tersebut memberikan informasi penting dari sumber dokumen asli, serta secara otomatis hasil teks tersebut tidak lebih panjang dari setengah sumber dokumen aslinya. Penelitian ini bertujuan untuk menghasilkan peringkasan teks otomatis pada modul pembelajaran berbahasa Indonesia dan mengetahui hasil akurasi peringkasan teks otomatis yang menerapkan metode Cross Latent Semantic Analysis (CLSA). Jumlah data yang digunakan pada penelitian ini sebanyak 10 file modul pembelajaran yang berasal dari modul para dosen Universitas Mercu Buana, dengan format .docx sebanyak 5 file dan format .pdf sebanyak 5 file. Penelitian ini menerapkan metode Term Frequency-Inverse Document Frequency (TF-IDF) untuk pembobotan kata dan metode Cross Latent Semantic Analysis (CLSA) untuk peringkasan teks. Pengujian akurasi pada peringkasan modul pembelajaran dilakukan dengan cara membandingkan hasil ringkasan manual oleh manusia dan hasil ringkasan sistem. Yang mana pengujian ini menghasilkan rata-rata nilai f-measure, precision, dan recall tertinggi pada compression rate 20% dengan nilai berturut-turut 0.3853, 0.432, dan 0.3715.


Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages due to scarcity of resources. Latent Semantic Analysis (LSA) is an unsupervised technique which automatically identifies semantically important sentences from a text document. Two methods based on Latent Semantic Analysis have been evaluated on two datasets of a resource-poor language using Singular Value Decomposition (SVD) on different vector-space models. The performance of the methods is evaluated using ROUGE-L scores obtained by comparing the system generated summaries with human generated model summaries. Both the methods are found to be performing better for shorter documents than longer ones.


2011 ◽  
Vol 37 (6) ◽  
pp. 299-305 ◽  
Author(s):  
I. V. Mashechkin ◽  
M. I. Petrovskiy ◽  
D. S. Popov ◽  
D. V. Tsarev

2021 ◽  
Vol 10 (2) ◽  
pp. 42-60
Author(s):  
Khadidja Chettah ◽  
Amer Draa

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a totally automated system as an optimizer to search for the best combination of sentences to be put in the final summary. The presented approach is compared with 11 reference methods including supervised and unsupervised summarization techniques. They have evaluated the performances of the proposed approach on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The obtained results show that the proposal can compete with other state-of-the-art methods. It is ranked first out of 12, outperforming all other algorithms.


2021 ◽  
Vol 7 (3) ◽  
pp. 9-16
Author(s):  
Millenia Rusbandi ◽  
Imam Fahrur Rozi ◽  
Kadek Suarjuna Batubulan

At present, the number of crimes in Indonesia is quite large. The large number of crimes in Indonesia will have an impact on the number of legal documents that will be handled by law enforcement officials. In understanding legal documents, law enforcement officials such as lawyers, judges, and prosecutors must read the entire document which will take a long time. Therefore a summary is needed so that law enforcement officials can understand it more easily. So that one solution needed is to make a summary of the legal documents where the documents are in PDF form. In terms of summarizing the text, the method that can be used is the Latent Semantic Analysis algorithm. The algorithm is used to describe or analyze the hidden meaning of a language, code or other type of representation in order to obtain important information.From testing the 10 documents summarized by experts, the results of precision, recall, f-measure and accuracy are obtained sequentially on automatic text summarization using the Latent Semantic Analysis method for a compression rate of 75%, namely 53%, 27%, 35% and 71%. for a compression rate of 50%, namely 54%, 56%, 55% and 75%, and for a compression rate of 25%, namely 51%, 79%, 61% and 75%. Based on the results of the research and testing that has been done, it can be concluded that the Latent Semantic Analysis Method can be used to summarize legal documents.


2020 ◽  
Vol 9 (2) ◽  
pp. 342
Author(s):  
Amal Alkhudari

Due to the wide spread information and the diversity of its sources, there is a need to produce an accurate text summary with the least time and effort. This summary must  preserve key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases, the readability of created summaries is not satisfactory,   because the summaries do not consider the meaning of the words and do not cover all the semantically relevant aspects of data. In this paper we use syntactic and semantic analysis to propose an automatic system of Arabic texts summarization. This system is capable of understanding the meaning of information and retrieves only the relevant part. The effectiveness and evaluation of the proposed work are demonstrated under EASC corpus using Rouge measure. The generated summaries will be compared against those done by human and precedent researches.  


Author(s):  
Hui Lin ◽  
Vincent Ng

The focus of automatic text summarization research has exhibited a gradual shift from extractive methods to abstractive methods in recent years, owing in part to advances in neural methods. Originally developed for machine translation, neural methods provide a viable framework for obtaining an abstract representation of the meaning of an input text and generating informative, fluent, and human-like summaries. This paper surveys existing approaches to abstractive summarization, focusing on the recently developed neural approaches.


Sign in / Sign up

Export Citation Format

Share Document