Single Document Text Summarization of a Resource-Poor Language using an Unsupervised Technique

Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages due to the scarcity of resources. Latent Semantic Analysis (LSA) is an unsupervised technique that automatically identifies semantically important sentences in a text document. Two methods based on Latent Semantic Analysis have been evaluated on two datasets of a resource-poor language, using Singular Value Decomposition (SVD) on different vector-space models. The performance of the methods is evaluated using ROUGE-L scores obtained by comparing the system-generated summaries with human-generated model summaries. Both methods are found to perform better on shorter documents than on longer ones.
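The LSA/SVD sentence-selection idea described above can be sketched as follows. This is a minimal illustration (one common selection scheme: pick the highest-loading sentence for each leading singular vector), not the paper's implementation; tokenization and weighting are simplified to raw term frequencies:

```python
import numpy as np

def lsa_summarize(sentences, top_k=2):
    """Rank sentences by LSA: build a term-sentence matrix, apply SVD,
    and pick one sentence per leading latent topic."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-frequency term-sentence matrix (rows: terms, cols: sentences)
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1
    # Columns of Vt give each sentence's loading on the latent topics
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for k in range(min(top_k, Vt.shape[0])):
        # Take the not-yet-chosen sentence with the largest loading
        # on the k-th singular vector
        for j in np.argsort(-np.abs(Vt[k])):
            if j not in chosen:
                chosen.append(j)
                break
    return [sentences[j] for j in sorted(chosen)]
```

The returned sentences preserve original document order, which is the usual convention for extractive summaries.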

2021, Vol. 7 (2), pp. 153
Author(s): Yunita Maulidia Sari, Nenden Siti Fatonah

Rapid technological development has made it easier for us to find the information we need. A problem arises when that information is very large in volume. The more information a module contains, the longer its text, and the longer it takes to grasp the module's core information. One solution for quickly extracting the gist of an entire module and saving time is to read a summary of it. A fast way to obtain a summary of a document is automatic text summarization. Automatic text summarization produces a text from one or more documents that conveys the important information of the original source and is automatically no longer than half the original document. This study aims to produce automatic summaries of Indonesian-language learning modules and to measure the accuracy of summaries generated with the Cross Latent Semantic Analysis (CLSA) method. The data consist of 10 learning-module files written by lecturers at Universitas Mercu Buana: 5 files in .docx format and 5 in .pdf format. The study applies the Term Frequency-Inverse Document Frequency (TF-IDF) method for word weighting and the Cross Latent Semantic Analysis (CLSA) method for text summarization. Summarization accuracy was tested by comparing manual summaries produced by humans with the system's summaries. The highest average f-measure, precision, and recall were obtained at a compression rate of 20%, with values of 0.3853, 0.432, and 0.3715 respectively.
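The TF-IDF word-weighting step used in this pipeline can be sketched as follows. This is a minimal illustration of the standard formula (length-normalized term frequency times log inverse document frequency), not the authors' CLSA system:

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute TF-IDF weights for a list of tokenized documents.
    TF is the raw count normalized by document length; IDF is
    log(N / df), where df counts documents containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights
```

Note that a term occurring in every document receives weight 0, which is the intended behavior: such terms carry no discriminative information.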


Author(s): Manju Lata Joshi, Nisheeth Joshi, Namita Mittal

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various Automatic Text Summarization techniques have been developed for both abstractive and extractive summarization. This study focuses on extractive summarization, a process that selects representative paragraphs or sentences from the original text and combines them into a form shorter than the source document(s) to generate a summary. Methods used for extractive summarization include graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between its sentences, using Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretic measures are applied to rank the document's sentences by their semantic scores. The proposed approach is applied to two datasets from different domains, Tourism and Health. Its performance is compared with the state-of-the-art TextRank algorithm and with human-annotated summaries, and is evaluated using the widely accepted ROUGE measures. The outcomes show that the proposed system produces better results than TextRank on the health-domain corpus and comparable results on the tourism corpus. Further, correlation coefficient methods are applied to find correlations between eight different graphical measures, and it is observed that most of the graphical measures are highly correlated.
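The graph-based ranking step that approaches like SGATS build on can be sketched as follows. This is a deliberately simplified illustration: it uses word-overlap (Jaccard) similarity and weighted degree centrality in place of Hindi WordNet semantic relations and the fourteen graph measures used in the paper:

```python
import itertools

def rank_by_degree(sentences, threshold=0.1):
    """Build a sentence graph whose edge weights are Jaccard
    word-overlap similarities, then rank sentences by weighted
    degree centrality (sum of incident edge weights)."""
    tokens = [set(s.lower().split()) for s in sentences]
    scores = [0.0] * len(sentences)
    for i, j in itertools.combinations(range(len(sentences)), 2):
        union = tokens[i] | tokens[j]
        sim = len(tokens[i] & tokens[j]) / len(union) if union else 0.0
        if sim >= threshold:  # keep only sufficiently similar pairs
            scores[i] += sim
            scores[j] += sim
    # Indices of sentences, best-connected first
    return sorted(range(len(sentences)), key=lambda i: -scores[i])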


2011, Vol. 37 (6), pp. 299-305
Author(s): I. V. Mashechkin, M. I. Petrovskiy, D. S. Popov, D. V. Tsarev

Author(s): Carlo Schwarz

In this article, I present the lsemantica command, which implements latent semantic analysis in Stata. Latent semantic analysis is a machine learning algorithm for word and text similarity comparison and uses truncated singular value decomposition to derive the hidden semantic relationships between words and texts. lsemantica provides a simple command for latent semantic analysis as well as complementary commands for text similarity comparison.
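Outside Stata, the core computation lsemantica performs can be sketched in Python. This is an illustrative analogue of latent-semantic text similarity via truncated SVD, not the lsemantica command itself; the input is assumed to be a term-document matrix the caller has already built:

```python
import numpy as np

def lsa_similarity(term_doc, k=2):
    """Project a term-document matrix into a k-dimensional latent
    space with truncated SVD and return the cosine-similarity
    matrix between documents in that space."""
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates in the latent space: rows of (S_k Vt_k)^T
    docs = (np.diag(S[:k]) @ Vt[:k]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero documents
    unit = docs / norms
    return unit @ unit.T
```

Documents that use different surface words but load on the same latent dimensions come out similar, which is the point of comparing in the truncated space rather than on raw counts.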


2021, Vol. 7 (3), pp. 9-16
Author(s): Millenia Rusbandi, Imam Fahrur Rozi, Kadek Suarjuna Batubulan

At present, the number of crimes in Indonesia is quite large, which in turn increases the number of legal documents that law enforcement officials must handle. To understand a legal document, officials such as lawyers, judges, and prosecutors must read the entire document, which takes a long time. A summary is therefore needed so that law enforcement officials can understand these documents, which are in PDF form, more easily. For text summarization, the Latent Semantic Analysis algorithm can be used; it analyzes the hidden meaning of a language, code, or other type of representation in order to obtain important information. From testing 10 documents summarized by experts, automatic text summarization with the Latent Semantic Analysis method yields the following precision, recall, f-measure, and accuracy, respectively: 53%, 27%, 35%, and 71% at a compression rate of 75%; 54%, 56%, 55%, and 75% at a compression rate of 50%; and 51%, 79%, 61%, and 75% at a compression rate of 25%. Based on the research and testing, it can be concluded that the Latent Semantic Analysis method can be used to summarize legal documents.


2020, Vol. 9 (2), pp. 342
Author(s): Amal Alkhudari

Due to the widespread availability of information and the diversity of its sources, there is a need to produce accurate text summaries with the least time and effort. Such a summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries similar to human-created ones. However, in many cases the readability of generated summaries is not satisfactory, because they do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper we use syntactic and semantic analysis to propose an automatic summarization system for Arabic texts. This system is capable of understanding the meaning of the information and retrieving only the relevant parts. The effectiveness of the proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries are compared against human-written summaries and previous research.


Author(s): Chandra Yadav, Aditi Sharan

Automatic text document summarization is an active research area in the text mining field. In this article, the authors propose two new approaches (three models) for sentence selection and a new entropy-based summary evaluation criterion. The first approach is based on an algebraic model, Singular Value Decomposition (SVD), i.e., Latent Semantic Analysis (LSA), and is termed proposed_model-1; the second approach is based on entropy and is divided into proposed_model-2 and proposed_model-3. The first proposed model uses the right singular matrix, while the second and third are based on Shannon entropy. The advantage of these models is that they are not length-dominated, give better results, and exhibit low redundancy. Alongside these three models, an entropy-based summary evaluation criterion is proposed and tested. The authors also show that their entropy-based models are statistically closer to the standard/gold summaries of DUC-2002. The dataset used in this article is taken from the Document Understanding Conference 2002.
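An entropy-based sentence score of the general kind described here can be sketched as follows. This is a hypothetical minimal illustration using Shannon entropy contributions of words estimated over the whole document; it does not reproduce the specifics of proposed_model-2 or proposed_model-3:

```python
import math
from collections import Counter

def sentence_entropy(sentence, word_probs):
    """Sum the Shannon entropy contribution -p*log2(p) of each word
    in the sentence, using document-level word probabilities."""
    return sum(-word_probs[w] * math.log2(word_probs[w])
               for w in sentence.lower().split() if w in word_probs)

def rank_sentences_by_entropy(sentences):
    """Estimate word probabilities over the document, then rank
    sentences by their accumulated entropy contribution."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    probs = {w: c / total for w, c in counts.items()}
    return sorted(sentences,
                  key=lambda s: -sentence_entropy(s, probs))
```

Sentences containing more informative (higher-entropy) word mass are ranked first; a real system would also penalize length so the score is not dominated by long sentences, as the abstract emphasizes.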

