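Peringkasan Teks Otomatis pada Modul Pembelajaran Berbahasa Indonesia Menggunakan Metode Cross Latent Semantic Analysis (CLSA) [Automatic Text Summarization of Indonesian-Language Learning Modules Using the Cross Latent Semantic Analysis (CLSA) Method]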

Rapid technological progress has made it easier to find the information we need. The problem arises when there is too much of it. The more information a module contains, the longer its text becomes, and the more time it takes to grasp the module's key points. One way to get the gist of an entire module quickly is to read a summary, and a fast way to obtain a summary of a document is automatic text summarization. An automatic summary is a text produced from one or more documents that conveys the important information in the source and is no longer than half the length of the original. This study aims to produce automatic summaries of Indonesian-language learning modules and to measure the accuracy of summaries generated with the Cross Latent Semantic Analysis (CLSA) method. The data consist of 10 learning-module files written by lecturers at Universitas Mercu Buana: 5 in .docx format and 5 in .pdf format. The study uses Term Frequency-Inverse Document Frequency (TF-IDF) for term weighting and CLSA for summarization. Accuracy is tested by comparing the system's summaries with summaries written manually by humans. The highest average F-measure, precision, and recall were obtained at a compression rate of 20%, with values of 0.3853, 0.432, and 0.3715, respectively.
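
The comparison between a system summary and a manual summary can be illustrated with a short sketch. It assumes that both summaries are lists of sentences extracted from the same document and that a "hit" is an exact sentence match; the abstract does not state the matching rule, and the function name `evaluate_summary` is illustrative only.

```python
def evaluate_summary(system_sentences, reference_sentences):
    """Return (precision, recall, f_measure) over exact sentence matches.

    Assumption: summaries are compared as sets of extracted sentences.
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)

    # Precision: share of system sentences that the human also chose.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of human-chosen sentences that the system recovered.
    recall = overlap / len(reference) if reference else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    system = ["s1", "s2", "s3", "s4"]
    manual = ["s2", "s3", "s5"]
    print(evaluate_summary(system, manual))
```

Scores averaged over several documents, as in the abstract, need not satisfy the harmonic-mean relation exactly, because the F-measure is computed per document before averaging.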

Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

ComTech Computer Mathematics and Engineering Applications ◽ 10.21512/comtech.v7i4.3746 ◽ 2016 ◽ Vol 7 (4) ◽ pp. 285

Cited By ~ 14

Author(s): Hans Christian ◽ Mikhael Pramodana Agus ◽ Derwin Suhartono

Keyword(s): Language Processing ◽ Text Summarization ◽ The Other ◽ Online Information ◽ Inverse Document Frequency ◽ Automatic Text Summarization ◽ Document Frequency ◽ Online Source ◽ Automatic Text ◽ F Measure

The increasing availability of online information has triggered intensive research on automatic text summarization within Natural Language Processing (NLP). Text summarization shortens a text by removing less useful information, which helps the reader find the required information quickly. Many kinds of algorithms can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer based on the TF-IDF algorithm and to compare it with various other online automatic text summarizers. To evaluate the summary produced by each summarizer, the F-measure was used as the standard comparison value. On three data samples, the summarizer achieved 67% accuracy, which is higher than the other online summarizers.

An Extractive Summarization Technique for Text Documents

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.f8369.088619 ◽ 2019 ◽ Vol 8 (6) ◽ pp. 1202-1206

Keyword(s): Text Summarization ◽ Text Documents ◽ Inverse Document Frequency ◽ Extractive Summarization ◽ Term Frequency ◽ Automatic Text Summarization ◽ Document Frequency ◽ Search Information ◽ Automatic Text

To read and search information quickly, documents need to be reduced in size without changing their content. Automatic text summarization addresses this need: it generates a summary by condensing a large input document into a smaller one without losing its meaning or its relevance to the original. Text summarization means shortening a text into accurate, meaningful sentences. The paper presents an implementation that summarizes a document by scoring each sentence with a term frequency and inverse document frequency matrix. The document is compressed so that only the relevant sentences are retained. The technique is applicable to tasks such as automating the handling of text documents and understanding documents more quickly through their summaries.
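
Sentence scoring with TF-IDF can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each sentence is treated as a "document" when computing document frequency, that a sentence's score is the sum of its length-normalised TF-IDF weights, and that the regex tokenizer is adequate. The names `tokenize` and `tfidf_summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_summarize(sentences, num_keep):
    """Keep the num_keep highest-scoring sentences, in original order."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Sum of (relative term frequency * inverse document frequency).
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:num_keep])]


if __name__ == "__main__":
    doc = [
        "the cat sat",
        "the cat sat on the mat",
        "quantum computing uses qubits",
    ]
    print(tfidf_summarize(doc, 2))
```

Terms that appear in every sentence get an IDF of zero under this weighting, so sentences made up only of common words score lowest.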

Term Frequency-Inverse Document Frequency Answer Categorization with Support Vector Machine on Automatic Short Essay Grading System with Latent Semantic Analysis for Japanese Language

2019 International Conference on Electrical Engineering and Computer Science (ICECOS) ◽ 10.1109/icecos47637.2019.8984530 ◽ 2019

Author(s): Anak Agung Putri Ratna ◽ Aaliyah Kaltsum ◽ Lea Santiar ◽ Hanifah Khairunissa ◽ Ihsan Ibrahim ◽ ...

Keyword(s): Support Vector Machine ◽ Latent Semantic Analysis ◽ Semantic Analysis ◽ Support Vector ◽ Grading System ◽ Inverse Document Frequency ◽ Term Frequency ◽ Document Frequency ◽ Essay Grading ◽ Short Essay

Otomatisasi Peringkasan Teks Pada Dokumen Hukum Menggunakan Metode Latent Semantic Analysis [Automating Text Summarization of Legal Documents Using the Latent Semantic Analysis Method]

Jurnal Informatika Polinema ◽ 10.33795/jip.v7i3.515 ◽ 2021 ◽ Vol 7 (3) ◽ pp. 9-16

Author(s): Millenia Rusbandi ◽ Imam Fahrur Rozi ◽ Kadek Suarjuna Batubulan

Keyword(s): Law Enforcement ◽ Latent Semantic Analysis ◽ Semantic Analysis ◽ Compression Rate ◽ Analysis Method ◽ Legal Documents ◽ Automatic Text Summarization ◽ Long Time ◽ Law Enforcement Officials ◽ Automatic Text

At present, the number of crimes in Indonesia is quite large, which increases the number of legal documents that law enforcement officials must handle. To understand a legal document, officials such as lawyers, judges, and prosecutors must read the entire document, which takes a long time. A summary would let them understand it more easily, so one solution is to summarize the legal documents, which are in PDF form. One method that can be used to summarize text is the Latent Semantic Analysis algorithm, which analyzes the hidden meaning of a language, code, or other type of representation in order to obtain important information. In tests on 10 documents summarized by experts, automatic text summarization with the Latent Semantic Analysis method obtained precision, recall, F-measure, and accuracy of 53%, 27%, 35%, and 71% at a compression rate of 75%; 54%, 56%, 55%, and 75% at a compression rate of 50%; and 51%, 79%, 61%, and 75% at a compression rate of 25%. Based on these results, it can be concluded that the Latent Semantic Analysis method can be used to summarize legal documents.
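
The abstract does not say how sentences are chosen from the LSA output. The sketch below shows one common variant under stated assumptions: a term-by-sentence matrix of raw counts, a truncated singular value decomposition, and a score equal to the length of each sentence vector in the reduced latent space. The function names and the `num_topics` parameter are illustrative, and NumPy is assumed to be installed.

```python
import re

import numpy as np


def lsa_sentence_scores(sentences, num_topics=2):
    """Score each sentence by its vector length in the latent space."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-sentence matrix: rows are terms, columns are sentences.
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[index[t], j] += 1.0

    # Thin SVD: matrix = U @ diag(sigma) @ vt, singular values descending.
    _, sigma, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(num_topics, len(sigma))

    # Weight each latent dimension by its singular value, then take the
    # Euclidean length of every sentence column (sign-invariant).
    weighted = sigma[:k, None] * vt[:k, :]
    return np.sqrt((weighted ** 2).sum(axis=0))


def lsa_summarize(sentences, num_keep, num_topics=2):
    """Return the num_keep top-scoring sentences in document order."""
    scores = lsa_sentence_scores(sentences, num_topics)
    top = sorted(np.argsort(-scores)[:num_keep])
    return [sentences[i] for i in top]


if __name__ == "__main__":
    doc = ["a b", "a b c c", "d"]
    print(lsa_sentence_scores(doc, num_topics=1))
    print(lsa_summarize(doc, 1))
```

Keeping only the first few latent dimensions suppresses sentences that are unrelated to the dominant topics; with all dimensions kept, the score reduces to the plain length of the sentence's count vector.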

A SYNTACTIC-BASED SENTENCE VALIDATION TECHNIQUE FOR MALAY TEXT SUMMARIZER

Journal of Information and Communication Technology ◽ 10.32890/jict2021.20.3.3 ◽ 2021 ◽ Vol 20 (Number 3) ◽ pp. 329-352

Author(s): Suraya Alias ◽ Mohd Shamrie Sainin ◽ Siti Khaotijah Mohammad

Keyword(s): Language Processing ◽ Text Summarization ◽ Compression Rate ◽ Automatic Evaluation ◽ Readability Score ◽ Automatic Text Summarization ◽ Validation Technique ◽ Automatic Text ◽ F Measure

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to a summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and remove the unnecessary parts without sacrificing the sentence's grammar. Malay Natural Language Processing (NLP) tools are still under development, with limited open access. A further issue is the lack of a benchmark dataset in the Malay language for evaluating the quality of summaries and validating the compressed sentences produced by a summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay Grammar Pattern. In this work, we propose a new derivation set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that undergoes the SC procedure. We experimented using the Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 with an optimum compression rate of 0.5 Confidence Conf value. Furthermore, a manual evaluation by a group of Malay experts gave the compressed summary sentences a grammaticality score of 4.11 and a readability score of 4.12 out of 5. This demonstrates the reliability of the proposed technique for validating Malay sentences, with promising summary content and readability results.

Latent Semantic Analysis in Automatic Text Summarization: A state of the art analysis

International Journal of Intelligence and Sustainable Computing ◽ 10.1504/ijisc.2020.10029282 ◽ 2020 ◽ Vol 1 (1) ◽ pp. 1

Author(s): Mehala N ◽ Tapas Guha

Keyword(s): Latent Semantic Analysis ◽ Semantic Analysis ◽ State Of The Art ◽ Text Summarization ◽ Automatic Text Summarization ◽ Art Analysis ◽ Automatic Text

LSA Based Text Summarization

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.b3288.079220 ◽ 2020 ◽ Vol 9 (2) ◽ pp. 150-156

Keyword(s): Latent Semantic Analysis ◽ Semantic Analysis ◽ Maximum Level ◽ Latent Semantic Indexing ◽ Text Summarization ◽ Semantic Indexing ◽ Inverse Document Frequency ◽ Document Frequency ◽ Key Terms ◽ Diversity Constraint

In this study we propose an automatic single-document text summarization technique that combines Latent Semantic Analysis (LSA) with a diversity constraint. The technique uses query-based sentence ranking. Because we do not consider Information Retrieval (IR), we generate the query using TF-IDF (Term Frequency-Inverse Document Frequency): to produce the query vector, we identify the terms with high IDF. LSA uses vectorial semantics to analyze the relationships between documents in a corpus, or between sentences within a document, and the key terms they carry, by producing a set of concepts related to the documents and terms. LSA helps to represent the latent structure of documents. Latent Semantic Indexing (LSI) is used to select sentences from the document; it ranks the sentences by score. Traditionally the highest-scoring sentences are chosen for the summary, but here we also compute the diversity between the chosen sentences when producing the final summary, since a good summary should have a maximum level of diversity. The proposed technique is evaluated on OpinosisDataset1.0.
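
The abstract does not define its diversity measure. As a rough sketch of the idea, the code below assumes that sentence scores are already available (for example from an LSI ranking), measures redundancy with cosine similarity over bag-of-words counts, and greedily skips any candidate that is too similar to a sentence already chosen. The threshold `max_similarity` and both function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_diverse(sentences, scores, num_keep, max_similarity=0.5):
    """Greedily pick high-scoring sentences that are not near-duplicates."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    chosen = []
    # Visit candidates from highest to lowest score.
    for i in sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True):
        if len(chosen) == num_keep:
            break
        # Accept only if sufficiently different from every chosen sentence.
        if all(cosine(vectors[i], vectors[j]) <= max_similarity for j in chosen):
            chosen.append(i)
    # Return the selection in document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    reviews = [
        "the battery life is great",
        "battery life is great",
        "the screen is too dim",
    ]
    print(select_diverse(reviews, [0.9, 0.8, 0.5], num_keep=2))
```

Picking by score alone would return the two battery sentences here; the similarity check replaces the near-duplicate with the lower-scoring but distinct screen sentence.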

Single Document Text Summarization of a Resource-Poor Language using an Unsupervised Technique

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.a2250.109119 ◽ 2019 ◽ Vol 9 (1) ◽ pp. 6278-6281

Keyword(s): Latent Semantic Analysis ◽ Semantic Analysis ◽ Singular Value ◽ Text Summarization ◽ Text Document ◽ Automatic Text Summarization ◽ Resource Poor ◽ Value Decomposition ◽ Scarcity Of Resources ◽ Automatic Text

Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages because resources are scarce. Latent Semantic Analysis (LSA) is an unsupervised technique that automatically identifies semantically important sentences in a text document. Two LSA-based methods were evaluated on two datasets of a resource-poor language, using Singular Value Decomposition (SVD) on different vector-space models. Performance was measured with ROUGE-L scores obtained by comparing the system-generated summaries with human-written model summaries. Both methods perform better on shorter documents than on longer ones.
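
ROUGE-L compares a system summary with a reference summary through their longest common subsequence (LCS) of words. The sketch below is a simplified summary-level version that assumes whitespace tokenization and equal weighting of precision and recall; the official ROUGE toolkit has additional options (such as a recall-weighting parameter and sentence-level aggregation) that are not reproduced here. The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, f_score) based on the word-level LCS."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # Balanced F-score (precision and recall weighted equally).
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l("the cat sat on the mat", "the cat lay on a mat"))
```

Because the LCS preserves word order without requiring the words to be adjacent, this measure rewards summaries that follow the reference's sequence even when intervening words differ.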
