Performance Evaluation of LSA, NMF and ILSA in Electronic Assessment of Free Text Document

Author(s):  
M. M. Rufai ◽  
A. O. Afolabi ◽  
O. D. Fenwa ◽  
F. A. Ajala

Aims: To evaluate the performance of Improved Latent Semantic Analysis (ILSA), Latent Semantic Analysis (LSA), and Non-Negative Matrix Factorization (NMF) algorithms in an electronic assessment application, using the following metrics: term similarity; precision, recall, and F-measure; mean divergence; assessment accuracy; and adequacy of semantic representation. Methodology: The three algorithms were applied separately in developing an electronic assessment application. One hundred students' responses to a test question in an introductory artificial intelligence course were used. The algorithms' performance was measured using term similarity, precision, recall, F-measure, mean divergence, and assessment accuracy. Results: ILSA outperformed LSA and NMF, with an assessment accuracy of 96.64, a mean divergence from the manual score of 0.03, and recall, precision, and F-measure values of 0.83, 0.85, and 0.87, respectively. Conclusion: The research examined the performance of an improved algorithm, ILSA, for electronic assessment of free-text documents, using adequacy of semantic representation, retrieval quality, and assessment accuracy as performance metrics. The experimental results show that the improved algorithm offers adequate semantic representation, better retrieval quality, and improved assessment accuracy.
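The retrieval and divergence metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, precision and recall are computed over sets of terms, and mean divergence is assumed to be the mean absolute difference between system and manual scores, which the abstract does not define.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-measure) for two collections of terms."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def mean_divergence(system_scores, manual_scores):
    """Mean absolute difference between system and manual scores.

    Assumption: the abstract does not give a formula, so this definition
    is only one plausible reading of "mean divergence from manual score".
    """
    if len(system_scores) != len(manual_scores):
        raise ValueError("score lists must have the same length")
    if not system_scores:
        raise ValueError("score lists must not be empty")
    total = sum(abs(s - m) for s, m in zip(system_scores, manual_scores))
    return total / len(system_scores)


# Hypothetical data for illustration only.
p, r, f = precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"])
divergence = mean_divergence([0.8, 0.5], [0.9, 0.5])
print(p, r, f, divergence)
```

Because the balanced F-measure is a harmonic mean, it always lies between the precision and recall values it is computed from.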

Author(s):  
Rufai Mohammed Mutiu ◽  
A. O. Afolabi ◽  
O. D. Fenwa ◽  
F. A. Ajala

Latent Semantic Analysis (LSA) is a statistical approach designed to capture the semantic content of a document, which forms the basis for its application to electronic assessment of free-text documents in an examination context. The students' submitted answers are transformed into a Document Term Matrix (DTM) and approximated using SVD-LSA for noise reduction. However, it has been shown that LSA still leaves residual noise in its semantic representation, which ultimately affects the accuracy of the assessment result when compared with human grading. In this work, the LSA model is formulated as an optimization problem using Non-negative Matrix Factorization (NMF) with Ant Colony Optimization (ACO). The factors of LSA are used to initialize the NMF factors for quick convergence. ACO iteratively searches for the values of the decision variables in NMF that minimize the objective function and uses these values to construct a reduced DTM. The results show a better approximation of the DTM representation and an improved assessment result: 91.35% accuracy, a mean divergence of 0.0865 from human grading, and a Pearson correlation coefficient of 0.632, which is better than existing results.
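To make the NMF objective concrete, the sketch below factorizes a small document-term matrix V into non-negative factors W and H by minimizing the squared reconstruction error. It is not the method described in this abstract: it uses the standard multiplicative update rules in place of ACO, and hand-picked starting factors in place of LSA-derived ones. The matrix values and all function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH, the objective being minimized."""
    wh = matmul(w, h)
    return sum((vij - whij) ** 2
               for v_row, wh_row in zip(v, wh)
               for vij, whij in zip(v_row, wh_row))


def nmf(v, w, h, iterations=200, eps=1e-9):
    """Refine non-negative factors W (m x k) and H (k x n) of V (m x n)."""
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[hij * n / (d + eps) for hij, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[wij * n / (d + eps) for wij, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Hypothetical 4-document x 5-term count matrix and starting factors.
v = [[2, 1, 0, 0, 0],
     [1, 2, 0, 0, 1],
     [0, 0, 3, 1, 0],
     [0, 0, 1, 2, 0]]
w0 = [[0.5, 0.1], [0.4, 0.2], [0.1, 0.6], [0.2, 0.5]]
h0 = [[0.3, 0.3, 0.1, 0.1, 0.2], [0.1, 0.1, 0.4, 0.3, 0.1]]

before = frobenius_error(v, w0, h0)
w, h = nmf(v, w0, h0)
after = frobenius_error(v, w, h)
print(before, after)
```

The product of the returned factors plays the role of the reduced DTM. Any other search procedure, such as the ACO search the abstract describes, would be judged against the same reconstruction error.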


2020 ◽  
Vol 9 (1) ◽  
pp. 105
Author(s):  
Muhammad Afif Ubaidillah ◽  
Ida Bagus Gede Dwidasmara ◽  
Agus Muliantara

A summary is an effective way to present a long piece of writing in a short form. Although concise, a summary still preserves the author's original ideas and approach. Producing a summary, however, requires reading the news story or article first, whereas the purpose of a summary is to minimize the reader's time and provide text that leads directly to the main purpose or main idea. This study presents automatic text summarization of online news from a website using CLSA (Cross Latent Semantic Analysis) and cosine similarity. The study was conducted to test how good and how accurate the summaries produced by CLSA and cosine similarity are. It uses secondary data consisting of news from an online media outlet, the balipost.com website, restricted to the Denpasar region. The data were collected by crawling. A total of 161 news articles were used; the system-generated summaries were then compared with manual summaries to obtain the accuracy. System testing yielded an average F-measure of 58%, an average precision of 62%, and an average recall of 57%. Automatic text summarization of online news using CLSA and cosine similarity produced summaries of adequate quality and accuracy. Keywords: summary, automatic text summarization, crawling, CLSA, cosine similarity
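Cosine similarity, one of the two components named in this abstract, can be illustrated with a short sketch over raw term counts. The abstract does not say which vectors the authors compare or how the score enters the CLSA pipeline, so the token-count representation and the function name here are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical tokenized sentences for illustration only.
score = cosine_similarity(["berita", "denpasar"], ["berita", "bali"])
print(score)
```

A score near 1 means the two texts use nearly the same terms in the same proportions, and a score of 0 means they share no terms.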


2019 ◽  
Vol 8 (2) ◽  
pp. 1524-1530

In a learning management system, a discussion forum in which students and lecturers are actively involved as part of the learning method enriches the context of communication, thereby enhancing the students' learning and performance. The aim of this paper was to determine the appropriate topics for a discussion forum in learning management systems through enhanced probabilistic latent semantic analysis (PLSA) with a corpus classifier algorithm. The methods used were PLSA and a classifying process, which groups the documents into a corpus based on a word-similarity approach. Word similarity is influenced by the term frequency of the word in the document. The novel concept in this paper is the corpus classifier algorithm. The experiment was conducted using three approaches to discover the topic, and it used 4,868 distinct words from 234 documents. The documents came from three thread subjects, and each discussion forum post is a text document. The performance of the result was measured by the f-measure, which was calculated for each thread subject. The corpus classifier algorithm, which was used in the second and third approaches, increased the average f-measure values for the second and third thread subjects by approximately 24% and 17%, respectively.


2020 ◽  
Vol 18 (3) ◽  
pp. 239-248
Author(s):  
Eren Gultepe ◽  
Mehran Kamkarhaghighi ◽  
Masoud Makrehchi

A parsimonious convolutional neural network (CNN) for text document classification that replicates the ease of use and high classification performance of linear methods is presented. This new CNN architecture can leverage locally trained latent semantic analysis (LSA) word vectors. The architecture is based on parallel 1D convolutional layers with small window sizes, ranging from 1 to 5 words. To test the efficacy of the new CNN architecture, three balanced text datasets that are known to perform exceedingly well with linear classifiers were evaluated. Also, three additional imbalanced datasets were evaluated to gauge the robustness of the LSA vectors and small window sizes. The new CNN architecture consisting of 1 to 4-grams, coupled with LSA word vectors, exceeded the accuracy of all linear classifiers on balanced datasets with an average improvement of 0.73%. In four out of the total six datasets, the LSA word vectors provided a maximum classification performance on par with or better than word2vec vectors in CNNs. Furthermore, in four out of the six datasets, the new CNN architecture provided the highest classification performance. Thus, the new CNN architecture and LSA word vectors could be used as a baseline method for text classification tasks.


2017 ◽  
Vol 3 (2) ◽  
pp. 94
Author(s):  
Gamaria Mandar ◽  
Gunawan Gunawan

Summarizing Indonesian-language news documents helps readers find the main ideas or other important information in a news story. Because news stories generally consist of many paragraphs, a system for extracting information is needed that can present the correct main ideas or important information to readers without their having to read the whole story in detail; such a system can also be used for Really Simple Syndication feeds (RSS feeds). This study presents the summarization of Indonesian-language news documents using Cross Latent Semantic Analysis (CLSA) and Latent Semantic Analysis (LSA). To test how good the CLSA summaries are, the study uses 240 news articles retrieved from the portal www.kompas.com and two experts from different fields. The CLSA summaries with a compression rate of 30% obtain an F-Measure of 0.72%. The study also finds that CLSA performs better than LSA, the method from which CLSA was derived, although the two F-Measure scores differ only slightly.


2021 ◽  
Vol 7 (2) ◽  
pp. 153
Author(s):  
Yunita Maulidia Sari ◽  
Nenden Siti Fatonah

Rapid technological development makes it easier to find the information we need. A problem arises when that information is very abundant. The more information a module contains, the longer its text becomes, and understanding the core information of the module then takes considerable time. One solution for obtaining the core information of an entire module quickly is to read its summary. A quick way to obtain a summary of a document is automatic text summarization. Automatic text summarization produces text from one or more documents that conveys the important information of the original source documents and that is no longer than half the length of the original. This study aims to produce automatic text summaries of Indonesian-language learning modules and to determine the accuracy of automatic text summarization using the Cross Latent Semantic Analysis (CLSA) method. The data comprise 10 learning-module files from lecturers at Universitas Mercu Buana: 5 files in .docx format and 5 files in .pdf format. The study applies Term Frequency-Inverse Document Frequency (TF-IDF) for term weighting and Cross Latent Semantic Analysis (CLSA) for text summarization. Accuracy was tested by comparing manual human summaries with the system summaries. The highest average f-measure, precision, and recall were obtained at a compression rate of 20%, with values of 0.3853, 0.432, and 0.3715, respectively.
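The TF-IDF weighting step mentioned in this abstract can be sketched as below. TF-IDF has several common variants and the abstract does not say which one the authors used, so this sketch assumes relative term frequency multiplied by the natural logarithm of the inverse document frequency. The function name and sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    # Count each term once per document to get document frequencies.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized sentences for illustration only.
docs = [["modul", "belajar"], ["modul", "ringkasan"]]
weights = tf_idf(docs)
print(weights)
```

Under this variant a term that appears in every document receives a weight of zero, so only terms that distinguish one document from the others carry weight into the later summarization step.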


Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages due to the scarcity of resources. Latent Semantic Analysis (LSA) is an unsupervised technique that automatically identifies semantically important sentences in a text document. Two methods based on Latent Semantic Analysis have been evaluated on two datasets of a resource-poor language using Singular Value Decomposition (SVD) on different vector-space models. The performance of the methods is evaluated using ROUGE-L scores obtained by comparing the system-generated summaries with human-generated model summaries. Both methods are found to perform better on shorter documents than on longer ones.
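ROUGE-L, the evaluation metric named in this abstract, scores a system summary by the longest common subsequence (LCS) of tokens it shares with a reference summary. The sketch below is a simplified, summary-level illustration with a balanced F-score; the function names are hypothetical, and the abstract does not state which ROUGE-L configuration the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, F1) based on the LCS of the token lists."""
    if not candidate or not reference:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(candidate, reference)
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical system and reference summaries for illustration only.
system = "the cat sat on the mat".split()
model = "the cat is on the mat".split()
scores = rouge_l(system, model)
print(scores)
```

Because a subsequence need not be contiguous, the metric rewards summaries that keep the reference's words in the same order even when other words intervene.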

