R2DCLT: retrieving relevant documents using cosine similarity and LDA in text mining

An internship course is one of the compulsory subjects in the undergraduate program of Informatics Engineering at Ahmad Dahlan University, Yogyakarta. Over the last few semesters, we found that some students failed this subject. On investigation, they were facing obstacles such as determining the main theme for their job description. In this study, we propose an application that classifies internship titles using text mining techniques, namely Shared Nearest-Neighbor and cosine similarity. The resulting parameter values were K = 7, epsilon = 0.5, and MinT = 0.3, giving 22 clusters and 0 outliers. These values indicate that every internship title was assigned to a cluster. The 7 topics taken by the majority of students are: 1) Information Systems (7 titles); 2) Instructional Media (5 titles); 3) Archiving Applications (4 titles); 4) Web Profile Implementation (3 titles); 5) Instructional Media for University Courses (3 titles); 6) Multimedia (3 titles); and 7) Workshop & Training (3 titles).
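The abstract above names Shared Nearest-Neighbor and cosine similarity but gives no implementation details. The following is only a minimal sketch of how a shared-nearest-neighbor similarity can be built on top of cosine similarity, assuming raw term counts as vectors. The function names, the sample titles, and the choice of k = 2 (instead of the reported K = 7) are illustrative assumptions, and the clustering step that applies the epsilon and third threshold parameters is not shown.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_lists(vectors: list[Counter], k: int) -> list[set[int]]:
    """For each item, the indices of its k most cosine-similar other items."""
    lists = []
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(v, vectors[j]), reverse=True)
        lists.append(set(others[:k]))
    return lists


def snn_similarity(neighbors: list[set[int]], i: int, j: int) -> int:
    """Number of neighbors shared by i and j, counted only when
    i and j appear in each other's nearest-neighbor lists."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Hypothetical internship titles, invented for illustration only.
titles = [
    "academic information system",
    "library information system",
    "sales information system",
    "multimedia instructional media",
    "interactive instructional media",
]
vectors = [Counter(t.split()) for t in titles]
neighbors = knn_lists(vectors, k=2)
print(snn_similarity(neighbors, 0, 1), snn_similarity(neighbors, 0, 3))
```

In this sketch, two titles count as similar only when they sit in each other's neighbor lists, which is what separates a shared-nearest-neighbor measure from a plain cosine threshold.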

R2DCLT: retrieving relevant documents using cosine similarity and LDA in text mining

International Journal of Information and Communication Technology ◽ 10.1504/ijict.2021.118576 ◽ 2021 ◽ Vol 19 (4) ◽ pp. 391

Author(s): R.S. Ramya ◽ Ganesh Singh ◽ Santosh Nimbhorkar Sejal ◽ K.R. Venugopal ◽ S.S. Iyengar ◽ ...

Keyword(s): Text Mining ◽ Cosine Similarity

The Similarity of Essay Examination Results using Preprocessing Text Mining with Cosine Similarity and Nazief-Adriani Algorithms

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽ 10.17762/turcomat.v12i3.938 ◽ 2021 ◽ Vol 12 (3) ◽ pp. 1415-1422

Author(s): Rika Rosnelly et al.

Keyword(s): Text Mining ◽ Correct Answer ◽ Cosine Similarity ◽ Cosine Law

Exams are one way to measure students' ability to take part in learning. One type of exam given to students is the essay. This study focuses on automatic assessment of essay-type exams using cosine similarity. The method has several stages: case folding, tokenizing, filtering, stemming, analyzing, and weighting the words in documents, followed by cosine similarity. The stemming process uses the Nazief & Adriani algorithm. The study concludes that the choice of words treated as keywords in the answer key greatly affects the system's assessment results. This is supported by testing in which applying the cosine law gave a result of 89.5%. However, results differ significantly for several types of questions, because the database contains unique characters and some answer keys do not contain keywords that match the correct answer.
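The stages listed in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' system: the stopword list is a tiny invented one, `stem` is an identity placeholder standing in for the Nazief & Adriani stemmer (which targets Indonesian and is not reproduced here), term weights are plain counts, and the sample answer key and student answer are made up.

```python
import math
import re
from collections import Counter

# Assumed, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def stem(token: str) -> str:
    """Placeholder stemmer (identity). The study uses Nazief & Adriani,
    an Indonesian stemmer that is not implemented in this sketch."""
    return token


def preprocess(text: str) -> Counter:
    """Case folding, tokenizing, stopword filtering, stemming, term counting."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(stem(t) for t in tokens if t not in STOPWORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical answer key and student answer, invented for illustration.
answer_key = "Photosynthesis converts light energy into chemical energy."
student_answer = "Light energy is converted to chemical energy in photosynthesis"
score = cosine_similarity(preprocess(answer_key), preprocess(student_answer))
print(round(score, 3))
```

Because the placeholder stemmer leaves "converts" and "converted" as different terms, the two answers do not match perfectly, which illustrates why the stemming stage and the choice of keywords in the answer key influence the score.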

Text Mining an Automatic Short Answer Grading (ASAG), Comparison of Three Methods of Cosine Similarity, Jaccard Similarity and Dice's Coefficient

Journal of Applied Data Sciences ◽ 10.47738/jads.v2i2.31 ◽ 2021 ◽ Vol 2 (2)

Author(s): Tri Wahyuningsih

Keyword(s): Text Mining ◽ Cosine Similarity ◽ Jaccard Similarity ◽ Short Answer

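Penerapan Algoritma Cosine Similarity pada Text Mining Terjemah Al-Qur’an Berdasarkan Keterkaitan Topik [Application of the Cosine Similarity Algorithm to Text Mining of Qur’an Translations Based on Topic Relatedness]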

Semesta Teknika ◽ 10.18196/st.221235 ◽ 2019 ◽ Vol 22 (1)

Author(s): M Didik Rohmad Wahyudi

Keyword(s): Text Mining ◽ Cosine Similarity

Shifting Policies in Conflict Arenas: A Cosine Similarity and Text Mining Analysis of Turkey’s Syria Policy, 2012-2016

Journal of Strategic Security ◽ 10.5038/1944-0472.11.4.1690 ◽ 2019 ◽ Vol 11 (4) ◽ pp. 1-19 ◽ Cited By ~ 3

Author(s): Brendon Cannon ◽ Mikiyasu Nakayama ◽ Daisuke Sasaki ◽ Ash Rossiter ◽ ...

Keyword(s): Text Mining ◽ Cosine Similarity

OPTIMASI MESIN PENCARI BUKU FIKSI BERDASARKAN PADA SEMANTIK IMPRESI [Optimization of a Fiction Book Search Engine Based on Semantic Impressions]

METHOMIKA: Jurnal Manajemen Informatika dan Komputerisasi Akuntansi ◽ 10.46880/jmika.vol5no1.pp1-8 ◽ 2021 ◽ Vol 5 (1) ◽ pp. 1-8

Author(s): Rengga Asmara ◽ Nur Rasyid Mubtadai ◽ Varidh Bimantara

Keyword(s): Text Mining ◽ Survey Data ◽ Error Rate ◽ Special Interest ◽ Cosine Similarity ◽ Application System ◽ Final Project ◽ Search Results ◽ Similarity Algorithm

Fiction books are one of the most popular types of books in Indonesia. The five most popular fiction genres are fantasy, mystery, romance, sci-fi, and thriller. Each genre gives readers a different impression and holds a special interest for them. People commonly choose a fiction book based on its title, author, or publisher. However, this does not provide precise search results. In this final project, an application system was developed to find fiction books based on the semantic impressions conveyed by their covers. The impression of each book cover is obtained through a survey of fiction book lovers in Indonesia. The closeness between the user's search and the impression survey data is obtained through text mining, with the cosine similarity algorithm used to calculate the proximity value that best matches the impression the user expects. The system displays fiction books whose closeness value matches the impression expected by the user, with an error rate of 3.93%.
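For reference, the standard definition of cosine similarity that such a search would rely on is given below. The symbols are assumptions introduced here for illustration: q is the term-weight vector of the user's search and d is the impression vector of one book. The abstract does not state which weighting scheme the authors used.

```latex
\[
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}
         {\sqrt{\sum_{i=1}^{n} q_i^{2}} \, \sqrt{\sum_{i=1}^{n} d_i^{2}}}
\]
```

With non-negative term weights the value lies between 0 and 1, and books can be ranked by this value in descending order.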
