R2DCLT: retrieving relevant documents using cosine similarity and LDA in text mining

Author(s):  
L.M. Patnaik ◽  
K.R. Venugopal ◽  
S.S. Iyengar ◽  
R.S. Ramya ◽  
Santosh Nimbhorkar Sejal ◽  
...  
2017 ◽  
Vol 6 (3) ◽  
pp. 119-126
Author(s):  
Lisna Zahrotun

The internship course is one of the compulsory subjects in the undergraduate Informatics Engineering program at Ahmad Dahlan University, Yogyakarta. Over the last few semesters, some students have failed this course. The obstacles identified include difficulty in determining the main theme of their job description. In this study, we propose an application that classifies internship titles using the text mining techniques Shared Nearest-Neighbor and cosine similarity. The best result was obtained with a K value of 7, an epsilon value of 0.5, and a Mint t value of 0.3, which produced 22 clusters and 0 outliers. This result indicates that every internship title was assigned to a cluster. The seven topics chosen by the majority of students are: 1) Information Systems (7 titles); 2) Instructional Media (5 titles); 3) Archiving Applications (4 titles); 4) Web Profile Implementation (3 titles); 5) Instructional Media for University Courses (3 titles); 6) Multimedia (3 titles); and 7) Workshop & Training (3 titles).
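The abstract does not say which Shared Nearest-Neighbor variant was used or how K, epsilon, and Mint t enter the computation. As a minimal sketch of the general idea only, the Python below counts the neighbors two titles share in their k-nearest-neighbor lists, with neighbors ranked by cosine similarity. The titles, the whitespace tokenizer, and the function names are all made up for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def k_nearest(vectors, k):
    """For each item, the set of indices of its k most cosine-similar other items."""
    neighbors = []
    for i, vi in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        ranked = sorted(others, key=lambda j: cosine(vi, vectors[j]), reverse=True)
        neighbors.append(set(ranked[:k]))
    return neighbors

def snn_similarity(neighbors, i, j):
    """Shared-neighbor count, nonzero only if i and j list each other as neighbors."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])

# Hypothetical internship titles, not taken from the study.
titles = [
    "information system for sales",
    "information system for inventory",
    "information system for payroll",
    "instructional media for chemistry",
    "instructional media for physics",
]
vectors = [Counter(title.lower().split()) for title in titles]
neighbors = k_nearest(vectors, k=2)

print(snn_similarity(neighbors, 0, 1))  # two information-system titles
print(snn_similarity(neighbors, 0, 3))  # titles from different topics
```

A full clustering step would then apply thresholds to these counts to identify dense points and outliers. That part is left out here because the source does not describe it.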


Author(s):  
R.S. Ramya ◽  
Ganesh Singh ◽  
Santosh Nimbhorkar Sejal ◽  
K.R. Venugopal ◽  
S.S. Iyengar ◽  
...  

2021 ◽  
Vol 12 (3) ◽  
pp. 1415-1422
Author(s):  
Rika Rosnelly et al.

Exams are one way to measure students' ability to follow what is taught. One type of exam given to students is the essay exam. This study focuses on automatic assessment of essay-type exams using cosine similarity. The method has several stages: case folding, tokenizing, filtering, stemming, analyzing, and weighting the words in documents with cosine similarity. The stemming process uses the Nazief & Adriani algorithm. The study concludes that the choice of words treated as keywords in the answer key greatly affects the system's assessment results. This is supported by testing with cosine similarity, which yielded a result of 89.5%. However, results differ significantly for several types of questions, because the database contains unique characters and some answer keys do not contain keywords that match the correct answer.
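The stages named above can be sketched roughly as follows, assuming raw term-frequency weighting and a tiny hypothetical stop-word list. The study's actual weighting scheme and word lists are not given in the abstract, and the Nazief & Adriani stemming step is omitted. The answer key and student answer are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual filter list is not given.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Case folding, tokenizing, and stop-word filtering (stemming is omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the key
    return dot / (norm_a * norm_b)

# Made-up answer key and student answer.
answer_key = "Photosynthesis converts light energy into chemical energy"
student_answer = "Photosynthesis is the process that converts light into chemical energy"

score = cosine_similarity(preprocess(answer_key), preprocess(student_answer))
print(round(score, 3))
```

Because the score depends only on the terms the two texts share, a student answer that is correct but uses none of the answer key's words scores 0. This is consistent with the abstract's point that keyword choice in the answer key strongly affects the assessment.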


2019 ◽  
Vol 11 (4) ◽  
pp. 1-19 ◽  
Author(s):  
Brendon Cannon ◽  
Mikiyasu Nakayama ◽  
Daisuke Sasaki ◽  
Ash Rossiter ◽  
...  

Author(s):  
Rengga Asmara ◽  
Nur Rasyid Mubtadai ◽  
Varidh Bimantara

Fiction books are one of the most popular types of books in Indonesia. The five most popular fiction genres are fantasy, mystery, romance, sci-fi, and thriller. Each genre gives readers a different impression and holds a particular appeal. People commonly choose a fiction book based on its title, author, or publisher. However, searching this way does not give precise results. In this final project, an application was developed to find fiction books based on the semantic impression given by the book's cover. The impression of each book cover was obtained through a survey of fiction book lovers in Indonesia. The closeness between the user's search and the impression survey data is obtained through text mining, and the cosine similarity algorithm is used to calculate the proximity value that best matches the impression the user expects. The system displays fiction books by closeness value, with an error rate of 3.93% relative to the impression expected by the user.


2013 ◽  
Author(s):  
Ronald N. Kostoff ◽  
Henry A. Buchtel ◽  
John Andrews ◽  
Kirstin M. Pfiel
