OPTIMASI MESIN PENCARI BUKU FIKSI BERDASARKAN PADA SEMANTIK IMPRESI

Fiction is one of the most popular types of books in Indonesia. The five most popular fiction genres are fantasy, mystery, romance, sci-fi, and thriller, and each gives readers a different impression and appeal. People commonly choose a fiction book by its title, author, or publisher, but this does not give precise search results. In this final project, an application was developed to find fiction books based on the semantic impressions conveyed by their covers. The impression of each book cover was obtained through a survey of fiction book lovers in Indonesia. The closeness between the user's search and the impression survey data is obtained through text mining, and the cosine similarity algorithm is used to calculate the proximity value that best matches the impression the user expects. The system displays fiction books whose closeness values match the impression expected by the user, with an error rate of 3.93%.
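
As a rough illustration of the matching step described in this abstract, the sketch below ranks covers by the cosine similarity between a query and survey impression words. It is a minimal sketch under stated assumptions, not the authors' implementation: the `cover_impressions` data, the function names, and the plain whitespace tokenization are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two texts."""
    freq_a = Counter(text_a.lower().split())
    freq_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(freq_a[word] * freq_b[word] for word in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical impression data: cover id -> words collected from survey answers.
cover_impressions = {
    "cover_1": "dark mysterious tense",
    "cover_2": "warm romantic sweet",
}


def rank_covers(query: str) -> list[tuple[str, float]]:
    """Return covers sorted by descending similarity to the query."""
    scores = [
        (cover, cosine_similarity(query, words))
        for cover, words in cover_impressions.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_covers("dark tense")` places `cover_1` first, because it shares two words with the query while `cover_2` shares none. A real system would likely add stemming and stop word removal before counting terms.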

Semantic Search for Scientific Articles by Language Using Cosine Similarity Algorithm and Weighted Tree Similarity

Journal of Development Research ◽ 10.28926/jdr.v5i2.150 ◽ 2021 ◽ Vol 5 (2) ◽ pp. 106-114

Author(s): Muhamad Aldi Rifai ◽ Indra Gita Anugrah

Keyword(s): Cosine Similarity ◽ Scientific Article ◽ Weighted Tree ◽ Abstract Section ◽ Language Differences ◽ Search Results ◽ Similarity Algorithm ◽ Tree Similarity ◽ N Gram ◽ The Right

Academics at universities often write scientific articles, but when doing so they have difficulty finding ideas, literature studies, and reference sources. When searching on a search engine, it is sometimes hard to find the right document, because the keywords being searched for are often not in the title but in another part of the article's structure. Since most search engines only match titles, other parts of the structure are usually excluded from matching, so the search results sometimes do not match what is wanted. In addition, a scientific article often contains more than one language within its structure, as in the abstract section. To detect similarity across the structure of scientific articles, the weighted tree similarity algorithm is used; the N-gram algorithm is used to detect language; and the cosine similarity algorithm is then used to measure the similarity between the keyword text and the text of the articles.
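
The abstract does not say how the N-gram step is carried out. One common approach is to compare character n-gram profiles, and the sketch below illustrates that idea only. The reference texts, the trigram size, and the function names are assumptions made for the example; realistic profiles would be built from far larger corpora.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding the text with one space per side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def profile_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy reference texts (assumptions for illustration only).
REFERENCE_TEXTS = {
    "en": "this research discusses the problem of plagiarism in the thesis of the students",
    "id": "penelitian ini akan membahas permasalahan plagiasi pada skripsi mahasiswa",
}
PROFILES = {lang: char_ngrams(text) for lang, text in REFERENCE_TEXTS.items()}


def detect_language(text: str) -> str:
    """Return the language whose reference profile is closest to the text."""
    sample = char_ngrams(text)
    return max(PROFILES, key=lambda lang: profile_similarity(sample, PROFILES[lang]))
```

With these toy profiles, `detect_language("the results of the research")` returns `"en"` and `detect_language("aplikasi ini membantu mahasiswa")` returns `"id"`. Once the language of a section is known, the keyword text can be compared against sections written in the same language.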

WordNet and Cosine Similarity based Classifier of Exam Questions using Bloom’s Taxonomy

International Journal of Emerging Technologies in Learning (iJET) ◽ 10.3991/ijet.v11i04.5654 ◽ 2016 ◽ Vol 11 (04) ◽ pp. 142 ◽ Cited By ~ 3

Author(s): Kithsiri Jayakodi ◽ Madhushi Bandara ◽ Indika Perera ◽ Dulani Meedeniya

Keyword(s): Language Processing ◽ Evaluation Process ◽ Bloom's Taxonomy ◽ Cosine Similarity ◽ Stop Word ◽ Similarity Algorithm ◽ Question Category ◽ Rule Set ◽ Derived Rules

Assessment plays an indispensable role in education and is the prime indicator of student learning achievement. Exam questions are the main form of assessment used in learning. Setting appropriate exam questions to achieve the desired course outcome is a challenging task for the examiner. This research therefore focuses on automatically categorizing exam questions into their learning levels using Bloom’s taxonomy. Natural Language Processing (NLP) techniques such as tokenization, stop word removal, lemmatization and tagging were used before generating the rule set for this classification. WordNet similarity algorithms with NLTK and a cosine similarity algorithm were developed to generate a unique set of rules that identify the question category and the weight of each exam question according to Bloom’s taxonomy. These derived rules make it easy to analyze the exam questions. Evaluators can redesign their exam papers based on the outcome of the evaluation process. A sample of examination questions from the Department of Computing and Information Systems, Wayamba University, Sri Lanka was used for the evaluation; weights were assigned based on the total value generated by the WordNet algorithm and the cosine algorithm. The identified question categories were confirmed by a domain expert. The generated rule set showed over 70% accuracy.

Exploring Automated Text Classification to Improve Keyword Corpus Search Results for Bioinspired Design

Journal of Mechanical Design ◽ 10.1115/1.4028167 ◽ 2014 ◽ Vol 136 (11) ◽ Cited By ~ 8

Author(s): Michael W. Glier ◽ Daniel A. McAdams ◽ Julie S. Linsey

Keyword(s): Text Mining ◽ Text Classification ◽ Keyword Search ◽ Idea Generation ◽ Support Vector ◽ Biological Knowledge ◽ SVM Classifier ◽ Search Results ◽ Bioinspired Design ◽ Mining Algorithms

Bioinspired design is the adaptation of methods, strategies, or principles found in nature to solve engineering problems. One formalized approach to bioinspired solution seeking is the abstraction of the engineering problem into a functional need and then seeking solutions to this function using a keyword type search method on text based biological knowledge. These function keyword search approaches have shown potential for success, but as with many text based search methods, they produce a large number of results, many of little relevance to the problem in question. In this paper, we develop a method to train a computer to identify text passages more likely to suggest a solution to a human designer. The work presented examines the possibility of filtering biological keyword search results by using text mining algorithms to automatically identify which results are likely to be useful to a designer. The text mining algorithms are trained on a pair of surveys administered to human subjects to empirically identify a large number of sentences that are, or are not, helpful for idea generation. We develop and evaluate three text classification algorithms, namely, a Naïve Bayes (NB) classifier, a k nearest neighbors (kNN) classifier, and a support vector machine (SVM) classifier. Of these methods, the NB classifier generally had the best performance. Based on the analysis of 60 word stems, a NB classifier's precision is 0.87, recall is 0.52, and F score is 0.65. We find that word stem features that describe a physical action or process are correlated with helpful sentences. Similarly, we find biological jargon feature words are correlated with unhelpful sentences.
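
The reported precision, recall, and F score can be checked against each other. The sketch below assumes the abstract's "F score" is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported for the NB classifier in the abstract.
print(round(f_score(0.87, 0.52), 2))  # 0.65
```

The result, 2 × 0.87 × 0.52 / (0.87 + 0.52) ≈ 0.651, rounds to the 0.65 given in the abstract, so the three figures are consistent under the F1 assumption.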

Analysis of Adi Soemarmo Solo Airport Parking Payment System

International Journal of Computer and Information System (IJCIS) ◽ 10.29040/ijcis.v2i1.21 ◽ 2021 ◽ Vol 2 (1) ◽ pp. 1-3

Author(s): Dwiyana Dwiyana ◽ Muqorobin Muqorobin

Keyword(s): Payment System ◽ Payment Systems ◽ Electronic Money ◽ Application System ◽ Money Balance ◽ Final Project ◽ Cash Payment

This Semester Final Project, titled Analysis of the Parking Payment System for Adi Soemarmo Airport Solo, was compiled from observations at the exit gate of Adi Soemarmo Airport Solo. Parking payment transactions often cause problems for several reasons, chiefly that the process takes a long time: sometimes the money handed over is too large and the cashier takes too long to give change, sometimes passengers do not prepare their payment in advance, and often passengers do not have enough money, all of which cause queues or jams at the exit gate. The research objective is to provide the best solution for airport parking payment systems. Besides making payment easier for passengers, this will greatly help cashiers in their work. The payment application system is called u-nik, or electronic money. U-nik transfers the money balance data it holds to a computer using a system called AINO, so that payments occur without additional cash. Non-cash payment transactions using the AINO system are expected to make parking payments easier and faster, without the need to carry cash.

Analysis of Survey Data for Establishing the “Best Medical Survey Instrument” Using Text Mining

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications ◽ 10.1016/b978-0-12-386979-1.00012-8 ◽ 2012 ◽ pp. 233-249

Author(s): Jeremy LaMotte ◽ Ruth Moore ◽ Sanjay Thomas ◽ Chris Jenkins ◽ Linda A. Miner

Keyword(s): Text Mining ◽ Survey Data ◽ Survey Instrument

Navigation through Citation Network Based on Content Similarity Using Cosine Similarity Algorithm

International Journal of Database Theory and Application ◽ 10.14257/ijdta.2016.9.5.02 ◽ 2016 ◽ Vol 9 (5) ◽ pp. 9-20 ◽ Cited By ~ 3

Author(s): Abdul Ahad ◽ Muhammad Fayaz ◽ Abdul Salam Shah

Keyword(s): Citation Network ◽ Cosine Similarity ◽ Similarity Algorithm ◽ Content Similarity

R2DCLT: retrieving relevant documents using cosine similarity and LDA in text mining

International Journal of Information and Communication Technology ◽ 10.1504/ijict.2020.10030957 ◽ 2020 ◽ Vol 1 (1) ◽ pp. 1

Author(s): L.M. Patnaik ◽ K.R. Venugopal ◽ S.S. Iyengar ◽ R.S. Ramya ◽ Santosh Nimbhorkar Sejal ◽ ...

Keyword(s): Text Mining ◽ Cosine Similarity

Enhancing Wikipedia search results using Text Mining

2016 Sixteenth International Conference on Advances in ICT for Emerging Regions (ICTer) ◽ 10.1109/icter.2016.7829915 ◽ 2016 ◽ Cited By ~ 3

Author(s): K.D.C.G. Kapugama ◽ S.A.S. Lorensuhewa ◽ M.A.L. Kalyani

Keyword(s): Text Mining ◽ Search Results

Aplikasi Pendeteksi Plagiasi pada Universitas Indo Global Mandiri Berbasis Web

Jurnal Ilmiah Informatika Global ◽ 10.36982/jig.v10i2.864 ◽ 2019 ◽ Vol 10 (2)

Author(s): Dhamayanti Dhamayanti ◽ Lidia Permata Sari

Keyword(s): Information Systems ◽ Cosine Similarity ◽ Plagiarism Detection ◽ Original Source ◽ Final Project ◽ Processing And Storage ◽ And Storage ◽ Popular Texts ◽ Similarity Method ◽ Student Thesis

A thesis is a final project that students must complete to finish their studies at Indo Global Mandiri University in Palembang. Thesis data processing and storage, especially in the Information Systems department, is still done conventionally, so similarity between the titles or even the contents of student theses is difficult to detect. This difficulty lets students plagiarize freely when preparing proposals and thesis reports, from beginning to end, without the knowledge of the lecturer or the Information Systems department. Plagiarism is a shortcut that steals ideas, takes work, and claims the work of others as one's own without citing the original source. This research addresses plagiarism in the Information Systems department by building an application that can detect plagiarism in thesis titles and contents, so as to overcome the plagiarism problems the department faces. The plagiarism detection application is built using the cosine similarity method. Cosine similarity is a method for calculating the similarity (level of similarity) between two objects. In document similarity testing, research results show that cosine similarity has a higher degree of accuracy. Cosine similarity calculates the similarity value by matching texts word by word and is one of the popular techniques for measuring text similarity. A plagiarism detection application using the cosine similarity method, implemented with PHP and with MySQL as the database, can help reduce plagiarism in thesis titles and contents in the Information Systems department.

Keywords: Plagiarism, Plagiarism Detection Application, Cosine Similarity, PHP

Text Mining for Internship Titles Clustering Using Shared Nearest Neighbor

Computer Engineering and Applications Journal ◽ 10.18495/comengapp.v6i3.214 ◽ 2017 ◽ Vol 6 (3) ◽ pp. 119-126

Author(s): Lisna Zahrotun

Keyword(s): Information Systems ◽ Text Mining ◽ Graduate Program ◽ Nearest Neighbor ◽ Cosine Similarity ◽ Main Theme ◽ Instructional Media ◽ Job Description ◽ University Courses ◽ Shared Nearest Neighbor

An internship course is one of the compulsory subjects in the Undergraduate Program of Informatics Engineering at Ahmad Dahlan University, Yogyakarta. In the last few semesters, we found that some students failed this subject. On investigation, they were facing obstacles such as determining the main theme for their job description. In this study, we propose an application that clusters internship titles using the text mining techniques Shared Nearest Neighbor and cosine similarity. The results were obtained with parameter K = 7, an epsilon value of 0.5, and a MinT value of 0.3, giving 22 clusters and 0 outliers. These values show that every internship title was assigned to a cluster. The 7 topics chosen by the majority of students are: 1) Information Systems (7 titles); 2) Instructional Media (5 titles); 3) Archiving Applications (4 titles); 4) Web Profile Implementation (3 titles); 5) Instructional Media for University Courses (3 titles); 6) Multimedia (3 titles); and 7) Workshop & Training (3 titles).
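
The abstract does not define how its parameters are applied, so the sketch below shows only the core shared-nearest-neighbor idea in its common formulation: two items are similar to the extent that their k-nearest-neighbor lists overlap, provided each appears in the other's list. The similarity matrix, function names, and choice of k = 2 are hypothetical; in practice the matrix would hold cosine similarities between title vectors.

```python
def k_nearest(similarity: list[list[float]], k: int) -> list[set[int]]:
    """For each item, the indices of its k most similar other items."""
    neighbors = []
    for i, row in enumerate(similarity):
        others = [j for j in range(len(row)) if j != i]
        others.sort(key=lambda j: row[j], reverse=True)
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors: list[set[int]], i: int, j: int) -> int:
    """Number of shared nearest neighbors; zero unless i and j list each other."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Hypothetical pairwise similarities between four titles.
toy_similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
toy_neighbors = k_nearest(toy_similarity, k=2)
```

Here `snn_similarity(toy_neighbors, 0, 1)` is 1, because titles 0 and 1 list each other and share title 2 as a neighbor, while `snn_similarity(toy_neighbors, 0, 3)` is 0, because title 3 is not among the nearest neighbors of title 0. A full clustering would then apply thresholds such as the epsilon and MinT values reported above to decide which items form cluster cores.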
