Perbandingan Boolean Model Dan Vector Space Model Dalam Pencarian Dokumen Teks (Comparison of the Boolean Model and the Vector Space Model in Text Document Search)

2020 ◽  
Vol 11 (2) ◽  
pp. 268-277
Author(s):  
Susanti Susanti ◽  
Muhammad Azmi ◽  
Edwar Ali ◽  
Rahmaddeni Rahmaddeni ◽  
Yansyah Saputra Wijaya

Information technology in the current era of globalization has changed every aspect of our lives, and its influence cannot be avoided. Getting the data and information we want is not easy, given how much information is available for various purposes and in various styles of presentation. For searching data on a computer, whether online or offline, many methods have been developed that progressively improve search results, which in turn increases user satisfaction in finding information. The most commonly used search method is the Boolean Model. Another method is the Vector Space Model (VSM), a model used to measure the similarity between a document and a query keyword. The authors therefore aim to compare the two methods in terms of search speed (time) and number of findings. Speed is calculated from the search time of each method. The comparison of search time shows that the Boolean Model is faster, by a margin of 30 to 50 seconds. The comparison of findings shows that the Vector Space Model returns the same findings as the Boolean Model using the OR operator, whereas with the AND operator, and with a combination of AND and OR, the number of findings differs from that of the Vector Space Model.

Keywords: Comparison, Boolean Model, Vector Space Model, Search, Text Documents
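The reported match between VSM findings and Boolean OR findings can be illustrated with a minimal sketch. It assumes raw term-frequency vectors, cosine similarity, and a cutoff of "any score above zero"; the corpus, the function names, and the query are hypothetical and are not taken from the paper. With non-negative weights, a document scores above zero exactly when it shares at least one term with the query, which is the OR condition.

```python
import math
from collections import Counter

# Toy corpus (hypothetical; not the data set used in the paper).
DOCS = {
    "d1": "boolean model search text documents",
    "d2": "vector space model ranks text documents",
    "d3": "search engines rank web pages",
    "d4": "cooking recipes for dinner",
}

def tokenize(text):
    return text.lower().split()

def boolean_search(terms, operator):
    """Return the doc ids containing all terms ("and") or any term ("or")."""
    match = all if operator == "and" else any
    return {
        doc_id
        for doc_id, text in DOCS.items()
        if match(t in set(tokenize(text)) for t in terms)
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def vsm_search(terms):
    """Rank documents by cosine similarity; keep only scores above zero (assumed cutoff)."""
    query = Counter(terms)
    scored = ((d, cosine(query, Counter(tokenize(text)))) for d, text in DOCS.items())
    return sorted((pair for pair in scored if pair[1] > 0), key=lambda pair: -pair[1])

query = ["search", "model"]
or_hits = boolean_search(query, "or")
and_hits = boolean_search(query, "and")
vsm_hits = vsm_search(query)
```

In this toy setting the AND query returns a strict subset of the VSM result, while the OR query returns the same set, only unranked. A weighting scheme that can assign zero weight to a term (for example some IDF variants) could break the equality, so the sketch should be read as one possible explanation rather than the authors' implementation.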

2014 ◽  
Vol 14 (3) ◽  
pp. 25-36
Author(s):  
Bohdan Pavlyshenko

This paper analyzes whether an author's idiolect can be differentiated in the space of semantic fields; it also analyzes the clustering of text documents in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis shows that a vector space model built on semantic fields is efficient in cluster analysis algorithms applied to authors' texts in English fiction. The distribution of authors' texts across the cluster structure reveals areas of the semantic space that represent the idiolects of individual authors. Such areas are described by clusters in which only one author dominates. Clusters in which the texts of several authors dominate can be considered areas of semantic similarity between authors' styles. SVD factorization of the semantic fields matrix makes it possible to significantly reduce the dimension of the semantic space in the cluster analysis of authors' texts. Clustering in the semantic field vector space can be efficient in a comparative analysis of authors' styles and idiolects. The clusters of some authors' idiolects are semantically invariant and do not depend on changes in the basis of the semantic space or in the clustering method.


Author(s):  
Azis Alvriyanto ◽  
Muhammad Taufiq Nuruzzaman ◽  
Maria Ulfah Siregar ◽  
Rahmat Hidayat

One of the main features of a digital library is a search engine that depends on keywords submitted by a user. However, in the traditional algorithm, the computational performance, that is, the searching speed, relies significantly on the number of journal articles stored in the database. Irrelevant search results also lengthen the article search process. To solve this problem, in this paper we propose a vector space model (VSM) algorithm to search for relevant journal articles. The VSM algorithm uses term frequency-inverse document frequency (TF-IDF) weighting. The VSM algorithm is compared with a baseline, namely the traditional algorithm. Both algorithms are evaluated using combinations of keywords that can be synonyms, phrases, typographical errors, or suffixes and prefixes. Using a data set of 635 journal articles, the two algorithms are compared on 11 evaluation criteria. The results show that the VSM algorithm places the intended journal article at rank 5 on average, whereas the traditional algorithm places it at rank 171 on average. Our proposed algorithm therefore sorts journal articles by the submitted keywords more accurately than the traditional algorithm.
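The evaluation idea, the rank at which the intended article appears, can be sketched as follows. This is a minimal illustration that assumes `idf = log(N / df)` and cosine similarity; the article titles, the function names, and the query are hypothetical, and the paper's own data set and exact weighting are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical article titles; not the 635-article data set from the paper.
ARTICLES = [
    "deep learning for image classification",
    "vector space model for journal article search",
    "a survey of database indexing",
    "keyword search in digital library systems",
]

def tfidf_vectors(texts):
    """Return one TF-IDF weight dict per text, plus the idf table (idf = log(N / df))."""
    tokenized = [t.lower().split() for t in texts]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_of(query, intended_index):
    """1-based position of the intended article when articles are sorted by score."""
    vectors, idf = tfidf_vectors(ARTICLES)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query.lower().split()).items()}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(ARTICLES)), key=lambda i: -scores[i])
    return order.index(intended_index) + 1

rank = rank_of("journal article search", intended_index=1)
```

Averaging `rank_of` over many query and intended-article pairs would give a figure comparable in spirit to the average ranks quoted above, under the stated assumptions.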


Author(s):  
Đorđe Petrović ◽  
Milena Stanković

Text mining depends to a great extent on various text preprocessing techniques. The preprocessing methods and tools used to prepare texts for further mining can be divided into those that are language-dependent and those that are not. The subject of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on their influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed their influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that applying various text preprocessing methods for the Serbian language to reduce the vector space model for the multidimensional representation of text documents achieves the required results. However, applying text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.
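Both effects described above, a smaller vector space and a shifted similarity score, can be seen in a small sketch. It uses English stand-ins because no Serbian resources are given in the abstract; the stopword list, the suffix list, and the `crude_stem` helper are hypothetical and much simpler than a real language-specific stemmer.

```python
import math
from collections import Counter

# Hypothetical English stand-ins; the study itself works with Serbian text.
STOPWORDS = {"the", "a", "of", "is", "are", "in"}
SUFFIXES = ("ing", "ed", "s")

def crude_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, use_preprocessing):
    tokens = text.lower().split()
    if not use_preprocessing:
        return tokens
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

DOCS = [
    "the miners are mining text documents",
    "a miner mined the document in the archive",
]

def summarize(use_preprocessing):
    """Return (number of vector space dimensions, cosine similarity of the two docs)."""
    bags = [Counter(preprocess(d, use_preprocessing)) for d in DOCS]
    vocabulary = set().union(*bags)
    return len(vocabulary), cosine(bags[0], bags[1])

raw_dims, raw_sim = summarize(False)
clean_dims, clean_sim = summarize(True)
```

On these two sentences the vocabulary shrinks from 12 terms to 5, and the cosine similarity rises from about 0.26 to 0.75, which shows how strongly the choice of preprocessing can move a similarity score.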


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information Retrieval is the search for information based on a user's query, with the aim of finding the documents that meet the user's need. This research uses the Vector Space Model to determine the similarity percentage of each student's assignment. The application is implemented in PHP with a MySQL database. The findings are presented as a ranking of document similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application's results agree with the examination done by experts; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The more assignments are compared, the more time the similarity computation needs, so the processing time depends on the number of assignments submitted.
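Mean average precision, the metric quoted above, can be computed as in the following sketch. The rankings and relevance judgments are hypothetical examples, not the paper's 5 queries and 25 documents, and the sketch uses the common definition that divides by the number of relevant documents per query.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    hits = 0
    precisions = []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """Average the per-query average precision over all queries."""
    return sum(average_precision(ranked, relevant) for ranked, relevant in runs) / len(runs)

# Hypothetical (ranking, expert-judged relevant set) pairs, one per query.
RUNS = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"y"}),            # AP = (1/2) / 1
]

map_score = mean_average_precision(RUNS)
```

A score near 1 means relevant documents are consistently ranked near the top for every query.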


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and supports readers academically in finding information. The large number of books a library holds usually makes it difficult for readers to find the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature to help readers find books relevant to a specific book they have chosen. The application was developed using the Vector Space Model to represent documents as vectors. Recommendations in this application are based on the similarity of the books' descriptions. In the testing phase, using a one-language sample of relevant books, the F-Measure obtained is 55% with a cosine similarity threshold of 0.1. The books' descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
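The link between the similarity threshold and the F-Measure can be sketched as below. The similarity scores, the book identifiers, and the relevant set are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption, since the abstract does not say how ties at 0.1 are handled.

```python
def recommend(similarities, threshold=0.1):
    """Recommend every book whose cosine similarity reaches the threshold (assumed inclusive)."""
    return {book for book, score in similarities.items() if score >= threshold}

def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall for one target book."""
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Hypothetical cosine similarities between one chosen book and four candidates.
SIMILARITIES = {"book_a": 0.42, "book_b": 0.15, "book_c": 0.08, "book_d": 0.11}
# Hypothetical set of books judged relevant to the chosen book.
RELEVANT = {"book_a", "book_b", "book_c"}

recommended = recommend(SIMILARITIES)
score = f_measure(recommended, RELEVANT)
```

Raising the threshold generally trades recall for precision, so the reported 55% applies only to the 0.1 setting.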

