A Search Of File Journal With Query Word on List of Journal Document List Using VectorSpace Model Method

2020 ◽  
Vol 9 (1) ◽  
pp. 97
Author(s):  
Maula Khatami

Journals publish research articles that are valuable to academics and students alike. Learning a new subject calls for guidance that is verified and credible, and journals provide it: they give students and academics references from previous research and further insight, so that they can carry out related research or improve on earlier work. However, many students and academics still find it difficult to locate the right journal for their needs. The authors therefore build an information retrieval system that searches journals by query words using the Vector Space Model method. In the suffix tree clustering method and the Vector Space Model, each document and keyword that has passed through the text mining process is assigned a weight for every word it contains, using the Term Frequency-Inverse Document Frequency (TF-IDF) weighting algorithm.
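
The weighting and search described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' system: the whitespace tokenizer, the logarithmic IDF variant, the function names, and the three sample titles are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the text-mining step)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Return (score, document index) pairs, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Hypothetical journal titles, used only to exercise the functions.
journals = [
    "vector space model for journal search",
    "deep learning for image classification",
    "journal recommendation with clustering",
]
ranking = search("journal search", journals)
```

Here `ranking` lists the first title ahead of the third, and the second title scores zero because it shares no term with the query.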

2020 ◽  
Vol 2 (2) ◽  
pp. 70
Author(s):  
Hidayatul Ma'rifah ◽  
Aji Prasetya Wibawa ◽  
Muhammad Iqbal Akbar

This study aims to find the most effective combination and order of text-mining preprocessing steps for classifying the field of Indonesian-language journals based on their titles and abstracts. The preprocessing stages applied are case folding, stemming, stopword removal, transformation to the VSM (Vector Space Model), and SMOTE. However, the observations in each scenario focus on stemming and two stopword removal techniques, namely dictionary-based stopword removal and document-frequency-based stopword removal applied after the transformation into VSM form with TF-IDF (Term Frequency-Inverse Document Frequency) weighting. Classification adopts the k-NN (K-Nearest Neighbour) algorithm, which determines the class of a test item by looking at its nearest neighbours. In this study, the metric for finding the nearest neighbours is cosine similarity. Classification is tested using 10-fold cross-validation to produce a confusion matrix as the final result. The best classification performance reached an accuracy of 72.91% and a precision of 73.36%.
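
The k-NN step with cosine similarity can be illustrated with a short sketch. It assumes the documents have already been turned into weighted term vectors; the toy vectors, the labels, and `k=3` are hypothetical, and the preprocessing, SMOTE, and 10-fold cross-validation stages are not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, training, k=3):
    """Majority vote among the k training items most similar to query_vec.

    training is a list of (vector, label) pairs.
    """
    neighbours = sorted(
        training, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical TF-IDF-like vectors with made-up field labels.
training = [
    ({"network": 1.0, "protocol": 0.8}, "computing"),
    ({"network": 0.6, "server": 0.9}, "computing"),
    ({"crop": 1.0, "soil": 0.7}, "agriculture"),
    ({"soil": 0.9, "harvest": 0.5}, "agriculture"),
]
predicted = knn_predict({"network": 0.7, "server": 0.4}, training, k=3)
```

Because cosine similarity grows as documents become more alike, the neighbours are sorted in descending order rather than by ascending distance.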


2016 ◽  
Vol 8 (1) ◽  
Author(s):  
Karter D. Putung ◽  
Arie S.M. Lumenta ◽  
Agustinus Jacobus

Abstract - An information retrieval system is a system used to find information relevant to its users' needs; applying such a system to the problem of searching thesis documents can yield results that match what the user needs. An information retrieval system has two main processes, indexing and retrieval. Indexing assigns weights to the words in a document; this study uses the TF-IDF weighting method. Retrieval computes the similarity of the query to each document; similarity is calculated under the vector space model by finding the cosine similarity value. The aim of this study is to develop and implement automatic indexing in order to build a document search system within a document storage system based on information retrieval concepts. Keywords: Information retrieval, Term Frequency Inverse Document Frequency, Vector Space Model.
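
For reference, the two quantities named in this abstract are commonly written as below. The abstract does not say which TF-IDF variant was used, so the logarithmic form is an assumption; $N$ is the number of documents, $\mathrm{tf}_{t,d}$ the count of term $t$ in document $d$, and $\mathrm{df}_t$ the number of documents containing $t$.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t},
\qquad
\cos(q, d) = \frac{\sum_{t} w_{t,q}\, w_{t,d}}
                  {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}
```

Indexing computes $w_{t,d}$ once per document, while retrieval computes $\cos(q, d)$ for each incoming query $q$.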


Author(s):  
Septya Egho Pratama ◽  
Wahyudin Darmalaksana ◽  
Dian Sa'adillah Maylawati ◽  
Hamdan Sugilar ◽  
Teddy Mantoro ◽  
...  

Hadith is the second source of Islamic law after the Qur’an, so its many types and references need to be studied. However, not many Muslims know about them, and many have difficulty studying hadith. This study aims to build a hadith search engine from reliable sources using Information Retrieval techniques. The structured text representation used is Bag of Words (1-term), with the Weighted Inverse Document Frequency (WIDF) method used to calculate the frequency of occurrence of each term before conversion to vector form with the Vector Space Model (VSM). In experiments on 380 hadith texts, WIDF with VSM achieved a recall of 96%, while precision was only around 35.46%. This is because the bag-of-words (1-gram) representation cannot preserve the meaning of the text well.
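
The abstract does not give the WIDF formula. A common definition divides a term's frequency in one document by its total frequency across the collection, and the sketch below assumes that definition; the function name and the three sample texts are made up for illustration.

```python
from collections import Counter


def widf_vectors(docs):
    """Return one {term: WIDF weight} dict per document.

    Assumed definition: WIDF(d, t) = TF(d, t) / sum of TF(i, t) over all documents i.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Total occurrences of each term across the whole collection.
    collection_tf = Counter()
    for c in counts:
        collection_tf.update(c)
    return [
        {term: tf / collection_tf[term] for term, tf in c.items()}
        for c in counts
    ]


# Hypothetical one-line texts, used only to show the arithmetic.
texts = ["prayer prayer fasting", "prayer charity", "charity charity charity"]
weights = widf_vectors(texts)
```

Under this definition a term that occurs in only one text, such as "fasting", gets weight 1.0 there, while "prayer" in the first text gets 2/3 because one of its three occurrences is in another text.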


Author(s):  
Didit Suhartono ◽  
Khodirun Khodirun

Archives are one example of important documents. They are stored systematically in order to support and simplify their storage and retrieval. Information retrieval is the process of retrieving relevant documents while not retrieving irrelevant ones, and doing so requires a method. The Term Frequency-Inverse Document Frequency and Vector Space Model methods can find relevant documents according to their degree of closeness or similarity. In addition, applying the Nazief-Adriani stemming algorithm can improve retrieval performance by reducing the words in a document or text to their base forms. The system then indexes the documents to simplify and speed up the search process. Relevance is determined by calculating the similarity between the query and the existing documents, each represented in a particular form. The system then sorts the retrieved documents by their level of relevance to the query.


Author(s):  
Anthony Anggrawan ◽  
Azhari

Searching for information based on a user's query, with the aim of finding documents that match the user's needs, is known as Information Retrieval. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented by ranking documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with examination by experts; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The more assignments are submitted for comparison, the longer the similarity computation takes.
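
Mean average precision, the measure reported in this abstract, can be computed as in the sketch below. The document identifiers and relevance judgements are invented for the example and are not the study's data.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with expert relevance judgements.
runs = [
    (["d1", "d2", "d3", "d4"], ["d1", "d3"]),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], ["d6"]),              # AP = 1/2
]
map_score = mean_average_precision(runs)
```

A score close to 1 means relevant documents tend to sit at the top of every ranking, which is how a value such as 0.874 would be read.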


2018 ◽  
Vol 5 (2) ◽  
pp. 239
Author(s):  
I Kadek Yuda Setiadi ◽  
Made Sudarma ◽  
Duman Care Khrisne

This study aims to assist the search for lontar images with an information retrieval system built using the Vector Space Model method. Testing of the search system yielded a recall of 75.4% and a precision of 100%, based on receiver operating characteristic (ROC) analysis. Testing with the System Usability Scale (SUS) at the Bali Provincial Culture Office gave the highest scores on statements 1, 5, and 7, which reached 42.
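
A short sketch shows how precision and recall are computed and how a precision of 100% can coexist with a lower recall. The image identifiers are hypothetical and unrelated to the study's collection.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query from sets of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Every returned image is relevant, but one relevant image is missed.
precision, recall = precision_recall(
    retrieved=["img1", "img2", "img3"],
    relevant=["img1", "img2", "img3", "img4"],
)
```

In this example precision is 1.0 and recall is 0.75: nothing irrelevant was returned, yet a quarter of the relevant items were not found.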


Author(s):  
Azis Alvriyanto ◽  
Muhammad Taufiq Nuruzzaman ◽  
Maria Ulfah Siregar ◽  
Rahmat Hidayat

One of the main features of a digital library is a search engine that depends on keywords submitted by a user. However, in the traditional algorithm, computational performance, that is, search speed, relies significantly on the number of journal articles stored in the database. Irrelevant search results also slow down the article search process. To solve this problem, we propose a vector space model (VSM) algorithm to search for relevant journal articles. The VSM algorithm uses term frequency-inverse document frequency (TF-IDF) weighting. It is compared with a baseline, the traditional algorithm. Both algorithms are evaluated using combinations of keywords that may be synonyms, phrases, typographical errors, or suffixes and prefixes. Using data consisting of 635 journal articles, the two algorithms are compared on 11 evaluation criteria. The results show that the VSM algorithm returns the intended journal at rank 5 on average, whereas the traditional algorithm returns it at rank 171 on average. The proposed algorithm therefore sorts journal articles by the submitted keywords more accurately than the traditional algorithm.
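
The "rank of the intended journal on average" measure can be sketched as below. The function names and the sample result lists are hypothetical, and skipping queries whose target is never returned is an assumption; the abstract does not say how such cases were handled.

```python
def rank_of(target, ranked_ids):
    """1-based position of target in a ranked result list, or None if absent."""
    try:
        return ranked_ids.index(target) + 1
    except ValueError:
        return None


def mean_rank(results):
    """Average rank of the intended item over queries that returned it.

    results is a list of (target, ranked_ids) pairs, one per query.
    """
    ranks = [r for r in (rank_of(t, ids) for t, ids in results) if r is not None]
    return sum(ranks) / len(ranks) if ranks else None


# Three hypothetical queries, each with one intended article.
results = [
    ("a", ["b", "a", "c"]),
    ("x", ["x", "y"]),
    ("q", ["m", "n", "q"]),
]
average_rank = mean_rank(results)
```

A lower average rank means the intended article appears nearer the top of the results.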


2017 ◽  
Vol 8 (2) ◽  
pp. 92-101
Author(s):  
Putra Angga ◽  
Lastri Widya Astuti ◽  
Mustafa Ramadhan

Students need to find the learning materials they require quickly and accurately among a large amount of material; one way to do so is by ranking. Ranking is a branch of information retrieval. The document search here uses the Vector Space Model (VSM), which draws on a concept from linear algebra, the vector space. Based on this concept, a blended learning application is developed with the vector space model method as an alternative for students searching for material relevant to their needs, reducing the error level in the returned information and helping students reach their goals quickly. A column vector representation is used in the conversion of document input, processing, and output. A further concept, used to determine the proximity between two vectors, is to calculate the angle formed between them; the data are then ordered from the smallest angle to the largest, which gives the ranking from the most relevant to the least relevant. This study describes how a quality value is produced for each document to determine how relevant the document is to the query. The quality method used in the implementation can be a combination of TF (Term Frequency), IDF (Inverse Document Frequency), and the corresponding normalized input from the user. Index Terms: Content Search, Blended Learning, Vector Space Model.
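
Ranking by the angle between vectors, smallest angle first, can be sketched as follows. The dense three-component vectors are toy values, and treating an all-zero vector as orthogonal to everything is an assumption made so the function always returns a number.

```python
import math


def angle_between(u, v):
    """Angle in radians between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi / 2  # assumption: an empty vector counts as orthogonal
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def rank_by_angle(query, docs):
    """Return document indices ordered from smallest to largest angle."""
    return sorted(range(len(docs)), key=lambda i: angle_between(query, docs[i]))


# Hypothetical term-weight vectors for one query and three learning materials.
query = [1.0, 1.0, 0.0]
materials = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
order = rank_by_angle(query, materials)
```

Sorting by ascending angle gives the same order as sorting by descending cosine similarity, since the cosine decreases as the angle grows from 0 to 180 degrees.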


Respati ◽  
2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Ferdy Febriyanto

The development of e-learning systems continues to grow every year because e-learning makes learning much more convenient. Some educational institutions, especially public and private universities, have begun to develop e-learning systems for their teaching. In e-learning, examinations can be carried out from answering the questions through to assessment; so far, however, most essay examinations and their assessment have been carried out manually, by reading the essays one by one. Lecturers need to spend a lot of time assessing students' exam answers, and the more exams are corrected, the lower the quality of the assessment becomes. This problem can be addressed by building an application that processes text similarity. In this thesis, the author therefore uses the TF/IDF (Term Frequency - Inverse Document Frequency) algorithm and the VSM (Vector Space Model), which together can find the similarity value between an answer text and the text of the answer key. This similarity value can serve as a reference for the corrected score of a student's exam answer.
The study uses data from the final semester examination at STMIK Indonesia Banjarmasin covering 10 courses: Graphic Design, Computer Networking, Introduction to Information Technology, Interpersonal Skills, Operating Systems, Introduction to Management, Professional Ethics, Database Systems, Microprocessors, and Web Programming. For each course, 30 questions were entered, each with 3 different correct answers as the basis for comparing similarity. During processing, the system removes words considered unimportant or too commonly used, including characters and symbols, because it only processes questions that require theoretical and argumentative answers, not mathematical ones. In this thesis, words in the local Banjar language are also removed by the system so that the use of Indonesian is consistent. From the words that remain after removal, word weights are calculated with the TF/IDF algorithm and the cosine value is calculated with the VSM, giving the degree of similarity between the students' answers and the lecturers' answers. The resulting correlation is fairly good, with an average accuracy of 80% to 90% compared with manual assessment by humans. Keywords: Automatic Exam Assessment, TF/IDF, VSM, Similarity.
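
The scoring idea, comparing a student's answer with several answer keys after removing unimportant words, can be sketched as below. Several simplifications are assumptions: raw term counts stand in for TF/IDF weights, the stopword list is a tiny English placeholder, and taking the best score over the keys is one plausible way to use three correct answers.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "is", "of", "and", "to"}  # illustrative list only


def term_vector(text):
    """Term counts after lowercasing, stripping symbols, and dropping stopwords."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return Counter(w for w in cleaned.split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_answer(student_answer, key_answers):
    """Best cosine similarity between the student answer and any answer key."""
    student = term_vector(student_answer)
    return max(cosine(student, term_vector(key)) for key in key_answers)


# Hypothetical answer keys for one exam question.
keys = [
    "An operating system manages hardware and software resources.",
    "The OS is software that controls the computer hardware.",
    "System software that schedules processes and manages memory.",
]
score = score_answer("An operating system manages hardware and software resources", keys)
```

The resulting similarity lies between 0 and 1 and could then be scaled to whatever grading range the examiner uses.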


2018 ◽  
Vol 7 (3.20) ◽  
pp. 385
Author(s):  
Tjut Awaliyah Zuraiyah ◽  
Fajar D Elli Wihartiko ◽  
Edwin Effendi

A job vacancy aggregator is a system that helps users find the job vacancies they want, especially in the field of information technology. Job vacancy data are collected from various job sites, such as http://id.jobsdb.com, http://www.jobs.id, http://www.monster.co.id and http://www.jobstreet.co.id, using web scraping techniques to extract the vacancy data stored in the HTML structure. The collected data are then processed for retrieval with the vector space model method, under which the matching data are sorted by the level of similarity between the query typed by the user and the job vacancy data stored in the database. In addition, the system can send job vacancies by email to registered users. Once developed, the online job vacancy aggregator can serve as a medium for job vacancy information, especially in the field of information technology (IT).
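
Extracting vacancy data from an HTML structure can be illustrated with Python's standard `html.parser` module. The markup below, including the `job-title` class name, is hypothetical and does not reflect the real structure of any of the listed sites; the sketch parses an inline string and makes no network requests.

```python
from html.parser import HTMLParser


class JobTitleParser(HTMLParser):
    """Collect the text of every <h2 class="job-title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "job-title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self._in_title:
            self.titles[-1] += data


# Made-up page fragment standing in for a scraped listing page.
page = """
<div><h2 class="job-title">Backend Developer</h2><p>Jakarta</p></div>
<div><h2 class="job-title">Network Engineer</h2><p>Bogor</p></div>
"""
parser = JobTitleParser()
parser.feed(page)
titles = [t.strip() for t in parser.titles]
```

The extracted titles could then be stored and weighted as documents for the vector space model ranking described in the abstract.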

