System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods

Archives are an example of important documents. They are stored systematically to make storage and retrieval easier. Information retrieval is the process of retrieving documents that are relevant to a query while leaving out those that are not, and doing so requires a suitable method. The Term Frequency-Inverse Document Frequency (TF-IDF) and Vector Space Model (VSM) methods can find relevant documents according to their degree of closeness, or similarity, to the query. In addition, applying the Nazief-Adriani stemming algorithm, which reduces the words in a document or text to their base form, can improve retrieval performance. The system then indexes the documents to simplify and speed up searching. Relevance is determined by calculating similarity values between the stored documents and the query, each represented in a particular form. The system then sorts the retrieved documents by their relevance to the query.
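
The pipeline this abstract describes (weight terms, compare each document with the query, sort by relevance) could be sketched as follows. This is a minimal illustration and not the authors' implementation: the whitespace tokenizer, the unsmoothed logarithmic IDF, the function names, and the three sample archive titles are all assumptions, and the Nazief-Adriani stemming step is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real system would also stem here.
    return text.lower().split()


def build_index(docs):
    """Return per-document TF-IDF vectors and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse term->weight dictionaries.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    vectors, idf = build_index(docs)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical archive titles, used only to exercise the functions.
docs = [
    "surat keputusan rektor",
    "laporan keuangan tahunan",
    "surat undangan rapat",
]
ranking = rank("surat rapat", docs)
```

Here the third title shares both query terms and ranks first, while the second shares none and scores zero.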

PENERAPAN SISTEM TEMU KEMBALI INFORMASI PADA KUMPULAN DOKUMEN SKRIPSI [Application of an Information Retrieval System to a Collection of Undergraduate Thesis Documents]

Jurnal Teknik Informatika ◽

10.35793/jti.8.1.2016.12227 ◽

2016 ◽

Vol 8 (1) ◽

Cited By ~ 1

Author(s):

Karter D. Putung ◽

Arie S.M. Lumenta ◽

Agustinus Jacobus

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Retrieval System ◽

Vector Space Model ◽

Information Retrieval System ◽

Frequency Vector ◽

Inverse Document Frequency ◽

Space Model ◽

Document Frequency

Abstract - An information retrieval system is a system used to find information relevant to its users' needs. Applying such a system to the problem of searching thesis documents can yield results that are relevant to what the user needs. An information retrieval system has two main processes: indexing and retrieval. Indexing assigns weights to the words in each document; this study uses TF-IDF weighting. Retrieval computes the similarity of a query to the documents; similarity is computed under the vector space model by finding the cosine similarity. The aim of this study is to develop and implement automatic indexing in order to build a document search system within a document storage system based on information retrieval concepts. Keywords: Information Retrieval, Term Frequency-Inverse Document Frequency, Vector Space Model.
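
For reference, the two quantities named in this abstract are commonly written as below. This is the textbook form, given here as an assumption; the abstract does not say which TF-IDF variant the paper uses. Here N is the number of documents, df_t the number of documents containing term t, and tf_{t,d} the count of t in document d.

```latex
\[
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t},
\qquad
\cos(q, d) = \frac{\sum_{t} w_{t,q}\, w_{t,d}}
                  {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}
\]
```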

Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Journal of Information & Knowledge Management ◽

10.1142/s0219649206001359 ◽

2006 ◽

Vol 05 (02) ◽

pp. 97-105 ◽

Cited By ~ 3

Author(s):

S. Srinivas ◽

Ch. AswaniKumar

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Retrieval Performance ◽

Term Weighting ◽

Space Model ◽

Rank Approximation

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has been shown to be 30% more effective. Many studies have reported that good retrieval performance depends on the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results demonstrate that LSI performance improves significantly when optimised term weighting is combined with optimised rank approximation.
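
As background for the rank approximation heuristic, LSI is usually formulated through a truncated singular value decomposition of the weighted term-document matrix A. The notation below is the standard one and is an assumption here, since the abstract gives no formulas: k is the chosen rank, and a query vector q is folded into the reduced space before documents are compared with it.

```latex
\[
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q
\]
```

Term weighting determines the entries of A, and rank approximation determines k, which is why the two heuristics can be tuned together.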

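Sistem Penilaian Otomatis Jawaban Esai Dengan Menggunakan Metode Vector Space Model Pada Beberapa Perkuliahan Di Stmik Indonesia Banjarmasin [Automatic Essay Answer Scoring System Using the Vector Space Model Method in Several Courses at STMIK Indonesia Banjarmasin]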

Respati ◽

10.35842/jtir.v14i1.272 ◽

2019 ◽

Vol 14 (1) ◽

Author(s):

Ferdy Febriyanto

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Learning System ◽

Assessment Process ◽

Text Similarity ◽

Term Frequency ◽

Space Model ◽

Test Execution ◽

Document Frequency ◽

E Learning

ABSTRACT: The use of e-learning systems grows every year because they make learning much more convenient. Educational institutions, especially public and private universities, have begun to develop e-learning systems for their teaching. In an e-learning setting, examinations can be carried out end to end, from answering the questions to assessment. So far, however, most essay exams are assessed manually, by reading the essays one by one. Lecturers need to spend a lot of time assessing students' exam answers, and the more exams they correct, the lower the quality of the assessment becomes. This problem can be addressed by building an application that processes text similarity. In this thesis, the author therefore uses the TF/IDF (Term Frequency - Inversed Document Frequency) algorithm and the VSM (Vector Space Model), which together can compute the similarity between an answer text and the answer-key text. That similarity value can serve as a reference for grading students' exam answers. The study uses data from the final semester examinations at STMIK Indonesia Banjarmasin for 10 courses: Graphic Design, Computer Networks, Introduction to Information Technology, Interpersonal Skills, Operating Systems, Introduction to Management, Professional Ethics, Database Systems, Microprocessors, and Web Programming. For each course, 30 questions were entered, each with three different correct answers against which similarity is compared. During processing, the system removes words considered unimportant or too common, as well as characters and symbols, because it only handles questions that call for theoretical and argumentative answers rather than mathematical ones. In this thesis, words in the local Banjar language are also removed so that answers are normalized to Indonesian. From the words that remain, term weights are calculated with the TF/IDF algorithm and the cosine value is calculated with VSM, giving the degree of similarity between the student's answer and the lecturer's answer. The resulting correlation is fairly good, with an average accuracy of 80% to 90% compared with manual human assessment. Keywords: Automatic Exam Assessment, TF/IDF, VSM, Similarity.
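
The scoring step described in this abstract could look roughly like the sketch below. Several parts are assumptions rather than details from the thesis: the tiny stopword set, the smoothed IDF (chosen so that words shared by every text keep a non-zero weight), taking the best match among the three key answers, and scaling the similarity to 0-100.

```python
import math
from collections import Counter

# Illustrative stopword list only; the thesis uses its own list,
# which also covers words in the local Banjar language.
STOPWORDS = {"adalah", "yang", "dan", "di", "untuk"}


def preprocess(text):
    # Strip punctuation, lowercase, keep alphabetic tokens, drop stopwords.
    tokens = (tok.strip(".,;:!?()").lower() for tok in text.split())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]


def tfidf_vectors(token_lists):
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    # Smoothed IDF (an assumption): never zero, even for ubiquitous terms.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_answer(student, key_answers):
    """Best similarity (0-100) between the student text and any key answer."""
    token_lists = [preprocess(student)] + [preprocess(k) for k in key_answers]
    vectors = tfidf_vectors(token_lists)
    student_vec, key_vecs = vectors[0], vectors[1:]
    return 100 * max(cosine(student_vec, k) for k in key_vecs)


# Hypothetical key answers for one Operating Systems question.
keys = [
    "Sistem operasi adalah perangkat lunak yang mengelola sumber daya komputer.",
    "Sistem operasi mengatur perangkat keras dan menjalankan program.",
    "Perangkat lunak dasar untuk mengendalikan komputer.",
]
identical = score_answer(keys[0], keys)
partial = score_answer("Sistem operasi mengelola komputer.", keys)
unrelated = score_answer("Jaringan memakai kabel.", keys)
```

An answer identical to a key scores close to 100, one with no shared content words scores 0, and a partial answer falls in between.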

Klasifikasi Artikel Ilmiah Dengan Berbagai Skenario Preprocessing [Classification of Scientific Articles with Various Preprocessing Scenarios]

Sains, Aplikasi, Komputasi dan Teknologi Informasi ◽

10.30872/jsakti.v2i2.2681 ◽

2020 ◽

Vol 2 (2) ◽

pp. 70

Author(s):

Hidayatul Ma'rifah ◽

Aji Prasetya Wibawa ◽

Muhammad Iqbal Akbar

Keyword(s):

Text Mining ◽

Vector Space ◽

Cross Validation ◽

Confusion Matrix ◽

Vector Space Model ◽

Nearest Neighbour ◽

Inverse Document Frequency ◽

Space Model ◽

Document Frequency ◽

Fold Cross Validation

This study aims to find the combination and order of text-mining preprocessing steps that works best for classifying the field of Indonesian-language journal articles from their titles and abstracts. The preprocessing steps applied are case folding, stemming, stopword removal, transformation to the VSM (Vector Space Model), and SMOTE. The scenarios examined, however, focus on stemming and on two stopword-removal techniques: dictionary-based removal, and removal based on document frequency after the text has been transformed into VSM form with TF-IDF (Term Frequency-Inverse Document Frequency) weighting. Classification uses the k-NN (K-Nearest Neighbour) algorithm, which determines the class of a test item by looking at its nearest neighbours. In this study, the metric used to find the nearest neighbours is cosine similarity. Classification is tested with 10-fold cross-validation to produce a confusion matrix as the final result. The best classification performance reached an accuracy of 72.91% and a precision of 73.36%.
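
The classification step in this abstract, k-NN with cosine similarity as the neighbour metric, could be sketched as below. The vectors are assumed to be TF-IDF weights that were already computed; the class labels, the weights, the function names, and the choice of k = 3 are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    # Cosine similarity between two sparse term->weight dictionaries.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query_vec, training, k=3):
    """training is a list of (vector, label) pairs; returns the majority label
    among the k training vectors most similar to query_vec."""
    neighbours = sorted(
        training, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical pre-weighted article vectors and field labels.
training = [
    ({"jaringan": 1.2, "router": 0.8}, "networking"),
    ({"jaringan": 0.9, "protokol": 1.1}, "networking"),
    ({"citra": 1.3, "piksel": 0.7}, "image processing"),
    ({"citra": 1.0, "segmentasi": 1.4}, "image processing"),
]
prediction = knn_predict({"protokol": 1.0, "jaringan": 0.5}, training, k=3)
```

Because cosine similarity is larger for closer vectors, the neighbours are sorted in descending order, unlike a distance-based k-NN.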

PENGEMBANGAN SISTEM PENDETEKSI KEMIRIPAN KARYA PADA INAICTA 2013 [Development of a Work Similarity Detection System for INAICTA 2013]

Jurnal Informatika Polinema ◽

10.33795/jip.v1i4.117 ◽

2017 ◽

Vol 1 (4) ◽

pp. 14

Author(s):

Cadea Mikha Pasma ◽

Ulla Delfana Rosiani ◽

Rudy Ariyanto

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Term Frequency ◽

Space Model ◽

Document Frequency

The Indonesia ICT Award (INAICTA) 2013 is the largest competition for creative and innovative works in information and computer technology (ICT) in Indonesia. It aims to keep encouraging the development of local ICT products through improved quality and product innovation. The number of contestants entering INAICTA grows every year, which makes it harder for the judges to identify similarities among the contestants' innovations. An application is needed to help detect similarity between the works entered by contestants. A similarity detection system for INAICTA 2013 works was therefore developed; it compares the contestants' short descriptions of their works. The system uses the Term Frequency-Inversed Document Frequency (TF-IDF) algorithm to weight each work, computing weights from the terms in each work. To measure the closeness, or similarity, of works, the system uses the Vector Space Model (VSM) algorithm. Under VSM, each work is treated as a vector with a magnitude and a direction. The system thus produces a ranking of INAICTA 2013 works by their degree of similarity.

A Search Of File Journal With Query Word on List of Journal Document List Using VectorSpace Model Method

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i01.p10 ◽

2020 ◽

Vol 9 (1) ◽

pp. 97

Author(s):

Maula Khatami

Keyword(s):

Vector Space ◽

Suffix Tree ◽

Vector Space Model ◽

Inverse Document Frequency ◽

Space Model ◽

Model Method ◽

Document Frequency ◽

New Knowledge ◽

Query Word ◽

The Right

Journals publish research articles that are very useful to academics and students alike. Whenever we learn something new, we need a guide that is verified and credible, and journals have greatly helped students and academics in this respect. Journals give them references to previous research and broader insight, so that they can carry out related research or even improve on earlier work. However, many students and academics still find it difficult to find the right journal for their needs. The authors therefore build an information retrieval system that searches journals by query words using the vector space model method. In the suffix tree clustering method and the Vector Space Model, each document and keyword that has gone through the text mining process is assigned a weight for every word it contains, using the Term Frequency - Inverse Document Frequency (TF-IDF) weighting algorithm.

Weighted inverse document frequency and vector space model for hadith search engine

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i2.pp1004-1014 ◽

2020 ◽

Vol 18 (2) ◽

pp. 1004

Author(s):

Septya Egho Pratama ◽

Wahyudin Darmalaksana ◽

Dian Sa'adillah Maylawati ◽

Hamdan Sugilar ◽

Teddy Mantoro ◽

...

Keyword(s):

Vector Space ◽

Search Engine ◽

Islamic Law ◽

Vector Space Model ◽

Vector Form ◽

Inverse Document Frequency ◽

Space Model ◽

Document Frequency ◽

Reliable Source ◽

Structured Representation

Hadith is the second source of Islamic law after the Qur’an, so many types and references of hadith need to be studied. However, not many Muslims know about them, and many have difficulty studying hadith. This study aims to build a hadith search engine from a reliable source by using Information Retrieval techniques. The structured text representation used is a bag of words (1-gram), with the Weighted Inverse Document Frequency (WIDF) method used to calculate the frequency of occurrence of each term before the texts are converted to vector form with the Vector Space Model (VSM). In experiments on 380 hadith texts, WIDF with VSM reached a recall of 96%, while precision was only about 35.46%. This is because the bag-of-words (1-gram) representation cannot preserve the meaning of the text well.
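
The abstract does not state the WIDF formula. The sketch below uses the definition usually given for WIDF, in which a term's count in a document is divided by its total count across the collection; treating this as the paper's exact weighting is an assumption, as are the function name and the toy texts.

```python
from collections import Counter


def widf_vectors(docs):
    """Return one term->weight dict per document using WIDF weighting:
    weight(d, t) = tf(d, t) / sum of tf(i, t) over all documents i."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    collection_tf = Counter()
    for c in counts:
        collection_tf.update(c)  # accumulate term counts over the collection
    return [{t: tf / collection_tf[t] for t, tf in c.items()} for c in counts]


# Toy texts standing in for hadith passages.
docs = ["niat amal niat", "amal sedekah", "niat puasa"]
vectors = widf_vectors(docs)
```

Under this definition the weights of any one term sum to 1 over the collection, and a term that occurs in only one text gets weight 1 there. The resulting vectors can then be compared with cosine similarity as in the other VSM systems listed here.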

MODEL SISTEM MANAJEMEN PENGETAHUAN PROYEK DI PT. XYZ [A Project Knowledge Management System Model at PT. XYZ]

Komputa : Jurnal Ilmiah Komputer dan Informatika ◽

10.34010/komputa.v7i1.2536 ◽

2018 ◽

Vol 7 (1) ◽

pp. 43-50

Author(s):

Sendy Gilang Farhamsyah ◽

Riani Lubis, S.T., M.T.

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Term Frequency ◽

Space Model ◽

Document Frequency

As a company providing professional consulting services, PT. XYZ offers consulting related to technical planning and management in the scope of regional and national development. At present the company always uses contract workers on its projects. As a result, the project knowledge held by those contract workers is not documented when the project work ends, and the company loses it. Implementing a Knowledge Management System is an appropriate solution for managing the company's project knowledge, so that knowledge gained from earlier projects can be used in future projects by newly appointed contract workers who have no experience yet. The method used to compute keyword similarity during searches in the system is TF-IDF (Term Frequency-Inversed Document Frequency) together with VSM (Vector Space Model). The result of this study is a Knowledge Management System model that is expected to help project workers document the knowledge of each worker and find solutions to problems that arise during project work.

Mapping College Research Roadmap Based on Information Retrieval by Lecturers Scientific Publications Documents

10.31227/osf.io/tmda3 ◽

2019 ◽

Author(s):

Rusda Wajhillah ◽

Agung Wibowo ◽

Saeful Bahri

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Tree Model ◽

Scientific Publications ◽

Space Model ◽

Document Frequency ◽

Scientific Papers

The quality of research needs to be directed and classified so that it can be improved. A college's research roadmap must match the interests and expertise of its lecturers. It is therefore the duty of every college to create a strategic plan and flagship research. Faculties in almost all colleges have produced many scientific publications. A scientific publication is an example of an unstructured document: its content follows a writing style that is mostly determined by the author's language, and generally only the maximum number of words is specified for the document title. The main objective of an information retrieval system is to determine, within a group of documents, the document keywords that match the query provided by the user. The TF/IDF (Term Frequency - Inversed Document Frequency) algorithm and the Vector Space Model algorithm are among the methods that can be used in the analysis phase of text mining as an option for classifying documents based on the words that appear often in document titles. This paper can help decision makers determine, assess, and adapt a research roadmap for their college. Depicting the long-term roadmap as a tree model makes it easier to read and understand.

Aplikasi Deteksi Kemiripan Tugas Paper [Paper Assignment Similarity Detection Application]

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information retrieval is the search for information based on a user's query, with the aim of finding the documents that meet the user's needs. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented as a ranking of documents by similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the assessment made by experts; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The time needed to compute similarity increases with the number of assignments submitted.
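
Mean average precision, the evaluation measure reported in this abstract, could be computed as in the sketch below. It uses the standard definition; the function names, document identifiers, and relevance judgements are made up for illustration and are not the paper's data.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with expert relevance judgements.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

A score near 1 means relevant documents are consistently ranked ahead of irrelevant ones for every query.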
