Klasifikasi Newsgroup Menggunakan Vector Space Model dan Novel K Nearest Neighbors

Mira Suryani; Ayi Muhammad Iqbal Nasuha; Intan Nurma Yulita; Erick Paulus

doi:10.24198/jin.v1i1.10994

Jurnal Informatika ◽

10.24198/jin.v1i1.10994 ◽

2017 ◽

Vol 1 (1) ◽

pp. 46

Author(s):

Mira Suryani ◽

Ayi Muhammad Iqbal Nasuha ◽

Intan Nurma Yulita ◽

Erick Paulus

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Term Frequency ◽

Space Model

Text categorization remains an active research topic in information retrieval. Text classification helps people quickly find the set of information relevant to their needs. This study describes the process of categorizing newsgroup posts. Newsgroup data was chosen as the research dataset because newsgroups are a long-established and widely used application for online discussion, so the data is large in volume and needs to be managed. The vector space model serves as the feature representation of each document, produced after indexing and weighting by term frequency. The feature representations are then classified into 3 categories. The study obtained an average precision of 71%, with 89 documents classified correctly. This result was obtained at the optimal number of neighbors, k = 30.
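The pipeline described above (term-frequency vectors followed by nearest-neighbor classification) can be illustrated with a minimal sketch. It assumes plain K Nearest Neighbors with cosine similarity and majority voting, not the paper's novel variant, whose details the abstract does not give. The function names and the toy training data are hypothetical.

```python
from collections import Counter
import math

def tf_vector(text):
    """Raw term-frequency vector as a term -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(query, training, k):
    """Majority vote among the k training documents most similar to the query.

    training is a list of (text, label) pairs.
    """
    q = tf_vector(query)
    ranked = sorted(training,
                    key=lambda item: cosine(q, tf_vector(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data with 3 categories (not the paper's dataset).
training = [
    ("graphics card driver image render", "comp"),
    ("image render pixel graphics", "comp"),
    ("engine car brake wheel", "autos"),
    ("car wheel tire engine oil", "autos"),
    ("orbit rocket launch moon", "space"),
    ("moon orbit satellite launch", "space"),
]
print(knn_classify("rocket launch to the moon", training, k=3))  # space
```

The value of k is a tuning parameter; the abstract reports k = 30 as optimal on its own dataset, which is far larger than this toy example.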

KLASIFIKASI MULTILABEL PADA ABSTRAK TUGAS AKHIR MENGGUNAKAN VECTOR SPACE MODEL DAN K-NEAREST NEIGHBORS

SINTECH (Science and Information Technology) Journal ◽

10.31598/sintechjournal.v2i2.292 ◽

2019 ◽

Vol 2 (2) ◽

pp. 91-97

Author(s):

I Putu Yoga Indrawan ◽

I Gede Indrawan ◽

I Made Candiasa

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Nearest Neighbors ◽

Great Effort ◽

Exact Match ◽

K Nearest Neighbors ◽

Multilabel Classification ◽

Space Model ◽

Final Project

The final project is one of the graduation requirements for students. Students who want to start a final project need to see the results of earlier final projects on the same topic. With a large number of final project documents, finding the documents on the same topic takes great effort. The grouping of final projects can be automated using a document classification method. The methods that can be used are K-Nearest Neighbors as the classifier and the Vector Space Model to measure the distance between documents. From initial observation, multilabel classification of final project abstracts using the Vector Space Model and K-Nearest Neighbors has not been evaluated, because previous studies tested single-label classification and focused on only one method. Classification of final project abstracts consists of 2 stages: building a distance table using the vector space model, and multilabel classification using KNN. This method cannot yet predict the labels exactly, because its optimum exact match ratio is only 0.57, at m = 4 and k = 8. The method is nevertheless good enough at predicting labels approximately, as shown by its optimum accuracy of 0.74 at m = 4 and k = 9. The exact match ratio and accuracy of this method reach their optimum at m = k / 3.
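The two evaluation measures named above can be sketched as follows. The abstract does not define them, so this sketch assumes the common multilabel definitions: exact match ratio is the fraction of documents whose predicted label set equals the true label set, and accuracy is the mean ratio of shared labels to the union of labels per document. The function names and example labels are hypothetical.

```python
def exact_match_ratio(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    matches = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return matches / len(true_sets)

def multilabel_accuracy(true_sets, pred_sets):
    """Mean of |true & pred| / |true | pred| over all samples (assumed definition)."""
    total = 0.0
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        union = t | p
        # Two empty label sets count as a perfect prediction.
        total += len(t & p) / len(union) if union else 1.0
    return total / len(true_sets)

# Hypothetical labels for four abstracts.
true_labels = [{"IR", "ML"}, {"DB"}, {"ML", "NLP"}, {"IR"}]
pred_labels = [{"IR", "ML"}, {"DB", "IR"}, {"ML"}, {"IR"}]

print(exact_match_ratio(true_labels, pred_labels))    # 0.5
print(multilabel_accuracy(true_labels, pred_labels))  # 0.75
```

Under these definitions accuracy is never lower than the exact match ratio, which is consistent with the reported values of 0.74 and 0.57.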

Sistem Penilaian Otomatis Jawaban Esai Dengan Menggunakan Metode Vector Space Model Pada Beberapa Perkuliahan Di Stmik Indonesia Banjarmasin

Respati ◽

10.35842/jtir.v14i1.272 ◽

2019 ◽

Vol 14 (1) ◽

Author(s):

Ferdy Febriyanto

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Learning System ◽

Assessment Process ◽

Text Similarity ◽

Term Frequency ◽

Space Model ◽

Test Execution ◽

Document Frequency ◽

E Learning

ABSTRACT

The development of e-learning systems increases every year because e-learning makes learning much more convenient. Some educational institutions, especially public and private universities, have started to develop e-learning systems for their teaching. In the e-learning concept, examinations can be carried out end to end, from answering the questions to the assessment. So far, however, most essay exams and their assessment are carried out manually, by reading the essays one by one. Lecturers need to spend a lot of time assessing students' exam answers, and the more exams are corrected, the lower the quality of the assessment becomes.

This problem can be addressed by building an application that processes text similarity. In this thesis research, the author therefore uses the TF/IDF (Term Frequency - Inverse Document Frequency) algorithm and the VSM (Vector Space Model), which together compute the similarity between an answer text and the answer key text. This similarity value can serve as the reference for grading student exam answers.

The study uses data from the final semester examination at STMIK Indonesia Banjarmasin for 10 courses: Graphic Design, Computer Networking, Introduction to Information Technology, Interpersonal Skills, Operating Systems, Introduction to Management, Professional Ethics, Database Systems, Microprocessors, and Web Programming. For each course, 30 questions were entered, each with 3 different correct answers as the references for comparison. During processing, the system removes words considered unimportant or too common, along with characters and symbols, because the system only processes questions that require theoretical and argumentative answers, not mathematical ones. In this study, words in the local Banjar language are also removed by the system to standardize on Indonesian. From the words that remain after removal, word weights are computed with TF/IDF, and the cosine value is computed with the VSM, yielding the degree of similarity between the student's answer and the lecturer's answer. The resulting correlation is fairly good, with an average accuracy of 80% to 90% compared with manual human grading.

Keywords: Automatic Exam Assessment, TF/IDF, VSM, Similarity.
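The scoring steps described in this abstract (word removal, TF/IDF weighting, cosine comparison against several reference answers) can be sketched as below. Several choices are assumptions for illustration and are not stated in the abstract: the stopword list is hypothetical, the IDF uses a smoothed variant so that words shared by every answer keep a nonzero weight, and the final score is taken as the best match among the reference answers.

```python
import math
from collections import Counter

# Hypothetical stopword list; the paper's list (including Banjar words) is not given.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "that", "in"}

def tokenize(text):
    """Lowercase, strip punctuation, drop stopwords and tokens with symbols or digits."""
    words = (w.strip(".,;:!?()") for w in text.lower().split())
    return [w for w in words if w.isalpha() and w not in STOPWORDS]

def tfidf_vectors(docs):
    """TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1 (assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(a, b):
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_answer(student, reference_answers):
    """Best cosine similarity between the student answer and any reference answer."""
    vectors = tfidf_vectors(reference_answers + [student])
    student_vec = vectors[-1]
    return max(cosine(student_vec, ref) for ref in vectors[:-1])

# Hypothetical question with 3 reference answers.
references = [
    "An operating system manages hardware resources.",
    "The operating system controls memory and processes.",
    "Software that manages computer hardware and software resources.",
]
print(score_answer("The operating system manages hardware and memory.", references))
print(score_answer("Bananas grow in tropical climates.", references))  # 0.0
```

A similarity in the range 0 to 1 would still need to be mapped to a grade scale; the abstract does not say how that mapping is done.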

PENGEMBANGAN SISTEM PENDETEKSI KEMIRIPAN KARYA PADA INAICTA 2013

Jurnal Informatika Polinema ◽

10.33795/jip.v1i4.117 ◽

2017 ◽

Vol 1 (4) ◽

pp. 14

Author(s):

Cadea Mikha Pasma ◽

Ulla Delfana Rosiani ◽

Rudy Ariyanto

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Term Frequency ◽

Space Model ◽

Document Frequency

The Indonesia ICT Award (INAICTA) 2013 is the largest competition in Indonesia for creative and innovative work in ICT (Information and Computer Technology). It aims to keep encouraging the growth of local ICT products through improved product quality and innovation. Each year, the number of INAICTA contestants grows. This makes it harder for the judges to identify similarities among the contestants' innovations. An application is needed to help detect similarity between the submitted works. A system for detecting similarity among INAICTA 2013 works was therefore developed, comparing the short descriptions of the contestants' works. The system uses the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm to weight the works; with TF-IDF, the system computes weights based on the terms in each work. To measure the closeness or similarity of works, the system uses the Vector Space Model (VSM) algorithm. In the VSM, each work is viewed as a vector that has a magnitude (distance) and a direction. The system thus produces a ranking of INAICTA 2013 works by degree of similarity.
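The weighting and similarity steps mentioned above are usually written as follows. This is the common textbook form, given here as an assumption; the abstract does not state which variant the system uses. Here tf_{t,d} is the count of term t in work d, N is the number of works, df_t is the number of works containing t, and d_i, d_j are the weight vectors of two works over a vocabulary V.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}
\qquad
\mathrm{sim}(d_i, d_j) = \cos\theta
  = \frac{\sum_{t \in V} w_{t,d_i}\, w_{t,d_j}}
         {\sqrt{\sum_{t \in V} w_{t,d_i}^{2}}\;\sqrt{\sum_{t \in V} w_{t,d_j}^{2}}}
```

Dividing by the two vector magnitudes leaves only the angle between the vectors, so a long description and a short one can still be judged similar when they use the same terms in similar proportions.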

MODEL SISTEM MANAJEMEN PENGETAHUAN PROYEK DI PT. XYZ

Komputa : Jurnal Ilmiah Komputer dan Informatika ◽

10.34010/komputa.v7i1.2536 ◽

2018 ◽

Vol 7 (1) ◽

pp. 43-50

Author(s):

Sendy Gilang Farhamsyah ◽

Riani Lubis, S.T., M.T.

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Term Frequency ◽

Space Model ◽

Document Frequency

As a company in professional consulting services, PT. XYZ provides consulting on technical planning and management for regional and national development. The company currently always uses contract workers on its projects. As a result, the project knowledge held by those contract workers is not documented when the project work ends, and the company loses that knowledge. Implementing a Knowledge Management System is an appropriate solution for managing the company's project knowledge, so that knowledge gained from earlier projects can be used in future projects by newly hired contract workers who have no experience yet. The methods used to produce keyword similarity scores for searches in the system are TF-IDF (Term Frequency-Inverse Document Frequency) and VSM (Vector Space Model). The result of this research is a Knowledge Management System model that is expected to help project workers document the knowledge of each project worker and find solutions to problems that arise during project work.

System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods

IJIIS: International Journal of Informatics and Information Systems ◽

10.47738/ijiis.v3i1.6 ◽

2020 ◽

Vol 3 (1) ◽

pp. 36-42

Author(s):

Didit Suhartono ◽

Khodirun Khodirun

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Retrieval Performance ◽

Inverse Document Frequency ◽

Term Frequency ◽

Space Model ◽

Storage And Retrieval ◽

Document Frequency ◽

Speed Up

Archives are an example of important documents. They are stored systematically to help and simplify their storage and retrieval. Information retrieval is the process of retrieving relevant documents while not retrieving documents that are not relevant, and retrieving the relevant documents requires a method. The Term Frequency-Inverse Document Frequency and Vector Space Model methods can find relevant documents according to their level of closeness or similarity. In addition, applying the Nazief-Adriani stemming algorithm can improve retrieval performance by transforming the words in a document or text to their base forms. The system then indexes the documents to simplify and speed up the search process. Relevance is determined by calculating the similarity between the stored documents and the query, both represented in a particular form. The system then sorts the retrieved documents by their level of relevance to the query.
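The indexing step mentioned above can be illustrated with a small inverted index, which lets a search touch only the documents that contain a query term. This is a sketch under stated assumptions: `stem` is a hypothetical placeholder where a Nazief-Adriani stemmer would be called, and the ranking uses an accumulated TF-IDF score rather than the full cosine measure. The archive texts are invented for illustration.

```python
from collections import Counter, defaultdict
import math

def stem(word):
    # Placeholder (assumption): a real system would apply Nazief-Adriani stemming here.
    return word

def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(stem(w) for w in text.lower().split())
        for term, count in counts.items():
            index[term][doc_id] = count
    return index

def search(query, index, n_docs):
    """Return doc ids sorted by accumulated TF-IDF score for the query terms."""
    scores = defaultdict(float)
    for term in {stem(w) for w in query.lower().split()}:
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical archive entries.
archives = {
    "a1": "rector decree letter",
    "a2": "meeting invitation letter",
    "a3": "annual financial report",
}
index = build_index(archives)
print(search("meeting letter", index, len(archives)))  # ['a2', 'a1']
```

Document a3 shares no term with the query, so it is never scored; avoiding that work is where the speed-up from indexing comes from.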

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Aplikasi Deteksi Kemiripan Tugas Paper

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information Retrieval is the search for information based on a user's query, with the aim of finding the documents that match the user's needs. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented as a ranking of documents by similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the experts' examination; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The more assignments are submitted, the more time the similarity computation needs.
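Mean average precision, the measure reported above, can be computed as in the following sketch. It assumes the standard definition: for each query, precision is averaged over the ranks at which relevant documents appear, and the result is averaged over all queries. The document ids and relevance judgments are hypothetical, not the paper's data.

```python
def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs is a list of (ranked_doc_ids, relevant_doc_id_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical rankings for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1", "d4"], {"d1"}),              # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # 0.6667
```

A value close to 1, such as the reported 0.874, means the documents judged relevant by the experts tend to appear near the top of the system's ranking.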

Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

A library is a facility that provides information and knowledge resources and acts as an academic aid for readers. Because a library usually holds a huge number of books, readers often have difficulty finding the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature to help readers find books relevant to a specific book they have chosen. The application was developed using the Vector Space Model to represent documents as vectors. Recommendations are based on the similarity of the book descriptions. In the testing phase, using a one-language sample of relevant books, the F-Measure obtained is 55% with a cosine similarity threshold of 0.1. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
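The evaluation described above, recommending books whose cosine similarity passes a threshold and scoring the result with the F-Measure, can be sketched as follows. The similarity values, book ids, and relevance judgments are hypothetical, and treating the threshold as inclusive is an assumption; only the threshold value 0.1 comes from the abstract.

```python
def recommend(similarities, threshold=0.1):
    """Select books whose similarity to the chosen book meets the threshold."""
    return {book for book, sim in similarities.items() if sim >= threshold}

def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall over sets of book ids."""
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Hypothetical cosine similarities between a chosen book and four other books.
similarities = {"b1": 0.42, "b2": 0.15, "b3": 0.08, "b4": 0.11}
relevant = {"b1", "b3", "b4"}  # hypothetical human judgment

recommended = recommend(similarities)
print(sorted(recommended))                         # ['b1', 'b2', 'b4']
print(round(f_measure(recommended, relevant), 4))  # 0.6667
```

Lowering the threshold tends to raise recall and lower precision, so the threshold is a trade-off that has to be tuned on judged samples.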
