Analisis Perbandingan Algoritma ID3 dan KNN Pada Klasifikasi Emosi Teks Berita Berbahasa Indonesia

Algorithms for text-based classification (text mining) are rarely compared against one another, particularly for emotion classification. Many studies perform classification without any comparative element and without using a system built independently. In this study, a comparison is carried out to measure the accuracy that the ID3 and KNN algorithms achieve in the classification process. The data consist of 220 news texts taken from the online news site viva.co.id. Training uses a different weighting process for each algorithm: tf-idf term weighting for ID3, and similarity with the vector space model for KNN. The classification assigns an emotion category to each text. Across tests with various splits of the data, the highest accuracy obtained was 71.25, with a training-to-test split of 75%-25%. The ID3 algorithm is therefore the better choice for classifying emotion in Indonesian-language text, as it is a very efficient method for grouping data by category, whether manually or by system.
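ID3 builds its decision tree by repeatedly splitting on the attribute with the highest information gain. The following is a minimal sketch of that selection step only, not the system built in the study. It assumes a toy dataset of binary term-presence features; the terms, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels):
    """Feature with the highest information gain (ID3's split choice)."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: 1 means the term occurs in the news text.
rows = [
    {"menang": 1, "korban": 0},
    {"menang": 1, "korban": 0},
    {"menang": 0, "korban": 1},
    {"menang": 0, "korban": 0},
]
labels = ["happy", "happy", "sad", "sad"]

print(best_feature(rows, labels))
```

In this toy data the term "menang" separates the two classes perfectly, so it has the larger gain and would become the root split.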

Improving Term Weighting Schemes for Short Text Classification in Vector Space Model

IEEE Access ◽ 10.1109/access.2019.2953918 ◽ 2019 ◽ Vol 7 ◽ pp. 166578-166592

Author(s): Surender Singh Samant ◽ N. L. Bhanu Murthy ◽ Aruna Malapati

Keyword(s): Vector Space ◽ Text Classification ◽ Vector Space Model ◽ Term Weighting ◽ Weighting Schemes ◽ Short Text ◽ Space Model

An improved term weighting scheme for vector space model

Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826) ◽ 10.1109/icmlc.2004.1382048 ◽ 2005 ◽ Cited By ~ 3

Author(s): Yue-Heng Sun ◽ Pi-Lian He ◽ Zhi-Gang Chen

Keyword(s): Vector Space ◽ Vector Space Model ◽ Weighting Scheme ◽ Term Weighting ◽ Space Model

Scoring, term weighting, and the vector space model

Introduction to Information Retrieval ◽ 10.1017/cbo9780511809071.007 ◽ 2012 ◽ pp. 100-123 ◽ Cited By ~ 52

Author(s): Christopher D. Manning ◽ Prabhakar Raghavan ◽ Hinrich Schutze

Keyword(s): Vector Space ◽ Vector Space Model ◽ Term Weighting ◽ Space Model

Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Journal of Information & Knowledge Management ◽ 10.1142/s0219649206001359 ◽ 2006 ◽ Vol 05 (02) ◽ pp. 97-105 ◽ Cited By ~ 3

Author(s): S. Srinivas ◽ Ch. AswaniKumar

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Retrieval Performance ◽ Term Weighting ◽ Space Model ◽ Rank Approximation

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has been reported to be 30% more effective. Many studies have reported that good retrieval performance is related to the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results obtained demonstrate that LSI performance improves significantly with the combination of optimised term weighting and rank approximation.
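The rank approximation heuristic can be written as a truncated singular value decomposition. What follows is the standard textbook formulation of LSI, not necessarily the exact variant optimised in the paper. The symbols are notation introduced here: A is the weighted term-document matrix, k is the retained rank, and q is a query vector in term space.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q
```

Here U_k and V_k keep the first k columns of U and V, and \Sigma_k keeps the k largest singular values. A_k is the best rank-k approximation of A in the Frobenius norm, and the folded-in query \hat{q} is compared with the rows of V_k (one per document), typically by cosine similarity.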

Analysis of Text Classification with various Term Weighting Schemes in Vector Space Model

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.d1938.0891020 ◽ 2020 ◽ Vol 9 (10) ◽ pp. 390-393

Keyword(s): Vector Space ◽ Text Classification ◽ Naive Bayes ◽ Information Gain ◽ Vector Space Model ◽ Naïve Bayes ◽ Weighting Scheme ◽ Term Weighting ◽ Space Model ◽ Weighting Methods

Term Weighting Scheme (TWS) is a key component of the matching mechanism when using the vector space model. In the context of information retrieval (IR) from text documents, this paper describes a new approach to term weighting that aims to improve classification performance. In this study, we propose an effective term weighting scheme, which gives the highest accuracy compared with the other text classification methods. We compared the performance of KNN and Naïve Bayes classification with different weighting methods, weighting by information gain, SVM, and the proposed method. We implemented many term-weighting methods (TWM) on Amazon data collections in combination with information gain and the SVM, KNN, and Naïve Bayes algorithms.

Klasifikasi Artikel Ilmiah Dengan Berbagai Skenario Preprocessing

Sains, Aplikasi, Komputasi dan Teknologi Informasi ◽ 10.30872/jsakti.v2i2.2681 ◽ 2020 ◽ Vol 2 (2) ◽ pp. 70

Author(s): Hidayatul Ma'rifah ◽ Aji Prasetya Wibawa ◽ Muhammad Iqbal Akbar

Keyword(s): Text Mining ◽ Vector Space ◽ Cross Validation ◽ Confusion Matrix ◽ Vector Space Model ◽ Nearest Neighbour ◽ Inverse Document Frequency ◽ Space Model ◽ Document Frequency ◽ Fold Cross Validation

This study aims to find the most effective combination and order of preprocessing steps in text mining for classifying the field of Indonesian-language journals based on their titles and abstracts. The preprocessing stages applied are case folding, stemming, stopword removal, transformation to a VSM (Vector Space Model), and SMOTE. The observation of each scenario, however, focuses on stemming and two stopword removal techniques: dictionary-based stopword removal, and document-frequency-based stopword removal after transformation into VSM form with TF-IDF (Term Frequency-Inverse Document Frequency) weighting. Classification uses the k-NN (K-Nearest Neighbour) algorithm, which determines the class of a test item by looking at its nearest neighbours. In this study, the metric for finding the nearest neighbours is cosine similarity. Classification is tested with 10-fold cross validation to produce a confusion matrix as the final result. The best classification performance reached an accuracy of 72.91% and a precision of 73.36%.
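The k-NN step with cosine similarity described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the documents have already been turned into sparse term-weight vectors; the weights, class names, and function names are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (dicts of term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(train, query, k=3):
    """Majority class among the k training vectors most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine_similarity(item[0], query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical TF-IDF-style weights for four training titles.
train = [
    ({"jaringan": 0.8, "server": 0.6}, "networking"),
    ({"jaringan": 0.5, "router": 0.9}, "networking"),
    ({"citra": 0.7, "piksel": 0.7}, "image processing"),
    ({"citra": 0.9, "segmentasi": 0.4}, "image processing"),
]
query = {"jaringan": 0.6, "router": 0.3, "citra": 0.1}

print(knn_predict(train, query, k=3))
```

With k=3 the two "networking" vectors and one "image processing" vector are the nearest neighbours, so the majority vote assigns the query to "networking".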

Aplikasi Penentuan Dosen Penguji Skripsi Menggunakan Metode TF-IDF dan Vector Space Model

Computatio : Journal of Computer Science and Information Systems ◽ 10.24912/computatio.v1i2.1014 ◽ 2017 ◽ Vol 1 (2) ◽ pp. 171

Author(s): Riki Ruli A. Siregar ◽ Fera Amelia Sinaga ◽ Rakhmat Arianto

Keyword(s): Text Mining ◽ Vector Space ◽ Vector Space Model ◽ Space Model

At Sekolah Tinggi Teknik PLN (STT-PLN), assigning examiners for the final project or undergraduate thesis is the task of the department secretary. This study aims to provide an alternative way to determine thesis examiners. The methods applied to build the system are text mining, TF-IDF, and the Vector Space Model (VSM). Text mining is used to process the data, namely thesis titles and abstracts, while VSM is used to classify competencies. The system can recommend three lecturers as thesis examiners based on the match between the title and abstract and the classification. In this study, the authors use the CRISP-DM software development model. The phases of CRISP-DM are business understanding, data understanding, data processing, modelling, evaluation, and deployment. The result of this study has an accuracy of 93.22%.
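Combining TF-IDF weighting with a vector space ranking to pick three candidates can be sketched as below. This is an illustrative assumption about how such a recommender might work, not the system described in the abstract. The lecturer names, competence keywords, the idf variant log(N/df), and all function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vectors, idf): each tokenised document as a dict of term -> tf * idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_matches(profiles, names, query_tokens, top_n=3):
    """Names of the top_n profiles most similar to the query tokens."""
    vectors, idf = tf_idf_vectors(profiles)
    # Terms unseen in the profiles get weight 0 and do not affect the ranking.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query_tokens).items()}
    scored = sorted(zip(names, (cosine(v, q) for v in vectors)),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]

# Hypothetical competence keywords per lecturer and a tokenised thesis title.
names = ["Lecturer A", "Lecturer B", "Lecturer C", "Lecturer D"]
profiles = [
    ["text", "mining", "classification"],
    ["network", "security", "protocol"],
    ["text", "retrieval", "search"],
    ["image", "classification", "vision"],
]
query = ["text", "classification", "mining"]

result = top_matches(profiles, names, query)
print(result)
```

Lecturer A shares every query term and ranks first, while Lecturer B shares none and is left out of the three recommendations.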

Best Approximate of Vector Space Model by Using SVD

Al-Mustansiriyah Journal of Science ◽ 10.23851/mjs.v28i2.509 ◽ 2018 ◽ Vol 28 (2) ◽ pp. 143

Author(s): Raghad M. Hadi

Keyword(s): Text Mining ◽ Vector Space ◽ Document Clustering ◽ Vector Space Model ◽ Internet Technology ◽ Low Rank ◽ Space Model ◽ Text Document ◽ Space Technique ◽ Text Mining Application

The rapid growth of internet technology makes it easy to assemble a huge volume of data as text documents, e.g., journals, blogs, web pages, articles, and emails. In text mining applications, the growing text space of datasets is a heavy burden that makes it hard to pre-process documents efficiently and prepare them for applications such as document clustering. The proposed system focuses on document pre-processing and on a technique for reducing the document space in preparation for clustering. The common method for text mining problems is the vector space model (VSM), in which each term represents a feature. The proposed system therefore creates a vector space model, using a pre-processing method to remove trivial data from the dataset, while the huge dimensionality of the VSM is resolved by using low-rank SVD. Experimental results show that the proposed system gives document representation results about 10% better than the previous approach when preparing documents for clustering.

THE INFLUENCE OF TEXT PREPROCESSING METHODS AND TOOLS ON CALCULATING TEXT SIMILARITY

Facta Universitatis Series Mathematics and Informatics ◽ 10.22190/fumi1905973d ◽ 2019 ◽ pp. 973

Author(s): Đorđe Petrović ◽ Milena Stanković

Keyword(s): Text Mining ◽ Vector Space ◽ Vector Space Model ◽ Text Similarity ◽ Text Documents ◽ Space Model ◽ Text Document ◽ The Subject ◽ Text Preprocessing ◽ Multidimensional Representation

Text mining to a great extent depends on the various text preprocessing techniques. The preprocessing methods and tools which are used to prepare texts for further mining can be divided into those which are and those which are not language-dependent. The subject matter of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on the analysis of the influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed the influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that the implementation of various text preprocessing methods in the Serbian language, which are used for the reduction of the vector space model for the multidimensional representation of text documents, achieves the required results. But the implementation of various text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.

SISTEM KLASIFIKASI DAN PENCARIAN JURNAL DENGAN MENGGUNAKAN METODE NAIVE BAYES DAN VECTOR SPACE MODEL

Jurnal Informatika ◽ 10.21460/inf.2008.42.48 ◽ 2011 ◽ Vol 4 (2) ◽ Cited By ~ 1

Author(s): Amalia Indranandita ◽ Budi Susanto ◽ Antonius Rahmat

Keyword(s): Text Mining ◽ Vector Space ◽ Naive Bayes ◽ Vector Space Model ◽ Naïve Bayes ◽ Space Model

Consumers' need for information in the form of journals or scientific articles keeps increasing, so grouping journals is needed to make information easier to find. A journal's topic is expected to represent its content without the reader having to go through the whole text. In practice, grouping journals by a particular topic/category is difficult when relying only on an ordinary query. A journal classification and search system using the Naive Bayes method and the Vector Space Model with a cosine approach is expected to help users determine the topic/category and to produce a list of journals ranked by similarity. A text mining process is carried out to prepare the system's basic requirements. The stages of the text mining process are text preprocessing with parsing, text transformation with stemming and stopword removal, feature selection, and pattern discovery. Naive Bayes classification produces good predictions when the vectors formed represent each category, while Vector Space Model search with the cosine approach yields a recall of 54.8% and a precision of 60.7%. A classification and search system was therefore built that can help users, because it provides detailed search using knowledge of the category labels produced by classification and metadata features.
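Recall and precision figures like those reported above come from comparing the retrieved list with the set of relevant items. A minimal sketch of that calculation follows, assuming hypothetical journal identifiers; it does not reproduce the paper's data or its 54.8% and 60.7% results.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against the relevant items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical search result and ground truth.
retrieved = ["j1", "j2", "j3", "j4", "j5"]
relevant = ["j1", "j3", "j6", "j7"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

In this example two of the five retrieved journals are relevant (precision 2/5) and two of the four relevant journals were found (recall 2/4).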
