Student Query Classification System

A university or educational institute generally receives a large volume of complaints from students every day. The issues relate to academics, the exam section, or other aspects of their education. Because of this volume, it is difficult for the university to sort and classify the complaints and forward them to the departments responsible for resolving them. In this project, we classify these complaints by the class or department they belong to using TF-IDF (term frequency-inverse document frequency), which converts documents into vectors and identifies the terms most strongly associated with a specific document. By capturing keywords in the complaints, weighting them, and applying different machine learning classifiers, we classify each complaint based on these keywords. This classification makes the work easier for the university, saves the time otherwise spent sorting complaints, and gives students better service, since complaints can now be sent directly to the respective departments.
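
The abstract does not give the exact pipeline, so the following is only a minimal sketch of the general idea. Complaints are turned into TF-IDF vectors, and a new complaint is routed to the department whose summed TF-IDF vector is most similar by cosine similarity. This nearest-centroid rule stands in for the unspecified classifiers. The helper names, the smoothed idf formula, and the toy complaints and department labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase, split on whitespace and trim basic punctuation; stop-word removal is omitted.
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def fit_idf(docs):
    # Smoothed idf: log(N / df) + 1, so a term present in every complaint still keeps weight 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term count times idf; terms never seen during training are dropped.
    counts = Counter(doc)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(complaints, departments):
    docs = [tokenize(c) for c in complaints]
    idf = fit_idf(docs)
    # Summed TF-IDF vector per department; cosine similarity ignores scale,
    # so this behaves like a mean (centroid) vector.
    centroids = defaultdict(Counter)
    for doc, dept in zip(docs, departments):
        centroids[dept].update(tfidf(doc, idf))
    return idf, centroids


def classify(text, idf, centroids):
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda dept: cosine(vec, centroids[dept]))


# Hypothetical toy data; a real deployment would train on the institution's labelled complaints.
complaints = [
    "Exam results not published on time",
    "Hall ticket for exam not received",
    "Fee receipt not generated after payment",
    "Scholarship fee refund pending",
    "Lab computers not working",
    "Wifi in hostel is not working",
]
departments = ["Examination", "Examination", "Accounts", "Accounts", "IT", "IT"]

idf, centroids = train(complaints, departments)
print(classify("My exam hall ticket is missing", idf, centroids))  # prints Examination
```

Replacing `classify` with a trained classifier such as Naive Bayes or an SVM would keep the same TF-IDF features. The nearest-centroid rule is used here only because it fits in a few lines of standard-library Python.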

Aspect Category Classification with a Machine Learning Approach Using an Indonesian-Language Dataset

Jurnal Nasional Teknik Elektro dan Teknologi Informasi (JNTETI) ◽

10.22146/jnteti.v10i3.1819 ◽

2021 ◽

Vol 10 (3) ◽

pp. 229-235

Author(s):

Syaifulloh Amien Pandega Perdana ◽

Teguh Bharata Aji ◽

Ridi Ferdiana

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Sentiment Analysis ◽

Support Vector ◽

Term Weighting ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

Customer reviews are opinions about the quality of goods or services as perceived by consumers. They contain information that is useful both to consumers and to providers of goods or services. The large volume of customer reviews available on websites calls for a framework that extracts sentiment automatically. A single customer review often covers several aspects, so Aspect Based Sentiment Analysis (ABSA) must be used to determine the polarity of each aspect. One important task in ABSA is Aspect Category Detection. Machine learning methods for Aspect Category Detection have been widely applied in English-language domains, but few studies address Indonesian. This paper compares the performance of three machine learning algorithms, namely Naïve Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF), on Indonesian-language customer reviews, using Term Frequency-Inverse Document Frequency (TF-IDF) for term weighting. The results show that RF outperforms NB and SVM in three different domains, namely restaurants, hotels, and e-commerce, with f1-scores of 84.3%, 85.7%, and 89.3% respectively.
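
The following is a minimal sketch of how an f1-score like those above could be computed for aspect category detection. Because one review may carry several categories, the function micro-averages over (review, category) pairs. The abstract does not state which averaging was used, so the micro-averaging choice, the function name, and the restaurant categories in the example are assumptions.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over (review, category) pairs.

    gold and predicted are parallel lists holding one set of aspect categories per review.
    """
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # categories found correctly
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # categories predicted but absent
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # categories missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical restaurant reviews and categories
gold = [{"food", "service"}, {"price"}, {"ambience", "food"}]
predicted = [{"food"}, {"price", "service"}, {"ambience", "food"}]
print(round(micro_f1(gold, predicted), 3))  # 0.8
```

Macro-averaging, which computes F1 per category and then takes the mean, weights rare categories more heavily. The two can differ noticeably on imbalanced review data.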

Supervisor Recommendation System Based on Final Project Topic Relevance Using the Okapi BM25 Method

Repositor ◽

10.22219/repositor.v2i9.672 ◽

2020 ◽

Vol 2 (9) ◽

Author(s):

Meilina Agustina ◽

Yufiz Azhar ◽

Nur Hayatin

Keyword(s):

Recommendation System ◽

School Teacher ◽

Primary School Teacher ◽

Education Department ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Teacher Education Department ◽

Primary School Teacher Education ◽

The University

A recommendation system is software that recommends to users products they may use. Administrative work in the office of the Primary School Teacher Education department at the University of Muhammadiyah Malang is a recurring problem for the administrative staff and part-time workers. The manual system currently in use is considered inefficient in terms of time, space, and effort, so an information system is needed to assist them. This information system uses the Okapi BM25 method, a ranking function used by search engines to rank matching documents by their relevance to a search query, which here is a final project topic. BM25 satisfies three principles of good term weighting: it includes inverse document frequency (idf), term frequency (tf), and document length normalization.
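
The three weighting principles named above map directly onto the standard Okapi BM25 scoring function. The following is a hypothetical, standard-library implementation that ranks supervisor topic profiles against a final project topic. The class name, the defaults `k1=1.5` and `b=0.75`, the non-negative idf variant ln((N - n + 0.5) / (n + 0.5) + 1), and the lecturer profiles are assumptions, not details taken from the paper.

```python
import math
from collections import Counter


class BM25:
    def __init__(self, corpus, k1=1.5, b=0.75):
        # corpus: list of tokenized documents (here, one topic profile per supervisor)
        self.k1, self.b = k1, b
        self.tfs = [Counter(doc) for doc in corpus]
        self.lengths = [len(doc) for doc in corpus]
        self.avgdl = sum(self.lengths) / len(corpus)
        n = len(corpus)
        df = Counter(term for doc in corpus for term in set(doc))
        # idf: rare terms weigh more; the "+ 1" keeps the value non-negative
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1) for t, c in df.items()}

    def score(self, query, index):
        tf, dl = self.tfs[index], self.lengths[index]
        # Document length normalization: longer documents need more occurrences
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in query:
            f = tf[term]  # term frequency, 0 if absent
            if f:
                # Saturating tf component: grows with f but stays below k1 + 1
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query):
        return sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)


# Hypothetical supervisor topic profiles and student query
profiles = {
    "Lecturer A": "mathematics learning media primary school",
    "Lecturer B": "character education primary school students",
    "Lecturer C": "science learning experiments primary school",
}
names = list(profiles)
bm25 = BM25([text.split() for text in profiles.values()])
ranking = bm25.rank("learning media for mathematics".split())
print([names[i] for i in ranking])  # ['Lecturer A', 'Lecturer C', 'Lecturer B']
```

Setting `b=0` switches off length normalization. A larger `k1` lets repeated occurrences of a term keep raising the score for longer before the tf component saturates.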

Phisher Fighter: Website Phishing Detection System Based on URL and Term Frequency-Inverse Document Frequency Values

Journal of Cyber Security and Mobility ◽

10.13052/jcsm2245-1439.1114 ◽

2021 ◽

Author(s):

E. Sri Vishva ◽

D. Aju

Keyword(s):

Machine Learning ◽

Detection System ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Sensitive Information ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Single Piece ◽

Phishing Detection

Phishing is a common cybercrime in which intruders or hackers lure naive and trusting individuals into revealing unique and sensitive information through fictitious websites. The primary intention of this kind of cybercrime is to gain access to the recipients' personal or classified information. The obtained data includes information that can be used to identify an individual. Stolen personal or sensitive information is commonly sold on online dark markets and subsequently bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece of it can range from a few dollars to thousands of dollars. Machine learning (ML) and deep learning (DL) are powerful methods for analysing and countering phishing attacks. A machine-learning-based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the system uses the TF-IDF (Term Frequency-Inverse Document Frequency) values of webpages. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes) and SGD (Stochastic Gradient Descent) are used to train and test on the collected dataset. The resulting phishing website detection system achieves 90.68% accuracy.
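
The abstract does not list its URL features or training settings. The following is therefore only a hypothetical sketch of one of the named methods: logistic regression fitted with stochastic gradient descent on a few hand-picked lexical URL indicators. The feature set, learning rate, epoch count, and example URLs are assumptions. The TF-IDF webpage features the paper also uses are left out to keep the sketch short.

```python
import ipaddress
import math
from urllib.parse import urlparse


def url_features(url):
    # Hypothetical lexical indicators; not the paper's actual feature set.
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        is_ip = 1.0
    except ValueError:
        is_ip = 0.0
    return [
        1.0,                                    # bias term
        is_ip,                                  # raw IP address instead of a domain name
        1.0 if "@" in url else 0.0,             # '@' can disguise the real destination
        1.0 if "-" in host else 0.0,            # hyphenated look-alike domains
        1.0 if parsed.scheme != "https" else 0.0,
        1.0 if host.count(".") >= 3 else 0.0,   # deeply nested subdomains
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(samples, labels, lr=0.1, epochs=500):
    # Logistic regression updated one example at a time (plain SGD, fixed order for reproducibility).
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
    return w


def phishing_probability(url, w):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, url_features(url))))


# Hypothetical labelled URLs: 0 = legitimate, 1 = phishing
train_urls = [
    "https://www.example.com/login",
    "https://university.edu/portal",
    "https://mail.google.com/",
    "http://192.168.10.5/secure-login",
    "http://paypal-verify.account.example.co.uk/@update",
    "http://secure-bank-login.com/verify",
]
labels = [0, 0, 0, 1, 1, 1]
w = train_sgd([url_features(u) for u in train_urls], labels)
for url in ["https://www.python.org/downloads", "http://10.0.0.7/account-update"]:
    print(url, round(phishing_probability(url, w), 3))
```

A practical system would shuffle the training data each epoch and evaluate on held-out URLs. The fixed order here only keeps the sketch reproducible.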

Recommendation System Using Weighted TF-IDF and Naive Bayes Classifiers on RSS Contents

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2010.p0631 ◽

2010 ◽

Vol 14 (6) ◽

pp. 631-637

Author(s):

Incheon Paik ◽

Hiroshi Mizugai

Keyword(s):

Machine Learning ◽

Recommendation System ◽

Naive Bayes ◽

Naïve Bayes ◽

Bayes Classifier ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Rss Feeds ◽

Enormous Quantity

The widespread use of blogs has led to a recent increase in RDF Site Summary (RSS) feeds, which are used for news updates and blogs. Because of this enormous quantity of material, searching the contents of RSS feeds now takes considerable effort. Recommendation systems address this problem by letting users obtain relevant RSS contents easily and quickly. In previous research, an RSS recommendation system was proposed that used the similarity between the Term Frequency (TF) of the RSS contents and the TF derived from the contents of the user's browsing history for RSS feeds. In this paper, we use Term Frequency-Inverse Document Frequency (TF-IDF) calculations to propose a Weighted TF-IDF method, which treats the terms enclosed by the title tags in RSS contents as characteristic terms. In addition, we propose a new recommendation method that uses a Naive Bayes classifier in a machine-learning-based approach. Through experiments, we compare the proposed methods with the existing method in a prototype recommendation system and show that the proposed methods outperform the existing method on several evaluation measures.
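
The abstract does not give the exact weighting formula, so the following only illustrates the general idea under stated assumptions. Terms that appear in an item's title receive an extra, hypothetical weight (`TITLE_WEIGHT`). The weighted counts then feed a multinomial Naive Bayes classifier with Laplace smoothing, which labels new RSS items as "read" or "skip" based on past browsing. The idf part of the paper's Weighted TF-IDF is omitted, and the labels, class name, and toy feed items are invented for illustration.

```python
import math
from collections import Counter, defaultdict

TITLE_WEIGHT = 2.0  # hypothetical extra weight for each occurrence of a term in the title


def weighted_terms(title, body):
    # Body terms count once per occurrence; title terms add TITLE_WEIGHT per occurrence.
    weights = Counter(body.lower().split())
    for term in title.lower().split():
        weights[term] += TITLE_WEIGHT
    return weights


class WeightedNaiveBayes:
    def fit(self, items, labels):
        # items: (title, body) pairs; labels: e.g. "read" / "skip" from browsing history
        self.class_terms = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for (title, body), label in zip(items, labels):
            self.class_terms[label].update(weighted_terms(title, body))
        self.vocab = {t for terms in self.class_terms.values() for t in terms}
        return self

    def predict(self, title, body):
        features = weighted_terms(title, body)
        n_items = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, terms in self.class_terms.items():
            # Log prior plus weighted log likelihoods with Laplace (add-one) smoothing
            denom = sum(terms.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / n_items)
            for term, weight in features.items():
                if term in self.vocab:  # terms never seen in training are ignored
                    score += weight * math.log((terms[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical browsing history: items the user opened ("read") or ignored ("skip")
history = [
    (("python release notes", "new python version improves performance"), "read"),
    (("learning rust generics", "generics and traits in rust"), "read"),
    (("celebrity gossip roundup", "actors spotted at awards party"), "skip"),
    (("football transfer rumours", "club linked with new striker"), "skip"),
]
items, labels = zip(*history)
model = WeightedNaiveBayes().fit(items, labels)
print(model.predict("python performance tips", "faster loops in python"))  # read
```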

Exclusive Pen Product Recommendation System Using Content-Based Filtering and TF-IDF

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v5i3.1563 ◽

2020 ◽

Vol 5 (3) ◽

pp. 229

Author(s):

Mariani Widia Putri ◽

Achmad Muchayan ◽

Made Kamisutara

Keyword(s):

Information Retrieval ◽

Customer Relationship Management ◽

Relationship Management ◽

Customer Relationship ◽

Brand Awareness ◽

Product Knowledge ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Content Based Filtering

Recommendation systems are currently a trend, as people increasingly rely on online transactions for various personal reasons. Recommendation systems offer an easier and faster way for users to find the goods they want without spending much time. Competition among businesses has also changed, so businesses must adapt their approach to reach prospective customers, and a system that supports this is needed. In this study, the authors build a product recommendation system using Content-Based Filtering and Term Frequency Inverse Document Frequency (TF-IDF) from the Information Retrieval (IR) model, aiming for efficient results that meet the need to improve Customer Relationship Management (CRM). The recommendation system is built and deployed to increase customers' brand awareness and to minimize failed transactions caused by the limited information that can be conveyed directly or offline. The data consist of 258 product codes, each described by eight categories and 33 constituent keywords drawn from the company's product knowledge. The TF-IDF calculation yields a weight of 13.854 for the top-ranked product recommendation, and the system recommends pens with an accuracy of 96.5%.
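
Content-based filtering of this kind can be sketched as item-to-item similarity. Each product becomes a TF-IDF vector over its keywords, and the products whose vectors have the highest cosine similarity to a product the customer liked are recommended. The product codes, keywords, `log(N/df)` idf, and function names below are hypothetical and do not reproduce the paper's data or its reported weight.

```python
import math
from collections import Counter


def build_vectors(catalog):
    # catalog maps a product code to its keywords; idf = log(N / df),
    # so a keyword shared by every product contributes nothing.
    n = len(catalog)
    df = Counter(k for keywords in catalog.values() for k in set(keywords))
    idf = {k: math.log(n / c) for k, c in df.items()}
    return {
        code: {k: tf * idf[k] for k, tf in Counter(keywords).items()}
        for code, keywords in catalog.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked_code, vectors, top_n=2):
    # Rank every other product by similarity to the one the customer liked.
    target = vectors[liked_code]
    scores = {code: cosine(target, vec) for code, vec in vectors.items() if code != liked_code}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical pen catalog
catalog = {
    "P001": ["fountain", "gold", "gift", "executive"],
    "P002": ["fountain", "silver", "gift"],
    "P003": ["ballpoint", "plastic", "office"],
    "P004": ["rollerball", "gold", "executive", "gift"],
}
vectors = build_vectors(catalog)
print(recommend("P001", vectors))  # ['P004', 'P002']
```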

Application of Customized Term Frequency-Inverse Document Frequency for Vietnamese Document Classification in Place of Lemmatization

Advances in Intelligent Systems and Computing - Intelligent Computing and Optimization ◽

10.1007/978-3-030-68154-8_37 ◽

2021 ◽

pp. 406-417

Author(s):

Do Viet Quan ◽

Phan Duy Hung

Keyword(s):

Document Classification ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Term Frequency by Inverse Document Frequency

10.1007/springerreference_65918 ◽

2011 ◽

Keyword(s):

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Hoax News Detection on Twitter using Term Frequency Inverse Document Frequency and Support Vector Machine Method

Journal of Physics Conference Series ◽

10.1088/1742-6596/1192/1/012025 ◽

2019 ◽

Vol 1192 ◽

pp. 012025

Author(s):

A Fauzi ◽

E B Setiawan ◽

Z K A Baizal

Keyword(s):

Support Vector Machine ◽

Support Vector ◽

Machine Method ◽

Inverse Document Frequency ◽

Support Vector Machine Method ◽

Term Frequency ◽

Document Frequency

Word-Weight Feature Selection Using the TFIDF Method for Indonesian-Language Summarization

Jurnal Ilmiah Merpati (Menara Penelitian Akademika Teknologi Informasi) ◽

10.24843/jim.2018.v06.i02.p06 ◽

2018 ◽

pp. 119

Author(s):

Ni Komang Widyasanti ◽

I Ketut Gede Darma Putra ◽

Ni Kadek Dwi Rusjayanthi

Keyword(s):

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Bahasa Indonesia

The spread of information in the form of digital text continues unabated over time. The need to read information has not decreased either: research conducted by okezone.com in five major Indonesian cities throughout 2015 found that the share of news consumed online reached 96%. One way to make finding relevant information easier and faster is to summarize the content. TFIDF (Term Frequency Inverse Document Frequency) is a weighting method that combines term frequency with inverse document frequency. In this study, TFIDF is used to select features as the summary output, applied to word-weight feature selection. The reader satisfaction score was 61.94%. The average summarization time was 68.25 seconds, with an average of 31.875 sentences and 387.375 words. Unlike previous related work, the study uses both fiction and non-fiction documents and performs feature selection in every paragraph. Keywords: Automatic Text Summarization, TFIDF Weighting, Bahasa Indonesia
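
One plausible reading of sentence selection by TFIDF word weights is sketched here under several assumptions. Every sentence is treated as a document, each sentence is scored by the sum of its words' TF-IDF weights, and the top-scoring sentence(s) of each paragraph are kept in their original order. The sentence splitter, the per-paragraph cutoff, and the example text are illustrative choices, not the paper's exact procedure.

```python
import math
import re
from collections import Counter


def split_sentences(paragraph):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def words(sentence):
    return re.findall(r"\w+", sentence.lower())


def summarize(paragraphs, per_paragraph=1):
    # Every sentence in the text is treated as one "document" for the idf statistics.
    sentences = [s for p in paragraphs for s in split_sentences(p)]
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(words(s)))

    def score(sentence):
        # Sum of tf * idf over the sentence's words
        tf = Counter(words(sentence))
        return sum(count * math.log(n / df[w]) for w, count in tf.items())

    summary = []
    for p in paragraphs:
        candidates = split_sentences(p)
        top = sorted(candidates, key=score, reverse=True)[:per_paragraph]
        summary.extend(s for s in candidates if s in top)  # keep original order
    return summary


paragraphs = [
    "The library opens at eight. The library also hosts a rare manuscript collection from the colonial era.",
    "Exams start in June. The exam timetable is posted online.",
]
summary = summarize(paragraphs)
print(summary)
```

Summing weights favours longer sentences. When that bias is unwanted, dividing the score by the sentence's word count is a common alternative.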
