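Klasifikasi Teks Bahasa Indonesia Pada Dokumen Pengaduan Sambat Online Menggunakan Metode K-Nearest Neighbors Dan Chi-square [Indonesian Text Classification of SAMBAT Online Complaint Documents Using the K-Nearest Neighbors and Chi-square Methods]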

Claudio Fresta Suharno; M. Ali Fauzi; Rizal Setya Perdana

doi:10.29080/systemic.v3i1.191

Systemic Information System and Informatics Journal ◽

10.29080/systemic.v3i1.191 ◽

2017 ◽

Vol 3 (1) ◽

pp. 25-32 ◽

Cited By ~ 1

Author(s):

Claudio Fresta Suharno ◽

M. Ali Fauzi ◽

Rizal Setya Perdana

Keyword(s):

Nearest Neighbors ◽

K Nearest Neighbors ◽

Chi Square ◽

F Measure ◽

Bahasa Indonesia

K-Nearest Neighbors (K-NN) is a classification method that is easy to understand. However, it has several drawbacks, one of which is its high computational cost. Feature selection is therefore used to reduce that cost by cutting the number of features that are irrelevant to text classification. The feature selection method used here is Chi-Square, which measures the degree of dependence of each feature. The process consists of collecting training and test documents, performing preprocessing and feature selection, carrying out classification, and finally testing and analyzing the system's classification results in terms of precision, recall, and F-Measure. The study finds that feature selection can improve the F-Measure when classifying Indonesian-language text in SAMBAT Online complaint documents with the K-Nearest Neighbors classifier.
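The pipeline described in this abstract (score features with Chi-square, keep the most dependent ones, then classify with K-NN) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 2x2 contingency form of the Chi-square statistic, the cosine similarity, the function names, and the toy complaint tokens are all assumptions made for the example.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, cls):
    """Chi-square statistic for one (term, class) pair from a 2x2 contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif in_class:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, keep):
    """Rank terms by their best Chi-square score over all classes; keep the top ones."""
    vocab = sorted({t for doc in docs for t in doc})
    classes = sorted(set(labels))
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:keep]


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def knn_predict(train_docs, train_labels, query, features, k=3):
    """Majority vote among the k training documents most similar to the query."""
    feats = set(features)

    def vec(tokens):
        return Counter(t for t in tokens if t in feats)

    q = vec(query)
    sims = sorted(
        ((cosine(q, vec(doc)), label) for doc, label in zip(train_docs, train_labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-tokenized complaints (illustrative only).
docs = [
    ["jalan", "rusak", "lubang"],
    ["jalan", "macet", "lubang"],
    ["sampah", "menumpuk", "bau"],
    ["sampah", "bau", "sungai"],
]
labels = ["road", "road", "waste", "waste"]

features = select_features(docs, labels, keep=4)
prediction = knn_predict(docs, labels, ["jalan", "lubang", "besar"], features, k=3)
print(features, prediction)
```

Restricting the vectors to the selected terms is what reduces the per-query cost: each similarity computation touches only the kept features rather than the full vocabulary.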

LexiPers: An ontology based sentiment lexicon for Persian

10.29007/f4j4 ◽

2018 ◽

Author(s):

Behnam Sabeti ◽

Pedram Hosseini ◽

Gholamreza Ghassem-Sani ◽

Seyed Abolghasem Mirroshandel

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Nearest Neighbors ◽

Classification Problem ◽

General Purpose ◽

Seed Selection ◽

K Nearest Neighbors ◽

Sentiment Lexicon ◽

Sentiment Orientation ◽

F Measure

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach to sentiment extraction is to use a sentiment lexicon, a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general-purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated on a set of hand-labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.
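For readers unfamiliar with the nearest centroid method named in this abstract, the idea can be sketched as below: each class is summarized by the mean of its training vectors, and a new item takes the label of the closest mean. The feature vectors, labels, and function names are hypothetical; the abstract does not specify the features the authors used.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Compute the mean vector of each class."""
    groups = defaultdict(list)
    for vector, label in zip(vectors, labels):
        groups[label].append(vector)
    return {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in groups.items()
    }


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical 2-dimensional feature vectors for labeled items.
vectors = [(1.0, 0.0), (0.8, 0.2), (0.0, 1.0), (0.2, 0.8)]
labels = ["positive", "positive", "negative", "negative"]

centroids = fit_centroids(vectors, labels)
print(predict_centroid(centroids, (0.7, 0.3)))
```

Unlike K-NN, this stores one vector per class, so prediction cost does not grow with the size of the training set.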

Towards Optimization of Malware Detection using Chi-square Feature Selection on Ensemble Classifiers

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d2359.0410421 ◽

2021 ◽

Vol 10 (4) ◽

pp. 254-262

Author(s):

Fadare Oluwaseun Gbenga ◽

Adetunmbi Adebayo Olusola ◽

(Mrs) Oyinloye Oghenerukevwe Eloho ◽

Mogaji Stephen Alaba

Keyword(s):

Feature Selection ◽

Malware Detection ◽

Feature Selection Method ◽

Ensemble Methods ◽

Nearest Neighbors ◽

Selection Method ◽

Gradient Boosting ◽

K Nearest Neighbors ◽

Chi Square ◽

Extreme Gradient Boosting

The proliferation of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established as highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy: 95.37% with Chi-square feature selection and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting classifier achieves the highest accuracy: 97.407% with Chi-square feature selection and 91.72% without feature selection. The Extreme Gradient Boosting classifier and Random Forest lead in the seven evaluation measures with Chi-square feature selection and without feature selection, respectively. The study results show that tree-based ensemble models are compelling for malware classification.

Sentiment Classification Using Text Embedding for Thai Teaching Evaluation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.886.221 ◽

2019 ◽

Vol 886 ◽

pp. 221-226 ◽

Cited By ~ 1

Author(s):

Kesinee Boonchuay

Keyword(s):

Naive Bayes ◽

Geometric Mean ◽

Nearest Neighbors ◽

Naïve Bayes ◽

Teaching Evaluation ◽

Sentiment Classification ◽

Teaching Skills ◽

K Nearest Neighbors ◽

Overall Performance ◽

F Measure

Sentiment classification is gaining a lot of attention. For a university, the knowledge obtained from classifying the sentiments students express about their courses is highly valuable and can be used to help teachers improve their teaching skills. In this research, text embedding is applied to enhance the performance of sentiment classification for Thai teaching evaluation. Text embedding techniques consider both the syntactic and the semantic elements of sentences, which can be used to improve classification performance. This research uses two approaches to apply text embedding for classification. The first approach uses fastText classification. According to the results, fastText provides the best overall performance; its highest F-measure was 0.8212. The second approach constructs text vectors and classifies them with traditional classifiers. This approach performs better than TF-IDF for k-nearest neighbors and naïve Bayes. For naïve Bayes, the second approach yields the best geometric mean, at 0.8961. With a decision tree, TF-IDF performs better than the second approach. The benefit of this research is that it presents a workflow for using text embedding in Thai teaching evaluation to improve the performance of sentiment classification. By using embedding techniques, similarity and analogy tasks on texts are supported along with the classification.
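The two scores quoted in this abstract can be computed from a binary confusion matrix as sketched below. The definitions are assumptions, since the abstract does not state them: F-measure is taken as the harmonic mean of precision and recall, and geometric mean as the square root of sensitivity times specificity, a form often used for imbalanced classes. The counts are made up for illustration and are not the paper's data.

```python
import math


def evaluate(tp, fp, fn, tn):
    """Precision, recall, F-measure and geometric mean from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Hypothetical counts for one sentiment class.
scores = evaluate(tp=8, fp=2, fn=2, tn=8)
print(scores)
```

The geometric mean drops to zero if either class is never predicted correctly, which is why it is sometimes reported alongside F-measure.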

Discretization Based Framework to Improve the Recommendation Quality

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/3/13 ◽

2021 ◽

Vol 18 (3) ◽

Author(s):

Bilal Ahmed ◽

Wang Li

Keyword(s):

Collaborative Filtering ◽

Recommendation System ◽

Information Filtering ◽

Nearest Neighbors ◽

User Preferences ◽

Primary Concern ◽

K Nearest Neighbors ◽

Chi Square ◽

Similarity Computation ◽

Rating Prediction

Recommendation systems are information filtering software that deliver suggestions about relevant items from a massive collection of data. Collaborative filtering approaches are the most popular in recommendations. The primary concern of any recommender system is to provide favorable recommendations based on the rating prediction of user preferences. In this article, we propose a novel discretization-based framework for collaborative filtering to improve rating prediction. Our framework includes discretization-based preprocessing, chi-square-based attribute selection, and K-Nearest Neighbors (KNN) based similarity computation. Rating prediction affords some basis for the judgment to decide whether recommendations are generated or not, subject to the ratio of performance of any recommendation system. Experiments on two datasets, MovieLens and BookCrossing, demonstrate the effectiveness of our method.

Food Detection Using Histogram of Oriented Gradient (HOG) as Feature Extraction and K-Nearest Neighbors (K-NN) as Classifier

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/3191.52020 ◽

2020 ◽

Vol 9 (1.5) ◽

pp. 219-225

Author(s):

Diah Rahmadani

Keyword(s):

Feature Extraction ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Histogram Of Oriented Gradient ◽

Food Detection

The Implementation of Subspace Outlier Detection in K-Nearest Neighbors to Improve Accuracy in Bank Marketing Data

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/44822020 ◽

2020 ◽

Vol 8 (2) ◽

pp. 545-550

Author(s):

Dimas Aryo Anggoro

Keyword(s):

Outlier Detection ◽

Nearest Neighbors ◽

K Nearest Neighbors ◽

Improve Accuracy ◽

Marketing Data ◽

Bank Marketing

Evolutionary Feature Scaling in K-Nearest Neighbors Based on Label Dispersion Minimization

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9282834 ◽

2020 ◽

Author(s):

Suryoday Basak ◽

Manfred Huber

Keyword(s):

Nearest Neighbors ◽

K Nearest Neighbors ◽

Feature Scaling

Tropical Balls and Its Applications to K Nearest Neighbor over the Space of Phylogenetic Trees

Mathematics ◽

10.3390/math9070779 ◽

2021 ◽

Vol 9 (7) ◽

pp. 779

Author(s):

Ruriko Yoshida

Keyword(s):

Supervised Learning ◽

Phylogenetic Trees ◽

Nearest Neighbor ◽

Nearest Neighbors ◽

High Dimensional ◽

Learning Method ◽

Dimensional Vector ◽

K Nearest Neighbor ◽

K Nearest Neighbors

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
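As a rough illustration of KNN under the tropical metric, the sketch below assumes the usual definition d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), which is unchanged when a constant is added to every coordinate of a vector. The function names and toy points are hypothetical, and the code works on plain coordinate vectors rather than on phylogenetic trees.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: spread between the largest and smallest coordinate difference."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_knn(train, labels, query, k):
    """Classify by majority vote inside the smallest tropical ball holding k points.

    Returns the predicted label and the radius of that ball.
    """
    order = sorted(range(len(train)), key=lambda i: tropical_distance(train[i], query))
    nearest = order[:k]
    radius = tropical_distance(train[nearest[-1]], query)
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], radius


# Hypothetical labeled points (illustrative only).
train = [(0, 0, 0), (0, 1, 0), (0, 5, 5), (0, 6, 5)]
labels = ["A", "A", "B", "B"]

label, radius = tropical_knn(train, labels, query=(3, 3, 4), k=3)
print(label, radius)
```

Because the distance ignores a common shift of all coordinates, the query (3, 3, 4) is treated the same as (0, 0, 1), which matches working on the tropical projective torus.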

Classification of soil quality using K-Nearest Neighbors methods

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/739/1/012011 ◽

2021 ◽

Vol 739 (1) ◽

pp. 012011

Author(s):

I D Ratih ◽

S M Retnaningsih ◽

V M Dewi

Keyword(s):

Soil Quality ◽

Nearest Neighbors ◽

K Nearest Neighbors

Privacy-Enhancing k-Nearest Neighbors Search over Mobile Social Networks

Sensors ◽

10.3390/s21123994 ◽

2021 ◽

Vol 21 (12) ◽

pp. 3994

Author(s):

Yuxi Li ◽

Fucai Zhou ◽

Yue Ge ◽

Zifeng Xu

Keyword(s):

Social Networks ◽

Access Control ◽

Location Privacy ◽

Nearest Neighbors ◽

Search Pattern ◽

Broadcast Encryption ◽

Mobile Social Networks ◽

K Nearest Neighbors ◽

Fine Grained ◽

Server Architecture

Focusing on the diversified demands of location privacy in mobile social networks (MSNs), we propose a privacy-enhancing k-nearest neighbors search scheme over MSNs. First, we construct a dual-server architecture that incorporates location privacy and fine-grained access control. Under this architecture, we design a lightweight location encryption algorithm that keeps the cost to the user minimal. We also propose a location re-encryption protocol and an encrypted location search protocol based on secure multi-party computation and a homomorphic encryption mechanism, which achieve accurate and secure k-nearest friends retrieval. Moreover, to satisfy fine-grained access control requirements, we propose a dynamic friends management mechanism based on public-key broadcast encryption. It enables users to grant or revoke others’ search rights without updating their friends’ keys, realizing constant-time authentication. Security analysis shows that the proposed scheme satisfies adaptive L-semantic security and revocation security under a random oracle model. In terms of performance, compared with related works that use a single-server architecture, the proposed scheme reduces the leakage of location information and search patterns as well as the user–server communication cost. Our results show that a decentralized and end-to-end encrypted k-nearest neighbors search over MSNs is not only possible in theory, but also feasible in real-world MSN deployments with resource-constrained mobile devices and highly iterative location update demands.
