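Indonesian Text Classification of Sambat Online Complaint Documents Using the K-Nearest Neighbors and Chi-Square Methods (Klasifikasi Teks Bahasa Indonesia Pada Dokumen Pengaduan Sambat Online Menggunakan Metode K-Nearest Neighbors Dan Chi-square)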

2017 ◽  
Vol 3 (1) ◽  
pp. 25-32 ◽  
Author(s):  
Claudio Fresta Suharno ◽  
M. Ali Fauzi ◽  
Rizal Setya Perdana

K-Nearest Neighbors (K-NN) is a classification method that is easy to understand. However, it has several drawbacks, one of which is its high computational cost. Feature selection is therefore used to reduce that cost by removing features that are irrelevant to text classification. The feature selection method used is Chi-Square, which measures the degree of dependence of each feature. The process consists of collecting training and test documents, preprocessing and feature selection, classification, and finally testing and analyzing the system's classification results in terms of precision, recall, and F-Measure. The study finds that feature selection can improve the F-Measure of Indonesian-language text classification on SAMBAT Online complaint documents when the K-Nearest Neighbors classifier is used.
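
As a rough illustration of Chi-Square feature selection for text, the sketch below scores each term against one class with the common 2x2 contingency-table form of the statistic, chi2 = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)), and keeps the highest-scoring terms. The abstract does not state which variant the authors used, so the formula choice, the function names, and the toy documents and labels are all assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score for one term and one class (2x2 contingency table).

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class is degenerate; treat as uninformative
    return n * (a * d - c * b) ** 2 / denom


def select_top_terms(docs, labels, target, k):
    """Rank every term by its chi-square score against `target`; keep the top k."""
    vocab = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocab:
        pairs = list(zip(docs, labels))
        a = sum(1 for doc, y in pairs if term in doc and y == target)
        b = sum(1 for doc, y in pairs if term in doc and y != target)
        c = sum(1 for doc, y in pairs if term not in doc and y == target)
        d = sum(1 for doc, y in pairs if term not in doc and y != target)
        scores[term] = chi_square(a, b, c, d)
    return sorted(vocab, key=lambda t: -scores[t])[:k]


# Hypothetical toy corpus: each document is a set of tokens after preprocessing.
docs = [{"road", "broken"}, {"road", "flood"}, {"permit", "late"}, {"permit", "fee"}]
labels = ["infra", "infra", "admin", "admin"]
top_terms = select_top_terms(docs, labels, target="infra", k=2)
```

Terms that occur only inside or only outside the target class receive the highest scores, so a K-NN classifier run on the reduced vocabulary compares documents over fewer dimensions.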

10.29007/f4j4 ◽  
2018 ◽  
Author(s):  
Behnam Sabeti ◽  
Pedram Hosseini ◽  
Gholamreza Ghassem-Sani ◽  
Seyed Abolghasem Mirroshandel

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach for sentiment extraction is using a sentiment lexicon. A sentiment lexicon is a set of words associated with the sentiment orientation that they express. In this paper, we describe the process of generating a general purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers have been evaluated based on a set of hand labeled synsets. The final sentiment lexicon has been generated by the best classifier. The results show an acceptable performance in terms of accuracy and F-measure in the generated sentiment lexicon.
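
The two classifiers named in this abstract differ in what they compare a query against: K-nearest neighbors votes among the closest training vectors, while nearest centroid compares the query with one mean vector per class. The sketch below shows both on toy two-dimensional vectors. Euclidean distance, the feature vectors, and the labels are assumptions for illustration; the abstract does not describe the features or the distance used for synsets.

```python
import math
from collections import Counter, defaultdict


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train, labels, x, k=3):
    """Majority vote among the k training vectors closest to x."""
    nearest = sorted(range(len(train)), key=lambda i: euclidean(train[i], x))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def centroid_predict(train, labels, x):
    """Assign x to the class whose mean vector (centroid) is closest."""
    groups = defaultdict(list)
    for vec, y in zip(train, labels):
        groups[y].append(vec)
    centroids = {
        y: [sum(col) / len(vecs) for col in zip(*vecs)]
        for y, vecs in groups.items()
    }
    return min(centroids, key=lambda y: euclidean(centroids[y], x))


# Hypothetical feature vectors for hand-labeled items.
train = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.0), (0.1, 0.9), (0.2, 0.8), (0.0, 0.7)]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]
query = (0.75, 0.15)
knn_label = knn_predict(train, labels, query, k=3)
centroid_label = centroid_predict(train, labels, query)
```

Nearest centroid needs only one distance computation per class at prediction time, whereas K-NN computes a distance to every training vector.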


Author(s):  
Fadare Oluwaseun Gbenga ◽  
Adetunmbi Adebayo Olusola ◽  
(Mrs) Oyinloye Oghenerukevwe Eloho ◽  
Mogaji Stephen Alaba

The multiplication of malware variants is probably the greatest problem in PC security, and protecting information in the form of source code against unauthorized access is a central issue in computer security. In recent times, machine learning has been extensively researched for malware detection, and ensemble techniques have been established to be highly effective in terms of detection accuracy. This paper proposes a framework that combines Chi-square as the feature selection method with eight ensemble learning classifiers on five base learners: K-Nearest Neighbors, Naïve Bayes, Support Vector Machine, Decision Trees, and Logistic Regression. Among the base learners, K-Nearest Neighbors returns the highest accuracy, 95.37% with Chi-square and 87.89% without feature selection. Among the ensembles, the Extreme Gradient Boosting Classifier has the highest accuracy, 97.407% with Chi-square as feature selection and 91.72% without feature selection. Across the seven evaluation measures, the Extreme Gradient Boosting Classifier leads when Chi-square is used as the feature selection method, and Random Forest leads among ensemble methods without feature selection. The study results show that the tree-based ensemble model is compelling for malware classification.


2019 ◽  
Vol 886 ◽  
pp. 221-226 ◽  
Author(s):  
Kesinee Boonchuay

Sentiment classification gains a lot of attention nowadays. For a university, the knowledge obtained from classifying sentiments of student learning in courses is highly valuable, and can be used to help teachers improve their teaching skills. In this research, sentiment classification based on text embedding is applied to enhance the performance of sentiment classification for Thai teaching evaluation. Text embedding techniques consider both syntactic and semantic elements of sentences, which can be used to improve classification performance. This research uses two approaches to apply text embedding for classification. The first approach uses fastText classification. According to the results, fastText provides the best overall performance; its highest F-measure was 0.8212. The second approach constructs text vectors for classification using traditional classifiers. This approach performs better than TF-IDF for k-nearest neighbors and naïve Bayes. For naïve Bayes, the second approach yields the best geometric mean, 0.8961. For decision trees, TF-IDF is better suited than the second approach. The benefit of this research is that it presents the workflow of using text embedding for Thai teaching evaluation to improve the performance of sentiment classification. By using embedding techniques, similarity and analogy tasks of texts are established along with the classification.
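
For readers unfamiliar with the two scores reported here, the sketch below computes an F-measure and a geometric mean from binary confusion-matrix counts. It assumes the usual definitions (F-measure as the harmonic mean of precision and recall, geometric mean as the square root of sensitivity times specificity); the abstract does not say how the scores were defined or averaged across classes, and the counts are invented for illustration.

```python
import math


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def geometric_mean(tp, fp, fn, tn):
    """Square root of sensitivity (recall on positives) times specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 8 true positives, 2 false positives,
# 2 false negatives, 8 true negatives.
f_score = f_measure(tp=8, fp=2, fn=2)
g_mean = geometric_mean(tp=8, fp=2, fn=2, tn=8)
```

Unlike accuracy, the geometric mean drops sharply when either class is classified poorly, which is why it is often reported for imbalanced sentiment data.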


Author(s):  
Bilal Ahmed ◽  
Wang Li

Recommendation systems are information filtering software that deliver suggestions about relevant items from a massive collection of data. Collaborative filtering approaches are the most popular in recommendations. The primary concern of any recommender system is to provide favorable recommendations based on the rating prediction of user preferences. In this article, we propose a novel discretization-based framework for collaborative filtering to improve rating prediction. Our framework includes discretization-based preprocessing, chi-square based attribute selection, and K-Nearest Neighbors (KNN) based similarity computation. Rating prediction affords some basis for the judgment to decide whether recommendations are generated or not, subject to the ratio of performance of any recommendation system. Experiments on two datasets, MovieLens and BookCrossing, demonstrate the effectiveness of our method.


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 779
Author(s):  
Ruriko Yoshida

A tropical ball is a ball defined by the tropical metric over the tropical projective torus. In this paper we show several properties of tropical balls over the tropical projective torus and also over the space of phylogenetic trees with a given set of leaf labels. Then we discuss its application to the K nearest neighbors (KNN) algorithm, a supervised learning method used to classify a high-dimensional vector into given categories by looking at a ball centered at the vector, which contains K vectors in the space.
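
To make the idea concrete, the sketch below uses the standard tropical metric on the tropical projective torus, d_tr(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i), as the distance inside a plain K-NN vote. Taking the K closest points is the same as growing a tropical ball around the query until it contains K training vectors. The function names, the toy points, and the labels are assumptions; this is not the paper's implementation.

```python
from collections import Counter


def tropical_distance(u, v):
    """Tropical metric: max coordinate difference minus min coordinate difference.

    Adding the same constant to every coordinate of u (or v) leaves the
    distance unchanged, which matches the quotient by (1, ..., 1) that
    defines the tropical projective torus.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_knn(train, labels, x, k=3):
    """Majority vote among the k training points closest to x in the tropical metric."""
    nearest = sorted(range(len(train)), key=lambda i: tropical_distance(train[i], x))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled points, written with first coordinate 0 as representatives.
train = [(0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 5, 5), (0, 6, 5), (0, 5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]

# (2, 2, 3) is the same point of the torus as (0, 0, 1).
predicted = tropical_knn(train, labels, (2, 2, 3), k=3)
```

Because the metric ignores shifts along (1, ..., 1), the query (2, 2, 3) is at distance 0 from the training point (0, 0, 1).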


2021 ◽  
Vol 739 (1) ◽  
pp. 012011
Author(s):  
I D Ratih ◽  
S M Retnaningsih ◽  
V M Dewi

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 3994
Author(s):  
Yuxi Li ◽  
Fucai Zhou ◽  
Yue Ge ◽  
Zifeng Xu

Focusing on the diversified demands of location privacy in mobile social networks (MSNs), we propose a privacy-enhancing k-nearest neighbors search scheme over MSNs. First, we construct a dual-server architecture that incorporates location privacy and fine-grained access control. Under the above architecture, we design a lightweight location encryption algorithm to achieve a minimal cost to the user. We also propose a location re-encryption protocol and an encrypted location search protocol based on secure multi-party computation and homomorphic encryption mechanism, which achieve accurate and secure k-nearest friends retrieval. Moreover, to satisfy fine-grained access control requirements, we propose a dynamic friends management mechanism based on public-key broadcast encryption. It enables users to grant/revoke others’ search right without updating their friends’ keys, realizing constant-time authentication. Security analysis shows that the proposed scheme satisfies adaptive L-semantic security and revocation security under a random oracle model. In terms of performance, compared with the related works with single server architecture, the proposed scheme reduces the leakage of the location information, search pattern and the user–server communication cost. Our results show that a decentralized and end-to-end encrypted k-nearest neighbors search over MSNs is not only possible in theory, but also feasible in real-world MSNs collaboration deployment with resource-constrained mobile devices and highly iterative location update demands.

