Deteksi Penyakit Dengue Hemorrhagic Fever dengan Pendekatan One Class Classification (Detection of Dengue Hemorrhagic Fever Using a One Class Classification Approach)

Author(s):  
Zida Ziyan Azkiya ◽  
Fatma Indriani ◽  
Heru Kartika Chandra

Abstract: Two-class classification maps inputs to one of two target classes. In some cases, however, training data are available for only a single class. Detection of Dengue Hemorrhagic Fever (DHF) is one such case: the available training data generally cover only positive patients, while data for healthy people (negative data) are not specifically available. In this paper, we report our experiment in building a classification model for detecting DHF infection using a One Class Classification (OCC) approach. The data are laboratory blood test results from patients with dengue fever. The OCC methods compared are One-Class Support Vector Machine (SVM) and One-Class K-Means. The SVM method obtained precision = 1.0, recall = 0.993, F1 score = 0.997, and accuracy of 99.7%, while the K-Means method obtained precision = 0.901, recall = 0.973, F1 score = 0.936, and accuracy of 93.3%. This indicates that the SVM method is slightly superior to K-Means for One-Class Classification of DHF patients.
Keywords: Dengue Hemorrhagic Fever, K-Means, One Class Classification, OSVM
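As a rough illustration of the one-class idea in the abstract above, the sketch below fits k-means on positive-only samples and accepts a new sample if it lies close enough to a learned centroid. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the toy 2-D data, and the threshold rule (largest training distance to the nearest centroid) are all hypothetical.

```python
import math

def nearest_distance(point, centroids):
    """Euclidean distance from a point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

def fit_one_class_kmeans(points, k, iterations=20):
    """Run Lloyd's k-means on positive-only data and derive an acceptance radius."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    # Assumed threshold rule: the largest training distance to the nearest centroid.
    threshold = max(nearest_distance(p, centroids) for p in points)
    return centroids, threshold

def is_target_class(point, centroids, threshold):
    """Accept a sample as the target (positive) class if it falls within the radius."""
    return nearest_distance(point, centroids) <= threshold

# Toy, made-up 2-D values standing in for blood-test features of positive patients.
positives = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9), (0.9, 1.1), (4.9, 5.2)]
centroids, threshold = fit_one_class_kmeans(positives, k=2)
```

With these toy values, a sample such as `(1.05, 1.0)` falls inside the learned region, while `(3.0, 3.0)` is rejected as an outlier. A looser or tighter threshold trades recall against precision.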

Author(s):  
Noviah Dwi Putranti ◽  
Edi Winarko

Abstract: Sentiment analysis in this research is the process of classifying textual documents into two classes, positive and negative sentiment. Opinion data were obtained from the social networking site Twitter using queries in Indonesian. This study aims to determine public sentiment toward particular objects as expressed on Twitter in Indonesian, and thereby to support market research on public opinion. The collected data were preprocessed and POS-tagged to produce a classification model through training. Sentiment words were collected with a dictionary-based approach, which yielded 18,069 words in this study. The Maximum Entropy algorithm is used for the POS tagger, and the algorithm used to build the classification model on the training data is Support Vector Machine. The features are unigrams with TF-IDF weighting. The classifier achieved 86.81% accuracy in 7-fold cross-validation with the Sigmoid kernel. Manual class labeling with the POS tagger yielded 81.67% accuracy.
Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, twitter.
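The unigram TF-IDF weighting mentioned in the abstract above can be sketched as follows. This is a minimal sketch using one common variant (relative term frequency times `log(N / df)`). The abstract does not say which variant the authors used, and the function name and example tweets are made up for illustration.

```python
import math
from collections import Counter

def tfidf_unigrams(documents):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each unigram appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Made-up example tweets, not data from the study.
tweets = ["layanan ini bagus", "layanan ini buruk", "bagus sekali"]
vectors = tfidf_unigrams(tweets)
```

In this variant a word that occurs in every document gets weight zero, and a word that occurs in only one document (here `buruk`) is weighted more heavily than one shared across documents (here `layanan`).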


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that have the same features or characteristics into several classes. Automatic document classification uses the frequency of words that appear in the training data as features. A large number of documents increases the number of words that appear as features. Therefore, summaries are used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which is considered to have a good reputation in classification. This research tests the effect of using summaries as feature selection in document classification. The summaries reduce each text to 50%. The results show that the summaries did not affect the classification accuracy for documents classified with SVM, but they improved the accuracy of the Simple Logistic Classifier. The classification tests also show that the accuracy of Naïve Bayes Multinomial (NBM) is better than that of SVM.


Author(s):  
Boyang Li ◽  
Jinglu Hu ◽  
Kotaro Hirasawa

We propose an improved support vector machine (SVM) classifier by introducing a new offset, for solving the real-world unbalanced classification problem. The new offset is calculated based on the unbalanced support vectors resulting from the unbalanced training data. We developed a weighted harmonic mean (WHM) algorithm to further reduce the effects of noise on offset calculation. We apply the proposed approach to classify real-world data. Results of simulation demonstrate the effectiveness of our proposed approach.


Author(s):  
Wenjuan An ◽  
Mangui Liang ◽  
He Liu

Outlier detection, as a type of one-class classification problem, is one of the important research topics in data mining and machine learning. Its task is to identify sample points that deviate markedly from the normal data. A reliable outlier detector needs to build a model that encloses the normal data tightly. In this paper, an improved one-class SVM (OC-SVM) classifier is proposed for outlier detection problems. We name this method OC-SVM with minimum within-class scatter (OC-WCSSVM), which exploits the inner-class structure of the training set by minimizing the within-class scatter of the training data. This can construct a more accurate hyperplane for outlier detection, such that the margin between the training data and the origin in a higher dimensional space is as large as possible, while at the same time the decision boundary around the normal data is as tight as possible. Experimental results on a synthetic dataset and 10 real-world datasets demonstrate that our proposed OC-WCSSVM algorithm is effective and superior to the compared algorithms.
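For orientation, the baseline that this abstract builds on can be written out. The block below is the commonly cited ν-formulation of the standard one-class SVM, not the OC-WCSSVM objective itself. The within-class scatter term the authors add is not given in the abstract and is not reproduced here. The symbols $\phi$ (feature map), $\nu \in (0, 1]$, $n$ (number of training points), and $\xi_i$ (slack variables) are the usual notation and are assumptions relative to the abstract.

```latex
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\lVert w \rVert^{2}
  + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho
\qquad \text{s.t.}\quad
  \langle w, \phi(x_i) \rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n,

f(x) = \operatorname{sign}\bigl(\langle w, \phi(x) \rangle - \rho\bigr).
```

Maximizing $\rho / \lVert w \rVert$ corresponds to the "margin between the training data and the origin" described in the abstract, and points with $f(x) = -1$ are flagged as outliers.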


Author(s):  
Jie Xu ◽  
Xianglong Liu ◽  
Zhouyuan Huo ◽  
Cheng Deng ◽  
Feiping Nie ◽  
...  

Support Vector Machine (SVM) was originally proposed as a binary classification model, and it has achieved great success in different applications. In practice, problems with more than two classes are more common, so it is natural to extend SVM to a multi-class classifier. Many approaches have been proposed to construct a multi-class classifier based on binary SVM, such as the one versus all strategy, the one versus one strategy, and Weston's multi-class SVM. The one versus all and one versus one strategies split the multi-class problem into multiple binary classification subproblems, each of which requires training a separate binary classifier. Weston's multi-class SVM is formed by ensuring risk constraints and imposing a specific regularization, such as the Frobenius norm. It is not derived by maximizing the margin between the hyperplane and the training data, which is the motivation behind SVM. In this paper, we propose a multi-class SVM model from the perspective of maximizing the margin between training points and the hyperplane, and analyze the relation between our model and other related methods. Experiments show that our model achieves better or comparable results relative to other related methods.
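The two decomposition strategies named in this abstract differ in how many binary subproblems they create. The sketch below only enumerates those subproblems and trains nothing. The function names and class labels are hypothetical.

```python
from itertools import combinations

def one_versus_all_tasks(classes):
    """One binary task per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]

def one_versus_one_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))

classes = ["A", "B", "C", "D"]  # hypothetical class labels
ova = one_versus_all_tasks(classes)
ovo = one_versus_one_tasks(classes)
print(len(ova), len(ovo))  # prints: 4 6
```

For K classes, one versus all needs K binary classifiers and one versus one needs K(K-1)/2, which is why both require training multiple binary classifiers.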


Phishing is one of the luring techniques attackers use to abuse the personal details of clients. It is a serious cyber security issue that involves imitating a legitimate website to deceive online users and steal their personal information. Phishing detection can be viewed as a special type of classification problem in which the classifier is built from a substantial number of website features. Identifying the best features is required to improve classifier accuracy. This study highlights the important features of websites that are used to distinguish phishing websites from legitimate ones, and presents a scheme, Decision Tree Least Square Twin Support Vector Machine (DT-LST-SVM), for the classification of phishing websites. The UCI public domain benchmark website phishing dataset was used to run experiments on the proposed classifier with different kernel functions and to calculate classification accuracy. Computational results show that the DT-LST-SVM scheme yields better classification accuracy on the phishing website classification dataset.


2021 ◽  
Vol 23 (08) ◽  
pp. 616-624
Author(s):  
Gaddam Akhil Reddy ◽  
Dr. B. Indira Reddy

The need for spam detection is particularly pertinent nowadays, as there is no quality control over social media and users can distribute unverified material, which facilitates fraud and deceit. Spam detection can aid in the prevention of such fraud. This situation has developed mostly as a result of the distribution of disparate, unconfirmed information via shopping websites, emails, and text messages (SMS). There are several ways of categorising and identifying spam, each with certain advantages and disadvantages. The machine learning model "Support Vector Machine" (SVM) is employed to detect spam in this case. The idea behind SVM is simple: the method finds a line or hyperplane that separates the data into classes. After being fed a set of labelled training data for each category, the model can assign any text to one of the given categories.
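The "line or hyperplane" idea can be made concrete with a linear decision function. The sketch below shows only the prediction step: the weights and bias are hypothetical hand-picked numbers, whereas a trained SVM would learn them from labelled data. The two features (counts of the words "free" and "meeting") are likewise made up for illustration.

```python
def linear_decision(weights, bias, features):
    """Signed score w . x + b; its sign says which side of the hyperplane x lies on."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(weights, bias, features):
    """Map the side of the hyperplane to a label."""
    return "spam" if linear_decision(weights, bias, features) >= 0 else "ham"

# Hypothetical weights for two made-up features: count of "free", count of "meeting".
weights, bias = [1.5, -2.0], -0.5

print(classify(weights, bias, [2, 0]))  # message with two "free": spam
print(classify(weights, bias, [0, 1]))  # message with one "meeting": ham
```

Training an SVM amounts to choosing `weights` and `bias` so that the hyperplane separates the labelled classes with the largest possible margin.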


CCIT Journal ◽  
2017 ◽  
Vol 10 (2) ◽  
pp. 197-206
Author(s):  
Atika Rahmawati ◽  
Aris Marjuni ◽  
Junta Zeniarja

Pilkada Serentak (simultaneous regional elections) is a very important event for the future viability of regions and the country. Through this election, people can cast their votes and elect representatives of their choice. The public can express its response through the social media platform Twitter, and sentiment analysis of Twitter data can then reveal the public response to the simultaneous elections. The classification is based on the text of tweets posted by Twitter users, because text is easy to obtain and process. This study classifies responses in Indonesian-language text and improves accuracy by using SVM. The tweet classification uses a categorical approach with two basic classes: positive and negative. The data consist of 3,000 Indonesian tweets. Labeling is not done manually but with a clustering method that divides the 3,000 tweets into two groups, with Cluster 1 as the positive tweets and Cluster 2 as the negative tweets. Of these, 2,700 are used as training data and 300 as test data. The pre-processing stage includes tokenization, case normalization, stop word detection, and stemming. Classification is performed with a Support Vector Machine (SVM). SVM achieved the highest accuracy, 91%, compared with 82% for k-means clustering.
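The pre-processing stage and the 2,700/300 split described in this abstract might look roughly like the sketch below. It covers tokenization, case normalization, and stop word removal only. Stemming is omitted because it needs an Indonesian stemmer that the abstract does not name. The function names, the tiny stop word list, and the example tweet are assumptions for illustration.

```python
def preprocess(tweet, stop_words):
    """Tokenize on whitespace, lowercase, strip punctuation, drop stop words."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in tweet.split()]
    return [t for t in tokens if t and t not in stop_words]

def split_train_test(items, n_train):
    """Take the first n_train items for training and the rest for testing."""
    return items[:n_train], items[n_train:]

stop_words = {"yang", "dan", "di"}  # tiny illustrative list, not the study's
print(preprocess("Pilkada yang Aman dan Lancar!", stop_words))
# prints: ['pilkada', 'aman', 'lancar']

# Integer placeholders stand in for the 3,000 tweets.
train, test = split_train_test(list(range(3000)), 2700)
```

The abstract does not say whether the tweets were shuffled before splitting, so the sequential split here is only one possible reading.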


2017 ◽  
Vol 8 (1) ◽  
pp. 18-30 ◽  
Author(s):  
Monali Y. Khachane

Computer-Aided Detection/Diagnosis (CAD) through artificial intelligence is an emerging area in medical image processing and health care, making expert systems more and more intelligent. The aim of this paper is to analyze the performance of different feature extraction techniques for the medical image classification problem. Efforts are made to classify brain MRI and knee MRI medical images. Gray Level Co-occurrence Matrix (GLCM) based texture features, DWT and DCT transform features, and Invariant Moments are used to classify the data. Experimental results show that the proposed system produced better results even though the training data set is smaller than the testing data set. A Support Vector Machine classifier with a linear kernel produced higher accuracy, reaching 100%, when used with texture features.
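To show what a GLCM-based texture feature is, the sketch below builds a co-occurrence matrix for a tiny made-up image and derives the contrast feature from it. It assumes a single horizontal offset (one pixel to the right) and a non-symmetric count matrix. The abstract does not state which offsets, gray-level quantization, or texture features the author used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels (i, j) at pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix

def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by normalized co-occurrence."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * v / total
               for i, row in enumerate(matrix) for j, v in enumerate(row))

image = [[0, 0, 1],
         [1, 1, 0]]  # made-up image with 2 gray levels
counts = glcm(image, levels=2)
print(counts, contrast(counts))  # prints: [[1, 1], [1, 1]] 0.5
```

Features such as contrast, computed for one or more offsets, form the texture feature vector that a classifier like a linear-kernel SVM would then consume.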

