Scoring Indonesian-Language Essay Answers Using the SVM-LSA Method with Generic Features (Penilaian Esai Jawaban Bahasa Indonesia Menggunakan Metode SVM-LSA dengan Fitur Generik)

2012, Vol 5 (1), pp. 33
Author(s): Rama Adhitia, Ayu Purwarianti

This paper examines a solution to the problem of automatically scoring essay answers. It combines a support vector machine (SVM), used as an automatic text classification technique, with latent semantic analysis (LSA), used to handle synonymy and polysemy among index terms. Conventional essay scoring systems use index terms as features. The proposed system instead scores essay answers with generic features, which allow the scoring model to be tested on a variety of different questions. With these generic features, the model does not need to be retrained when answers to additional questions are scored. The features are the percentage of keywords present, the similarity between the essay answer and the reference answer, the percentage of key ideas present, the percentage of incorrect ideas present, and the percentage of keyword synonyms present.
The test results also show that the proposed method achieves higher scoring accuracy than other methods, such as SVM or LSA, that use index terms as machine learning features.
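The abstract defines the generic features only in words, so their exact formulas are not known here. The sketch below is an illustrative guess at two of them, keyword percentage and similarity to the reference answer, assuming a naive lowercase tokenizer and raw term-frequency cosine similarity. All function names and the sample question are hypothetical. Each value is computed relative to one question's own reference answer and keyword list, so the feature vector means the same thing for every question. That is what allows a single trained classifier to be reused across questions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_percentage(answer: str, keywords: set[str]) -> float:
    """Percentage of the (lowercase) reference keywords that appear at least once in the answer."""
    if not keywords:
        return 0.0
    present = set(tokenize(answer)) & keywords
    return 100.0 * len(present) / len(keywords)


def cosine_similarity(answer: str, reference: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    a, b = Counter(tokenize(answer)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generic_features(answer: str, reference: str, keywords: set[str]) -> list[float]:
    """Question-independent feature vector; only two of the five listed features are sketched."""
    return [keyword_percentage(answer, keywords), cosine_similarity(answer, reference)]


# Hypothetical question data, for illustration only.
reference = "photosynthesis converts light energy into chemical energy"
keywords = {"photosynthesis", "light", "chemical", "energy"}
answer = "plants use light energy to make chemical energy"
print(generic_features(answer, reference, keywords))
```

An LSA-based variant would compare the two texts in a reduced latent space instead of raw term space. That is one way to soften the synonym problem the abstract mentions, and it is omitted here.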

With the explosion of information on the internet, users are often overwhelmed and find it difficult to choose among massive amounts of information. Organizing a huge set of original documents with traditional methods is time-consuming and laborious, and the results are often unsatisfactory. Automatic text classification can free users from tedious document processing and make it easier to recognize and distinguish document contents. It also helps organize large numbers of complex documents systematically and greatly improves how effectively information is used. This paper adopts a term-based model to extract web semantic features that represent each document. The extracted web semantic features are used to train a reduced support vector machine. The experimental results show that the proposed method correctly identifies most of the writing styles.


2020, Vol 7 (1), pp. 53
Author(s): Derisma Derisma, Fajri Febrian

Abstract: Breast cancer is a type of cancer commonly found in women. In Indonesia, breast cancer ranks first among hospitalized patients across all hospitals. This study aimed to perform computation-based diagnosis of breast cancer that can determine a person's cancer condition, evaluated by the accuracy of the algorithms used. The study used the Orange data mining toolkit (Python) and the Wisconsin Breast Cancer dataset to build breast cancer classification models. The data mining methods applied were Neural Network, Support Vector Machine, and Naive Bayes. The best classification algorithm was kernel SVM, with an accuracy of 98.9%, and the lowest was Naive Bayes, with an accuracy of 96.1%. Keywords: breast cancer, neural network, support vector machine, naive bayes


Author(s): Nur Azizul Haqimi, Nur Rokhman, Sigit Priyanta

Instagram (IG) is a web-based and mobile social media application on which users can share photos or videos using its available features. Photos or videos uploaded with captions describing them can attract spam comments, that is, comments that are not relevant to the caption or the photo. A problem that arises when identifying spam is that non-spam comments far outnumber spam comments, which leads to an imbalanced dataset. Class imbalance can affect the performance of a classification method. This research therefore focuses on applying the Complement Naive Bayes (CNB) method to imbalanced datasets for detecting Instagram spam comments. The study used TF-IDF weighting, with a Support Vector Machine (SVM) as the comparison classifier. The tests used 2,500 training samples and 100 test samples on an imbalanced dataset (25% spam and 75% non-spam). CNB achieved 92% accuracy, 86% precision, and a 93% F-measure, whereas SVM achieved 87% accuracy, 79% precision, and an 88% F-measure. In conclusion, the CNB method is more suitable for detecting spam comments when the dataset is imbalanced.
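Assuming CNB refers to Complement Naive Bayes, its key idea is to estimate each class's term distribution from the documents of all *other* classes. A small class such as spam therefore still gets parameters estimated from plenty of data. The sketch below is a minimal version with raw token counts and additive smoothing. A TF-IDF-weighted version would use TF-IDF weights in place of the counts. It omits the class priors and the weight normalization that some variants add, and the toy comments are made up.

```python
import math
from collections import Counter, defaultdict


class ComplementNB:
    """Minimal Complement Naive Bayes over token lists (no weight normalization, no priors)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, docs: list[list[str]], labels: list[str]) -> "ComplementNB":
        self.vocab = {t for doc in docs for t in doc}
        per_class = defaultdict(Counter)  # term counts inside each class
        for doc, label in zip(docs, labels):
            per_class[label].update(doc)
        self.classes = sorted(per_class)
        total = Counter()
        for counts in per_class.values():
            total.update(counts)
        self.log_theta = {}
        for c in self.classes:
            # Complement counts: term occurrences in every class except c.
            comp = total - per_class[c]
            denom = sum(comp.values()) + self.alpha * len(self.vocab)
            self.log_theta[c] = {t: math.log((comp[t] + self.alpha) / denom) for t in self.vocab}
        return self

    def predict(self, doc: list[str]) -> str:
        tf = Counter(t for t in doc if t in self.vocab)  # out-of-vocabulary tokens are ignored
        # A document belongs to c when it fits c's complement poorly,
        # so pick the class with the lowest complement log-likelihood.
        scores = {c: sum(n * self.log_theta[c][t] for t, n in tf.items()) for c in self.classes}
        return min(scores, key=scores.get)


# Hypothetical toy comments, not the study's data.
train = [
    ("follow me for free followers", "spam"),
    ("free followers check my profile", "spam"),
    ("check my profile for free likes", "spam"),
    ("beautiful photo of the beach", "ham"),
    ("nice photo love the colors", "ham"),
    ("love this beach view", "ham"),
]
model = ComplementNB().fit([text.split() for text, _ in train], [label for _, label in train])
print(model.predict("free followers here".split()))  # → spam
print(model.predict("love the beach photo".split()))  # → ham
```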


2020, Vol 9 (3), pp. 376-390
Author(s): Nur Fitriyah, Budi Warsito, Di Asih I Maruddani

The launch of PT Aplikasi Karya Anak Bangsa, better known as Gojek, in 2015 has made daily activities more convenient for people in Indonesia. Sentiment analysis of Twitter posts is one way to see how Gojek users respond to the services provided. Responses were classified as positive or negative using the Support Vector Machine method, with models evaluated by 10-fold cross-validation. Both the linear kernel and the RBF kernel were used. Data were labeled in two ways: manually and by sentiment scoring. The RBF kernel achieved the highest overall accuracy and kappa accuracy under both labeling schemes. With manual labeling, the overall accuracy was 79.19% and the kappa accuracy was 16.52%. With sentiment scoring, the overall accuracy was 79.19% and the kappa accuracy was 21%. The higher the overall accuracy and kappa accuracy, the better the classification model performs. Keywords: Gojek, Twitter, Support Vector Machine, overall accuracy, kappa accuracy
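"Kappa accuracy" here is presumably Cohen's kappa. Kappa discounts the agreement that would be expected by chance given how often each label occurs, so it can be far lower than overall accuracy. The sketch below computes both metrics from label lists. The function name and the toy labels are hypothetical. The example shows how a classifier can reach high accuracy with zero kappa when one class dominates and the classifier always predicts it.

```python
import math


def accuracy_and_kappa(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Overall accuracy and Cohen's kappa for two equal-length label sequences."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    # Chance agreement: product of the true and predicted marginal frequencies per label.
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else math.nan
    return p_o, kappa


# Hypothetical labels: 8 positive and 2 negative tweets, classifier always says "positive".
y_true = ["positive"] * 8 + ["negative"] * 2
y_pred = ["positive"] * 10
print(accuracy_and_kappa(y_true, y_pred))  # → (0.8, 0.0)
```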


doi:10.2196/29120, 2021, Vol 9 (11), pp. e29120
Author(s): Bruna Stella Zanotto, Ana Paula Beck da Silva Etges, Avner dal Bosco, Eduardo Gabriel Cortes, Renata Ruschel, ...

Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management.
Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs.
Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results.
Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future.
Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.
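Subject-wise sampling matters because sentences from the same patient's record are correlated. If they land on both sides of a split, test scores can be optimistic. The sketch below implements 5-fold cross-validation repeated 6 times with patient-level grouping. The abstract does not say how patients were assigned to folds, so the shuffle-then-round-robin assignment, the function name, and the toy patient IDs are all assumptions.

```python
import random
from collections import defaultdict


def subject_wise_repeated_kfold(subject_ids, k=5, repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation
    in which all samples (e.g. sentences) of one subject (e.g. patient) share a fold."""
    rng = random.Random(seed)  # fixed seed makes the splits reproducible
    by_subject = defaultdict(list)
    for idx, subject in enumerate(subject_ids):
        by_subject[subject].append(idx)
    subjects = sorted(by_subject)
    for r in range(repeats):
        order = subjects[:]
        rng.shuffle(order)  # a new subject-to-fold assignment for every repeat
        for f in range(k):
            fold_subjects = order[f::k]  # deal subjects round-robin into k folds
            test_subjects = set(fold_subjects)
            test_idx = [i for s in fold_subjects for i in by_subject[s]]
            train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
            yield r, f, train_idx, test_idx


# Hypothetical data: 20 patients with 3 sentences each.
ids = [f"patient{j}" for j in range(20) for _ in range(3)]
splits = list(subject_wise_repeated_kfold(ids))
print(len(splits))  # → 30
```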


2013, Vol 2013, pp. 1-17
Author(s): Hiroshi Ogura, Hiromi Amano, Masato Kondo

We introduce a new model for describing word frequency distributions in documents for automatic text classification tasks. In the model, the gamma-Poisson probability distribution is used to achieve better text modeling. The modeling framework and its application to text categorization are demonstrated, together with practical techniques for parameter estimation and vector normalization. To investigate the efficiency of our model, text categorization experiments were performed on the 20 Newsgroups, Reuters-21578, Industry Sector, and TechTC-100 datasets. The results show that the model achieves performance comparable to that of the support vector machine and clearly exceeds that of the multinomial model and the Dirichlet-multinomial model. The time complexity of the proposed classifier and its advantage in practical applications are also discussed.
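For reference, the sketch below gives the textbook gamma-Poisson mixture in a shape/scale parameterization. A word's count is Poisson given a rate, and the rate itself is gamma-distributed across documents. The abstract does not give the paper's own parameterization, parameter estimation, or vector normalization, so this only illustrates the distribution itself. The function names are hypothetical.

```python
import math


def gamma_poisson_logpmf(k: int, shape: float, scale: float) -> float:
    """log P(K = k) when K | lam ~ Poisson(lam) and lam ~ Gamma(shape, scale).

    Integrating lam out gives a negative binomial distribution with
    mean shape * scale and variance shape * scale * (1 + scale).
    """
    p = scale / (1.0 + scale)
    return (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
            + k * math.log(p) + shape * math.log(1.0 - p))


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for a plain Poisson distribution with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


# Both models below have mean 1.0, but the mixture has variance 3.0 instead of 1.0
# and puts more mass on zero counts, which can capture "bursty" word occurrences.
print(math.exp(gamma_poisson_logpmf(0, shape=0.5, scale=2.0)))  # ≈ 0.577
print(math.exp(poisson_logpmf(0, lam=1.0)))                      # ≈ 0.368
```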


2018, Vol 9 (2), pp. 118-121
Author(s): Felix Indra Kurniadi

In recent years, many studies have tried to improve the recognition and classification of white blood cells to help hematologists diagnose white-blood-cell-related diseases such as blood cancer, leukemia, and AIDS. This paper compares several Local Binary Pattern (LBP) variants, namely uniform LBP, rotation-invariant LBP, and rotation-invariant uniform LBP, for classifying five types of white blood cells using two classifiers: Support Vector Machine and K-Nearest Neighbour. Index Terms: LBP, LBP-U, LBP-RI, LBP-RIU, white blood cells
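The three LBP variants differ only in how each pixel's 8-bit neighbour code is post-processed. The sketch below assumes a radius-1, 8-neighbour LBP in which neighbours are read clockwise from the top-left pixel as bits 0 to 7. Bit-ordering conventions vary between implementations, and the function names are illustrative. In typical pipelines, histograms of these codes over a cell image form the feature vector passed to the SVM or k-NN classifier.

```python
def lbp_code(patch: list[list[int]]) -> int:
    """8-neighbour, radius-1 LBP code of the centre pixel of a 3x3 patch."""
    centre = patch[1][1]
    # Neighbours clockwise from the top-left pixel; neighbour i sets bit i.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum(1 << i for i, v in enumerate(neighbours) if v >= centre)


def transitions(code: int, bits: int = 8) -> int:
    """Number of 0/1 changes when the bit pattern is read circularly."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1) for i in range(bits))


def is_uniform(code: int) -> bool:
    """Uniformity test used by LBP-U: at most two 0/1 transitions."""
    return transitions(code) <= 2


def rotation_invariant(code: int, bits: int = 8) -> int:
    """LBP-RI: the smallest value among all circular rotations of the pattern."""
    mask = (1 << bits) - 1
    return min(((code >> r) | (code << (bits - r))) & mask for r in range(bits))


def rotation_invariant_uniform(code: int, bits: int = 8) -> int:
    """LBP-RIU: number of 1 bits for uniform patterns, one shared label (bits + 1) otherwise."""
    return bin(code).count("1") if is_uniform(code) else bits + 1


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 8, 3]]
code = lbp_code(patch)
print(code, is_uniform(code), rotation_invariant(code), rotation_invariant_uniform(code))  # → 42 False 21 9
```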


2020, Vol 8 (4), pp. T753-T762
Author(s): Zhenghui Xiao, Wei Jiang, Bin Sun, Yunjiang Cao, Lei Jiang, ...

Coal texture is important for predicting coal seam permeability and selecting favorable blocks for coalbed methane (CBM) exploration. Drilled cores and mining seam observations are the most direct and effective methods of identifying coal texture; however, they are expensive and cannot be used in unexplored coal seams. Geophysical logging has become a common method of coal texture identification, particularly during the CBM mining stage. However, quantitative methods for identifying coal texture from geophysical logging data require further study. The support vector machine (SVM), a machine-learning method, has received great interest because of its remarkable generalization performance, and it has been used to quantitatively identify hard and soft coal from geophysical logging data. In this study, four well-logging curves, namely acoustic time difference (AC), caliper log (CAL), density (DEN), and natural gamma (GR), were used for coal texture analysis. Hard coal (undeformed and cataclastic coal) exhibited higher DEN and GR and lower CAL and AC than soft coal. Coal texture identification accuracy was highest (97%) with the linear kernel function, and the maximum training accuracy was reached when the penalty parameter of the linear kernel increased to 1. Verification with a newly cored CBM exploration well indicated that the SVM-based identification method was effective for coal texture analysis. With the increasing availability of data, this method can be used to distinguish hard and soft coal across a coal-bearing basin by learning from large numbers of samples.
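The penalty parameter mentioned above is the C of a soft-margin SVM, which trades margin width against training errors. The sketch below is a dependency-free linear soft-margin SVM trained by full-batch subgradient descent on the primal objective, to show where C enters. It is not the study's implementation, which the abstract does not describe, and a real application would use a dedicated solver. The function names, learning-rate schedule, and toy data are assumptions. The four columns are synthetic standardized values ordered as AC, CAL, DEN, GR, chosen only to mimic the reported direction of the differences (hard coal: lower AC and CAL, higher DEN and GR).

```python
def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Minimise 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w.x_i + b))) by subgradient descent.

    Labels y_i must be -1 or +1.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for epoch in range(epochs):
        step = lr / (1 + 0.01 * epoch)  # slowly decaying step size
        grad_w = w[:]  # gradient of the 0.5*||w||^2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: this sample adds C * (-y_i x_i)
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w, b, x):
    """+1 (hard coal in this toy setup) if on the positive side of the hyperplane, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Synthetic standardized samples [AC, CAL, DEN, GR]; +1 = hard coal, -1 = soft coal.
X = [[-1.0, -0.9, 1.0, 0.8], [-0.8, -1.0, 0.9, 1.1], [-1.1, -0.7, 1.2, 0.9],
     [1.0, 0.8, -1.0, -0.9], [0.9, 1.1, -0.8, -1.1], [1.2, 0.9, -1.1, -0.7]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y, C=1.0)
print([predict(w, b, x) for x in X])
```

A larger C penalizes margin violations more heavily, while a smaller C tolerates some misclassified training samples in exchange for a wider margin.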


2013, Vol 785-786, pp. 1437-1440
Author(s): Ke Li, Chong Lun Li, Wei Zhang

To accurately recognize small diver targets in dim images from a dedicated diver-detection sonar, a support vector machine (SVM) is used as the classifier. Based on the main characteristics of divers, five feature parameters (average scale, velocity, shape, direction, and included angle) are chosen as the input feature vector to train the classifier. The test images are then classified and identified. The experimental results show that the recognition accuracy reaches 94.5% on 200 test images. The experiment indicates that, with a suitable choice of feature parameters, SVM-based recognition of small objects in complex sonar images performs well and has good prospects for engineering applications.
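The five parameters have different units and ranges, and SVMs are sensitive to feature scale, so features are commonly standardized before training. The abstract does not say whether the authors did this. The sketch below is an assumed preprocessing step: it fits per-feature statistics on training vectors and reuses them for test images. Function names and feature values are hypothetical.

```python
import statistics


def fit_standardizer(rows: list[list[float]]) -> list[tuple[float, float]]:
    """Per-column (mean, population standard deviation), computed on training rows only."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]


def standardize(row: list[float], stats: list[tuple[float, float]]) -> list[float]:
    """Z-score one feature vector; constant columns (std 0) map to 0."""
    return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]


# Hypothetical training vectors: [average_scale, velocity, shape, direction, included_angle]
train_rows = [
    [2.0, 0.5, 0.8, 10.0, 30.0],
    [4.0, 1.5, 0.6, 30.0, 60.0],
]
stats = fit_standardizer(train_rows)
scaled = [standardize(r, stats) for r in train_rows]
print(scaled)
```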

