LDA filter: A Latent Dirichlet Allocation preprocess method for Weka

This work presents an alternative method for representing documents based on LDA (Latent Dirichlet Allocation) and examines how it affects classification algorithms in comparison with common text representations. LDA assumes that each document deals with a set of predefined topics, which are distributions over an entire vocabulary. Our main objective is to use the probability of a document belonging to each topic to implement a new text representation model. The proposed technique is deployed as a new filter that extends the Weka software. To demonstrate its performance, the filter is tested with different classifiers, namely a Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Naive Bayes, on several document corpora (OHSUMED, Reuters-21578, 20Newsgroup, Yahoo! Answers, YELP Polarity, and TREC Genomics 2015). It is then compared with the Bag of Words (BoW) representation technique. Results suggest that the proposed filter achieves accuracy similar to BoW while greatly improving classification processing times.
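The contrast between a BoW vector and a topic-probability vector can be sketched in plain Python. This is a minimal illustration, not the Weka filter itself: the vocabulary, the two topic-word distributions, and the function names `bow_vector` and `topic_proportions` are hypothetical, and the topic proportions are estimated by a simple EM fold-in with the topics held fixed, which is a simplification of full LDA inference (it omits the Dirichlet prior).

```python
from collections import Counter

def bow_vector(tokens, vocab):
    """Bag of Words: one count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def topic_proportions(tokens, vocab, topic_word, iters=50):
    """Estimate P(topic | document) by EM with the topic-word distributions fixed."""
    k = len(topic_word)
    index = {w: i for i, w in enumerate(vocab)}
    ids = [index[t] for t in tokens if t in index]
    theta = [1.0 / k] * k
    if not ids:
        return theta  # no known words: fall back to a uniform mixture
    for _ in range(iters):
        new = [0.0] * k
        for w in ids:
            # Responsibility of each topic for this word occurrence.
            resp = [theta[z] * topic_word[z][w] for z in range(k)]
            total = sum(resp)
            for z in range(k):
                new[z] += resp[z] / total
        theta = [v / len(ids) for v in new]
    return theta

# Hypothetical vocabulary and topics (assumed, not taken from the paper).
vocab = ["gene", "protein", "cell", "market", "stock", "price"]
topic_word = [
    [0.30, 0.30, 0.30, 0.04, 0.03, 0.03],  # a "biology" topic
    [0.03, 0.03, 0.04, 0.30, 0.30, 0.30],  # a "finance" topic
]
tokens = ["gene", "protein", "cell", "gene", "stock"]

bow = bow_vector(tokens, vocab)                       # length = vocabulary size
theta = topic_proportions(tokens, vocab, topic_word)  # length = number of topics
```

The BoW vector grows with the vocabulary, while the topic vector has one entry per topic, which is one plausible reason a classifier trained on it would run faster.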


Analisis Perbandingan Algoritma SVM, KNN, dan CNN untuk Klasifikasi Citra Cuaca [Comparative Analysis of SVM, KNN, and CNN Algorithms for Weather Image Classification]

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021824553 ◽

2021 ◽

Vol 8 (2) ◽

pp. 311

Author(s):

Mohammad Farid Naufal

Keyword(s):

Neural Network ◽

Machine Learning ◽

Computer Vision ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Cross Validation ◽

Nearest Neighbors ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbors

Weather is an important factor considered in many kinds of decision making. Manual weather classification by humans is time consuming and inconsistent. Computer vision is a branch of science that computers use to recognize or classify images. It can help develop autonomous machines that do not depend on an internet connection and can perform their own calculations in real time. There are several popular image classification algorithms, namely K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). KNN and SVM are Machine Learning classification algorithms, while CNN is a Deep Neural Network classification algorithm. This study aims to compare the performance of these three algorithms so that the performance gap between them is known. The test setup uses 5-fold cross validation. Several parameters are used to configure the KNN, SVM, and CNN algorithms. In the tests conducted, CNN has the best performance, with an accuracy of 0.942, precision of 0.943, recall of 0.942, and F1 score of 0.942.
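The cross-validation setup mentioned in this abstract can be sketched as below. This is a generic illustration of 5-fold splitting, not the study's code: the function names and the sample count of 12 are hypothetical, and the folds here are contiguous, whereas in practice the indices are usually shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 12 samples: each sample is tested exactly once.
splits = list(cross_validation_splits(12, k=5))
```

Each model is trained on four folds and evaluated on the held-out fold, and the five scores are then averaged.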


SENTIMENT ANALYSIS OF COVID-19 TWEETS

FUDMA Journal of Sciences ◽

10.33003/fjs-2021-0501-690 ◽

2021 ◽

Vol 5 (1) ◽

pp. 566-576

Author(s):

Azeez A. Nureni ◽

Victor E. Ogunlusi ◽

Emmanuel Junior Uloko

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Nearest Neighbors ◽

Support Vector ◽

Classification Algorithms ◽

Learning Approach ◽

K Nearest Neighbors ◽

Machine Learning Classification ◽

Global Pandemic ◽

Machine Learning Approach

Sentiment analysis involves techniques for analyzing texts in order to identify their dominant sentiment and emotion and to classify them accordingly. These techniques include, but are not limited to, text preprocessing and the use of a machine learning or lexical-based approach to classify the texts. In this research, a machine learning approach was adopted to classify tweets on Covid-19, which is considered a global pandemic. To this end, a cross-dataset approach was applied to train four machine learning classification algorithms: Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and K-Nearest Neighbors (KNN). The final result will not only identify the best-performing algorithm; it will also help raise awareness of Covid-19, with the ultimate objective of destigmatizing patients through the analysis of sentiments and emotions about Covid-19, and can be used to help contain the spread of the pandemic.


Mineração de Texto para a Análise do Perfil Emocional de Usuários de Jogo Empático [Text Mining for the Analysis of the Emotional Profile of Users of an Empathic Game]

10.14210/cotb.v12.p370-377 ◽

2021 ◽

Author(s):

Leonardo Dias Martins ◽

Fabíola Pantoja Oliveira Araújo

Keyword(s):

Naive Bayes ◽

Nearest Neighbors ◽

Naïve Bayes ◽

Support Vector ◽

The Internet ◽

Classification Algorithms ◽

K Nearest Neighbors ◽

The One ◽

Radial Kernel

Every day, a large amount of data circulates on the Internet, producing a great deal of information in the form of images, videos, and texts. It is therefore necessary to analyze and extract this information automatically. This work presents a case study that applies text mining to extract the emotional and sentimental profiles from comments by users of the game Last Day of June, and reports the results and information extracted from the sentiment analysis. Three classification algorithms, Naive Bayes, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), were used to predict the class of elements according to the emotions or feelings identified in the comment analysis. SVM with a radial kernel had the best accuracy, at 79%, followed by KNN with 3 nearest neighbors, at 75%, and finally Naive Bayes, at 62%.
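A KNN classifier with 3 nearest neighbors, as used in this study, can be sketched in a few lines. This is a generic illustration rather than the authors' pipeline: the two-dimensional feature vectors, the labels "sad" and "joy", and the function name `knn_predict` are hypothetical, and Euclidean distance is an assumed choice.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for comments (assumed, not from the paper).
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.9, 0.8), (0.8, 0.9), (0.85, 0.7)]
labels = ["sad", "sad", "sad", "joy", "joy", "joy"]

prediction_a = knn_predict(points, labels, (0.2, 0.2), k=3)
prediction_b = knn_predict(points, labels, (0.8, 0.8), k=3)
```

An odd `k` such as 3 avoids tied votes when there are only two classes.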


Using Data Mining Techniques to Analyze the Customers Reaction towards Social Media Advertisements

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1700.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 1139-1143

Keyword(s):

Data Mining ◽

Social Media ◽

Support Vector Machine ◽

Nearest Neighbors ◽

Support Vector ◽

Classification Algorithms ◽

Bayes Classifier ◽

K Nearest Neighbors ◽

Online Social Media ◽

Using Data

As social media booms, it is becoming much easier for customers to share their views and comments and to express their feelings about products that appear on online social media. If these data can be analyzed efficiently, suggestions can be provided to the company on how to improve its product sales. It also becomes easier for the company to understand customers' reactions after they see advertisements for the products posted on social media. This research focuses on analyzing the sentiments of customers based on the comments and reviews of products available on Facebook. Sentiment analysis is performed to classify the customer comments as positive, negative, or neutral, and they are later labeled as 0 or 1. After the labeling process, a comparative analysis is performed using different classification algorithms: K Nearest Neighbors (KNN), Support Vector Machine (SVM), and the Naïve Bayes classifier. The classification algorithm with the highest accuracy is identified in order to predict the sales of online products.


On relationships between imbalance and overlapping of datasets

10.29007/h71z ◽

2020 ◽

Author(s):

Waleed Almutairi ◽

Ryszard Janicki

Keyword(s):

Support Vector Machines ◽

Performance Indicators ◽

Imbalanced Data ◽

Nearest Neighbors ◽

Support Vector ◽

Classification Algorithms ◽

Data Sets ◽

K Nearest Neighbors ◽

Imbalanced Data Sets ◽

Vector Machines

The paper deals with problems that imbalanced and overlapping datasets often encounter. Performance indicators such as accuracy, precision, and recall for imbalanced datasets, both with and without overlapping, are discussed and compared with the same performance indicators for balanced datasets with overlapping. Three popular classification algorithms, namely Decision Tree, KNN (k-Nearest Neighbors), and SVM (Support Vector Machines), are analyzed and compared.
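Why these indicators need care on imbalanced data can be shown with a small worked example. This is a generic sketch, not taken from the paper: the 5/95 class split, the always-negative predictor, and the function name `binary_metrics` are hypothetical, and precision and recall are defined as 0.0 when their denominator is zero, which is one common convention.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0     # convention when undefined
    return accuracy, precision, recall

# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

Here accuracy is 0.95 even though no positive sample is ever detected, so recall on the minority class is 0.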


Persian Handwritten Number Recognition Using Adapted Framing Feature and Support Vector Machines

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026816500048 ◽

2016 ◽

Vol 15 (01) ◽

pp. 1650004 ◽

Cited By ~ 3

Author(s):

Hedieh Sajedi ◽

Mehran Bahador

Keyword(s):

Support Vector Machines ◽

Recognition Rate ◽

Nearest Neighbors ◽

Polynomial Kernel ◽

Support Vector ◽

K Nearest Neighbors ◽

New Approach ◽

Number Recognition ◽

Vector Machines

In this paper, a new approach for the segmentation and recognition of Persian handwritten numbers is presented. The method combines the framing feature technique with the outer profile feature; we call this combination the adapted framing feature. In the proposed approach, segmentation of numbers into digits is carried out automatically. In the classification stage, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experiments are conducted on the IFHCDB database, consisting of 17,740 numeral images, and the HODA database, consisting of 102,352 numeral images. At the isolated digit level, a recognition rate of 99.27% is achieved on IFHCDB and 99.07% on HODA, in both cases using an SVM with a polynomial kernel. The experiments show that the proposed method yields higher accuracy than previous research.
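The polynomial kernel named in this abstract has the general form K(x, y) = (gamma * x·y + coef0)^degree. The sketch below illustrates that form only: the parameter names follow a common convention, and the default values and the input vectors are assumptions, since the abstract does not report the kernel settings used.

```python
import math

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def degree2_feature_map(v):
    """Explicit feature map for the 2-D kernel (<x, y>) ** 2."""
    return (v[0] ** 2, math.sqrt(2) * v[0] * v[1], v[1] ** 2)

# Hypothetical input vectors.
x, y = (1.0, 2.0), (3.0, 4.0)

cubic = polynomial_kernel(x, y)  # (11 + 1) ** 3

# With degree=2 and coef0=0, the kernel equals a dot product in feature space.
implicit = polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=0.0)
explicit = sum(a * b for a, b in zip(degree2_feature_map(x), degree2_feature_map(y)))
```

The last two values agree, which is the point of a kernel: the SVM works in the higher-dimensional feature space without computing the feature map explicitly.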


Preprocessing Unbalanced Data using Support Vector Machine with Method K-Nearest Neighbors for Cerebral Infarction Classification

Journal of Physics Conference Series ◽

10.1088/1742-6596/1752/1/012037 ◽

2021 ◽

Vol 1752 (1) ◽

pp. 012037

Author(s):

A G M Sari ◽

A M Putri ◽

Z Rustam ◽

J Pandelaki

Keyword(s):

Support Vector Machine ◽

Cerebral Infarction ◽

Nearest Neighbors ◽

Support Vector ◽

Unbalanced Data ◽

K Nearest Neighbors


Prediction of breast cancer using support vector machine and K-Nearest neighbors

2017 IEEE Region 10 Humanitarian Technology Conference (R10-HTC) ◽

10.1109/r10-htc.2017.8288944 ◽

2017 ◽

Cited By ~ 25

Author(s):

Md. Milon Islam ◽

Hasib Iqbal ◽

Md. Rezwanul Haque ◽

Md. Kamrul Hasan

Keyword(s):

Breast Cancer ◽

Support Vector Machine ◽

Nearest Neighbors ◽

Support Vector ◽

K Nearest Neighbors


Intelligent System to Classify Peanuts Varieties Using K-Nearest Neighbors (K-NN) and Support Vector Machine (SVM)

Communications in Computer and Information Science - Advanced Informatics for Computing Research ◽

10.1007/978-981-15-0108-1_33 ◽

2019 ◽

pp. 359-368

Author(s):

V. G. Narendra ◽

K. Govardhan Hegde

Keyword(s):

Support Vector Machine ◽

Intelligent System ◽

Nearest Neighbors ◽

Support Vector ◽

K Nearest Neighbors


Machine Learning Classification Algorithms to Predict aGvHD following Allo-HSCT: A Systematic Review

Methods of Information in Medicine ◽

10.1055/s-0040-1709150 ◽

2019 ◽

Vol 58 (06) ◽

pp. 205-212

Author(s):

Cirruse Salehnasab ◽

Abbas Hajifathali ◽

Farkhondeh Asadi ◽

Elham Roshandel ◽

Alireza Kazemi ◽

...

Keyword(s):

Machine Learning ◽

Systemic Review ◽

Predictor Variables ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbors ◽

Hematopoietic Stem ◽

Machine Learning Classification ◽

Graft Versus Host ◽

Meta Analyses

Background: Acute graft-versus-host disease (aGvHD) is the most important cause of mortality in patients receiving allogeneic hematopoietic stem cell transplantation. Because it occurs at the stage of severe tissue damage, it is diagnosed late. With the advancement of machine learning (ML), promising real-time models to predict aGvHD have emerged. Objective: This article aims to synthesize the literature on ML classification algorithms for predicting aGvHD, highlighting the algorithms and important predictor variables used. Methods: A systematic review of ML classification algorithms used to predict aGvHD was performed by searching the PubMed, Embase, Web of Science, Scopus, Springer, and IEEE Xplore databases up to April 2019, following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement. Studies focusing on the use of ML classification algorithms to predict aGvHD were considered. Results: After applying the inclusion and exclusion criteria, 14 studies were selected for evaluation. The algorithms used were Artificial Neural Network (79%), Support Vector Machine (50%), Naive Bayes (43%), k-Nearest Neighbors (29%), Regression (29%), and Decision Trees (14%). Many predictor variables were used in these studies, so we grouped them into more abstract categories: biomarkers, demographics, infections, clinical, genes, transplants, drugs, and other variables. Conclusion: Each of these ML algorithms has particular characteristics and different proposed predictors. These ML algorithms therefore appear to have high potential for predicting aGvHD if the modeling process is performed correctly.
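The reported percentages can be checked against the 14 selected studies with a little arithmetic. The sketch below assumes that each percentage is the share of the 14 studies that used the algorithm, rounded to the nearest whole percent; the abstract does not state the denominator, so this is an assumption, and the function name `matching_counts` is hypothetical.

```python
# Percentages as reported in the abstract.
reported = {
    "Artificial Neural Network": 79,
    "Support Vector Machine": 50,
    "Naive Bayes": 43,
    "k-Nearest Neighbors": 29,
    "Regression": 29,
    "Decision Trees": 14,
}
n_studies = 14

def matching_counts(percent, total):
    """Study counts out of total whose rounded percentage equals percent."""
    return [c for c in range(total + 1) if round(100 * c / total) == percent]

# Under the stated assumption, each percentage matches exactly one count.
counts = {name: matching_counts(p, n_studies) for name, p in reported.items()}
total_uses = sum(c[0] for c in counts.values())
```

Under that assumption the counts come to 11, 7, 6, 4, 4, and 2 studies, and their sum exceeds 14, which would mean that several studies evaluated more than one algorithm.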
