A Preliminary Performance Evaluation of K-means, KNN and EM Unsupervised Machine Learning Methods for Network Flow Classification

Alhamza Alalousi; Rozmie Razif; Mosleh AbuAlhaj; Mohammed Anbar; Shahrul Nizam

doi:10.11591/ijece.v6i2.pp778-784


International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i2.8909 ◽

2016 ◽

Vol 6 (2) ◽

pp. 778 ◽

Cited By ~ 1

Author(s):

Alhamza Alalousi ◽

Rozmie Razif ◽

Mosleh AbuAlhaj ◽

Mohammed Anbar ◽

Shahrul Nizam

Keyword(s):

Machine Learning ◽

Expectation Maximization ◽

Classification Accuracy ◽

Network Flow ◽

Processing Time ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Popular Method ◽

Flow Classification ◽

Machine Learning Methods

Unsupervised learning is a popular method for classifying unlabeled datasets, i.e., without prior knowledge of the data classes. Many unsupervised learning methods are used to inspect and classify network flows. This paper presents an in-depth study of three unsupervised classifiers, namely K-means, K-nearest neighbor, and expectation maximization. The methodologies, and how they are employed to classify network flows, are elaborated in detail. The three classifiers are evaluated using three metrics: classification accuracy, classification speed, and memory consumption. K-nearest neighbor gives the best results for accuracy and memory, while K-means has the lowest processing time.
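As a rough illustration of how one of these methods could be scored on accuracy and processing time, the sketch below runs a plain K-means (Lloyd's algorithm) on synthetic data. It is not the paper's implementation: the two flow features (mean packet size, flow duration), the class names, and the purity-based accuracy measure are all assumptions made for the example.

```python
import random
import time
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def purity(assign, labels):
    """Map each cluster to its majority label and score the agreement."""
    correct = 0
    for c in set(assign):
        member_labels = [lab for a, lab in zip(assign, labels) if a == c]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)


# Synthetic, hypothetical flow features: (mean packet size, flow duration).
rng = random.Random(1)
flows = [(rng.gauss(100, 5), rng.gauss(1, 0.1)) for _ in range(50)]
flows += [(rng.gauss(1400, 5), rng.gauss(30, 0.1)) for _ in range(50)]
labels = ["interactive"] * 50 + ["bulk"] * 50

start = time.perf_counter()
assign = kmeans(flows, k=2)
elapsed = time.perf_counter() - start
accuracy = purity(assign, labels)
print(f"accuracy={accuracy:.2f} time={elapsed * 1000:.2f} ms")
```

The same timing and scoring wrapper could be placed around other classifiers to compare them on equal terms; memory use would need a separate measurement, for example with `tracemalloc`.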


The Classification of Skateboarding Tricks : A Transfer Learning and Machine Learning Approach

Mekatronika ◽

10.15282/mekatronika.v2i2.6683 ◽

2020 ◽

Vol 2 (2) ◽

pp. 1-12

Author(s):

Muhammad Nur Aiman Shapiee ◽

Muhammad Ar Rahim Ibrahim ◽

Muhammad Amirul Abdullah ◽

Rabiu Muazu Musa ◽

Noor Azuan Abu Osman ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Olympic Games ◽

Learning Approach ◽

K Nearest Neighbor ◽

Test Dataset ◽

Machine Learning Approach ◽

Competitive Games

The skateboarding scene has reached new heights, particularly with its first appearance at the now delayed Tokyo Summer Olympic Games. Given the scale of such competitive games, advanced and innovative assessment approaches have increasingly gained the attention of relevant stakeholders, particularly with the interest in more objective evaluation. This study aims to classify skateboarding tricks, specifically Frontside 180, Kickflip, Ollie, Nollie Front Shove-it, and Pop Shove-it, through the integration of image processing, Transfer Learning (TL) for feature extraction, and a traditional Machine Learning (ML) classifier. A male skateboarder performed the five tricks, repeating each type of trick, and a YI Action camera captured the movement from a distance of 1.26 m. Features were then engineered and extracted from the image dataset by means of three TL models and subsequently classified using a k-Nearest Neighbor (k-NN) classifier. The initial experiments showed that MobileNet, NASNetMobile, and NASNetLarge coupled with optimized k-NN classifiers attain a classification accuracy (CA) of 95%, 92%, and 90%, respectively, on the test dataset. In addition, the robustness evaluation showed that the MobileNet+k-NN pipeline is more robust, as it provided a better average CA than the other pipelines. The study demonstrates that the proposed approach can classify the skateboard tricks adequately and could, in the long run, support judges in making more objective decisions.
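The final classification stage can be pictured with a minimal k-NN majority vote over feature vectors. This is only a sketch: the three-dimensional vectors below are made-up stand-ins for the much larger features a TL model would produce, and `knn_predict` is a hypothetical helper rather than the authors' code.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors standing in for transfer-learning embeddings.
train_x = [
    (0.90, 0.10, 0.00), (0.80, 0.20, 0.10), (0.85, 0.15, 0.05),
    (0.10, 0.90, 0.20), (0.20, 0.80, 0.10), (0.15, 0.85, 0.15),
]
train_y = ["Ollie"] * 3 + ["Kickflip"] * 3

print(knn_predict(train_x, train_y, (0.82, 0.18, 0.05)))
print(knn_predict(train_x, train_y, (0.12, 0.88, 0.10)))
```

The value of `k` and the distance metric are the usual tuning knobs for such a classifier; Euclidean distance and `k=3` are used here purely for illustration.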


Studi Komparasi Metode Machine Learning untuk Klasifikasi Citra Huruf Vokal Hiragana

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3083 ◽

2021 ◽

Vol 5 (3) ◽

pp. 905

Author(s):

Muhammad Afrizal Amrustian ◽

Vika Febri Muliati ◽

Elsa Elvira Awal

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Image Classification ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

The Comparative Study

Japanese is one of the most difficult languages to understand and read. Its writing system, which does not use the Latin alphabet, is what makes Japanese difficult to read. There are three types of Japanese script, namely kanji, katakana, and hiragana. Hiragana is the most commonly used of these scripts. In addition, hiragana has a cursive nature, so each person's handwriting differs. Machine learning methods can be used to read Japanese letters by recognizing images of the letters. The Japanese letters used in this study are the hiragana vowels. This study focuses on a comparative study of machine learning methods for the image classification of Japanese letters. The machine learning methods compared are Naïve Bayes, Support Vector Machine, Decision Tree, Random Forest, and K-Nearest Neighbor. The results of the comparative study show that K-Nearest Neighbor is the best method for image classification of hiragana vowels, with an accuracy of 89.4% and a low error rate.


Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Remote Sensing ◽

10.3390/rs12060914 ◽

2020 ◽

Vol 12 (6) ◽

pp. 914 ◽

Cited By ~ 4

Author(s):

Mahdieh Danesh Yazdi ◽

Zheng Kuang ◽

Konstantina Dimakopoulou ◽

Benjamin Barratt ◽

Esra Suel ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Meteorological Data ◽

Fine Particulate Matter ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Technological Advances

Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool in conducting health studies and identifying hotspots of pollution. Here, we have created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this to be due to the smaller spatial variation in pollutant levels in this area.
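To make the ensemble idea and the R² metric concrete, the sketch below combines three sets of predictions and scores them against observations. The abstract does not say how the learners' predictions were combined, so the unweighted mean used here is an assumption, and all numbers are invented for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def average_ensemble(*prediction_lists):
    """Unweighted mean of the learners' predictions (an assumed combiner)."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]


# Invented daily PM2.5 values and per-learner predictions.
observed = [10.0, 12.0, 15.0, 20.0]
rf_pred = [11.0, 12.0, 14.0, 19.0]
gbm_pred = [9.0, 13.0, 16.0, 22.0]
knn_pred = [10.0, 11.0, 15.0, 22.0]

ensemble = average_ensemble(rf_pred, gbm_pred, knn_pred)
for name, pred in [("RF", rf_pred), ("GBM", gbm_pred), ("KNN", knn_pred), ("ensemble", ensemble)]:
    print(name, round(r_squared(observed, pred), 3))
```

A weighted or stacked combiner could replace `average_ensemble` without changing how R² is computed.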


Classification of Micro-Damage in Piezoelectric Ceramics Using Machine Learning of Ultrasound Signals

Sensors ◽

10.3390/s19194216 ◽

2019 ◽

Vol 19 (19) ◽

pp. 4216 ◽

Cited By ~ 5

Author(s):

Gaurav Tripathi ◽

Habib Anowarul ◽

Krishna Agarwal ◽

Dilip Prasad

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Time Series Data ◽

Series Data ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Power Spectral ◽

Slope Change ◽

Domain Information

Ultrasound-based structural health monitoring of piezoelectric material is challenging if the damage changes at the microscale over time. Classifying geometrically similar damages with a difference in diameter as small as 100 μm is difficult using conventional sensing and signal analysis approaches. Here, we use an unconventional ultrasound sensing approach that collects information from the entire bulk of the material and investigate the applicability of machine learning approaches for classifying such similar defects. Our results show that appropriate feature design combined with a simple k-nearest neighbor classifier can provide up to 98% classification accuracy, even though conventional features for time-series data and a variety of classifiers cannot achieve close to 70% accuracy. The newly proposed hybrid feature, which combines frequency domain information in the form of power spectral density and time domain information in the form of sign of slope change, is a suitable feature for achieving the best classification accuracy on this challenging problem.
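One way to picture a hybrid feature of this kind is to concatenate a power spectral density estimate with a binary sequence marking where the slope of the signal changes sign. The sketch below is an interpretation under assumptions, not the authors' definition: the periodogram estimator, the encoding of slope sign changes, and the test signal are all chosen for illustration.

```python
import cmath
import math


def power_spectral_density(signal):
    """Periodogram estimate |X_k|^2 / N for k = 0..N//2, using a naive DFT."""
    n = len(signal)
    psd = []
    for k in range(n // 2 + 1):
        x_k = sum(
            s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)
        )
        psd.append(abs(x_k) ** 2 / n)
    return psd


def slope_sign_changes(signal):
    """1 where consecutive slopes have opposite signs (a local extremum), else 0."""
    slopes = [b - a for a, b in zip(signal, signal[1:])]
    return [1 if s1 * s2 < 0 else 0 for s1, s2 in zip(slopes, slopes[1:])]


def hybrid_feature(signal):
    """Concatenate frequency-domain and time-domain parts into one vector."""
    return power_spectral_density(signal) + slope_sign_changes(signal)


# Illustrative signal: a sine with 4 cycles over 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
psd = power_spectral_density(signal)
feature = hybrid_feature(signal)
print(len(feature), psd.index(max(psd)), sum(slope_sign_changes(signal)))
```

The resulting vector could then be passed to a distance-based classifier such as k-nearest neighbor; in practice the two parts would likely need scaling so that neither dominates the distance.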


Evaluating Machine Learning Methods for Predicting Diabetes among Female Patients in Bangladesh

Information ◽

10.3390/info11080374 ◽

2020 ◽

Vol 11 (8) ◽

pp. 374

Author(s):

Badiuzzaman Pranto ◽

Sk. Maliha Mehnaz ◽

Esha Bintee Mahid ◽

Imran Mahmud Sadman ◽

Ahsanur Rahman ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Machine Learning Techniques ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Machine Learning Methods ◽

Learning Techniques

Machine learning has a significant impact on different aspects of science and technology, including medical research and the life sciences. Diabetes mellitus, more commonly known as diabetes, is a chronic disease that involves abnormally high levels of glucose in the blood and the body's use of insulin. This article focuses on analyzing diabetes patients and detecting diabetes using different machine learning techniques to build a model with few dependencies based on the PIMA dataset. The model has been tested on an unseen portion of PIMA and also on a dataset collected from Kurmitola General Hospital, Dhaka, Bangladesh. The research demonstrates the performance of several classifiers trained on one country's diabetes dataset and tested on patients from a different country. We evaluated decision tree, K-nearest neighbor, random forest, and Naïve Bayes classifiers, and the results show that both random forest and Naïve Bayes performed well on both datasets.


Metabolic Syndrome Prediction Models Using Machine Learning and Sasang Constitution Type

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2021/8315047 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Ji-Eun Park ◽

Sujeong Mun ◽

Siwoo Lee

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Prediction Models ◽

Support Vector ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods ◽

Sasang Constitution ◽

Constitution Type ◽

Conventional Regression

Background. Machine learning may be a useful tool for predicting metabolic syndrome (MetS), and previous studies also suggest that the risk of MetS differs according to Sasang constitution type. The present study investigated the development of MetS prediction models utilizing machine learning methods and whether the incorporation of Sasang constitution type could improve the performance of those prediction models. Methods. Participants visiting a medical center for a health check-up were recruited in 2005 and 2006. Six kinds of machine learning were utilized (K-nearest neighbor, naive Bayes, random forest, decision tree, multilayer perceptron, and support vector machine), as was conventional logistic regression. Machine learning-derived MetS prediction models with and without the incorporation of Sasang constitution type were compared to investigate whether the former would predict MetS with higher sensitivity. Age, sex, education level, marital status, body mass index, stress, physical activity, alcohol consumption, and smoking were included as potentially predictive factors. Results. A total of 750/2,871 participants had MetS. Among the six types of machine learning methods investigated, multilayer perceptron and support vector machine exhibited the same performance as the conventional regression method, based on the areas under the receiver operating characteristic curves. The naive Bayes method exhibited the highest sensitivity (0.49), which was higher than that of the conventional regression method (0.39). The incorporation of Sasang constitution type improved the sensitivity of all of the machine learning methods investigated except for the K-nearest neighbor method. Conclusion. Machine learning-derived models may be useful for MetS prediction, and the incorporation of Sasang constitution type may increase the sensitivity of such models.


COMPARISON OF MACHINE LEARNING METHODS IN CLASSIFYING POVERTY IN INDONESIA IN 2018

Jurnal Teknik Informatika (Jutif) ◽

10.20884/1.jutif.2021.2.1.52 ◽

2021 ◽

Vol 2 (1) ◽

pp. 51-56

Author(s):

Pardomuan Robinson Sihombing ◽

Ade Marsinta Arsani

Keyword(s):

Machine Learning ◽

Sampling Method ◽

Nearest Neighbor ◽

Choice Model ◽

Imbalanced Data ◽

K Nearest Neighbor ◽

Learning Methods ◽

Rotation Forest ◽

Machine Learning Classification ◽

Machine Learning Methods

Poverty is still one of the main problems in economic development, besides inequality, unemployment, and economic growth. This study aims to model poverty directly using a discrete choice model, namely the machine learning classification method. The data used are imbalanced, with one of the categories being quite small, so resampling with the "both" sampling method is used. In this study, several machine learning methods were applied, including Decision Tree, Naïve Bayes, K-Nearest Neighbor (KNN), and Rotation Forest. The results show that resampling with the "both" sampling method provides optimal results for the four machine learning methods. Judged by accuracy, specificity, sensitivity, AUC, and the Kappa coefficient, the best method is KNN. The KNN model has an accuracy of 0.73, sensitivity of 0.68, specificity of 0.78, and AUC of 0.73.
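The evaluation indicators named here can all be derived from a binary confusion matrix, as the following sketch shows. The counts are illustrative only and are not the study's data; they were picked so that sensitivity comes out at 0.68 and specificity at 0.78 on a balanced sample of 200 cases.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 table."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_yes = ((tp + fn) / total) * ((tp + fp) / total)
    p_no = ((fp + tn) / total) * ((fn + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts: 100 poor and 100 non-poor households after resampling.
metrics = binary_metrics(tp=68, fn=32, fp=22, tn=78)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On a balanced sample, accuracy is the mean of sensitivity and specificity, which is why resampling an imbalanced dataset changes how these indicators relate to one another.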


STATISTICAL PREDICTION OF EMOTIONAL STATES BY PHYSIOLOGICAL SIGNALS WITH MANOVA AND MACHINE LEARNING

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001412500085 ◽

2012 ◽

Vol 26 (04) ◽

pp. 1250008 ◽

Cited By ~ 7

Author(s):

TUNG-HUNG CHUEH ◽

TAI-BEEN CHEN ◽

HENRY HORNG-SHING LU ◽

SHAN-SHAN JU ◽

TEH-HO TAO ◽

...

Keyword(s):

Machine Learning ◽

Logistic Model ◽

Nearest Neighbor ◽

Statistical Technique ◽

Physiological Signals ◽

Statistical Prediction ◽

Emotional States ◽

K Nearest Neighbor ◽

Learning Methods ◽

Machine Learning Methods

Given the importance of communication between humans and machine interfaces, it would be valuable to develop a tool that can recognize emotional states. In this paper, we propose an approach that can deal with the daily dependence and personal dependence in the data of multiple subjects and samples. Thirty features were extracted from the physiological signals of the subjects for three states of emotion. The physiological signals measured were electrocardiogram (ECG), skin temperature (SKT), and galvanic skin response (GSR). After removing the daily dependence and personal dependence with the statistical technique of MANOVA, six machine learning methods, including Bayesian network learning, naive Bayesian classification, SVM, C4.5 decision tree, Logistic model, and K-nearest-neighbor (KNN), were implemented to differentiate the emotional states. The results showed that the Logistic model gives the best classification accuracy and that the statistical technique of MANOVA can significantly improve the performance of all six machine learning methods in an emotion recognition system.


The Tomatoes and Chilies Type Classifications by Using Machine Learning Methods

Journal of Development Research ◽

10.28926/jdr.v4i1.93 ◽

2020 ◽

Vol 4 (1) ◽

pp. 1-6

Author(s):

Irzal Ahmad Sabilla ◽

Chastine Fatichah

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Support Vector ◽

Staple Food ◽

K Nearest Neighbor ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods

Vegetables such as tomatoes and chilies are used as flavoring ingredients. Both of these ingredients are processed into sauces and seasonings to accompany people's staple food. These vegetables can be found easily in supermarkets, but many people do not know how to choose the type and quality of chilies and tomatoes. This study discusses the classification of cayenne, curly, green, and red chilies and of tomatoes in good and bad condition using machine learning and contrast enhancement techniques. The machine learning methods used are Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Linear Discriminant Analysis (LDA), and Random Forest (RF). The best method is determined based on accuracy. In addition to accuracy, this study also measures computation speed to assess the efficiency of the methods used.
