Map-Reduce based Distance Weighted k-Nearest Neighbor Machine Learning Algorithm for Big Data Applications

Author(s):  
Gothai E ◽  
Usha Moorthy ◽  
Sathishkumar V E ◽  
Abeer Ali Alnuaim ◽  
Wesam Atef Hatamleh ◽  
...  

Abstract With the evolution of Internet standards and advancements in various Internet and mobile technologies, especially since web 4.0, more and more web and mobile applications have emerged, such as e-commerce, social networks, online gaming and Internet of Things based applications. Due to the deployment and concurrent access of these applications on the Internet and mobile devices, the amount and variety of data generated increase exponentially, and the new era of Big Data has come into existence. Presently available data structures and data-analysis algorithms are not capable of handling such Big Data. Hence, there is a need for scalable, flexible, parallel and intelligent data-analysis algorithms to handle and analyze complex massive data. In this article, we propose a novel distributed supervised machine learning algorithm based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, called MR-DWkNN, to process and analyze Big Data in a Hadoop cluster environment. The proposed distributed algorithm is based on supervised learning and performs both regression and classification tasks on large-volume Big Data applications. Three performance metrics, namely Root Mean Squared Error (RMSE) and the coefficient of determination (R2) for the regression task, and Accuracy for classification tasks, are utilized to measure the performance of the proposed MR-DWkNN algorithm. The extensive experimental results show an average increase of 3–4.5% in prediction and classification performance compared to the standard distributed k-NN algorithm, and a considerable decrease in RMSE, with good parallelism characteristics of scalability and speedup, thus proving its effectiveness in Big Data predictive and classification applications.
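The core of MR-DWkNN is an inverse-distance-weighted neighbor vote computed in parallel: each mapper finds local k-nearest candidates on its data split, and a reducer merges them into a global weighted decision. The following is a minimal Python sketch of that idea under simplified assumptions (in-memory lists instead of Hadoop splits, hypothetical function names); it is not the authors' implementation.

```python
# Minimal sketch of distance-weighted k-NN voting in a MapReduce style.
# Illustrative only: all names and data here are hypothetical.
import heapq
from collections import defaultdict

def map_phase(partition, query, k):
    """Each mapper emits the k nearest training points of its data split."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
             for x, label in partition]
    return heapq.nsmallest(k, dists)          # local k-nearest candidates

def reduce_phase(candidate_lists, k, eps=1e-9):
    """The reducer merges local candidates and takes a distance-weighted vote."""
    merged = heapq.nsmallest(k, (c for lst in candidate_lists for c in lst))
    votes = defaultdict(float)
    for dist, label in merged:
        votes[label] += 1.0 / (dist + eps)    # closer neighbours weigh more
    return max(votes, key=votes.get)

# Toy example with two "splits" of training data
split_a = [((1.0, 1.0), "A"), ((1.2, 0.9), "A")]
split_b = [((4.0, 4.2), "B"), ((3.9, 4.1), "B")]
query = (1.1, 1.0)
local_results = [map_phase(s, query, k=3) for s in (split_a, split_b)]
print(reduce_phase(local_results, k=3))       # -> "A"
```

For the regression task, the reducer would instead return the distance-weighted mean of the neighbors' target values.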

2021 ◽  
Vol 8 ◽  
Author(s):  
Xueyuan Huang ◽  
Yongjun Wang ◽  
Bingyu Chen ◽  
Yuanshuai Huang ◽  
Xinhua Wang ◽  
...  

Background: Predicting the perioperative requirement for red blood cell (RBC) transfusion in patients with pelvic fracture may be challenging. In this study, we constructed a perioperative RBC transfusion predictive model (ternary classification) based on a machine learning algorithm. Materials and Methods: This study included perioperative adult patients with pelvic trauma hospitalized across six Chinese centers between September 2012 and June 2019. An extreme gradient boosting (XGBoost) algorithm was used to predict the need for perioperative RBC transfusion, with data split into a training set (80%), which was subjected to 5-fold cross-validation, and a test set (20%). The predictive ability of the transfusion model was compared with blood preparation based on surgeons' experience and with other predictive models, including random forest, gradient boosting decision tree, K-nearest neighbor, logistic regression, and Gaussian naïve Bayes classifier models. Data of 33 patients from one of the hospitals were prospectively collected for model validation. Results: Among 510 patients, 192 (37.65%) did not receive any perioperative RBC transfusion, 127 (24.90%) received less-transfusion (RBCs < 4U), and 191 (37.45%) received more-transfusion (RBCs ≥ 4U). The machine learning-based transfusion predictive model produced the best performance, with an accuracy of 83.34% and a Kappa coefficient of 0.7967, compared with the other methods (blood preparation based on surgeons' experience with an accuracy of 65.94% and a Kappa coefficient of 0.5704; the random forest method with an accuracy of 82.35% and a Kappa coefficient of 0.7858; the gradient boosting decision tree with an accuracy of 79.41% and a Kappa coefficient of 0.7742; the K-nearest neighbor with an accuracy of 53.92% and a Kappa coefficient of 0.3341). In the prospective dataset, it also had a good performance, with an accuracy of 81.82%. Conclusion: This multicenter retrospective cohort study described the construction of an accurate model that could predict perioperative RBC transfusion in patients with pelvic fractures.
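As a rough illustration of the modeling pipeline described above (XGBoost, an 80/20 split, 5-fold cross-validation, accuracy and Cohen's kappa), the sketch below uses synthetic placeholder data; the features and labels are not the study's real variables.

```python
# Hedged sketch of a ternary XGBoost classification setup with an 80/20 split
# and 5-fold cross-validation, on synthetic stand-in data.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(510, 12))           # placeholder clinical features
y = rng.integers(0, 3, size=510)         # 0 = no transfusion, 1 = <4U, 2 = >=4U

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = XGBClassifier(eval_metric="mlogloss", random_state=0)
print("5-fold CV accuracy:", cross_val_score(model, X_tr, y_tr, cv=5).mean())

model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("test accuracy:", accuracy_score(y_te, pred))
print("Cohen's kappa:", cohen_kappa_score(y_te, pred))
```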


2018 ◽  
Vol 1 (2) ◽  
pp. 24-32
Author(s):  
Lamiaa Abd Habeeb

In this paper, we designed a system that extracts citizens' opinions about the Iraqi government and Iraqi politicians by analyzing their comments on Facebook (a social media network). Since the data are random and contain noise, we cleaned the text and built a stemmer to stem the words as much as possible; cleaning and stemming reduced the vocabulary from 28,968 to 17,083 words, which in turn reduced the memory size from 382,858 bytes to 197,102 bytes. Generally, there are two approaches to extracting users' opinions, namely the lexicon-based approach and the machine learning approach. In our work, the machine learning approach is applied with three machine learning algorithms: Naïve Bayes, K-Nearest Neighbor and the AdaBoost ensemble algorithm. For Naïve Bayes, we apply two models, Bernoulli and Multinomial. We found that Naïve Bayes with the Multinomial model gives the highest accuracy.
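A minimal sketch of the Bernoulli versus Multinomial Naïve Bayes comparison mentioned above, using a toy English comment set in place of the paper's cleaned and stemmed Arabic text:

```python
# Compare Bernoulli and Multinomial Naive Bayes on a tiny toy comment corpus.
# The corpus and labels are illustrative placeholders only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

comments = ["the government improved services", "bad policies and corruption",
            "great decision by the minister", "terrible performance again"]
labels = [1, 0, 1, 0]                          # 1 = positive opinion, 0 = negative

X = CountVectorizer().fit_transform(comments)  # bag-of-words counts
for name, clf in [("Bernoulli", BernoulliNB()), ("Multinomial", MultinomialNB())]:
    clf.fit(X, labels)
    print(name, "training accuracy:", clf.score(X, labels))
```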


Author(s):  
Shawni Dutta ◽  
Samir Kumar Bandyopadhyay

Term deposits can accelerate the finance sector while maximizing profit from both the bank's and the customer's perspective. This paper focuses on the likelihood of customers subscribing to a term deposit. Bank campaign efforts and customer details are influential when considering the possibility of a term deposit subscription. This paper provides an automated system that predicts term deposit investment possibilities in advance. A neural network (NN) combined with a stratified 10-fold cross-validation methodology is proposed as the predictive model, which is then compared with other benchmark classifiers such as the k-Nearest Neighbor (k-NN), Decision Tree (DT), and Multi-Layer Perceptron (MLP) classifiers. The experimental study concluded that the proposed model provides significantly better prediction results than the baseline models, with an accuracy of 88.32% and a Mean Squared Error (MSE) of 0.1168.
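A hedged sketch of the evaluation protocol described above (a neural-network classifier scored with stratified 10-fold cross-validation); scikit-learn's MLPClassifier stands in for the paper's network, and the bank-campaign features are synthetic placeholders:

```python
# Neural-network classifier evaluated with stratified 10-fold cross-validation
# on synthetic stand-in data for the bank-marketing features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))                  # placeholder campaign/customer features
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)  # subscribed or not

nn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(nn, X, y, cv=cv, scoring="accuracy")
print("mean 10-fold accuracy:", scores.mean())
```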


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Gholamreza Karamali ◽  
Akram Zardadi ◽  
Hamid Reza Moradi

In this paper, the set-membership affine projection (SM-AP) algorithm is utilized to censor non-informative data in big data applications. To this end, the probability distribution of the additive noise signal and the steady-state excess mean-squared error (EMSE) are employed to estimate the threshold parameter of the single-threshold SM-AP (ST-SM-AP) algorithm, aiming to attain the desired update rate. Furthermore, by defining an acceptable range for the error signal, the double-threshold SM-AP (DT-SM-AP) algorithm is proposed to detect very large errors due to irrelevant data such as outliers. The DT-SM-AP algorithm can censor non-informative and irrelevant data in big data applications, and it can improve the misalignment and convergence rate of the learning process with high computational efficiency. The simulation and numerical results corroborate the superiority of the proposed algorithms over traditional algorithms.
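The censoring idea can be illustrated with the simplest member of the SM-AP family, a set-membership NLMS update (projection order one) with two error thresholds: updates are skipped when the error is too small to be informative or too large to be trusted (a likely outlier). The thresholds and data below are illustrative, not the paper's EMSE-derived values.

```python
# Minimal sketch of double-threshold set-membership data censoring, using a
# set-membership NLMS filter (SM-AP with projection order one). Illustrative only.
import numpy as np

def dt_sm_nlms(x, d, num_taps, gamma_low, gamma_high, eps=1e-8):
    """Return filter weights and the fraction of censored (skipped) samples."""
    w = np.zeros(num_taps)
    skipped, iterations = 0, 0
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]        # regressor [x[n], x[n-1], ...]
        e = d[n] - w @ u                           # a priori error
        iterations += 1
        if abs(e) <= gamma_low or abs(e) >= gamma_high:
            skipped += 1                           # non-informative or outlier: censor
            continue
        mu = 1.0 - gamma_low / abs(e)              # step that lands on the error bound
        w += mu * e * u / (u @ u + eps)
    return w, skipped / iterations

# Toy system identification: d = h * x + small noise, plus a few large outliers
rng = np.random.default_rng(2)
h = np.array([0.6, -0.3, 0.1])
x = rng.normal(size=5000)
d = np.convolve(x, h, mode="full")[:len(x)] + 0.01 * rng.normal(size=len(x))
d[::500] += 5.0                                    # sparse outliers
w_hat, censored = dt_sm_nlms(x, d, num_taps=3, gamma_low=0.05, gamma_high=2.0)
print("estimated taps:", np.round(w_hat, 3), "censored fraction:", round(censored, 2))
```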


2020 ◽  
Vol 5 (2) ◽  
pp. 57
Author(s):  
Novia Hasdyna ◽  
Rozzi Kesuma Dinata

K-Nearest Neighbor (K-NN) is a machine learning algorithm used to classify data. This study aims to measure the performance of the K-NN algorithm using the Matthews Correlation Coefficient (MCC). The data used in this study are ornamental fish records consisting of 3 classes, named Premium, Medium, and Low. Analysis of the Matthews Correlation Coefficient for K-NN with Euclidean distance obtained the highest MCC value in the Medium class, which is 0.786542. The second highest MCC value is in the Premium class, which is 0.567434. The lowest MCC value is in the Low class, which is 0.435269. Overall, the MCC value is 0.596415.
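For reference, a per-class MCC of the kind reported above can be computed one-vs-rest from the predicted and true labels; the labels below are toy values, not the study's fish data.

```python
# Per-class (one-vs-rest) Matthews Correlation Coefficient on toy labels.
import numpy as np
from sklearn.metrics import matthews_corrcoef

y_true = np.array(["Premium", "Medium", "Low", "Medium", "Premium", "Low", "Medium"])
y_pred = np.array(["Premium", "Medium", "Medium", "Medium", "Low", "Low", "Medium"])

for cls in ("Premium", "Medium", "Low"):
    mcc = matthews_corrcoef(y_true == cls, y_pred == cls)   # one-vs-rest MCC
    print(cls, "MCC:", round(mcc, 4))
print("overall multiclass MCC:", round(matthews_corrcoef(y_true, y_pred), 4))
```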

