Map-Reduce based Distance Weighted k-Nearest Neighbor Machine Learning Algorithm for Big Data Applications

Author(s):  
Gothai E ◽  
Usha Moorthy ◽  
Sathishkumar V E ◽  
Abeer Ali Alnuaim ◽  
Wesam Atef Hatamleh ◽  
...  

Abstract With the evolution of Internet standards and advancements in various Internet and mobile technologies, especially since web 4.0, more and more web and mobile applications have emerged, such as e-commerce, social networks, online gaming and Internet of Things based applications. Due to the deployment and concurrent access of these applications on the Internet and mobile devices, the amount and variety of data generated increase exponentially, and the new era of Big Data has come into existence. Presently available data structures and data-analysis algorithms are not capable of handling such Big Data. Hence, there is a need for scalable, flexible, parallel and intelligent data-analysis algorithms to handle and analyze complex massive data. In this article, we propose a novel distributed supervised machine learning algorithm based on the MapReduce programming model and the Distance Weighted k-Nearest Neighbor algorithm, called MR-DWkNN, to process and analyze Big Data in a Hadoop cluster environment. The proposed distributed algorithm is based on supervised learning and performs both regression and classification tasks on large-volume Big Data applications. Three performance metrics, namely Root Mean Squared Error (RMSE) and the coefficient of determination (R2) for the regression task, and Accuracy for classification tasks, are utilized to measure the performance of the proposed MR-DWkNN algorithm. The extensive experimental results show an average increase of 3–4.5% in prediction and classification performance compared to the standard distributed k-NN algorithm, and a considerable decrease in RMSE, with good parallelism characteristics of scalability and speedup, thus proving its effectiveness in Big Data predictive and classification applications.
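The core of MR-DWkNN is an inverse-distance-weighted neighbor vote computed in parallel: each mapper finds local k-nearest candidates on its data split, and a reducer merges them into a global weighted decision. The following is a minimal Python sketch of that idea under simplified assumptions (in-memory lists instead of Hadoop splits, hypothetical function names); it is not the authors' implementation.

```python
# Minimal sketch of distance-weighted k-NN voting in a MapReduce style.
# Illustrative only: all names and data here are hypothetical.
import heapq
from collections import defaultdict

def map_phase(partition, query, k):
    """Each mapper emits the k nearest training points of its data split."""
    dists = [(sum((a - b) ** 2 for a, b in zip(x, query)) ** 0.5, label)
             for x, label in partition]
    return heapq.nsmallest(k, dists)          # local k-nearest candidates

def reduce_phase(candidate_lists, k, eps=1e-9):
    """The reducer merges local candidates and takes a distance-weighted vote."""
    merged = heapq.nsmallest(k, (c for lst in candidate_lists for c in lst))
    votes = defaultdict(float)
    for dist, label in merged:
        votes[label] += 1.0 / (dist + eps)    # closer neighbours weigh more
    return max(votes, key=votes.get)

# Toy example with two "splits" of training data
split_a = [((1.0, 1.0), "A"), ((1.2, 0.9), "A")]
split_b = [((4.0, 4.2), "B"), ((3.9, 4.1), "B")]
query = (1.1, 1.0)
local_results = [map_phase(s, query, k=3) for s in (split_a, split_b)]
print(reduce_phase(local_results, k=3))       # -> "A"
```

For the regression task, the reducer would instead return the distance-weighted mean of the neighbors' target values.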

2021 ◽  
Vol 8 ◽  
Author(s):  
Xueyuan Huang ◽  
Yongjun Wang ◽  
Bingyu Chen ◽  
Yuanshuai Huang ◽  
Xinhua Wang ◽  
...  

Background: Predicting the perioperative requirement for red blood cell (RBC) transfusion in patients with pelvic fracture may be challenging. In this study, we constructed a perioperative RBC transfusion predictive model (ternary classification) based on a machine learning algorithm. Materials and Methods: This study included perioperative adult patients with pelvic trauma hospitalized across six Chinese centers between September 2012 and June 2019. An extreme gradient boosting (XGBoost) algorithm was used to predict the need for perioperative RBC transfusion, with data split into a training set (80%), which was subjected to 5-fold cross-validation, and a test set (20%). The predictive ability of the transfusion model was compared with blood preparation based on surgeons' experience and with other predictive models, including random forest, gradient boosting decision tree, K-nearest neighbor, logistic regression, and Gaussian naïve Bayes classifier models. Data of 33 patients from one of the hospitals were prospectively collected for model validation. Results: Among 510 patients, 192 (37.65%) did not receive any perioperative RBC transfusion, 127 (24.90%) received less-transfusion (RBCs < 4U), and 191 (37.45%) received more-transfusion (RBCs ≥ 4U). The machine learning-based transfusion predictive model produced the best performance, with an accuracy of 83.34% and a Kappa coefficient of 0.7967, compared with the other methods (blood preparation based on surgeons' experience with an accuracy of 65.94% and a Kappa coefficient of 0.5704; the random forest method with an accuracy of 82.35% and a Kappa coefficient of 0.7858; the gradient boosting decision tree with an accuracy of 79.41% and a Kappa coefficient of 0.7742; the K-nearest neighbor with an accuracy of 53.92% and a Kappa coefficient of 0.3341). In the prospective dataset, it also had a good performance, with an accuracy of 81.82%. Conclusion: This multicenter retrospective cohort study described the construction of an accurate model that could predict perioperative RBC transfusion in patients with pelvic fractures.
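As a rough illustration of the modeling pipeline described above (XGBoost, an 80/20 split, 5-fold cross-validation, accuracy and Cohen's kappa), the sketch below uses synthetic placeholder data; the features and labels are not the study's real variables.

```python
# Hedged sketch of a ternary XGBoost classification setup with an 80/20 split
# and 5-fold cross-validation, on synthetic stand-in data.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(510, 12))           # placeholder clinical features
y = rng.integers(0, 3, size=510)         # 0 = no transfusion, 1 = <4U, 2 = >=4U

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = XGBClassifier(eval_metric="mlogloss", random_state=0)
print("5-fold CV accuracy:", cross_val_score(model, X_tr, y_tr, cv=5).mean())

model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("test accuracy:", accuracy_score(y_te, pred))
print("Cohen's kappa:", cohen_kappa_score(y_te, pred))
```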


2018 ◽  
Vol 1 (2) ◽  
pp. 24-32
Author(s):  
Lamiaa Abd Habeeb

In this paper, we designed a system that extracts citizens' opinions about the Iraqi government and Iraqi politicians by analyzing their comments on Facebook (a social media network). Since the data are random and contain noise, we cleaned the text and built a stemmer to stem the words as much as possible; cleaning and stemming reduced the vocabulary from 28,968 to 17,083 words, which in turn reduced the memory size from 382,858 bytes to 197,102 bytes. Generally, there are two approaches to extracting users' opinions, namely the lexicon-based approach and the machine learning approach. In our work, the machine learning approach is applied with three machine learning algorithms: Naïve Bayes, K-Nearest Neighbor and the AdaBoost ensemble algorithm. For Naïve Bayes, we apply two models, Bernoulli and Multinomial. We found that Naïve Bayes with the Multinomial model gives the highest accuracy.
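A minimal sketch of the Bernoulli versus Multinomial Naïve Bayes comparison mentioned above, using a toy English comment set in place of the paper's cleaned and stemmed Arabic text:

```python
# Compare Bernoulli and Multinomial Naive Bayes on a tiny toy comment corpus.
# The corpus and labels are illustrative placeholders only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

comments = ["the government improved services", "bad policies and corruption",
            "great decision by the minister", "terrible performance again"]
labels = [1, 0, 1, 0]                          # 1 = positive opinion, 0 = negative

X = CountVectorizer().fit_transform(comments)  # bag-of-words counts
for name, clf in [("Bernoulli", BernoulliNB()), ("Multinomial", MultinomialNB())]:
    clf.fit(X, labels)
    print(name, "training accuracy:", clf.score(X, labels))
```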


Author(s):  
Shawni Dutta ◽  
Samir Kumar Bandyopadhyay

Term deposits can accelerate the finance sector while maximizing profit from both the bank's and the customer's perspective. This paper focuses on the likelihood of customers subscribing to a term deposit. Bank campaign efforts and customer details are influential when considering the possibility of a term deposit subscription. This paper provides an automated system that predicts term deposit investment possibilities in advance. A neural network (NN) combined with a stratified 10-fold cross-validation methodology is proposed as the predictive model, which is then compared with other benchmark classifiers such as the k-Nearest Neighbor (k-NN), Decision Tree (DT), and Multi-Layer Perceptron (MLP) classifiers. The experimental study concluded that the proposed model provides significantly better prediction results than the baseline models, with an accuracy of 88.32% and a Mean Squared Error (MSE) of 0.1168.
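A hedged sketch of the evaluation protocol described above (a neural-network classifier scored with stratified 10-fold cross-validation); scikit-learn's MLPClassifier stands in for the paper's network, and the bank-campaign features are synthetic placeholders:

```python
# Neural-network classifier evaluated with stratified 10-fold cross-validation
# on synthetic stand-in data for the bank-marketing features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))                  # placeholder campaign/customer features
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)  # subscribed or not

nn = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=1)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(nn, X, y, cv=cv, scoring="accuracy")
print("mean 10-fold accuracy:", scores.mean())
```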


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Gholamreza Karamali ◽  
Akram Zardadi ◽  
Hamid Reza Moradi

In this paper, the set-membership affine projection (SM-AP) algorithm is utilized to censor non-informative data in big data applications. To this end, the probability distribution of the additive noise signal and the steady-state excess mean-squared error (EMSE) are employed to estimate the threshold parameter of the single-threshold SM-AP (ST-SM-AP) algorithm, aiming to attain the desired update rate. Furthermore, by defining an acceptable range for the error signal, the double-threshold SM-AP (DT-SM-AP) algorithm is proposed to detect very large errors due to irrelevant data such as outliers. The DT-SM-AP algorithm can censor non-informative and irrelevant data in big data applications, and it can improve the misalignment and convergence rate of the learning process with high computational efficiency. The simulation and numerical results corroborate the superiority of the proposed algorithms over traditional algorithms.
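The censoring idea can be illustrated with the simplest member of the SM-AP family, a set-membership NLMS update (projection order one) with two error thresholds: updates are skipped when the error is too small to be informative or too large to be trusted (a likely outlier). The thresholds and data below are illustrative, not the paper's EMSE-derived values.

```python
# Minimal sketch of double-threshold set-membership data censoring, using a
# set-membership NLMS filter (SM-AP with projection order one). Illustrative only.
import numpy as np

def dt_sm_nlms(x, d, num_taps, gamma_low, gamma_high, eps=1e-8):
    """Return filter weights and the fraction of censored (skipped) samples."""
    w = np.zeros(num_taps)
    skipped, iterations = 0, 0
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]        # regressor [x[n], x[n-1], ...]
        e = d[n] - w @ u                           # a priori error
        iterations += 1
        if abs(e) <= gamma_low or abs(e) >= gamma_high:
            skipped += 1                           # non-informative or outlier: censor
            continue
        mu = 1.0 - gamma_low / abs(e)              # step that lands on the error bound
        w += mu * e * u / (u @ u + eps)
    return w, skipped / iterations

# Toy system identification: d = h * x + small noise, plus a few large outliers
rng = np.random.default_rng(2)
h = np.array([0.6, -0.3, 0.1])
x = rng.normal(size=5000)
d = np.convolve(x, h, mode="full")[:len(x)] + 0.01 * rng.normal(size=len(x))
d[::500] += 5.0                                    # sparse outliers
w_hat, censored = dt_sm_nlms(x, d, num_taps=3, gamma_low=0.05, gamma_high=2.0)
print("estimated taps:", np.round(w_hat, 3), "censored fraction:", round(censored, 2))
```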


2020 ◽  
Vol 5 (2) ◽  
pp. 57
Author(s):  
Novia Hasdyna ◽  
Rozzi Kesuma Dinata

K-Nearest Neighbor (K-NN) is a machine learning algorithm used to classify data. This study aims to measure the performance of the K-NN algorithm using the Matthews Correlation Coefficient (MCC). The data used in this study are ornamental fish records consisting of 3 classes, named Premium, Medium, and Low. Analysis of the Matthews Correlation Coefficient for K-NN with Euclidean distance obtained the highest MCC value in the Medium class, which is 0.786542. The second highest MCC value is in the Premium class, which is 0.567434. The lowest MCC value is in the Low class, which is 0.435269. Overall, the MCC value is 0.596415.
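For reference, a per-class MCC of the kind reported above can be computed one-vs-rest from the predicted and true labels; the labels below are toy values, not the study's fish data.

```python
# Per-class (one-vs-rest) Matthews Correlation Coefficient on toy labels.
import numpy as np
from sklearn.metrics import matthews_corrcoef

y_true = np.array(["Premium", "Medium", "Low", "Medium", "Premium", "Low", "Medium"])
y_pred = np.array(["Premium", "Medium", "Medium", "Medium", "Low", "Low", "Medium"])

for cls in ("Premium", "Medium", "Low"):
    mcc = matthews_corrcoef(y_true == cls, y_pred == cls)   # one-vs-rest MCC
    print(cls, "MCC:", round(mcc, 4))
print("overall multiclass MCC:", round(matthews_corrcoef(y_true, y_pred), 4))
```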

