K-NN supervised learning algorithm in the predictive analysis of the quality of the university administrative service in the virtual environment

Author(s):  
Omar Freddy Chamorro-Atalaya ◽  
Guillermo Morales Romero ◽  
Adrián Quispe Andía ◽  
Beatriz Caycho Salas ◽  
Elizabeth Katerin Auqui Ramos ◽  
...  

The objective of this study is to analyze and discuss the metrics of a predictive model based on the K-nearest neighbor (K-NN) learning algorithm, applied to data on engineering students' perception of the quality of the virtual administrative service. As part of the methodology, the accuracy, precision, sensitivity and specificity indicators were analyzed, based on the confusion matrix and the receiver operating characteristic (ROC) curve. The collected data were validated with Cronbach's alpha, which gave consistency values above 0.9 and allowed the analysis to proceed. From the predictive model built in Matlab R2021a, it was concluded that the average metrics across all classes are optimal: a precision of 92.77%, a sensitivity of 86.62%, and a specificity of 94.7%, with a total accuracy of 85.5%. In addition, the highest area under the curve (AUC) value is 0.98, which is why the model is considered an optimal predictive model. This study can contribute significantly to the higher education institution's decision-making on improving the quality of the virtual administrative service.
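The per-class indicators named above can all be read off a multi-class confusion matrix by treating each class one-versus-rest and then averaging over classes. The sketch below is illustrative only: it assumes rows are actual classes and columns are predicted classes, uses unweighted (macro) averaging, and the 3×3 matrix is invented rather than taken from the study.

```python
def class_metrics(cm):
    """Per-class (precision, sensitivity, specificity) from a square confusion
    matrix where cm[i][j] counts samples of actual class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # actual c, predicted another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually another class
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class.append((precision, sensitivity, specificity))
    return per_class


def summary(cm):
    """Macro-averaged precision/sensitivity/specificity plus overall accuracy."""
    per_class = class_metrics(cm)
    n = len(per_class)
    macro = [sum(m[i] for m in per_class) / n for i in range(3)]
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(len(cm))) / total
    return {"precision": macro[0], "sensitivity": macro[1],
            "specificity": macro[2], "accuracy": accuracy}


# Hypothetical 3-class confusion matrix (not the study's data)
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 39]]
print(summary(cm))
```

Because the averages are computed per class, macro-averaged precision or specificity can exceed the overall accuracy, which is the pattern seen in the figures reported in the abstract.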

2021 ◽  
Author(s):  
Ji Su Ko ◽  
Jieun Byun ◽  
Seongkeun Park ◽  
Ji Young Woo

Abstract We retrospectively assessed 214 patients with chronic liver disease or liver cirrhosis who underwent magnetic resonance imaging (MRI) enhanced with gadolinium ethoxybenzyl diethylenetriamine pentaacetic acid (Gd-EOB-DTPA) from August 2016 to May 2020 to evaluate the relationship between biochemical results that reflect liver function and hepatic enhancement. With the information gained, we employed a machine learning approach with the K-Nearest Neighbor (KNN) algorithm to develop a predictive model for identifying insufficient hepatic enhancement during the hepatobiliary phase (HBP) in Gd-EOB-DTPA-enhanced MRI. In both quantitative and qualitative assessments, total bilirubin (TB), albumin (Alb), prothrombin time-international normalized ratio, platelet count, Child-Pugh score (CPS), and Model for End-stage Liver Disease Sodium (MELD-Na) score were related to decreased hepatic enhancement. In a multivariate analysis, TB and Alb were associated with insufficient enhancement (p < 0.001). The predictive model showed that a combination of biochemical parameters performed better (accuracy = 82.8%, area under the curve (AUC) = 0.861) in predicting insufficient enhancement than either the CPS (accuracy = 79.5%, AUC = 0.845) or the MELD-Na score (accuracy = 80.8%, AUC = 0.821). Using a machine-learning-based predictive model with the KNN algorithm, radiologists can predict insufficient hepatic enhancement during HBP in advance and individually optimize each patient's MRI protocol.
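A KNN classifier can output a continuous score (the fraction of the k nearest training patients labeled as insufficient enhancement), which is what an AUC evaluation needs. The sketch below is a minimal illustration under assumptions, not the authors' pipeline: features are z-scored so that variables in different units (such as TB and Alb) contribute comparably, k = 3 is arbitrary, the AUC is computed as the probability that a random positive case scores higher than a random negative one (ties count one half), and all values are hypothetical.

```python
import math


def standardize(train, query):
    """Z-score each feature with the training set's mean and (population) std."""
    d = len(train[0])
    means = [sum(x[j] for x in train) / len(train) for j in range(d)]
    stds = [math.sqrt(sum((x[j] - means[j]) ** 2 for x in train) / len(train)) or 1.0
            for j in range(d)]

    def scale(x):
        return [(x[j] - means[j]) / stds[j] for j in range(d)]

    return [scale(x) for x in train], [scale(x) for x in query]


def knn_scores(train_x, train_y, query_x, k=3):
    """Score = fraction of the k nearest (Euclidean) training samples labeled 1."""
    scores = []
    for q in query_x:
        ranked = sorted((math.dist(q, x), y) for x, y in zip(train_x, train_y))
        scores.append(sum(y for _, y in ranked[:k]) / k)
    return scores


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical [total bilirubin (mg/dL), albumin (g/dL)]; label 1 = insufficient enhancement
train = [[0.8, 4.2], [1.0, 4.0], [0.9, 3.9], [3.5, 2.8], [4.0, 2.5], [2.8, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
queries = [[1.1, 4.1], [3.2, 2.7]]
query_labels = [0, 1]

tr, te = standardize(train, queries)
scores = knn_scores(tr, labels, te, k=3)
print(scores, auc(query_labels, scores))
```

Scaling matters here because plain Euclidean distance would otherwise be dominated by whichever laboratory value has the largest numeric range.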


2017 ◽  
Vol 15 (1) ◽  
pp. 52-68
Author(s):  
Ming Liu ◽  
Yuqi Wang ◽  
Weiwei Xu ◽  
Li Liu

The number of Chinese engineering students has increased greatly since 1999, so rating the quality of these students' English essays has become time-consuming and challenging. This paper presents a novel automatic essay scoring algorithm called PSO-SVR, which combines a machine learning algorithm, Support Vector Machine for Regression (SVR), with a computational intelligence algorithm, Particle Swarm Optimization, used to optimize the parameters of the SVR kernel functions. Three groups of essays, written by chemical, electrical, and computer science engineering majors respectively, were used for evaluation. The results show that PSO-SVR outperforms traditional essay scoring algorithms such as multiple linear regression, support vector machine for regression, and the K-nearest neighbor algorithm. This indicates that PSO-SVR is more robust in predicting irregular datasets, in which the repeated use of simple content words may lead to a low essay score even when the system detects high cohesion and no spelling errors.
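Particle Swarm Optimization searches a continuous parameter space with a population of candidate solutions, each pulled toward its own best position and toward the swarm's best position. The sketch below is a generic global-best PSO, not the authors' PSO-SVR: the objective is a hypothetical stand-in for the validation error an SVR would report for a pair of kernel parameters (for example log C and log gamma), and the coefficients w = 0.7 and c1 = c2 = 1.5 are common defaults rather than values from the paper.

```python
import random


def pso(objective, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box given by `bounds` = [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical stand-in for validation error as a function of (log C, log gamma)
def fake_validation_error(p):
    log_c, log_gamma = p
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


best, err = pso(fake_validation_error, bounds=[(-5, 5), (-5, 5)])
print(best, err)
```

In a real PSO-SVR setup, each objective call would train and validate an SVR, so the number of particles times iterations directly sets the training cost.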


2021 ◽  
pp. 1-17
Author(s):  
Ahmed Al-Tarawneh ◽  
Ja’afer Al-Saraireh

Twitter is one of the most popular platforms for sharing and posting ideas. Hackers and anonymous attackers use such platforms maliciously, and their behavior can be used to predict the risk of future attacks by gathering and classifying hackers’ tweets with machine-learning techniques. Previous approaches for detecting infected tweets rely on human effort or text analysis and are therefore limited in capturing the hidden text between tweet lines. The main aim of this research paper is to improve the efficiency of hacker detection on the Twitter platform by using the complex networks technique with adapted machine learning algorithms. This work presents a methodology that collects, from a hackers’ community on Twitter, a list of users with similar interests together with the followers who share their posts. The list is built from a set of suggested keywords that hackers commonly use in their tweets. A complex network is then generated for all users to find the relations among them in terms of network centrality, closeness, and betweenness. From these values, a dataset of the most influential users in the hacker community is assembled. Tweets belonging to the users in this dataset are then gathered and classified into positive and negative classes, and the output of this process is fed to a machine learning stage that applies several algorithms. This research builds and investigates an accurate dataset containing real users who belong to a hackers’ community. Accuracy, measured as the proportion of correctly classified instances averaged over the K-nearest neighbor, Naive Bayes, Random Tree, and support vector machine techniques, was about 90% with cross-validation and 88% with a percentage split.
Consequently, the proposed network cyber Twitter model is able to detect hackers and determine whether tweets pose a future risk to institutions and individuals, providing early warning of possible attacks.
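Closeness and betweenness centrality, which the methodology uses to find influential users, can be computed on an unweighted follower graph with breadth-first search. The sketch below is a minimal version under assumptions: the graph is treated as undirected, betweenness uses Brandes' algorithm without normalization, and the user names and edges are hypothetical. The paper may have used directed edges or normalized scores.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(graph):
    """(reachable nodes - 1) / sum of distances to them; 0.0 for isolated nodes."""
    result = {}
    for v in graph:
        d = bfs_distances(graph, v)
        total = sum(d.values())
        result[v] = (len(d) - 1) / total if total else 0.0
    return result


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint
    return {v: b / 2 for v, b in bc.items()}


# Hypothetical interaction graph: u2 bridges two groups of users
g = {"u1": ["u2"], "u2": ["u1", "u3", "u4"], "u3": ["u2"],
     "u4": ["u2", "u5"], "u5": ["u4"]}
print(closeness(g))
print(betweenness(g))
```

Ranking users by these scores (for instance, keeping the top few by betweenness) is one plausible way to assemble an "influential users" list; the abstract does not state the exact selection rule.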


Mathematics ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 830
Author(s):  
Seokho Kang

k-nearest neighbor (kNN) is a widely used learning algorithm for supervised learning tasks. In practice, the main challenge when using kNN is its high sensitivity to its hyperparameter setting, including the number of nearest neighbors k, the distance function, and the weighting function. To improve the robustness to hyperparameters, this study presents a novel kNN learning method based on a graph neural network, named kNNGNN. Given training data, the method learns a task-specific kNN rule in an end-to-end fashion by means of a graph neural network that takes the kNN graph of an instance to predict the label of the instance. The distance and weighting functions are implicitly embedded within the graph neural network. For a query instance, the prediction is obtained by performing a kNN search from the training data to create a kNN graph and passing it through the graph neural network. The effectiveness of the proposed method is demonstrated using various benchmark datasets for classification and regression tasks.
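The three hyperparameters named above (the number of neighbors k, the distance function, and the weighting function) are easy to see in a conventional kNN classifier. The sketch below is plain kNN, not kNNGNN; it exposes those three choices as arguments so that their effect on a single prediction can be compared. The inverse-distance weighting with a small epsilon is one common choice, assumed here for illustration, and the toy data are invented.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def uniform(d):
    return 1.0


def inverse_distance(d, eps=1e-9):
    return 1.0 / (d + eps)


def knn_predict(train_x, train_y, query, k=3, distance=euclidean, weight=uniform):
    """Weighted majority vote among the k nearest training samples."""
    neighbors = sorted(zip(train_x, train_y), key=lambda xy: distance(query, xy[0]))[:k]
    votes = defaultdict(float)
    for x, y in neighbors:
        votes[y] += weight(distance(query, x))
    return max(votes, key=votes.get)


# Toy data: the predicted label for q changes with the hyperparameters
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0], [3.5, 0.5]]
y = ["A", "A", "B", "B", "B"]
q = [1.2, 0.5]
print(knn_predict(X, y, q, k=1))                           # 'A'
print(knn_predict(X, y, q, k=5))                           # 'B'
print(knn_predict(X, y, q, k=5, weight=inverse_distance))  # 'A'
```

The same query flips between labels as k and the weighting change, which is the kind of sensitivity the method above aims to reduce by learning the rule end to end.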


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1797
Author(s):  
Ján Vachálek ◽  
Dana Šišmišová ◽  
Pavol Vašek ◽  
Jan Rybář ◽  
Juraj Slovák ◽  
...  

The article deals with aspects of identifying industrial products in motion based on their color. For this purpose, an automated robotic workplace with a conveyor belt, a robot, and an industrial color sensor is created. Measured data are processed in a database and then statistically evaluated in the form of type A and type B standard uncertainties in order to obtain combined standard uncertainty results. Based on the acquired data, control charts of the RGB color components are created for the identified products. The influence of product speed on identification in the measuring process and on process stability is monitored. In cases of identification uncertainty, i.e., when measured values fall outside the control-chart limits, the K-nearest neighbor machine learning algorithm is used. Based on the Euclidean distances to the classified value, this algorithm estimates its closest match. The result is a comprehensive system for identifying products moving on a conveyor belt, in which data collection and statistical analysis combined with machine learning demonstrate reliability for industrial use.
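The decision logic described above (accept a reading that falls inside the control-chart limits, otherwise fall back to a Euclidean nearest-neighbor match on RGB values) can be sketched as follows. Assumptions not taken from the article: the limits are mean ± 3 standard deviations per channel, the type A standard uncertainty is the standard deviation of the mean, the type B value is supplied by the caller (for example from a sensor specification), and the reference colors and product names are hypothetical.

```python
import math
import statistics
from collections import Counter


def type_a_uncertainty(samples):
    """Standard uncertainty of the mean: s / sqrt(n)."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def combined_uncertainty(u_a, u_b):
    """Root-sum-square combination of type A and type B standard uncertainties."""
    return math.sqrt(u_a ** 2 + u_b ** 2)


def control_limits(samples):
    """(lower, upper) limits at mean +/- 3 sample standard deviations."""
    m, s = statistics.mean(samples), statistics.stdev(samples)
    return m - 3 * s, m + 3 * s


def identify(rgb, charts, references, k=1):
    """Return the product whose limits contain `rgb` on every channel;
    otherwise vote among the k nearest reference colors (Euclidean)."""
    for product, limits in charts.items():
        if all(lo <= c <= hi for c, (lo, hi) in zip(rgb, limits)):
            return product
    nearest = sorted(references, key=lambda ref: math.dist(rgb, ref[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical readings of a red product (R, G, B)
red_readings = [(200, 30, 25), (205, 28, 27), (198, 32, 24), (202, 29, 26)]
charts = {"red": [control_limits([r[i] for r in red_readings]) for i in range(3)]}
references = [((201, 30, 25), "red"), ((30, 180, 40), "green"), ((25, 35, 190), "blue")]

print(identify((203, 31, 25), charts, references))  # inside the red limits
print(identify((60, 160, 50), charts, references))  # outside: nearest neighbor decides
# 0.5 is a hypothetical type B uncertainty for the R channel
print(combined_uncertainty(type_a_uncertainty([r[0] for r in red_readings]), 0.5))
```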


2018 ◽  
Vol 7 (3) ◽  
pp. 1372
Author(s):  
Soudamini Hota ◽  
Sudhir Pathak

‘Sentiment’ literally means ‘emotions’. Sentiment analysis, synonymous with opinion mining, is a type of data mining that analyzes data obtained from microblogging sites, social media updates, online news reports, user reviews, etc., in order to study people's sentiments towards an event, organization, product, brand, person, etc. In this work, sentiment is classified into multiple classes. The proposed methodology, based on the KNN classification algorithm, shows an improvement over one of the existing methodologies, which is based on the SVM classification algorithm. The data used for analysis were taken from Twitter, the most popular microblogging site, and extracted using the Python library Tweepy. The N-gram modeling technique was used for feature extraction, and the supervised machine learning algorithm k-nearest neighbor was used for sentiment classification. The performance of the proposed and existing techniques is compared in terms of accuracy, precision, and recall, and the analysis concludes that the proposed technique performs better on all of these standard evaluation parameters.
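The pipeline described above (N-gram features followed by k-nearest neighbor classification) can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: features are counts of word unigrams and bigrams, similarity is cosine similarity, the label is the majority among the k most similar training texts, and the example texts and labels are invented. Collecting tweets with Tweepy is outside the sketch.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Bag of word n-grams (1..n_max) from lower-cased, whitespace-split text."""
    tokens = text.lower().split()
    grams = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams[" ".join(tokens[i:i + n])] += 1
    return grams


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (missing keys count as 0)."""
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, train, k=3):
    """Majority label among the k training texts most similar to `text`."""
    q = ngrams(text)
    ranked = sorted(train, key=lambda item: cosine(q, ngrams(item[0])), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


# Hypothetical labeled examples (three sentiment classes)
train = [
    ("i love this phone", "positive"),
    ("really love the new update", "positive"),
    ("this update is terrible", "negative"),
    ("terrible battery and awful screen", "negative"),
    ("the event starts at noon", "neutral"),
    ("the store opens at nine", "neutral"),
]
print(classify("love the phone update", train))
print(classify("terrible awful update", train))
```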


2021 ◽  
Author(s):  
Ayesha Sania ◽  
Nicolo Pini ◽  
Morgan Nelson ◽  
Michael Myers ◽  
Lauren Shuffrey ◽  
...  

Abstract Background: Missing data are a source of bias in epidemiologic studies. This is problematic in alcohol research, where data missingness is linked to drinking behavior. Methods: The Safe Passage study was a prospective investigation of prenatal drinking and fetal/infant outcomes (n = 11,083). Daily alcohol consumption for the last reported drinking day and the 30 days prior was recorded using the Timeline Followback method. Of 3.2 million person-days, data were missing for 0.36 million. We imputed missing data using a machine learning algorithm, K Nearest Neighbor (K-NN), which imputes missing values for a participant using data from the participants closest to them. Imputed values were weighted by the distances from the nearest neighbors and matched for day of week. Validation was done on randomly deleted data for 5-15 consecutive days. Results: Data from 5 nearest neighbors and segments of 55 days provided imputed values with the least imputation error. After deleting data segments from first-trimester records with no missing days, there was no difference between actual and predicted values for 64% of the deleted segments. For 31% of the segments, imputed data were within ±1 drink/day of the actual values. Conclusions: K-NN can be used to impute missing data in longitudinal studies of alcohol use during pregnancy with high accuracy.
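The imputation idea (fill a participant's missing day from the k most similar participants, weighting closer neighbors more) can be sketched as below. Assumptions not stated in the abstract: similarity is Euclidean distance over the days both participants reported, a neighbor must have a value for the missing day, weights are inverse distances, and series are aligned on the same days so that a neighbor's value automatically comes from the same day of week. The study's segment-based procedure (such as the 55-day segments) is not reproduced, and the data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance over days observed in both series (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def knn_impute(series, k=5, eps=1e-6):
    """Fill None entries with the inverse-distance-weighted mean of the same
    day in the k nearest participants that observed that day."""
    filled = [row[:] for row in series]          # leave the input untouched
    for i, row in enumerate(series):
        for day, value in enumerate(row):
            if value is not None:
                continue
            candidates = [(distance(row, other), other[day])
                          for j, other in enumerate(series)
                          if j != i and other[day] is not None]
            candidates = [c for c in candidates if math.isfinite(c[0])]
            nearest = sorted(candidates)[:k]
            if nearest:
                weights = [1.0 / (d + eps) for d, _ in nearest]
                filled[i][day] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled


# Hypothetical drinks/day over one week for four participants
drinks = [
    [0, 0, 1, 0, 2, 3, None],   # last day missing
    [0, 0, 1, 0, 2, 3, 2],
    [0, 1, 1, 0, 2, 2, 2],
    [4, 5, 4, 6, 5, 6, 7],
]
print(knn_impute(drinks, k=2)[0])
```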


2020 ◽  
Vol 2 (1) ◽  
pp. 1-14
Author(s):  
Torkis Nasution

The selection process is an attempt by a college to obtain qualified prospective students. Test data for new students can describe their academic quality and relate it to graduating on time. Recognizing students' academic quality is required to obtain optimal results when delivering lectures. At present, timely graduation has not been achieved optimally and needs to be improved to a reasonable level. Data that have not yet been classified need to be classified based on academic quality in order to predict timely graduation. This study therefore proposes applying the K-Nearest Neighbor algorithm to re-cluster the test result data of new students. The procedure is to determine the number of clusters, determine the center point of each cluster, calculate the distance from each object to the centroids, and classify the objects; when the newly calculated groups match the previously calculated groups, the calculation is finished. The data used for clustering are the entrance exam results of new students over 3 years at STMIK Amik Riau. This study aims to predict whether students graduate on time. When testing values of k, the maximum accuracy of 99.25% was obtained at k = 5; accuracy declines as k becomes larger.
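The reported procedure of testing several values of k and keeping the most accurate one can be sketched with leave-one-out evaluation of a kNN classifier. The entrance-exam features, labels, and candidate k values below are hypothetical, and leave-one-out is an assumed evaluation scheme; the study's own data split is not described in the abstract.

```python
import math
from collections import Counter


def knn_label(train, query, k):
    """Majority label among the k nearest (Euclidean) training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each row using all the other rows."""
    hits = sum(knn_label(data[:i] + data[i + 1:], x, k) == y
               for i, (x, y) in enumerate(data))
    return hits / len(data)


def best_k(data, candidates=(1, 3, 5, 7, 9)):
    """Return the k with the highest leave-one-out accuracy, plus all scores."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical (entrance score A, entrance score B) -> graduation outcome
data = [([0, 0], "late"), ([0, 1], "late"), ([1, 0], "late"), ([1, 1], "late"),
        ([10, 10], "on time"), ([10, 11], "on time"),
        ([11, 10], "on time"), ([11, 11], "on time")]
print(best_k(data, (1, 3, 5, 7)))
```

In this toy data, k = 7 fails completely because, once one point is held out, the other class outnumbers the held-out point's own class among its 7 neighbors; large k values oversmooth in the same way on real data.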


Stock trading has been one of the most important parts of the financial world for decades. People investing in the share market analyze a corporation's financial history and related news and study huge amounts of data in order to predict its stock price trend. The right investment, i.e., buying and selling a company's stock at the right time, leads to monetary benefits and can make one a millionaire overnight. The stock market is an extremely volatile platform in which data are produced in enormous quantities and are influenced by numerous disparate factors such as socio-political issues, financial activities like splits and dividends, news, and rumors. This work proposes a novel system, “IntelliFin”, to predict the share market trend. The system uses various stock market technical indicators along with the company's historical market data trends to predict share prices. For a more accurate prediction, the system also determines the sentiment of a company's financial and socio-political news. The system is implemented using two models. The first is a hybrid LSTM model optimized by an ADAM optimizer. The other is a hybrid ML model that integrates a Support Vector Regressor, a K-Nearest Neighbor classifier, an RF classifier, and a Linear Regressor using a Majority Voting algorithm. Both models employ an NLP-powered sentiment analyzer to account for the news affecting stock prices. The models are trained continuously using Reinforcement Learning implemented with the Q-Learning algorithm to increase consistency and accuracy. The project aims to support inexperienced investors and help them maximize their profit and minimize or eliminate their losses. The developed system will also serve as a tool to aid professional investors' decision-making.
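The hybrid ML model combines several learners with a majority vote. A minimal sketch of hard majority voting over per-model trend predictions follows. The model outputs are hypothetical placeholders, and turning the regressors' price forecasts into up/down votes (by comparing each forecast with the last closing price) is an assumption, since the abstract does not say how regression outputs enter the vote.

```python
from collections import Counter


def to_direction(forecast, last_close):
    """Turn a regressor's price forecast into an 'up'/'down' vote."""
    return "up" if forecast > last_close else "down"


def majority_vote(votes):
    """Most common vote; ties go to the vote that was seen first."""
    return Counter(votes).most_common(1)[0][0]


last_close = 101.0
votes = [
    to_direction(102.3, last_close),  # hypothetical Support Vector Regressor forecast
    "down",                           # hypothetical K-Nearest Neighbor classifier output
    "up",                             # hypothetical RF classifier output
    to_direction(101.8, last_close),  # hypothetical Linear Regressor forecast
]
print(majority_vote(votes))
```

With an even number of voters a 2-2 split is possible, so some explicit tie-breaking rule (here, first-seen order) has to be chosen.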


2021 ◽  
Vol 12 (2) ◽  
pp. 85-99
Author(s):  
Nassima Dif ◽  
Zakaria Elberrichi

Hybrid metaheuristics have recently received a lot of attention for solving combinatorial optimization problems. The purpose of hybridization is to create cooperation between metaheuristics to obtain better solutions. Most proposed works have focused on static hybridization. The objective of this work is to propose a novel dynamic hybridization method (GPBD) that generates the most suitable sequential hybridization of the GA, PSO, BAT, and DE metaheuristics for each problem. The authors test this approach on the feature selection problem using a wrapper strategy with the k-nearest neighbor (KNN) learning algorithm, on face image recognition datasets. The comparative study of the metaheuristics and their GPBD hybridization shows that the proposed approach achieved the best results. It was also competitive with filter approaches proposed in the literature, and it achieved a perfect accuracy score of 100% on the Orl10P, Pix10P, and PIE10P datasets.
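In a wrapper approach, each candidate feature subset is scored by running the classifier itself, here KNN. The sketch below shows such a fitness function (leave-one-out 1-NN accuracy on the selected columns, minus a small per-feature penalty that is an assumed tie-breaker) and, for illustration only, an exhaustive search over all subsets of a tiny invented dataset. The metaheuristics named above (GA, PSO, BAT, DE, and their GPBD hybridization) would replace the exhaustive loop for realistically sized face-image features; they are not implemented here.

```python
import math
from itertools import combinations


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out 1-NN accuracy using only the given feature indices."""
    hits = 0
    for i in range(len(X)):
        best_j = min((j for j in range(len(X)) if j != i),
                     key=lambda j: math.dist([X[i][f] for f in features],
                                             [X[j][f] for f in features]))
        hits += y[best_j] == y[i]
    return hits / len(X)


def fitness(X, y, features, penalty=0.01):
    """Wrapper fitness: KNN accuracy minus a small cost per selected feature."""
    return loo_1nn_accuracy(X, y, features) - penalty * len(features)


def exhaustive_selection(X, y):
    """Score every non-empty subset (feasible only for a handful of features)."""
    n = len(X[0])
    subsets = [list(c) for r in range(1, n + 1) for c in combinations(range(n), r)]
    return max(subsets, key=lambda s: fitness(X, y, s))


# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 9.0], [0.2, 9.0, 4.0],
     [1.0, 4.0, 8.0], [1.1, 8.0, 2.0], [1.2, 2.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]
print(exhaustive_selection(X, y))
```

Because the number of subsets grows as 2^n, exhaustive search is hopeless beyond a few dozen features, which is why population-based metaheuristics are used to explore the subset space instead.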

