Performance of Machine Learning Algorithms with Different K Values in K-fold Cross-Validation

Author(s):  
Isaac Kofi Nti ◽  
Owusu Nyarko-Boateng ◽  
Justice Aning

The numerical value of k in the k-fold cross-validation technique used to train machine learning predictive models is an essential element that impacts a model's performance. A good choice of k yields better accuracy, while a poorly chosen value can degrade the model's performance. In the literature, the most commonly used values of k are five (5) or ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor from very high variance; however, there is no formal rule. To the best of our knowledge, few experimental studies have investigated the effect of diverse k values on the training of different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms: Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the value of k and the model validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve, at lower computational cost than k = 10, across most of the algorithms tested. We discuss the study outcomes in detail and outline some guidelines to help beginners in the machine learning field select the best k value and machine learning algorithm for a given task.
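
The setup described above is straightforward to reproduce in outline. The following is a minimal sketch, not the authors' code: it assumes a stand-in scikit-learn dataset and default hyperparameters, and simply cross-validates the four named algorithms at each value of k.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the paper's own data are not reproduced here.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=5000),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Score every algorithm at each k studied in the paper.
for k in (3, 5, 7, 10, 15, 20):
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
        print(f"k={k:2d}  {name:3s}  mean accuracy = {scores.mean():.3f}")
```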

The aim of this research is to perform risk modelling by applying sentiment analysis to Twitter posts. We analyse the posts of several users, or of a particular user, to check whether they may be a cause of concern to society. Each sentiment (happiness, sadness, anger and other emotions) contributes to a severity scale in the final table, to which a machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in dynamic environments. Despite their many valuable features, there are challenges to overcome. Machine learning algorithms still require additional mechanisms or procedures for predicting large numbers of new classes while preserving privacy. These deficiencies show that the reliable use of a machine learning algorithm relies on human experts, because raw data may complicate the learning process and generate inaccurate results. Interpreting outcomes therefore demands expertise in machine learning mechanisms, which is a significant challenge. Machine learning techniques also suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity, and a main issue is their vulnerability to errors. Furthermore, machine learning techniques are found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by determining how to make predictions using an improved algorithm.


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper intends to use distinct machine learning algorithms and to explore their features. The primary advantage of machine learning is that an algorithm can carry out its task automatically by learning what to do with the information it is given. This paper presents the concept of machine learning and the algorithms that can be used for different applications, such as health care, sentiment analysis and many more. Programmers are sometimes unsure which algorithm to apply to their application; this paper provides guidance on choosing an algorithm on the basis of how accurately it fits. Given the collected data, one of the algorithms can be selected based upon its pros and cons. Using the data set, a base model is developed, trained and tested; the trained model is then ready for prediction and can be deployed where feasible.
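
As a concrete illustration of the workflow the abstract outlines (develop, train, test, then deploy a base model), here is a minimal sketch; the dataset and the choice of a random forest are illustrative assumptions, not taken from the paper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # collected data (stand-in)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Develop and train the base model (candidate chosen for its pros/cons).
base_model = RandomForestClassifier(random_state=0)
base_model.fit(X_train, y_train)

# Test it; if the accuracy is acceptable, the trained model can be
# serialized and deployed for prediction.
print("test accuracy:", accuracy_score(y_test, base_model.predict(X_test)))
```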


2021 ◽  
Author(s):  
Catherine Ollagnier ◽  
Claudia Kasper ◽  
Anna Wallenbeck ◽  
Linda Keeling ◽  
Siavash A Bigdeli

Tail biting is a detrimental behaviour that impacts the welfare and health of pigs. Early detection of tail biting precursor signs allows preventive measures to be taken, thus avoiding the occurrence of a tail biting event. This study aimed to build a machine-learning algorithm for real-time detection of upcoming tail biting outbreaks, using feeding behaviour data recorded by an electronic feeder. The prediction capacities of seven machine learning algorithms (e.g., random forest, neural networks) were evaluated on daily feeding data collected from 65 pens originating from two herds of grower-finisher pigs (25–100 kg), in which 27 tail biting events occurred. Data were divided into training and testing sets either by randomly splitting the data into 75% (training set) and 25% (testing set), or by randomly selecting whole pens to constitute the testing set. The random forest algorithm was able to predict 70% of the upcoming events with an accuracy of 94% when predicting events in pens for which it had previous data. The detection of events for unknown pens was less sensitive: the neural network model was able to detect 14% of the upcoming events with an accuracy of 63%. A machine-learning algorithm based on ongoing data collection should be considered for implementation in automatic feeder systems for real-time prediction of tail biting events.
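
The two evaluation schemes the abstract contrasts (a random 75/25 split versus holding out whole pens) can be sketched as follows. The features, labels and pen identifiers below are synthetic placeholders; the study's feeder data and chosen hyperparameters are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(650, 4))         # daily feeding features per pen-day (stand-in)
y = rng.integers(0, 2, size=650)      # 1 = tail-biting event upcoming (stand-in)
pens = rng.integers(0, 65, size=650)  # pen identifier for each record

# Scheme 1: random 75/25 split (the model may see earlier data from every pen).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Scheme 2: hold out whole pens (the model predicts for pens it has never seen).
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr_idx, te_idx = next(gss.split(X, y, groups=pens))

for name, (Xa, ya, Xb, yb) in {
    "random split": (X_tr, y_tr, X_te, y_te),
    "pen-wise split": (X[tr_idx], y[tr_idx], X[te_idx], y[te_idx]),
}.items():
    clf = RandomForestClassifier(random_state=0).fit(Xa, ya)
    pred = clf.predict(Xb)
    print(name, "accuracy:", accuracy_score(yb, pred),
          "sensitivity:", recall_score(yb, pred))
```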


2021 ◽  
Author(s):  
Saad Ur Rehman Baig ◽  
Muhammad Wasif ◽  
Anis Fatima ◽  
Mirza Muhammad Anas Baig ◽  
Syed Amir Iqbal

Sheet metal bending is a common operation, and springback is an unintended consequence of it. Because springback causes fitting issues in assembly, which lead to quality problems, anticipating it before the bending operation is performed is essential in today's production, so that machining parameters can be adjusted accordingly. To predict springback with minimal error, this paper presents machine learning models built with tree-based learning algorithms (a class of machine learning algorithms), which are employed because they are precise, consistent, and easy to interpret. Experimental studies provided the data for training and testing the models. The models' input parameters were sheet material, thickness, width, initial (desired) angle, and the machine used to perform the bending. After training and testing different tree-based learning algorithms, the results were evaluated using MAE and MSE. Gradient boosting algorithms (a class of tree-based learning) gave the best results; on further evaluation, LightGBM performed best, with an MAE and MSE of 0.41 and 0.25, respectively.
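
A minimal sketch of such a model is shown below, assuming the lightgbm package, synthetic data and default hyperparameters; the feature set mirrors the paper's inputs (material, thickness, width, initial angle, machine), but the values and the target function are invented for illustration.

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 3, n),     # material (label-encoded)
    rng.uniform(0.5, 3.0, n),  # thickness (mm)
    rng.uniform(10, 100, n),   # width (mm)
    rng.uniform(30, 120, n),   # initial (desired) angle (degrees)
    rng.integers(0, 2, n),     # machine (label-encoded)
])
# Toy springback target (degrees); the real relationship comes from experiments.
y = 0.05 * X[:, 3] - 0.5 * X[:, 1] + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MAE:", mean_absolute_error(y_te, pred))
print("MSE:", mean_squared_error(y_te, pred))
```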


2020 ◽  
Vol 214 ◽  
pp. 02047
Author(s):  
Haoxuan Li ◽  
Xueyan Zhang ◽  
Ziyan Li ◽  
Chunyuan Zheng

In recent years, many scholars have used different methods to predict and select stocks. Empirical studies have shown that, in multi-factor models, machine learning algorithms perform better at stock selection than traditional statistical methods. This article selects six classic machine learning algorithms and, taking the CSI 500 component stocks as an example, uses 19 factors to select stocks. We introduce four of these algorithms in detail and apply them to stock selection. Finally, we back-test the six machine learning algorithms, report the data, analyse the performance of each algorithm, and put forward some ideas on directions for improving machine learning algorithms.
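
A hedged sketch of the factor-based selection step follows. Each row represents one stock in one period, the 19 columns are factor exposures, and the label marks next-period outperformance; the data, the gradient boosting choice and the back-test rule are illustrative assumptions, not the article's actual setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_periods, n_stocks, n_factors = 24, 100, 19
X = rng.normal(size=(n_periods * n_stocks, n_factors))  # factor exposures
# Synthetic label: 1 = stock outperformed the index median next period.
y = (X[:, 0] + rng.normal(0, 1, n_periods * n_stocks) > 0).astype(int)

# Time-ordered split: train on early periods, back-test on later ones,
# so no future information leaks into training.
cut = 18 * n_stocks
clf = GradientBoostingClassifier(random_state=0).fit(X[:cut], y[:cut])
print("out-of-sample hit rate:", accuracy_score(y[cut:], clf.predict(X[cut:])))
```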


2019 ◽  
Author(s):  
Mohammed Moreb ◽  
Oguz Ata

Background: We propose a novel framework for health informatics: Software Engineering for Machine Learning in Health Informatics (SEMLHI). The framework allows users to study and analyse requirements, determine the function of objects related to the system, and determine the machine learning algorithms to be used on the dataset.

Methods: The study is based on original data collected from a Palestinian government hospital over the past three years. The data were first validated and all outliers removed, then analysed using the developed framework in order to compare machine learning models that provide patients with real-time results. Our proposed module was compared using three systems engineering methods: Vee, Agile, and SEMLHI. The results were used to implement a prototype system requiring a machine learning algorithm; after the development phase, a questionnaire was delivered to developers to assess the results of using the three methodologies. The SEMLHI framework is composed of four components: software, machine learning model, machine learning algorithms, and health informatics data. The machine learning algorithm component uses five algorithms to evaluate the accuracy of the machine learning models.

Results: We compared our approach with previously published systems in terms of performance, evaluating the accuracy of the machine learning models with different algorithms applied to 750 cases; linear SVC achieved an accuracy of about 0.57 compared with the KNeighbors classifier, logistic regression, multinomial NB, and random forest classifier. This research investigates the interaction between software engineering and machine learning within the context of health informatics. Our proposed framework defines a methodology for developers to analyse and develop software for health informatics models, and creates a space in which software engineering and ML experts can work on the ML model lifecycle, at the disease level and the subtype level.

Conclusions: This article is an ongoing effort towards defining and translating an existing research pipeline into four integrated modules, as a framework system using a healthcare dataset to reduce cost estimation through a newly suggested methodology. The framework is available as open-source software, licensed under the GNU General Public License Version 3, to encourage others to contribute to its future development.
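
A minimal sketch of the accuracy comparison described in the Results is given below, using the five algorithms named there; the synthetic data stand in for the 750 hospital cases, which are not available here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 750-case hospital dataset.
X, y = make_classification(n_samples=750, n_features=20, random_state=0)
X = X - X.min()  # shift features non-negative, as MultinomialNB requires

models = {
    "linear SVC": LinearSVC(),
    "KNeighbors classifier": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=2000),
    "multinomial NB": MultinomialNB(),
    "random forest classifier": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    print(name, "accuracy:", cross_val_score(model, X, y, cv=5).mean())
```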


2021 ◽  
Author(s):  
Arvind Thorat

In this paper we describe how machine learning algorithms can be applied for cyber security purposes, such as detecting malware and botnets, and how strong passwords can be recognized for a system. A detailed implementation of artificial intelligence and machine learning algorithms is also presented.


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Deciding which machine learning algorithm is optimal for a task is not easy. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.
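
A sketch of this kind of comparison is given below. The five algorithms other than Random Forest are assumptions for illustration (the abstract does not name them), and the synthetic task is a generic one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Build a synthetic data set and run several supervised algorithms on it.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```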


2021 ◽  
pp. rapm-2021-102715
Author(s):  
Haoyan Zhong ◽  
Jashvant Poeran ◽  
Alex Gu ◽  
Lauren A Wilson ◽  
Alejandro Gonzalez Della Valle ◽  
...  

Background: With continuing financial and regulatory pressures, the practice of ambulatory total hip arthroplasty is increasing. However, studies focusing on the selection of optimal candidates are burdened by limitations related to traditional statistical approaches. Here we aimed to apply machine learning algorithms to identify characteristics associated with optimal candidates.

Methods: This retrospective cohort study included elective total hip arthroplasty cases (n=63 859) recorded in the National Surgical Quality Improvement Program dataset from 2017 to 2018. The main outcome was length of stay. A total of 40 candidate variables were considered. We applied machine learning algorithms (multivariable logistic regression, artificial neural networks, and random forest models) to predict length of stay = 0 days. Model accuracies and areas under the curve were calculated.

Results: Applying the machine learning models to compare length of stay = 0 days with length of stay = 1–3 days cases, we found areas under the curve of 0.715, 0.762, and 0.804 and accuracies of 0.65, 0.73, and 0.81 for logistic regression, artificial neural networks, and the random forest model, respectively. Among the most important predictive features, anesthesia type, body mass index, age, ethnicity, white blood cell count, sodium level, and alkaline phosphatase were highlighted across the machine learning models.

Conclusions: The machine learning algorithms exhibited acceptable model quality and accuracy, and highlighted the as yet unrecognized impact of laboratory testing on future patient ambulatory pathway assignment.
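
The modelling step can be sketched as follows, with synthetic data standing in for the NSQIP cohort: fit the three model families named in the Methods and report accuracy and area under the ROC curve for predicting length of stay = 0 days.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 40 candidate variables, imbalanced binary outcome
# (1 = discharged on the day of surgery, i.e., length of stay = 0).
X, y = make_classification(n_samples=5000, n_features=40,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in {
    "logistic regression": LogisticRegression(max_iter=2000),
    "artificial neural network": MLPClassifier(max_iter=1000, random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          "accuracy:", accuracy_score(y_te, model.predict(X_te)),
          "AUC:", roc_auc_score(y_te, proba))
```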

