Performance of Machine Learning Algorithms with Different K Values in K-fold Cross-Validation

Author(s):  
Isaac Kofi Nti ◽  
Owusu Nyarko-Boateng ◽  
Justice Aning

The numerical value of k in the k-fold cross-validation technique used to train machine learning predictive models is an essential element that impacts a model's performance. A good choice of k yields better accuracy, while a poorly chosen value can degrade the model's performance. In the literature, the most commonly used values of k are five (5) or ten (10), as these two values are believed to give test error rate estimates that suffer neither from extremely high bias nor from very high variance; however, there is no formal rule. To the best of our knowledge, few experimental studies have investigated the effect of diverse k values on the training of different machine learning models. This paper empirically analyses the prevalence and effect of distinct k values (3, 5, 7, 10, 15 and 20) on the validation performance of four well-known machine learning algorithms: Gradient Boosting Machine (GBM), Logistic Regression (LR), Decision Tree (DT) and K-Nearest Neighbours (KNN). It was observed that the value of k and the model validation performance differ from one machine learning algorithm to another for the same classification task. However, our empirical results suggest that k = 7 offers a slight increase in validation accuracy and area under the curve, at lower computational cost than k = 10, across most of the algorithms tested. We discuss the study outcomes in detail and outline some guidelines to help beginners in the machine learning field select the best k value and machine learning algorithm for a given task.
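
The setup described above is straightforward to reproduce in outline. The following is a minimal sketch, not the authors' code: it assumes a stand-in scikit-learn dataset and default hyperparameters, and simply cross-validates the four named algorithms at each value of k.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset; the paper's own data are not reproduced here.
X, y = load_breast_cancer(return_X_y=True)

models = {
    "GBM": GradientBoostingClassifier(random_state=0),
    "LR": LogisticRegression(max_iter=5000),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}

# Score every algorithm at each k studied in the paper.
for k in (3, 5, 7, 10, 15, 20):
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
        print(f"k={k:2d}  {name:3s}  mean accuracy = {scores.mean():.3f}")
```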

The aim of this research is to perform risk modelling by applying sentiment analysis to Twitter posts. We analyse the posts of several users, or of a particular user, to check whether they may be a cause of concern to society. Each sentiment (happiness, sadness, anger and other emotions) contributes to a severity scale in the final table, to which a machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in dynamic environments. Despite their many valuable features, there are challenges to overcome. Machine learning algorithms still require additional mechanisms or procedures for predicting large numbers of new classes while preserving privacy. These deficiencies show that the reliable use of a machine learning algorithm relies on human experts, because raw data may complicate the learning process and generate inaccurate results. Interpreting outcomes therefore demands expertise in machine learning mechanisms, which is a significant challenge. Machine learning techniques also suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity, and a main issue is their vulnerability to errors. Furthermore, machine learning techniques are found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by determining how to make predictions using an improved algorithm.


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper intends to use distinct machine learning algorithms and to explore their features. The primary advantage of machine learning is that an algorithm can carry out its task automatically by learning what to do with the information it is given. This paper presents the concept of machine learning and the algorithms that can be used for different applications, such as health care, sentiment analysis and many more. Programmers are sometimes unsure which algorithm to apply to their application; this paper provides guidance on choosing an algorithm on the basis of how accurately it fits. Given the collected data, one of the algorithms can be selected based upon its pros and cons. Using the data set, a base model is developed, trained and tested; the trained model is then ready for prediction and can be deployed where feasible.
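
As a concrete illustration of the workflow the abstract outlines (develop, train, test, then deploy a base model), here is a minimal sketch; the dataset and the choice of a random forest are illustrative assumptions, not taken from the paper.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # collected data (stand-in)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Develop and train the base model (candidate chosen for its pros/cons).
base_model = RandomForestClassifier(random_state=0)
base_model.fit(X_train, y_train)

# Test it; if the accuracy is acceptable, the trained model can be
# serialized and deployed for prediction.
print("test accuracy:", accuracy_score(y_test, base_model.predict(X_test)))
```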


2021 ◽  
Author(s):  
Catherine Ollagnier ◽  
Claudia Kasper ◽  
Anna Wallenbeck ◽  
Linda Keeling ◽  
Siavash A Bigdeli

Tail biting is a detrimental behaviour that impacts the welfare and health of pigs. Early detection of tail biting precursor signs allows preventive measures to be taken, thus avoiding the occurrence of a tail biting event. This study aimed to build a machine-learning algorithm for real-time detection of upcoming tail biting outbreaks, using feeding behaviour data recorded by an electronic feeder. The prediction capacities of seven machine learning algorithms (e.g., random forest, neural networks) were evaluated on daily feeding data collected from 65 pens originating from two herds of grower-finisher pigs (25–100 kg), in which 27 tail biting events occurred. Data were divided into training and testing sets either by randomly splitting the data into 75% (training set) and 25% (testing set), or by randomly selecting whole pens to constitute the testing set. The random forest algorithm was able to predict 70% of the upcoming events with an accuracy of 94% when predicting events in pens for which it had previous data. The detection of events for unknown pens was less sensitive: the neural network model was able to detect 14% of the upcoming events with an accuracy of 63%. A machine-learning algorithm based on ongoing data collection should be considered for implementation in automatic feeder systems for real-time prediction of tail biting events.
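
The two evaluation schemes the abstract contrasts (a random 75/25 split versus holding out whole pens) can be sketched as follows. The features, labels and pen identifiers below are synthetic placeholders; the study's feeder data and chosen hyperparameters are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import GroupShuffleSplit, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(650, 4))         # daily feeding features per pen-day (stand-in)
y = rng.integers(0, 2, size=650)      # 1 = tail-biting event upcoming (stand-in)
pens = rng.integers(0, 65, size=650)  # pen identifier for each record

# Scheme 1: random 75/25 split (the model may see earlier data from every pen).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Scheme 2: hold out whole pens (the model predicts for pens it has never seen).
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr_idx, te_idx = next(gss.split(X, y, groups=pens))

for name, (Xa, ya, Xb, yb) in {
    "random split": (X_tr, y_tr, X_te, y_te),
    "pen-wise split": (X[tr_idx], y[tr_idx], X[te_idx], y[te_idx]),
}.items():
    clf = RandomForestClassifier(random_state=0).fit(Xa, ya)
    pred = clf.predict(Xb)
    print(name, "accuracy:", accuracy_score(yb, pred),
          "sensitivity:", recall_score(yb, pred))
```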


2021 ◽  
Author(s):  
Saad Ur Rehman Baig ◽  
Muhammad Wasif ◽  
Anis Fatima ◽  
Mirza Muhammad Anas Baig ◽  
Syed Amir Iqbal

Sheet metal bending is a common operation, and springback is an unintended consequence of it. Because springback causes fitting issues in assembly, which lead to quality problems, anticipating it before the bending operation is performed is essential in today's production, so that machining parameters can be adjusted accordingly. To predict springback with minimal error, this paper presents machine learning models built with tree-based learning algorithms (a class of machine learning algorithms), which are employed because they are precise, consistent, and easy to interpret. Experimental studies provided the data for training and testing the models. The models' input parameters were sheet material, thickness, width, initial (desired) angle, and the machine used to perform the bending. After training and testing different tree-based learning algorithms, the results were evaluated using MAE and MSE. Gradient boosting algorithms (a class of tree-based learning) gave the best results; on further evaluation, LightGBM performed best, with an MAE and MSE of 0.41 and 0.25, respectively.
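
A minimal sketch of such a model is shown below, assuming the lightgbm package, synthetic data and default hyperparameters; the feature set mirrors the paper's inputs (material, thickness, width, initial angle, machine), but the values and the target function are invented for illustration.

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(0, 3, n),     # material (label-encoded)
    rng.uniform(0.5, 3.0, n),  # thickness (mm)
    rng.uniform(10, 100, n),   # width (mm)
    rng.uniform(30, 120, n),   # initial (desired) angle (degrees)
    rng.integers(0, 2, n),     # machine (label-encoded)
])
# Toy springback target (degrees); the real relationship comes from experiments.
y = 0.05 * X[:, 3] - 0.5 * X[:, 1] + rng.normal(0, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMRegressor(random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MAE:", mean_absolute_error(y_te, pred))
print("MSE:", mean_squared_error(y_te, pred))
```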


2020 ◽  
Vol 214 ◽  
pp. 02047
Author(s):  
Haoxuan Li ◽  
Xueyan Zhang ◽  
Ziyan Li ◽  
Chunyuan Zheng

In recent years, many scholars have used different methods to predict and select stocks. Empirical studies have shown that, in multi-factor models, machine learning algorithms perform better at stock selection than traditional statistical methods. This article selects six classic machine learning algorithms and, taking the CSI 500 component stocks as an example, uses 19 factors to select stocks. We introduce four of these algorithms in detail and apply them to stock selection. Finally, we back-test the six machine learning algorithms, report the data, analyse the performance of each algorithm, and put forward some ideas on directions for improving machine learning algorithms.
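
A hedged sketch of the factor-based selection step follows. Each row represents one stock in one period, the 19 columns are factor exposures, and the label marks next-period outperformance; the data, the gradient boosting choice and the back-test rule are illustrative assumptions, not the article's actual setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_periods, n_stocks, n_factors = 24, 100, 19
X = rng.normal(size=(n_periods * n_stocks, n_factors))  # factor exposures
# Synthetic label: 1 = stock outperformed the index median next period.
y = (X[:, 0] + rng.normal(0, 1, n_periods * n_stocks) > 0).astype(int)

# Time-ordered split: train on early periods, back-test on later ones,
# so no future information leaks into training.
cut = 18 * n_stocks
clf = GradientBoostingClassifier(random_state=0).fit(X[:cut], y[:cut])
print("out-of-sample hit rate:", accuracy_score(y[cut:], clf.predict(X[cut:])))
```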


2019 ◽  
Author(s):  
Mohammed Moreb ◽  
Oguz Ata

Background: We propose a novel framework for health informatics: Software Engineering for Machine Learning in Health Informatics (SEMLHI). The framework allows users to study and analyse requirements, determine the function of objects related to the system, and determine the machine learning algorithms to be used on the dataset.

Methods: The study is based on original data collected from a Palestinian government hospital over the past three years. The data were first validated and all outliers removed, then analysed using the developed framework in order to compare machine learning models that provide patients with real-time results. Our proposed module was compared using three systems engineering methods: Vee, Agile, and SEMLHI. The results were used to implement a prototype system requiring a machine learning algorithm; after the development phase, a questionnaire was delivered to developers to assess the results of using the three methodologies. The SEMLHI framework is composed of four components: software, machine learning model, machine learning algorithms, and health informatics data. The machine learning algorithm component uses five algorithms to evaluate the accuracy of the machine learning models.

Results: We compared our approach with previously published systems in terms of performance, evaluating the accuracy of the machine learning models with different algorithms applied to 750 cases; linear SVC achieved an accuracy of about 0.57 compared with the KNeighbors classifier, logistic regression, multinomial NB, and random forest classifier. This research investigates the interaction between software engineering and machine learning within the context of health informatics. Our proposed framework defines a methodology for developers to analyse and develop software for health informatics models, and creates a space in which software engineering and ML experts can work on the ML model lifecycle, at the disease level and the subtype level.

Conclusions: This article is an ongoing effort towards defining and translating an existing research pipeline into four integrated modules, as a framework system using a healthcare dataset to reduce cost estimation through a newly suggested methodology. The framework is available as open-source software, licensed under the GNU General Public License Version 3, to encourage others to contribute to its future development.
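
A minimal sketch of the accuracy comparison described in the Results is given below, using the five algorithms named there; the synthetic data stand in for the 750 hospital cases, which are not available here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the 750-case hospital dataset.
X, y = make_classification(n_samples=750, n_features=20, random_state=0)
X = X - X.min()  # shift features non-negative, as MultinomialNB requires

models = {
    "linear SVC": LinearSVC(),
    "KNeighbors classifier": KNeighborsClassifier(),
    "logistic regression": LogisticRegression(max_iter=2000),
    "multinomial NB": MultinomialNB(),
    "random forest classifier": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    print(name, "accuracy:", cross_val_score(model, X, y, cv=5).mean())
```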


2021 ◽  
Author(s):  
Arvind Thorat

In this paper we describe how machine learning algorithms can be applied for cyber security purposes, such as detecting malware and botnets, and how strong passwords can be recognized for a system. A detailed implementation of artificial intelligence and machine learning algorithms is also presented.


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

Deciding which machine learning algorithm is optimal for a task is not easy. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.
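
A sketch of this kind of comparison is given below. The five algorithms other than Random Forest are assumptions for illustration (the abstract does not name them), and the synthetic task is a generic one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Build a synthetic data set and run several supervised algorithms on it.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
models = {
    "random forest": RandomForestClassifier(random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")
```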


2021 ◽  
pp. rapm-2021-102715
Author(s):  
Haoyan Zhong ◽  
Jashvant Poeran ◽  
Alex Gu ◽  
Lauren A Wilson ◽  
Alejandro Gonzalez Della Valle ◽  
...  

Background: With continuing financial and regulatory pressures, the practice of ambulatory total hip arthroplasty is increasing. However, studies focusing on the selection of optimal candidates are burdened by limitations related to traditional statistical approaches. Here we aimed to apply machine learning algorithms to identify characteristics associated with optimal candidates.

Methods: This retrospective cohort study included elective total hip arthroplasty cases (n=63 859) recorded in the National Surgical Quality Improvement Program dataset from 2017 to 2018. The main outcome was length of stay. A total of 40 candidate variables were considered. We applied machine learning algorithms (multivariable logistic regression, artificial neural networks, and random forest models) to predict length of stay = 0 days. Model accuracies and areas under the curve were calculated.

Results: Applying the machine learning models to compare length of stay = 0 days with length of stay = 1–3 days cases, we found areas under the curve of 0.715, 0.762, and 0.804 and accuracies of 0.65, 0.73, and 0.81 for logistic regression, artificial neural networks, and the random forest model, respectively. Among the most important predictive features, anesthesia type, body mass index, age, ethnicity, white blood cell count, sodium level, and alkaline phosphatase were highlighted across the machine learning models.

Conclusions: The machine learning algorithms exhibited acceptable model quality and accuracy, and highlighted the as yet unrecognized impact of laboratory testing on future patient ambulatory pathway assignment.
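
The modelling step can be sketched as follows, with synthetic data standing in for the NSQIP cohort: fit the three model families named in the Methods and report accuracy and area under the ROC curve for predicting length of stay = 0 days.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in: 40 candidate variables, imbalanced binary outcome
# (1 = discharged on the day of surgery, i.e., length of stay = 0).
X, y = make_classification(n_samples=5000, n_features=40,
                           weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, model in {
    "logistic regression": LogisticRegression(max_iter=2000),
    "artificial neural network": MLPClassifier(max_iter=1000, random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(name,
          "accuracy:", accuracy_score(y_te, model.predict(X_te)),
          "AUC:", roc_auc_score(y_te, proba))
```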

