Comparing different supervised machine learning algorithms for disease prediction

Abstract. Background. Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods. We searched extensively for studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (Scopus and PubMed) were searched using different types of search items. We selected 48 articles in total for the comparison among variants of supervised machine learning algorithms for disease prediction. Results. We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 (53%). This was followed by SVM, which had the highest accuracy in 41% of the studies in which it was considered. Conclusion. This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can help researchers select an appropriate supervised machine learning algorithm for their studies.
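
The percentages in the Results are simple shares: the number of studies in which an algorithm had the highest accuracy, divided by the number of studies in which it was applied. A minimal sketch of that arithmetic follows; the function name `top_share` is hypothetical, and only the RF counts (9 of 17) come from the abstract.

```python
def top_share(times_best: int, times_applied: int) -> float:
    """Percentage of studies in which an algorithm had the highest accuracy."""
    if times_applied <= 0:
        raise ValueError("times_applied must be positive")
    return 100.0 * times_best / times_applied


# Counts stated in the abstract: RF was applied in 17 studies and was best in 9.
rf_share = top_share(9, 17)
print(f"RF top-accuracy share: {rf_share:.0f}%")
```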

FoRex Trading Using Supervised Machine Learning

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.15.23024 ◽

2018 ◽

Vol 7 (4.15) ◽

pp. 400 ◽

Cited By ~ 1

Author(s):

Thuy Nguyen Thi Thu ◽

Vuong Dang Xuan

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Exchange Rate ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Machine Learning Algorithm ◽

Machine Learning Model

The exchange rate of each currency pair can be predicted by a machine learning algorithm through classification. With a supervised machine learning model, the predicted uptrend or downtrend of the FoRex rate might help traders make the right decisions on FoRex transactions. Machine learning algorithms installed in the online FoRex trading market can execute buy/sell transactions automatically. All transactions in the experiment are performed by scripts added on to the transaction application. The capital and profit results obtained with support vector machine (SVM) models are higher than those obtained in the normal way (without SVM).
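
To treat trend prediction as classification, a rate series first has to be turned into labeled examples. The sketch below shows one possible way to do that. The abstract does not describe the features the authors used, so the lagged relative changes, the function name `make_direction_dataset`, and the sample rates are all assumptions for illustration.

```python
from typing import List, Tuple


def make_direction_dataset(
    rates: List[float], n_lags: int = 3
) -> Tuple[List[List[float]], List[int]]:
    """Turn a rate series into (features, labels) for up/down classification.

    Features (an assumption): the n_lags most recent relative changes.
    Label: 1 if the next move is up (uptrend), 0 otherwise (downtrend).
    """
    # returns[i] is the relative change from rates[i] to rates[i + 1].
    returns = [(b - a) / a for a, b in zip(rates, rates[1:])]
    features, labels = [], []
    for t in range(n_lags, len(returns)):
        features.append(returns[t - n_lags:t])     # changes already observed
        labels.append(1 if returns[t] > 0 else 0)  # direction of the next move
    return features, labels


rates = [1.10, 1.11, 1.09, 1.12, 1.13, 1.12, 1.15]  # made-up example rates
X, y = make_direction_dataset(rates, n_lags=3)
print(y)
```

The resulting `X` and `y` could then be passed to any supervised classifier, such as an SVM.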

An Overview of Supervised Machine Learning Methods and Data Analysis for COVID-19 Detection

Journal of Healthcare Engineering ◽

10.1155/2021/4733167 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Aurelle Tchagna Kouanou ◽

Thomas Mih Attia ◽

Cyrille Feudjio ◽

Anges Fleurio Djeumo ◽

Adèle Ngo Mouelas ◽

...

Keyword(s):

Machine Learning ◽

Data Analysis ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

High Rate ◽

Supervised Machine Learning ◽

Polymerase Chain Reaction Test ◽

Support Vector ◽

Machine Learning Algorithm ◽

Test Results

Background and Objective. To mitigate the spread of SARS-CoV-2, the virus responsible for COVID-19, there is an urgent need for massive population testing. Due to the constant shortage of reagents for PCR (polymerase chain reaction) tests, which are the COVID-19 tests par excellence, several medical centers have opted for immunological tests that look for the presence of antibodies produced against this virus. However, these tests have a high rate of false positives (positive test results for people who are actually negative) and false negatives (negative test results for people who are actually positive) and are therefore not always reliable. In this paper, we propose a solution based on data analysis and machine learning to detect COVID-19 infections. Methods. Our analysis and machine learning algorithm are based on the two most cited clinical datasets from the literature: one from San Raffaele Hospital in Milan, Italy, and the other from Hospital Israelita Albert Einstein in São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and it turned out that almost all of them are blood parameters. EDA (Exploratory Data Analysis) methods were applied to the datasets, and a comparative study of supervised machine learning models was done, after which the support vector machine (SVM) was selected as the one with the best performance. Results. As the best performer, SVM is used as our proposed supervised machine learning algorithm. An accuracy of 99.29%, sensitivity of 92.79%, and specificity of 100% were obtained with the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19) after applying optimization to SVM. The same procedure and work were performed with the dataset taken from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM presented the best performance among the machine learning algorithms, with 92.86%, 93.55%, and 90.91% for accuracy, sensitivity, and specificity, respectively. Conclusion. The obtained results are superior to others from the literature based on these same datasets, leading us to conclude that our proposed solution is reliable for COVID-19 diagnosis.
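
Accuracy, sensitivity, and specificity all follow from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name `diagnostic_metrics` and the counts passed to it are hypothetical and are not taken from the paper.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from confusion-matrix counts.

    tp/fn: infected cases predicted positive/negative.
    tn/fp: non-infected cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),     # share of infected cases that are detected
        "specificity": tn / (tn + fp),     # share of non-infected cases that are cleared
    }


# Hypothetical counts, chosen only to illustrate the formulas.
m = diagnostic_metrics(tp=45, fp=0, tn=50, fn=5)
print(m)
```

A specificity of 100%, as reported for the Kaggle dataset, corresponds to zero false positives.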

Deducing of Optimal Machine Learning Algorithms for Heterogeneity

10.36227/techrxiv.17162147 ◽

2021 ◽

Author(s):

Omar Alfarisi ◽

Zeyar Aung ◽

Mohamed Sassi

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithm ◽

Learning Algorithms ◽

Synthetic Data ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Machine Learning Algorithm ◽

Data Set ◽

Optimal Machine

Choosing the optimal machine learning algorithm is not an easy decision. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.

Comparison of Machine Learning Algorithms for Cardiovascular Disease Prediction

Computational Methodologies for Electrical and Electronics Engineers - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-7998-3327-7.ch009 ◽

2021 ◽

pp. 111-126

Author(s):

Stuti Pandey ◽

Abhay Kumar Agarwal

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Learning Algorithm ◽

Learning Algorithms ◽

Research Field ◽

Machine Learning Algorithms ◽

Support Vector ◽

Disease Prediction ◽

K Nearest Neighbors ◽

The University

Cardiovascular disease prediction is a healthcare research field that depends on a large volume of data for making effective and accurate predictions. These predictions can be more effective and accurate when made with machine learning algorithms, because such algorithms can disclose concealed facts that are helpful in making decisions. Machine learning algorithms also process data at a speed that is almost infeasible for human beings. Therefore, the work presented here focuses on identifying the best machine learning algorithm for predicting cardiovascular diseases in a reasonable time by comparing the performance of several algorithms. The algorithms used are naïve Bayes, support vector machine, k-nearest neighbors, and random forest. The dataset used for this comparison is the “Heart Disease Data Set” from the University of California, Irvine (UCI) machine learning repository.
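
A comparison of this kind usually scores every algorithm on the same held-out folds. The sketch below illustrates the idea with two of the four named algorithms (k-nearest neighbors and Gaussian naïve Bayes), written in plain Python. It is not the authors' code: the data are synthetic stand-ins for the UCI dataset, and all function names, the fold scheme, and the parameters are assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gnb_predict(train_X, train_y, x):
    """Gaussian naive Bayes: pick the class with the highest log-posterior."""
    best_label, best_score = None, -math.inf
    for label in set(train_y):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        score = math.log(len(rows) / len(train_X))  # log prior
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            # Log of the Gaussian density for feature j under this class.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_val_accuracy(predict, X, y, n_folds=5):
    """Mean accuracy over n_folds held-out folds (interleaved split)."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        hits = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / n_folds


# Synthetic two-class data (NOT the UCI Heart Disease data): two Gaussian blobs.
rng = random.Random(0)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 4.0) for _ in range(40)]
y = [label for label in (0, 1) for _ in range(40)]

results = {
    "k-nearest neighbors": cross_val_accuracy(knn_predict, X, y),
    "naive Bayes": cross_val_accuracy(gnb_predict, X, y),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Scoring every model on identical folds keeps the comparison fair, since differences then reflect the algorithms and not the data split.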

Machine Learning Algorithm Evaluate the maintainability

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v10i2.7008 ◽

2013 ◽

Vol 10 (2) ◽

pp. 1376-1383

Author(s):

Dr.Vijay Pal Dhaka ◽

Swati Agrawal

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

General Regression Neural Network ◽

Support Vector ◽

Machine Learning Algorithm ◽

Linear Regression Models ◽

Software Maintainability

Maintainability is an important quality attribute and a difficult concept, as it involves a number of measurements. Quality estimation means estimating the maintainability of software. Maintainability is a set of attributes that bear on the effort needed to make specified modifications. The main goal of this paper is to propose the use of a few machine learning algorithms to predict software maintainability and to evaluate them. The proposed models are Gaussian process regression networks (GPRN), probably approximately correct learning (PAC), and genetic algorithm (GA). This paper predicts the maintenance effort. The QUES (Quality Evaluation System) dataset is used in this study. It contains 71 classes. To measure maintainability, the number of “CHANGE” is observed over a period of a few years. CHANGE is defined as the number of lines of code that were added, deleted, or modified during a maintenance period of a few years. These machine learning algorithms were then compared with models such as GRNN (general regression neural network), RT (regression tree), MARS (multivariate adaptive regression splines), SVM (support vector machine), and MLR (multiple linear regression). Based on experiments, it was found that GPRN can predict maintainability more accurately and precisely than prevailing models. We also include object-oriented software metrics to measure software maintainability. Using machine learning algorithms to establish the relationship between metrics and maintainability would be a much better approach, as these are based on quantity as well as quality.
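
The CHANGE metric defined above can be approximated by diffing two versions of a class. The sketch below uses Python's standard `difflib` module. How the QUES study actually counted a modified line is not stated in the abstract, so pairing replaced lines one-to-one is an assumption, as are the function name `count_change` and the sample lines.

```python
import difflib


def count_change(old_lines, new_lines):
    """Count lines added, deleted, or modified between two versions."""
    added = deleted = modified = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added += j2 - j1
        elif tag == "delete":
            deleted += i2 - i1
        elif tag == "replace":
            # Assumption: pair replaced lines as modifications;
            # any surplus on either side counts as deleted or added.
            paired = min(i2 - i1, j2 - j1)
            modified += paired
            deleted += (i2 - i1) - paired
            added += (j2 - j1) - paired
    return added + deleted + modified


old = ["a", "b", "c", "d"]       # hypothetical source lines
new = ["a", "B", "c", "d", "e"]  # one line modified, one line added
change = count_change(old, new)
print(change)
```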

Risk Monitoring and Quantitative Results of Various Attributes of Machine Learning Algorithms with a Time Series Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9570.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4018-4022

Keyword(s):

Machine Learning ◽

Time Series Data ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Series Data ◽

Machine Learning Algorithm ◽

Risk Modelling ◽

Risk Monitoring ◽

Quantitative Results

The aim of this research is to perform risk modelling based on sentiment analysis of Twitter posts. We analyze the posts of several users, or of a particular user, to check whether they could be a cause of concern to society. Each sentiment, such as happiness, sadness, anger, and other emotions, provides a severity scale in the final table to which the machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.

Significant Impact of Improved Machine Learning Algorithm in The Processes of Large Data Sets

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit206133 ◽

2020 ◽

pp. 458-467

Author(s):

Virendra Tiwari ◽

Balendra Garg ◽

Uday Prakash Sharma

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Dynamic Environment ◽

Large Data ◽

Machine Learning Algorithms ◽

Streaming Data ◽

Machine Learning Techniques ◽

Machine Learning Algorithm ◽

Learning Mechanisms

Machine learning algorithms are capable of managing multi-dimensional data in a dynamic environment. Despite their many vital features, there are some challenges to overcome. Machine learning algorithms still require additional mechanisms or procedures for predicting a large number of new classes while managing privacy. These deficiencies show that the reliable use of a machine learning algorithm relies on human experts, because raw data may complicate the learning process and generate inaccurate results. Interpreting outcomes with expertise in machine learning mechanisms is therefore a significant challenge. Machine learning techniques suffer from the issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity. The main issue found with machine learning algorithms is their vulnerability in managing errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by finding how to make predictions using an improved algorithm.

Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms

Machine Learning and Knowledge Extraction ◽

10.3390/make1010022 ◽

2019 ◽

Vol 1 (1) ◽

pp. 384-399 ◽

Cited By ~ 2

Author(s):

Thais de Toledo ◽

Nunzio Torrisi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Smart Grids ◽

Learning Algorithms ◽

Electric Utility ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Communication Link

The Distributed Network Protocol (DNP3) is predominantly used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic: a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are an important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.

Measuring Biomechanical Risk in Lifting Load Tasks Through Wearable System and Machine-Learning Approach

Sensors ◽

10.3390/s20061557 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1557 ◽

Cited By ~ 4

Author(s):

Ilaria Conforti ◽

Ilaria Mileti ◽

Zaccaria Del Prete ◽

Eduardo Palermo

Keyword(s):

Machine Learning ◽

Material Handling ◽

Learning Algorithm ◽

Occupational Injuries ◽

Wearable Sensors ◽

High Stress ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Kinematic Parameters

Ergonomics evaluation through measurements of biomechanical parameters in real time has great potential for reducing non-fatal occupational injuries, such as work-related musculoskeletal disorders. Assuming a correct posture guarantees the avoidance of high stress on the back and on the lower extremities, while an incorrect posture increases spinal stress. Here, we propose a solution for the recognition of postural patterns through wearable sensors and machine-learning algorithms fed with kinematic data. Twenty-six healthy subjects equipped with eight wireless inertial measurement units (IMUs) performed manual material handling tasks, such as lifting and releasing small loads, using two postural patterns: correct and incorrect. Kinematic parameters, such as the range of motion of lower limb and lumbosacral joints, along with the displacement of the trunk with respect to the pelvis, were estimated from IMU measurements through a biomechanical model. Statistical differences were found for all kinematic parameters between the correct and the incorrect postures (p < 0.01). Moreover, as the weight of the load in the lifting task increased, changes in hip and trunk kinematics were observed (p < 0.01). To automatically identify the two postures, a supervised machine-learning algorithm, a support vector machine, was trained, and an accuracy of 99.4% (specificity of 100%) was reached by using the measurements of all kinematic parameters as features. Meanwhile, an accuracy of 76.9% (specificity of 76.9%) was reached by using the measurements of kinematic parameters related to the trunk body segment.
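
Range of motion, one of the kinematic features mentioned above, is conventionally the difference between the largest and smallest joint angle recorded during a task. A minimal sketch under that assumption follows; the function name `range_of_motion` and the angle samples are hypothetical, and the paper's biomechanical model for deriving joint angles from IMU data is not reproduced here.

```python
def range_of_motion(angles_deg):
    """Range of motion of one joint: max angle minus min angle over the task."""
    if not angles_deg:
        raise ValueError("at least one angle sample is required")
    return max(angles_deg) - min(angles_deg)


# Hypothetical hip flexion angles (degrees) sampled during one lift.
hip_flexion = [5.0, 20.0, 62.5, 88.0, 40.0, 6.5]
rom = range_of_motion(hip_flexion)
print(rom)
```

One such value per joint could then form part of the feature vector given to a classifier.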
