scholarly journals Comparing different supervised machine learning algorithms for disease prediction

Author(s):  
Shahadat Uddin ◽  
Arif Khan ◽  
Md Ekramul Hossain ◽  
Mohammad Ali Moni

Abstract Background Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently shown a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods In this study, extensive research efforts were made to identify those studies that applied more than one supervised machine learning algorithm on single disease prediction. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. Thus, we selected 48 articles in total for the comparison among variants supervised machine learning algorithms for disease prediction. Results We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies) followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed superior accuracy comparatively. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM which topped in 41% of the studies it was considered. Conclusion This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This important information of relative performance can be used to aid researchers in the selection of an appropriate supervised machine learning algorithm for their studies.

2018 ◽  
Vol 7 (4.15) ◽  
pp. 400 ◽  
Author(s):  
Thuy Nguyen Thi Thu ◽  
Vuong Dang Xuan

The exchange rate of each money pair can be predicted by using machine learning algorithm during classification process. With the help of supervised machine learning model, the predicted uptrend or downtrend of FoRex rate might help traders to have right decision on FoRex transactions. The installation of machine learning algorithms in the FoRex trading online market can automatically make the transactions of buying/selling. All the transactions in the experiment are performed by using scripts added-on in transaction application. The capital, profits results of use support vector machine (SVM) models are higher than the normal one (without use of SVM). 


2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Aurelle Tchagna Kouanou ◽  
Thomas Mih Attia ◽  
Cyrille Feudjio ◽  
Anges Fleurio Djeumo ◽  
Adèle Ngo Mouelas ◽  
...  

Background and Objective. To mitigate the spread of the virus responsible for COVID-19, known as SARS-CoV-2, there is an urgent need for massive population testing. Due to the constant shortage of PCR (polymerase chain reaction) test reagents, which are the tests for COVID-19 by excellence, several medical centers have opted for immunological tests to look for the presence of antibodies produced against this virus. However, these tests have a high rate of false positives (positive but actually negative test results) and false negatives (negative but actually positive test results) and are therefore not always reliable. In this paper, we proposed a solution based on Data Analysis and Machine Learning to detect COVID-19 infections. Methods. Our analysis and machine learning algorithm is based on most cited two clinical datasets from the literature: one from San Raffaele Hospital Milan Italia and the other from Hospital Israelita Albert Einstein São Paulo Brasilia. The datasets were processed to select the best features that most influence the target, and it turned out that almost all of them are blood parameters. EDA (Exploratory Data Analysis) methods were applied to the datasets, and a comparative study of supervised machine learning models was done, after which the support vector machine (SVM) was selected as the one with the best performance. Results. SVM being the best performant is used as our proposed supervised machine learning algorithm. An accuracy of 99.29%, sensitivity of 92.79%, and specificity of 100% were obtained with the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19) after applying optimization to SVM. The same procedure and work were performed with the dataset taken from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM presented the best performance among other machine learning algorithms, and 92.86%, 93.55%, and 90.91% for accuracy, sensitivity, and specificity, respectively, were obtained. Conclusion. The obtained results, when compared with others from the literature based on these same datasets, are superior, leading us to conclude that our proposed solution is reliable for the COVID-19 diagnosis.


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

For defining the optimal machine learning algorithm, the decision was not easy for which we shall choose. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed the supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.


Author(s):  
Stuti Pandey ◽  
Abhay Kumar Agarwal

Cardiovascular disease prediction is a research field of healthcare which depends on a large volume of data for making effective and accurate predictions. These predictions can be more effective and accurate when used with machine learning algorithms because it can disclose all the concealed facts which are helpful in making decisions. The processing capabilities of machine learning algorithms are also very fast which is almost infeasible for human beings. Therefore, the work presented in this research focuses on identifying the best machine learning algorithm by comparing their performances for predicting cardiovascular diseases in a reasonable time. The machine learning algorithms which have been used in the presented work are naïve Bayes, support vector machine, k-nearest neighbors, and random forest. The dataset which has been utilized for this comparison is taken from the University of California, Irvine (UCI) machine learning repository named “Heart Disease Data Set.”


2013 ◽  
Vol 10 (2) ◽  
pp. 1376-1383
Author(s):  
Dr.Vijay Pal Dhaka ◽  
Swati Agrawal

Maintainability is an important quality attribute and a difficult concept as it involves a number of measurements. Quality estimation means estimating maintainability of software. Maintainability is a set of attribute that bear on the effort needed to make specified modification. The main goal of this paper is to propose use of few machine learning algorithms with an objective to predict software maintainability and evaluate them. The propose models are Gaussian process regression networks (GPRN), probably approximately correct learning (PAC), Genetic algorithm (GA). This paper predicts the maintenance effort. The QUES (Quality evaluation system) dataset are used in this study. The QUES datasets contains 71 classes. To measure the maintainability, number of “CHANGE” is observed over a period of few years. We can define CHANGE as the number of lines of code which were added, deleted or modified during few year maintenance periods. After this study these machine learning algorithm was compared with few models such as GRNN (General regression neural network) model, RT (Regression tree), MARS (Multiple adaptive regression splines), SVM (Support vector machine), MLR (Multiple linear regression) models. Based on experiments, it was found that GPRN can be predicting the maintainability more accurately and precisely than prevailing models. We also include object oriented software metric to measure the software maintainability. The use of machine learning algorithms to establish the relationship between metrics and maintainability would be much better approach as these are based on quantity as well as quality. 


2021 ◽  
Author(s):  
Omar Alfarisi ◽  
Zeyar Aung ◽  
Mohamed Sassi

For defining the optimal machine learning algorithm, the decision was not easy for which we shall choose. To help future researchers, we describe in this paper the optimal among the best of the algorithms. We built a synthetic data set and performed the supervised machine learning runs for five different algorithms. For heterogeneity, we identified Random Forest, among others, to be the best algorithm.


The aim of this research is to do risk modelling after analysis of twitter posts based on certain sentiment analysis. In this research we analyze posts of several users or a particular user to check whether they can be cause of concern to the society or not. Every sentiment like happy, sad, anger and other emotions are going to provide scaling of severity in the conclusion of final table on which machine learning algorithm is applied. The data which is put under the machine learning algorithms are been monitored over a period of time and it is related to a particular topic in an area


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

The machine learning algorithms are capable of managing multi-dimensional data under the dynamic environment. Despite its so many vital features, there are some challenges to overcome. The machine learning algorithms still requires some additional mechanisms or procedures for predicting a large number of new classes with managing privacy. The deficiencies show the reliable use of a machine learning algorithm relies on human experts because raw data may complicate the learning process which may generate inaccurate results. So the interpretation of outcomes with expertise in machine learning mechanisms is a significant challenge in the machine learning algorithm. The machine learning technique suffers from the issue of high dimensionality, adaptability, distributed computing, scalability, the streaming data, and the duplicity. The main issue of the machine learning algorithm is found its vulnerability to manage errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how can be reduced the computational complexity of machine learning algorithms by finding how to make predictions using an improved algorithm.


2019 ◽  
Vol 1 (1) ◽  
pp. 384-399 ◽  
Author(s):  
Thais de Toledo ◽  
Nunzio Torrisi

The Distributed Network Protocol (DNP3) is predominately used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are a important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: Decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1557 ◽  
Author(s):  
Ilaria Conforti ◽  
Ilaria Mileti ◽  
Zaccaria Del Prete ◽  
Eduardo Palermo

Ergonomics evaluation through measurements of biomechanical parameters in real time has a great potential in reducing non-fatal occupational injuries, such as work-related musculoskeletal disorders. Assuming a correct posture guarantees the avoidance of high stress on the back and on the lower extremities, while an incorrect posture increases spinal stress. Here, we propose a solution for the recognition of postural patterns through wearable sensors and machine-learning algorithms fed with kinematic data. Twenty-six healthy subjects equipped with eight wireless inertial measurement units (IMUs) performed manual material handling tasks, such as lifting and releasing small loads, with two postural patterns: correctly and incorrectly. Measurements of kinematic parameters, such as the range of motion of lower limb and lumbosacral joints, along with the displacement of the trunk with respect to the pelvis, were estimated from IMU measurements through a biomechanical model. Statistical differences were found for all kinematic parameters between the correct and the incorrect postures (p < 0.01). Moreover, with the weight increase of load in the lifting task, changes in hip and trunk kinematics were observed (p < 0.01). To automatically identify the two postures, a supervised machine-learning algorithm, a support vector machine, was trained, and an accuracy of 99.4% (specificity of 100%) was reached by using the measurements of all kinematic parameters as features. Meanwhile, an accuracy of 76.9% (specificity of 76.9%) was reached by using the measurements of kinematic parameters related to the trunk body segment.


Sign in / Sign up

Export Citation Format

Share Document