scholarly journals Performance Evaluation of Different Machine Learning Classification Algorithms for Disease Diagnosis

Author(s):  
Munder Abdulatef Al-Hashem ◽  
Ali Mohammad Alqudah ◽  
Qasem Qananwah

Knowledge extraction within a healthcare field is a very challenging task since we are having many problems such as noise and imbalanced datasets. They are obtained from clinical studies where uncertainty and variability are popular. Lately, a wide number of machine learning algorithms are considered and evaluated to check their validity of being used in the medical field. Usually, the classification algorithms are compared against medical experts who are specialized in certain disease diagnoses and provide an effective methodological evaluation of classifiers by applying performance metrics. The performance metrics contain four criteria: accuracy, sensitivity, and specificity forming the confusion matrix of each used algorithm. We have utilized eight different well-known machine learning algorithms to evaluate their performances in six different medical datasets. Based on the experimental results we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall among the used datasets and signs can be used for diagnosing various diseases.

Knowledge extraction within a healthcare field is a very challenging task since we are having many problems such as noise and imbalanced datasets. They are obtained from clinical studies where uncertainty and variability are popular. Lately, a wide number of machine learning algorithms are considered and evaluated to check their validity of being used in the medical field. Usually, the classification algorithms are compared against medical experts who are specialized in certain disease diagnoses and provide an effective methodological evaluation of classifiers by applying performance metrics. The performance metrics contain four criteria: accuracy, sensitivity, and specificity forming the confusion matrix of each used algorithm. We have utilized eight different well-known machine learning algorithms to evaluate their performances in six different medical datasets. Based on the experimental results we conclude that the XGBoost and K-Nearest Neighbor classifiers were the best overall among the used datasets and signs can be used for diagnosing various diseases.


2021 ◽  
Author(s):  
Prasannavenkatesan Theerthagiri ◽  
Usha Ruby A ◽  
Vidya J

Abstract Diabetes mellitus is characterized as a chronic disease may cause many complications. The machine learning algorithms are used to diagnosis and predict the diabetes. The learning based algorithms plays a vital role on supporting decision making in disease diagnosis and prediction. In this paper, traditional classification algorithms and neural network based machine learning are investigated for the diabetes dataset. Also, various performance methods with different aspects are evaluated for the K-nearest neighbor, Naive Bayes, extra trees, decision trees, radial basis function, and multilayer perceptron algorithms. It supports the estimation on patients suffering from diabetes in future. The results of this work shows that the multilayer perceptron algorithm gives the highest prediction accuracy with lowest MSE of 0.19. The MLP gives the lowest false positive rate and false negative rate with highest area under curve of 86 %.


2021 ◽  
Vol 7 ◽  
pp. e437
Author(s):  
Arushi Agarwal ◽  
Purushottam Sharma ◽  
Mohammed Alshehri ◽  
Ahmed A. Mohamed ◽  
Osama Alfarraj

In today’s cyber world, the demand for the internet is increasing day by day, increasing the concern of network security. The aim of an Intrusion Detection System (IDS) is to provide approaches against many fast-growing network attacks (e.g., DDoS attack, Ransomware attack, Botnet attack, etc.), as it blocks the harmful activities occurring in the network system. In this work, three different classification machine learning algorithms—Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN)—were used to detect the accuracy and reducing the processing time of an algorithm on the UNSW-NB15 dataset and to find the best-suited algorithm which can efficiently learn the pattern of the suspicious network activities. The data gathered from the feature set comparison was then applied as input to IDS as data feeds to train the system for future intrusion behavior prediction and analysis using the best-fit algorithm chosen from the above three algorithms based on the performance metrics found. Also, the classification reports (Precision, Recall, and F1-score) and confusion matrix were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.


2021 ◽  
Author(s):  
Prasannavenkatesan Theerthagiri ◽  
Usha Ruby A ◽  
Vidya J

Abstract Diabetes mellitus is characterized as a chronic disease may cause many complications. The machine learning algorithms are used to diagnosis and predict the diabetes. The learning based algorithms plays a vital role on supporting decision making in disease diagnosis and prediction. In this paper, traditional classification algorithms and neural network based machine learning are investigated for the diabetes dataset. Also, various performance methods with different aspects are evaluated for the K-nearest neighbor, Naive Bayes, extra trees, decision trees, radial basis function, and multilayer perceptron algorithms. It supports the estimation on patients suffering from diabetes in future. The results of this work shows that the multilayer perceptron algorithm gives the highest prediction accuracy with lowest MSE of 0.19. The MLP gives the lowest false positive rate and false negative rate with highest area under curve of 86 %.


Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1934
Author(s):  
Kyoung Jong Park

Companies in the same supply chain influence each other, so sharing information enables more efficient supply chain management. An efficient supply chain must have a symmetry of information between participating entities, but in reality, the information is asymmetric, causing problems. The sustainability of the supply chain continues to be threatened because companies are reluctant to disclose information to others. If companies participating in the supply chain do not disclose accurate information, the next best way to improve the sustainability of the supply chain is to use data from the supply chain to determine each enterprise’s information. This study takes data from the supply chain and then uses machine learning algorithms to find which enterprise the data refer to when new data from unknown sources arise. The machine learning algorithms used are logistic regression, random forest, naive Bayes, decision tree, support vector machine, k-nearest neighbor, and multi-layer perceptron. Indicators for evaluating the performance of multi-class classification machine learning methods are accuracy, confusion matrix, precision, recall, and F1-score. The experimental results showed that LR and MLP accurately predicted companies (tiers), but NB, DT, RF, SVM, and K-NN did not accurately predict companies. In addition, the performance similarity of machine learning algorithms through experiments was classified into LR and MLP groups, NB and DT groups, and RF, SVM, and K-NN groups.


2021 ◽  
Author(s):  
Prasannavenkatesan Theerthagiri ◽  
Usha Ruby A ◽  
Vidya J

Abstract Diabetes mellitus is characterized as a chronic disease may cause many complications. The machine learning algorithms are used to diagnosis and predict the diabetes. The learning based algorithms plays a vital role on supporting decision making in disease diagnosis and prediction. In this paper, traditional classification algorithms and neural network based machine learning are investigated for the diabetes dataset. Also, various performance methods with different aspects are evaluated for the K-nearest neighbor, Naive Bayes, extra trees, decision trees, radial basis function, and multilayer perceptron algorithms. It supports the estimation on patients suffering from diabetes in future. The results of this work shows that the multilayer perceptron algorithm gives the highest prediction accuracy with lowest MSE of 0.19. The MLP gives the lowest false positive rate and false negative rate with highest area under curve of 86 %.


Author(s):  
Sandy C. Lauguico ◽  
◽  
Ronnie S. Concepcion II ◽  
Jonnel D. Alejandrino ◽  
Rogelio Ruzcko Tobias ◽  
...  

The arising problem on food scarcity drives the innovation of urban farming. One of the methods in urban farming is the smart aquaponics. However, for a smart aquaponics to yield crops successfully, it needs intensive monitoring, control, and automation. An efficient way of implementing this is the utilization of vision systems and machine learning algorithms to optimize the capabilities of the farming technique. To realize this, a comparative analysis of three machine learning estimators: Logistic Regression (LR), K-Nearest Neighbor (KNN), and Linear Support Vector Machine (L-SVM) was conducted. This was done by modeling each algorithm from the machine vision-feature extracted images of lettuce which were raised in a smart aquaponics setup. Each of the model was optimized to increase cross and hold-out validations. The results showed that KNN having the tuned hyperparameters of n_neighbors=24, weights='distance', algorithm='auto', leaf_size = 10 was the most effective model for the given dataset, yielding a cross-validation mean accuracy of 87.06% and a classification accuracy of 91.67%.


2019 ◽  
Vol 11 (8) ◽  
pp. 976
Author(s):  
Nicholas M. Enwright ◽  
Lei Wang ◽  
Hongqing Wang ◽  
Michael J. Osland ◽  
Laura C. Feher ◽  
...  

Barrier islands are dynamic environments because of their position along the marine–estuarine interface. Geomorphology influences habitat distribution on barrier islands by regulating exposure to harsh abiotic conditions. Researchers have identified linkages between habitat and landscape position, such as elevation and distance from shore, yet these linkages have not been fully leveraged to develop predictive models. Our aim was to evaluate the performance of commonly used machine learning algorithms, including K-nearest neighbor, support vector machine, and random forest, for predicting barrier island habitats using landscape position for Dauphin Island, Alabama, USA. Landscape position predictors were extracted from topobathymetric data. Models were developed for three tidal zones: subtidal, intertidal, and supratidal/upland. We used a contemporary habitat map to identify landscape position linkages for habitats, such as beach, dune, woody vegetation, and marsh. Deterministic accuracy, fuzzy accuracy, and hindcasting were used for validation. The random forest algorithm performed best for intertidal and supratidal/upland habitats, while the K-nearest neighbor algorithm performed best for subtidal habitats. A posteriori application of expert rules based on theoretical understanding of barrier island habitats enhanced model results. For the contemporary model, deterministic overall accuracy was nearly 70%, and fuzzy overall accuracy was over 80%. For the hindcast model, deterministic overall accuracy was nearly 80%, and fuzzy overall accuracy was over 90%. We found machine learning algorithms were well-suited for predicting barrier island habitats using landscape position. Our model framework could be coupled with hydrodynamic geomorphologic models for forecasting habitats with accelerated sea-level rise, simulated storms, and restoration actions.


Diagnostics ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 104 ◽  
Author(s):  
Ahmed ◽  
Yigit ◽  
Isik ◽  
Alpkocak

Leukemia is a fatal cancer and has two main types: Acute and chronic. Each type has two more subtypes: Lymphoid and myeloid. Hence, in total, there are four subtypes of leukemia. This study proposes a new approach for diagnosis of all subtypes of leukemia from microscopic blood cell images using convolutional neural networks (CNN), which requires a large training data set. Therefore, we also investigated the effects of data augmentation for an increasing number of training samples synthetically. We used two publicly available leukemia data sources: ALL-IDB and ASH Image Bank. Next, we applied seven different image transformation techniques as data augmentation. We designed a CNN architecture capable of recognizing all subtypes of leukemia. Besides, we also explored other well-known machine learning algorithms such as naive Bayes, support vector machine, k-nearest neighbor, and decision tree. To evaluate our approach, we set up a set of experiments and used 5-fold cross-validation. The results we obtained from experiments showed that our CNN model performance has 88.25% and 81.74% accuracy, in leukemia versus healthy and multiclass classification of all subtypes, respectively. Finally, we also showed that the CNN model has a better performance than other wellknown machine learning algorithms.


Sign in / Sign up

Export Citation Format

Share Document