Comparison of classical machine learning algorithms in the task of handwritten digits classification

2021 ◽  
Vol 21 ◽  
pp. 279-286
Author(s):  
Oleksandr Voloshchenko ◽  
Małgorzata Plechawska-Wójcik

The purpose of this paper is to compare classical machine learning algorithms for handwritten digit classification. The following algorithms were chosen for comparison: Logistic Regression, SVM, Decision Tree, Random Forest, and k-NN. The MNIST handwritten digit database is used for training and testing the above algorithms. The dataset consists of 70,000 images of digits from 0 to 9. The algorithms are compared on criteria such as learning speed, prediction speed, host machine load, and classification accuracy. Each algorithm went through the training and testing phases 100 times, with the desired KPIs recorded at each iteration. The results were averaged to obtain reliable outcomes.
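The benchmarking loop described in the abstract can be sketched with scikit-learn. This is a minimal illustration, assuming default hyperparameters and using the small built-in 8x8 digits dataset as a stand-in for MNIST, since the paper's exact settings are not given:

```python
# Sketch of the comparison loop: fit each classifier, time training and
# prediction, and record accuracy (defaults throughout; one iteration only,
# whereas the paper averages over 100 runs).
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
}

results = {}
for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X_train, y_train)           # learning speed
    fit_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    acc = model.score(X_test, y_test)     # classification accuracy
    predict_s = time.perf_counter() - t0  # prediction construction speed
    results[name] = (fit_s, predict_s, acc)
    print(f"{name}: fit={fit_s:.3f}s predict={predict_s:.3f}s acc={acc:.3f}")
```

Host machine load, the fourth criterion, would additionally require process-level monitoring (e.g. sampling CPU and memory usage during the fit), which is omitted here.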

2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M J Espinosa Pascual ◽  
P Vaquero Martinez ◽  
V Vaquero Martinez ◽  
J Lopez Pais ◽  
B Izquierdo Coronel ◽  
...  

Abstract Introduction Out of all patients admitted with Myocardial Infarction, 10 to 15% have Myocardial Infarction with Non-Obstructive Coronary Arteries (MINOCA). Classification algorithms based on deep learning can substantially exceed traditional diagnostic algorithms. Accordingly, numerous machine learning models have been proposed as useful tools for the detection of various pathologies, but to date no study has proposed a diagnostic algorithm for MINOCA. Purpose The aim of this study was to estimate the diagnostic accuracy of several automated learning algorithms (Support-Vector Machine [SVM], Random Forest [RF] and Logistic Regression [LR]) in discriminating patients suffering from MINOCA from those with Myocardial Infarction with Obstructive Coronary Artery Disease (MICAD) at the time of admission and before performing a coronary angiography, whether invasive or not. Methods A Diagnostic Test Evaluation study was carried out by applying the proposed algorithms to a database of 553 consecutive patients admitted to our Hospital with Myocardial Infarction. According to the definitions of the 2016 ESC Position Paper on MINOCA, patients were classified into two groups: MICAD and MINOCA. Out of the total 553 patients, 214 were discarded due to the lack of complete data. The set of machine learning algorithms was trained on 244 patients (training sample: 75%) and tested on 80 patients (test sample: 25%). A total of 64 variables were available for each patient, including demographic, clinical and laboratory features obtained before the angiographic procedure. Finally, the diagnostic accuracy of each model was measured.
Results The most accurate classification model was the Random Forest algorithm (Specificity [Sp] 0.88, Sensitivity [Se] 0.57, Negative Predictive Value [NPV] 0.93, Area Under the Curve [AUC] 0.85 [CI 0.83–0.88]), followed by the standard Logistic Regression (Sp 0.76, Se 0.57, NPV 0.92, AUC 0.74) and the Support-Vector Machine (Sp 0.84, Se 0.38, NPV 0.90, AUC 0.78) (see graph). The variables that contributed the most to discriminating a MINOCA from a MICAD were the traditional cardiovascular risk factors, biomarkers of myocardial injury, hemoglobin and gender. Results were similar when the 19 patients with Takotsubo syndrome were excluded from the analysis. Conclusion A prediction system for diagnosing MINOCA before performing coronary angiographies was developed using machine learning algorithms. The results show higher accuracy in diagnosing MINOCA than conventional statistical methods. This study supports the potential of machine learning algorithms in clinical cardiology. However, further studies are required to validate our results. Funding Acknowledgement Type of funding sources: None. ROC curves of different algorithms
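For reference, the reported specificity, sensitivity, and NPV are simple functions of the test-set confusion matrix. The sketch below uses invented counts for illustration, not the study's data:

```python
# Diagnostic metrics from a 2x2 confusion matrix (MINOCA = positive class).
# The counts are hypothetical, chosen only to demonstrate the formulas.
def diagnostic_metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # Se: fraction of MINOCA correctly flagged
    specificity = tn / (tn + fp)   # Sp: fraction of MICAD correctly flagged
    npv = tn / (tn + fn)           # NPV: P(MICAD | model says MICAD)
    return sensitivity, specificity, npv

se, sp, npv = diagnostic_metrics(tp=12, fp=7, fn=9, tn=52)
print(f"Se={se:.2f} Sp={sp:.2f} NPV={npv:.2f}")  # → Se=0.57 Sp=0.88 NPV=0.85
```

Because MINOCA is the minority class (10 to 15% of infarctions), a high NPV with moderate sensitivity, as reported here, is the typical pattern for a rule-out style classifier.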


2021 ◽  
Vol 5 (1) ◽  
pp. 35
Author(s):  
Uttam Narendra Thakur ◽  
Radha Bhardwaj ◽  
Arnab Hazra

Disease diagnosis through breath analysis has attracted significant attention in recent years due to its noninvasive nature, rapid testing ability, and applicability for patients of all ages. More than 1000 volatile organic compounds (VOCs) exist in human breath, but only selected VOCs are associated with specific diseases. Selective identification of those disease-marker VOCs using an array of multiple sensors is highly desirable in the current scenario. Efficient sensors and suitable classification algorithms are essential for the selective and reliable detection of those disease markers in complex breath. In the current study, we fabricated a noble metal (Au, Pd and Pt) nanoparticle-functionalized MoS2 (chalcogenides, Sigma Aldrich, St. Louis, MO, USA)-based sensor array for the selective identification of different VOCs. Four sensors, i.e., pure MoS2, Au/MoS2, Pd/MoS2, and Pt/MoS2, were tested under exposure to different VOCs, such as acetone, benzene, ethanol, xylene, 2-propenol, methanol and toluene, at 50 °C. Initially, principal component analysis (PCA) and linear discriminant analysis (LDA) were used to discriminate those seven VOCs. Compared to PCA, LDA was able to discriminate well between the seven VOCs. Four different machine learning algorithms, i.e., k-nearest neighbors (kNN), decision tree, random forest, and multinomial logistic regression, were then used to identify those VOCs. The classification accuracy for those seven VOCs using kNN, decision tree, random forest, and multinomial logistic regression was 97.14%, 92.43%, 84.1%, and 98.97%, respectively. These results confirmed that multinomial logistic regression performed best among the four machine learning algorithms in discriminating and differentiating the multiple VOCs that generally exist in human breath.
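The PCA-versus-LDA discrimination step can be sketched as follows. The data here are synthetic four-feature responses (one feature per sensor) rather than the real sensor-array measurements, and the class separation is artificial:

```python
# Compare unsupervised (PCA) and supervised (LDA) 2-D projections of a
# 7-class, 4-feature dataset mimicking a 4-sensor array exposed to 7 VOCs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 7, 30, 4   # 7 VOCs, 4 sensors
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(n_per_class, n_features))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

X_pca = PCA(n_components=2).fit_transform(X)        # ignores class labels
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
X_lda = lda.transform(X)                            # maximizes class separation
print("PCA shape:", X_pca.shape, "LDA shape:", X_lda.shape)
print("LDA training accuracy:", round(lda.score(X, y), 3))
```

Because LDA uses the class labels to choose its projection axes, it typically separates overlapping VOC clusters better than PCA, consistent with the abstract's observation.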


2020 ◽  
Vol 8 (6) ◽  
pp. 1964-1968

Drug reviews are commonly used in the pharmaceutical industry to improve the medications given to patients. Generally, a drug review contains the drug name, usage, ratings, and comments by the patients. However, these reviews are not clean, and there is a need to improve their cleanness so that both pharmacists and patients can benefit from them. To do this, we propose a new approach consisting of several steps. First, we add extra parameters to the review data by applying VADER sentiment analysis to clean the review data. Then, we apply different machine learning algorithms, namely linear SVC, logistic regression, SVM, random forest, and Naive Bayes, on the drug review dataset. However, we found that the accuracy of these algorithms on these datasets is limited. To improve this, we apply stratified k-fold cross-validation in combination with logistic regression. With this approach, the accuracy is increased to 96%.
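The stratified k-fold step with logistic regression can be sketched as below. It assumes the reviews have already been converted to numeric features (e.g. TF-IDF vectors plus a VADER compound-sentiment column), so synthetic features stand in for the real dataset:

```python
# Stratified k-fold keeps each fold's class ratio equal to the full dataset's,
# which stabilizes accuracy estimates on imbalanced review labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 500 "reviews" already encoded as 20 numeric features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print("fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```

On real text data, the feature matrix would come from a vectorizer fit inside each fold (ideally via a `Pipeline`) to avoid leaking test-fold vocabulary into training.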


2020 ◽  
Vol 8 (5) ◽  
pp. 5353-5362

Background/Aim: Prostate cancer is among the most prevalent cancers in the world and a major cause of cancer deaths worldwide. Early strategies for predicting prostate cancer support decisions about the disease progression to be expected in high-risk patients, which results in a reduction of their risks. Methods: In the proposed research, we considered a dataset from Kaggle and performed pre-processing for missing values. Three missing values in the compactness attribute and two missing values in the fractal dimension attribute were replaced by the mean of their column values. The performance of the diagnosis model is assessed using measures such as classification accuracy, sensitivity, and specificity. This paper proposes a prediction model to predict whether a person has prostate cancer or not and to provide awareness or a diagnosis based on that. This is done by comparing the accuracies obtained by applying Support Vector Machine, Random Forest, Naive Bayes, and logistic regression to the dataset taken in a region, to present an accurate model for predicting prostate cancer. Results: The machine learning algorithms under study were able to predict prostate cancer in patients with accuracy between 70% and 90%. Conclusions: It was shown that Logistic Regression and Random Forest both have better accuracy (90%) when compared to the other machine learning algorithms.
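The mean-imputation step described in Methods can be illustrated with scikit-learn's `SimpleImputer`. The toy values below are invented; only the column roles (compactness, fractal dimension) follow the abstract:

```python
# Replace each missing entry with the mean of the observed values in its column.
import numpy as np
from sklearn.impute import SimpleImputer

# rows: patients; columns: [compactness, fractal_dimension]
X = np.array([[0.10, 0.060],
              [np.nan, 0.065],
              [0.14, np.nan],
              [0.12, 0.070]])

X_imputed = SimpleImputer(strategy="mean").fit_transform(X)
print(X_imputed)  # NaNs become 0.12 (col 0 mean) and 0.065 (col 1 mean)
```

In a real pipeline the imputer should be fit on the training split only and then applied to the test split, so test-set statistics do not leak into training.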


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Luana Ibiapina Cordeiro Calíope Pinheiro ◽  
Maria Lúcia Duarte Pereira ◽  
Marcial Porto Fernandez ◽  
Francisco Mardônio Vieira Filho ◽  
Wilson Jorge Correia Pinto de Abreu ◽  
...  

Dementia interferes with the individual’s motor, behavioural, and intellectual functions, leaving them unable to perform instrumental activities of daily living. This study is aimed at identifying the best performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia through the application of data mining. The principal component analysis (PCA) algorithm was used, and the following machine learning algorithms were compared: logistic regression, decision tree, neural network, KNN, and random forest. The database used for this study was built from data collected on 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed on the 104 characteristics available in the database; then, with the reduction of dimensionality, the quality of the machine learning algorithms improved during the tests, even while losing about 30% of the variance. When considering only 23 characteristics, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved to be more effective than the others, obtaining 84% precision and 86% accuracy.
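The dimensionality-reduction step can be sketched with scikit-learn's PCA. The records below are synthetic; only the shapes (270 individuals, 104 features reduced to 23 components) are taken from the abstract:

```python
# Project 104-dimensional records onto 23 principal components and check
# how much of the total variance the reduced representation retains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(270, 104))          # 270 individuals, 104 features

pca = PCA(n_components=23).fit(X)
X_reduced = pca.transform(X)             # shape (270, 23), fed to the classifiers
retained = pca.explained_variance_ratio_.sum()
print(f"variance retained by 23 components: {retained:.2%}")
```

On real clinical data with correlated features, far more variance concentrates in the leading components than in this uncorrelated synthetic example; the abstract reports roughly 70% retained.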


2021 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Helen R. Gosselt ◽  
Maxime M. A. Verhoeven ◽  
Maja Bulatović-Ćalasan ◽  
Paco M. Welsing ◽  
Maurits C. F. J. de Rotte ◽  
...  

The goals of this study were to examine whether machine-learning algorithms outperform multivariable logistic regression in the prediction of insufficient response to methotrexate (MTX); secondly, to examine which features are essential for correct prediction; and finally, to investigate whether the best performing model specifically identifies insufficient responders to MTX (combination) therapy. The prediction of insufficient response (3-month Disease Activity Score 28-Erythrocyte-sedimentation rate (DAS28-ESR) > 3.2) was assessed using logistic regression, least absolute shrinkage and selection operator (LASSO), random forest, and extreme gradient boosting (XGBoost). The baseline features of 355 rheumatoid arthritis (RA) patients from the “treatment in the Rotterdam Early Arthritis CoHort” (tREACH) and the U-Act-Early trial were combined for analyses. The model performances were compared using the area under the curve (AUC) of receiver operating characteristic (ROC) curves, 95% confidence intervals (95% CI), and sensitivity and specificity. Finally, the best performing model following feature selection was tested on 101 RA patients starting tocilizumab (TCZ) monotherapy. Logistic regression (AUC = 0.77, 95% CI: 0.68–0.86) performed as well as LASSO (AUC = 0.76, 95% CI: 0.67–0.85), random forest (AUC = 0.71, 95% CI: 0.61–0.81), and XGBoost (AUC = 0.70, 95% CI: 0.61–0.81), yet logistic regression reached the highest sensitivity (81%). The most important features were baseline DAS28 (components). For all algorithms, models with six features performed similarly to those with 16. When applied to the TCZ-monotherapy group, logistic regression’s sensitivity significantly dropped from 83% to 69% (p = 0.03). In the current dataset, logistic regression performed equally well compared to machine-learning algorithms in the prediction of insufficient response to MTX. Models could be reduced to six features, which are more conducive to clinical implementation.
Interestingly, the prediction model was specific to MTX (combination) therapy response.
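The AUC-with-95%-CI comparison can be sketched with a simple bootstrap. The labels and risk scores below are synthetic, and bootstrapping is an assumed method here, since the abstract does not state how its confidence intervals were obtained:

```python
# Bootstrap a 95% CI for ROC AUC: resample the test set with replacement,
# recompute AUC each time, and take the 2.5th/97.5th percentiles.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=300)                   # insufficient response yes/no
risk = y * 0.5 + rng.normal(0, 0.6, size=300)      # a model's predicted risk

aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))          # resample with replacement
    if len(np.unique(y[idx])) < 2:
        continue                                   # AUC needs both classes
    aucs.append(roc_auc_score(y[idx], risk[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC={roc_auc_score(y, risk):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Overlapping CIs, as between the four models in the abstract, indicate that the observed AUC differences may not be statistically meaningful at this sample size.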


2021 ◽  
Vol 2076 (1) ◽  
pp. 012045
Author(s):  
Aimin Li ◽  
Meng Fan ◽  
Guangduo Qin

Abstract There are many traditional methods available for water body extraction from remote sensing images, such as the normalised difference water index (NDWI), the modified NDWI (MNDWI), and the multi-band spectrum method, but the accuracy of these methods is limited. In recent years, machine learning algorithms have developed rapidly and been applied widely. Using Landsat-8 images, models such as decision tree, logistic regression, random forest, neural network, support vector machine (SVM), and Xgboost were adopted in the present research. Based on this, parameters were determined for each model through cross validation and a grid search. Moreover, the merits and demerits of the models in water body extraction were discussed, and a comparative analysis was performed against three methods for determining thresholds in the traditional NDWI. The results show that the neural network offers excellent performance and is a stable model, followed by the SVM and the logistic regression algorithm. Furthermore, the ensemble algorithms, including the random forest and Xgboost, were affected by sample distribution, and the decision tree model returned the poorest performance.
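The traditional index baselines mentioned above reduce to per-pixel band arithmetic. A minimal sketch follows, with small synthetic reflectance arrays standing in for the Landsat-8 green (B3), NIR (B5), and SWIR1 (B6) bands:

```python
# Per-pixel water indices: water reflects green strongly but absorbs
# NIR/SWIR, so positive index values typically indicate water.
import numpy as np

def ndwi(green, nir):
    return (green - nir) / (green + nir)      # McFeeters NDWI

def mndwi(green, swir):
    return (green - swir) / (green + swir)    # Xu's modified NDWI

# 2x2 toy reflectance patches (left column water-like, right column land-like)
green = np.array([[0.20, 0.05], [0.18, 0.04]])
nir   = np.array([[0.05, 0.30], [0.06, 0.28]])
swir  = np.array([[0.03, 0.25], [0.04, 0.22]])

water_mask = ndwi(green, nir) > 0.0           # a common fixed threshold
print(ndwi(green, nir).round(2))
print(water_mask)                             # left pixels flagged as water
```

The paper's comparison concerns exactly this thresholding step: a fixed cutoff such as 0 is simple but brittle, which is why adaptive threshold methods and learned classifiers are evaluated against it.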


Author(s):  
Supun Nakandala ◽  
Marta M. Jankowska ◽  
Fatima Tuz-Zahra ◽  
John Bellettiere ◽  
Jordan A. Carlson ◽  
...  

Background: Machine learning has been used for classification of physical behavior bouts from hip-worn accelerometers; however, this research has been limited due to the challenges of directly observing and coding human behavior “in the wild.” Deep learning algorithms, such as convolutional neural networks (CNNs), may offer better representation of data than other machine learning algorithms without the need for engineered features and may be better suited to dealing with free-living data. The purpose of this study was to develop a modeling pipeline for evaluation of a CNN model on a free-living data set and compare CNN inputs and results with the commonly used machine learning random forest and logistic regression algorithms. Method: Twenty-eight free-living women wore an ActiGraph GT3X+ accelerometer on their right hip for 7 days. A concurrently worn thigh-mounted activPAL device captured ground truth activity labels. The authors evaluated logistic regression, random forest, and CNN models for classifying sitting, standing, and stepping bouts. The authors also assessed the benefit of performing feature engineering for this task. Results: The CNN classifier performed best (average balanced accuracy for bout classification of sitting, standing, and stepping was 84%) compared with the other methods (56% for logistic regression and 76% for random forest), even without performing any feature engineering. Conclusion: Using the recent advancements in deep neural networks, the authors showed that a CNN model can outperform other methods even without feature engineering. This has important implications for both the model’s ability to deal with the complexity of free-living data and its potential transferability to new populations.
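The engineered-feature baseline branch of such a pipeline can be sketched as below. The windows are synthetic, the feature set (per-axis mean, standard deviation, min, max) is an assumed typical choice rather than the authors' exact set, and the CNN branch, which would consume the raw windows directly, is omitted:

```python
# Hand-craft per-window features from raw triaxial accelerometer windows,
# then classify sitting/standing/stepping bouts with RF and LR baselines.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_windows, window_len = 600, 100             # e.g. 100 samples per window, 3 axes
X_raw = rng.normal(size=(n_windows, window_len, 3))
y = rng.integers(0, 3, size=n_windows)       # 0=sitting, 1=standing, 2=stepping
X_raw += y[:, None, None] * 0.2              # inject a class-dependent signal

def engineer(windows):
    # per-axis mean, std, min, max: 4 statistics x 3 axes = 12 features
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1),
                           windows.min(axis=1), windows.max(axis=1)], axis=1)

X = engineer(X_raw)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (RandomForestClassifier(random_state=0),
              LogisticRegression(max_iter=1000)):
    acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    print(type(model).__name__, round(acc, 3))
```

The study's point is that a CNN, fed the `(window_len, 3)` arrays directly, can match or beat this baseline without the `engineer` step at all.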


2021 ◽  
Author(s):  
A. Mairpady ◽  
Abdel-Hamid I. Mourad ◽  
A S Mohammad Sayem Mozumder

Abstract Cartilage repair is one of the most challenging tasks for orthopedic surgeons and researchers. The primary challenge lies in the fact that the development of the extracellular matrix requires specialized cells known as chondrocytes, which are sparse in number. Chondrocytes’ minimal self-renewal capacity makes it further troublesome and expensive to repair cartilage. In designing successful substitutes for cartilage, the selection of materials used for scaffold fabrication plays a central role, among several other important factors, in ensuring the survival and proliferation of any biomaterial substitute. Over the last few decades, polymers and polymer combinations have been extensively used to fabricate such scaffolds and have shown promising results in terms of mechanical integrity and biocompatibility. In an empirical approach, the selection of the most appropriate polymer(s) for cartilage repair is an expensive and time-consuming affair, as traditionally it requires numerous trials. Moreover, it is humanly impossible to go through the huge library of literature available on the potential polymer(s) and to correlate their physical, mechanical and biological properties that might be suitable for cartilage tissue engineering. With the advancement of machine learning, material design may see a significant reduction in experimental time and cost. The objective of this study is to implement an inverse design approach to select the best polymer(s) or composites for cartilage repair using machine learning algorithms, namely random forest regression (i.e., regression trees) and multinomial logistic regression. In these algorithms, the mechanical properties of the polymers, which are similar to those of cartilage, are taken as the input, and the polymer(s)/composites are the predicted output.
According to the random forest regression and multinomial logistic regression, the polymer(s)/composites (i.e., the output) having the closest characteristics of the articular cartilages were found to be a composite of polycaprolactone and poly(bisphenol A carbonate) and a blend of polyethylene/polyethylene-graft-poly(maleic anhydride), respectively. These composites exhibit similar biomechanical properties of the natural cartilages and initiate only minimal immune responses in the body environment.
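The inverse-design idea, mapping mechanical properties to a material label with multinomial logistic regression, can be sketched as follows. The polymer names, property choices, and values are placeholders, not the study's data:

```python
# Train a classifier from mechanical properties to polymer label, then query
# it with target cartilage-like properties to "invert" the design problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
polymers = ["polymer_A", "polymer_B", "polymer_C"]   # hypothetical labels
# features: e.g. [elastic modulus (MPa), tensile strength (MPa)]
centers = np.array([[1.0, 10.0], [5.0, 25.0], [20.0, 60.0]])
X = np.vstack([c + rng.normal(0, 0.5, size=(40, 2)) for c in centers])
y = np.repeat(np.arange(3), 40)

clf = LogisticRegression(max_iter=1000).fit(X, y)    # multinomial with lbfgs
target = np.array([[5.2, 24.0]])                     # desired cartilage-like properties
print("predicted match:", polymers[clf.predict(target)[0]])
```

The study's random forest regression variant works in the same inverse direction but predicts a continuous composition instead of a discrete label.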


2018 ◽  
Vol 18 (3-4) ◽  
pp. 623-637 ◽  
Author(s):  
ARINDAM MITRA ◽  
CHITTA BARAL

Abstract Over the years the Artificial Intelligence (AI) community has produced several datasets which have given machine learning algorithms the opportunity to learn various skills across various domains. However, a subclass of these machine learning algorithms aimed at learning logic programs, namely the Inductive Logic Programming algorithms, has often failed at the task due to the vastness of these datasets. This has impacted the usability of knowledge representation and reasoning techniques in the development of AI systems. In this research, we try to address this scalability issue for algorithms that learn answer set programs. We present a sound and complete algorithm which takes the input in a slightly different manner and performs an efficient and more user-controlled search for a solution. We show via experiments that our algorithm can learn from two popular datasets from the machine learning community, namely bAbI (a question answering dataset) and MNIST (a dataset for handwritten digit recognition), which to the best of our knowledge was not previously possible. The system is publicly available at https://goo.gl/KdWAcV.

