A Machine-Learning Approach to Predict the Cefazolin Inoculum Effect in Methicillin-Susceptible Staphylococcus aureus

2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S712-S713
Author(s):  
Rafael Rios ◽  
Sara I Gomez-Villegas ◽  
Jonathon C McNeil ◽  
Lina P Carvajal ◽  
Sandra Rincon ◽  
...  

Abstract Background The cefazolin (Cz) inoculum effect (CzIE), defined as an increase in the Cz MIC to ≥16 µg/mL at high inoculum (10⁷ CFU/mL), has been associated with poor outcomes in MSSA bacteremia and osteomyelitis. The CzIE is associated with the BlaZ β-lactamase, encoded by blaZ and regulated by BlaR (antibiotic sensor) and BlaI (transcriptional repressor). Here, we aimed to obtain a machine-learning (ML) model to predict the presence of the CzIE based on the nucleotide sequence of the entire bla operon and its regulatory components. Methods Using whole genome sequencing, we analyzed the nucleotide sequences of the entire bla operon in 436 MSSA isolates recovered from blood, soft-tissue infections or pneumonia in adults (training-testing cohort; prevalence of the CzIE: 46%). In addition, 32 MSSA isolates with the CzIE recovered from pediatric patients with osteomyelitis were included as a validation cohort. The CzIE was determined by broth microdilution at high inoculum. K-mer counts were obtained from the bla operon sequences of the isolates from the training-testing cohort and then used in an ML pipeline that i) discards uninformative K-mers, ii) identifies optimal hyper-parameters and iii) trains the model using 70% of the sequences as the training set and 30% as the testing set. The pipeline tested 11 different K-mer sizes and 2 models: Logistic Regression (LR) and Support Vector Machine (SVM). Finally, the model with the best predictive ability was applied to the sequences of the MSSA osteomyelitis isolates (validation cohort). Results The ML approach had high specificity (>90%), accuracy (>80%) and ROC-AUC values (>0.7) for detecting the CzIE in the testing set of isolates (Figure 1), independently of the type of model or the K-mer size used. The best predictive ability was obtained with LR using K-mers of 17 nucleotides, with an accuracy of 84%, specificity of 96%, and sensitivity of 70% in the testing set (Figure 2). In the validation cohort, the model correctly identified all the strains exhibiting the CzIE (100% sensitivity). Figure 1. Prediction metrics of the ML pipeline for the detection of the CzIE in MSSA isolates from the training-testing cohort. Predictions are shown according to the model and K-mer sizes tested. Figure 2. ROC of the best predictive model (Logistic Regression, K-mer size 17) for the detection of the CzIE in MSSA isolates. Conclusion The ML approach is a promising genomic application to detect the CzIE in MSSA isolates from a variety of sources, bypassing phenotypic testing. Further validation is needed to evaluate its possible utility in clinical settings. Disclosures Jonathon C. McNeil, MD, Agency for Healthcare Research and Quality (Research Grant or Support), Allergan (Grant/Research Support), Nabriva (Grant/Research Support, Other Financial or Material Support, Site PI for a multicenter trial); Anthony R. Flores, MD, MPH, PhD, Nothing to disclose; Sheldon L. Kaplan, MD, Pfizer (Research Grant or Support); Cesar A. Arias, M.D., MSc, Ph.D., FIDSA, Entasis Therapeutics (Grant/Research Support), MeMed Diagnostics (Grant/Research Support), Merck (Grant/Research Support); Lorena Diaz, PhD, Nothing to disclose
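As a rough illustration of the kind of workflow this abstract describes, the sketch below counts 17-nucleotide K-mers from toy sequences and fits a logistic regression with a 70/30 split. The K-mer size and split come from the abstract; the scikit-learn CountVectorizer-based K-mer extraction, the toy sequences, and all variable names are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical sketch: K-mer counts from bla operon sequences -> logistic regression
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy stand-ins for bla operon sequences and CzIE labels (1 = CzIE positive)
sequences = [
    "ATGAAAGAATTAGTTCAAAAACATGAAGGTTTAGAACATAAA",
    "ATGAAAGAATTAGTTCAAAAACATGAAGGTTTAGAACATGGG",
    "ATGCCCGGGTTTACCGTTAGACATGAAGGTTTAGAACATAAA",
    "ATGCCCGGGTTTACCGTTAGACATGAAGGTTTAGAACATGGG",
]
labels = [1, 1, 0, 0]

# Count 17-mers (the K-mer size reported as best in the abstract)
vectorizer = CountVectorizer(analyzer="char", ngram_range=(17, 17), lowercase=False)
X = vectorizer.fit_transform(sequences)

# 70/30 training/testing split, as described in the abstract
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, stratify=labels, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("testing-set accuracy:", accuracy_score(y_test, model.predict(X_test)))
```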

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Camilla Nero ◽  
Francesca Ciccarone ◽  
Luca Boldrini ◽  
Jacopo Lenkowicz ◽  
Ida Paris ◽  
...  

Abstract Radiogenomics is a specific application of radiomics in which imaging features are linked to genomic profiles. We aimed to develop a radiogenomics model based on ovarian US images for predicting germline BRCA1/2 gene status in women with healthy ovaries. From January 2013 to December 2017, a total of 255 patients referred for germline BRCA1/2 testing with pelvic US documenting normal ovaries were retrospectively included. Feature selection for univariate analysis was carried out via correlation analysis. Multivariable analysis for classification of germline BRCA1/2 status was then carried out via logistic regression, support vector machine, an ensemble of decision trees, and automated machine learning pipelines. Data were split into a training (75%) and a testing (25%) set. The four strategies obtained similar accuracy on the testing set (from 0.54 for logistic regression to 0.64 for the automated machine learning pipeline). Data coming from one of the tested US machines showed generally higher performance, particularly with the automated machine learning pipeline (testing set specificity 0.87, negative predictive value 0.73, accuracy 0.72, and accuracy 0.79 on the training set). The study shows that a radiogenomics model based on machine learning techniques is feasible and potentially useful for predicting gBRCA1/2 status in women with healthy ovaries.
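A minimal sketch of the kind of classifier comparison the abstract reports, assuming scikit-learn and random placeholder data in place of the ovarian US radiomic features; the 75/25 split matches the abstract, but the specific estimators, settings, and feature count are illustrative.

```python
# Hypothetical sketch: comparing the classifier families named in the abstract on radiomic features
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(255, 20))          # placeholder for US radiomic features
y = rng.integers(0, 2, size=255)        # placeholder for gBRCA1/2 status

# 75/25 training/testing split as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("SVM", SVC()),
                  ("tree ensemble", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```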


2021 ◽  
Vol 11 (12) ◽  
pp. 5727
Author(s):  
Sifat Muin ◽  
Khalid M. Mosalam

Machine learning (ML)-aided structural health monitoring (SHM) can rapidly evaluate the safety and integrity of aging infrastructure following an earthquake. The conventional damage features used in ML-based SHM methodologies face the curse of dimensionality. This paper introduces low-dimensional, cumulative absolute velocity (CAV)-based features to enable the use of ML for rapid damage assessment. A computer experiment is performed to identify the appropriate features and the ML algorithm using data from a simulated single-degree-of-freedom system. A comparative analysis of five ML models (logistic regression (LR), ordinal logistic regression (OLR), artificial neural networks with 10 and 100 neurons (ANN10 and ANN100), and support vector machines (SVM)) is performed. Two test sets were used: Set-1 originated from the same distribution as the training set, while Set-2 came from a different distribution. The results showed that the combination of the CAV and the relative CAV with respect to the linear response, i.e., RCAV, performed best among the different feature combinations. Among the ML models, OLR showed good generalization capabilities compared to the SVM and ANN models. Subsequently, OLR is successfully applied to assess the damage of two numerical multi-degree-of-freedom (MDOF) models and an instrumented building with CAV and RCAV as features. For the MDOF models, the damage state was identified with accuracy ranging from 84% to 97%, and the damage location was identified with accuracy ranging from 93% to 97.5%. The features and the OLR models successfully captured the damage information for the instrumented structure as well. The proposed methodology is capable of supporting rapid decision-making and improving community resiliency.
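A brief sketch of how CAV-type features might be computed from an acceleration time history, assuming CAV is the time integral of absolute acceleration and taking RCAV as the ratio of the response CAV to the linear-response CAV; the signals, sampling interval, and the exact RCAV definition are assumptions for illustration only.

```python
# Hypothetical sketch of CAV-based, low-dimensional damage features
import numpy as np

def cav(accel, dt):
    """Cumulative absolute velocity: time integral of |a(t)|, approximated by a Riemann sum."""
    return np.sum(np.abs(accel)) * dt

dt = 0.01                                              # sampling interval in seconds (assumed)
t = np.arange(0.0, 20.0, dt)
accel_response = 0.30 * np.sin(2 * np.pi * 1.5 * t)    # placeholder structural response acceleration (g)
accel_linear = 0.20 * np.sin(2 * np.pi * 1.5 * t)      # placeholder linear-response acceleration (g)

cav_value = cav(accel_response, dt)
rcav_value = cav_value / cav(accel_linear, dt)         # CAV relative to the linear response (assumed ratio form)
features = np.array([cav_value, rcav_value])           # low-dimensional feature vector fed to the ML classifier
print(features)
```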


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Brian Ayers ◽  
Toumas Sandhold ◽  
Igor Gosev ◽  
Sunil Prasad ◽  
Arman Kilic

Introduction: Prior risk models for predicting survival after orthotopic heart transplantation (OHT) have displayed only modest discriminatory capability. With increasing interest in the application of machine learning (ML) to predictive analytics in clinical medicine, this study aimed to evaluate whether modern ML techniques could improve risk prediction in OHT. Methods: Data from the United Network for Organ Sharing registry were collected for all adult patients who underwent OHT from 2000 through 2019. The primary outcome was one-year post-transplant mortality. Dimensionality reduction and data re-sampling were employed during training. The final ensemble model was created from 100 different models of each algorithm: deep neural network, logistic regression, AdaBoost, and random forest. Discriminatory capability was assessed using the area under the receiver-operating-characteristic curve (AUROC), net reclassification index (NRI), and decision curve analysis (DCA). Results: Of the 33,657 study patients, 26,926 (80%) were randomly selected for the training set and 6,731 (20%) for a separate testing set. One-year mortality was balanced between cohorts (11.0% vs 11.3%). The best performance was achieved by the final ensemble ML model, which demonstrated an improved AUROC of 0.764 (95% CI, 0.745-0.782) in the testing set as compared to the other models (Figure). Additionally, the final model demonstrated an improvement of 72.9% ±3.8% (p<0.001) in predictive performance as assessed by NRI compared to logistic regression. The DCA showed that the final ensemble method improved risk prediction across the entire spectrum of predicted risk as compared to all other models (p<0.001). Conclusions: An ensemble ML model was able to achieve greater predictive performance than individual ML models as well as logistic regression for predicting survival after OHT. This analysis demonstrates the promise of ML techniques in risk prediction in OHT.
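A hedged sketch of an ensemble over the four algorithm families named in the Methods (deep neural network, logistic regression, AdaBoost, random forest), combined here by soft voting on placeholder data; the registry features, preprocessing, re-sampling, and 100-model-per-algorithm construction are not reproduced.

```python
# Hypothetical sketch: soft-voting ensemble of the four named model families
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 30))          # placeholder recipient/donor features
y = rng.integers(0, 2, size=1000)        # placeholder one-year mortality labels

# 80/20 training/testing split as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("nn", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, ensemble.predict_proba(X_te)[:, 1]))
```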


2021 ◽  
Author(s):  
Chen Bai ◽  
Yu-Peng Chen ◽  
Adam Wolach ◽  
Lisa Anthony ◽  
Mamoun Mardini

BACKGROUND Frequent spontaneous facial self-touches, predominantly during outbreaks, have the theoretical potential to be a mechanism of contracting and transmitting diseases. Despite the recent advent of vaccines, behavioral approaches remain an integral part of reducing the spread of COVID-19 and other respiratory illnesses. Real-time biofeedback of face touching can potentially mitigate the spread of respiratory diseases. The gap addressed in this study is the lack of an on-demand platform that utilizes motion data from smartwatches to accurately detect face touching. OBJECTIVE The aim of this study was to utilize the functionality and ubiquity of smartwatches to develop a smartwatch application that identifies motion signatures accurately mapped to face touching. METHODS Participants (n=10, 50% women, aged 20-83) performed 10 physical activities classified into face touching (FT) and non-face touching (NFT) categories in a standardized laboratory setting. We developed a smartwatch application on the Samsung Galaxy Watch to collect raw accelerometer data from participants. Data features were then extracted from consecutive non-overlapping windows ranging from 2 to 16 seconds. We examined the performance of state-of-the-art machine learning methods on face-touching movement recognition (FT vs NFT) and individual activity recognition (IAR): logistic regression, support vector machine, decision trees, and random forest. RESULTS Machine learning models were accurate in recognizing face-touching categories; logistic regression achieved the best performance across all metrics (accuracy: 0.93 +/- 0.08, recall: 0.89 +/- 0.16, precision: 0.93 +/- 0.08, F1-score: 0.90 +/- 0.11, AUC: 0.95 +/- 0.07) at a window size of 5 seconds. IAR models resulted in lower performance; the random forest classifier achieved the best performance across all metrics (accuracy: 0.70 +/- 0.14, recall: 0.70 +/- 0.14, precision: 0.70 +/- 0.16, F1-score: 0.67 +/- 0.15) at a window size of 9 seconds. CONCLUSIONS Wearable devices, powered by machine learning, are effective in detecting face touches. This is highly significant during respiratory infection outbreaks, as it has great potential to discourage people from touching their faces and thereby mitigate the transmission of COVID-19 and future respiratory diseases.
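A minimal sketch of the windowing-plus-classification approach described in the Methods: raw accelerometer samples are segmented into non-overlapping windows, simple per-axis statistics are extracted, and a logistic regression separates FT from NFT. The 5-second window matches the abstract; the sampling rate, feature set, and synthetic data are assumptions.

```python
# Hypothetical sketch: non-overlapping windows of wrist accelerometer data -> FT vs NFT classifier
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

SAMPLE_RATE = 25            # Hz (assumed)
WINDOW_SECONDS = 5          # window size reported as best for FT vs NFT

def window_features(signal, window_len):
    """Split a (n_samples, 3) accelerometer stream into windows and compute mean/std/max per axis."""
    n_windows = len(signal) // window_len
    feats = []
    for i in range(n_windows):
        w = signal[i * window_len:(i + 1) * window_len]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0), np.abs(w).max(axis=0)]))
    return np.array(feats)

rng = np.random.default_rng(0)
accel = rng.normal(size=(SAMPLE_RATE * 600, 3))             # 10 minutes of placeholder x/y/z data
X = window_features(accel, SAMPLE_RATE * WINDOW_SECONDS)
y = rng.integers(0, 2, size=len(X))                         # placeholder FT (1) / NFT (0) labels
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())
```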


mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black-box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more reproducible practices for applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
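To make the performance/interpretability trade-off concrete, the sketch below fits the simplest model in the comparison, L2-regularized logistic regression, on placeholder relative-abundance data and then reads off the largest coefficients as the inherently interpretable part; this is not the authors' published pipeline, only an illustration of the model class with synthetic data.

```python
# Hypothetical sketch: L2-regularized logistic regression on 16S relative abundances
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_otus = 490, 200
abundances = rng.dirichlet(np.ones(n_otus), size=n_samples)   # placeholder relative abundances
srn_status = rng.integers(0, 2, size=n_samples)               # placeholder SRN labels

model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
aurocs = cross_val_score(model, abundances, srn_status, cv=5, scoring="roc_auc")
print("median AUROC:", np.median(aurocs))

# Inherent interpretability: after fitting, coefficients map directly to OTUs
model.fit(abundances, srn_status)
top_otus = np.argsort(np.abs(model.coef_[0]))[::-1][:10]      # ten most influential OTUs
print(top_otus)
```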


Algorithms ◽  
2018 ◽  
Vol 11 (11) ◽  
pp. 170 ◽  
Author(s):  
Zhixi Li ◽  
Vincent Tam

Momentum and reversal effects are important phenomena in stock markets. In academia, relevant studies have been conducted for years. Researchers have attempted to analyze these phenomena using statistical methods and to give plausible explanations; however, those explanations are sometimes unconvincing. Furthermore, it is very difficult to transfer the findings of these studies to real-world investment trading strategies due to their lack of predictive ability. This paper represents the first attempt to adopt machine learning techniques for investigating the momentum and reversal effects occurring in any stock market. In the study, various machine learning techniques, including the Decision Tree (DT), Support Vector Machine (SVM), Multilayer Perceptron Neural Network (MLP), and Long Short-Term Memory Neural Network (LSTM), were explored and compared carefully. Several models built on these machine learning approaches were used to predict the momentum or reversal effect on the stock market of mainland China, thus allowing investors to build corresponding trading strategies. The experimental results demonstrated that these machine learning approaches, especially the SVM, are beneficial for capturing the relevant momentum and reversal effects, and possibly for building profitable trading strategies. Moreover, we propose corresponding trading strategies based on market states to achieve the best investment returns.
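One way to frame this as a supervised learning problem, sketched below with an SVM (one of the compared models): lagged returns serve as features, and the label records whether the next return continues (momentum) or flips (reversal) the recent trend. The synthetic returns and the labeling rule are assumptions for illustration, not the paper's setup.

```python
# Hypothetical sketch: momentum-vs-reversal classification from lagged returns
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
returns = rng.normal(0, 0.02, size=2000)          # placeholder daily returns

LOOKBACK = 20
X, y = [], []
for t in range(LOOKBACK, len(returns) - 1):
    past = returns[t - LOOKBACK:t]
    trend_sign = np.sign(past.sum())
    X.append(past)
    y.append(int(np.sign(returns[t + 1]) == trend_sign))   # 1 = momentum, 0 = reversal
X, y = np.array(X), np.array(y)

# keep the split chronological rather than shuffled, to avoid look-ahead leakage
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
model = SVC().fit(X_tr, y_tr)
print("out-of-sample accuracy:", accuracy_score(y_te, model.predict(X_te)))
```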


Vaccines ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 709
Author(s):  
Ivan Dimitrov ◽  
Nevena Zaharieva ◽  
Irini Doytchinova

The identification of protective immunogens is the most important and vigorous initial step in the long-lasting and expensive process of vaccine design and development. Machine learning (ML) methods are very effective in data mining and in the analysis of big data such as microbial proteomes, and they are able to significantly reduce the experimental work required to discover novel vaccine candidates. Here, we applied six supervised ML methods (partial least squares-based discriminant analysis, k nearest neighbor (kNN), random forest (RF), support vector machine (SVM), random subspace method (RSM), and extreme gradient boosting (XGBoost)) on a set of 317 known bacterial immunogens and 317 bacterial non-immunogens and derived models for immunogenicity prediction. The models were validated by internal cross-validation in 10 groups from the training set and by an external test set. All of them showed good predictive ability, but the XGBoost model displayed the most prominent ability to identify immunogens, recognizing 84% of the known immunogens in the test set. The combined RSM-kNN model was the best at recognizing non-immunogens, identifying 92% of them in the test set. The three best-performing ML models (XGBoost, RSM-kNN, and RF) were implemented in the new version of the VaxiJen server, and the prediction of bacterial immunogens is now based on majority voting.
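A rough sketch of majority voting over the three best-performing model families, using only scikit-learn: GradientBoostingClassifier stands in for XGBoost and a feature-subspace bagged kNN stands in for RSM-kNN, with placeholder protein descriptors; none of this reproduces the VaxiJen implementation.

```python
# Hypothetical sketch: hard (majority) voting over XGBoost-like, RSM-kNN-like, and RF models
import numpy as np
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(634, 50))            # placeholder descriptors: 317 immunogens + 317 non-immunogens
y = np.array([1] * 317 + [0] * 317)       # 1 = immunogen, 0 = non-immunogen

# Random subspace method over kNN: bagging on feature subsets, not on samples
rsm_knn = BaggingClassifier(KNeighborsClassifier(), n_estimators=25,
                            max_features=0.5, bootstrap=False, bootstrap_features=True,
                            random_state=0)

voter = VotingClassifier(
    estimators=[("xgb-like", GradientBoostingClassifier(random_state=0)),
                ("rsm-knn", rsm_knn),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="hard")                        # majority voting, as in the server
voter.fit(X, y)
print(voter.predict(X[:5]))
```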


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis and binary logistic regression analyses were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Among the machine learning techniques, Gaussian Naive Bayes was determined to be the best algorithm for evaluating prognosis (accuracy: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for evaluating the prognosis of nasopharyngeal cancer.
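For illustration, a minimal Gaussian Naive Bayes sketch on a placeholder matrix shaped like the study data (72 patients, 10 selected predictors); the cross-validation scheme and the random features are assumptions, not the authors' analysis.

```python
# Hypothetical sketch: Gaussian Naive Bayes on clinical predictors of mortality
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 10))              # placeholder for 72 patients, 10 selected predictors
y = rng.integers(0, 2, size=72)            # placeholder cancer-related mortality outcome

model = GaussianNB()
print("mean accuracy:", cross_val_score(model, X, y, cv=5).mean())
print("mean AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```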


2020 ◽  
Vol 10 (15) ◽  
pp. 5047 ◽  
Author(s):  
Viet-Ha Nhu ◽  
Danesh Zandi ◽  
Himan Shahabi ◽  
Kamran Chapi ◽  
Ataollah Shirzadi ◽  
...  

This paper aims to apply and compare the performance of three machine learning algorithms for mapping landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan Province, Iran: support vector machine (SVM), Bayesian logistic regression (BLR), and alternating decision tree (ADTree). We identified 66 shallow landslide locations based on field surveys, recording the locations with a global positioning system (GPS), Google Earth imagery, and black-and-white aerial photographs (scale 1:20,000), and compiled 19 landslide conditioning factors, which we then tested using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980) and ADTree (AUC = 0.977) algorithms. We observed that not only are all three algorithms useful and effective tools for identifying shallow-landslide-prone areas, but the BLR algorithm can also be used, like the SVM algorithm, as a soft-computing benchmark to check the performance of models in future studies.
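A small sketch of the SVM branch of such a workflow, assuming scikit-learn and placeholder values for the 19 conditioning factors at 66 landslide and 66 assumed non-landslide points; the predicted probability is used as a susceptibility index and evaluated with AUC, while the real factor rasters and sampling design are not reproduced.

```python
# Hypothetical sketch: SVM on landslide presence/absence points -> susceptibility index + AUC
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(132, 19))             # placeholder: 66 landslide + 66 non-landslide points, 19 factors
y = np.array([1] * 66 + [0] * 66)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
susceptibility = svm.predict_proba(X_te)[:, 1]   # susceptibility index in [0, 1]
print("AUC:", roc_auc_score(y_te, susceptibility))
```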

