P1060 USING ARTIFICIAL INTELLIGENCE TO PREDICT HOME THERAPY CANDIDATES

Abstract. Background and Aims: There are many benefits to performing dialysis at home, including more flexibility and more frequent treatments. A possible barrier to election of home therapy (HT) by in-center patients is a lack of adequate HT education. To make education efforts more efficient, a predictive model was developed to help identify patients who are more likely to switch from in-center treatment and succeed on HT. Method: We developed a model using machine learning to predict which patients who are treated in-center without prior HT history are most likely to switch to HT in the next 90 days and stay on HT for at least 90 days. Training data were extracted from 2016–2019 for approximately 300,000 patients. We randomly sampled one in-center treatment date per patient and determined if the patient would switch and succeed on HT. The input features consisted of treatment vitals, laboratories, absence history, comprehensive assessments, facility information, county-level housing, and patient characteristics. Patients were excluded if they had fewer than 30 days on dialysis due to lack of data. A machine learning model (XGBoost classifier) was deployed monthly in a pilot with a team of HT educators to investigate the model’s utility for identifying HT candidates. Results: There were approximately 1,200 patients starting a home therapy per month in a large dialysis provider, with approximately one-third being in-center patients. The prevalence of switching to and succeeding on HT in this population was 2.54%. The predictive model achieved an area under the curve of 0.87, a sensitivity of 0.77, and a specificity of 0.80 on a hold-out test dataset. The pilot was successfully executed for several months and two major lessons were learned: 1) some patients who reappeared on each month’s list should be removed from the list after expressing no interest in HT, and 2) a data collection mechanism should be put in place to capture the reasons why patients are not interested in HT. Conclusion: This quality-improvement initiative demonstrates that predictive modeling can be used to identify patients likely to switch and succeed on home therapy. Integration of the model in existing workflows requires creating a feedback loop which can help improve future worklists.
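
To put the reported operating point in context, the sketch below applies Bayes' rule to the sensitivity (0.77), specificity (0.80), and prevalence (2.54%) quoted in the abstract to estimate how often a flagged patient would actually switch and succeed on HT. It assumes the hold-out metrics apply unchanged to a population with that prevalence, which the abstract does not state, and the function name is illustrative only.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged patients who are true positives (Bayes' rule)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Figures quoted in the abstract; applying them together is an assumption.
ppv = positive_predictive_value(sensitivity=0.77, specificity=0.80, prevalence=0.0254)
lift = ppv / 0.0254  # how much richer the flagged list is than the base rate

print(f"Estimated PPV: {ppv:.1%}")
print(f"Lift over base rate: {lift:.1f}x")
```

Under these assumptions, roughly 9% of flagged patients would be true candidates, about 3.6 times the base rate. This is one way to read why the abstract stresses feeding educator responses back into later worklists.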

Predicting bacteriophage hosts based on sequences of annotated receptor-binding proteins

Scientific Reports ◽

10.1038/s41598-021-81063-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dimitri Boeckaerts ◽

Michiel Stock ◽

Bjorn Criel ◽

Hans Gerstmans ◽

Bernard De Baets ◽

...

Keyword(s):

Machine Learning ◽

Predictive Model ◽

Receptor Binding ◽

Bacterial Infections ◽

Sequence Data ◽

Sequence Similarity ◽

Area Under The Curve ◽

Local Alignment ◽

Search Tool ◽

Different Levels

Abstract. Nowadays, bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. In order to alleviate this burden, we have developed a new machine-learning-based pipeline to predict bacteriophage hosts based on annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile. We compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6 and 93.8% for different levels of sequence similarity in the collected data. Our model reaches a performance comparable to that of BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Therefore, our machine learning methods can be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.

DeepGOZero: Improving protein function prediction from sequence and zero-shot learning based on ontology axioms

10.1101/2022.01.14.476325 ◽

2022 ◽

Author(s):

Maxat Kulmanov ◽

Robert Hoehndorf

Keyword(s):

Machine Learning ◽

Protein Function ◽

Protein Function Prediction ◽

Prediction Method ◽

Function Prediction ◽

Training Data ◽

Large Set ◽

Theoretic Approach ◽

Machine Learning Model ◽

Protein Functions

Motivation: Protein functions are often described using the Gene Ontology (GO), which is an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes which have only a few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero

Glean

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447703 ◽

2021 ◽

Vol 14 (6) ◽

pp. 997-1005

Author(s):

Sandeep Tata ◽

Navneet Potti ◽

James B. Wendt ◽

Lauro Beltrão Costa ◽

Marc Najork ◽

...

Keyword(s):

Machine Learning ◽

Data Management ◽

Real World ◽

Empirical Studies ◽

Ground Truth ◽

Training Data ◽

Ground Truth Data ◽

Document Type ◽

Machine Learning Model ◽

Structured Information

Extracting structured information from templatic documents is an important problem with the potential to automate many real-world business workflows such as payment, procurement, and payroll. The core challenge is that such documents can be laid out in virtually infinitely different ways. A good solution to this problem is one that generalizes well not only to known templates such as invoices from a known vendor, but also to unseen ones. We developed a system called Glean to tackle this problem. Given a target schema for a document type and some labeled documents of that type, Glean uses machine learning to automatically extract structured information from other documents of that type. In this paper, we describe the overall architecture of Glean, and discuss three key data management challenges: 1) managing the quality of ground truth data, 2) generating training data for the machine learning model using labeled documents, and 3) building tools that help a developer rapidly build and improve a model for a given document type. Through empirical studies on a real-world dataset, we show that these data management techniques allow us to train a model that is over 5 F1 points better than the exact same model architecture without the techniques we describe. We argue that for such information-extraction problems, designing abstractions that carefully manage the training data is at least as important as choosing a good model architecture.

Convolutional Neural Network

10.4018/978-1-6684-2408-7.ch077 ◽

2022 ◽

pp. 1559-1575

Author(s):

Mário Pereira Véstias

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Machine Learning Algorithms ◽

Training Data ◽

Machine Learning Model ◽

Artificial Neural

Machine learning is the study of algorithms and models for computing systems to do tasks based on pattern identification and inference. When it is difficult or infeasible to develop an algorithm to do a particular task, machine learning algorithms can provide an output based on previous training data. A well-known machine learning model is deep learning. The most recent deep learning models are based on artificial neural networks (ANN). There exist several types of artificial neural networks including the feedforward neural network, the Kohonen self-organizing neural network, the recurrent neural network, the convolutional neural network, the modular neural network, among others. This article focuses on convolutional neural networks with a description of the model, the training and inference processes and its applicability. It will also give an overview of the most used CNN models and what to expect from the next generation of CNN models.

MEWS++: Enhancing the Prediction of Clinical Deterioration in Admitted Patients through a Machine Learning Model

Journal of Clinical Medicine ◽

10.3390/jcm9020343 ◽

2020 ◽

Vol 9 (2) ◽

pp. 343 ◽

Cited By ~ 4

Author(s):

Arash Kia ◽

Prem Timsina ◽

Himanshu N. Joshi ◽

Eyal Klang ◽

Rohit R. Gupta ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Area Under The Curve ◽

Learning Model ◽

Clinical Deterioration ◽

Early Warning Score ◽

Support Vector ◽

Adult Age ◽

Machine Learning Model ◽

Patients At Risk

Early detection of patients at risk for clinical deterioration is crucial for timely intervention. Traditional detection systems rely on a limited set of variables and are unable to predict the time of decline. We describe a machine learning model called MEWS++ that enables the identification of patients at risk of escalation of care or death six hours prior to the event. A retrospective single-center cohort study was conducted from July 2011 to July 2017 of adult (age > 18) inpatients excluding psychiatric, parturient, and hospice patients. Three machine learning models were trained and tested: random forest (RF), linear support vector machine, and logistic regression. We compared the models’ performance to the traditional Modified Early Warning Score (MEWS) using sensitivity, specificity, and Area Under the Curve for Receiver Operating Characteristic (AUC-ROC) and Precision-Recall curves (AUC-PR). The primary outcome was escalation of care from a floor bed to an intensive care or step-down unit, or death, within 6 h. A total of 96,645 patients with 157,984 hospital encounters and 244,343 bed movements were included. Overall rate of escalation or death was 3.4%. The RF model had the best performance with sensitivity 81.6%, specificity 75.5%, AUC-ROC of 0.85, and AUC-PR of 0.37. Compared to traditional MEWS, sensitivity increased 37%, specificity increased 11%, and AUC-ROC increased 14%. This study found that using machine learning and readily available clinical data, clinical deterioration or death can be predicted 6 h prior to the event. The model we developed can warn of patient deterioration hours before the event, thus helping make timely clinical decisions.

Privacy-Preserving Gradient Boosting Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5422 ◽

2020 ◽

Vol 34 (01) ◽

pp. 784-791 ◽

Cited By ~ 1

Author(s):

Qinbin Li ◽

Zhaomin Wu ◽

Zeyi Wen ◽

Bingsheng He

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Training Data ◽

Gradient Boosting ◽

Training Algorithm ◽

Model Accuracy ◽

Machine Learning Model ◽

Improve Model ◽

Privacy Budget ◽

Privacy Level

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
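
The link between sensitivity bounds and noise can be illustrated with a generic Laplace mechanism. The sketch below is not the paper's algorithm: it only shows, under the add-or-remove-one-record notion of neighboring datasets, that clipping each gradient to [-bound, bound] caps the sensitivity of a gradient sum at `bound`, so a tighter bound needs less noise for the same privacy budget `epsilon`. All function names are hypothetical.

```python
import random


def clip(value, bound):
    """Restrict a gradient to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_gradient_sum(gradients, bound, epsilon, rng):
    """Release a clipped gradient sum with epsilon-differential privacy.

    Assumption: neighboring datasets differ by adding or removing one
    record, so the clipped sum changes by at most `bound`.
    """
    clipped_sum = sum(clip(g, bound) for g in gradients)
    sensitivity = bound
    return clipped_sum + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
gradients = [0.5, 2.0, -4.0]  # made-up values for illustration
print(noisy_gradient_sum(gradients, bound=1.0, epsilon=1.0, rng=rng))
```

Halving `bound` halves the noise scale at a fixed `epsilon`, which is the general reason tighter sensitivity bounds translate into smaller accuracy loss.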

Machine Learning Prediction for Complete Response to Hypomethylating Agents with or without Additional Agents in Patients with Newly Diagnosed Myelodysplastic Syndrome

Blood ◽

10.1182/blood-2019-130155 ◽

2019 ◽

Vol 134 (Supplement_1) ◽

pp. 1720-1720

Author(s):

Koji Sasaki ◽

Guillermo Montalban Bravo ◽

Rashmi Kanagal-Shamanna ◽

Elias Jabbour ◽

Farhad Ravandi ◽

...

Keyword(s):

Machine Learning ◽

Board Of Directors ◽

Research Funding ◽

Area Under The Curve ◽

Complete Response ◽

Learning Model ◽

Hypomethylating Agents ◽

Newly Diagnosed ◽

Advisory Committees ◽

Machine Learning Model

Background: Myelodysplastic syndrome (MDS) is a heterogeneous malignant myeloid neoplasm of hematopoietic stem cells due to cytogenetic alterations and somatic mutations in genes (DNA methylation, DNA repair, chromatin regulation, RNA splicing, transcription regulation, and signal transduction). Hypomethylating agents (HMA) are the standard of care for MDS, and 40-60% of patients achieved response to HMA. However, the prediction for response is difficult due to the nature of heterogeneity and the context of clinical conditions such as the degree of cytopenias and the dependency on transfusion. Machine learning outperforms conventional statistical models for prediction in statistical competitions. Prediction with machine learning models may predict response in patients with MDS. The aim of this study is to develop a machine learning model for the prediction of complete response (CR) to HMA with or without additional therapeutic agents in patients with newly diagnosed MDS. Methods: From November 2012 to August 2017, we analyzed 435 patients with newly diagnosed MDS who received frontline therapy as follows; azacitidine (AZA) (3-day, 5-day, or 7-day) ± vorinostat ± ipilimumab ± nivolumab; decitabine (DAC) (3-day or 5-day) ± vorinostat; 5-day guadecitabine. Clinical variables, cytogenetic abnormalities, and the presence of genetic mutations by next generation sequencing (NGS) were included for variable selection. The whole cohort was randomly divided into training/validation and test cohorts at an 8:2 ratio. The training/validation cohort was used for 4-fold cross validation. Hyperparameter optimization was performed with Stampede2, which was ranked as the 15th fastest supercomputer at Texas Advanced Computing Center in June 2018. A gradient boosting decision tree-based framework with the LightGBM Python module was used after hyperparameter tuning for the development of the machine learning model with training/validation cohorts. 
The performance of prediction was assessed with an independent test dataset with the area under the curve. Results: We identified 435 patients with newly diagnosed MDS who enrolled on clinical trials as follows: 33 patients, 5-day AZA; 23, 5-day AZA + vorinostat; 43, 3-day AZA; 20, 5-day AZA + ipilimumab; 19 patients, AZA + nivolumab; 7, AZA + ipilimumab + nivolumab; 114, 5-day DAC; 74, 3-day DAC; 4, DAC + vorinostat; 97, 5-day guadecitabine. In the whole cohort, the median age at diagnosis was 68 years (range, 13.0-90.3); 117 (27%) patients had a history of prior radiation or cytotoxic chemotherapy; the median white blood cell count was 2.9 (×10^9/L) (range, 0.5-102); median absolute neutrophil count, 1.1 (×10^9/L) (range, 0.0-55.1); median hemoglobin count, 9.5 (g/dL) (range, 4.7-15.4); median platelet count, 63 (×10^9/L) (range, 2-881); and median blasts in bone marrow, 8% (range, 0-20). Among 411 patients evaluable for the revised international prognostic scoring system, 15 (4%) had very low risk disease; 42 (10%), low risk; 68 (17%), intermediate risk; 124 (30%), high risk; and 162 (39%), very high risk. Overall, 153 patients (53%) achieved CR. Hyperparameter tuning identified the optimal hyperparameters with colsample by tree of 0.175, learning rate of 0.262, the maximal depth of 2, minimal data in leaf of 29, number of leaves of 11, alpha regularization of 0.010, lambda regularization of 2.085, and subsample of 0.639. On the test cohort with 87 patients, the machine learning model accurately predicted response in 65 patients (75%); 53 non-CR among 56 non-CR (95% accuracy); and 12 CR among 31 CR (39% accuracy). The trend of accuracy improvement by iteration (i.e., the number of decision trees) is shown in Figure 1. The area under the curve was 0.761521 in the test cohort. Conclusion: Our machine learning model with clinical, cytogenetic, and NGS data can predict CR to HMA in patients with newly diagnosed MDS. 
This approach can identify patients who may benefit from HMA therapy with and without additional agents for response, and can optimize the timing of allogeneic stem cell transplant. Disclosures Sasaki: Otsuka: Honoraria; Pfizer: Consultancy. Jabbour: Takeda: Consultancy, Research Funding; BMS: Consultancy, Research Funding; Adaptive: Consultancy, Research Funding; Amgen: Consultancy, Research Funding; AbbVie: Consultancy, Research Funding; Pfizer: Consultancy, Research Funding; Cyclacel LTD: Research Funding. Ravandi: Cyclacel LTD: Research Funding; Selvita: Research Funding; Menarini Ricerche: Research Funding; Macrogenix: Consultancy, Research Funding; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Xencor: Consultancy, Research Funding. Kadia: Pfizer: Membership on an entity's Board of Directors or advisory committees, Research Funding; Celgene: Research Funding; Bioline RX: Research Funding; Jazz: Membership on an entity's Board of Directors or advisory committees, Research Funding; AbbVie: Consultancy, Research Funding; BMS: Research Funding; Amgen: Membership on an entity's Board of Directors or advisory committees, Research Funding; Genentech: Membership on an entity's Board of Directors or advisory committees; Pharmacyclics: Membership on an entity's Board of Directors or advisory committees; Takeda: Membership on an entity's Board of Directors or advisory committees. Takahashi: Symbio Pharmaceuticals: Consultancy. DiNardo: syros: Honoraria; jazz: Honoraria; agios: Consultancy, Honoraria; celgene: Consultancy, Honoraria; notable labs: Membership on an entity's Board of Directors or advisory committees; medimmune: Honoraria; abbvie: Consultancy, Honoraria; daiichi sankyo: Honoraria. 
Cortes: Novartis: Consultancy, Honoraria, Research Funding; Bristol-Myers Squibb: Consultancy, Research Funding; Immunogen: Consultancy, Honoraria, Research Funding; Sun Pharma: Research Funding; Pfizer: Consultancy, Honoraria, Research Funding; Astellas Pharma: Consultancy, Honoraria, Research Funding; Jazz Pharmaceuticals: Consultancy, Research Funding; Merus: Consultancy, Honoraria, Research Funding; Forma Therapeutics: Consultancy, Honoraria, Research Funding; Daiichi Sankyo: Consultancy, Honoraria, Research Funding; BiolineRx: Consultancy; Biopath Holdings: Consultancy, Honoraria; Takeda: Consultancy, Research Funding. Kantarjian: AbbVie: Honoraria, Research Funding; Cyclacel: Research Funding; Pfizer: Honoraria, Research Funding; Astex: Research Funding; Agios: Honoraria, Research Funding; Jazz Pharma: Research Funding; Daiichi-Sankyo: Research Funding; Novartis: Research Funding; Actinium: Honoraria, Membership on an entity's Board of Directors or advisory committees; Immunogen: Research Funding; Takeda: Honoraria; BMS: Research Funding; Ariad: Research Funding; Amgen: Honoraria, Research Funding. Garcia-Manero: Amphivena: Consultancy, Research Funding; Helsinn: Research Funding; Novartis: Research Funding; AbbVie: Research Funding; Celgene: Consultancy, Research Funding; Astex: Consultancy, Research Funding; Onconova: Research Funding; H3 Biomedicine: Research Funding; Merck: Research Funding.
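
The cohort split and test-set figures reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the abstract; treating CR as the positive class and rounding the 8:2 split to whole patients are assumptions made for illustration.

```python
total_patients = 435

# 8:2 split into training/validation and test cohorts (rounding is assumed).
test_size = round(total_patients * 0.2)
train_val_size = total_patients - test_size
fold_size = train_val_size // 4  # validation patients per fold in 4-fold CV

# Test-cohort counts as reported in the abstract.
non_cr_total, non_cr_correct = 56, 53
cr_total, cr_correct = 31, 12

accuracy = (non_cr_correct + cr_correct) / (non_cr_total + cr_total)
sensitivity = cr_correct / cr_total          # CR treated as the positive class
specificity = non_cr_correct / non_cr_total

print(f"test={test_size}, train/val={train_val_size}, fold={fold_size}")
print(f"accuracy={accuracy:.1%}, sensitivity={sensitivity:.1%}, "
      f"specificity={specificity:.1%}")
```

With CR as the positive class, the two per-class "accuracy" figures in the abstract correspond to specificity (non-CR, about 95%) and sensitivity (CR, about 39%). The model is therefore much better at recognizing non-responders than responders at the reported threshold.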

A Novel XGBoost Method to Infer the Primary Lesion of 20 Solid Tumor Types From Gene Expression Data

Frontiers in Genetics ◽

10.3389/fgene.2021.632761 ◽

2021 ◽

Vol 12 ◽

Author(s):

Sijie Chen ◽

Wenjing Zhou ◽

Jinghui Tu ◽

Jian Li ◽

Bo Wang ◽

...

Keyword(s):

Machine Learning ◽

Learning Model ◽

Training Data ◽

Diagnostic Efficiency ◽

Metastatic Tumors ◽

Pathological Conditions ◽

Machine Learning Model ◽

Independent Test ◽

Tumor Types ◽

Fold Cross Validation

Purpose: Establish a suitable machine learning model to identify its primary lesions for primary metastatic tumors in an integrated learning approach, making it more accurate to improve primary lesions’ diagnostic efficiency. Methods: After deleting the features whose expression level is lower than the threshold, we use two methods to perform feature selection and use XGBoost for classification. After the optimal model is selected through 10-fold cross-validation, it is verified on an independent test set. Results: Selecting features with around 800 genes for training, the R2-score of a 10-fold CV of training data can reach 96.38%, and the R2-score of test data can reach 83.3%. Conclusion: These findings suggest that by combining tumor data with machine learning methods, each cancer has its corresponding classification accuracy, which can be used to predict primary metastatic tumors’ location. The machine-learning-based method can be used as an orthogonal diagnostic method to judge the machine learning model processing and clinical actual pathological conditions.

A multi-layer model for the early detection of COVID-19

Journal of The Royal Society Interface ◽

10.1098/rsif.2021.0284 ◽

2021 ◽

Vol 18 (181) ◽

pp. 20210284

Author(s):

Erez Shmueli ◽

Ronen Mansuri ◽

Matan Porcilan ◽

Tamar Amir ◽

Lior Yosha ◽

...

Keyword(s):

Machine Learning ◽

Early Detection ◽

General Health ◽

Medical Condition ◽

Area Under The Curve ◽

Sociodemographic Characteristics ◽

Layer Model ◽

Machine Learning Model ◽

Spatio Temporal ◽

The Individual

Current COVID-19 screening efforts mainly rely on reported symptoms and the potential exposure to infected individuals. Here, we developed a machine-learning model for COVID-19 detection that uses four layers of information: (i) sociodemographic characteristics of the individual, (ii) spatio-temporal patterns of the disease, (iii) medical condition and general health consumption of the individual and (iv) information reported by the individual during the testing episode. We evaluated our model on 140 682 members of Maccabi Health Services who were tested for COVID-19 at least once between February and October 2020. These individuals underwent, in total, 264 516 COVID-19 PCR tests, out of which 16 512 were positive. Our multi-layer model obtained an area under the curve (AUC) of 81.6% when evaluated over all the individuals in the dataset, and an AUC of 72.8% when only individuals who did not report any symptom were included. Furthermore, considering only information collected before the testing episode—i.e. before the individual had the chance to report on any symptom—our model could reach a considerably high AUC of 79.5%. Our ability to predict early on the outcomes of COVID-19 tests is pivotal for breaking transmission chains, and can be used for a more efficient testing policy.

Machine Learning Algorithms Using Logistic Regression for Predicting Neurosurgical Outcomes

10.21203/rs.3.rs-37934/v1 ◽

2020 ◽

Author(s):

Nida Fatima

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Predictive Model ◽

Web Application ◽

Learning Algorithms ◽

Area Under The Curve ◽

Machine Learning Algorithms ◽

Brier Score ◽

Patient Counselling ◽

Selection Of Variables

Abstract. Background: Preoperative prognostication of clinical and surgical outcome in patients with neurosurgical diseases can improve the risk stratification, thus can guide in implementing targeted treatment to minimize these events. Therefore, the author aims to highlight the development and validation of predictive models determining neurosurgical outcomes through machine learning algorithms using logistic regression. Methods: Logistic regression (enter, backward and forward) and least absolute shrinkage and selection operator (LASSO) method for selection of variables from selected database can eventually lead to multiple candidate models. The final model with a set of predictive variables must be selected based upon the clinical knowledge and numerical results. Results: The predictive model which performed best on the discrimination, calibration, Brier score and decision curve analysis must be selected to develop machine learning algorithms. Logistic regression should be compared with the LASSO model. Usually for the big databases, the predictive model selected through logistic regression gives higher Area Under the Curve (AUC) than those with LASSO model. The predictive probability derived from the best model could be uploaded to an open access web application which is easily deployed by the patients and surgeons to make a risk assessment world-wide. Conclusions: Machine learning algorithms provide promising results for the prediction of outcomes following cranial and spinal surgery. These algorithms can provide useful factors for patient-counselling, assessing peri-operative risk factors, and predicting post-operative outcomes after neurosurgery.
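
Of the selection criteria named above, the Brier score is the simplest to compute by hand: it is the mean squared difference between predicted probabilities and observed binary outcomes, with lower values being better. The sketch below is a minimal illustration; the probabilities and outcomes are made-up values, not data from the study.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have equal length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictions from a candidate model and the observed outcomes.
predicted = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 1, 0]

model_score = brier_score(predicted, observed)
baseline_score = brier_score([0.5] * len(observed), observed)  # uninformative

print(f"model Brier score:    {model_score:.4f}")
print(f"baseline Brier score: {baseline_score:.4f}")
```

An uninformative model that always predicts 0.5 scores 0.25, which gives a convenient reference point when comparing candidate models such as logistic regression and LASSO.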
