Using machine learning to examine drivers of inappropriate outpatient antibiotic prescribing in acute respiratory illnesses

Author(s):  
Laura M. King ◽  
Michael Kusnetsov ◽  
Avgoustinos Filippoupolitis ◽  
Deniz Arik ◽  
Monina Bartoces ◽  
...  

Abstract Using a machine-learning model, we examined drivers of antibiotic prescribing for antibiotic-inappropriate acute respiratory illnesses in a large US claims data set. Antibiotics were prescribed in 11% of the 42 million visits in our sample. The model identified outpatient setting type, patient age mix, and state as top drivers of prescribing.

Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model is intended to support medical professionals in predicting myocardial infarction. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training set and a testing set: 70% of the data is used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% is used to evaluate the models using the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The outputs of the classifiers were evaluated using these performance measures. Logistic regression provided higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performed well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, to help decide when to start lifestyle modification and medical treatment to prevent heart disease.
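
The preprocessing and evaluation steps named in this abstract (mean imputation, a 70/30 split, and confusion-matrix metrics) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `impute_mean`, `split_70_30`, and `binary_metrics` are hypothetical, missing values are assumed to be encoded as `None`, and PCA and the four classifiers are left out.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries in each column with that column's observed mean."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and return (70% training, 30% testing)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity (recall) and F1 from the confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

In practice a library such as scikit-learn would normally supply these steps; the sketch only makes explicit what each step computes.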


2021 ◽  
Author(s):  
Junjie Shi ◽  
Jiang Bian ◽  
Jakob Richter ◽  
Kuan-Hsun Chen ◽  
Jörg Rahnenführer ◽  
...  

Abstract The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, it reduces the time available for tuning. Model-Based Optimization (MBO) is one state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning lacks research. This work proposes a framework, MODES, that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data. The goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

Abstract This work presents a methodology for the generation of novel 3D objects resembling wireframes of building types. These result from the reconstruction of interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set used features a scheme for geometry representation based on a ‘connectivity map’ that is especially suited to express the wireframe objects that compose it. Additionally, the input samples are generated through ‘parametric augmentation’, a strategy proposed in this study that creates coherent variations among data by enabling a set of parameters to alter representative features on a given building type. In the experiments that are described in this paper, more than 150 k input samples belonging to two building types have been processed during the training of a VAE model. The main contribution of this paper has been to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.


2021 ◽  
Author(s):  
Eric Sonny Mathew ◽  
Moussa Tembely ◽  
Waleed AlAmeri ◽  
Emad W. Al-Shalabi ◽  
Abdul Ravoof Shaik

Abstract A meticulous interpretation of steady-state or unsteady-state relative permeability (Kr) experimental data is required to determine a complete set of Kr curves. In this work, three different machine learning models were developed to assist in a faster estimation of these curves from steady-state drainage coreflooding experimental runs. The three models that were tested and compared were extreme gradient boosting (XGB), deep neural network (DNN) and recurrent neural network (RNN) algorithms. Based on existing mathematical models, a leading-edge framework was developed in which a large database of Kr and Pc curves was generated. This database was used to perform thousands of coreflood simulation runs representing oil-water drainage steady-state experiments. The results obtained from these simulation runs, mainly pressure drop along with other conventional core analysis data, were utilized to estimate Kr curves based on Darcy's law. These analytically estimated Kr curves along with the previously generated Pc curves were fed as features into the machine learning model. The entire data set was split into 80% for training and 20% for testing. The K-fold cross-validation technique was applied to increase the model accuracy by splitting the 80% training data into 10 folds. In this manner, for each of the 10 experiments, 9 folds were used for training and the remaining one was used for model validation. Once the model was trained and validated, it was subjected to blind testing on the remaining 20% of the data set. The machine learning model learns to capture fluid flow behavior inside the core from the training dataset. The trained/tested model was thereby employed to estimate Kr curves based on available experimental results. The performance of the developed model was assessed using the values of the coefficient of determination (R2) along with the loss calculated during training/validation of the model.
The respective cross plots along with comparisons of ground-truth versus AI-predicted curves indicate that the model is capable of making accurate predictions, with an error percentage between 0.2 and 0.6% on history matching experimental data for all three tested ML techniques (XGB, DNN, and RNN). This implies that the AI-based model exhibits better efficiency and reliability in determining Kr curves when compared to conventional methods. The results also include a comparison between classical machine learning approaches and shallow and deep neural networks in terms of accuracy in predicting the final Kr curves. The various models discussed in this research work currently focus on the prediction of Kr curves for drainage steady-state experiments; however, the work can be extended to capture the imbibition cycle as well.
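
The data-splitting scheme described in this abstract (a 20% blind test set, then 10-fold cross-validation on the remaining 80%) could be sketched as below. This is an illustrative standard-library sketch, not the authors' implementation: the function name `holdout_then_kfold` and its parameters are hypothetical, and it only produces index lists, leaving model training aside.

```python
import random


def holdout_then_kfold(n_samples, test_percent=20, k=10, seed=0):
    """Return (test_idx, splits), where splits is a list of k (train, val) pairs.

    The test indices are set aside first and never appear in any fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = n_samples * test_percent // 100
    test_idx = indices[:n_test]   # blind test set, touched only once at the end
    dev_idx = indices[n_test:]    # the remaining data used for cross-validation

    # Deal the development indices into k folds of near-equal size.
    folds = [dev_idx[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return test_idx, splits


if __name__ == "__main__":
    test_idx, splits = holdout_then_kfold(1000)
    print(len(test_idx), len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the 10 rounds trains on 9 folds and validates on the remaining one, so every development sample is used for validation exactly once.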


Author(s):  
Dr. Kalaivazhi Vijayaragavan ◽  
S. Prakathi ◽  
S. Rajalakshmi ◽  
M Sandhiya

Machine learning is a subfield of artificial intelligence in which algorithms learn to make decisions based on data and try to behave like a human being. Classification is one of the most fundamental concepts in machine learning. It is the process of recognizing, understanding, and grouping ideas and objects into preset categories or sub-populations. Using precategorized training datasets, machine learning uses a variety of algorithms to classify future datasets into categories. Classification algorithms use input training data to predict which of the predetermined categories subsequent data fall into. To improve classification accuracy, neural network design is regarded as an effective approach. The design of a neural network usually considers a scaling layer, perceptron layers, and a probabilistic layer. In this paper, an enhanced model selection is evaluated with a training and testing strategy, and the classification accuracy is predicted. Finally, the predicted classification accuracy is compared across two popular machine learning frameworks, PyTorch and TensorFlow. Results demonstrate that the proposed method can predict with more accuracy. After deployment, the performance of the machine learning model was evaluated using the iris data set.


Author(s):  
Dr. M. P. Borawake

Abstract: The food we consume plays an important role in our daily life. It provides the energy needed to work, grow, be active, and to learn and think. Healthy food is essential for good health and nutrition. Light, oxygen, heat, humidity, temperature and spoilage bacteria can all affect both the safety and quality of perishable foods. Food kept at room temperature undergoes chemical reactions after a certain period of time, which affect the taste, texture and smell of the food. Consuming spoiled food is harmful for consumers, as it can lead to foodborne diseases. This project aims at detecting spoiled food using appropriate sensors and monitoring the gases released by the particular food item. Sensors will measure different parameters of the food such as pH, ammonia gas, oxygen level, moisture, etc. The microcontroller takes the readings from the sensors, and these readings are then given as input to a machine learning model, which decides whether the food is spoilt or not based on a training data set. We also plan to implement a machine learning model that can calculate the lifespan of that food item. Index Terms: Arduino Uno, Food spoilage, IoT, Machine Learning, Sensors.


2020 ◽  
Vol 23 (1) ◽  
pp. 173-186 ◽  
Author(s):  
Martin Jullum ◽  
Anders Løland ◽  
Ragnar Bang Huseby ◽  
Geir Ånonsen ◽  
Johannes Lorentzen

Purpose The purpose of this paper is to develop, describe and validate a machine learning model for prioritising which financial transactions should be manually investigated for potential money laundering. The model is applied to a large data set from Norway’s largest bank, DNB. Design/methodology/approach A supervised machine learning model is trained by using three types of historic data: “normal” legal transactions; those flagged as suspicious by the bank’s internal alert system; and potential money laundering cases reported to the authorities. The model is trained to predict the probability that a new transaction should be reported, using information such as background information about the sender/receiver, their earlier behaviour and their transaction history. Findings The paper demonstrates that the common approach of not using non-reported alerts (i.e. transactions that are investigated but not reported) in the training of the model can lead to sub-optimal results. The same applies to the use of normal (un-investigated) transactions. Our developed method outperforms the bank’s current approach in terms of a fair measure of performance. Originality/value This research study is one of very few published anti-money laundering (AML) models for suspicious transactions that have been applied to a realistically sized data set. The paper also presents a new performance measure specifically tailored to compare the proposed method to the bank’s existing AML system.


Author(s):  
Laura Bigorra ◽  
Iciar Larriba ◽  
Ricardo Gutiérrez-Gallego

Context.— The goal of the lymphocytosis diagnosis approach is its classification into benign or neoplastic categories. Nevertheless, a nonnegligible percentage of laboratories fail in that classification. Objective.— To design and develop a machine learning model by using objective data from the DxH 800 analyzer, including cell population data, leukocyte and absolute lymphoid counts, hemoglobin concentration, and platelet counts, besides age and sex, with classification purposes for lymphocytosis diagnosis. Design.— A total of 1565 samples were included from 10 different lymphoid categories grouped into 4 diagnostic categories: normal controls (458), benign causes of lymphocytosis (567), neoplastic lymphocytosis (399), and spurious causes of lymphocytosis (141). The data set was distributed in a 60-20-20 scheme for training, testing, and validation stages. Six machine learning models were built and compared, and the selection of the final model was based on the minimum generalization error and 10-fold cross validation accuracy. Results.— The selected neural network classifier rendered a global 10-class classification validation accuracy corresponding to 89.9%, which, considering the aforementioned 4 diagnostic categories, presented a diagnostic impact accuracy corresponding to 95.8%. Finally, a prospective proof of concept was performed with 100 new cases with a global diagnostic accuracy corresponding to 91%. Conclusions.— The proposed machine learning model was feasible, with a high benefit-cost ratio, as the results were obtained within the complete blood count with differential. Finally, the diagnostic impact with high accuracies in both model validation and proof of concept encourages exploration of the model for real-world application on a daily basis.
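
The abstract reports accuracy at two levels: over the 10 fine-grained lymphoid classes and over the 4 diagnostic categories they are grouped into. The sketch below shows how such a "collapsed" accuracy can be computed. The fine-grained class names and the `TO_CATEGORY` mapping are hypothetical placeholders, since the abstract does not list the 10 classes; only the four category names and their sample counts come from the text.

```python
# Sample counts per diagnostic category, as stated in the abstract.
CATEGORY_COUNTS = {"normal": 458, "benign": 567, "neoplastic": 399, "spurious": 141}

# Hypothetical fine-grained labels mapped to the four diagnostic categories.
TO_CATEGORY = {
    "normal_control": "normal",
    "benign_type_a": "benign",
    "benign_type_b": "benign",
    "neoplastic_type_a": "neoplastic",
    "neoplastic_type_b": "neoplastic",
    "spurious_type_a": "spurious",
}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def category_accuracy(y_true, y_pred, mapping=TO_CATEGORY):
    """Accuracy after mapping both true and predicted labels to their category."""
    return accuracy([mapping[t] for t in y_true], [mapping[p] for p in y_pred])


if __name__ == "__main__":
    y_true = ["benign_type_a", "neoplastic_type_a", "normal_control", "neoplastic_type_b"]
    y_pred = ["benign_type_b", "neoplastic_type_a", "normal_control", "neoplastic_type_a"]
    print(accuracy(y_true, y_pred), category_accuracy(y_true, y_pred))
```

A confusion between two classes in the same category counts as an error at the fine-grained level but not at the category level, which is why the category-level figure can be higher than the 10-class figure.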


2020 ◽  
Author(s):  
Yingjian Liang ◽  
Chengrui Zhu ◽  
Cong Tian ◽  
Qizhong Lin ◽  
Zhiliang Li ◽  
...  

Abstract Background: This study was performed to develop and validate machine learning models for the early detection of ventilator-associated pneumonia (VAP) 24 h before diagnosis, so that VAP patients can receive early intervention and the occurrence of complications can be reduced. Patients and Methods: This study was based on the MIMIC-III dataset, a retrospective cohort. The random forest algorithm was applied to construct a base classifier, and the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity and specificity of the prediction model were evaluated. Meanwhile, a Clinical Pulmonary Infection Score (CPIS)-based model (threshold value ≥ 3) using the same training and test data set was used as the control model. Results: A total of 38,515 ventilation durations occurred in 61,532 ICU admissions. VAP occurred in 212 of these durations. We incorporated 42 VAP risk factors on admission and routinely measured vital characteristics and laboratory results. Five-fold cross-validation was performed to evaluate the model performance, and the model achieved an AUC of 84.4%±1.7% on validation, 74.3%±2.5% sensitivity and 70.7.6%±1.2% specificity 24 h before the gold standard time (at least 48 h after ventilation). Our VAP machine learning model improved the AUC of the CPIS-based model by almost 25%, and the sensitivity and specificity were also improved by almost 14% and 15%, respectively. Conclusions: We developed and internally validated an automated model of VAP prediction in the MIMIC-III cohort. The VAP prediction model achieved high performance for AUC, sensitivity and specificity, and its performance was superior to that of the CPIS model. External validation and prospective interventional or outcome studies using this prediction model are envisioned as future work.
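
The three metrics reported in this abstract can be computed from predicted scores as in the sketch below. It uses only the standard library and toy inputs; the function names are illustrative and nothing here reproduces the study's model or data. AUC is computed with the pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0, 0]
    s = [0.9, 0.4, 0.5, 0.2, 0.1]
    print(sensitivity_specificity(y, s, threshold=0.45), auc(y, s))
```

With 212 VAP events in 38,515 ventilation durations (about 0.55%), positives are rare, so plain accuracy would be dominated by the negative class; sensitivity, specificity and AUC describe performance on each class separately.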

