The Impact of Programming Language’s Type on Probabilistic Machine Learning Models

330 Background: Machine learning models are well-positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to address clinical problems with workflow interventions utilizing machine learning algorithms. Methods: We aimed to develop a mortality predictive tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days to move advance care planning (ACP) discussions earlier in the illness trajectory. First, a project sponsor defined the clinical need and requirements of an intervention. The data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled including oncology physicians, advanced practice providers, nurses, social workers, a chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model in a future state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group’s requirements. The workflow was piloted in thoracic oncology and bone marrow transplant with plans to scale to other cancer clinics. Results: Our predictive model performance on test data was acceptable (sensitivity 75%, specificity 75%, F1 score 0.71, AUC 0.82). The workgroup identified a “quality of life coordinator” who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine ACP clinical appropriateness; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplain; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient’s scheduled visit. Conclusions: This workgroup is a viable approach that can be replicated at institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
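
Sensitivity, specificity, and F1 score are all derived from the four cells of a confusion matrix. As a minimal sketch of those definitions, the Python below uses hypothetical counts; the `ConfusionCounts` class and the numbers in it are illustrative assumptions, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary classifier's outcomes (hypothetical container)."""
    tp: int  # high-risk patients flagged as high risk
    fp: int  # low-risk patients flagged as high risk
    tn: int  # low-risk patients not flagged
    fn: int  # high-risk patients not flagged


def sensitivity(c: ConfusionCounts) -> float:
    # Share of true positives among all actual positives (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of true negatives among all actual negatives.
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    # Share of true positives among all flagged cases.
    return c.tp / (c.tp + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity.
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r)


# Made-up, balanced example; real cohorts are usually imbalanced.
counts = ConfusionCounts(tp=75, fp=25, tn=75, fn=25)
print(sensitivity(counts), specificity(counts), f1_score(counts))
```

Because F1 depends on precision, and precision depends on how common the outcome is, two models with the same sensitivity and specificity can have different F1 scores on cohorts with different mortality rates.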

Influence of social determinants of health and county vaccination rates on machine learning models to predict COVID-19 case growth in Tennessee

BMJ Health & Care Informatics ◽

10.1136/bmjhci-2021-100439 ◽

2021 ◽

Vol 28 (1) ◽

pp. e100439

Author(s):

Lukasz S Wylezinski ◽

Coleman R Harris ◽

Cody N Heiser ◽

Jamieson D Gray ◽

Charles F Spurlock

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Social Determinants Of Health ◽

Social Determinants ◽

Determinants Of Health ◽

Learning Models ◽

Vaccination Rates ◽

Data Framework ◽

The Impact ◽

Machine Learning Models

Introduction: The SARS-CoV-2 (COVID-19) pandemic has exposed health disparities throughout the USA, particularly among racial and ethnic minorities. As a result, there is a need for data-driven approaches to pinpoint the unique constellation of clinical and social determinants of health (SDOH) risk factors that give rise to poor patient outcomes following infection in US communities. Methods: We combined county-level COVID-19 testing data, COVID-19 vaccination rates and SDOH information in Tennessee. Between February and May 2021, we trained machine learning models on a semimonthly basis using these datasets to predict COVID-19 incidence in Tennessee counties. We then analyzed SDOH data features at each time point to rank the impact of each feature on model performance. Results: Our results indicate that COVID-19 vaccination rates play a crucial role in determining future COVID-19 disease risk. Beginning in mid-March 2021, higher vaccination rates significantly correlated with lower COVID-19 case growth predictions. Further, as the relative importance of COVID-19 vaccination data features grew, demographic SDOH features such as age, race and ethnicity decreased while the impact of socioeconomic and environmental factors, including access to healthcare and transportation, increased. Conclusion: Incorporating a data framework to track the evolving patterns of community-level SDOH risk factors could provide policy-makers with additional data resources to improve health equity and resilience to future public health emergencies.

Practical Considerations for Accuracy Evaluation in Sensor-Based Machine Learning and Deep Learning

Sensors ◽

10.3390/s19163491 ◽

2019 ◽

Vol 19 (16) ◽

pp. 3491 ◽

Cited By ~ 1

Author(s):

Issam Hammad ◽

Kamal El-Sankary

Keyword(s):

Machine Learning ◽

Thermal Noise ◽

Error Resilience ◽

Sensor Data ◽

Accuracy Evaluation ◽

Sensor Failure ◽

Learning Models ◽

Analog To Digital ◽

The Impact ◽

Machine Learning Models

Accuracy evaluation in machine learning is based on the split of data into a training set and a test set. This critical step is applied to develop machine learning models including models based on sensor data. For sensor-based problems, comparing the accuracy of machine learning models using the train/test split provides only a baseline comparison in ideal situations. Such comparisons do not consider practical production problems that can impact the inference accuracy such as the sensors’ thermal noise, performance with lower inference quantization, and tolerance to sensor failure. Therefore, this paper proposes a set of practical tests that can be applied when comparing the accuracy of machine learning models for sensor-based problems. First, the impact of the sensors’ thermal noise on the models’ inference accuracy was simulated. Machine learning algorithms have different levels of error resilience to thermal noise, as will be presented. Second, the models’ accuracy using lower inference quantization was compared. Lowering inference quantization leads to lowering the analog-to-digital converter (ADC) resolution which is cost-effective in embedded designs. Moreover, in custom designs, analog-to-digital converters’ (ADCs) effective number of bits (ENOB) is usually lower than the ideal number of bits due to various design factors. Therefore, it is practical to compare models’ accuracy using lower inference quantization. Third, the models’ accuracy tolerance to sensor failure was evaluated and compared. For this study, the University of California Irvine (UCI) ‘Daily and Sports Activities’ dataset was used to present these practical tests and their impact on model selection.
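
The three practical tests named in this abstract (thermal noise, lower quantization, sensor failure) can each be expressed as a transformation applied to test inputs before inference. The Python below is a minimal sketch under stated assumptions: noise is modelled as zero-mean Gaussian, quantization as uniform rounding over a fixed input range, and a failed sensor as a channel stuck at zero. The toy nearest-centroid classifier, the synthetic data, and all function names are hypothetical and are not taken from the paper.

```python
import random


def add_thermal_noise(sample, sigma, rng):
    # Assumption: thermal noise is additive, zero-mean Gaussian per channel.
    return [x + rng.gauss(0.0, sigma) for x in sample]


def quantize(sample, bits, lo=-1.0, hi=1.0):
    # Uniform quantizer: clip to [lo, hi], then snap to one of 2**bits levels.
    steps = 2 ** bits - 1
    out = []
    for x in sample:
        clipped = min(max(x, lo), hi)
        code = round((clipped - lo) / (hi - lo) * steps)
        out.append(lo + code * (hi - lo) / steps)
    return out


def fail_sensor(sample, index):
    # Assumption: a failed sensor reads a constant zero.
    return [0.0 if i == index else x for i, x in enumerate(sample)]


def nearest_centroid(sample, centroids):
    # Toy classifier: pick the label whose centroid is closest.
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(sample, centroids[label]))
    return min(centroids, key=dist2)


def accuracy(data, centroids, transform):
    # Apply the perturbation to each input, then score the predictions.
    hits = sum(nearest_centroid(transform(x), centroids) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)
centroids = {"rest": [-0.5, -0.5, -0.5], "move": [0.5, 0.5, 0.5]}
test_set = [
    (add_thermal_noise(centroids[label], 0.1, rng), label)
    for label in centroids
    for _ in range(200)
]

baseline = accuracy(test_set, centroids, lambda x: x)
noisy = accuracy(test_set, centroids, lambda x: add_thermal_noise(x, 0.5, rng))
coarse = accuracy(test_set, centroids, lambda x: quantize(x, bits=2))
failed = accuracy(test_set, centroids, lambda x: fail_sensor(x, 0))
print(baseline, noisy, coarse, failed)
```

Running the same comparison for several candidate models would show whether the model that wins on the clean split also wins under each perturbation.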

Identification of Primary Antimicrobial Resistance Drivers in Agricultural Nontyphoidal Salmonella enterica Serovars by Using Machine Learning

mSystems ◽

10.1128/msystems.00211-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 1

Author(s):

Finlay Maguire ◽

Muhammad Attiq Rehman ◽

Catherine Carrillo ◽

Moussa S. Diarra ◽

Robert G. Beiko

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Antimicrobial Resistance ◽

Broiler Chicken ◽

Genomic Data ◽

Set Covering ◽

Learning Models ◽

Commercial Chicken ◽

The Impact ◽

Machine Learning Models

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken in farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly (0.92 to 0.99) precise logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″-Ib) are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. 
This type of surveillance is very important to try to reduce the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading causes of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.

Predictive Capability Assessment of Probabilistic Machine Learning Models for Density Prediction of Conventional and Synthetic Jet Fuels

Energy & Fuels ◽

10.1021/acs.energyfuels.0c03779 ◽

2021 ◽

Vol 35 (3) ◽

pp. 2520-2530 ◽

Cited By ~ 1

Author(s):

Clemens Hall ◽

Bastian Rauch ◽

Uwe Bauder ◽

Patrick Le Clercq ◽

Manfred Aigner

Keyword(s):

Machine Learning ◽

Synthetic Jet ◽

Jet Fuels ◽

Learning Models ◽

Predictive Capability ◽

Density Prediction ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Storm-Based Probabilistic Hail Forecasting with Machine Learning Applied to Convection-Allowing Ensembles

Weather and Forecasting ◽

10.1175/waf-d-17-0010.1 ◽

2017 ◽

Vol 32 (5) ◽

pp. 1819-1840 ◽

Cited By ~ 48

Author(s):

David John Gagne ◽

Amy McGovern ◽

Sue Ellen Haupt ◽

Ryan A. Sobash ◽

John K. Williams ◽

...

Keyword(s):

Machine Learning ◽

Size Distribution ◽

Prediction Models ◽

Weather Prediction ◽

Radar Data ◽

Object Identification ◽

Atmospheric Conditions ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Abstract Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.
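
The critical success index (CSI) mentioned above is conventionally defined as hits / (hits + misses + false alarms), evaluated after thresholding the forecast probability. The Python below is a minimal sketch of that definition; the function name and the five forecast/observation pairs are hypothetical and are not data from the paper.

```python
def critical_success_index(probs, observed, threshold):
    """CSI for probabilistic forecasts converted to yes/no at a threshold."""
    hits = misses = false_alarms = 0
    for p, event in zip(probs, observed):
        forecast = p >= threshold
        if forecast and event:
            hits += 1          # forecast hail, hail observed
        elif event:
            misses += 1        # no hail forecast, hail observed
        elif forecast:
            false_alarms += 1  # forecast hail, none observed
        # Correct negatives do not enter the CSI.
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


# Made-up storm-level probabilities and outcomes.
probs = [0.9, 0.7, 0.4, 0.2, 0.1]
observed = [True, False, True, False, False]

for threshold in (0.3, 0.5):
    print(threshold, critical_success_index(probs, observed, threshold))
```

Sweeping the threshold, as in the loop, gives the CSI-versus-probability-threshold curve on which forecasting methods can be compared.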

Rainfall Prediction for Udaipur, Rajasthan using Machine Learning Models Based on Temperature, Vapour Pressure and Relative Humidity

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f1024.0386s20 ◽

2020 ◽

Vol 8 (6S) ◽

pp. 133-137

Keyword(s):

Machine Learning ◽

Relative Humidity ◽

Vapour Pressure ◽

Predictor Variables ◽

Ensemble Model ◽

Learning Models ◽

Rainfall Prediction ◽

Predictor Importance ◽

The Impact ◽

Machine Learning Models

The study aims at rainfall prediction with machine learning models using a minimal set of features. The prediction here is based on temperature, vapour pressure and relative humidity. Numerous studies carried out earlier used more features than this study. A training-test split of 75-25 was used. The best results were obtained by combining the best of the candidate models into an ensemble model. In that ensemble, the predictor importance of vapour pressure was 0.89 and that of relative humidity was 0.11. Temperature was not seen as a significant predictor for rainfall, though the high correlation of temperature (°C) with vapour pressure (Torr) and relative humidity (percentage) suggests that the two predictor variables subsume the impact of temperature.
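
A 75-25 training-test split of the kind described above can be sketched in a few lines of Python. The abstract does not say how the split was drawn, so the random shuffle, the seed, the `train_test_split` helper, and the synthetic rows below are all assumptions made for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic records with the three predictors named in the abstract.
rows = [
    {
        "temperature_c": 20 + i % 15,
        "vapour_pressure_torr": 10 + i % 12,
        "relative_humidity_pct": 40 + i % 50,
        "rainfall_mm": float(i % 7),
    }
    for i in range(100)
]

train, test = train_test_split(rows)
print(len(train), len(test))
```

For weather records collected over time, a chronological split (training on earlier periods, testing on later ones) is a common alternative to random shuffling.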

THE IMPACT OF ARTIFICIAL NEURAL NETWORK’S STRUCTURE ON ITS EFFICIENCY FOR FINANCIAL INDICATORS FORECASTING

the System analysis and logistics ◽

10.31799/2077-5687-2021-2-44-51 ◽

2021 ◽

Vol 2 (28) ◽

pp. 44-51

Author(s):

B. S. Ermakov

Keyword(s):

Machine Learning ◽

The Other ◽

Financial Indicators ◽

Test Results ◽

Learning Models ◽

Multiple Tests ◽

Artificial Neural ◽

The Impact ◽

Overfitting Problem ◽

Machine Learning Models

The article investigates the influence of an artificial neural network’s structure on its results, using a multilayer perceptron that forecasts several financial indicators as an example. Multiple tests were run with various network structures: different numbers of hidden layers and different numbers of neurons in these layers. Based on the test results, increasing the network’s size is effective up to a point, beyond which further increases are unreasonable. The test results also demonstrate that the overfitting problem is not as crucial for the multilayer perceptron as for other machine learning models, such as regression. Key words: artificial neural networks, forecasting, multilayer perceptron, overfitting, artificial neural network’s size.

Systematic Review of Approaches to Preserve Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine

Applied Clinical Informatics ◽

10.1055/s-0041-1735184 ◽

2021 ◽

Vol 12 (04) ◽

pp. 808-815

Author(s):

Lin Lawrence Guo ◽

Stephen R. Pfohl ◽

Jason Fries ◽

Jose Posada ◽

Scott Lanyon Fleming ◽

...

Keyword(s):

Machine Learning ◽

Decision Making ◽

Clinical Decision Making ◽

Clinical Medicine ◽

Mitigation Strategies ◽

Learning Performance ◽

Learning Models ◽

Dataset Shift ◽

The Impact ◽

Machine Learning Models

Objective: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts. Methods: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects. Results: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination. Conclusion: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.
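
Discrimination and calibration measure different things, which is why one can deteriorate under dataset shift while the other holds. The Python below is a minimal sketch using two common summaries: the area under the ROC curve (AUC) for discrimination, and the difference between mean predicted risk and observed event rate (often called calibration-in-the-large) for calibration. The toy predictions and labels are constructed for illustration and do not come from any reviewed study.

```python
def auc(probs, labels):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(probs, labels):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(probs) / len(probs) - sum(labels) / len(labels)


# Same model outputs scored against two hypothetical time periods.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels_earlier = [0, 0, 0, 0, 1, 1, 1, 1]   # event rate 0.50
labels_later = [0, 0, 0, 0, 0, 0, 1, 1]     # event rate 0.25 after the shift

for name, labels in (("earlier", labels_earlier), ("later", labels_later)):
    print(name, auc(probs, labels), calibration_in_the_large(probs, labels))
```

In this constructed case the ranking of patients stays perfect in both periods, so the AUC is unchanged, while the predicted risks overshoot the later event rate, so calibration degrades.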

Hardware-Aware Probabilistic Machine Learning Models

10.1007/978-3-030-74042-9 ◽

2021 ◽

Author(s):

Laura Isabel Galindez Olascoaga ◽

Wannes Meert ◽

Marian Verhelst

Keyword(s):

Machine Learning ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models
