Probabilistic Machine Learning for Healthcare

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Modern Clinical Text Mining: A Guide and Review

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-030421-030931 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Bethany Percha

Keyword(s):

Machine Learning ◽

Text Mining ◽

Data Science ◽

Annual Review ◽

Publication Date ◽

Biomedical Data ◽

Clinical Text ◽

Quality Improvement Research ◽

Comprehensive Survey ◽

Technical Advances

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Algorithmic Fairness: Choices, Assumptions, and Definitions

Annual Review of Statistics and Its Application ◽

10.1146/annurev-statistics-042720-125902 ◽

2020 ◽

Vol 8 (1) ◽

Author(s):

Shira Mitchell ◽

Eric Potash ◽

Solon Barocas ◽

Alexander D’Amour ◽

Kristian Lum

Keyword(s):

Machine Learning ◽

Decision Making ◽

Rapid Growth ◽

Online Publication ◽

Annual Review ◽

Publication Date ◽

Learning Models ◽

Fairness Concerns ◽

Machine Learning Models

A recent wave of research has attempted to define fairness quantitatively. In particular, this work has explored what fairness might mean in the context of decisions based on the predictions of statistical and machine learning models. The rapid growth of this new field has led to wildly inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions. This article attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decision-making. Next, we show how such choices and assumptions can raise fairness concerns and we present a notationally consistent catalog of fairness definitions from the literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision-making. Expected final online publication date for the Annual Review of Statistics, Volume 8 is March 8, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Perspectives on Allele-Specific Expression

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-021621-122219 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Siobhan Cleary ◽

Cathal Seoighe

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Data Science ◽

Genetic Diseases ◽

Annual Review ◽

Publication Date ◽

Biomedical Data ◽

Specific Expression ◽

Cis Acting ◽

Gene Copies

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Classification and Success Investigation of Biomedical Data Sets Using Supervised Machine Learning Models

2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT) ◽

10.1109/ismsit.2019.8932734 ◽

2019 ◽

Author(s):

Sarmad N. Mohammed ◽

Mehmet Serdar Guzel ◽

Erkan Bostanci

Keyword(s):

Machine Learning ◽

Supervised Machine Learning ◽

Data Sets ◽

Biomedical Data ◽

Learning Models ◽

Machine Learning Models

Predictive Capability Assessment of Probabilistic Machine Learning Models for Density Prediction of Conventional and Synthetic Jet Fuels

Energy & Fuels ◽

10.1021/acs.energyfuels.0c03779 ◽

2021 ◽

Vol 35 (3) ◽

pp. 2520-2530 ◽

Cited By ~ 1

Author(s):

Clemens Hall ◽

Bastian Rauch ◽

Uwe Bauder ◽

Patrick Le Clercq ◽

Manfred Aigner

Keyword(s):

Machine Learning ◽

Synthetic Jet ◽

Jet Fuels ◽

Learning Models ◽

Predictive Capability ◽

Density Prediction ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Storm-Based Probabilistic Hail Forecasting with Machine Learning Applied to Convection-Allowing Ensembles

Weather and Forecasting ◽

10.1175/waf-d-17-0010.1 ◽

2017 ◽

Vol 32 (5) ◽

pp. 1819-1840 ◽

Cited By ~ 48

Author(s):

David John Gagne ◽

Amy McGovern ◽

Sue Ellen Haupt ◽

Ryan A. Sobash ◽

John K. Williams ◽

...

Keyword(s):

Machine Learning ◽

Size Distribution ◽

Prediction Models ◽

Weather Prediction ◽

Radar Data ◽

Object Identification ◽

Atmospheric Conditions ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.
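For readers unfamiliar with the critical success index mentioned in this abstract, the following is a minimal sketch of how CSI is commonly computed at a given probability threshold: hits divided by the sum of hits, misses, and false alarms. It is not taken from the paper; the function name, inputs, and sample values are hypothetical and chosen only for illustration.

```python
import math


def critical_success_index(forecast_probs, observed, threshold):
    """Return CSI = hits / (hits + misses + false alarms).

    forecast_probs: forecast probabilities in [0, 1] (illustrative input)
    observed: matching 0/1 flags marking whether the event occurred
    threshold: probability at or above which the event counts as forecast
    """
    hits = misses = false_alarms = 0
    for prob, event in zip(forecast_probs, observed):
        predicted = prob >= threshold
        if predicted and event:
            hits += 1
        elif predicted and not event:
            false_alarms += 1
        elif event:
            misses += 1
        # Correct negatives do not enter the CSI.
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else math.nan


# Hypothetical forecasts and observations, for illustration only.
probs = [0.9, 0.7, 0.2, 0.4, 0.1]
events = [1, 0, 1, 1, 0]
csi_at_half = critical_success_index(probs, events, 0.5)
csi_at_low = critical_success_index(probs, events, 0.3)
```

Evaluating the function over a range of thresholds gives the kind of per-threshold comparison the abstract refers to; because correct negatives are ignored, the score is not inflated by the many cases in which a rare event neither occurs nor is forecast.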

Glycowork: A Python package for glycan data science and machine learning

10.1101/2021.04.22.440981 ◽

2021 ◽

Author(s):

Luc Thomès ◽

Rebekka Burkholz ◽

Daniel Bojar

Keyword(s):

Machine Learning ◽

Open Source ◽

Data Science ◽

Biological Processes ◽

Biological Sequence ◽

Learning Models ◽

Related Data ◽

Strong Focus ◽

Python Package ◽

Machine Learning Models

As a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.

A Review on Use of Data Science for Visualization and Prediction of the COVID-19 Pandemic and Early Diagnosis of COVID-19 Using Machine Learning Models

Studies in Big Data - Internet of Medical Things for Smart Healthcare ◽

10.1007/978-981-15-8097-0_10 ◽

2020 ◽

pp. 241-265

Author(s):

Shiv Kumar Choubey ◽

Harshit Naman

Keyword(s):

Machine Learning ◽

Early Diagnosis ◽

Data Science ◽

Learning Models ◽

Use Of Data ◽

Machine Learning Models

Data Science in the Food Industry

Annual Review of Biomedical Data Science ◽

10.1146/annurev-biodatasci-020221-123602 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

George-John Nychas ◽

Emma Sims ◽

Panagiotis Tsakanikas ◽

Fady Mohareb

Keyword(s):

Food Safety ◽

Food Chain ◽

Food Industry ◽

Data Science ◽

Annual Review ◽

Publication Date ◽

Biomedical Data ◽

Constant State ◽

Food Integrity ◽

Multi Stakeholder

Food safety is one of the main challenges of the agri-food industry that is expected to be addressed in the current environment of tremendous technological progress, where consumers’ lifestyles and preferences are in a constant state of flux. Food chain transparency and trust are drivers for food integrity control and for improvements in efficiency and economic growth. Similarly, the circular economy has great potential to reduce wastage and improve the efficiency of operations in multi-stakeholder ecosystems. Throughout the food chain cycle, all food commodities are exposed to multiple hazards, resulting in a high likelihood of contamination. Such biological or chemical hazards may be naturally present at any stage of food production, whether accidentally introduced or fraudulently imposed, risking consumers’ health and their faith in the food industry. Nowadays, a massive amount of data is generated, not only from the next generation of food safety monitoring systems and along the entire food chain (primary production included) but also from the internet of things, media, and other devices. These data should be used for the benefit of society, and the scientific field of data science should be a vital player in helping to make this possible. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Hardware-Aware Probabilistic Machine Learning Models

10.1007/978-3-030-74042-9 ◽

2021 ◽

Author(s):

Laura Isabel Galindez Olascoaga ◽

Wannes Meert ◽

Marian Verhelst

Keyword(s):

Machine Learning ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models
