scholarly journals Probabilistic Machine Learning for Healthcare

Author(s):  
Irene Y. Chen ◽  
Shalmali Joshi ◽  
Marzyeh Ghassemi ◽  
Rajesh Ranganath

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial, including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Author(s):  
Bethany Percha

Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


Author(s):  
Shira Mitchell ◽  
Eric Potash ◽  
Solon Barocas ◽  
Alexander D’Amour ◽  
Kristian Lum

A recent wave of research has attempted to define fairness quantitatively. In particular, this work has explored what fairness might mean in the context of decisions based on the predictions of statistical and machine learning models. The rapid growth of this new field has led to wildly inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions. This article attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decision-making. Next, we show how such choices and assumptions can raise fairness concerns and we present a notationally consistent catalog of fairness definitions from the literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision-making. Expected final online publication date for the Annual Review of Statistics, Volume 8 is March 8, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


Author(s):  
Siobhan Cleary ◽  
Cathal Seoighe

Diploidy has profound implications for population genetics and susceptibility to genetic diseases. Although two copies are present for most genes in the human genome, they are not necessarily both active or active at the same level in a given individual. Genomic imprinting, resulting in exclusive or biased expression in favor of the allele of paternal or maternal origin, is now believed to affect hundreds of human genes. A far greater number of genes display unequal expression of gene copies due to cis-acting genetic variants that perturb gene expression. The availability of data generated by RNA sequencing applied to large numbers of individuals and tissue types has generated unprecedented opportunities to assess the contribution of genetic variation to allelic imbalance in gene expression. Here we review the insights gained through the analysis of these data about the extent of the genetic contribution to allelic expression imbalance, the tools and statistical models for gene expression imbalance, and what the results obtained reveal about the contribution of genetic variants that alter gene expression to complex human diseases and phenotypes. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2017 ◽  
Vol 32 (5) ◽  
pp. 1819-1840 ◽  
Author(s):  
David John Gagne ◽  
Amy McGovern ◽  
Sue Ellen Haupt ◽  
Ryan A. Sobash ◽  
John K. Williams ◽  
...  

Abstract Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.


2021 ◽  
Author(s):  
Luc Thomès ◽  
Rebekka Burkholz ◽  
Daniel Bojar

AbstractAs a biological sequence, glycans occur in every domain of life and comprise monosaccharides that are chained together to form oligo- or polysaccharides. While glycans are crucial for most biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include information from these diverse and nonlinear sequences into standard workflows. Here, we present glycowork, an open-source Python package that was designed for the processing and analysis of glycan data by end users, with a strong focus on glycan-related data science and machine learning. Glycowork includes numerous functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models, and learned glycan representations. We envision that glycowork can extract further insights from any glycan dataset and demonstrate this with several workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.


Author(s):  
George-John Nychas ◽  
Emma Sims ◽  
Panagiotis Tsakanikas ◽  
Fady Mohareb

Food safety is one of the main challenges of the agri-food industry that is expected to be addressed in the current environment of tremendous technological progress, where consumers’ lifestyles and preferences are in a constant state of flux. Food chain transparency and trust are drivers for food integrity control and for improvements in efficiency and economic growth. Similarly, the circular economy has great potential to reduce wastage and improve the efficiency of operations in multi-stakeholder ecosystems. Throughout the food chain cycle, all food commodities are exposed to multiple hazards, resulting in a high likelihood of contamination. Such biological or chemical hazards may be naturally present at any stage of food production, whether accidentally introduced or fraudulently imposed, risking consumers’ health and their faith in the food industry. Nowadays, a massive amount of data is generated, not only from the next generation of food safety monitoring systems and along the entire food chain (primary production included) but also from the internet of things, media, and other devices. These data should be used for the benefit of society, and the scientific field of data science should be a vital player in helping to make this possible. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 4 is July 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Author(s):  
Laura Isabel Galindez Olascoaga ◽  
Wannes Meert ◽  
Marian Verhelst

Sign in / Sign up

Export Citation Format

Share Document