Generalized Read-Across prediction using genra-py

Author(s): Imran Shah, Tia Tate, Grace Patlewicz

Abstract
Motivation: Generalized Read-Across (GenRA) is a data-driven approach to estimating physico-chemical, biological or eco-toxicological properties of chemicals by inference from analogues. GenRA attempts to mimic a human expert’s manual read-across reasoning for filling data gaps about new chemicals from known chemicals, using an interpretable and automated approach based on nearest neighbors. A key objective of GenRA is to systematically explore different choices of input data selection and neighborhood definition in order to objectively evaluate the predictive performance of automated read-across estimates of chemical properties.
Results: We have implemented genra-py as a Python package that can be freely used for chemical safety analysis and risk assessment applications. Automated read-across prediction in genra-py conforms to the estimator design pattern of the scikit-learn machine learning library, making it easy to use and to integrate into computational pipelines. We demonstrate the data-driven application of genra-py to two key human health risk assessment problems, namely hazard identification and point of departure estimation.
Availability and implementation: The package is available from github.com/i-shah/genra-py.
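To make the nearest-neighbor idea concrete, here is a minimal sketch of similarity-weighted read-across written in the scikit-learn fit/predict style. It assumes that each chemical is represented as a set of "on" fingerprint bits, that similarity is Jaccard (Tanimoto), and that the prediction is the similarity-weighted average of the k most similar source chemicals. The class name ReadAcrossRegressor and the fallback to the training mean are hypothetical illustrations; this is not the genra-py API.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard (Tanimoto) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class ReadAcrossRegressor:
    """Hypothetical similarity-weighted kNN read-across estimator.

    Follows the scikit-learn fit/predict convention; NOT the genra-py API.
    """

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, X: Sequence[Set[int]], y: Sequence[float]) -> "ReadAcrossRegressor":
        # Read-across is lazy: "fitting" just stores the source analogues.
        self.X_ = [set(x) for x in X]
        self.y_ = list(y)
        return self

    def predict(self, X: Sequence[Set[int]]) -> List[float]:
        preds = []
        for target in X:
            # Rank source chemicals by similarity to the target and keep the top k.
            sims = sorted(
                ((jaccard(set(target), src), val) for src, val in zip(self.X_, self.y_)),
                key=lambda pair: pair[0],
                reverse=True,
            )[: self.k]
            total = sum(s for s, _ in sims)
            if total == 0:
                # No overlap with any analogue: fall back to the training mean (an assumption).
                preds.append(sum(self.y_) / len(self.y_))
            else:
                # Similarity-weighted average of the neighbors' property values.
                preds.append(sum(s * v for s, v in sims) / total)
        return preds


# Toy fingerprints and property values, invented for illustration only.
X_train = [{1, 2, 3}, {1, 2}, {4, 5}, {6}]
y_train = [1.0, 2.0, 10.0, 20.0]
model = ReadAcrossRegressor(k=2).fit(X_train, y_train)
print(model.predict([{1, 2, 3}, {99}]))  # approximately [1.4, 8.25]
```

Because fit only stores the analogues, the neighborhood size k and the similarity function are the main knobs, which mirrors the abstract's emphasis on exploring neighborhood definitions.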

2016
Author(s): Geoffrey Fouad, André Skupin, Christina L. Tague

Abstract. Percentile flows are statistics derived from the flow duration curve (FDC) that describe the flow equaled or exceeded for a given percentage of time. These statistics provide important information for managing rivers, but are often unavailable since most basins are ungauged. A common approach for predicting percentile flows is to deploy regional regression models based on gauged percentile flows and related independent variables derived from physical and climatic data. The first step of this process identifies groups of basins through a cluster analysis of the independent variables, followed by the development of a regression model for each group. This entire process hinges on the independent variables selected to summarize the physical and climatic state of basins. Distributed physical and climatic datasets now exist for the contiguous United States (US). However, it remains unclear how best to represent these data for the development of regional regression models. The study presented here developed regional regression models for the contiguous US and evaluated the effect of different approaches for selecting the initial set of independent variables on the predictive performance of the regional regression models. An expert assessment of the dominant controls on the FDC was used to identify a small set of independent variables likely related to percentile flows. A data-driven approach was also applied to evaluate two larger sets of variables that consist of either (1) the averages of data for each basin or (2) both the averages and the statistical distribution of basin data distributed in space and time. The small set of variables from the expert assessment of the FDC and the two larger sets of variables for the data-driven approach were each used as input to the regional regression procedure. Differences in predictive performance were evaluated using 184 validation basins withheld from regression model development. 
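As an illustration of the percentile-flow definition above, the sketch below builds an empirical FDC from a gauged flow record and reads off the flow equaled or exceeded a given percentage of the time. It assumes the Weibull plotting position (exceedance = rank / (n + 1)) and linear interpolation between ranked flows, with values clamped at the ends of the record; the abstract does not state which plotting position or interpolation the study used.

```python
from typing import List, Sequence, Tuple


def flow_duration_curve(flows: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (exceedance_percent, flow) pairs using the Weibull plotting position."""
    ranked = sorted(flows, reverse=True)  # largest flow gets rank 1
    n = len(ranked)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


def percentile_flow(flows: Sequence[float], percent: float) -> float:
    """Flow equaled or exceeded `percent` of the time, by linear interpolation on the FDC."""
    fdc = flow_duration_curve(flows)
    # Clamp to the exceedance range covered by the observed record.
    if percent <= fdc[0][0]:
        return fdc[0][1]
    if percent >= fdc[-1][0]:
        return fdc[-1][1]
    for (p_lo, q_hi), (p_hi, q_lo) in zip(fdc, fdc[1:]):
        if p_lo <= percent <= p_hi:
            frac = (percent - p_lo) / (p_hi - p_lo)
            return q_hi + frac * (q_lo - q_hi)
    raise ValueError("percent not bracketed by the FDC")  # unreachable for valid input


# A toy record of nine daily flows (invented values).
daily = [float(q) for q in range(1, 10)]
print(percentile_flow(daily, 50))  # 5.0
print(percentile_flow(daily, 25))  # 7.5
```

Other plotting positions (for example Gringorten) shift the tails of the curve slightly, so a real analysis should use whichever convention the reference study adopted.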
The small set of independent variables selected through expert assessment produced performance similar to, if not better than, that of the two larger sets of variables. The parsimonious set consisted only of mean annual precipitation, potential evapotranspiration, and baseflow index. The additional variables in the two larger sets added little or no predictive information. Regional regression models based on the parsimonious set of variables were developed using 734 calibration basins and were converted into a tool for predicting 13 percentile flows in the contiguous US. Supplementary Material for this paper includes an R graphical user interface for predicting the percentile flows of basins within the range of conditions used to calibrate the regression models. The equations and performance statistics of the models are also supplied in tabular form.
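The regional regression step can be sketched as one least-squares model per group of basins, using the three parsimonious predictors. This sketch assumes that basins have already been assigned to regions (the cluster analysis is not shown), that the model is linear in log10 of the percentile flow, and that ordinary least squares is solved through the normal equations. The functional form, helper names and all numbers are hypothetical; the paper's actual equations are in its supplementary tables.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0, *x] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_regional_models(
    regions: Sequence[str], X: Sequence[Sequence[float]], flows: Sequence[float]
) -> Dict[str, List[float]]:
    """Fit one log10-linear regression per region (basins pre-assigned to regions)."""
    grouped = defaultdict(lambda: ([], []))
    for region, x, q in zip(regions, X, flows):
        grouped[region][0].append(x)
        grouped[region][1].append(math.log10(q))
    return {region: ols(xs, ys) for region, (xs, ys) in grouped.items()}


def predict_flow(coefs: List[float], x: Sequence[float]) -> float:
    """Back-transform the log10-linear prediction to a flow value."""
    return 10 ** (coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x)))


# Synthetic basins: (mean annual precipitation, PET, baseflow index), made-up values.
X = [(0.8, 0.6, 0.3), (1.0, 0.7, 0.5), (1.2, 0.65, 0.4), (0.9, 0.8, 0.6), (1.1, 0.9, 0.2)]
true = [-1.0, 1.5, -0.8, 2.0]  # made-up "true" coefficients used only to check the fit
flows = [predict_flow(true, x) for x in X]
models = fit_regional_models(["A"] * len(X), X, flows)
print([round(b, 6) for b in models["A"]])  # approximately [-1.0, 1.5, -0.8, 2.0]
```

Fitting one model per region lets coefficients differ between hydrologically distinct groups; with real data each region needs comfortably more basins than coefficients.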


2019
Author(s): Floriane Montanari, Lara Kuhnke, Antonius ter Laak, Djork-Arné Clevert

Simple physico-chemical properties such as logD, solubility or serum albumin binding have a direct impact on the likelihood of success of compounds in clinical trials. Here, we collected all of Bayer's in-house data related to these properties and applied machine learning techniques to predict them for new compounds. We report that, for the endpoints studied here, a multitask graph convolutional network appears to be a highly competitive choice. The new model shows increased predictive performance on all endpoints compared with previous modeling methods.
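The multitask idea can be illustrated with a forward pass in which one graph encoder is shared by every endpoint and each endpoint gets its own output head. This sketch assumes a single graph-convolution layer of the form ReLU(W · (h_v + Σ neighbors h_u)), sum pooling over atoms, and linear heads; the layer form, atom features and weights are hypothetical and untrained, and this is not the architecture or code used in the study.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def graph_conv(h: List[Vector], neighbors: List[List[int]], W: Matrix) -> List[Vector]:
    """One graph-convolution step: each atom aggregates itself plus its bonded neighbors."""
    out = []
    for v, nbrs in enumerate(neighbors):
        agg = list(h[v])
        for u in nbrs:
            agg = [a + b for a, b in zip(agg, h[u])]
        out.append([max(0.0, z) for z in matvec(W, agg)])  # ReLU
    return out


def multitask_predict(
    h: List[Vector],
    neighbors: List[List[int]],
    W_shared: Matrix,
    heads: Dict[str, Tuple[Vector, float]],
) -> Dict[str, float]:
    """Shared graph encoder + sum pooling, then one linear head per endpoint."""
    atoms = graph_conv(h, neighbors, W_shared)
    mol = [sum(col) for col in zip(*atoms)]  # sum-pool atom vectors into a molecule vector
    return {
        task: sum(w * m for w, m in zip(weights, mol)) + bias
        for task, (weights, bias) in heads.items()
    }


# Toy 3-atom chain (0-1-2) with 2-dimensional atom features; all numbers are invented.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
W = [[1.0, 0.0], [0.0, 1.0]]
heads = {"logD": ([0.5, -0.2], 0.1), "solubility": ([-1.0, 0.0], 0.0)}
preds = multitask_predict(h, neighbors, W, heads)
print(preds)
```

In training, the shared weights would receive gradients from all endpoints at once, which is what lets data-rich endpoints inform data-poor ones; that training loop is omitted here.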


1997, Vol 60 (11), pp. 1420-1425
Author(s): Anna M. Lammerding

This is an overview of the application of risk assessment for evaluating and managing foodborne microbiological health risks. Risk assessment comprises four steps: hazard identification, hazard characterization, exposure assessment, and risk characterization. The process provides a framework for the systematic and objective evaluation of all available information pertaining to the foodborne hazard. The outcome of microbial risk assessment is an estimate of the magnitude of human health risk in terms of the likelihood of exposure to a pathogenic microorganism in a food and the likelihood and impact of any adverse health effects after exposure. Characterization of the uncertainties and variability in the information used, and in the risk estimate itself, is part of the overall process. Risk assessment thus provides an objective scientific basis for decision making in ensuring the safety of the food supply. This approach to evaluating and managing microbial food safety risks is still in the developmental stages, but as it evolves it will facilitate the process of establishing microbiological criteria for foods in international trade and guidelines for national standards and policies. Furthermore, a detailed risk assessment can be used to identify critical gaps in our knowledge base, characterize the most important risk factors in the production-to-consumption food chain, help identify strategies for risk reduction, and provide guidance for determining priorities in public health and food safety research programs.
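To show how exposure assessment and hazard characterization combine in risk characterization, the sketch below runs a small Monte Carlo simulation of the probability of illness per serving. It assumes that a fixed fraction of servings is contaminated, that concentration in contaminated servings varies lognormally (representing variability), and that illness follows the single-hit exponential dose-response model P = 1 - exp(-r · dose). These model choices, parameter names and units (CFU per gram, grams per serving) are illustrative assumptions rather than methods taken from the overview.

```python
import math
import random
from typing import Dict


def exponential_dose_response(dose: float, r: float) -> float:
    """P(illness | ingested dose) under the single-hit exponential model (an assumption)."""
    return 1.0 - math.exp(-r * dose)


def risk_per_serving(
    prevalence: float,        # fraction of servings contaminated (exposure assessment)
    median_cfu_per_g: float,  # median concentration in contaminated servings
    sigma_log: float,         # lognormal spread of concentration (variability)
    serving_g: float,         # serving size in grams
    r: float,                 # dose-response parameter (hazard characterization)
    n_iter: int = 10_000,
    seed: int = 1,
) -> Dict[str, float]:
    """Monte Carlo risk characterization combining exposure and dose-response."""
    rng = random.Random(seed)  # seeded for reproducible results
    p_ill = []
    for _ in range(n_iter):
        conc = rng.lognormvariate(math.log(median_cfu_per_g), sigma_log)
        p_ill.append(exponential_dose_response(conc * serving_g, r))
    p_ill.sort()
    return {
        # Expected risk per serving = P(contaminated) * mean P(ill | contaminated serving).
        "mean_risk": prevalence * sum(p_ill) / len(p_ill),
        # Upper tail of the conditional risk, one way to summarize variability.
        "p95_given_contaminated": p_ill[int(0.95 * (n_iter - 1))],
    }


# Invented inputs for illustration only.
print(risk_per_serving(prevalence=0.01, median_cfu_per_g=1.0, sigma_log=1.0,
                       serving_g=10.0, r=0.01))
```

Separating the spread of concentrations (variability) from doubt about parameters such as r (uncertainty) would require an outer loop over parameter draws, which is omitted here for brevity.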

