Multi-output Gaussian processes for multi-population longevity modelling

2021 ◽  
pp. 1-28
Author(s):  
Nhan Huynh ◽  
Mike Ludkovski

Abstract We investigate joint modelling of longevity trends using the spatial statistical framework of Gaussian process (GP) regression. Our analysis is motivated by the Human Mortality Database (HMD) that provides unified raw mortality tables for nearly 40 countries. Yet few stochastic models exist for handling more than two populations at a time. To bridge this gap, we leverage a spatial covariance framework from machine learning that treats populations as distinct levels of a factor covariate, explicitly capturing the cross-population dependence. The proposed multi-output GP models straightforwardly scale up to a dozen populations and moreover intrinsically generate coherent joint longevity scenarios. In our numerous case studies, we investigate predictive gains from aggregating mortality experience across nations and genders, including by borrowing the most recently available “foreign” data. We show that in our approach, information fusion leads to more precise (and statistically more credible) forecasts. We implement our models in R, as well as a Bayesian version in Stan that provides further uncertainty quantification regarding the estimated mortality covariance structure. All examples utilise public HMD datasets.
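The factor-covariate construction can be sketched with a minimal intrinsic-coregionalization kernel in plain NumPy. Everything below is a synthetic stand-in: the cross-population matrix `B`, the length-scale, and the log-mortality trends are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

# Multi-output GP sketch: two "populations" share an RBF kernel over calendar
# year, coupled by a 2x2 cross-population covariance matrix B (the factor-
# covariate idea: population index enters the kernel as a discrete level).
def rbf(t, s, ell=5.0):
    d = t[:, None] - s[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def icm_kernel(t1, p1, t2, p2, B, ell=5.0):
    # k((t, i), (s, j)) = B[i, j] * k_rbf(t, s)
    return B[np.ix_(p1, p2)] * rbf(t1, t2, ell)

rng = np.random.default_rng(0)
years = np.arange(1990, 2010, dtype=float)
t = np.concatenate([years, years])                 # inputs for both populations
p = np.concatenate([np.zeros(20, int), np.ones(20, int)])
y = np.concatenate([-3.0 - 0.02 * (years - 1990),  # synthetic log-mortality trends
                    -3.1 - 0.02 * (years - 1990)]) + 0.01 * rng.standard_normal(40)

B = np.array([[1.0, 0.9], [0.9, 1.0]])             # strong cross-population correlation
K = icm_kernel(t, p, t, p, B) + 1e-4 * np.eye(40)  # jitter for numerical stability

# Joint predictive mean at a future year for both populations at once:
t_star = np.array([2015.0, 2015.0])
p_star = np.array([0, 1])
Ks = icm_kernel(t_star, p_star, t, p, B)
mean = Ks @ np.linalg.solve(K, y)
```

Because both populations share one joint covariance, the two forecasts are drawn coherently rather than from two independent single-population models.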

2020 ◽  
Author(s):  
Nicholas Jose ◽  
Mikhail Kovalev ◽  
Eric Bradford ◽  
Artur Schweidtmann ◽  
Hua Chun Zeng ◽  
...  

Novel materials are the backbone of major technological advances. However, the development and wide-scale introduction of new materials, such as nanomaterials, is limited by three main factors: the expense of experiments, the inefficiency of synthesis methods and the complexity of scale-up. Reaching the kilogram scale is a hurdle that takes years of effort for many nanomaterials. We introduce an improved methodology for materials development that combines state-of-the-art techniques: multi-objective machine learning optimization, high-yield microreactors and high-throughput analysis. We demonstrate this approach by efficiently developing a kilogram-per-day reaction process for highly active antibacterial ZnO nanoparticles. The proposed method has the potential to significantly reduce experimental costs, increase process efficiency and enhance material performance, which together open a new pathway for materials discovery.
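The multi-objective selection step can be illustrated with a minimal Pareto-front filter; the candidate conditions and the two objectives (yield to maximize, cost to minimize) are hypothetical, not the study's data.

```python
import numpy as np

# Given candidate reaction conditions scored on several objectives (larger is
# better in every column), keep only the Pareto-efficient candidates, i.e.
# those not dominated by any other candidate.
def pareto_front(scores):
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if keep[i]:
            # A row is dominated by row i if it is <= in all objectives
            # and strictly < in at least one.
            dominated = (np.all(scores <= scores[i], axis=1)
                         & np.any(scores < scores[i], axis=1))
            keep &= ~dominated
    return np.flatnonzero(keep)

candidates = np.array([[0.9, -2.0],   # high yield, high cost
                       [0.7, -1.0],   # trade-off
                       [0.6, -1.5]])  # dominated by the second row
front = pareto_front(candidates)      # -> [0, 1]
```

A multi-objective optimizer proposes new experiments so that this front advances with as few costly syntheses as possible.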


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Obiora A. Eneanya ◽  
Claudio Fronterre ◽  
Ifeoma Anagbogu ◽  
Chukwu Okoronkwo ◽  
Tini Garske ◽  
...  

Abstract Introduction The baseline endemicity profile of lymphatic filariasis (LF) is a key benchmark for planning control programmes, monitoring their impact on transmission and assessing the feasibility of achieving elimination. Presented in this work is the modelled serological and parasitological prevalence of LF prior to the scale-up of mass drug administration (MDA) in Nigeria, using a machine learning based approach. Methods LF prevalence data generated by the Nigeria Lymphatic Filariasis Control Programme during country-wide mapping surveys conducted between 2000 and 2013 were used to build the models. The dataset comprised 1103 community-level surveys based on the detection of filarial antigenemia using rapid immunochromatographic card tests (ICT) and 184 prevalence surveys testing for the presence of microfilaria (Mf) in blood. Using a suite of continuous gridded climate and environmental variables and compiled site-level prevalence data, a quantile regression forest (QRF) model was fitted for both antigenemia and microfilaraemia LF prevalence. Model predictions were projected across a continuous 5 × 5 km gridded map of Nigeria. The number of individuals potentially infected by LF prior to MDA interventions was subsequently estimated. Results The maps presented predict a heterogeneous distribution of LF antigenemia and microfilaraemia in Nigeria. The North-Central, North-West, and South-East regions displayed the highest predicted LF seroprevalence, whereas predicted Mf prevalence was highest in the southern regions. Overall, 8.7 million and 3.3 million infections were predicted for ICT and Mf, respectively. Conclusions QRF is a machine learning-based algorithm capable of handling high-dimensional data and fitting complex relationships between response and predictor variables. Our models provide a benchmark through which the progress of ongoing LF control efforts can be monitored.
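A rough sketch of forest-based quantile prediction on synthetic data: Meinshausen's QRF weights all training responses falling in each leaf, whereas the simpler stand-in below approximates the predictive spread by the distribution of per-tree means. The covariates and "prevalence" signal here are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))           # stand-ins for gridded climate covariates
y = 10 * X[:, 0] + rng.normal(0, 2, size=500)  # synthetic "prevalence" signal with noise

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Predictive quantiles at a new grid cell from the spread of per-tree predictions:
X_new = np.array([[0.5, 0.5]])
per_tree = np.array([tree.predict(X_new)[0] for tree in forest.estimators_])
lo, med, hi = np.quantile(per_tree, [0.05, 0.5, 0.95])
```

The quantile band is what lets a QRF map report prediction uncertainty per grid cell rather than a single point estimate.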


Author(s):  
Michael McCartney ◽  
Matthias Haeringer ◽  
Wolfgang Polifke

Abstract This paper examines and compares commonly used machine learning algorithms in their performance in interpolation and extrapolation of flame describing functions (FDFs), based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of the resulting errors on the limit cycle amplitudes is then assessed using the xFDF framework. The best algorithms for interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian process regressor. The data itself was found to be an important factor in the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian processes was demonstrated. The aim is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. The extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points that expand the covered domain. Gaussian processes also provide a confidence measure on their predictions, which is used to carry out uncertainty quantification and to understand model sensitivities. This was demonstrated through application to the xFDF framework.
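The variance-based point-selection idea can be sketched with scikit-learn on a 1-D toy response surface; the sine function stands in for an FDF, and the fixed kernel length-scale is an assumption for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.array([[0.0], [0.2], [0.9], [1.0]])   # sparse initial measurements
y = np.sin(2 * np.pi * X).ravel()            # toy stand-in for the FDF response

# Fixed kernel (optimizer=None) so the predictive variance depends only on
# the geometry of the measured points, not on hyperparameter refitting.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                              alpha=1e-6, optimizer=None).fit(X, y)

# Propose the next measurement where the predictive std is largest,
# i.e. in the biggest gap between existing data points.
grid = np.linspace(0, 1, 101).reshape(-1, 1)
_, std = gp.predict(grid, return_std=True)
next_x = grid[np.argmax(std)][0]
```

Iterating this loop (measure at `next_x`, refit, repeat) collects a minimal number of points while keeping the model's uncertainty over the domain low.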


1986 ◽  
Vol 23 (01) ◽  
pp. 1-13
Author(s):  
S. E. Hitchcock

Two stochastic models are developed for the predator-prey process. In each case it is shown that ultimate extinction of one of the two populations is certain to occur in finite time. For each model an exact expression is derived for the probability that the predators eventually become extinct when the prey birth rate is 0. These probabilities are used to derive power series approximations to extinction probabilities when the prey birth rate is not 0. On comparison with values obtained by numerical analysis, the approximations are shown to be very satisfactory when initial population sizes and prey birth rate are all small. An approximation to the mean number of changes before extinction occurs is also obtained for one of the models.
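The certainty of extinction can be checked by Monte Carlo simulation of the embedded jump chain. The model below is a minimal hypothetical variant with prey birth rate 0 (predation removes a prey, predators die at a constant per-capita rate), not Hitchcock's exact formulation, and the rate constants are illustrative.

```python
import numpy as np

def simulate(prey, pred, pred_death=1.0, predation=0.1, rng=None):
    """Run the embedded chain until one population is extinct.

    Each event removes either one prey (predation) or one predator
    (natural death), so the chain must terminate in finite time.
    Returns True if the prey went extinct first.
    """
    rng = rng if rng is not None else np.random.default_rng()
    while prey > 0 and pred > 0:
        rate_predation = predation * prey * pred
        rate_death = pred_death * pred
        if rng.uniform() < rate_predation / (rate_predation + rate_death):
            prey -= 1
        else:
            pred -= 1
    return prey == 0

rng = np.random.default_rng(1)
p_prey_extinct = np.mean([simulate(5, 5, rng=rng) for _ in range(2000)])
```

With small initial populations both extinction orders occur with positive probability, which is the regime where the paper's power-series approximations are reported to work well.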


2012 ◽  
Vol 198-199 ◽  
pp. 1333-1337 ◽  
Author(s):  
San Xi Wei ◽  
Zong Hai Sun

Gaussian processes (GPs) are a promising technique that has been applied to both regression and classification problems. In recent years, models based on Gaussian process priors have attracted much attention in machine learning. Binary (two-class, C = 2) classification using Gaussian processes is a well-developed method. In this paper, a multi-class (C > 2) classification method built on binary GP classification is illustrated, and good accuracy is obtained with it. A comparison of decision time and accuracy between this method and the support vector machine (SVM) is also made in the experiments.
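scikit-learn's `GaussianProcessClassifier` builds multi-class prediction from binary GP classifiers via a one-vs-rest scheme, which mirrors the binary-to-multi-class construction described above; the dataset and settings here are illustrative, not the paper's experiment.

```python
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import train_test_split

# Three-class problem solved by fitting one binary (one-vs-rest) GP classifier
# per class and normalizing the resulting class probabilities.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gpc = GaussianProcessClassifier(multi_class="one_vs_rest",
                                random_state=0).fit(X_tr, y_tr)
acc = gpc.score(X_te, y_te)
```

For an SVM comparison of the kind the paper reports, `sklearn.svm.SVC` could be fitted on the same split and timed alongside.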


2010 ◽  
Vol 23 (10) ◽  
pp. 2759-2781 ◽  
Author(s):  
Martin P. Tingley ◽  
Peter Huybers

Abstract Reconstructing the spatial pattern of a climate field through time from a dataset of overlapping instrumental and climate proxy time series is a nontrivial statistical problem. The need to transform the proxy observations into estimates of the climate field, and the fact that the observed time series are not uniformly distributed in space, further complicate the analysis. Current leading approaches to this problem are based on estimating the full covariance matrix between the proxy time series and instrumental time series over a “calibration” interval and then using this covariance matrix in the context of a linear regression to predict the missing instrumental values from the proxy observations for years prior to instrumental coverage. A fundamentally different approach to this problem is formulated by specifying parametric forms for the spatial covariance and temporal evolution of the climate field, as well as “observation equations” describing the relationship between the data types and the corresponding true values of the climate field. A hierarchical Bayesian model is used to assimilate both proxy and instrumental datasets and to estimate the probability distribution of all model parameters and the climate field through time on a regular spatial grid. The output from this approach includes an estimate of the full covariance structure of the climate field and model parameters as well as diagnostics that estimate the utility of the different proxy time series. This methodology is demonstrated using an instrumental surface temperature dataset after corrupting a number of the time series to mimic proxy observations. The results are compared to those achieved using the regularized expectation–maximization algorithm, and in these experiments the Bayesian algorithm produces reconstructions with greater skill. 
The assumptions underlying these two methodologies and the results of applying each to simple surrogate datasets are explored in greater detail in Part II.
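The "observation equations" idea can be illustrated at a single grid cell with conjugate Gaussian updates: an instrumental reading observes the true temperature directly, while a proxy observes it through a linear transformation, and precision weighting fuses both with the prior. All numbers below are hypothetical.

```python
import numpy as np

# Prior on the true temperature T at one grid cell.
prior_mean, prior_var = 0.0, 4.0

# Observation equations (hypothetical parameters):
#   instrumental: y_inst = T + noise,        Var = inst_var
#   proxy:        y_prox = a * T + b + noise, Var = proxy_var
inst, inst_var = 1.2, 0.25
proxy, a, b, proxy_var = 2.9, 2.0, 0.5, 1.0

# Transform the proxy into temperature units: (y_prox - b) / a.
prox_T = (proxy - b) / a
prox_T_var = proxy_var / a**2

# Conjugate Gaussian fusion: posterior precision is the sum of precisions,
# and the posterior mean is the precision-weighted average.
prec = 1 / prior_var + 1 / inst_var + 1 / prox_T_var
post_mean = (prior_mean / prior_var + inst / inst_var + prox_T / prox_T_var) / prec
post_var = 1 / prec
```

The full hierarchical model does this jointly over space and time (with the spatial covariance tying grid cells together), but the per-cell arithmetic is exactly this precision weighting.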

