Multi-output Gaussian processes for multi-population longevity modelling

2021 ◽  
pp. 1-28
Author(s):  
Nhan Huynh ◽  
Mike Ludkovski

Abstract We investigate joint modelling of longevity trends using the spatial statistical framework of Gaussian process (GP) regression. Our analysis is motivated by the Human Mortality Database (HMD) that provides unified raw mortality tables for nearly 40 countries. Yet few stochastic models exist for handling more than two populations at a time. To bridge this gap, we leverage a spatial covariance framework from machine learning that treats populations as distinct levels of a factor covariate, explicitly capturing the cross-population dependence. The proposed multi-output GP models straightforwardly scale up to a dozen populations and moreover intrinsically generate coherent joint longevity scenarios. In our numerous case studies, we investigate predictive gains from aggregating mortality experience across nations and genders, including by borrowing the most recently available “foreign” data. We show that in our approach, information fusion leads to more precise (and statistically more credible) forecasts. We implement our models in R, as well as a Bayesian version in Stan that provides further uncertainty quantification regarding the estimated mortality covariance structure. All examples utilise public HMD datasets.
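The factor-covariate construction can be sketched with a minimal intrinsic-coregionalization kernel in plain NumPy. Everything below is a synthetic stand-in: the cross-population matrix `B`, the length-scale, and the log-mortality trends are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

# Multi-output GP sketch: two "populations" share an RBF kernel over calendar
# year, coupled by a 2x2 cross-population covariance matrix B (the factor-
# covariate idea: population index enters the kernel as a discrete level).
def rbf(t, s, ell=5.0):
    d = t[:, None] - s[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def icm_kernel(t1, p1, t2, p2, B, ell=5.0):
    # k((t, i), (s, j)) = B[i, j] * k_rbf(t, s)
    return B[np.ix_(p1, p2)] * rbf(t1, t2, ell)

rng = np.random.default_rng(0)
years = np.arange(1990, 2010, dtype=float)
t = np.concatenate([years, years])                 # inputs for both populations
p = np.concatenate([np.zeros(20, int), np.ones(20, int)])
y = np.concatenate([-3.0 - 0.02 * (years - 1990),  # synthetic log-mortality trends
                    -3.1 - 0.02 * (years - 1990)]) + 0.01 * rng.standard_normal(40)

B = np.array([[1.0, 0.9], [0.9, 1.0]])             # strong cross-population correlation
K = icm_kernel(t, p, t, p, B) + 1e-4 * np.eye(40)  # jitter for numerical stability

# Joint predictive mean at a future year for both populations at once:
t_star = np.array([2015.0, 2015.0])
p_star = np.array([0, 1])
Ks = icm_kernel(t_star, p_star, t, p, B)
mean = Ks @ np.linalg.solve(K, y)
```

Because both populations share one joint covariance, the two forecasts are drawn coherently rather than from two independent single-population models.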

2020 ◽  
Author(s):  
Nicholas Jose ◽  
Mikhail Kovalev ◽  
Eric Bradford ◽  
Artur Schweidtmann ◽  
Hua Chun Zeng ◽  
...  

Novel materials are the backbone of major technological advances. However, the development and wide-scale introduction of new materials, such as nanomaterials, is limited by three main factors: the expense of experiments, the inefficiency of synthesis methods and the complexity of scale-up. Reaching the kilogram scale is a hurdle that takes years of effort for many nanomaterials. We introduce an improved methodology for materials development that combines state-of-the-art techniques: multi-objective machine learning optimization, high-yield microreactors and high-throughput analysis. We demonstrate this approach by efficiently developing a kilogram-per-day reaction process for highly active antibacterial ZnO nanoparticles. The proposed method has the potential to significantly reduce experimental costs, increase process efficiency and enhance material performance, which together open a new pathway for materials discovery.
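The multi-objective selection step can be illustrated with a minimal Pareto-front filter; the candidate conditions and the two objectives (yield to maximize, cost to minimize) are hypothetical, not the study's data.

```python
import numpy as np

# Given candidate reaction conditions scored on several objectives (larger is
# better in every column), keep only the Pareto-efficient candidates, i.e.
# those not dominated by any other candidate.
def pareto_front(scores):
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if keep[i]:
            # A row is dominated by row i if it is <= in all objectives
            # and strictly < in at least one.
            dominated = (np.all(scores <= scores[i], axis=1)
                         & np.any(scores < scores[i], axis=1))
            keep &= ~dominated
    return np.flatnonzero(keep)

candidates = np.array([[0.9, -2.0],   # high yield, high cost
                       [0.7, -1.0],   # trade-off
                       [0.6, -1.5]])  # dominated by the second row
front = pareto_front(candidates)      # -> [0, 1]
```

A multi-objective optimizer proposes new experiments so that this front advances with as few costly syntheses as possible.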


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Obiora A. Eneanya ◽  
Claudio Fronterre ◽  
Ifeoma Anagbogu ◽  
Chukwu Okoronkwo ◽  
Tini Garske ◽  
...  

Abstract Introduction The baseline endemicity profile of lymphatic filariasis (LF) is a key benchmark for planning control programmes, monitoring their impact on transmission and assessing the feasibility of achieving elimination. Presented in this work is the modelled serological and parasitological prevalence of LF prior to the scale-up of mass drug administration (MDA) in Nigeria, using a machine learning based approach. Methods LF prevalence data generated by the Nigeria Lymphatic Filariasis Control Programme during country-wide mapping surveys conducted between 2000 and 2013 were used to build the models. The dataset comprised 1103 community-level surveys based on the detection of filarial antigenemia using rapid immunochromatographic card tests (ICT) and 184 prevalence surveys testing for the presence of microfilaria (Mf) in blood. Using a suite of continuous gridded climate and environmental variables and compiled site-level prevalence data, a quantile regression forest (QRF) model was fitted for both antigenemia and microfilaraemia LF prevalence. Model predictions were projected across a continuous 5 × 5 km gridded map of Nigeria. The number of individuals potentially infected by LF prior to MDA interventions was subsequently estimated. Results The maps presented predict a heterogeneous distribution of LF antigenemia and microfilaraemia in Nigeria. The North-Central, North-West, and South-East regions displayed the highest predicted LF seroprevalence, whereas predicted Mf prevalence was highest in the southern regions. Overall, 8.7 million and 3.3 million infections were predicted for ICT and Mf, respectively. Conclusions QRF is a machine learning-based algorithm capable of handling high-dimensional data and fitting complex relationships between response and predictor variables. Our models provide a benchmark through which the progress of ongoing LF control efforts can be monitored.
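A rough sketch of forest-based quantile prediction on synthetic data: Meinshausen's QRF weights all training responses falling in each leaf, whereas the simpler stand-in below approximates the predictive spread by the distribution of per-tree means. The covariates and "prevalence" signal here are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))           # stand-ins for gridded climate covariates
y = 10 * X[:, 0] + rng.normal(0, 2, size=500)  # synthetic "prevalence" signal with noise

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Predictive quantiles at a new grid cell from the spread of per-tree predictions:
X_new = np.array([[0.5, 0.5]])
per_tree = np.array([tree.predict(X_new)[0] for tree in forest.estimators_])
lo, med, hi = np.quantile(per_tree, [0.05, 0.5, 0.95])
```

The quantile band is what lets a QRF map report prediction uncertainty per grid cell rather than a single point estimate.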


Author(s):  
Michael McCartney ◽  
Matthias Haeringer ◽  
Wolfgang Polifke

Abstract This paper examines and compares commonly used machine learning algorithms in their performance in interpolation and extrapolation of flame describing functions (FDFs), based on experimental and simulation data. Algorithm performance is evaluated by interpolating and extrapolating FDFs, and the impact of the resulting errors on the limit cycle amplitudes is then assessed using the xFDF framework. The best algorithms for interpolation and extrapolation were found to be the widely used cubic spline interpolation and the Gaussian process regressor. The data itself was found to be an important factor in the predictive performance of a model; therefore, a method of optimally selecting data points at test time using Gaussian processes was demonstrated. The aim is to allow a minimal number of data points to be collected while still providing enough information to model the FDF accurately. The extrapolation performance was shown to decay very quickly with distance from the domain, so emphasis should be put on selecting measurement points that expand the covered domain. Gaussian processes also provide a confidence measure on their predictions, which is used to carry out uncertainty quantification and to understand model sensitivities. This was demonstrated through application to the xFDF framework.
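The variance-based point-selection idea can be sketched with scikit-learn on a 1-D toy response surface; the sine function stands in for an FDF, and the fixed kernel length-scale is an assumption for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = np.array([[0.0], [0.2], [0.9], [1.0]])   # sparse initial measurements
y = np.sin(2 * np.pi * X).ravel()            # toy stand-in for the FDF response

# Fixed kernel (optimizer=None) so the predictive variance depends only on
# the geometry of the measured points, not on hyperparameter refitting.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2),
                              alpha=1e-6, optimizer=None).fit(X, y)

# Propose the next measurement where the predictive std is largest,
# i.e. in the biggest gap between existing data points.
grid = np.linspace(0, 1, 101).reshape(-1, 1)
_, std = gp.predict(grid, return_std=True)
next_x = grid[np.argmax(std)][0]
```

Iterating this loop (measure at `next_x`, refit, repeat) collects a minimal number of points while keeping the model's uncertainty over the domain low.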


1986 ◽  
Vol 23 (01) ◽  
pp. 1-13
Author(s):  
S. E. Hitchcock

Two stochastic models are developed for the predator-prey process. In each case it is shown that ultimate extinction of one of the two populations is certain to occur in finite time. For each model an exact expression is derived for the probability that the predators eventually become extinct when the prey birth rate is 0. These probabilities are used to derive power series approximations to extinction probabilities when the prey birth rate is not 0. On comparison with values obtained by numerical analysis, the approximations are shown to be very satisfactory when initial population sizes and prey birth rate are all small. An approximation to the mean number of changes before extinction occurs is also obtained for one of the models.
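The certainty of extinction can be checked by Monte Carlo simulation of the embedded jump chain. The model below is a minimal hypothetical variant with prey birth rate 0 (predation removes a prey, predators die at a constant per-capita rate), not Hitchcock's exact formulation, and the rate constants are illustrative.

```python
import numpy as np

def simulate(prey, pred, pred_death=1.0, predation=0.1, rng=None):
    """Run the embedded chain until one population is extinct.

    Each event removes either one prey (predation) or one predator
    (natural death), so the chain must terminate in finite time.
    Returns True if the prey went extinct first.
    """
    rng = rng if rng is not None else np.random.default_rng()
    while prey > 0 and pred > 0:
        rate_predation = predation * prey * pred
        rate_death = pred_death * pred
        if rng.uniform() < rate_predation / (rate_predation + rate_death):
            prey -= 1
        else:
            pred -= 1
    return prey == 0

rng = np.random.default_rng(1)
p_prey_extinct = np.mean([simulate(5, 5, rng=rng) for _ in range(2000)])
```

With small initial populations both extinction orders occur with positive probability, which is the regime where the paper's power-series approximations are reported to work well.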


2012 ◽  
Vol 198-199 ◽  
pp. 1333-1337 ◽  
Author(s):  
San Xi Wei ◽  
Zong Hai Sun

Gaussian processes (GPs) are a promising technique that has been applied to both regression and classification problems. In recent years, models based on Gaussian process priors have attracted much attention in machine learning. Binary (two-class, C = 2) classification using Gaussian processes is a well-developed method. In this paper, a multi-class (C > 2) classification method built on binary GP classification is illustrated, and good accuracy is obtained with it. A comparison of decision time and accuracy between this method and the support vector machine (SVM) is also made in the experiments.
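scikit-learn's `GaussianProcessClassifier` builds multi-class prediction from binary GP classifiers via a one-vs-rest scheme, which mirrors the binary-to-multi-class construction described above; the dataset and settings here are illustrative, not the paper's experiment.

```python
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import train_test_split

# Three-class problem solved by fitting one binary (one-vs-rest) GP classifier
# per class and normalizing the resulting class probabilities.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gpc = GaussianProcessClassifier(multi_class="one_vs_rest",
                                random_state=0).fit(X_tr, y_tr)
acc = gpc.score(X_te, y_te)
```

For an SVM comparison of the kind the paper reports, `sklearn.svm.SVC` could be fitted on the same split and timed alongside.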


2010 ◽  
Vol 23 (10) ◽  
pp. 2759-2781 ◽  
Author(s):  
Martin P. Tingley ◽  
Peter Huybers

Abstract Reconstructing the spatial pattern of a climate field through time from a dataset of overlapping instrumental and climate proxy time series is a nontrivial statistical problem. The need to transform the proxy observations into estimates of the climate field, and the fact that the observed time series are not uniformly distributed in space, further complicate the analysis. Current leading approaches to this problem are based on estimating the full covariance matrix between the proxy time series and instrumental time series over a “calibration” interval and then using this covariance matrix in the context of a linear regression to predict the missing instrumental values from the proxy observations for years prior to instrumental coverage. A fundamentally different approach to this problem is formulated by specifying parametric forms for the spatial covariance and temporal evolution of the climate field, as well as “observation equations” describing the relationship between the data types and the corresponding true values of the climate field. A hierarchical Bayesian model is used to assimilate both proxy and instrumental datasets and to estimate the probability distribution of all model parameters and the climate field through time on a regular spatial grid. The output from this approach includes an estimate of the full covariance structure of the climate field and model parameters as well as diagnostics that estimate the utility of the different proxy time series. This methodology is demonstrated using an instrumental surface temperature dataset after corrupting a number of the time series to mimic proxy observations. The results are compared to those achieved using the regularized expectation–maximization algorithm, and in these experiments the Bayesian algorithm produces reconstructions with greater skill. 
The assumptions underlying these two methodologies and the results of applying each to simple surrogate datasets are explored in greater detail in Part II.
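The "observation equations" idea can be illustrated at a single grid cell with conjugate Gaussian updates: an instrumental reading observes the true temperature directly, while a proxy observes it through a linear transformation, and precision weighting fuses both with the prior. All numbers below are hypothetical.

```python
import numpy as np

# Prior on the true temperature T at one grid cell.
prior_mean, prior_var = 0.0, 4.0

# Observation equations (hypothetical parameters):
#   instrumental: y_inst = T + noise,        Var = inst_var
#   proxy:        y_prox = a * T + b + noise, Var = proxy_var
inst, inst_var = 1.2, 0.25
proxy, a, b, proxy_var = 2.9, 2.0, 0.5, 1.0

# Transform the proxy into temperature units: (y_prox - b) / a.
prox_T = (proxy - b) / a
prox_T_var = proxy_var / a**2

# Conjugate Gaussian fusion: posterior precision is the sum of precisions,
# and the posterior mean is the precision-weighted average.
prec = 1 / prior_var + 1 / inst_var + 1 / prox_T_var
post_mean = (prior_mean / prior_var + inst / inst_var + prox_T / prox_T_var) / prec
post_var = 1 / prec
```

The full hierarchical model does this jointly over space and time (with the spatial covariance tying grid cells together), but the per-cell arithmetic is exactly this precision weighting.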

