Integrated Optimal Fingerprinting: Method Description and Illustration

2016
Vol 29 (6)
pp. 1977-1998
Author(s):  
Alexis Hannart

Abstract: The present paper introduces and illustrates methodological developments intended for so-called optimal fingerprinting methods, which are in frequent use in detection and attribution studies. These methods have traditionally involved three independent steps: preliminary reduction of the dimension of the data; estimation of the covariance associated with internal climate variability; and, finally, linear regression inference with an associated uncertainty assessment. It is argued that such a compartmentalized treatment presents several issues; an integrated method is thus introduced to address them. The suggested approach is based on a single-piece statistical model that represents both the linear regression and the control runs. The unknown covariance is treated as a nuisance parameter that is eliminated by integration, which allows for the introduction of regularization assumptions. Point estimates and confidence intervals follow from the integrated likelihood. Further, it is shown that preliminary dimension reduction is not required for implementability and that the computational issues associated with using the raw, high-dimensional, spatiotemporal data can be resolved quite easily. Results on simulated data show improved performance compared to existing methods with respect to both estimation error and accuracy of confidence intervals, and also highlight the need for further improvements regarding the latter. The method is illustrated on twentieth-century precipitation and surface temperature, suggesting a potentially high informational benefit of using the raw, non-dimension-reduced data in detection and attribution (D&A), provided model error is appropriately built into the inference.
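For context, the sketch below illustrates the conventional, compartmentalized fingerprinting step that the paper integrates: a generalized least squares regression of observations on simulated response patterns, with the internal-variability covariance estimated from control runs and stabilized by shrinkage. The function name, the Ledoit-Wolf shrinkage choice, and the toy data are illustrative assumptions, not the paper's integrated method.

```python
import numpy as np
from sklearn.covariance import LedoitWolf  # shrinkage-regularized covariance estimate

def gls_fingerprint(y, X, control_runs):
    """Conventional fingerprinting step (illustrative): GLS regression of the
    observation vector y (n,) on simulated response patterns X (n, k), using a
    covariance of internal variability estimated from control-run samples (m, n)."""
    C = LedoitWolf().fit(control_runs).covariance_  # shrinkage keeps C well conditioned when m < n
    Cinv = np.linalg.inv(C)
    A = X.T @ Cinv @ X
    beta = np.linalg.solve(A, X.T @ Cinv @ y)       # estimated scaling factors
    beta_cov = np.linalg.inv(A)                     # naive uncertainty; ignores the error in C itself
    return beta, beta_cov

# toy usage with synthetic data
rng = np.random.default_rng(0)
n, k, m = 60, 2, 200
X = rng.normal(size=(n, k))
control_runs = rng.normal(size=(m, n))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta, beta_cov = gls_fingerprint(y, X, control_runs)
```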

2020
Author(s):
Georgia Ntani
Hazel Inskip
Clive Osmond
David Coggon

Abstract
Background: Clustering of observations is a common phenomenon in epidemiological and clinical research. Previous studies have highlighted the importance of using multilevel analysis to account for such clustering, but in practice methods that ignore clustering are often used. We used simulated data to explore the circumstances in which failure to account for clustering in linear regression analysis could lead to importantly erroneous conclusions.
Methods: We simulated data following the random-intercept model specification under different scenarios of clustering of a continuous outcome and a single continuous or binary explanatory variable. We fitted random-intercept (RI) and cluster-unadjusted ordinary least squares (OLS) models and compared the derived estimates of effect, as quantified by regression coefficients, and their estimated precision. We also assessed the extent to which coverage by 95% confidence intervals and rates of Type I error were appropriate.
Results: We found that effects estimated from OLS linear regression models that ignored clustering were on average unbiased. The precision of effect estimates from the OLS model was overestimated when both the outcome and the explanatory variable were continuous. By contrast, in linear regression with a binary explanatory variable, the precision of effects was in most circumstances somewhat underestimated by the OLS model. The magnitude of bias, both in point estimates and in their precision, increased with greater clustering of the outcome variable and was also influenced by the amount of clustering in the explanatory variable. The cluster-unadjusted model resulted in poor coverage rates by 95% confidence intervals and high rates of Type I error, especially when the explanatory variable was continuous.
Conclusions: In this study we identified situations in which ignoring clustering in an OLS regression model is more likely to distort statistical inference, namely when the explanatory variable is continuous and its intraclass correlation coefficient is higher than 0.01. We also identified situations in which statistical inference is less likely to be affected.
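A minimal sketch of the kind of comparison described above, assuming numpy, pandas, and statsmodels: data are generated from a random-intercept model and fitted with both a cluster-unadjusted OLS model and a random-intercept model. The ICC, effect size, and sample sizes are arbitrary illustrative values, not the authors' simulation settings.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate from a random-intercept model: y_ij = 1 + 0.3 * x_ij + u_j + e_ij
n_clusters, n_per = 30, 20
icc_y = 0.2                                                 # illustrative outcome ICC
u = rng.normal(0.0, np.sqrt(icc_y), n_clusters)             # cluster-level random intercepts
cluster = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)                     # continuous explanatory variable (no clustering in x)
e = rng.normal(0.0, np.sqrt(1.0 - icc_y), n_clusters * n_per)
y = 1.0 + 0.3 * x + u[cluster] + e
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

# Cluster-unadjusted OLS versus the random-intercept (RI) model
ols = smf.ols("y ~ x", data=df).fit()
ri = smf.mixedlm("y ~ x", data=df, groups=df["cluster"]).fit()
print("OLS estimate and SE:", ols.params["x"], ols.bse["x"])
print("RI  estimate and SE:", ri.params["x"], ri.bse["x"])
```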


Author(s):  
Hervé Cardot
Pascal Sarda

This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how the model can be extended to the case of a functional response. It then considers two kinds of estimation procedures for the slope parameter: projection-based estimators, in which regularization is performed through dimension reduction (such as functional principal component regression), and penalized least squares estimators, defined as solutions of a penalized least squares minimization problem. The article proceeds by discussing the main asymptotic properties, distinguishing results on mean square prediction error from results on L2 estimation error. It also describes some related models, including generalized functional linear models and FLR on quantiles, and concludes with a complementary bibliography and some open problems.
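To make the projection-based approach concrete, here is a minimal sketch of functional principal component regression for a scalar response, assuming curves observed on a common grid; it omits quadrature weights and smoothing, which a practical implementation would include.

```python
import numpy as np

def fpc_regression(X, y, n_components=3):
    """Functional principal component regression for a scalar response.
    X: curves observed on a common grid, shape (n, p); y: responses, shape (n,).
    Returns the estimated slope function evaluated on the grid."""
    Xc = X - X.mean(axis=0)                       # center the curves
    cov = Xc.T @ Xc / len(X)                      # discretized covariance operator
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    phi = vecs[:, ::-1][:, :n_components]         # leading eigenfunctions (columns)
    scores = Xc @ phi                             # FPC scores of each curve
    coefs = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
    return phi @ coefs                            # slope function on the grid
```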


Author(s):  
Dylan J. Foster
Vasilis Syrgkanis

We provide excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate a target parameter depends on an unknown parameter that must be estimated from data (a "nuisance parameter"). We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target parameter and one for the nuisance parameter. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and makes assumptions only on their individual performance. This enables the use of a plethora of existing results from the statistical learning and machine learning literature to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class. We characterize conditions on the metric entropy such that oracle rates (rates of the same order as if we knew the nuisance parameter) are achieved. We also analyze the rates achieved by specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation, and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance in the literature: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data.
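A minimal sketch of the two-stage sample-splitting idea, specialized (as an assumption, not the paper's general setting) to a partially linear treatment-effect model in which residual-on-residual regression gives a Neyman-orthogonal second stage; the random-forest nuisance learners are placeholders for arbitrary estimation algorithms.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def two_stage_plugin(X, t, y, nuisance_learner=RandomForestRegressor):
    """Two-stage sample splitting for a partially linear model y = theta*t + g(X) + noise.
    Stage 1 (first half): estimate the nuisances E[y|X] and E[t|X].
    Stage 2 (second half): residual-on-residual regression, a Neyman-orthogonal loss,
    so first-stage estimation error enters the excess risk only at second order."""
    n = len(y)
    idx = np.random.default_rng(0).permutation(n)
    a, b = idx[: n // 2], idx[n // 2:]
    m_y = nuisance_learner().fit(X[a], y[a])      # nuisance: outcome regression
    m_t = nuisance_learner().fit(X[a], t[a])      # nuisance: treatment regression
    ry = y[b] - m_y.predict(X[b])                 # orthogonalized outcome
    rt = t[b] - m_t.predict(X[b])                 # orthogonalized treatment
    return (rt @ ry) / (rt @ rt)                  # target parameter estimate
```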


2019
Author(s):
Qiqing Tao
Koichiro Tamura
Beatriz Mello
Sudhir Kumar

Abstract: Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not properly incorporate rate variation among lineages and interactions among clock calibrations. Here, we present a new analytical method to calculate CIs of divergence times estimated using the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in these analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density (HPD) intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, i.e., the true time is contained within the CIs with a 95% probability. These developments will encourage broader use of the computationally efficient RelTime approach in molecular dating analyses and biological hypothesis testing.
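As a generic illustration of the coverage probabilities reported above (not the RelTime method itself), the sketch below shows how coverage is typically assessed in simulation: generate many replicates with a known true value and count how often the 95% CI contains it.

```python
import numpy as np
from scipy import stats

def coverage_probability(n_reps=5000, n=50, true_value=10.0, sd=2.0, seed=0):
    """Monte Carlo coverage check: the fraction of replicates whose 95% CI
    contains the known true value (a simple mean here, standing in for a
    simulated divergence time with known truth)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_reps):
        x = rng.normal(true_value, sd, n)
        half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
        hits += (x.mean() - half) <= true_value <= (x.mean() + half)
    return hits / n_reps      # close to 0.95 for a well-calibrated interval
```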


Metals
2019
Vol 9 (8)
pp. 882
Author(s):  
Alfredo L. Coello-Velázquez
Víctor Quijano Arteaga
Juan M. Menéndez-Aguado
Francisco M. Pole
Luis Llorente

Mathematical models of particle size distribution (PSD) are necessary in the modelling and simulation of comminution circuits. In order to evaluate the application of the Swebrec PSD model (SWEF) in the grinding circuit at the Punta Gorda Ni-Co plant, a sampling campaign was carried out with variations in the operating parameters. Subsequently, the fitting of the data to the Gates-Gaudin-Schumann (GGS), Rosin-Rammler (RRS) and SWEF PSD functions was evaluated under statistical criteria. The evaluated distribution models proved sufficiently accurate, as the estimation error does not exceed 3.0% in any of the cases. In the particular case of the Swebrec function, reproducibility for all the products is high. Furthermore, its estimation error does not exceed 2.7% in any of the cases, with a correlation coefficient between experimental and simulated data greater than 0.99.
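For reference, the snippet below fits the commonly used forms of the three PSD functions named above to hypothetical sieve data with scipy; the data values, initial guesses, and bounds are illustrative assumptions, not the Punta Gorda measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

# Cumulative percent passing as a function of particle size x (commonly used forms)
def ggs(x, xmax, m):                     # Gates-Gaudin-Schumann
    return 100.0 * (x / xmax) ** m

def rrs(x, xc, n):                       # Rosin-Rammler, xc = size at 63.2% passing
    return 100.0 * (1.0 - np.exp(-(x / xc) ** n))

def swef(x, xmax, x50, b):               # Swebrec function
    return 100.0 / (1.0 + (np.log(xmax / x) / np.log(xmax / x50)) ** b)

# Hypothetical sieve data (mm, cumulative % passing) -- not the plant measurements
size = np.array([0.045, 0.075, 0.15, 0.3, 0.6, 1.18, 2.36, 4.75])
passing = np.array([12.0, 22.0, 38.0, 55.0, 71.0, 84.0, 93.0, 99.0])

# Fit the Swebrec function; GGS and RRS are fitted analogously with curve_fit
params, _ = curve_fit(swef, size, passing, p0=[5.0, 0.25, 2.0],
                      bounds=([4.76, 0.01, 0.5], [50.0, 2.0, 10.0]))
rel_error = 100 * np.mean(np.abs(passing - swef(size, *params)) / passing)  # mean relative error, %
```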


2019
Vol 12 (3)
pp. 107
Author(s):
Golodnikov
Kuzmenko
Uryasev

A popular risk measure, conditional value-at-risk (CVaR), is called expected shortfall (ES) in financial applications. The research presented here involved developing algorithms for implementing linear regression that estimates CVaR as a function of explanatory factors; such regression is called CVaR (superquantile) regression. The main statement of this paper is that CVaR linear regression can be reduced to minimizing the Rockafellar error function with linear programming. The theoretical basis for the analysis is established with the quadrangle theory of risk functions. We derived relationships between elements of the CVaR quadrangle and the mixed-quantile quadrangle for discrete distributions with equally probable atoms. The deviation in the CVaR quadrangle is an integral; we present two equivalent variants of discretization of this integral, which result in two sets of parameters for the mixed-quantile quadrangle. For the first set of parameters, the minimization of error from the CVaR quadrangle is equivalent to the minimization of the Rockafellar error from the mixed-quantile quadrangle. Alternatively, a two-stage procedure based on the decomposition theorem can be used for CVaR linear regression with both sets of parameters. This procedure is valid because the deviation in the mixed-quantile quadrangle (called mixed CVaR deviation) coincides with the deviation in the CVaR quadrangle for both sets of parameters. We illustrated the theoretical results with a case study demonstrating the numerical efficiency of the suggested approach. The case study codes, data, and results are posted on the website. The case study was done with the Portfolio Safeguard (PSG) optimization package, which has precoded risk, deviation, and error functions for the quadrangles considered.
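The sketch below is not the paper's Rockafellar-error reduction or the PSG case study; it only illustrates, under standard formulations, two building blocks the reduction rests on: the Rockafellar-Uryasev expression for CVaR of an equally weighted discrete sample, and quantile (Koenker-Bassett) regression posed as a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def cvar_ru(losses, alpha=0.95):
    """CVaR of an equally weighted discrete loss sample via the Rockafellar-Uryasev
    formula min_c { c + E[(L - c)^+] / (1 - alpha) }; the minimum of this piecewise
    linear convex function is attained at one of the atoms."""
    losses = np.asarray(losses, dtype=float)
    return min(c + np.mean(np.maximum(losses - c, 0.0)) / (1.0 - alpha) for c in losses)

def quantile_regression_lp(X, y, alpha=0.95):
    """Koenker-Bassett quantile regression posed as a linear program, the building
    block of mixed-quantile representations. X (n, k) should include a column of
    ones if an intercept is wanted. Variables: [b (free), u >= 0, v >= 0] with
    y - X b = u - v; minimize sum(alpha*u + (1-alpha)*v)."""
    n, k = X.shape
    c = np.concatenate([np.zeros(k), alpha * np.ones(n), (1.0 - alpha) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * k + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:k]
```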

