Optimal Designs for Comparing Regression Curves: Dependence Within and Between Groups

2021 ◽  
Vol 15 (4) ◽  
Author(s):  
Kirsten Schorning ◽  
Holger Dette

Abstract: We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data may be dependent both between and within the groups. In order to derive efficient designs, we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous model. It is demonstrated that, in general, simultaneous estimation using the data from both groups yields more precise results than estimating the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous model and its discrete approximation with respect to the weights of the linear estimator. Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, which shows that the optimal designs yield substantially narrower confidence bands than uniform designs.

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 665-682
Author(s):  
K Alhorn ◽  
K Schorning ◽  
H Dette

Summary: We consider the problem of designing experiments for estimating a target parameter in regression analysis when there is uncertainty about the parametric form of the regression function. A new optimality criterion is proposed that chooses the experimental design to minimize the asymptotic mean squared error of the frequentist model averaging estimate. Necessary conditions for the optimal solution of the locally and Bayesian optimal design problems are established. The results are illustrated in several examples, and it is demonstrated that Bayesian optimal designs can reduce the mean squared error of the model averaging estimator by up to 45%.


2021 ◽  
Author(s):  
Alexandra Soberon ◽  
Juan M Rodriguez-Poo ◽  
Peter M Robinson

Abstract: In this paper, we consider efficiency improvement in a nonparametric panel data model with cross-sectional dependence. A generalized least squares (GLS)-type estimator is proposed that takes this dependence structure into account. Parameterizing the cross-sectional dependence, we show that the local linear estimator is dominated by this GLS-type estimator, and we study possible gains in terms of the rate of convergence. An asymptotically optimal bandwidth choice is justified. To assess the finite-sample performance of the proposed estimators, a Monte Carlo study is carried out. Further, some empirical applications are conducted with the aim of analyzing the implications of the European Monetary Union for its member countries.
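The efficiency logic behind a GLS-type estimator can be seen even in a toy parametric setting. The sketch below is not the paper's nonparametric panel estimator; it uses illustrative data and a known heteroskedastic error covariance to show why weighting by the inverse error variance reduces estimator variance relative to OLS:

```python
import random
import statistics

random.seed(1)
# Toy model y = b*x + e with known heteroskedastic error variances:
# the last two observations are much noisier than the first two.
x = [1.0, 2.0, 3.0, 4.0]
sig = [0.5, 0.5, 3.0, 3.0]
b_true = 2.0

ols_draws, gls_draws = [], []
for _ in range(5000):
    y = [b_true * xi + random.gauss(0.0, si) for xi, si in zip(x, sig)]
    # OLS ignores the covariance structure.
    ols_draws.append(sum(xi * yi for xi, yi in zip(x, y))
                     / sum(xi * xi for xi in x))
    # GLS weights each observation by its inverse error variance.
    w = [1.0 / si ** 2 for si in sig]
    gls_draws.append(sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
                     / sum(wi * xi * xi for wi, xi in zip(w, x)))

var_ols = statistics.pvariance(ols_draws)
var_gls = statistics.pvariance(gls_draws)
```

Both estimators are unbiased here, but the Monte Carlo variance of the GLS fit is several times smaller, mirroring the dominance result described in the abstract.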


2019 ◽  
Author(s):  
Carlos A Loza

Sparse coding aims to find a parsimonious representation of an example given an observation matrix or dictionary. In this regard, Orthogonal Matching Pursuit (OMP) provides an intuitive, simple and fast approximation of the optimal solution. However, its main building block is anchored on the minimization of the mean squared error (MSE) cost function. This approach is only optimal if the errors follow a Gaussian distribution without samples that strongly deviate from the main mode, i.e., outliers. If this assumption is violated, the sparse code will likely be biased and performance will degrade accordingly. In this paper, we introduce five robust variants of OMP (RobOMP) fully based on the theory of M-estimators under a linear model. The proposed framework exploits efficient iteratively reweighted least squares (IRLS) techniques to mitigate the effect of outliers and emphasize the samples corresponding to the main mode of the data. This is done adaptively via a learned weight vector that models the distribution of the data in a robust manner. Experiments on synthetic data under several noise distributions, and on image recognition under different combinations of occlusion and missing pixels, thoroughly detail the superiority of RobOMP over MSE-based approaches and similar robust alternatives. We also introduce a denoising framework based on robust, sparse and redundant representations that opens the door to further applications of the proposed techniques. The five variants of RobOMP require no parameter tuning from the user and hence constitute principled alternatives to OMP.
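The M-estimation core that RobOMP embeds inside the OMP atom-selection loop can be sketched on its own. The example below is illustrative only (a no-intercept linear model with invented data; the Huber tuning constant c = 1.345 is the usual default): IRLS alternates between computing residuals, downweighting outliers, and refitting.

```python
def huber_weights(residuals, c=1.345):
    """Huber M-estimator weights: samples near the main mode keep weight 1;
    samples whose residual exceeds the tuning constant c are downweighted."""
    return [1.0 if abs(r) <= c else c / abs(r) for r in residuals]

def weighted_slope(x, y, w):
    """Weighted least squares for a no-intercept model y = b*x:
    b = sum(w x y) / sum(w x^2)."""
    return (sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
            / sum(wi * xi * xi for wi, xi in zip(w, x)))

# Illustrative data: y is roughly 2x, with one gross outlier at the end.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.0, 30.0]

b_ols = weighted_slope(x, y, [1.0] * len(x))  # plain MSE fit, pulled by the outlier
b = b_ols
for _ in range(20):  # IRLS: residuals -> robust weights -> reweighted fit
    r = [yi - b * xi for xi, yi in zip(x, y)]
    b = weighted_slope(x, y, huber_weights(r))
```

The MSE fit is dragged well above the true slope by the single outlier, while the IRLS fit settles close to it, which is the behavior the abstract attributes to RobOMP.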


Molecules ◽  
2021 ◽  
Vol 26 (22) ◽  
pp. 7035
Author(s):  
Łukasz Komsta ◽  
Katarzyna Wicha-Komsta ◽  
Tomasz Kocki

This is an introductory tutorial and review of the uncertainty problem in chromatographic calibration. It emphasizes some unobvious but important details that influence errors in calibration curve estimation and uncertainty in prediction, as well as the connections and dependences between them, all from various perspectives of uncertainty measurement. Nonuniform D-optimal designs derived from Fedorov's theorem are computed and presented. As an example, all possible designs of 24 calibration samples (3–8, 4–6, 6–4, 8–3 and 12–2, both uniform and D-optimal) are compared in the context of many optimality criteria. It can be concluded that there are only two independent (orthogonal, but slightly complex) trends in the optimality of these designs. The conclusions are important, as uniform designs with many concentrations are not the best choice, contrary to intuitive perception. Nonuniform designs are a visibly better alternative in most calibration cases.
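For the simplest case, a straight-line calibration over a standardized range [0, 1], the D-optimal design is known to concentrate samples at the two endpoints. The sketch below is an illustrative comparison (not the Fedorov-type designs computed in the paper): it contrasts the determinant of the 2x2 information matrix for a uniform 8-level design and the endpoint design, both with 24 samples.

```python
def info_det(xs):
    """Determinant of the 2x2 information matrix X'X for y = b0 + b1*x;
    maximizing this determinant is the D-optimality criterion."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return n * s2 - s1 * s1

# 24 calibration samples on the standardized range [0, 1]:
uniform_8x3 = [i / 7 for i in range(8) for _ in range(3)]  # 8 levels, 3 replicates
d_optimal = [0.0] * 12 + [1.0] * 12                        # half at each endpoint

det_uniform = info_det(uniform_8x3)
det_optimal = info_det(d_optimal)
```

The endpoint design more than doubles the determinant, matching the tutorial's point that uniform designs with many concentrations are not the best choice for a straight-line model.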


1973 ◽  
Vol 40 (2) ◽  
pp. 595-599 ◽  
Author(s):  
M. Z. Cohn ◽  
S. R. Parimi

Optimal (minimum weight) solutions for plastic framed structures under shakedown conditions are found by linear programming. Designs that are optimal for two failure criteria (collapse under fixed loads and collapse under variable repeated loads) are then investigated. It is found that these designs are governed by the ratio of the specified factors defining the two failure criteria, i.e., λs for shakedown and λ for collapse under fixed loading. Below a certain value (λs/λ)min, the optimal solution under fixed loading is also optimal for fixed and shakedown loading. Above a value (λs/λ)max, the optimal design for variable loading is also optimal under the two loading conditions. For intermediate values of λs/λ, the optimal design that simultaneously satisfies the two criteria differs from the optimal designs for each independent loading condition. An example illustrates the effect of λs/λ on the nature of the design solution.
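A minimum-weight plastic design problem of this kind reduces to a small linear program: minimize a weight objective subject to one linear constraint per collapse mechanism. The sketch below uses invented coefficients throughout (the mechanism constraints and weight factors are illustrative, not taken from the paper) and solves the two-variable LP by enumerating constraint-intersection vertices, which is sufficient in two dimensions.

```python
from itertools import combinations

def solve_lp_2var(cost, cons):
    """Minimize cost . (x, y) subject to a*x + b*y >= r for each (a, b, r)
    in cons. In two variables an optimal solution (if one exists) lies at
    the intersection of two active constraints, so enumerate those vertices."""
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(cons, 2):
        det = a1 * b2 - b1 * a2
        if abs(det) < 1e-12:
            continue  # parallel constraints: no vertex
        x = (r1 * b2 - b1 * r2) / det
        y = (a1 * r2 - r1 * a2) / det
        if all(a * x + b * y >= r - 1e-9 for a, b, r in cons):
            val = cost[0] * x + cost[1] * y
            if best is None or val < best[0]:
                best = (val, x, y)
    return best

# Toy design: plastic moments Mb (beam) and Mc (column); weight ~ 2*Mb + 3*Mc.
# One ">=" constraint per collapse mechanism (all coefficients illustrative).
constraints = [
    (2.0, 0.0, 120.0),  # beam mechanism
    (0.0, 4.0, 120.0),  # sway mechanism
    (2.0, 2.0, 200.0),  # combined mechanism
    (1.0, 0.0, 0.0),    # Mb >= 0
    (0.0, 1.0, 0.0),    # Mc >= 0
]
weight, Mb, Mc = solve_lp_2var((2.0, 3.0), constraints)
```

Changing a load-factor ratio in such a formulation rescales the mechanism constraints, which is how the λs/λ ratio ends up governing which design is optimal.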


Author(s):  
Deniz Erdogmus ◽  
Jose C. Principe

Learning systems depend on three interrelated components: topologies, cost/performance functions, and learning algorithms. Topologies provide the constraints for the mapping, and the learning algorithms offer the means to find an optimal solution; but the solution is optimal with respect to what? Optimality is characterized by the criterion, and in the neural network literature this is the least addressed component, yet it has a decisive influence on generalization performance. Certainly, the assumptions behind the selection of a criterion should be better understood and investigated. Traditionally, least squares has been the benchmark criterion for regression problems; considering classification as a regression problem towards estimating class posterior probabilities, least squares has been employed to train neural network and other classifier topologies to approximate correct labels. The main motivation to utilize least squares in regression simply comes from the intellectual comfort this criterion provides due to its success in traditional linear least squares regression applications – which can be reduced to solving a system of linear equations. For nonlinear regression, the assumption of Gaussianity for the measurement error combined with the maximum likelihood principle could be emphasized to promote this criterion. In nonparametric regression, the least squares principle leads to the conditional expectation solution, which is intuitively appealing. Although these are good reasons to use the mean squared error as the cost, it is inherently linked to the assumptions and habits stated above. Consequently, there is information in the error signal that is not captured during the training of nonlinear adaptive systems under non-Gaussian distribution conditions when one insists on second-order statistical criteria.
This argument extends to other linear, second-order techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA). Recent work tries to generalize these techniques to nonlinear scenarios by utilizing kernel techniques or other heuristics. This raises the question: what other cost functions could be used to train adaptive systems, and how could we establish rigorous techniques for extending useful concepts from linear and second-order statistical techniques to nonlinear and higher-order statistical learning methodologies?
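The outlier sensitivity of second-order criteria is easy to demonstrate in the simplest setting: for a constant predictor, the MSE-optimal value is the mean, while the MAE-optimal value is the median. A minimal sketch with illustrative data:

```python
import statistics

# For a constant predictor c, minimizing sum (y - c)^2 yields the mean,
# while minimizing sum |y - c| yields the median.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 12.0]  # one gross outlier
mse_fit = statistics.mean(data)    # dragged toward the outlier
mae_fit = statistics.median(data)  # stays with the main mode
```

A single corrupted sample moves the MSE-optimal fit far from the bulk of the data, while the first-order criterion barely reacts, which is exactly the information mismatch described above for non-Gaussian errors.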


2009 ◽  
Vol 25 (1) ◽  
pp. 291-297 ◽  
Author(s):  
Yong Bao

We study the finite-sample bias and mean squared error, when properly defined, of the sample coefficient of variation under a general distribution. We employ a Nagar-type expansion and use moments of quadratic forms to derive the results. We find that the approximate bias depends on not only the skewness but also the kurtosis of the distribution, whereas the approximate mean squared error depends on the cumulants up to order 6.
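The finite-sample bias of the sample coefficient of variation can be checked by simulation. The sketch below is illustrative only (normal data with invented parameters, so skewness is zero; the paper's Nagar-type expansion covers general distributions): it estimates the bias of the sample CV for n = 5.

```python
import random
import statistics

random.seed(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 20000
cv_true = sigma / mu  # population coefficient of variation = 0.2

cv_samples = []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    cv_samples.append(statistics.stdev(x) / statistics.mean(x))
bias = statistics.mean(cv_samples) - cv_true
```

For normal data the sample standard deviation underestimates sigma, so the sample CV comes out slightly below 0.2 on average at this sample size.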


1996 ◽  
Vol 12 (3) ◽  
pp. 432-457 ◽  
Author(s):  
Eric Ghysels ◽  
Offer Lieberman

It is common for an applied researcher to use filtered data, such as seasonally adjusted series, to estimate the parameters of a dynamic regression model. In this paper, we study the effect of (linear) filters on the distribution of parameters of a dynamic regression model with a lagged dependent variable and a set of exogenous regressors. So far, only asymptotic results are available. Our main interest is to investigate the effect of filtering on the small-sample bias and mean squared error. In general, these results entail numerical integration of derivatives of the joint moment generating function of two quadratic forms in normal variables. The computation of these integrals is quite involved. However, we take advantage of Laplace approximations to the bias and mean squared error, which substantially reduce the computational burden, as they yield relatively simple analytic expressions. We obtain analytic formulae for approximating the effect of filtering on the finite-sample bias and mean squared error. We evaluate the adequacy of the approximations by comparison with Monte Carlo simulations, using the Census X-11 filter as a specific example.


2014 ◽  
Vol 20 (4) ◽  
Author(s):  
Seung-Hwan Lee ◽  
Eun-Joo Lee

Abstract: This paper develops non-parametric simultaneous confidence bands for the survival function when data are randomly censored on the right. To construct the confidence bands, a computer-assisted method that requires no distributional assumptions is utilized, so the confidence bands can be easily estimated. The procedures are based on the integrated martingale, whose distribution is approximated by a Gaussian process. The supremum distribution of the Gaussian process, generated by simulation techniques, leads to the construction of the confidence bands. To improve the estimation procedures for finite sample sizes, the log-minus-log transformation is employed. The proposed confidence bands are assessed using numerical simulations and applied to a real-world data set on leukemia.
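The survival-function estimate around which such bands are built is typically the standard Kaplan-Meier product-limit estimator; the band construction itself (integrated martingale, simulated Gaussian process supremum) is beyond this sketch. A minimal implementation with illustrative right-censored data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function for right-censored data.
    times: observed times; events: 1 = event observed, 0 = censored.
    Returns (time, survival probability) pairs at the distinct event times."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    curve, s = [], 1.0
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        s *= 1.0 - deaths / at_risk  # product-limit update
        curve.append((t, s))
    return curve

# Illustrative sample: observation times with censoring indicators.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
```

Censored observations leave the risk set without triggering a step down, which is why the estimator requires no distributional assumptions.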


2016 ◽  
Vol 12 (1) ◽  
pp. 65-77
Author(s):  
Michael D. Regier ◽  
Erica E. M. Moodie

Abstract: We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when a standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there are missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience.
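The E-step/M-step alternation that the proposal partitions can be illustrated on the textbook case. The sketch below is not the authors' extension: it runs plain EM on a two-component Gaussian mixture with unit variances and illustrative, well-separated data.

```python
import math

def em_gaussian_mixture(data, iters=50):
    """Plain EM for a two-component Gaussian mixture with unit variances.
    E-step: compute each point's responsibility under component 1.
    M-step: update the component means and the mixing proportion."""
    mu1, mu2, pi = min(data), max(data), 0.5  # crude initialization
    for _ in range(iters):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in data:
            p1 = pi * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - pi) * math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means and mixing proportion.
        n1 = sum(resp)
        mu1 = sum(r * x for r, x in zip(resp, data)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - n1)
        pi = n1 / len(data)
    return mu1, mu2, pi

data = [0.1, -0.2, 0.0, 0.2, 4.9, 5.1, 5.0, 5.2]
mu1, mu2, pi = em_gaussian_mixture(data)
```

Each iteration is itself a small, self-contained unit, which is the structure the proposed extension exploits when splitting a complex EM algorithm into a sequence of simpler ones.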

