Understanding double descent through the lens of principal component regression

2021 ◽  
Author(s):  
Christine H. Lind ◽  
Angela J. Yu

Abstract Several recent papers have studied the double descent phenomenon: a classic U-shaped empirical risk curve when the number of parameters is smaller than or equal to the number of data points, followed by a decrease in empirical risk (referred to as the “second descent”) as the number of features is increased past the interpolation threshold (the minimum number of parameters needed to achieve zero training error). In a similar vein to several recent papers on double descent, we concentrate here on the special case of over-parameterized linear regression, one of the simplest model classes that exhibit double descent, with the aim of better understanding the nature of the solution in the second descent and how it relates to solutions in the first descent. In this paper, we show that the final second-descent model (obtained using all features) is equivalent to the model estimated using principal component (PC) regression when all PCs of the training data are included. It follows that many properties of double descent can be understood through the relatively simple and well-characterized lens of PC regression. In particular, we identify a set of conditions that guarantee final second-descent performance to be better than the best first-descent performance: namely, the scenario in which PC regression using all features does not suffer from over-fitting and can be guaranteed to outperform any other first-descent model (any linear regression model using no more features than training data points). We also discuss how this work relates to transfer learning, semi-supervised learning, few-shot learning, as well as theoretical concepts in neuroscience.
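A minimal numpy sketch of the paper's central equivalence (an illustration under simplified assumptions, not the authors' code): in the over-parameterized regime, the minimum-norm least squares solution using all features coincides with PC regression that keeps every principal component of the (here uncentered) training data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                      # n data points, p features (over-parameterized)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-norm least squares solution (the final second-descent model).
beta_min_norm = np.linalg.pinv(X) @ y

# PC regression keeping all principal components of the training data.
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # at most n nonzero PCs
scores = X @ Vt.T                                  # project onto all PCs
gamma = np.linalg.pinv(scores) @ y                 # regress y on the PC scores
beta_pcr = Vt.T @ gamma                            # map back to feature space

print(np.allclose(beta_min_norm, beta_pcr))        # True
```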

Author(s):  
Hervé Cardot ◽  
Pascal Sarda

This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how the model can be extended to the case of a functional response. It then considers two kinds of estimation procedures for the slope parameter: projection-based estimators, in which regularization is performed through dimension reduction, such as functional principal component regression, and penalized least squares estimators, in which regularization is achieved by solving a penalized least squares minimization problem. The article proceeds by discussing the main asymptotic properties, separating results on mean square prediction error from results on L2 estimation error. It also describes some related models, including generalized functional linear models and FLR on quantiles, and concludes with a complementary bibliography and some open problems.
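A short sketch of functional principal component regression with a scalar response, the projection-based estimator described above; the grid discretization, the truncation level K, and the simulated curves are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 50                      # n curves observed on an m-point grid
t = np.linspace(0, 1, m)
basis = np.array([np.sin(k * np.pi * t) for k in range(1, 5)])   # (4, m)
X = rng.standard_normal((n, 4)) @ basis                          # the curves X_i(t)
beta_true = np.sin(np.pi * t)
y = X @ beta_true / m + 0.01 * rng.standard_normal(n)            # y_i ~ integral of X_i * beta

# Functional PCA of the curves, then regress y on the leading K scores.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 4                               # regularization via dimension reduction
scores = Xc @ Vt[:K].T              # PC scores of each curve
gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
beta_hat = m * (Vt[:K].T @ gamma)   # estimated slope function on the grid
```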


2013 ◽  
Vol 141 (7) ◽  
pp. 2519-2525 ◽  
Author(s):  
Michael K. Tippett ◽  
Timothy DelSole

Abstract The constructed analog procedure produces a statistical forecast that is a linear combination of past predictand values. The weights used to form the linear combination depend on the current predictor value and are chosen so that the linear combination of past predictor values approximates the current predictor value. The properties of the constructed analog method have previously been described as being distinct from those of linear regression. However, here the authors show that standard implementations of the constructed analog method give forecasts that are identical to linear regression forecasts. A consequence of this equivalence is that constructed analog forecasts based on many predictors tend to suffer from overfitting, just as in linear regression. Differences between linear regression and constructed analog forecasts result only from implementation choices, especially ones related to the preparation and truncation of data. Two particular constructed analog implementations are shown to correspond to principal component regression and ridge regression. The equality of linear regression and constructed analog forecasts is illustrated in a Niño-3.4 prediction example, which also shows that increasing the number of predictors results in low-skill, high-variance forecasts even at long leads, a behavior typical of overfitting. Alternative definitions of the analog weights lead naturally to nonlinear extensions of linear regression such as local linear regression.
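The claimed equivalence is easy to verify numerically. A hedged sketch with synthetic data, taking pseudoinverse-based analog weights as one standard implementation choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 5                        # n past times, p predictors (n > p)
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)
x_now = rng.standard_normal(p)       # current predictor state

# Constructed analog: weights on past times chosen so the weighted
# combination of past predictor values reproduces the current predictor.
w = np.linalg.pinv(X.T) @ x_now      # n weights, one per past time
ca_forecast = w @ y                  # linear combination of past predictands

# Ordinary least squares regression forecast from the same data.
b = np.linalg.lstsq(X, y, rcond=None)[0]
lr_forecast = x_now @ b

print(np.allclose(ca_forecast, lr_forecast))   # True
```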


2013 ◽  
Vol 37 (3) ◽  
Author(s):  
Rainer Haeckel ◽  
Werner Wosniok ◽  
Rainer Klauke

Abstract A well-accepted tool for method validation is a method comparison study. Results are usually assessed on a scatter plot whose fitted line is calculated by one of several approaches, for example, ordinary (vertical) linear regression (OLR), orthogonal regression (OR), Deming regression (DR), the Passing-Bablok method (PBR) or standardized principal component regression (SPCR). DR was applied in its general form (gDR), which requires information on the imprecision of at least two different quantities, and as simple DR (sDR), which requires imprecision information for only one quantity. The equation of the regression line calculated by these approaches varies depending on the range of measurement, the analytical variation and the imprecision ratio (
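For concreteness, a minimal sketch of simple Deming regression (sDR), assuming the common parameterization in which the ratio of the two error variances is supplied; the function name and its default are illustrative, not from the article.

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Simple Deming regression. delta is the assumed ratio of the error
    variances var(err_y) / var(err_x): delta = 1 gives orthogonal
    regression, and delta -> infinity recovers ordinary (vertical) OLR."""
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    syy = np.sum((y - ybar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    slope = (syy - delta * sxx +
             np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return ybar - slope * xbar, slope          # (intercept, slope)

x = np.array([1.0, 2.0, 3.0, 4.0])             # method A results
y = np.array([1.1, 1.9, 3.2, 3.9])             # method B results
print(deming(x, y, delta=1.0))
```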


Author(s):  
Nur Nazmi Liyana Mohd Napi ◽  
Mohammad Syazwan Noor Mohamed ◽  
Samsuri Abdullah ◽  
Amalina Abu Mansor ◽  
Ali Najah Ahmed ◽  
...  

INDIAN DRUGS ◽  
2019 ◽  
Vol 56 (03) ◽  
pp. 32-38
Author(s):  
S. S. Sonawane ◽ 
S. S. More ◽ 
S. S. Chhajed ◽ 
S. J. Kshirsagar

Two simple, accurate, precise and economical UV spectrophotometric methods, Multiple Linear Regression (MLR) and Principal Component Regression (PCR), were developed for the simultaneous estimation of dapagliflozin (DAPA) and saxagliptin (SAXA) in tablets. Beer’s law was obeyed in the concentration ranges of 10 – 50 μg/mL for DAPA and 5 – 25 μg/mL for SAXA. Synthetic mixtures containing the two drugs in phosphate buffer pH 6.8 were prepared to build the training and validation sets across the calibration range using a D-optimal mixture design, and absorbances were recorded at six wavelengths in the range of 215 – 230 nm at intervals of Δλ = 3 nm. Both methods were validated as per ICH guidelines with respect to accuracy and precision and were found suitable for routine analysis of tablets containing DAPA and SAXA without separation.
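A hedged sketch of the PCR variant of such a calibration: the absorptivity values, noise level, and component count below are invented for illustration, not the published method parameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
# Hypothetical pure-component absorptivities at the six wavelengths.
E = np.array([[0.020, 0.018, 0.015, 0.012, 0.010, 0.008],   # DAPA
              [0.030, 0.034, 0.036, 0.033, 0.028, 0.022]])  # SAXA

# Training mixtures spanning the stated calibration ranges.
C_train = np.column_stack([rng.uniform(10, 50, 25),   # DAPA, ug/mL
                           rng.uniform(5, 25, 25)])   # SAXA, ug/mL
A_train = C_train @ E + 0.002 * rng.standard_normal((25, 6))  # Beer's law + noise

pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(A_train, C_train)

A_tablet = np.array([[30.0, 15.0]]) @ E        # a 30/15 ug/mL tablet extract
print(pcr.predict(A_tablet))                   # recovers ~[30, 15]
```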


Proceedings ◽  
2018 ◽  
Vol 2 (13) ◽  
pp. 1010
Author(s):  
Mahbubur Rahman Mishal ◽  
Tanvir Tazul Islam ◽  
Shahadat Hossain Antor ◽  
Tanzilur Rahman

This study proposes a new preprocessing technique that combines Chebyshev filtering with the Asymmetric Least Squares (ALS) baseline correction technique and the Savitzky-Golay transformation (SGT) to improve the prediction of glucose from near-infrared (NIR) spectra with the linear regression models Partial Least Squares (PLS) and Principal Component Regression (PCR). To investigate the performance of the proposed technique, a calibration model was first developed and then validated by predicting glucose from NIR spectra of a mixture of glucose, urea, and triacetin in a phosphate buffer solution, where the component concentrations are within their physiological ranges in blood. Results indicate that the proposed technique improves the performance of both PLS and PCR and achieves a standard error of prediction (SEP) as low as 12.76 mg/dL, which is within the clinically acceptable range and comparable to the existing literature.
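A sketch of the baseline-correction and smoothing stages of such a pipeline; the Chebyshev filtering stage is omitted for brevity, the spectra and reference values are simulated stand-ins, and the ALS implementation follows the standard Eilers & Boelens formulation rather than the paper's code.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline estimate (Eilers & Boelens)."""
    L = len(y)
    D = sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2))
    w = np.ones(L)
    for _ in range(n_iter):
        W = sparse.spdiags(w, 0, L, L)
        z = spsolve((W + lam * D @ D.T).tocsc(), w * y)
        w = p * (y > z) + (1 - p) * (y < z)   # asymmetric reweighting
    return z

rng = np.random.default_rng(4)
spectra = rng.standard_normal((40, 200)).cumsum(axis=1)  # stand-in NIR scans
glucose = rng.uniform(50, 400, 40)                       # reference values, mg/dL

# Preprocess: subtract the ALS baseline, then Savitzky-Golay smoothing.
corrected = np.array([s - als_baseline(s) for s in spectra])
smoothed = savgol_filter(corrected, window_length=15, polyorder=3, axis=1)

pls = PLSRegression(n_components=5).fit(smoothed, glucose)
```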


Author(s):  
Mengchen Zhao ◽  
Bo An ◽  
Wei Gao ◽  
Teng Zhang

The label contamination attack (LCA) is an important type of data poisoning attack in which an attacker manipulates the labels of training data to make the learned model beneficial to the attacker. Existing work on LCA assumes that the attacker has full knowledge of the victim learning model, whereas the victim model is usually a black box to the attacker. In this paper, we develop a Projected Gradient Ascent (PGA) algorithm to compute LCAs on a family of empirical risk minimization models and show that an attack on one victim model can also be effective on other victim models. This makes it possible for the attacker to design an attack against a substitute model and transfer it to a black-box victim model. Based on this observation of transferability, we develop a defense algorithm to identify the data points that are most likely to be attacked. Empirical studies show that PGA significantly outperforms existing baselines and that linear learning models are better substitute models than nonlinear ones.
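As an illustration of the projected-gradient idea on a deliberately simplified surrogate (continuous label perturbations against a closed-form least-squares victim, not the paper's exact algorithm or setting):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 5
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)
x_target = rng.standard_normal(p)   # point the attacker wants scored high
eps, lr = 0.5, 0.1                  # per-label perturbation budget, step size

P = np.linalg.pinv(X)               # victim trains least squares on poisoned labels
delta = np.zeros(n)
for _ in range(100):
    grad = P.T @ x_target                            # gradient of x_target . P @ (y + delta)
    delta = np.clip(delta + lr * grad, -eps, eps)    # ascent step, then projection

y_poisoned = y + delta
print(x_target @ P @ y, x_target @ P @ y_poisoned)   # attacker raises the score
```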


BioResources ◽  
2011 ◽  
Vol 6 (1) ◽  
pp. 807-822 ◽  
Author(s):  
Brian K. Via ◽  
Oladiran Fasina ◽ 
Hui Pan

The assessment of wood biomass density through multivariate modeling of mid-infrared spectra can be useful for interpreting the relationship between feedstock density and functional groups. This study looked at predicting feedstock density from mid-infrared spectra and interpreting the multivariate models. The wood samples possessed a random cell wall orientation, which would be typical of wood chips in a feedstock process. Principal component regression and multiple linear regression models were compared both before and after conversion of the raw spectra into the 1st derivative. A principal component regression model from 1st-derivative spectra exhibited the best calibration statistics, while a multiple linear regression model from the 1st-derivative spectra yielded comparable performance. Earlywood- and latewood-based spectra exhibited significant differences in carbohydrate-associated bands (1000 and 1060 cm⁻¹). Only statistically significant principal component terms (α < 0.05) were chosen for regression; likewise, band assignments only originated from statistically significant principal components. Cellulose-, lignin-, and hemicellulose-associated bands were found to be important in the prediction of wood density.
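A sketch of the modeling recipe described here: 1st-derivative spectra, principal component scores as regression terms, and retention of only the statistically significant terms. The simulated spectra, the latent-factor construction, and the 10-component candidate pool are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.signal import savgol_filter

rng = np.random.default_rng(6)
latent = rng.standard_normal((60, 3))                # hidden chemistry factors
spectra = latent @ rng.standard_normal((3, 300)) + 0.1 * rng.standard_normal((60, 300))
density = 0.5 + 0.05 * latent[:, 0] + 0.01 * rng.standard_normal(60)

# 1st-derivative spectra, then PC scores of the derivative matrix.
d1 = savgol_filter(spectra, window_length=11, polyorder=2, deriv=1, axis=1)
d1c = d1 - d1.mean(axis=0)
U, s, Vt = np.linalg.svd(d1c, full_matrices=False)
scores = d1c @ Vt[:10].T                             # 10 candidate PC terms

# Keep only PC terms significant at alpha = 0.05, then refit.
fit = sm.OLS(density, sm.add_constant(scores)).fit()
keep = np.where(fit.pvalues[1:] < 0.05)[0]           # skip the intercept's p-value
final = sm.OLS(density, sm.add_constant(scores[:, keep])).fit()
print(final.rsquared)
```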

