Fitting limit lines (envelope curves) to spreads of geoenvironmental data

Author(s):  
Paul A Carling ◽  
Philip Jonathan ◽  
Teng Su

Geoscientists are frequently interested in defining the overall trend in x-y data clouds using techniques such as least-squares regression. Yet often the sample data exhibit a considerable spread of y-values for given x-values, which is itself of interest. In some cases, the data may exhibit a distinct visual upper (or lower) ‘limit’ to a broad spread of y-values for a given x-value, defined by a marked reduction in concentration of y-values. As a function of x-value, the locus of this ‘limit’ defines a ‘limit line’, with no (or few) points lying above (or below) it. Despite numerous examples of such situations in geoscience, there has been little consideration within the general geoenvironmental literature of methods used to define limit lines (sometimes termed ‘envelope curves’ when they enclose all data of interest). In this work, methods to fit limit lines are reviewed. Many commonly applied methods are ad hoc and not statistically well founded, often because the data sample available is small and noisy. Other methods are considered which correspond to specific statistical models offering more objective and reproducible estimation. The strengths and weaknesses of the methods are considered by application to real geoscience data sets. Wider adoption of statistical models would enhance confidence in the utility of fitted limits and promote statistical developments in limit-fitting methodologies, which are likely to be transformative in the interpretation of limits. Supplements, a spreadsheet and references to software are provided for ready application by geoscientists.
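As a point of reference for what a limit line is, the simplest ad hoc construction can be sketched in a few lines of Python. This is an illustration only and is not claimed to be one of the methods reviewed in the paper: it fits an ordinary least-squares line and shifts it upward until no point lies above it. The function name `upper_limit_line` and the sample data are hypothetical.

```python
def upper_limit_line(xs, ys):
    """Ad hoc upper limit line: the least-squares line shifted up to cover every point."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Largest residual: the distance the line must rise so that no point lies above it.
    shift = max(y - (intercept + slope * x) for x, y in zip(xs, ys))
    return slope, intercept + shift


# Hypothetical data cloud with a spread of y-values at each x-value.
xs = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
ys = [0.5, 2.1, 1.0, 3.9, 2.2, 6.1, 1.5, 8.0]
slope, intercept = upper_limit_line(xs, ys)
```

The position of this line is fixed by the single most extreme residual, so it moves whenever that one point changes. A construction of this kind is therefore hard to reproduce on small, noisy samples, which is the situation in which the abstract argues for model-based alternatives.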

2019 ◽  
Vol 5 (1) ◽  
pp. 10 ◽  
Author(s):  
Ahmed Rady ◽  
Daniel Guyer ◽  
William Kirk ◽  
Irwin R Donis-González

The sprouting of potato tubers during storage is a significant problem that hinders the production of high-quality seed or fried products. In this study, the potential of fusing data obtained from visible (VIS)/near-infrared (NIR) spectroscopic and hyperspectral imaging systems was investigated to improve the prediction of primordial leaf count as a significant sign of tuber sprouting. Electronic and lab measurements were conducted on whole tubers of Frito Lay 1879 (FL1879) and Russet Norkotah (R.Norkotah) potato cultivars. The interval partial least squares (IPLS) technique was adopted to extract the most effective wavelengths for both systems. Linear regression was performed using partial least squares regression (PLSR), and the best calibration model was chosen using four-fold cross-validation. The prediction models were then obtained using separate test data sets. Prediction results were enhanced compared with those obtained from the individual systems’ models. The values of the correlation coefficient (ratio of performance to deviation), r(RPD), were 0.95(3.01) and 0.96(3.55) for FL1879 and R.Norkotah, respectively, representing improvements of 6.7%(35.6%) and 24.7%(136.7%), respectively. The proposed study shows the possibility of building a rapid, noninvasive, and accurate system or device that requires minimal or no sample preparation to track the sprouting activity of stored potato tubers.


2008 ◽  
Vol 8 (2) ◽  
pp. 6409-6436 ◽  
Author(s):  
C. A. Cantrell

Abstract. The representation of data, whether geophysical observations, numerical model output or laboratory results, by a best fit straight line is a routine practice in the geosciences and other fields. While the literature is full of detailed analyses of procedures for fitting straight lines to values with uncertainties, a surprising number of scientists blindly use the standard least squares method, such as found on calculators and in spreadsheet programs, that assumes no uncertainties in the x values. Here, the available procedures for estimating the best fit straight line to data, including those applicable to situations for uncertainties present in both the x and y variables, are reviewed. Representative methods that are presented in the literature for bivariate weighted fits are compared using several sample data sets, and guidance is presented as to when the somewhat more involved iterative methods are required, or when the standard least-squares procedure would be expected to be satisfactory. A spreadsheet-based template is made available that employs one method for bivariate fitting.
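To make the distinction concrete, the following Python sketch contrasts a fit that allows errors in both variables with the standard least-squares assumption of error-free x values. It uses Deming regression, a standard closed-form technique. This is an assumption made for illustration and is not necessarily one of the methods the paper compares. The sketch further assumes a single known ratio of error variances for all points, whereas the iterative methods mentioned in the abstract handle per-point weights. The function name `deming_fit` and the data are hypothetical.

```python
import math


def deming_fit(xs, ys, variance_ratio=1.0):
    """Straight-line fit allowing errors in both x and y (Deming regression).

    variance_ratio is var(y errors) / var(x errors), assumed known and constant.
    A ratio of 1 gives orthogonal regression; a very large ratio approaches
    ordinary least squares, which treats x as error-free.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs) / (n - 1)
    syy = sum((y - y_mean) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form slope; undefined when the sample covariance sxy is zero.
    d = syy - variance_ratio * sxx
    slope = (d + math.sqrt(d * d + 4.0 * variance_ratio * sxy * sxy)) / (2.0 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical measurements with noise in both variables.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]
slope, intercept = deming_fit(xs, ys, variance_ratio=1.0)
```

Comparing the slope for `variance_ratio=1.0` with the slope for a very large ratio gives a quick indication of how much the error-free-x assumption matters for a given data set.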


1996 ◽  
Vol 26 (4) ◽  
pp. 590-600 ◽  
Author(s):  
Katherine L. Bolster ◽  
Mary E. Martin ◽  
John D. Aber

Further evaluation of near infrared reflectance spectroscopy as a method for the determination of nitrogen, lignin, and cellulose concentrations in dry, ground, temperate forest woody foliage is presented. A comparison is made between two regression methods, stepwise multiple linear regression and partial least squares regression. The partial least squares method showed consistently lower standard error of calibration and higher R² values with first and second difference equations. The first difference partial least squares regression equation resulted in standard errors of calibration of 0.106%, with an R² of 0.97 for nitrogen, 1.613% with an R² of 0.88 for lignin, and 2.103% with an R² of 0.89 for cellulose. The four most highly correlated wavelengths in the near infrared region, and the chemical bonds represented, are shown for each constituent and both regression methods. Generalizability of both methods for prediction of protein, lignin, and cellulose concentrations on independent data sets is discussed. Prediction accuracy for independent data sets and species from other sites was increased using partial least squares regression, but was poor for sample sets containing tissue types or laboratory-measured concentration ranges beyond those of the calibration set.


2008 ◽  
Vol 8 (17) ◽  
pp. 5477-5487 ◽  
Author(s):  
C. A. Cantrell

Abstract. The representation of data, whether geophysical observations, numerical model output or laboratory results, by a best fit straight line is a routine practice in the geosciences and other fields. While the literature is full of detailed analyses of procedures for fitting straight lines to values with uncertainties, a surprising number of scientists blindly use the standard least-squares method, such as found on calculators and in spreadsheet programs, that assumes no uncertainties in the x values. Here, the available procedures for estimating the best fit straight line to data, including those applicable to situations for uncertainties present in both the x and y variables, are reviewed. Representative methods that are presented in the literature for bivariate weighted fits are compared using several sample data sets, and guidance is presented as to when the somewhat more involved iterative methods are required, or when the standard least-squares procedure would be expected to be satisfactory. A spreadsheet-based template is made available that employs one method for bivariate fitting.


2019 ◽  
Author(s):  
Derek Beaton ◽  
Gilbert Saporta ◽  
Hervé Abdi ◽  

Abstract. Current large-scale studies of brain and behavior typically involve multiple populations and diverse types of data (e.g., genetics, brain structure, behavior, demographics, or “multi-omics” and “deep-phenotyping”) measured on various scales of measurement. To analyze these heterogeneous data sets we need simple but flexible methods able to integrate the inherent properties of these complex data sets. Here we introduce partial least squares-correspondence analysis-regression (PLS-CA-R), a method designed to address these constraints. PLS-CA-R generalizes PLS regression to most data types (e.g., continuous, ordinal, categorical, non-negative values). We also show that PLS-CA-R generalizes many “two-table” multivariate techniques and their respective algorithms, such as various PLS approaches, canonical correlation analysis, and redundancy analysis (a.k.a. reduced rank regression).


Author(s):  
Jing Wang ◽  
Jinglin Zhou ◽  
Xiaolu Chen

Abstract. This chapter proposes another nonlinear PLS method, named locality-preserving partial least squares (LPPLS), which embeds the nonlinear degenerative and structure-preserving properties of LPP into the PLS model. The core of LPPLS is to replace the role of PCA in PLS with LPP. When extracting the principal components $\boldsymbol{t}_i$ and $\boldsymbol{u}_i$, two conditions must be satisfied: (1) $\boldsymbol{t}_i$ and $\boldsymbol{u}_i$ retain the most information about the local nonlinear structure of their respective data sets; (2) the correlation between $\boldsymbol{t}_i$ and $\boldsymbol{u}_i$ is the largest. Finally, a quality-related monitoring strategy is established based on LPPLS.


2002 ◽  
Vol 56 (7) ◽  
pp. 887-896 ◽  
Author(s):  
Henrik Öjelund ◽  
Henrik Madsen ◽  
Poul Thyregod

In this article a new calibration method called empirically weighted mean subset (EMS) is presented. The method is illustrated using spectral data. Using several near-infrared (NIR) benchmark data sets, EMS is compared to partial least-squares regression (PLS) and interval partial least-squares regression (iPLS). It is found that EMS improves on the prediction performance over PLS in terms of the mean squared errors and is more robust than iPLS. Furthermore, by investigating the estimated coefficient vector of EMS, knowledge about the important spectral regions can be gained. The EMS solution is obtained by calculating the weighted mean of all coefficient vectors for subsets of the same size. The weighting is proportional to $SS_\gamma^{-\omega}$, where $SS_\gamma$ is the residual sum of squares from a linear regression with subset $\gamma$ and $\omega$ is a weighting parameter estimated using cross-validation. This construction of the weighting implies that even if some coefficients will become numerically small, none will become exactly zero. An efficient algorithm has been implemented in MATLAB to calculate the EMS solution and the source code has been made available on the Internet.
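The weighted-mean construction described above can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' efficient MATLAB algorithm: it enumerates every subset of a given size, fits each regression without an intercept (an assumption), and takes the weighting parameter ω as an argument instead of estimating it by cross-validation. The names `solve` and `ems_coefficients` and the data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ems_coefficients(X, y, size, omega):
    """Weighted mean of least-squares coefficient vectors over all subsets of `size` columns."""
    p = len(X[0])
    total_weight = 0.0
    mean_coef = [0.0] * p
    for subset in combinations(range(p), size):
        # Normal equations for the regression restricted to this subset (no intercept).
        xtx = [[sum(row[i] * row[j] for row in X) for j in subset] for i in subset]
        xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in subset]
        beta = solve(xtx, xty)
        # Residual sum of squares for this subset; must be non-zero for the weight to exist.
        ss = sum((yi - sum(b * row[j] for b, j in zip(beta, subset))) ** 2
                 for row, yi in zip(X, y))
        weight = ss ** (-omega)
        total_weight += weight
        # Variables outside the subset contribute a coefficient of zero.
        for b, j in zip(beta, subset):
            mean_coef[j] += weight * b
    return [c / total_weight for c in mean_coef]


# Hypothetical data: three samples, two predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.5]
coef = ems_coefficients(X, y, size=1, omega=1.0)
```

With `omega=0.0` every subset receives equal weight and the result is the plain average of the subset coefficient vectors. Larger values shift weight toward subsets with smaller residual sums of squares, while every variable that appears in some subset keeps a non-zero coefficient.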


2006 ◽  
Vol 82 (4) ◽  
pp. 463-468 ◽  
Author(s):  
N.P.P. Macciotta ◽  
C. Dimauro ◽  
N. Bacciu ◽  
P. Fresi ◽  
A. Cappio-Borlino

Abstract. A model able to predict missing test day data for milk, fat and protein yields on the basis of a few recorded tests was proposed, based on the partial least squares (PLS) regression technique, a multivariate method that is able to solve problems related to high collinearity among predictors. A data set of 1731 lactations of Sarda breed dairy goats was split into two data sets, one for model estimation and the other for the evaluation of PLS prediction capability. Eight scenarios of simplified recording schemes for fat and protein yields were simulated. Correlations between predicted and observed test day yields were quite high (from 0·50 to 0·88 and from 0·53 to 0·96 for fat and protein yields, respectively, in the different scenarios). Results highlight the great flexibility and accuracy of this multivariate technique.


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Aleksander Jaworski ◽  
Hanna Wikiel ◽  
Kazimierz Wikiel

The Real Time Analyzer (RTA) utilizing DC- and AC-voltammetric techniques is an in situ, online monitoring system that provides a complete chemical analysis of different electrochemical deposition solutions. The RTA employs multivariate calibration when predicting concentration parameters from a multivariate data set. Although the hierarchical and multiblock Principal Component Regression- (PCR-) and Partial Least Squares- (PLS-) based methods can handle data sets even when the number of variables significantly exceeds the number of samples, it can be advantageous to reduce the number of variables to obtain improvement of the model predictions and better interpretation. This presentation focuses on the introduction of a multistep, rigorous method of data-selection-based Least Squares Regression, Simple Modeling of Class Analogy modeling power, and, as a novel application in electroanalysis, Uninformative Variable Elimination by PLS and by PCR, Variable Importance in the Projection coupled with PLS, Interval PLS, Interval PCR, and Moving Window PLS. Selection criteria of the optimum decomposition technique for the specific data are also demonstrated. The chief goal of this paper is to introduce to the community of electroanalytical chemists numerous variable selection methods which are well established in spectroscopy and can be successfully applied to voltammetric data analysis.


1984 ◽  
Vol 14 (3) ◽  
pp. 376-384 ◽  
Author(s):  
T. Cunia ◽  
R. D. Briggs

Three methods for ensuring the additivity of biomass tables are described and their application to a sample of trees is illustrated. The advantages and drawbacks of each method are identified and the results obtained are compared with each other. All three methods yield biomass tables that are not significantly different from each other (at least for the sample data used), but their estimated precision and conditions of applicability are different. At least one of the methods can be applied to cases where the weights to use in the least squares regression procedure differ from one tree component to the next, where sample data of some component are missing from some sample trees, and where nonlinear regression functions seem better than linear ones.

