Analysis of hidden information - PLSR on XRD raw data

2014 ◽  
Vol 70 (a1) ◽  
pp. C954-C954 ◽  
Author(s):  
Uwe König ◽  
Thomas Degen ◽  
Detlef Beckers

In X-ray powder diffraction (XRPD) we usually pay close attention to describing profile shapes accurately, so that information can be extracted or predicted from the full pattern using physical models and fitting techniques. Sometimes this approach is stretched to its limits, typically when no realistic physical model is available, or when the model is too complex or does not match reality. In such cases there is an elegant way out: multivariate statistics, specifically Partial Least-Squares Regression (PLSR). The technique is popular in spectroscopy and in fields such as the biosciences, proteomics and the social sciences. PLSR, as developed by Herman Wold [1] in 1960, can predict any defined property Y directly from the variability in a data matrix X. In XRPD, the rows of the calibration data matrix are the individual scans and the columns are the measured data points. PLSR is particularly well suited to cases where the predictor matrix has more variables than observations and where the X values are multicollinear. PLSR is thus a full-pattern approach that disregards profile shapes entirely yet still uses all the information present in the XRPD data sets. We will show a number of cases in which PLSR was used to easily and precisely predict properties such as crystallinity from XRPD data.
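The data layout described above (one scan per row, one measured data point per column, more columns than rows) can be illustrated with a minimal sketch. It assumes NumPy and uses a textbook single-response PLS1 (NIPALS-style) fit; the function name `pls1_fit`, the synthetic "scans" and the choice of two components are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model (PLS1, NIPALS-style deflation)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centre predictors and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # weight: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                           # scores of this component
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)             # deflate X
        yk = yk - c * t                      # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Hypothetical stand-in for XRPD scans: a sharp peak (crystalline part)
# mixed with a broad hump (amorphous part), plus noise.
rng = np.random.default_rng(0)
two_theta = np.linspace(10.0, 70.0, 500)
sharp_peak = np.exp(-0.5 * ((two_theta - 30.0) / 0.3) ** 2)
broad_hump = np.exp(-0.5 * ((two_theta - 25.0) / 8.0) ** 2)
crystallinity = rng.uniform(0.1, 0.9, size=20)
X = (np.outer(crystallinity, sharp_peak)
     + np.outer(1.0 - crystallinity, broad_hump)
     + 0.01 * rng.standard_normal((20, two_theta.size)))

# Calibrate on 15 scans (15 observations, 500 variables), predict the other 5.
coef, intercept = pls1_fit(X[:15], crystallinity[:15], n_components=2)
predicted = X[15:] @ coef + intercept
max_abs_error = np.max(np.abs(predicted - crystallinity[15:]))
print(max_abs_error)
```

No peak profile is fitted anywhere in this sketch: the regression vector is built directly from the raw intensities, which is the sense in which the approach sets profile shapes aside.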

1996 ◽  
Vol 26 (4) ◽  
pp. 590-600 ◽  
Author(s):  
Katherine L. Bolster ◽  
Mary E. Martin ◽  
John D. Aber

Further evaluation of near infrared reflectance spectroscopy as a method for the determination of nitrogen, lignin, and cellulose concentrations in dry, ground, temperate forest woody foliage is presented. A comparison is made between two regression methods, stepwise multiple linear regression and partial least squares regression. The partial least squares method showed consistently lower standard error of calibration and higher R² values with first and second difference equations. The first difference partial least squares regression equation resulted in standard errors of calibration of 0.106%, with an R² of 0.97 for nitrogen, 1.613% with an R² of 0.88 for lignin, and 2.103% with an R² of 0.89 for cellulose. The four most highly correlated wavelengths in the near infrared region, and the chemical bonds represented, are shown for each constituent and both regression methods. Generalizability of both methods for prediction of protein, lignin, and cellulose concentrations on independent data sets is discussed. Prediction accuracy for independent data sets and species from other sites was increased using partial least squares regression, but was poor for sample sets containing tissue types or laboratory-measured concentration ranges beyond those of the calibration set.


2021 ◽  
Vol 6 (1) ◽  
pp. 114-118
Author(s):  
Stanley I. Okafor ◽  
Azubuike H. Amadi ◽  
Mobolaji A. Abegunde

This project uses production data to generate well-specific correlations for GLR, BSW and sand concentration, which are then used for prediction. Software was developed to implement a smart control algorithm that beans the choke up or down depending on the current flowing conditions and constraints. The code, written in the Excel programming environment, continuously takes in measured data points, models the behavior of each data set as a function of bean size, and adjusts the choke if a parameter of interest rises above a predetermined cut-off. The software also includes an inverse-matrix solving algorithm that determines the choke performance constants for any set of initialization data. For a data set supplied from field X, the choke performance constants A, B, C, D and E were found to be 10, 0.546, 0.0, 1.89 and 1.0, respectively. When data from subsequent production operations were entered, the software controlled the choke size so that production stayed below the set constraints of 500, 80 and 10 (in field units) for GLR, BSW and sand concentration, respectively. It can be concluded that the software can effectively keep the production of unwanted well effluents below their cut-offs, thereby improving oil production and the overall Net Present Value (NPV) of a project.


2002 ◽  
Vol 56 (7) ◽  
pp. 887-896 ◽  
Author(s):  
Henrik Öjelund ◽  
Henrik Madsen ◽  
Poul Thyregod

In this article a new calibration method called empirically weighted mean subset (EMS) is presented. The method is illustrated using spectral data. Using several near-infrared (NIR) benchmark data sets, EMS is compared to partial least-squares regression (PLS) and interval partial least-squares regression (iPLS). It is found that EMS improves on the prediction performance over PLS in terms of the mean squared errors and is more robust than iPLS. Furthermore, by investigating the estimated coefficient vector of EMS, knowledge about the important spectral regions can be gained. The EMS solution is obtained by calculating the weighted mean of all coefficient vectors for subsets of the same size. The weighting is proportional to $SS_\gamma^{-\omega}$, where $SS_\gamma$ is the residual sum of squares from a linear regression with subset $\gamma$ and $\omega$ is a weighting parameter estimated using cross-validation. This construction of the weighting implies that even if some coefficients will become numerically small, none will become exactly zero. An efficient algorithm has been implemented in MATLAB to calculate the EMS solution and the source code has been made available on the Internet.
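The weighting rule described above can be sketched by brute force for a small problem. This is only an illustration under stated assumptions: it uses Python with NumPy rather than the authors' MATLAB code, enumerates every subset instead of using their efficient algorithm, fits each subset by ordinary least squares without an intercept, and fixes ω by hand rather than estimating it by cross-validation. The function name `ems_coefficients` and the synthetic data are hypothetical.

```python
from itertools import combinations
import numpy as np

def ems_coefficients(X, y, subset_size, omega):
    """Weighted mean of least-squares coefficient vectors over all subsets
    of a given size, with weights proportional to SS_gamma ** (-omega)."""
    n_vars = X.shape[1]
    weighted_sum = np.zeros(n_vars)
    weight_total = 0.0
    for subset in combinations(range(n_vars), subset_size):
        cols = list(subset)
        beta_sub, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        residual = y - X[:, cols] @ beta_sub
        ss = residual @ residual             # residual sum of squares SS_gamma
        weight = ss ** (-omega)              # assumes ss > 0 (noisy data)
        beta_full = np.zeros(n_vars)
        beta_full[cols] = beta_sub           # zero outside the subset
        weighted_sum += weight * beta_full
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical data: only variables 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.standard_normal(30)

beta = ems_coefficients(X, y, subset_size=2, omega=5.0)
beta_unweighted = ems_coefficients(X, y, subset_size=2, omega=0.0)
print(beta)
```

With a large ω the well-fitting subsets dominate the average, while every variable still receives a small nonzero coefficient because each one appears in some subset with positive weight. Setting ω = 0 reduces the result to the plain mean over subsets.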


2013 ◽  
Vol 448-453 ◽  
pp. 2380-2383
Author(s):  
Xiao Jun Zhu ◽  
Xiang Li ◽  
Bin Fu ◽  
Ang Fu ◽  
Min You Chen ◽  
...  

This paper presents a novel method for determining the harmonic emission responsibilities of utility and customer at the point of common coupling (PCC). The proposed approach is based on robust partial least squares regression (robust PLS), which estimates system harmonic impedance by utilizing the signals of harmonic voltage and current measured synchronously at PCC. Consequently the harmonic emission responsibilities are calculated. The presented method reduces or removes the effect of outlying data points. The simulation and the practical engineering results indicate that the proposed method is valid and feasible.


2005 ◽  
Vol 83 (11) ◽  
pp. 1422-1433 ◽  
Author(s):  
D.P. Overy ◽  
J.G. Valdez ◽  
J.C. Frisvad

Fifteen strains representing each Penicillium ser. Corymbifera taxon were compared using phenotypic and chemotaxonomic characters by cluster analysis and discriminant partial least squares regression. Variability in phenotypic expression of species strains resulted in a more fragmented classification compared with secondary metabolite expression. Although the observed phenotypic expression varied for strains cultured on the same media, it was possible to classify strains into species groupings based on only a few distinctive phenotypic traits. Data analysis of secondary metabolite profiles generated from HPLC-diode array detection analysis gave reliable strain classification when more than one medium type was employed. Depending on the species, Czapek yeast autolysate agar typically yielded the greatest chemical diversity; however, several metabolites (terrestric acid, corymbiferone, the corymbiferan lactones, and daldinin D) were only produced when strains were grown on either yeast extract sucrose or oatmeal agar. For the classification of strains based on a binary data matrix, application of the Yule coefficient gave the best clustering. Several secondary metabolites of importance for the classification of ser. Corymbifera strains were identified by discriminant partial least squares regression analysis. A diagnostic key based on phenotypic, chemotaxonomic, and pathogenic traits is provided as an aid for species identification.
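As a rough illustration of a Yule-type similarity on binary data, the sketch below computes Yule's Q between two presence/absence rows. The abstract does not say which form of the Yule coefficient was used, so Q = (ad - bc) / (ad + bc) is an assumption here, and the function name `yule_q` and the two example rows are hypothetical.

```python
import math

def yule_q(row_a, row_b):
    """Yule's Q between two equal-length binary (0/1) sequences.

    a = both present, b = only in row_a, c = only in row_b, d = both absent.
    Returns NaN when ad + bc is zero, where Q is undefined.
    """
    a = sum(1 for x, y in zip(row_a, row_b) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(row_a, row_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(row_a, row_b) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(row_a, row_b) if x == 0 and y == 0)
    denominator = a * d + b * c
    if denominator == 0:
        return math.nan
    return (a * d - b * c) / denominator

# Hypothetical metabolite presence/absence profiles for two strains.
strain_a = [1, 1, 0, 0, 1, 0]
strain_b = [1, 0, 0, 1, 1, 0]
q = yule_q(strain_a, strain_b)   # a=2, b=1, c=1, d=2 -> (4 - 1) / (4 + 1)
print(q)
```

A clustering routine would typically convert such a similarity into a distance (for example 1 - Q) before building the tree; that step is omitted here.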


2006 ◽  
Vol 82 (4) ◽  
pp. 463-468 ◽  
Author(s):  
N.P.P. Macciotta ◽  
C. Dimauro ◽  
N. Bacciu ◽  
P. Fresi ◽  
A. Cappio-Borlino

A model able to predict missing test day data for milk, fat and protein yields on the basis of a few recorded tests was proposed, based on the partial least squares (PLS) regression technique, a multivariate method that is able to solve problems related to high collinearity among predictors. A data set of 1731 lactations of Sarda breed dairy goats was split into two data sets, one for model estimation and the other for the evaluation of PLS prediction capability. Eight scenarios of simplified recording schemes for fat and protein yields were simulated. Correlations between predicted and observed test day yields were quite high (from 0.50 to 0.88 and from 0.53 to 0.96 for fat and protein yields, respectively, in the different scenarios). Results highlight the great flexibility and accuracy of this multivariate technique.


2013 ◽  
Vol 756-759 ◽  
pp. 3324-3329
Author(s):  
Ji Fu Nong

We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs as well not only on multicollinear data but also on general data sets. This model allows us to vary a single parameter so that the network can perform anything from Partial Least Squares regression to Canonical Correlation Analysis, including every intermediate operation between the two. On multicollinear data, the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights without the robustification term. We illustrate our algorithms on both artificial and real data.


2020 ◽  
Vol 2 (1) ◽  
pp. 83
Author(s):  
Sander Vervoort ◽  
Marcus Wolff

For mixtures of compounds with very similar spectral features, common for larger organic molecules, multivariate analysis (MVA) methods can be applied to determine the concentration of the individual components. We analyzed photoacoustic spectra of mixtures of different volatile organic compounds with and without different feature selection and feature projection methods. These include: Multiple Linear Regression (MLR), Principal Component Analysis (PCA), Partial Least Squares Regression (PLSR) and Random Forest Algorithm (RFA). Even though PLSR provided the best prediction accuracy, the other techniques also exhibited some advantages.


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

There is a growing demand today for the development and application of technologies and methods that enable rapid, cost-effective and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) ranges (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the foundations of reflectance spectroscopy, that aims to determine soil composition. In our work we built and tested predictive models based on multivariate mathematical-statistical methods (partial least squares regression, PLSR) that allow the organic carbon and CaCO3 content of soils to be estimated. Testing of the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that the combined use of reflectance spectroscopy and multivariate chemometric procedures can provide a rapid and cost-effective method of data collection and evaluation.

