Multivariate Spectra Analysis: PLSR vs. PCA + MLR

2020 ◽  
Vol 2 (1) ◽  
pp. 83
Author(s):  
Sander Vervoort ◽  
Marcus Wolff

For mixtures of compounds with very similar spectral features, common for larger organic molecules, multivariate analysis (MVA) methods can be applied to determine the concentration of the individual components. We analyzed photoacoustic spectra of mixtures of different volatile organic compounds with and without different feature selection and feature projection methods. These include: Multiple Linear Regression (MLR), Principal Component Analysis (PCA), Partial Least Squares Regression (PLSR) and Random Forest Algorithm (RFA). Even though PLSR provided the best prediction accuracy, the other techniques also exhibited some advantages.
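The PCA + MLR combination named in the title can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the spectra are synthetic (three hypothetical compounds with overlapping Gaussian bands plus noise), and the helper names `make_mixtures`, `fit_pca_mlr` and `predict_pca_mlr` are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: three hypothetical compounds with strongly overlapping Gaussian bands.
axis = np.linspace(0.0, 1.0, 200)
centers = [0.45, 0.50, 0.55]
pure = np.array([np.exp(-((axis - c) / 0.08) ** 2) for c in centers])  # shape (3, 200)


def make_mixtures(n):
    """Simulate n mixture spectra as linear combinations of the pure spectra plus noise."""
    conc = rng.uniform(0.0, 1.0, size=(n, 3))
    spectra = conc @ pure + rng.normal(scale=0.01, size=(n, pure.shape[1]))
    return spectra, conc


def fit_pca_mlr(X, C, n_components):
    """PCA on mean-centered spectra, then multiple linear regression on the scores."""
    x_mean, c_mean = X.mean(axis=0), C.mean(axis=0)
    # The rows of Vt are the principal axes of the centered spectra.
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    loadings = Vt[:n_components].T            # (n_channels, n_components)
    scores = (X - x_mean) @ loadings          # (n_samples, n_components)
    # MLR step: least-squares fit of centered concentrations on the PCA scores.
    coef, *_ = np.linalg.lstsq(scores, C - c_mean, rcond=None)
    return x_mean, c_mean, loadings, coef


def predict_pca_mlr(model, X):
    x_mean, c_mean, loadings, coef = model
    return (X - x_mean) @ loadings @ coef + c_mean


X_train, C_train = make_mixtures(40)
X_test, C_test = make_mixtures(20)

model = fit_pca_mlr(X_train, C_train, n_components=3)
C_pred = predict_pca_mlr(model, X_test)

# Root mean square error of prediction, one value per compound.
rmsep = np.sqrt(np.mean((C_pred - C_test) ** 2, axis=0))
print("RMSEP per compound:", rmsep)
```

The number of retained components (three here, matching the number of simulated compounds) is a choice that would normally be made by cross-validation on real data.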

1995 ◽  
Vol 32 (9-10) ◽  
pp. 341-348
Author(s):  
V. Librando ◽  
G. Magazzù ◽  
A. Puglisi

The monitoring of water quality today provides a great quantity of data consisting of the values of the measured parameters as a function of time. In the marine environment, and especially in suspended material, increasing importance is being given to the presence of organic micropollutants, particularly since some are known to be carcinogenic. As the number of measured parameters increases, examining and interpreting the data becomes more difficult. To overcome such difficulties, numerous chemometric techniques have been introduced in environmental chemistry, such as Multivariate Data Analysis (MVDA), Principal Component Analysis (PCA) and Partial Least Squares Regression (PLSR). In this work, the first of these techniques was applied to the interpretation of the quality of Augusta Bay, by measuring the concentration of numerous organic micropollutants, together with the classical water pollution parameters, at different sites and at different times. The MVDA highlighted differences between sampling sites whose data were initially thought to be similar. Furthermore, it allowed the selection of more significant parameters for future monitoring and of more suitable sampling site locations.


2019 ◽  
Vol 59 (6) ◽  
pp. 1190 ◽  
Author(s):  
A. Bahri ◽  
S. Nawar ◽  
H. Selmi ◽  
M. Amraoui ◽  
H. Rouissi ◽  
...  

Rapid optical measurement techniques have the advantage over traditional methods of being faster and non-destructive. In this work, visible and near-infrared spectroscopy (vis-NIRS) was used to investigate differences between measured values of key milk properties (e.g. fat, protein and lactose) in 30 samples of ewes' milk according to three feed systems: faba beans, field peas and a control diet. A mobile fibre-optic vis-NIR spectrophotometer (350–2500 nm) was used to collect reflectance spectra from the milk samples. Principal component analysis was used to explore differences between milk samples according to the feed supplied, and partial least-squares regression and random forest regression were adopted to develop calibration models for the prediction of milk properties. The principal component analysis showed clear separation between the three groups of milk samples according to the diet of the ewes throughout the lactation period. Milk fat, protein and lactose were predicted with good accuracy by partial least-squares regression (R² = 0.70–0.83; ratio of prediction deviation, i.e. the ratio of the standard deviation to the root mean square error of prediction, = 1.85–2.44). However, the best prediction results were obtained with random forest regression models (R² = 0.86–0.90; ratio of prediction deviation = 2.73–3.26). Vis-NIRS coupled with multivariate modelling tools, particularly random forest regression, can be recommended for exploring differences between milk samples according to different feed systems and for predicting key milk properties.
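The two figures of merit quoted above can be computed as in the following sketch. The numbers are made up for illustration and are not the study's data. The function name `prediction_metrics` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) of the measured values is an assumption, since the abstract does not say which variant was used.

```python
import math
import statistics


def prediction_metrics(y_true, y_pred):
    """Return R^2, RMSEP and the ratio of prediction deviation (RPD)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmsep = math.sqrt(ss_res / len(y_true))   # root mean square error of prediction
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    rpd = statistics.stdev(y_true) / rmsep    # standard deviation / RMSEP
    return {"r2": r2, "rmsep": rmsep, "rpd": rpd}


# Illustrative (invented) measured and predicted fat contents, in percent.
fat_measured = [5.1, 6.3, 7.0, 5.8, 6.6, 7.4]
fat_predicted = [5.4, 6.0, 6.8, 6.1, 6.5, 7.1]

metrics = prediction_metrics(fat_measured, fat_predicted)
print(metrics)
```

Because RPD divides the spread of the reference values by the prediction error, a larger RPD means the model error is small relative to the natural variation in the samples.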


2009 ◽  
Vol 92 (5) ◽  
pp. 1526-1530 ◽  
Author(s):  
Paola Zunin ◽  
Riccardo Leardi ◽  
Raffaella Boggia

Headspace sorptive extraction and GC/MS, coupled with chemometric tools, were used to predict the amounts of pine nuts and Pecorino in Pesto Genovese, a typical Italian basil-based pasta sauce. Two groups of samples were prepared at different times and with ingredients from different batches, one for building the predictive models and one for testing their performance. Principal component analysis and partial least-squares regression (PLS) were applied to the chromatographic data. The 24 most-predictive variables were selected, and the application of PLS to the training set samples led to two models that explained approximately 70% of the variance in cross-validation, with prediction errors of 0.1 g for Pecorino and 0.6 g for pine nuts, thus confirming the reliability of the analytical method and the predictive ability of the models. The results obtained for the test set samples were not completely satisfactory, with a prediction error and a bias of 5.0 and 4.1 g, respectively, for Pecorino and corresponding values of 4.1 and 2.0 g for pine nuts. This preliminary study shows that the analytical methods used can allow construction of models with high predictive ability only if the great variability of the headspace composition of the ingredients and the effect of the Twister are considered.


2014 ◽  
Vol 70 (a1) ◽  
pp. C954-C954 ◽  
Author(s):  
Uwe König ◽  
Thomas Degen ◽  
Detlef Beckers

In XRPD we usually pay a lot of attention to describing profile shapes accurately. We do that to eventually extract or predict information from the full pattern using physical models and fitting techniques. Sometimes this approach is stretched to its limits. That usually happens when no realistic physical model is available, or when the model is either too complex or does not fit reality. In such cases there is one very elegant way out: multivariate statistics and Partial Least-Squares Regression (PLSR). This technique is rather popular in spectroscopy as well as in a number of scientific fields such as biosciences, proteomics and social sciences. PLSR, as developed by Herman Wold [1] in 1960, is able to predict any defined property Y directly from the variability in a data matrix X. In XRPD, the rows of the data matrix used for calibration are formed by the individual scans and the columns are formed by all measured data points. PLSR is particularly well suited when the matrix of predictors has more variables than observations, and when there is multicollinearity among the X values. In fact, with PLSR we have a full-pattern approach that completely disregards profile shapes but still uses the complete information present in our XRPD data sets. We will show a number of cases where PLSR was used to easily and precisely predict properties such as crystallinity from XRPD data.
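The data layout described above (rows are scans, columns are measured data points, far more columns than rows) can be illustrated with a small single-response PLS (PLS1, NIPALS-style) sketch. Everything here is an assumption made for illustration: the patterns are synthetic, the property called crystallinity is simulated as a mixing fraction between an invented set of sharp peaks and a broad hump, and `pls1_fit` is a hypothetical helper rather than the software used by the authors.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (x_mean, y_mean, beta) so that y ~ (X - x_mean) @ beta + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                     # weight vector: covariance of X columns with y
        w /= np.linalg.norm(w)
        t = Xr @ w                        # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt                 # X loadings
        qk = (yr @ t) / tt                # y loading
        Xr = Xr - np.outer(t, p)          # deflate X
        yr = yr - qk * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression vector in the original variables
    return x_mean, y_mean, beta


rng = np.random.default_rng(1)
two_theta = np.linspace(10.0, 80.0, 500)  # 500 measured data points per scan

# Assumption: invented "crystalline" peaks and a broad "amorphous" hump.
crystalline = sum(np.exp(-((two_theta - c) / 0.3) ** 2) for c in (21.0, 33.5, 47.2, 62.8))
amorphous = np.exp(-((two_theta - 30.0) / 12.0) ** 2)


def simulate(n):
    """Each row is one scan; y is the simulated crystalline fraction."""
    y = rng.uniform(0.1, 0.9, size=n)
    X = np.outer(y, crystalline) + np.outer(1.0 - y, amorphous)
    X += rng.normal(scale=0.01, size=X.shape)
    return X, y


X_train, y_train = simulate(15)           # 15 observations, 500 variables
X_test, y_test = simulate(10)

x_mean, y_mean, beta = pls1_fit(X_train, y_train, n_components=2)
y_pred = (X_test - x_mean) @ beta + y_mean
rmsep = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print("X shape:", X_train.shape, "RMSEP:", rmsep)
```

With 500 columns and only 15 rows, ordinary least squares on the raw data points has no unique solution, whereas PLS regresses on a few latent components, which is why the abstract singles out this case.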


2013 ◽  
Vol 44 (2s) ◽  
Author(s):  
Chiara Cevoli ◽  
Angelo Fabbri ◽  
Alessandro Gori ◽  
Maria Fiorenza Caboni ◽  
Adriano Guarnieri

Parmigiano–Reggiano (PR) cheese is one of the oldest traditional cheeses produced in Europe, and it is still one of the most valuable Protected Designation of Origin (PDO) cheeses of Italy. The denomination of origin is extended to the grated cheese when it is manufactured exclusively from whole Parmigiano-Reggiano cheese wheels that comply with the production standard. The grated cheese must be matured for a period of at least 12 months and have a rind content of no more than 18%. In this investigation, the potential of near infrared spectroscopy (NIR), coupled with different statistical methods, was evaluated for estimating the authenticity of grated Parmigiano-Reggiano PDO cheese. Cheese samples were classified as: compliant PR, competitors, non-compliant PR (defective PR), and PR with rind content greater than 18%. NIR spectra were obtained using a Vector 22/N spectrophotometer (Bruker Optics, Milan, Italy) in diffuse reflectance mode. The instrument was equipped with a rotating integrating sphere. Principal Component Analysis (PCA) was conducted for an exploratory analysis of the spectra, while Artificial Neural Networks (ANN) were used to classify the spectra according to the different cheese categories. Subsequently, the rind percentage and month of ripening were estimated by Partial Least Squares regression (PLS). Score plots of the PCA showed a clear separation between the compliant PR samples and the remaining samples. Competitor samples and defective PR samples were grouped together. The classification performance obtained by ANN analysis was higher than 90% for all sample classes in test set validation. Rind content and month of ripening were predicted by PLS with a determination coefficient greater than 0.95 (test set). These results showed that the method can be suitable for fast screening of grated cheese authenticity.

