Local prediction models by principal component regression

1997 ◽  
Vol 348 (1-3) ◽  
pp. 29-38 ◽  
Author(s):  
Yu-Long Xie ◽  
John H. Kalivas


2022 ◽  
Vol 951 (1) ◽  
pp. 012112
Author(s):  
A A Munawar ◽  
Z Zulfahrizal ◽  
R Hayati ◽  
Syahrul

Abstract Cocoa is one of the main agricultural products cultivated in many tropical countries and is processed into several derivative products. Cocoa bean quality has mainly been determined by laboratory procedures based on solvent extraction; however, most of these are destructive and may cause environmental pollution. The main purpose of this study is to employ near infrared spectroscopy (NIRS) for rapid and non-destructive assessment of cocoa beans in terms of fat content. Near infrared spectral data of cocoa bean samples were measured as diffuse reflectance in the wavelength range from 1000 to 2500 nm. Reference fat contents were measured using standard laboratory methods. Prediction models were developed using principal component regression with raw and baseline-corrected spectral data. The results showed that the fat content of cocoa beans can be predicted with a maximum correlation coefficient (r) of 0.89 and a ratio of prediction to deviation (RPD) index of 2.87 for raw spectra, and r of 0.91 and RPD of 3.18 for baseline-corrected spectra. It may be concluded that NIRS is feasible as a rapid and non-destructive method for cocoa bean quality assessment.
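The two figures of merit quoted above, r and RPD, can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes RPD is the sample standard deviation of the reference values divided by the root mean square error of prediction (definitions vary slightly between papers), and the function names and numbers are hypothetical.

```python
import math
import statistics


def pearson_r(y_ref, y_pred):
    """Pearson correlation between reference and predicted values."""
    m_ref, m_pred = statistics.fmean(y_ref), statistics.fmean(y_pred)
    cov = sum((a - m_ref) * (b - m_pred) for a, b in zip(y_ref, y_pred))
    var_ref = sum((a - m_ref) ** 2 for a in y_ref)
    var_pred = sum((b - m_pred) ** 2 for b in y_pred)
    return cov / math.sqrt(var_ref * var_pred)


def rmse(y_ref, y_pred):
    """Root mean square error of prediction."""
    n = len(y_ref)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_ref, y_pred)) / n)


def rpd(y_ref, y_pred):
    """Assumed definition: SD of the reference values / RMSE of prediction."""
    return statistics.stdev(y_ref) / rmse(y_ref, y_pred)


# Hypothetical reference and predicted fat contents (illustrative numbers only).
fat_ref = [48.2, 50.1, 52.3, 54.0, 55.6]
fat_pred = [48.9, 49.8, 52.0, 54.6, 55.1]
print(pearson_r(fat_ref, fat_pred), rpd(fat_ref, fat_pred))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values, which is why it is often reported alongside r.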


Plants ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 1775
Author(s):  
Amol N. Nankar ◽  
M Paul Scott ◽  
Richard C. Pratt

One aim of this experiment was to develop NIR calibrations for 20 grain components in 143 pigmented maize samples evaluated in four locations across New Mexico during 2013 and 2014. Based on reference analysis, prediction models were developed using principal component regression (PCR) and partial least squares (PLS). The predictive ability of calibrations was generally low, with the calibrations for methionine and glycine performing best by PCR and PLS. The second aim was to explore the relationships among grain constituents. In PCA, the first three PCs explained 49.62, 22.20, and 6.92% of the total variance and tended to align with nitrogen-containing compounds (amino acids), carbon-rich compounds (starch, anthocyanin, fiber, and fat), and sulfur-containing compounds (cysteine and methionine), respectively. Correlations among traits were identified, and these relationships were illustrated by a correlation network. Some relationships among components were driven by common synthetic origins, for example, among amino acids derived from pyruvate. Similarly, anthocyanins, crude fat, and fatty acids all share malonyl CoA in their biosynthetic pathways and were correlated. In contrast, crude fiber and starch have similar biosynthetic origins but were negatively correlated, and this may have been due to their different functional roles in structure and energy storage, respectively.


2008 ◽  
Vol 21 (17) ◽  
pp. 4384-4398 ◽  
Author(s):  
Michael K. Tippett ◽  
Timothy DelSole ◽  
Simon J. Mason ◽  
Anthony G. Barnston

Abstract There are a variety of multivariate statistical methods for analyzing the relations between two datasets. Two commonly used methods are canonical correlation analysis (CCA) and maximum covariance analysis (MCA), which find the projections of the data onto coupled patterns with maximum correlation and covariance, respectively. These projections are often used in linear prediction models. Redundancy analysis and principal predictor analysis construct projections that maximize the explained variance and the sum of squared correlations of regression models. This paper shows that the above pattern methods are equivalent to different diagonalizations of the regression between the two datasets. The different diagonalizations are computed using the singular value decomposition of the regression matrix developed using data that are suitably transformed for each method. This common framework for the pattern methods permits easy comparison of their properties. Principal component regression is shown to be a special case of CCA-based regression. A commonly used linear prediction model constructed from MCA patterns does not give a least squares estimate since correlations among MCA predictors are neglected. A variation, denoted least squares estimate (LSE)-MCA, is suggested that uses the same patterns but minimizes squared error. Since the different pattern methods correspond to diagonalizations of the same regression matrix, they all produce the same regression model when a complete set of patterns is used. Different prediction models are obtained when an incomplete set of patterns is used, with each method optimizing different properties of the regression. Some key points are illustrated in two idealized examples, and the methods are applied to statistical downscaling of rainfall over the northeast of Brazil.
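The statement that a complete set of patterns reproduces the same regression model can be checked numerically for the principal component regression case alone. The sketch below uses synthetic data and a hypothetical helper `pcr_fit`; it is not the paper's framework and does not cover CCA, MCA, or redundancy analysis.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Regress y on the leading principal components of X; return (beta, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T              # loadings of the retained components
    scores = Xc @ V_k                      # orthogonal columns with squared norms s**2
    gamma = scores.T @ (y - y_mean) / s[:n_components] ** 2
    beta = V_k @ gamma                     # coefficients in the original variables
    return beta, y_mean - x_mean @ beta


# Synthetic data (assumption: 50 samples, 4 predictors, small noise).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

# Ordinary least squares on centered data for comparison.
beta_ols = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]

beta_full, _ = pcr_fit(X, y, n_components=4)   # complete set of components
beta_trunc, _ = pcr_fit(X, y, n_components=2)  # incomplete set

print(np.allclose(beta_full, beta_ols))   # full PCR coincides with least squares
print(np.allclose(beta_trunc, beta_ols))  # truncated PCR generally does not
```

Truncation is where the methods differ: PCR keeps the directions of largest predictor variance, which need not be the directions most related to the predictand.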


2016 ◽  
Vol 24 (6) ◽  
pp. 595-604 ◽  
Author(s):  
Knut Arne Smeland ◽  
Kristian Hovde Liland ◽  
Jakub Sandak ◽  
Anna Sandak ◽  
Lone Ross Gobakken ◽  
...  

Untreated wooden surfaces degrade when exposed to natural weathering. In this study thin wood samples were studied for weather degradation effects utilising a hyperspectral camera in the near infrared wavelength range in transmission mode. Several sets of samples were exposed outdoors for time intervals from 0 days to 21 days, and one set of samples was exposed to ultraviolet (UV) radiation in a laboratory chamber. Spectra of earlywood and latewood were extracted from the hyperspectral image cubes using a principal component analysis-based masking algorithm. The degradation was modelled as a function of UV solar radiation with four regression techniques: partial least squares, principal component regression, ridge regression and Tikhonov regression. It was found that all the techniques yielded robust prediction models on this dataset. The result from the study is a first step towards a weather dose model determined by temperature and moisture content on the wooden surface in addition to the solar radiation.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3736
Author(s):  
Ernesto González ◽  
Juan Casanova-Chafer ◽  
Aanchal Alagh ◽  
Alfonso Romero ◽  
Xavier Vilanova ◽  
...  

This paper presents a methodology to quantify oxidizing and reducing gases using n-type and p-type chemiresistive sensors, respectively. Low temperature sensor heating with pulsed UV or visible light modulation is used together with the application of the fast Fourier transform (FFT) to extract sensor response features. These features are further processed via principal component analysis (PCA) and principal component regression (PCR) for achieving gas discrimination and building concentration prediction models with R2 values up to 98% and RMSE values as low as 5% for the total gas concentration range studied. UV and visible light were used to study the influence of the light wavelength in the prediction model performance. We demonstrate that n-type and p-type sensors need to be used together for achieving good quantification of oxidizing and reducing species, respectively, since the semiconductor type defines the prediction model's effectiveness towards an oxidizing or reducing gas. The presented method reduces considerably the total time needed to quantify the gas concentration compared with the results obtained in a previous work. The use of visible light LEDs for performing pulsed light modulation enhances system performance and considerably reduces cost in comparison to previously reported UV light-based approaches.
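The feature pipeline described above (FFT of the modulated sensor response, then PCA) might be sketched as below. Everything here is an assumption for illustration: the signals are synthetic sinusoids, the helper names `fft_features` and `pca_scores` are hypothetical, and the choice of the first five harmonic magnitudes as features is not taken from the paper.

```python
import numpy as np


def fft_features(signal, n_harmonics=5):
    """Magnitudes of the first non-DC FFT bins, normalised by the signal length."""
    spectrum = np.fft.rfft(signal - np.mean(signal))
    return np.abs(spectrum[1:n_harmonics + 1]) / len(signal)


def pca_scores(features, n_components=2):
    """Project mean-centered feature vectors onto the leading principal components."""
    centered = features - features.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T


# Synthetic responses: amplitude grows with a hypothetical gas concentration.
t = np.arange(256) / 256
concentrations = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
responses = [c * np.sin(2 * np.pi * 4 * t) + 0.3 * c * np.sin(2 * np.pi * 2 * t)
             for c in concentrations]

features = np.array([fft_features(r) for r in responses])
scores = pca_scores(features)
print(features.shape, scores.shape)
```

The PCA scores could then serve as predictors in a principal component regression against concentration, which is the role PCR plays in the abstract.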


2020 ◽  
Vol 10 (3) ◽  
pp. 902 ◽  
Author(s):  
Baijun Xie ◽  
Jonathan C. Kim ◽  
Chung Hyuk Park

This paper presents a method for extracting novel spectral features based on a sinusoidal model. The method is focused on characterizing the spectral shapes of audio signals using spectral peaks in frequency sub-bands. The extracted features are evaluated for predicting the levels of emotional dimensions, namely arousal and valence. Principal component regression, partial least squares regression, and deep convolutional neural network (CNN) models are used as prediction models for the levels of the emotional dimensions. The experimental results indicate that the proposed features include additional spectral information that common baseline features may not include. Since the quality of audio signals, especially timbre, plays a major role in affecting the perception of emotional valence in music, the inclusion of the presented features will contribute to decreasing the prediction error rate.


Processes ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 461
Author(s):  
Philipp Doppler ◽  
Lukas Veiter ◽  
Oliver Spadiut ◽  
Christoph Herwig ◽  
Vignesh Rajamanickam

Monitoring process variables in bioprocesses with complex expression systems, such as filamentous fungi, requires a vast number of offline methods or sophisticated inline sensors. In this respect, cell viability is a crucial process variable determining the overall process performance. Thus, fast and precise tools for identification of key process deviations or transitions are needed. However, such reliable monitoring tools are still scarce to date or require sophisticated equipment. In this study, we used the commonly available size exclusion chromatography (SEC) HPLC technique to capture impurity release information in Penicillium chrysogenum bioprocesses. We exploited the impurity release information contained in UV chromatograms as fingerprints for development of principal component analysis (PCA) models to descriptively analyze the process trends. Prediction models using well established approaches, such as partial least squares (PLS), orthogonal PLS (OPLS) and principal component regression (PCR), were made to predict the viability with model accuracies of 90% or higher. Furthermore, we demonstrated the platform applicability of our method by monitoring viability in a Trichoderma reesei process for cellulase production. We are convinced that this method will not only facilitate monitoring viability of complex bioprocesses but could also be used for enhanced process control with hybrid models in the future.


2019 ◽  
Vol 12 (1) ◽  
pp. 61-66
Author(s):  
Devianti Devianti ◽  
Zulfahrizal Zulfahrizal ◽  
Sufardi Sufardi ◽  
Agus Arip Munawar

Abstract. Soil function depends on the balance of its structure, nutrient composition, and other chemical and physical properties. Conventional methods used to determine the nutrient content of agricultural soil are time consuming, require complicated sample processing, and are destructive. Near infrared reflectance spectroscopy (NIRS) has become one of the most promising and widely used non-destructive methods of analysis in many fields, including soil science. The main aim of this study is to apply NIRS to predict the nutrient content of soils in the form of total nitrogen (N). Transmittance spectra were obtained from a total of 18 soil samples from 8 different sites, followed by N measurement using a standard laboratory method. Principal component regression (PCR) with full cross validation was used to develop and validate N prediction models. The results showed that N content can be predicted very well even with raw spectra, with a correlation coefficient (r) of 0.95 and a residual predictive deviation (RPD) index of 3.35. Furthermore, spectra correction improved prediction accuracy, giving r = 0.96 and RPD = 3.51. It may be concluded that NIRS can be used as a fast and simultaneous method for determining the nutrient content of agricultural soils.
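PCR with full cross validation can be sketched as follows, assuming "full cross validation" means leave-one-out, as is common in chemometrics software. The data are synthetic (18 samples to mirror the sample count above, 30 hypothetical wavelengths), and `pcr_fit` and `loo_predictions` are illustrative names rather than the authors' implementation.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Principal component regression; returns (beta, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T
    gamma = (Xc @ V_k).T @ (y - y_mean) / s[:n_components] ** 2
    beta = V_k @ gamma
    return beta, y_mean - x_mean @ beta


def loo_predictions(X, y, n_components):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta, intercept = pcr_fit(X[keep], y[keep], n_components)
        preds[i] = X[i] @ beta + intercept
    return preds


# Synthetic "spectra" driven by two latent factors (an assumption, not real NIRS data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(18, 2))
loadings = rng.normal(size=(2, 30))
X = latent @ loadings + 0.01 * rng.normal(size=(18, 30))
y = latent @ np.array([1.0, -0.5]) + 0.01 * rng.normal(size=18)

y_cv = loo_predictions(X, y, n_components=2)
r = np.corrcoef(y, y_cv)[0, 1]
rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
rpd = np.std(y, ddof=1) / rmsecv   # assumed RPD definition: SD / RMSECV
print(r, rpd)
```

Because every sample is predicted by a model that never saw it, the resulting r and RPD are less optimistic than calibration statistics. With only 18 samples they still carry considerable uncertainty.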


2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Khairunnisa Khairunnisa ◽  
Rizka Pitri ◽  
Victor P Butar-Butar ◽  
Agus M Soleh

This research used CFSRv2 data as general circulation model output. CFSRv2 contains highly correlated variables, so principal component regression (PCR) and partial least squares (PLS) were used to address the multicollinearity in the CFSRv2 data. The research aims to determine which of PCR and PLS is the better model for estimating rainfall at Bandung geophysical station, Bogor climatology station, Citeko meteorological station, and Jatiwangi meteorological station by comparing RMSEP and correlation values. The domain sizes used were 3×3, 4×4, 5×5, 6×6, 7×7, 8×8, 9×9, and 11×11, located between (-4°) N - (-9°) S and 105° E - 110° E, with a grid size of 0.5×0.5. The PLS model was the better model for statistical downscaling in this research because it obtained lower RMSEP values and higher correlation values than the PCR model. The best domain and RMSEP value for the Bandung, Bogor, Citeko, and Jatiwangi stations are 9 × 9 with 100.06, 6 × 6 with 194.3, 8 × 8 with 117.6, and 6 × 6 with 108.2, respectively.

