Kernel Partial Least Square Regression with High Resistance to Multiple Outliers and Bad Leverage Points on Near-Infrared Spectral Data Analysis

Symmetry, 2021, Vol 13 (4), pp. 547
Author(s): Divo Dharma Silalahi, Habshah Midi, Jayanthi Arasan, Mohd Shafie Mustafa, Jean-Pierre Caliman

Multivariate statistical analysis such as partial least square regression (PLSR) is a common data processing technique for handling the high-dimensional data space of near-infrared (NIR) spectral datasets. PLSR is useful for tackling the multicollinearity and heteroscedasticity problems commonly found in such data. When the original input space has a nonlinear structure, however, the classical PLSR model might not be appropriate. In addition, contamination by multiple outliers and high leverage points (HLPs) could further damage the model. HLPs generally comprise both good leverage points (GLPs) and bad leverage points (BLPs). Removing the BLPs seems relevant, since they have a significant impact on the parameter estimates and can slow down the convergence process. The GLPs, on the other hand, contribute to efficient model calibration and therefore should not be eliminated. In this study, two robust alternatives to the existing kernel partial least square (KPLS) regression are introduced: the kernel partial robust GM6-estimator (KPRGM6) regression and the kernel partial robust modified GM6-estimator (KPRMGM6) regression. Nonlinearity in PLSR was handled through kernel-based learning, which nonlinearly maps the original input data matrix into a high-dimensional feature space corresponding to a reproducing kernel Hilbert space (RKHS). To increase robustness, improved GM6 estimators are combined with this nonlinear PLSR. Based on several artificial dataset scenarios from Monte Carlo simulations and two NIR spectral datasets, the proposed robust KPRMGM6 is found to be superior to both the robust KPRGM6 and the non-robust KPLS.
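The non-robust KPLS baseline works on a kernel (Gram) matrix instead of the raw spectra, so the nonlinear mapping into the RKHS never has to be computed explicitly. A minimal sketch of single-response kernel PLS in Python/NumPy, following the common dual NIPALS-style formulation (scores extracted from a centred, successively deflated kernel matrix). The Gaussian kernel, the width `gamma`, and the names `rbf_kernel` and `KernelPLS1` are assumptions for illustration; the robust GM6-based reweighting that defines KPRGM6 and KPRMGM6 is not implemented here.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off


class KernelPLS1:
    """Non-robust kernel PLS for one response (sketch; gamma is a hypothetical width)."""

    def __init__(self, n_components=3, gamma=1.0):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[0]
        self.X_train = X
        self.K_raw = rbf_kernel(X, X, self.gamma)
        C = np.eye(n) - np.full((n, n), 1.0 / n)   # centring matrix
        Kc = C @ self.K_raw @ C                    # kernel centred in feature space
        self.y_mean = y.mean()
        y_c = y - self.y_mean
        y_res, K_def = y_c.copy(), Kc.copy()
        T, U = [], []
        for _ in range(self.n_components):
            u = y_res / np.linalg.norm(y_res)      # single response: u is the scaled residual
            t = K_def @ u
            t /= np.linalg.norm(t)                 # unit-length score vector
            T.append(t)
            U.append(u)
            P = np.eye(n) - np.outer(t, t)         # project out t from kernel and response
            K_def = P @ K_def @ P
            y_res = y_res - t * (t @ y_res)
        T, U = np.column_stack(T), np.column_stack(U)
        # dual coefficients: alpha = U (T' Kc U)^-1 T' y_c
        self.alpha = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ y_c)
        return self

    def predict(self, X_new):
        X_new = np.asarray(X_new, dtype=float)
        n, m = self.X_train.shape[0], X_new.shape[0]
        Kt = rbf_kernel(X_new, self.X_train, self.gamma)
        ones_t = np.full((m, n), 1.0 / n)
        ones_n = np.full((n, n), 1.0 / n)
        Kt_c = (Kt - ones_t @ self.K_raw) @ (np.eye(n) - ones_n)  # centre with training statistics
        return Kt_c @ self.alpha + self.y_mean
```

In practice `n_components` and `gamma` would be chosen by cross-validation; because every training sample enters the Gram matrix with equal weight, a single bad leverage point can distort all scores, which is the weakness the robust variants target.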

2005, Vol 13 (3), pp. 147-154
Author(s): Wolfgang Becker, Norbert Eisenreich

Near infrared spectroscopy was used as an in-line control system for measuring polypropylene filled with different amounts of Irganox additives. For this purpose, transmission probes were installed in an extruder; the probes can withstand temperatures up to 300°C and pressures up to 60 MPa. Transmission spectra of polypropylene mixed with an Irganox additive were recorded. A principal component analysis (PCA) score plot revealed the influence of varying mixing conditions during sample preparation. Prediction models were generated with partial least square regression; the resulting model estimated the Irganox content with a coefficient of determination of 0.984 and a root mean square error of prediction of 0.098%. Furthermore, the possibility of controlling process conditions by measuring transmission at a specific wavelength was demonstrated.
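The calibration quality above is reported as a coefficient of determination and a root mean square error of prediction (RMSEP, in the units of the response, here mass %). A minimal sketch of how a single-response PLS model (NIPALS algorithm) and these two figures of merit are typically computed; the function names `pls1_fit`, `pls1_predict` and `rmse_and_r2` are hypothetical, and nothing here reproduces the authors' Irganox data or software.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS; returns x_mean, y_mean and regression vector b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean           # mean-centred spectra and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)              # weight vector
        t = E @ w                           # score vector
        tt = t @ t
        p = E.T @ t / tt                    # X loading
        qa = (f @ t) / tt                   # y loading
        E = E - np.outer(t, p)              # deflate X
        f = f - qa * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)     # B = W (P'W)^-1 q
    return x_mean, y_mean, b


def pls1_predict(X_new, x_mean, y_mean, b):
    """Predict the response for new spectra with a fitted PLS1 model."""
    return y_mean + (np.asarray(X_new, dtype=float) - x_mean) @ b


def rmse_and_r2(y_true, y_pred):
    """Root mean square error and coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    resid = y_true - np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean(resid**2))
    r2 = 1.0 - np.sum(resid**2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, r2
```

Applied to the calibration set, `rmse_and_r2` gives a calibration error; only when it is applied to an independent prediction set does the first value correspond to RMSEP.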


2020, Vol 28 (3), pp. 153-162
Author(s): Lijun Wu, Baoxing Wang, Lei Zhang, Rumin Duan, Rui Gao, ...

Near infrared (NIR) spectroscopy, combined with sample set partitioning based on joint X-Y distances and partial least square (PLS) regression, was applied to the quantitative analysis of six routine chemicals, five physical indices and four macromolecular substances in reconstituted tobacco. Quantitative PLS regression models for these indices were established on calibration sets selected by joint X-Y distance partitioning. The results showed a strong correlation between predicted and measured values for all 15 indices. The root mean square error of prediction was low for every index, and the correlation coefficients of all PLS models were greater than 0.85. This was the first study to use NIR spectroscopy to determine the macromolecular substances, as well as certain physical indices, in reconstituted tobacco. The results show that the method can feasibly be applied for the rapid detection of these properties in industrial products.
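Sample set partitioning based on joint X-Y distances (commonly abbreviated SPXY) extends Kennard-Stone selection by adding distances in the response space to distances between spectra, so the calibration set spans both. A minimal sketch in Python/NumPy; the Euclidean metric, the scaling of each distance matrix by its maximum, and the name `spxy_split` follow the usual description of the method but are assumptions here, since the abstract does not state the authors' settings.

```python
import numpy as np


def spxy_split(X, y, n_cal):
    """Return (calibration, validation) indices chosen by Kennard-Stone on joint X-Y distances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # spectral distances
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)   # response distances
    d = dx / dx.max() + dy / dy.max()        # scale so X and y contribute comparably
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]              # start with the two most distant samples
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_cal:
        # distance from each candidate to its nearest already-selected sample
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]  # add the candidate farthest from the set
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)
```

The returned calibration indices would then be used to fit the PLS models, with the remaining samples held out to compute the root mean square error of prediction.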


Author(s): PATTEERA SODATA, JOMJAI PEERAPATTANA

Objective: This study aimed to apply near-infrared (NIR) spectroscopy together with a sampling thief as a tool to determine the endpoint of the blending process. Methods: The calibration model was constructed by partial least square regression. The best model was applied to determine the blending endpoint and to examine the effect of loading order on the endpoint when blending a formulation containing a low concentration of the active pharmaceutical ingredient (API). Results: The best partial least square regression model yielded the lowest root mean square error of calibration (1.4004), the lowest root mean square error of prediction (1.4108) and the highest correlation coefficient (0.9921). The validation study revealed that the reference values were not statistically different from the predicted values. The model could predict the endpoint of the blending process with acceptable precision and accuracy. The standard deviation of the API content was ≤ 3% of the target after eighteen minutes of blending, which indicated uniform powder blends. Additionally, the model revealed that the order of powder loading slightly affected the blending time: protocols that loaded the API first or last needed a longer time to achieve blend uniformity. Conclusion: NIR spectroscopy is a rapid and effective tool that can be applied to study the blending process in pharmaceutical manufacturing.
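The endpoint criterion above (standard deviation of the API content ≤ 3% of the target) can be applied directly to NIR-predicted contents. A minimal sketch, assuming the predictions are arranged as one row per sampling time and one column per thief location, and assuming the endpoint is the first time from which the criterion holds at every later time point; both the data layout and the "stays below" rule, as well as the name `blend_endpoint`, are illustrative choices, not the authors' documented procedure.

```python
import numpy as np


def blend_endpoint(times, predicted, target, limit_pct=3.0):
    """Return (endpoint_time or None, SD per time) for a blend-uniformity criterion.

    predicted: array of shape (n_times, n_locations) with NIR-predicted API content.
    """
    predicted = np.asarray(predicted, dtype=float)
    sd = predicted.std(axis=1, ddof=1)           # sample SD across thief locations
    ok = sd <= limit_pct / 100.0 * target        # SD within limit_pct % of target
    for k in range(len(times)):
        if ok[k:].all():                         # uniform now and at every later time point
            return times[k], sd
    return None, sd
```

For example, synthetic predictions at 6, 12, 18 and 24 minutes with between-location SDs of 20, 5, 2 and 1 (target 100) give an endpoint of 18 minutes; requiring the criterion to persist guards against declaring the endpoint on a single lucky sampling.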

