A note on Mahalanobis and related distance measures in WinISI and The Unscrambler

2019 ◽  
Vol 27 (4) ◽  
pp. 253-258 ◽  
Author(s):  
A Garrido-Varo ◽  
J Garcia-Olmo ◽  
T Fearn

In identifying spectral outliers in near infrared calibration it is common to use a distance measure that is related to Mahalanobis distance. However, different software packages tend to use different variants, which lead to a translation problem if more than one package is used. Here the relationships between squared Mahalanobis distance D², the GH distance of WinISI, and the T² and leverage (L) statistics of The Unscrambler are established as D² = T² ≈ L × n ≈ GH × k, where n and k are the numbers of samples and variables, respectively, in the set of spectral data used to establish the distance measure. The implications for setting thresholds for outlier detection are discussed. On the way to this result the principal component scores from WinISI and The Unscrambler are compared. Both packages scale the scores for a component to have variances proportional to the contribution of that component to total variance, but the WinISI scores, unlike those from The Unscrambler, do not have mean zero.
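The stated relationship between the squared Mahalanobis distance, the leverage and GH can be checked numerically. The sketch below is only an illustration on synthetic data, not a reproduction of either package: it assumes that the distance is computed on k mean-centred principal component scores, that leverage is the sum of squared scores divided by each component's sum of squares (without a 1/n term), and that GH is the squared distance divided by k. The function names and the random data are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Mean-centred scores on the first k principal components of X (n x p)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: Xc = U S Vt, so the scores are U * S
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def distance_measures(T):
    """Squared Mahalanobis distance, leverage and GH from scores T (n x k)."""
    n, k = T.shape
    var = T.var(axis=0, ddof=1)            # sample variance of each score column
    # PC scores are uncorrelated, so D^2 is a sum of standardised squares
    d2 = (T ** 2 / var).sum(axis=1)
    # Assumed leverage definition: squared score over the column sum of squares
    lev = (T ** 2 / (T ** 2).sum(axis=0)).sum(axis=1)
    # Assumed GH definition: D^2 divided by the number of variables k
    gh = d2 / k
    return d2, lev, gh

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))   # synthetic stand-in for 50 spectra, 200 wavelengths
T = pca_scores(X, k=5)
n, k = T.shape
d2, lev, gh = distance_measures(T)

print(np.allclose(lev * (n - 1), d2))   # leverage times (n - 1) reproduces D^2
print(gh.mean())                        # average GH over the calibration set
```

Under these assumed definitions the leverage times (n - 1) equals the squared distance exactly, so multiplying by n instead gives the approximate equality quoted in the abstract, and the mean GH over the calibration set is (n - 1)/n, close to 1.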

2018 ◽  
Vol 10 (4) ◽  
pp. 351
Author(s):  
João S. Panero ◽  
Henrique E. B. da Silva ◽  
Pedro S. Panero ◽  
Oscar J. Smiderle ◽  
Francisco S. Panero ◽  
...  

Near infrared (NIR) spectroscopy combined with chemometric methods was used to group and identify samples of different soybean cultivars. Spectral data, collected in the range of 714 to 2500 nm (14000 to 4000 cm⁻¹), were obtained from whole grains of four different soybean cultivars and were subjected to different types of pre-treatment. Chemometric algorithms were applied to extract relevant information from the spectral data, to remove anomalous samples and to group the samples. The best results were obtained with the spectral range from 1900.6 to 2187.7 nm (5261.4 to 4570.9 cm⁻¹) and with spectral treatment using Multiplicative Signal Correction (MSC) + Baseline Correct (linear fit), which made it possible for the exploratory techniques Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA) to separate the cultivars. Thus, the results demonstrate that NIR spectroscopy allied with chemometric techniques can provide a rapid, nondestructive and reliable method to distinguish different soybean cultivars.
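The abstract names Multiplicative Signal Correction as the key pre-treatment but does not describe its implementation. As a rough illustration, the textbook form of MSC regresses each spectrum on a reference spectrum (here assumed to be the mean spectrum) and removes the fitted offset and gain. The function name `msc` and the synthetic spectra are hypothetical; the baseline correction, PCA and HCA steps are not shown.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative signal correction of spectra (rows = samples)."""
    spectra = np.asarray(spectra, dtype=float)
    # Assumption: the mean spectrum serves as reference unless one is supplied
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # Fit x ~ intercept + slope * ref, then undo the offset and gain
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected

# Synthetic example: one underlying band seen with different offsets and gains
wavelengths = np.linspace(1900.6, 2187.7, 50)
base = np.exp(-0.5 * ((wavelengths - 2050.0) / 40.0) ** 2)
offsets = np.array([0.1, 0.3, 0.5])
gains = np.array([0.8, 1.0, 1.2])
raw = offsets[:, None] + gains[:, None] * base

corrected = msc(raw)
print(np.allclose(corrected, raw.mean(axis=0)))  # all spectra collapse onto the reference
```

In this idealised case the spectra differ only by offset and gain, so the correction maps every spectrum onto the reference. Real spectra keep their chemical differences, which is what the later exploratory analysis relies on.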


2021 ◽  
Vol 27 (1) ◽  
pp. 55-60
Author(s):  
Sampson Twumasi-Ankrah ◽  
Simon Kojo Appiah ◽  
Doris Arthur ◽  
Wilhemina Adoma Pels ◽  
Jonathan Kwaku Afriyie ◽  
...  

This study examined the performance of six outlier detection techniques using a non-stationary time series dataset. Two scenarios were of interest: the first was to find the method that could correctly detect the number of outliers introduced into the dataset, while the second was to find the technique that would over-detect the number of outliers introduced, when the dataset contains only extreme maxima values, extreme minima values or both. The air passenger dataset was used with different numbers of outliers or extreme values, ranging from 1 to 10 and 40. The six outlier detection techniques used in this study were Mahalanobis distance, depth-based, robust kernel-based outlier factor (RKOF), generalized dispersion, Kth nearest neighbors distance (KNND), and principal component (PC) methods. When detecting extreme maxima, the Mahalanobis and the principal component methods performed better in correctly detecting outliers in the dataset. Also, the Mahalanobis method could identify more outliers than the others, making it the "best" method for the extreme minima category. The Kth nearest neighbors distance method was the "best" method for not over-detecting the number of outliers for extreme minima. However, the Mahalanobis distance and the principal component methods were the "best" performing methods for not over-detecting the number of outliers for the extreme maxima category. Therefore, the Mahalanobis outlier detection technique is recommended for detecting outliers in non-stationary time series data.


1992 ◽  
Vol 46 (1) ◽  
pp. 34-43 ◽  
Author(s):  
Tormod Næs ◽  
Tomas Isaksson

This paper presents an application of locally weighted regression (LWR) in diffuse near-infrared transmittance spectroscopy. The data are from beef and pork samples. The LWR method is based on the idea that a nonlinearity can be approximated by local linear equations. Different weight functions (for the samples) as well as different distance measures for “closeness” are tested. The LWR is compared to principal component regression and partial least-squares regression. The LWR with weighted principal components is shown to give the best results. The improvements with respect to linear regression are up to 15% of the prediction errors.
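The idea of approximating a nonlinearity by local linear equations can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes Euclidean distance in principal component score space, a tricube weight function, and a fixed number of neighbours, each of which is only one of the options the abstract says were tested. The function name `lwr_predict` and the synthetic data are hypothetical.

```python
import numpy as np

def lwr_predict(T_cal, y_cal, t_new, n_neighbors=20):
    """Predict y for one new sample by a locally weighted linear fit."""
    # Distance from the new sample to every calibration sample (assumed Euclidean)
    d = np.linalg.norm(T_cal - t_new, axis=1)
    idx = np.argsort(d)[:n_neighbors]          # the closest calibration samples
    d_local = d[idx]
    d_max = d_local.max()
    # Tricube weights (assumed choice): near samples count most, farthest gets 0
    w = (1 - (d_local / d_max) ** 3) ** 3 if d_max > 0 else np.ones_like(d_local)
    # Weighted least squares on [1, scores] via square-root weighting
    A = np.column_stack([np.ones(len(idx)), T_cal[idx]])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_cal[idx] * sw, rcond=None)
    return coef[0] + t_new @ coef[1:]

# Synthetic nonlinear relationship between one score and the response
rng = np.random.default_rng(1)
T_cal = rng.uniform(-3, 3, size=(200, 1))
y_cal = np.sin(T_cal[:, 0])

pred = lwr_predict(T_cal, y_cal, np.array([1.0]))
print(pred, np.sin(1.0))   # local linear prediction next to the true value
```

Because each prediction uses only nearby calibration samples, the fit follows the curvature that a single global linear model would miss. The neighbourhood size controls the trade-off between flexibility and noise.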


1989 ◽  
Vol 43 (6) ◽  
pp. 1045-1049 ◽  
Author(s):  
P. Robert ◽  
D. Bertrand ◽  
M. Crochon ◽  
J. Sabino

Analytical applications of near-infrared spectroscopy require the determination of calibration equations linking chemical and spectral values. Such equations are difficult to update by including new calibration specimens. A new procedure for prediction, which is not based on multiple linear regression, has been investigated. This procedure could be included in a database system. The proposed method consists of three steps: compression of the spectral data by applying principal component analysis, creation of a predictive lattice, and projection of the spectra of unknown specimens onto the predictive lattice. This enables the prediction of chemical data that are not perfectly linked to spectral data by a linear relationship. The procedure has been applied to the prediction of the refractive index of apples. A predictive lattice was designed with the use of 45 calibration specimens. A prediction with 43 verification specimens gave a standard error of 0.8%, which appeared sufficient for grading apples in quality classes. Further studies are required in order to include the proposed method in spectral libraries specializing in analytical applications.


2021 ◽  
Vol 922 (1) ◽  
pp. 012011
Author(s):  
Samadi ◽  
S Wajizah ◽  
Z Zulfahrizal

This study aimed to examine the near infrared spectroscopic features of cocoa pod husk samples used as raw materials for animal feedstuff. Spectral data of organic material samples contain information on chemical properties that can be revealed through modelling. Thus, studying these features is essential to assess and reveal the underlying information. Cocoa pod husk samples were obtained from several districts in Aceh Province, ground and prepared as bulk samples. Diffuse reflectance spectral data for a total of 30 bulk cocoa pod husk samples were acquired and recorded in the wavelength range from 1000 to 2500 nm. Spectral data were first projected by principal component analysis to observe similarities among samples. A spectral correction, namely mean normalization, was employed to enhance spectral features. The results showed that chemical information related to cocoa properties can be revealed, such as dry matter, crude protein, crude fibre, ether extract, nitrogen-free extract and ash content, due to the second and third overtones of combination bands O-H, C-O-H and N-H. The optimum wavelengths for estimating cocoa pod husk attributes are 1217 nm, 1405-1474 nm, 1629 nm, 1906-1979 nm, and 2283 nm. Based on this study, it may be concluded that several quality attributes of animal feed samples can be determined by means of the near infrared spectroscopy approach.


2018 ◽  
Vol 26 (2) ◽  
pp. 101-105 ◽  
Author(s):  
Zhang Jianqiang ◽  
Liu Weijuan ◽  
Zhang Huaihui ◽  
Hou Ying ◽  
Yang Panpan ◽  
...  

A nonnegative least squares classifier was proposed in this paper to classify near infrared spectral data. The method used near infrared spectral data of training samples to make up a data dictionary for the sparse representation. By adopting the nonnegative least squares sparse coding algorithm, the near infrared spectral data of test samples are expressed as the sparsest linear combinations of the dictionary. The regression residual of the test sample for each class was computed, and the sample was assigned to the class with the minimum residual. The method was compared with other classification approaches, including the well-performing principal component analysis–linear discriminant analysis and principal component analysis–particle swarm optimization–support vector machine. Experimental results showed that the approach was faster and generally achieved better prediction performance than the compared methods. The method can accurately recognize different classes of tobacco leaves and provides a new technology for quality evaluation of tobacco leaf in purchasing activities.
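The classification rule described here (code the test spectrum over a dictionary of training spectra with nonnegative coefficients, then pick the class with the smallest residual) can be sketched as below. The abstract does not specify the solver, so a simple projected gradient method is used here as a stand-in for the authors' nonnegative least squares algorithm. The function names, the synthetic peaks and the class labels are all hypothetical.

```python
import numpy as np

def nnls_projected_gradient(D, y, n_iter=2000):
    """Minimise ||D x - y||^2 subject to x >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2    # 1 / largest eigenvalue of D^T D
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto the nonnegative orthant
        x = np.maximum(0.0, x - step * (D.T @ (D @ x - y)))
    return x

def classify(D, labels, y):
    """Assign y to the class whose dictionary atoms give the smallest residual."""
    x = nnls_projected_gradient(D, y)
    classes = np.unique(labels)
    # Residual per class: reconstruct y from that class's atoms and coefficients only
    residuals = np.array([
        np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes
    ])
    return classes[np.argmin(residuals)], residuals

# Synthetic dictionary: columns are training "spectra" from two classes
wavelengths = np.arange(100)

def peak(center):
    return np.exp(-0.5 * ((wavelengths - center) / 5.0) ** 2)

D = np.column_stack([peak(c) for c in (18, 20, 22, 58, 60, 62)])
labels = np.array([0, 0, 0, 1, 1, 1])
y = peak(61)                                   # test spectrum resembling class 1

predicted, residuals = classify(D, labels, y)
print(predicted, residuals)
```

The nonnegativity constraint tends to leave the coefficients of unrelated training spectra at zero, which is why the per-class residual separates the classes without a separate sparsity penalty.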


NIR news ◽  
2017 ◽  
Vol 28 (2) ◽  
pp. 7-12 ◽  
Author(s):  
Michal Oravec ◽  
Lukáš Gál ◽  
Michal Čeppan

The aim of this work was to prepare spectral data for principal component analysis and to examine 19 samples of six different brands. Samples consisted of the same type of office paper with black areas printed in black ink only. The spectral data were acquired by fibre optics reflection spectroscopy in Vis-NIR and only NIR (Vis-NIR FORS) directly on paper. The black inkjet-printed samples were analysed with regard to the forensic analysis of documents. The method used is based on molecular spectroscopy in the visible (Vis) and near infrared (NIR) regions combined with a chemometric method, principal component analysis (PCA). The PCA method divides the inkjet ink samples into clusters. It was found that, by a combination of spectrum pre-processing methods and principal component analysis, it is possible to separate inks containing carbon black from inks using other organic colourants. This method appears to be a useful tool for forensic examination of printed documents containing inkjet inks. Spectra of inkjet inks were acquired without any destructive or invasive procedure, such as cutting a sample or extraction, and with the possibility of measuring outside the laboratory.

