Dimensionality Reduction of Near Infrared Spectral Data Using Global and Local Implementations of Principal Component Analysis for Neural Network Calibrations

2007 · Vol 15 (1) · pp. 21-28
Author(s): Igor V. Kovalenko, Glen R. Rippke, Charles R. Hurburgh

2018 · Vol 26 (2) · pp. 101-105
Author(s): Zhang Jianqiang, Liu Weijuan, Zhang Huaihui, Hou Ying, Yang Panpan, ...

A nonnegative least squares classifier was proposed in this paper to classify near infrared spectral data. The method uses the near infrared spectra of training samples to build a data dictionary for sparse representation. By adopting a nonnegative least squares sparse coding algorithm, the near infrared spectrum of each test sample is expressed as a sparse linear combination of dictionary entries. The regression residual with respect to each class is then computed, and the test sample is assigned to the class with the minimum residual. The method was compared with other classification approaches, including the well-performing principal component analysis–linear discriminant analysis and principal component analysis–particle swarm optimization–support vector machine. Experimental results showed that the approach was faster and generally achieved better prediction performance than the compared methods. The method can accurately recognize different classes of tobacco leaves and provides a new technique for quality evaluation of tobacco leaf during purchasing.
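A minimal sketch of the class-wise residual rule described above, using scipy.optimize.nnls for the nonnegative sparse coding step. The function names, shapes and synthetic spectra below are illustrative assumptions, not the paper's data or implementation.

```python
# Sketch of a nonnegative least squares (NNLS) sparse-representation classifier
# for NIR spectra: code the test spectrum over a dictionary of training spectra,
# then assign it to the class whose columns give the smallest reconstruction residual.
import numpy as np
from scipy.optimize import nnls

def nnls_classify(D, train_labels, x):
    """D: (n_wavelengths, n_train) dictionary of training spectra (columns);
    train_labels: (n_train,) class of each column; x: (n_wavelengths,) test spectrum."""
    coef, _ = nnls(D, x)                      # nonnegative sparse coding of x
    residuals = {}
    for c in np.unique(train_labels):
        coef_c = np.where(train_labels == c, coef, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(x - D @ coef_c)
    return min(residuals, key=residuals.get)  # class with minimum residual

# Tiny synthetic example: two classes with different spectral shapes.
rng = np.random.default_rng(0)
wl = np.linspace(0, 1, 100)
class_a = np.exp(-((wl - 0.3) ** 2) / 0.01)
class_b = np.exp(-((wl - 0.7) ** 2) / 0.01)
D = np.column_stack([class_a + 0.01 * rng.standard_normal(100) for _ in range(5)] +
                    [class_b + 0.01 * rng.standard_normal(100) for _ in range(5)])
labels = np.array([0] * 5 + [1] * 5)
test = class_b + 0.01 * rng.standard_normal(100)
print(nnls_classify(D, labels, test))         # expected: 1
```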


2019 · Vol 27 (5) · pp. 379-390
Author(s): Mazlina Mohd Said, Simon Gibbons, Anthony Moffat, Mire Zloh

This research was initiated as part of the fight against the public health problems posed by rising numbers of counterfeit, substandard and poor quality medicines and herbal products. An effective screening strategy was developed using a two-step approach: an incremental near infrared spectral database (step 1) followed by principal component analysis (step 2). It overcomes the limitations of current procedures for the identification of medicines by near infrared spectroscopy, which rely on directly comparing unknown spectra with spectra of reference samples or products. The near infrared spectral database consisted of almost 4000 spectra from different types of medicines acquired and stored throughout the study. The spectra of the test samples (pharmaceutical and herbal formulations) were first compared with the reference spectra of common medicines from the database using a correlation algorithm. A complementary similarity assessment of the spectra was then conducted by inspecting the principal component analysis score plot. The approach was validated by analysing known counterfeit Viagra samples: their spectra did not fully match the spectra of samples from reliable sources and did not cluster with them in the principal component analysis score plot. Pre-screening analysis of an herbal formulation (Pronoton) showed similarity with a product in the database containing sildenafil citrate. This finding, supported by principal component analysis, indicated that the product was adulterated. The identification of a sildenafil analogue, hydroxythiohomosildenafil, was achieved by mass spectrometry and nuclear magnetic resonance (NMR) analyses. The approach proved to be a quick, simple and cost-effective pre-screening technique for guiding the analysis of pharmaceutical and herbal formulations in the search for potential adulterants.
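A brief sketch of the two-step screening idea above: first rank database spectra by correlation with the unknown, then inspect PCA scores to see whether the unknown clusters with its best-matching reference. The library, spectra, names and the use of a simple Pearson correlation here are placeholder assumptions, not the study's actual database or algorithm.

```python
# Step 1: correlation matching against a reference spectral library.
# Step 2: PCA score plot of library + unknown to check clustering.
import numpy as np
from sklearn.decomposition import PCA

def correlation_match(library, names, unknown, top_k=3):
    """Return the top_k library entries most correlated with the unknown spectrum."""
    corr = np.array([np.corrcoef(ref, unknown)[0, 1] for ref in library])
    order = np.argsort(corr)[::-1][:top_k]
    return [(names[i], float(corr[i])) for i in order]

# Placeholder library of reference NIR spectra (rows) and an unknown test sample.
rng = np.random.default_rng(1)
library = rng.random((50, 200))
names = [f"product_{i}" for i in range(50)]
unknown = library[7] + 0.02 * rng.standard_normal(200)

print(correlation_match(library, names, unknown))

# A genuine sample should fall inside the score cluster of its reference product;
# a counterfeit or adulterated product typically will not.
scores = PCA(n_components=2).fit_transform(np.vstack([library, unknown]))
print("unknown PCA scores:", scores[-1])
```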


NIR news · 2017 · Vol 28 (2) · pp. 7-12
Author(s): Michal Oravec, Lukáš Gál, Michal Čeppan

The aim of this work was to prepare spectral data for principal component analysis and to examine 19 samples from six different brands. The samples consisted of the same type of office paper with areas printed in black ink only. The spectral data were acquired by fibre optic reflection spectroscopy (FORS) directly on the paper, both over the combined Vis-NIR range and over the NIR range alone. The black inkjet-printed samples were analysed with regard to the forensic examination of documents. The method is based on molecular spectroscopy in the visible (Vis) and near infrared (NIR) regions combined with a chemometric method, principal component analysis (PCA). PCA divides the inkjet ink samples into clusters. It was found that, by combining spectrum pre-processing methods with principal component analysis, it is possible to separate inks containing carbon black from inks based on other organic colourants. The method appears to be a useful tool for the forensic examination of printed documents containing inkjet inks. Spectra of inkjet inks were acquired without any destructive or invasive procedure, such as cutting or extraction of the sample, and measurements can be taken outside the laboratory.
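An illustrative pre-processing plus PCA pipeline of the kind described above. The excerpt does not name the exact pre-processing combination used, so SNV and a Savitzky-Golay first derivative are shown here only as common chemometric choices; the spectra are synthetic placeholders.

```python
# Pre-process reflectance spectra (SNV + Savitzky-Golay derivative), then cluster with PCA.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    return (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

def preprocess(spectra):
    # First-derivative Savitzky-Golay filter applied after SNV, along the wavelength axis.
    return savgol_filter(snv(spectra), window_length=11, polyorder=2, deriv=1, axis=1)

# Placeholder Vis-NIR reflectance spectra of 19 printed samples (rows).
rng = np.random.default_rng(2)
spectra = rng.random((19, 400))

scores = PCA(n_components=2).fit_transform(preprocess(spectra))
# Plotting PC1 vs PC2 (e.g. with matplotlib) reveals the ink clusters, such as
# carbon-black inks separating from inks based on organic colourants.
print(scores[:3])
```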


2015 · Vol 8 (2) · pp. 191-196
Author(s): Michal Oravec, Lukáš Gál, Michal Čeppan

Abstract This paper presents a novel approach to the non-destructive analysis of inkjet-printed documents. Our method is based on molecular spectroscopy in the near infrared region (NIR) combined with a chemometric method, principal component analysis (PCA). The aim of this work was to prepare spectral data for analysing the interrelationships between 19 samples, each consisting of the same type of office paper on which solid black squares were printed in black ink only. The spectra were obtained separately with the Ocean Optics system in two spectral regions, i.e. overtones (1000-1600 nm) and combination bands (1600-2300 nm), with the paper base included. Experimental results confirmed the high reliability of the proposed approach despite the sparse dataset.
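A small sketch of handling the two spectral regions mentioned above (the overtone and combination-band windows) with PCA, applied to each region separately and to their concatenation. The wavelength grid and spectra are synthetic placeholders, not the measured data.

```python
# Split a spectrum into the overtone (1000-1600 nm) and combination-band (1600-2300 nm)
# regions and compare PCA on each region and on both regions combined.
import numpy as np
from sklearn.decomposition import PCA

wavelengths = np.arange(1000, 2301, 5)          # nm, placeholder grid
rng = np.random.default_rng(3)
spectra = rng.random((19, wavelengths.size))    # 19 printed samples (rows)

overtone = spectra[:, (wavelengths >= 1000) & (wavelengths < 1600)]
combination = spectra[:, (wavelengths >= 1600) & (wavelengths <= 2300)]

for name, block in [("overtones", overtone),
                    ("combination bands", combination),
                    ("both regions", np.hstack([overtone, combination]))]:
    pca = PCA(n_components=2).fit(block)
    print(name, "explained variance:", pca.explained_variance_ratio_.round(3))
```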


Author(s): А.О. Алексанян, С.О. Старков, К.В. Моисеев

The study addresses face recognition for identification, where the input data for classification are feature vectors produced by a deep learning network. Few existing algorithms can perform open-set classification with sufficiently high reliability. The common approach to classification is to apply a threshold-based classifier. This approach has several significant drawbacks, which cause the low quality of open-set classification. The key drawbacks are as follows. First, there is no fixed threshold: it is impossible to choose a universal threshold suitable for every face. Second, raising the threshold lowers the quality of classification. Third, with threshold classification a single face may match a large number of classes at once. For these reasons, we propose using principal component analysis as an additional dimensionality reduction step, on top of the key facial features extracted by the deep learning network, before classifying the feature vectors. Geometrically, applying principal component analysis to the feature vectors and then classifying them is equivalent to finding a lower-dimensional space in which the projections of the original vectors are well separated. The idea of dimensionality reduction follows from the assumption that not all components of the N-dimensional feature vectors contribute significantly to describing a human face, and that only some components account for most of the variance. Thus, selecting only the significant components of the feature vectors makes it possible to separate the classes on the basis of the most variable features, without examining the less informative data and without comparing vectors in a high-dimensional space.
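A minimal sketch of the idea above: reduce deep-network face embeddings with PCA, then classify in the lower-dimensional space with a distance threshold for the open-set case. The random embeddings stand in for the output of a deep learning network, and the gallery sizes, nearest-centre rule and rejection threshold are illustrative assumptions, not the authors' classifier.

```python
# PCA on face embeddings: keep the components carrying most of the variance,
# then identify by nearest class centre in the reduced space, rejecting distant probes.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
n_identities, per_identity, dim = 10, 5, 512
# Placeholder gallery: 10 identities, 5 embedding vectors each.
centers = rng.standard_normal((n_identities, dim))
gallery = np.vstack([c + 0.1 * rng.standard_normal((per_identity, dim)) for c in centers])
labels = np.repeat(np.arange(n_identities), per_identity)

pca = PCA(n_components=0.95).fit(gallery)       # enough components for 95% of the variance
reduced = pca.transform(gallery)

def identify(embedding, reject_threshold=5.0):
    """Nearest class centre in PCA space; return None ('unknown') if too far (open set)."""
    z = pca.transform(embedding.reshape(1, -1))[0]
    dists = np.array([np.linalg.norm(z - reduced[labels == c].mean(axis=0))
                      for c in range(n_identities)])
    best = int(np.argmin(dists))
    return best if dists[best] < reject_threshold else None

probe = centers[3] + 0.1 * rng.standard_normal(dim)
print(identify(probe))                           # expected: 3
```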


PLoS ONE · 2021 · Vol 16 (3) · pp. e0248896
Author(s): Nico Migenda, Ralf Möller, Wolfram Schenck

Principal Component Analysis (PCA) is an established linear technique for dimensionality reduction. It performs an orthonormal transformation that replaces possibly correlated variables with a smaller set of linearly uncorrelated variables, the so-called principal components, which capture a large portion of the data variance. The problem of finding the optimal number of principal components has been widely studied for offline PCA. However, when working with streaming data, the optimal number changes continuously, which requires updating both the principal components and the dimensionality at every time step. While the continuous update of the principal components is widely studied, the available algorithms for dimensionality adjustment in neural network-based and incremental PCA are limited to increments of one, so existing approaches cannot account for abrupt changes in the presented data. The contribution of this work is to enable continuous dimensionality adjustment by an arbitrary number in neural network-based PCA, without the need to learn all principal components. A novel algorithm is presented that uses several PCA characteristics to adaptively update the optimal number of principal components for neural network-based PCA. A precise estimate of the required dimensionality reduces the computational effort while ensuring that the desired amount of variance is retained. The computational complexity of the proposed algorithm is analysed, and the algorithm is benchmarked in an experimental study against other neural network-based and incremental PCA approaches, where it produces highly competitive results.
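An illustration of the dimensionality-adjustment idea discussed above: after each streaming update, keep just enough principal components to retain a target share of variance. Scikit-learn's IncrementalPCA is used here purely as a stand-in; this is not the paper's neural network-based algorithm, and the simulated stream and variance target are assumptions.

```python
# Streaming PCA with an adaptively chosen number of retained components:
# after each batch, pick the smallest k whose cumulative explained variance
# reaches the target, so k can jump by more than one when the data change abruptly.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(5)
target_variance = 0.90
ipca = IncrementalPCA(n_components=20)          # upper bound on tracked components

mix_low = rng.standard_normal((3, 50))          # intrinsic dimensionality 3 ...
mix_high = rng.standard_normal((10, 50))        # ... jumping abruptly to 10

for step in range(5):
    latent_dim, mix = (3, mix_low) if step < 3 else (10, mix_high)
    batch = rng.standard_normal((200, latent_dim)) @ mix
    ipca.partial_fit(batch)

    cumulative = np.cumsum(ipca.explained_variance_ratio_)
    k = int(np.searchsorted(cumulative, target_variance) + 1)
    print(f"step {step}: keep {k} components for {target_variance:.0%} variance")
```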


2019 · Vol 31 (3) · pp. 1061-1069
Author(s): Guangyu Shi, Jun Cao, Chao Li, Yuliang Liang

Abstract A transfer learning system was designed to predict the compression strength of Xylosma racemosum. Near-infrared (NIR) spectral data for Acer mono and its compression strength values were used to resolve the weak generalization caused by using an X. racemosum dataset alone. Transfer component analysis and principal component analysis served as the domain adaptation and feature extraction steps that allowed the A. mono NIR spectral data to be used in the transfer learning system. A five-layer neural network relevant to the X. racemosum dataset was fine-tuned using the A. mono dataset. There were 109 A. mono samples used as the source dataset and 79 X. racemosum samples as the target dataset. When the ratio of the training set to the test set was 1:9, the correlation coefficient was 0.88 and the mean square error was 8.84. The results show that the NIR spectral data of hardwood species are related. Predicting the mechanical strength of hardwood species using multi-species NIR spectral datasets improves the generalization ability of the model and increases accuracy.
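A minimal sketch of the pretrain-then-fine-tune workflow described above, using PCA features and a small multilayer perceptron. The paper's pipeline involves transfer component analysis and a five-layer network; here PCA plus scikit-learn's MLPRegressor with warm_start merely illustrate the idea, and all spectra, strength values, layer sizes and the 8/71 target split are synthetic placeholders.

```python
# Train a regressor on the source species (A. mono), then continue training
# (fine-tune) on a small split of the target species (X. racemosum).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
# Placeholder NIR spectra: 109 source samples and 79 target samples.
X_source, y_source = rng.random((109, 300)), rng.random(109) * 100
X_target, y_target = rng.random((79, 300)), rng.random(79) * 100

# Shared PCA feature extractor fitted on the source spectra.
pca = PCA(n_components=10).fit(X_source)

model = MLPRegressor(hidden_layer_sizes=(32, 16, 8), max_iter=500,
                     warm_start=True, random_state=0)
model.fit(pca.transform(X_source), y_source)            # pretrain on the source species

model.set_params(max_iter=200)
model.fit(pca.transform(X_target[:8]), y_target[:8])    # fine-tune on a small target split

pred = model.predict(pca.transform(X_target[8:]))       # evaluate on the remaining target data
print("predicted strengths:", pred[:5].round(2))
```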

