Pattern Recognition of Tobacco Headspace GC Profiles: A Potential New Analytical Tool for the Classification of Raw Tobaccos

Author(s):  
F Heinzer ◽  
HP Maitre ◽  
M Rigaux ◽  
J Wild

Abstract: The first part of the paper describes a new method of obtaining reproducible and meaningful headspace profiles of tobacco lamina using a modified closed-loop stripping apparatus. The complex chromatograms are obtained by high-resolution glass capillary gas chromatography. The second part summarizes the results of a chemometric approach to interpreting the chromatograms obtained from a series of nine Virginia flue-cured tobaccos of different origins and belonging to different quality groups, each one analysed three times by the method described above. After the elimination of peaks containing redundant information, the resulting data set, consisting of 27 × 17 data points, was analysed to detect natural groupings using an in-house program (in BASIC) for principal component analysis. A subsequent discriminant analysis yielded two discriminant functions capable of separating the nine Virginia tobaccos into three quality groups as defined by a conventional organoleptic analysis carried out by a smoking panel. All the tobaccos were classified correctly (100 %). A first attempt to classify, by the procedure described above, a group of six Virginia tobaccos whose organoleptic scores were not known did not yield clearly interpretable results, possibly because the performance of the capillary column used for analysis had slightly deteriorated during the experiment, with resultant changes in retention characteristics that led to the misidentification of certain peaks.
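The PCA-plus-discriminant-analysis step described above can be sketched as follows. The data below are a synthetic stand-in for the 27 × 17 peak table (the original measurements are not reproduced here), and scikit-learn stands in for the in-house BASIC program:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 27 x 17 peak-area matrix: 9 tobaccos x 3
# replicates, 17 retained peaks, three well-separated quality groups
X = np.vstack([rng.normal(m, 0.3, size=(9, 17)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 9)

# Principal component analysis to reveal natural groupings
scores = PCA(n_components=2).fit_transform(X)

# Discriminant analysis: two discriminant functions separating three groups
lda = LinearDiscriminantAnalysis(n_components=2).fit(scores, y)
accuracy = lda.score(scores, y)   # fraction of samples classified correctly
```

On such clearly separated synthetic groups the discriminant functions classify every replicate correctly, mirroring the 100 % result reported above.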

The problem of medical data classification is analyzed and the existing classification methods are reviewed from various aspects. However, the efficiency of these classification algorithms remains in question. To improve classification performance, a Class Level disease Convergence and Divergence (CLDC) measure based algorithm is presented in this paper. For any dimension of the medical data, its convergence or divergence indicates the support for a disease class. Initially, the data set is preprocessed to remove noisy data points. The method then estimates the disease convergence/divergence measure on the different dimensions. The convergence measure is computed from the frequency of dimensional matches within a class, whereas the divergence is estimated from the dimensional matches with the other classes. From these measures a disease support factor is estimated, whose value is used to classify the data point and improves classification performance.
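The convergence/divergence idea can be illustrated with a minimal sketch. The match rule (absolute difference within a tolerance) and the function names below are assumptions for illustration, not the paper's definitions:

```python
import numpy as np

def disease_support(x, X, y, cls, tol=0.5):
    """Hypothetical CLDC-style score: per-dimension match frequency with the
    target class (convergence) minus match frequency with other classes
    (divergence), summed over dimensions."""
    same, other = X[y == cls], X[y != cls]
    conv = np.mean(np.abs(same - x) <= tol, axis=0)   # matches within the class
    div = np.mean(np.abs(other - x) <= tol, axis=0)   # matches with other classes
    return float(np.sum(conv - div))

def classify(x, X, y):
    """Assign the class with the largest disease support factor."""
    classes = np.unique(y)
    return classes[np.argmax([disease_support(x, X, y, c) for c in classes])]
```

A point lying near the training samples of one class accumulates high convergence (and low divergence) for that class and is assigned to it.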


2011 ◽  
Vol 9 (2) ◽  
Author(s):  
Norzailawati Mohd Noor ◽  
Alias Abdullah ◽  
Mazlan Hashim

Land use mapping in a development plan provides an essential source of information and an important tool in decision making. In this regard, fine-resolution satellite remotely sensed data have found wide application in land use/land cover mapping. This study reports on work carried out on the classification of fused images for land use mapping at the detailed scale required for a Local Plan. LANDSAT TM, SPOT Pan and IKONOS images were fused and examined using three data fusion techniques, namely the Principal Component Transform (PCT), the Wavelet Transform and the Multiplicative fusion approach. The best fusion technique for the three datasets was determined from an assessment of class separabilities and a visual evaluation of selected subsets of the fused datasets. The Principal Component Transform was found to be the best technique for fusing the three datasets; the best fused data set was then subjected to further classification to produce level I land use classes, while levels II and III pass on to nine classes of detailed classification for the local plan. The overall classification accuracy of the best fused data set was 0.86 (kappa statistic). The final land use output from the classified data was successfully generated in accordance with local plan land use mapping for development plan purposes.
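The PCT fusion step can be sketched as a component-substitution scheme: PCA on the multispectral bands, substitution of the first principal component by the statistics-matched panchromatic band, and inverse transformation. This is a generic sketch of PCA-based fusion, not the authors' exact implementation:

```python
import numpy as np

def pct_fuse(ms, pan):
    """Generic PCT (PCA-based) fusion sketch.
    ms:  (pixels, bands) multispectral data, resampled to the pan grid.
    pan: (pixels,) high-resolution panchromatic band."""
    mean = ms.mean(axis=0)
    Xc = ms - mean
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    vecs = vecs[:, np.argsort(vals)[::-1]]      # components, variance-ordered
    pcs = Xc @ vecs                             # forward transform
    # Match the pan band's mean/std to PC1, then substitute it for PC1
    pan_m = (pan - pan.mean()) / pan.std() * pcs[:, 0].std() + pcs[:, 0].mean()
    pcs[:, 0] = pan_m
    return pcs @ vecs.T + mean                  # inverse transform
```

The substituted first component injects the pan band's spatial detail while the remaining components preserve the spectral content of the original bands.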


2020 ◽  
Vol 498 (3) ◽  
pp. 3440-3451
Author(s):  
Alan F Heavens ◽  
Elena Sellentin ◽  
Andrew H Jaffe

ABSTRACT Bringing a high-dimensional data set into science-ready shape is a formidable challenge that often necessitates data compression. Compression has accordingly become a key consideration for contemporary cosmology, affecting public data releases, and reanalyses searching for new physics. However, data compression optimized for a particular model can suppress signs of new physics, or even remove them altogether. We therefore provide a solution for exploring new physics during data compression. In particular, we store additional agnostic compressed data points, selected to enable precise constraints of non-standard physics at a later date. Our procedure is based on the maximal compression of the MOPED algorithm, which optimally filters the data with respect to a baseline model. We select additional filters, based on a generalized principal component analysis, which are carefully constructed to scout for new physics at high precision and speed. We refer to the augmented set of filters as MOPED-PC. They enable an analytic computation of Bayesian Evidence that may indicate the presence of new physics, and fast analytic estimates of best-fitting parameters when adopting a specific non-standard theory, without further expensive MCMC analysis. As there may be large numbers of non-standard theories, the speed of the method becomes essential. Should no new physics be found, then our approach preserves the precision of the standard parameters. As a result, we achieve very rapid and maximally precise constraints of standard and non-standard physics, with a technique that scales well to large dimensional data sets.
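The MOPED filters on which the method builds can be sketched as follows. This is a generic version (one filter per derivative of the mean, Gram-Schmidt-orthogonalised in the covariance metric), not the authors' MOPED-PC code:

```python
import numpy as np

def moped_vectors(dmu_dtheta, C):
    """Sketch of MOPED compression filters.
    dmu_dtheta: rows are d(mean)/d(theta_i) for each model parameter.
    C: data covariance matrix.
    Returns filters b_i with b_i^T C b_j = delta_ij, so the compressed
    numbers b_i^T x are uncorrelated with unit variance."""
    Cinv = np.linalg.inv(C)
    bs = []
    for dmu in dmu_dtheta:
        b = Cinv @ dmu
        for prev in bs:                 # remove components along earlier filters
            b = b - (prev @ C @ b) * prev
        b = b / np.sqrt(b @ C @ b)      # normalise in the C-metric
        bs.append(b)
    return np.array(bs)
```

MOPED-PC, as described above, augments such baseline-model filters with additional principal-component-based filters that retain sensitivity to non-standard physics.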


Sensors ◽  
2019 ◽  
Vol 19 (23) ◽  
pp. 5097 ◽  
Author(s):  
David Agis ◽  
Francesc Pozo

This work presents a structural health monitoring (SHM) approach for the detection and classification of structural changes. The proposed strategy is based on t-distributed stochastic neighbor embedding (t-SNE), a nonlinear procedure able to represent the local structure of high-dimensional data in a low-dimensional space. The steps of the detection and classification procedure are: (i) the collected data are scaled using mean-centered group scaling (MCGS); (ii) principal component analysis (PCA) is applied to reduce the dimensionality of the data set; (iii) t-SNE is applied to represent the scaled and reduced data as points in a plane, defining as many clusters as there are structural states; and (iv) the current structure to be diagnosed is associated with a cluster or structural state based on three strategies: (a) the smallest point-centroid distance; (b) majority voting; and (c) the sum of the inverse distances. The combination of PCA and t-SNE improves the quality of the clusters related to the structural states. The method is evaluated using experimental data from an aluminum plate with four piezoelectric transducers (PZTs). Results are illustrated in the frequency domain and demonstrate the high classification accuracy and strong performance of the method.
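Steps (ii)-(iv) of the pipeline can be sketched with scikit-learn; the data below are a synthetic stand-in for the scaled PZT measurements, with three structural states, and only strategy (a), the smallest point-centroid distance, is shown:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Hypothetical scaled measurements: 3 structural states, 10 samples each
X = np.vstack([rng.normal(m, 0.2, size=(10, 50)) for m in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 10)

# (ii) PCA dimensionality reduction, (iii) t-SNE embedding in the plane
Xr = PCA(n_components=10).fit_transform(X)
emb = TSNE(n_components=2, perplexity=5.0, init="pca",
           random_state=0).fit_transform(Xr)

# (iv-a) assign a point to the structural state with the nearest centroid
centroids = np.array([emb[y == c].mean(axis=0) for c in range(3)])
label = int(np.argmin(np.linalg.norm(centroids - emb[0], axis=1)))
```

With well-separated states, each sample lands closest to the centroid of its own cluster in the t-SNE plane.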


2007 ◽  
Vol 77 (5) ◽  
pp. 821-830 ◽  
Author(s):  
Chihiro Tanikawa ◽  
Yasuhiro Kakiuchi ◽  
Masakazu Yagi ◽  
Kayoko Miyata ◽  
Kenji Takada

Abstract Objective: (1) To determine feature vector representations (geometric pattern parameters) that are effective in describing human nasal profiles, (2) to determine the number of code vectors (typical nasal patterns) that are mathematically optimized by applying the vector quantization method to each feature vector extracted for each subject, and (3) to determine the morphological traits of each code. Materials and Methods: Lateral facial photographs of 200 Japanese women recorded for orthodontic diagnosis were selected. Five anatomic landmarks were identified on each image together with a set of data points that constituted the contour of the facial profile. An eight-dimensional feature vector effective in distinguishing differences in nasal profile patterns was extracted from the data set using experts' knowledge of the anatomic traits of the nose. The vector quantization technique was applied to the feature vectors to provide the optimum number of nasal profile patterns. Results: The number of code vectors mathematically optimized was six, and the differences between vectors were maximized by morphological traits of the root, dorsum, tip, and base of the nose. Proportions of the number of image records classified into each code were 25.5%, 24.5%, 21.5%, 15.0%, 10.0%, and 3.5% from code 1 to code 6, respectively. Conclusions: Classifying nasal profile patterns based on knowledge from a linguistic description was found to be more effective than a method based on uniform sectioning. The differences between vectors were maximized by morphological traits of the root, the dorsum, the tip, and the base of the nose.
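The vector quantization step can be sketched with plain Lloyd's iterations, a standard stand-in for codebook design (the eight-dimensional feature extraction from the photographs is not reproduced here):

```python
import numpy as np

def vq_codebook(features, k=6, iters=50, seed=0):
    """Lloyd's algorithm sketch: learn k code vectors (typical patterns)
    summarising the feature vectors, plus the assignment of each vector."""
    rng = np.random.default_rng(seed)
    codes = features[rng.choice(len(features), k, replace=False)]
    assign = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # assign each feature vector to its nearest code vector
        d = np.linalg.norm(features[:, None, :] - codes[None], axis=2)
        assign = d.argmin(axis=1)
        # move each code vector to the mean of its assigned vectors
        for j in range(k):
            if np.any(assign == j):
                codes[j] = features[assign == j].mean(axis=0)
    return codes, assign
```

Here k = 6 reflects the mathematically optimized number of nasal-profile code vectors reported above; the proportion of records per code corresponds to the assignment counts.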


2019 ◽  
Vol 11 (22) ◽  
pp. 2690 ◽  
Author(s):  
Yushi Chen ◽  
Lingbo Huang ◽  
Lin Zhu ◽  
Naoto Yokoya ◽  
Xiuping Jia

Hyperspectral remote sensing obtains abundant spectral and spatial information about the observed object simultaneously, providing an opportunity to classify hyperspectral imagery (HSI) in a fine-grained manner. In this study, the fine-grained classification of HSI, which contains a large number of classes, is investigated. On the one hand, traditional classification methods cannot handle fine-grained classification of HSI well; on the other hand, deep learning methods have shown their power in fine-grained classification. Therefore, in this paper, deep learning is explored for supervised and semi-supervised fine-grained HSI classification. For supervised fine-grained classification, a densely connected convolutional neural network (DenseNet) is explored for accurate classification. Moreover, DenseNet is combined with a pre-processing technique (i.e., principal component analysis or an auto-encoder) or a post-processing technique (i.e., a conditional random field) to further improve classification performance. For semi-supervised fine-grained classification, a generative adversarial network (GAN), which includes a discriminative CNN and a generative CNN, is carefully designed. The GAN fully uses both labeled and unlabeled samples to improve classification accuracy. The proposed methods were tested on the Indian Pines data set, which contains 333,951 samples with 52 classes. The experimental results show that the deep learning-based methods provide great improvements over traditional methods, demonstrating that deep models have huge potential for fine-grained HSI classification.
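The PCA pre-processing step mentioned above can be sketched in a few lines: the spectral dimension of the HSI cube is reduced before the pixels (or patches) are fed to the CNN. The cube below is a random stand-in, not the Indian Pines data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical HSI cube: 20 x 20 pixels, 200 spectral bands
cube = np.random.default_rng(2).normal(size=(20, 20, 200))

# Flatten pixels, keep the leading spectral components, restore the grid
flat = cube.reshape(-1, cube.shape[-1])            # (pixels, bands)
reduced = PCA(n_components=30).fit_transform(flat).reshape(20, 20, 30)
```

Reducing 200 bands to a few tens of components shrinks the CNN's input depth while retaining most of the spectral variance.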


2017 ◽  
Vol 82 (6) ◽  
pp. 711-721 ◽  
Author(s):  
Jelena Cvejanov ◽  
Biljana Skrbic

The contents of major ions in bottled waters were analyzed by principal component analysis (PCA) and hierarchical cluster analysis (HCA) in order to investigate whether these techniques could provide the information necessary for classification of the water brands marketed in Serbia. Data on the contents of Ca2+, Mg2+, Na+, K+, Cl-, SO42-, HCO3- and total dissolved solids (TDS) of 33 bottled waters were used as the input data set. The waters separated into three main clusters according to their levels of TDS, Na+ and HCO3-; sub-clustering revealed a group of soft waters with the lowest total hardness. Based on the determined chemical parameters, the Serbian waters were further compared with available literature data on bottled waters from some other European countries. To the best of our knowledge, this is the first report applying chemometric classification to bottled waters from different European countries, in contrast to previous studies reporting results primarily on a country-by-country scale. The diverse character of Serbian bottled waters was demonstrated, as well as the usefulness of PCA and HCA in the fast classification of water brands based on their main chemical parameters.
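The HCA step can be sketched with SciPy; the ion values below are synthetic stand-ins for the published measurements, arranged so that three groups of brands emerge:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(3)
# Hypothetical stand-ins: 33 brands x 8 parameters
# (Ca2+, Mg2+, Na+, K+, Cl-, SO42-, HCO3-, TDS)
X = np.vstack([rng.normal(m, 0.3, size=(11, 8)) for m in (0.0, 2.0, 4.0)])

# HCA on standardized contents; cut the dendrogram into three clusters
Z = linkage(zscore(X), method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
```

Standardizing the columns first keeps high-magnitude parameters such as TDS from dominating the distance computation.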



2015 ◽  
Vol 33 (1) ◽  
pp. 119
Author(s):  
Alexandre Cruz Sanchetta ◽  
Emilson Pereira Leite ◽  
Bruno César Zanardo Honório ◽  
Alexandre Campane Vidal

ABSTRACT. The problem of automatic facies classification was addressed using Fast Independent Component Analysis (FastICA) of a data set of geophysical well logs from the Namorado Field, Campos Basin, Brazil, followed by k-nearest neighbor (k-NN) classification. The goal of automatic facies classification is to produce spatial models of facies that assist the geological characterization of petroleum reservoirs. The FastICA technique provides a new data set with the most stable and least Gaussian distribution possible, and k-NN classifies this new data set according to its characteristics. The prior application of FastICA improves the accuracy of the k-NN automatic classification, and it also provides better results in comparison with automatic classification by means of Principal Component Analysis (PCA).

Keywords: automatic classification, geophysical well logs, Independent Component Analysis.
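The FastICA-then-k-NN pipeline can be sketched with scikit-learn; the well-log values below are synthetic stand-ins for three facies, not the Namorado Field data:

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
# Hypothetical stand-in for well-log measurements: 3 facies, 5 logs
X = np.vstack([rng.normal(m, 0.5, size=(20, 5)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 20)

# FastICA yields maximally non-Gaussian component scores
S = FastICA(n_components=3, random_state=0).fit_transform(X)

# k-NN classifies the transformed data by its nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5).fit(S, y)
acc = knn.score(S, y)
```

On separable synthetic facies the transformed data remain easy for k-NN to classify; the study above reports that this ICA pre-step outperforms a PCA pre-step on the real logs.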

