Evaluation of the effect of data pre-treatment procedures on classical pattern recognition and principal components analysis: a case study for the geographical classification of tea

2001 ◽  
Vol 3 (4) ◽  
pp. 352-360 ◽  
Author(s):  
Antonio Moreda-Piñeiro ◽  
Ana Marcos ◽  
Andrew Fisher ◽  
Steve J. Hill
ACTA IMEKO ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 129
Author(s):  
Leila Es Sebar ◽  
Leonardo Iannucci ◽  
Yuval Goren ◽  
Peter Fabian ◽  
Emma Angelini ◽  
...  

This paper illustrates a case study related to the characterisation of corrosion products present on recently excavated artefacts. The archaeological findings, from the Rakafot 54 site (Beer-Sheva, Israel), consist of 23 coins and a pendant, all dating back to the Roman period. Raman spectroscopy was used to identify the corrosion products that compose the patina covering the objects. To facilitate and support their identification, spectra were then processed using principal components analysis. This chemometric technique allowed the identification of two main compounds, classified as atacamite and clinoatacamite, which formed the main components of the patinas. The results of this investigation can help in assessing the conservation state of artefacts and defining the correct restoration strategy.
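
As a rough illustration of how principal components analysis can separate spectra by their dominant compound, the minimal sketch below runs PCA via singular value decomposition on synthetic spectra. Everything in it is an assumption made for illustration: the band positions, the noise level, and the names `pca_scores`, `phase_a`, and `phase_b` are hypothetical and are not taken from the paper or from real Raman reference data.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-centre X (rows = spectra) and return scores, loadings,
    and the explained-variance ratio of the leading components."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Synthetic data (hypothetical): two "phases" with Gaussian bands at
# arbitrary positions, 12 noisy spectra of each.
rng = np.random.default_rng(0)
shift = np.linspace(100, 1000, 200)

def band(centre):
    return np.exp(-0.5 * ((shift - centre) / 15.0) ** 2)

phase_a = band(510) + 0.6 * band(820)
phase_b = band(360) + 0.8 * band(930)
labels = np.array([0] * 12 + [1] * 12)
spectra = np.where(labels[:, None] == 0, phase_a, phase_b)
spectra = spectra + rng.normal(0.0, 0.02, spectra.shape)

scores, loadings, explained = pca_scores(spectra)
print("explained variance ratio:", np.round(explained, 3))
print("PC1 scores, group 0:", np.round(scores[labels == 0, 0], 2))
print("PC1 scores, group 1:", np.round(scores[labels == 1, 0], 2))
```

In this toy setting the two groups fall on opposite sides of the first component, which is the kind of score-plot separation that can support compound identification. Real spectra would normally need baseline correction and normalisation first.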


2005 ◽  
Vol 35 (12) ◽  
pp. 2860-2874 ◽  
Author(s):  
Nikos Nanos ◽  
Fernando Pardo ◽  
Jesus Alonso Nager ◽  
José Alberto Pardos ◽  
Luis Gil

Vegetation ordination is usually based on classical data reduction techniques such as principal components analysis, correspondence analysis, or multidimensional scaling. The usual methods do not account for multiscale correlations among species. In this paper, we use a geostatistical method, known as multivariate factorial kriging, for studying multiple-scale correlations. The case study was carried out in a mixed broadleaf forest of central Spain. Six tree species were included in the analysis. Data analysis included (i) experimental variogram calculation and modeling with the use of the linear model of coregionalization, (ii) principal components analysis, and (iii) cokriging. The results indicate that correlations among species are different depending on the spatial scale. We conclude that competition for light is the main factor controlling the spatial distribution of species at the plot-level scale of variation. At larger scales of variation, soil conditions and (or) human intervention are the key factors in determining the observed vegetation pattern. Based on the factor scores for the largest scale of variation, we conducted a cluster analysis to identify plots with similar characteristics. The resulting clusters have the remarkable property of being spatially continuous.
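
Step (i) above, the experimental variogram, can be sketched with the classical estimator, in which the semivariance for a lag bin is half the mean squared difference between all pairs of observations whose separation falls in that bin. The function name `experimental_variogram`, the binning scheme, and the toy transect are assumptions for illustration only. The sketch covers a single variable and does not attempt the linear model of coregionalization or the cokriging used in the paper.

```python
import numpy as np

def experimental_variogram(coords, values, bin_edges):
    """Classical semivariance estimate per lag bin.

    coords: (n, d) array of locations; values: (n,) observations;
    bin_edges: increasing lag distances defining half-open bins [lo, hi).
    Bins with no pairs are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Toy transect (hypothetical): ten plots at unit spacing with a linear trend,
# so the semivariance at lag h is h**2 / 2.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
values = np.arange(10.0)
gamma = experimental_variogram(coords, values, [0.5, 1.5, 2.5, 3.5])
print(gamma)
```

A cross-variogram between two species would replace the squared difference with the product of the two species' differences over the same pairs.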


Blood ◽  
2005 ◽  
Vol 106 (11) ◽  
pp. 1463-1463
Author(s):  
Georges Jung ◽  
Sylvie Thiebault ◽  
Jean-Claude Eisenmann ◽  
Eckart Wunder ◽  
Marie Haas ◽  
...  

Multivariate analysis classification of chronic lymphocytic leukemia (CLL) and lymphoma (non-CLL) disorders is investigated in 299 patients using an extended panel of surface markers, and compared with the classical Matutes scoring proposal. Diagnosis was based on clinical features, cell morphology, node or bone marrow histology, and an immunological scoring system. Results are obtained on directly labeled tumor cells by flow cytometry gating. Patients included 154 CLL, 2 Richter transformation, and 143 lymphoma (26 follicular, 49 lymphocytic, 18 other low-grade, 7 Waldenström macroglobulinemia, 13 mantle, 11 diffuse large-cell, 6 Burkitt, 4 marginal zone-cell, 5 hairy-cell leukemia, 2 MALT, 1 prolymphocytic leukemia, 1 SLVL). For CD43, FMC7, CD23, CD5, CD79b (% stained cells) and CD20, CD22 surface antigen intensities, Chi-Square values indicate a very high probability of correct classification (varying from 621 to 94.9; p<0.0000). If, alternatively, % of CD22, CD20, CD19 and intensities of CD79b, CD5, CD19, CD43, CD23 and kappa/lambda chains are employed, Chi-Square yields values of lower significance (varying from 65 to 0.1; p<0.0000 to 0.6573). Using classical panel scoring with CD79b, 82.4% of patients were correctly classified, compared to 84.5% after replacing CD79b by CD22 intensity. If CD43 is added, correct classification increases to 89.6% and 88.1% of patients, respectively; this improvement is due to better allocation of CLL. In discriminant analysis, 91.3% of patients are correctly classified with the panel including CD79b, and 90.9% with CD22 intensity. CD43 enhances the allocation of either one to 94.3%. Using our previous discriminant analysis with CD79b (Jung G, et al. Br J Haematol. 2003; 120:496–499), this blind analysis correctly classified 87.1% of the population, compared to 91.3% with the new one. By adding CD43, it moved from 92.4% up to 94.3%.
In order to find the optimal combination of the selected best markers, a stepwise probit discrimination was performed. Using CD43 and FMC7 yields a correct classification of 90.3%; after addition of CD5, CD79b, CD23, and CD22 intensity, efficiency increased to 94.6%. Further added markers do not improve classification. Efficiency of this panel was further confirmed by hierarchical cluster and principal components analysis. Cluster analysis with squared Euclidean distances separated CLL from non-CLL patients with low overlaps: 86.6% of cases are correctly identified. Separated points in the plot representing patients with CLL and non-CLL, obtained by principal components analysis of surface markers, confirm the high predictive potential of this panel. The same analysis of surface marker positions for non-CLL suggests use of: % of CD79b, FMC7, and CD22 intensity, and for CLL: % of CD5, CD23, CD43. So, the addition of CD43 improves both the discriminant function and the scoring system. Our selected panel of best markers is useful in distinguishing CLL from non-CLL and offers a better distinction by discriminant analysis. Furthermore, the quantitative expression of each marker and its predictive value improve diagnosis and classification.
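
To make the idea of a "percentage correctly classified" by discriminant analysis concrete, the sketch below fits a two-class Fisher linear discriminant on synthetic data and reports the resubstitution accuracy. This is a swapped-in textbook technique, not the authors' stepwise probit procedure or their marker panel. The three feature columns, the class means, and the names `fit_fisher_lda` and `predict` are hypothetical.

```python
import numpy as np

def fit_fisher_lda(X, y):
    """Two-class Fisher discriminant: returns the projection vector w and
    a threshold halfway between the projected class means."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (w @ m0 + w @ m1)
    return w, threshold

def predict(X, w, threshold):
    """Assign class 1 when the projection exceeds the threshold."""
    return (X @ w > threshold).astype(int)

# Synthetic stand-in for marker measurements (hypothetical values):
# 100 cases per class, three features, class means 3 units apart.
rng = np.random.default_rng(1)
n = 100
X = np.vstack([rng.normal(0.0, 1.0, (n, 3)), rng.normal(3.0, 1.0, (n, 3))])
y = np.array([0] * n + [1] * n)

w, threshold = fit_fisher_lda(X, y)
accuracy = (predict(X, w, threshold) == y).mean()
print(f"correctly classified: {100 * accuracy:.1f}%")
```

Accuracy measured on the training cases is optimistic. Adding or removing a column and comparing cross-validated accuracy would be one simple way to probe whether an extra marker helps.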


Author(s):  
Miguel A. Perez ◽  
Maury Nussbaum

Many biomechanical models used to produce injury risk estimates for the lower trunk require lower trunk muscle forces as inputs. These forces are typically estimated through the use of surface electromyography (sEMG). The variability inherent in sEMG measurements can, and should, be analyzed to determine the possible presence and sources of excessive variation in the data. Principal components analysis (PCA) provides a robust and straightforward method for performing an analysis of the variability of complex sEMG datasets. This paper describes the results obtained from the application of PCA to a dataset consisting of activation levels for several lower trunk muscles. The results demonstrate the value of the technique in identifying clusters of observations in the data and in simplifying the multidimensional dataset. The use of PCA as a hypothesis generation tool is also explored.
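
One way PCA simplifies a multidimensional dataset is by showing how few components carry most of the variance. The sketch below works on the correlation matrix, which is equivalent to PCA on autoscaled data, and counts the components needed to reach a chosen variance threshold. The data are synthetic: six hypothetical "muscle" channels driven by two latent factors. The 90% threshold and the name `components_for_variance` are assumptions, not values or methods from the paper.

```python
import numpy as np

def components_for_variance(X, threshold=0.90):
    """Return how many principal components of the correlation matrix are
    needed to reach `threshold` cumulative variance, plus the cumulative curve."""
    corr = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]          # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n_keep = int(np.searchsorted(cumulative, threshold) + 1)
    return n_keep, cumulative

# Synthetic activation levels (hypothetical): channels 0-2 follow one latent
# factor, channels 3-5 follow another, plus small independent noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
activation = latent @ mixing + rng.normal(0.0, 0.2, (200, 6))

n_keep, cumulative = components_for_variance(activation)
print("components kept:", n_keep)
print("cumulative variance:", np.round(cumulative, 3))
```

Plotting the scores on the retained components is then a natural next step for spotting clusters or unusually variable observations.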

