Cross-validation, influential observations and selection of variables in chemometric studies of wines by principal components analysis

1990 ◽  
Vol 4 (3) ◽  
pp. 217-240 ◽  
Author(s):  
Giuseppe Scarponi ◽  
Ivo Moret ◽  
Gabriele Capodaglio ◽  
Mario Romanazzi
2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Marta M Correia ◽  
Timothy Rittman ◽  
Christopher L Barnes ◽  
Ian T Coyle-Gilchrist ◽  
Boyd Ghosh ◽  
...  

The early and accurate differential diagnosis of parkinsonian disorders is still a significant challenge for clinicians. In recent years, a number of studies have used magnetic resonance imaging data combined with machine learning and statistical classifiers to successfully differentiate between different forms of Parkinsonism. However, several questions and methodological issues remain regarding how to minimize bias and artefact-driven classification. In this study, we compared different approaches for feature selection, as well as different magnetic resonance imaging modalities, using well-matched patient groups and tight control of data quality issues related to patient motion. Our sample was drawn from a cohort of 69 healthy controls, and patients with idiopathic Parkinson’s disease (n = 35), progressive supranuclear palsy Richardson’s syndrome (n = 52) and corticobasal syndrome (n = 36). Participants underwent standardized T1-weighted and diffusion-weighted magnetic resonance imaging. Strict data quality control and group matching reduced the control and patient numbers to 43, 32, 33 and 26, respectively. We compared two different methods for feature selection and dimensionality reduction: whole-brain principal components analysis, and an anatomical region-of-interest based approach. In both cases, support vector machines were used to construct a statistical model for pairwise classification of healthy controls and patients. The accuracy of each model was estimated using a leave-two-out cross-validation approach, as well as an independent validation using a different set of subjects. Our cross-validation results suggest that using principal components analysis for feature extraction provides higher classification accuracies when compared to a region-of-interest based approach. 
However, the differences between the two feature extraction methods were significantly reduced when an independent sample was used for validation, suggesting that the principal components analysis approach may be more vulnerable to overfitting with cross-validation. Both T1-weighted and diffusion magnetic resonance imaging data could be used to successfully differentiate between subject groups, with neither modality outperforming the other across all pairwise comparisons in the cross-validation analysis. However, features obtained from diffusion magnetic resonance imaging data resulted in significantly higher classification accuracies when an independent validation cohort was used. Overall, our results support the use of statistical classification approaches for differential diagnosis of parkinsonian disorders. However, classification accuracy can be affected by group size, age, sex and movement artefacts. With appropriate controls and out-of-sample cross validation, diagnostic biomarker evaluation including magnetic resonance imaging based classifiers may be an important adjunct to clinical evaluation.
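As a rough illustration of leave-two-out cross-validation for a pairwise classifier, the sketch below holds out one subject from each group per fold and fits on the remaining subjects. Several things here are assumptions rather than details from the abstract: the reading of "leave-two-out" as one subject per group, the nearest-centroid rule standing in for a support vector machine, the function names, and the made-up two-feature data. Any feature extraction step, such as principal components analysis, would need to be refitted inside the loop on the training subjects only; fitting it once on all subjects leaks information from the held-out pair.

```python
from itertools import product
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_two_out_accuracy(group_a, group_b):
    """Hold out one subject from each group per fold; fit on the rest.

    Assumes each group has at least two subjects. The nearest-centroid
    rule is a simple stand-in classifier, not the SVM used in the study.
    """
    correct = total = 0
    for i, j in product(range(len(group_a)), range(len(group_b))):
        train_a = group_a[:i] + group_a[i + 1:]
        train_b = group_b[:j] + group_b[j + 1:]
        # The model (two centroids) is fitted on training subjects only.
        cent_a, cent_b = centroid(train_a), centroid(train_b)
        for x, is_a in ((group_a[i], True), (group_b[j], False)):
            pred_a = sq_dist(x, cent_a) < sq_dist(x, cent_b)
            correct += (pred_a == is_a)
            total += 1
    return correct / total


# Made-up feature vectors, for illustration only.
controls = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2]]
patients = [[3.0, 0.5], [3.1, 0.4], [2.9, 0.6]]
print(leave_two_out_accuracy(controls, patients))
```

An accuracy estimated this way is still an in-sample estimate in the sense the abstract warns about: an independent validation set, untouched during model development, is the stricter check.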


Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1499-1506 ◽  
Author(s):  
Yangwu Zhang ◽  
Guohe Li ◽  
Heng Zong

Dimensionality reduction, which includes feature extraction and feature selection, is a key step in text classification. In this paper, we propose a mixed method of dimensionality reduction that combines principal components analysis (PCA) with a subsequent selection of components. PCA is a method of feature extraction. Not all of its components contribute to classification, because PCA is not a form of discriminant analysis: its objective is to capture variance, not to separate classes (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, judged by performance indicators measured on different subsets of the components. Compared to traditional methods of feature selection, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
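One way to picture selecting components by their classification performance is a greedy forward search, sketched below. This is not the authors' selection function, which the abstract does not specify. The greedy rule, the leave-one-out nearest-centroid scorer (used here instead of an SVM), the names `loo_accuracy` and `select_components`, and the toy data are all assumptions. The columns of `X` are taken to be principal component scores that have already been computed.

```python
from statistics import fmean


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on `cols`."""
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [[X[k][c] for c in cols]
                    for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [fmean(col) for col in zip(*rows)]
        xi = [X[i][c] for c in cols]
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(xi, centroids[lab])),
        )
        hits += (pred == y[i])
    return hits / len(X)


def select_components(X, y):
    """Greedy forward selection: keep adding the component that most
    improves accuracy, and stop when no remaining component helps."""
    chosen, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        score, col = max((loo_accuracy(X, y, chosen + [c]), c) for c in remaining)
        if score <= best:
            break
        chosen.append(col)
        remaining.remove(col)
        best = score
    return chosen, best


# Toy scores: column 0 has large variance but little class information,
# column 1 has small variance but separates the two classes.
X = [[5.0, 0.1], [-4.0, 0.2], [3.0, 0.15],
     [-5.0, 0.9], [4.0, 1.0], [-3.0, 0.95]]
y = [0, 0, 0, 1, 1, 1]
print(select_components(X, y))
```

In the toy data the high-variance column, which PCA would rank first, is passed over in favour of the low-variance column that actually separates the classes, which is the situation the abstract describes.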


2006 ◽  
Vol 6 (3) ◽  
pp. 227-238 ◽  
Author(s):  
Antonia Correia ◽  
Pedro Pintassilgo

The purpose of this article is to investigate the motivations behind golf demand in the Algarve, one of Europe's most popular golf destinations. The research is based on the results of a survey of demand for Algarve's golf courses, held in 2002. To identify the main motives behind golf demand in the region, a principal components analysis was performed. Four main choice factors were identified to explain the selection of Algarve's golf courses. The first was designated social environment and is associated with motives such as events and beaches. The second, leisure, is related to restaurants and bars, landscape, weather and accommodation. The third, entitled golf, is directly related to characteristics of courses. The fourth, logistics, is associated with variables such as price and accessibility. A cluster analysis also shows that the choice factors can be associated with three market segments: the tourist golfer, who is mostly concerned with the golf courses and the game; the householder golfer, essentially centred on accommodation, gastronomy, landscape, weather, price and accessibility; and finally, the sun-beach tourist, who is mostly interested in tourist opportunities.


2019 ◽  
Author(s):  
Marta M Correia ◽  
Tim Rittman ◽  
Christopher L Barnes ◽  
Ian T Coyle-Gilchrist ◽  
Boyd Ghosh ◽  
...  

The early and accurate differential diagnosis of parkinsonian disorders is still a significant challenge for clinicians. In recent years, a number of studies have used MRI data combined with machine learning and statistical classifiers to successfully differentiate between different forms of Parkinsonism. However, several questions and methodological issues remain, to minimise bias and artefact-driven classification. In this study we compared different approaches for feature selection, as well as different MRI modalities, with well matched patient groups and tightly controlling for data quality issues related to patient motion. Our sample was drawn from a cohort of 69 healthy controls, and patients with idiopathic Parkinson’s disease (n=35, PD), Progressive Supranuclear Palsy Richardson’s syndrome (n=52, PSP) and corticobasal syndrome (n=36, CBS). Participants underwent standardised T1-weighted MPRAGE and diffusion-weighted MRI. We compared two different methods for feature selection and dimensionality reduction: whole-brain principal components analysis, and an anatomical region-of-interest based approach. In both cases, support vector machines were used to construct a statistical model for pairwise classification of healthy controls and patients. The accuracy of each model was estimated using a leave-two-out cross-validation approach, as well as an independent validation using a different set of subjects. Our cross-validation results suggest that using principal components analysis (PCA) for feature extraction provides higher classification accuracies when compared to a region-of-interest based approach. However, the differences between the two feature extraction methods were significantly reduced when an independent sample was used for validation, suggesting that the principal components analysis approach may be more vulnerable to overfitting with cross-validation. 
Both T1-weighted and diffusion MRI data could be used to successfully differentiate between subject groups, with neither modality outperforming the other across all pairwise comparisons in the cross-validation analysis. However, features obtained from diffusion MRI data resulted in significantly higher classification accuracies when an independent validation cohort was used. Overall, our results support the use of statistical classification approaches for differential diagnosis of parkinsonian disorders. However, classification accuracy can be affected by group size, age, sex and movement artifacts. With appropriate controls and out-of-sample cross validation, diagnostic biomarker evaluation including MRI based classifiers can be an important adjunct to clinical evaluation.


2011 ◽  
Vol 175-176 ◽  
pp. 993-998
Author(s):  
Xiu Qin Hu ◽  
Yan Chen

Shape is an important factor in forming a garment's style and conveying its beauty. It is affected not only by style and design but also by fabric properties. A systematic experiment was carried out on twenty commonly used suit fabrics, measured with the KES fabric instrument. Principal Components Analysis (PCA) was used to extract the main indexes that influence the shape of men’s suits. The results should make fabric selection easier, provide a reference for men’s suit design and manufacture, and thus help produce high-quality men’s suits.
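To make the idea of drawing a main index with PCA concrete, the sketch below computes the first principal axis of a small data table by power iteration on the sample covariance matrix. It is only an illustration under stated assumptions: the function name `first_component`, the iteration count and the two-column data are made up, and the abstract does not say how the authors computed or scaled their components. In practice, measurements on different scales are usually standardized first, and a library routine would be used instead of hand-written iteration.

```python
from math import sqrt
from statistics import fmean


def first_component(rows, iters=200):
    """Leading principal axis and its variance via power iteration.

    Assumes the data have nonzero variance and that the starting vector
    is not orthogonal to the leading axis.
    """
    means = [fmean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    n, p = len(rows), len(means)
    # Sample covariance matrix (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v is the quadratic form v' C v.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return v, variance


# Made-up measurements: the second column is exactly twice the first,
# so a single component carries all of the variance.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, variance = first_component(data)
print(loadings, variance)
```

The loadings show how strongly each original variable contributes to the component, which is the sense in which a component can serve as a summary index of several measured properties.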

