Introduction to Statistical Methods to Analyze Large Data Sets: Principal Components Analysis

2011 ◽  
Vol 4 (190) ◽  
pp. tr3-tr3 ◽  
Author(s):  
N. R. Clark ◽  
A. Ma'ayan

2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.


Author(s):  
Thomas W. Shattuck ◽  
James R. Anderson ◽  
Neil W. Tindale ◽  
Peter R. Buseck

Individual particle analysis involves the study of tens of thousands of particles using automated scanning electron microscopy and elemental analysis by energy-dispersive, x-ray emission spectroscopy (EDS). EDS produces large data sets that must be analyzed using multi-variate statistical techniques. A complete study uses cluster analysis, discriminant analysis, and factor or principal components analysis (PCA). The three techniques are used in the study of particles sampled during the FeLine cruise to the mid-Pacific Ocean in the summer of 1990. The mid-Pacific aerosol provides information on long range particle transport, iron deposition, sea salt ageing, and halogen chemistry. Aerosol particle data sets suffer from a number of difficulties for pattern recognition using cluster analysis. There is a great disparity in the number of observations per cluster and the range of the variables in each cluster. The variables are not normally distributed, they are subject to considerable experimental error, and many values are zero, because of finite detection limits. Many of the clusters show considerable overlap, because of natural variability, agglomeration, and chemical reactivity.


Author(s):  
Paschalina Ntotsi ◽  
Sofia D Anastasiadou

The current paper analyses two different statistical techniques: principal components analysis (PCA) and correspondence analysis (l'analyse factorielle des correspondances, AFC). A survey was carried out using a structured questionnaire for a sample of 135 nurses who studied at the School of Pedagogical and Technological Education (ASPETE) in Greece. The Tangibility, Reliability, Responsiveness, Assurance, Empathy and Associability subscales are related to the Qualitative Services ASPETE offers. These subscales were measured by 24 items, rated on a seven-point Likert scale. The study focuses on the presentation of the two main types of clustering methods, PCA and AFC. Lee and Lin’s model contains a one-item scale developed to measure overall service quality and a one-item scale for customer satisfaction. The students’ degree of satisfaction is assessed with a seven-point Likert-scale statement, investigating the extent to which the respondents are satisfied with the experience they had with the specific tertiary education organisation (CSF). Keywords: Advanced, statistical, methods, AFC, PCA


2004 ◽  
Vol 12 (5) ◽  
pp. 36-39 ◽  
Author(s):  
Brent Neal ◽  
John C. Russ

Principal components analysis of multivariate data sets is a standard statistical method that was developed in the early half of the 20th century. It provides researchers with a method for transforming their source data axes into a set of orthogonal principal axes and ranks. The rank for each axis in the principal set represents the significance of that axis as defined by the variance in the data along that axis. Thus, the first principal axis is the one with the greatest amount of scatter in the data and consequently the greatest amount of contrast and information, while the last principal axis represents the least amount of information.
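The transformation described in this abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes NumPy, uses synthetic two-variable data, and the function name principal_axes is hypothetical. It computes the principal axes as eigenvectors of the sample covariance matrix and ranks them by the variance along each axis.

```python
import numpy as np

def principal_axes(data):
    """Return (axes, variances), ranked from largest to smallest variance.

    `data` has one observation per row and one variable per column.
    The columns of `axes` are the orthogonal principal axes.
    """
    centered = data - data.mean(axis=0)        # remove the mean of each variable
    cov = np.cov(centered, rowvar=False)       # sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # re-rank: greatest variance first
    return eigvecs[:, order], eigvals[order]

# Synthetic, illustrative data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

axes, variances = principal_axes(data)

# Project the centered data onto the principal axes.
scores = (data - data.mean(axis=0)) @ axes
```

In this sketch the first column of scores carries most of the scatter in the data, and the variance of each column equals the corresponding entry of variances, which is the ranking the abstract refers to.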


2005 ◽  
Vol 83 (12) ◽  
pp. 1511-1524 ◽  
Author(s):  
Megan K Johnson ◽  
Anthony P Russell ◽  
Aaron M Bauer

The Pachydactylus radiation comprises a diverse group of African gekkonids that exploit a variety of microhabitats and exhibit both climbing and terrestrial locomotion. The phylogeny of this radiation is well supported, making it a promising candidate for the investigation of relationships between limb proportions, ecology, and behaviour. Skeletal and external measurements were recorded for an array of taxa and analyzed using principal components analysis (PCA). The results of the PCAs were further analyzed using phylogenetic statistical methods to ascertain whether climbing and terrestrial species and (or) clades within the radiation differed significantly from each other in limb proportions. Phylogenetically based comparisons revealed that although there is some differentiation between climbing and terrestrial species, this is not a general pattern but is primarily attributable to certain species and clades within the radiation that differ considerably from other members of the group. The results indicate that Chondrodactylus angulifer Peters, 1870 possesses shortened distal phalanges and that Pachydactylus rangei (Andersson, 1908), P. austeni Hewitt, 1923, and the Rhoptropus clade (particularly R. afer Peters, 1869) possess elongated limbs relative to the rest of the radiation. These differences correlate with aspects of the lifestyles of these species, such as increased terrestriality, a reduction or loss of the subdigital adhesive apparatus, digging behaviour, and a transition to diurnality.


1987 ◽  
Vol 16 (2) ◽  
pp. 179-204 ◽  
Author(s):  
Barbara Horvath ◽  
David Sankoff

Quantitative analyses of large data sets make use of both linguistic and sociological categories in sociolinguistic studies. While the linguistic categories are generally well-defined and there are sufficient tokens for further definition based on mathematical manipulation, the social characteristics such as socioeconomic class or ethnicity are neither. The familiar problem of grouping speakers by such sociological characteristics prior to quantitative analysis is addressed and an alternative solution – principal components analysis – is suggested. Principal components analysis is used here as a heuristic for grouping speakers solely on the basis of linguistic behaviour; the groups thus defined can then be described according to sociological characteristics. In addition, by naming the principal components, the major linguistic and social dimensions of the variation in the data can be identified. Principal components analysis was applied to vowel variation data collected as part of a sociolinguistic survey of English in Sydney, New South Wales, Australia. (Sociolinguistics, variation studies, quantitative methods in linguistics, dialectology, Australian English, role of migrants in language change)

