Elaborate downstream methods are required to analyze large microarray data-sets. When the end goal
is to find relationships between (or patterns within) different subgroups, or even individual samples, large data-sets
must first be filtered using statistical thresholds to reduce their overall volume. As an example, in anthropological
microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and
phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the whole, much as
polling results taken during elections are used to infer the opinions of the population at large. However, what is
the best and easiest method of capturing a subset of the variation in a data-set that can represent the overall portrait of
variation?
In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind
the process, and the ways in which it can be applied to modern large-scale biological data-sets. New methods of analysis using
PCA are also suggested, with tentative results outlined.