The use of principal components analysis for the investigation of an organic air pollutants data set

Abstract

Objective: To derive dietary patterns using principal components analysis from separate FFQ completed by mothers and their teenagers and to assess associations with nutrient intakes and sociodemographic variables.

Design: Two distinct FFQ were completed by 13-year-olds and their mothers, with some overlap in the foods covered. A combined data set was obtained.

Setting: Avon Longitudinal Study of Parents and Children (ALSPAC), Bristol, UK.

Subjects: Teenagers (n = 5334) with adequate dietary data.

Results: Four patterns were obtained using principal components analysis: a ‘Traditional/health-conscious’ pattern, a ‘Processed’ pattern, a ‘Snacks/sugared drinks’ pattern and a ‘Vegetarian’ pattern. The ‘Traditional/health-conscious’ pattern was the most nutrient-rich, having high positive correlations with many nutrients. The ‘Processed’ and ‘Snacks/sugared drinks’ patterns showed little association with important nutrients but were positively associated with energy, fats and sugars. There were clear gender and sociodemographic differences across the patterns. Lower scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in males and in those with younger and less educated mothers. Higher scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in girls and in those whose mothers had higher levels of education.

Conclusions: It is important to establish healthy eating patterns by the teenage years. However, this is a time when it is difficult to accurately establish dietary intake from a single source, since teenagers consume increasing amounts of foods outside the home. Further dietary pattern studies should focus on teenagers, and the source of dietary data collection merits consideration.


Multivariate Characterization of Hydrogen Balmer Emission in Cataclysmic Variables

Publications of the Astronomical Society of Australia. DOI: 10.1071/as06011. 2006. Vol 23 (3), pp. 106-118. Cited By ~ 5

Author(s): Gordon E. Sarty, Kinwah Wu

Keyword(s): Principal Components Analysis, Discriminant Function, Principal Components, Discriminant Function Analysis, Function Analysis, Cataclysmic Variables, Data Set, Analysis Methods, Wide Range, Components Analysis

The ratios of hydrogen Balmer emission line intensities in cataclysmic variables are signatures of the physical processes that produce them. To quantify those signatures relative to classifications of cataclysmic variable types, we applied the multivariate statistical analysis methods of principal components analysis and discriminant function analysis to the spectroscopic emission data set of Williams (1983). The two analysis methods reveal two different sources of variation in the ratios of the emission lines. The source of variation seen in the principal components analysis was shown to be correlated with the binary orbital period. The source of variation seen in the discriminant function analysis was shown to be correlated with the equivalent width of the Hβ line. Comparison of the data scatterplot with scatterplots of theoretical models shows that Balmer line emission from T CrB systems is consistent with the photoionization of a surrounding nebula. Otherwise, models that we considered do not reproduce the wide range of Balmer decrements, including ‘inverted’ decrements, seen in the data.


Planktonic Biomass Trajectories In Lake Ecosystems

Canadian Journal of Fisheries and Aquatic Sciences. DOI: 10.1139/f83-204. 1983. Vol 40 (10), pp. 1752-1760. Cited By ~ 3

Author(s): Michael A. Gates, Ann P. Zimmerman, W. Gary Sprules, Roy Knoechel

Keyword(s): Hypothesis Testing, Principal Components Analysis, Biomass Allocation, Principal Components, Primary Component, Size Category, Data Set, Lake Ecosystems, The Mean, Components Analysis

We introduce a method, based on principal components analysis, for studying temporal changes in biomass allocation among 16 size–category compartments of lake plankton. Applied to data from a series of 12 Ontario lakes over three sampling seasons, the technique provides a simple means of visualizing shifts in patterns of biomass allocation, and it allows comparative analyses of biomass fluctuations in different lakes. Each of the primary component axes is interpretable. Furthermore, a large proportion of the variance in both the mean position of a lake and its movement along these axes is interpreted as a function of lake physicochemistry. The analysis also provides weighted scores for use in hypothesis testing which are an improvement over mean biomass values alone, because they take into account the structure of variation in the data set.


Haplotype Classification Using Copy Number Variation and Principal Components Analysis

The Open Bioinformatics Journal. DOI: 10.2174/1875036201307010019. 2013. Vol 7 (1), pp. 19-24

Author(s): Kevin Blighe

Keyword(s): Principal Components Analysis, Principal Components, Large Scale, Large Data, Large Data Sets, Data Sets, Data Set, Reduction Techniques, Number Variation, Components Analysis

Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.


Adaptive multiscale principal components analysis for online monitoring of wastewater treatment

Water Science & Technology. DOI: 10.2166/wst.2002.0593. 2002. Vol 45 (4-5), pp. 227-235. Cited By ~ 29

Author(s): J. Lennox, C. Rosen

Keyword(s): Wastewater Treatment, Principal Components Analysis, Principal Components, Fault Detection And Isolation, Scale Model, Multivariate Statistical, Data Set, Significant Difference, Components Analysis, Adaptive PCA

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
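The adaptation rule described in this abstract (update the model only while no abnormal event is signalled) can be illustrated with a minimal sketch. It is not the authors' AdMSPCA algorithm: it covers single-scale adaptive PCA only, with no wavelet or multiscale decomposition, and the class name `AdaptivePCAMonitor`, the forgetting factor, the one-component model and the fixed squared-prediction-error limit are all assumptions made for illustration.

```python
import math


class AdaptivePCAMonitor:
    """Hypothetical single-scale adaptive PCA monitor (illustrative only)."""

    def __init__(self, mean, cov, forgetting=0.98, spe_limit=1.0):
        self.mean = list(mean)
        self.cov = [row[:] for row in cov]
        self.forgetting = forgetting  # assumed exponential forgetting factor
        self.spe_limit = spe_limit    # assumed fixed control limit

    def _leading_direction(self, iterations=200):
        # Power iteration on the current covariance estimate.
        p = len(self.cov)
        vec = [1.0 / math.sqrt(p)] * p
        for _ in range(iterations):
            nxt = [sum(self.cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(v * v for v in nxt))
            vec = [v / norm for v in nxt]
        return vec

    def spe(self, sample):
        # Squared prediction error: squared norm of the part of the centred
        # sample that the one-component PCA model does not explain.
        centered = [x - m for x, m in zip(sample, self.mean)]
        direction = self._leading_direction()
        score = sum(c * d for c, d in zip(centered, direction))
        residual = [c - score * d for c, d in zip(centered, direction)]
        return sum(r * r for r in residual)

    def observe(self, sample):
        """Return True if the sample is flagged; adapt only on normal samples."""
        if self.spe(sample) > self.spe_limit:
            return True  # abnormal: leave the model untouched
        lam = self.forgetting
        centered = [x - m for x, m in zip(sample, self.mean)]
        self.mean = [m + (1.0 - lam) * c for m, c in zip(self.mean, centered)]
        p = len(self.cov)
        for i in range(p):
            for j in range(p):
                self.cov[i][j] = (lam * self.cov[i][j]
                                  + (1.0 - lam) * centered[i] * centered[j])
        return False


monitor = AdaptivePCAMonitor(mean=[0.0, 0.0], cov=[[1.0, 0.9], [0.9, 1.0]])
normal_flag = monitor.observe([1.0, 1.1])    # lies along the correlated direction
mean_before_fault = list(monitor.mean)
fault_flag = monitor.observe([2.0, -2.0])    # breaks the correlation structure
```

In this toy setup the second sample violates the correlation between the two variables, so it is flagged and the model is not updated, whereas the first sample is absorbed into the running mean and covariance.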


A Data-Driven Approach to the Fragile Families Challenge: Prediction through Principal-Components Analysis and Random Forests

Socius Sociological Research for a Dynamic World. DOI: 10.1177/2378023118818720. 2019. Vol 5, pp. 237802311881872. Cited By ~ 1

Author(s): Ryan Compton

Keyword(s): Principal Components Analysis, Principal Components, Data Driven, Machine Learning Techniques, Fragile Families, Sociological Research, Data Set, Dependent Variables, Data Driven Approach, Components Analysis

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but no continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable using a reduced set of uncorrelated independent variables, and continuous dependent variables demand more complexity.


Wintertime ocean conditions synchronize rockfish growth and seabird reproduction in the central California Current ecosystem

Canadian Journal of Fisheries and Aquatic Sciences. DOI: 10.1139/f10-055. 2010. Vol 67 (7), pp. 1149-1158. Cited By ~ 34

Author(s): Bryan A. Black, Isaac D. Schroeder, William J. Sydeman, Steven J. Bograd, Peter W. Lawson

Keyword(s): Time Series, Principal Components Analysis, Principal Components, Southern Oscillation, California Current, Data Set, Central California, Components Analysis, Ocean Conditions, Upwelling Index

Chronologies developed from annual growth-increment widths of splitnose rockfish (Sebastes diploproa) and yelloweye rockfish (Sebastes ruberrimus) otoliths were compared with time series of lay date and fledgling success for the common murre (Uria aalge) and Cassin’s auklet (Ptychoramphus aleuticus) in the north-central California Current. All time series were exactly dated and spanned 1972 through 1994. In a principal components analysis, the leading principal component (PC1bio) accounted for 64% of the variance in the data set. By entering the upwelling index, the Northern Oscillation index, sea surface temperatures, and the multivariate ENSO (El Niño Southern Oscillation) index into principal components analysis, a time series of environmental variability (PC1env) was developed for each month of the year. Over the interval 1972 through 1994, PC1bio most strongly correlated with PC1env for February and, to a lesser extent, January and March. Moreover, when each of the six biological time series was related to the 12 PC1env series through stepwise multiple regression, February was always the most significant (p < 0.01). The same was true if the upwelling index was substituted for PC1env. As upper-trophic predators, rockfish and seabirds independently corroborate that wintertime ocean conditions are critical for productivity in the California Current ecosystem.
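How a leading principal component and its share of variance are obtained from several time series, and then correlated with another series, can be sketched as follows. The data are synthetic (six noisy copies of one shared signal over 23 years), not the rockfish or seabird records, and the function names, the noise level and the use of power iteration on the correlation matrix are illustrative assumptions rather than the authors' procedure.

```python
import math
import random


def standardize(series):
    """Scale one time series to zero mean and unit sample variance."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / (n - 1))
    return [(v - mean) / sd for v in series]


def correlation_matrix(zcols):
    """Correlation matrix of already standardized series."""
    n = len(zcols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in zcols]
            for ci in zcols]


def leading_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * matrix[i][j] * vec[j]
                     for i in range(p) for j in range(p))
    return eigenvalue, vec


def pearson(x, y):
    """Pearson correlation between two series of equal length."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def demo(seed=0, years=23, n_series=6):
    """Synthetic example: returns (variance share of PC1, r with shared signal)."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(years)]
    columns = [[s + 0.5 * rng.gauss(0.0, 1.0) for s in shared]
               for _ in range(n_series)]
    zcols = [standardize(c) for c in columns]
    eigenvalue, loadings = leading_component(correlation_matrix(zcols))
    # The trace of a correlation matrix equals the number of series, so the
    # variance share of PC1 is its eigenvalue divided by that number.
    explained = eigenvalue / n_series
    pc1 = [sum(w * z[t] for w, z in zip(loadings, zcols)) for t in range(years)]
    return explained, pearson(pc1, shared)


explained, r = demo()
```

Here `explained` plays the role of the "64% of the variance" figure quoted in the abstract, and `r` corresponds to a correlation such as that between PC1bio and a monthly PC1env series.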


Principal Components Analysis Competitive Learning

Neural Computation. DOI: 10.1162/0899766041941880. 2004. Vol 16 (11), pp. 2459-2481. Cited By ~ 12

Author(s): Ezequiel López-Rubio, Juan Miguel Ortiz-de-Lazcano-Lobato, José Muñoz-Pérez, José Antonio Gómez-Ruiz

Keyword(s): Dimensionality Reduction, Principal Components Analysis, Principal Components, Neural Model, Competitive Learning, Experimental Results, Data Set, Input Distribution, Local PCA, Components Analysis

We present a new neural model that extends classical competitive learning by performing a principal components analysis (PCA) at each neuron. This model improves on known local PCA methods because the entire data set need not be presented to the network at each computing step. This allows fast execution while retaining the dimensionality-reduction properties of PCA. Furthermore, every neuron is able to modify its behavior to adapt to the local dimensionality of the input distribution. Hence, our model has a dimensionality estimation capability. The experimental results we present show the dimensionality-reduction capabilities of the model with multisensor images.


Circular effects in representations of an RNA nucleotides data set in relation with principal components analysis

Chemometrics and Intelligent Laboratory Systems. DOI: 10.1016/s0169-7439(01)00109-5. 2001. Vol 56 (2), pp. 61-71. Cited By ~ 10

Author(s): T.H. Reijmers, R. Wehrens, L.M.C. Buydens

Keyword(s): Principal Components Analysis, Principal Components, Data Set, Components Analysis


Application of Chemometric Analysis to the Study of Snow at the Sudety Mountains, Poland

Ecological Chemistry and Engineering S. DOI: 10.1515/eces-2016-0044. 2016. Vol 23 (4), pp. 621-637

Author(s): Marek Błaś, Żaneta Polkowska, Vasil Simeonov, Stefan Tsakovski, Mieczysław Sobik, ...

Keyword(s): Principal Components Analysis, Principal Components, Anthropogenic Activities, Time Factors, Significance Test, Fixed Number, Data Set, Components Analysis, Sudety Mountains, The Impact

Snow samples were collected during winter 2011/2012 at three sites in the Western Sudety Mountains (Poland) in three consecutive phases of snow cover development: stabilisation (Feb 1st), growth (Mar 15th) and ablation (Mar 27th). To maintain a fixed number of samples, each snow profile was divided into six layers, but hydrochemical determinations were made for each 10 cm section of core. In the first run of chemometric data interpretation, the complete data set was subjected to Cluster Analysis as well as Principal Components Analysis. Further, Self-Organizing Maps, a type of neural network described by Kohonen, were used for visualization and interpretation of large high-dimensional data sets. For each site, the chemical variables were clustered using the hierarchical Ward’s method of linkage, squared Euclidean distance as the similarity measure, standardized raw data, and a cluster significance test according to Sneath’s criterion. This grouping of the chemical variables was then confirmed by the results from Principal Components Analysis. The major conclusion is that, for the whole system of three sampling sites, four patterns of variable groupings are observed: the first is related to the mineral salt impact; the second to the impact of secondary emissions and organic pollutants; the third to the dissolved matter effect; and the last to oxidative influence, again in relation to anthropogenic activities such as smog, coal burning and traffic. It might also be concluded that the specificity of the samples is determined by the factors responsible for the data set structure and not by particular individual or time factors.
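The clustering step named in this abstract (Ward's linkage on standardized data with squared Euclidean distance) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the four variable profiles are made-up numbers, the function names are hypothetical, the number of clusters is fixed by hand, and the Sneath significance test is not implemented.

```python
import math


def standardize(profile):
    """Scale one variable's values across samples to zero mean, unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion down to k clusters.

    At each step the pair of clusters whose merge gives the smallest
    increase in within-cluster sum of squares is joined:
    cost = n_a * n_b / (n_a + n_b) * squared distance between centroids.
    """
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, centroid_a = clusters[a]
                members_b, centroid_b = clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
                n_a, n_b = len(members_a), len(members_b)
                cost = n_a * n_b / (n_a + n_b) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        n_a, n_b = len(members_a), len(members_b)
        merged = (members_a + members_b,
                  [(n_a * x + n_b * y) / (n_a + n_b)
                   for x, y in zip(centroid_a, centroid_b)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)


# Made-up profiles of four chemical variables measured in five samples.
raw_profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.1, 2.9, 4.2, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.2, 3.9, 3.1, 2.0, 0.9],
]
groups = ward_clusters([standardize(p) for p in raw_profiles], k=2)
```

With these toy profiles the first two variables rise together and the last two fall together, so `groups` pairs variables 0 and 1 and variables 2 and 3.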
