The use of principal components analysis for the investigation of an organic air pollutants data set

1984 ◽  
Vol 18 (11) ◽  
pp. 2471-2478 ◽  
Author(s):  
J. Smeyers-Verbeke ◽  
J.C. Den Hartog ◽  
W.H. Dehker ◽  
D. Coomans ◽  
L. Buydens ◽  
...  
2013 ◽  
Vol 17 (7) ◽  
pp. 1476-1485 ◽  
Author(s):  
Kate Northstone ◽  
Andrew DAC Smith ◽  
Victoria L Cribb ◽  
Pauline M Emmett

Abstract
Objective: To derive dietary patterns using principal components analysis from separate FFQ completed by mothers and their teenagers and to assess associations with nutrient intakes and sociodemographic variables.
Design: Two distinct FFQ were completed by 13-year-olds and their mothers, with some overlap in the foods covered. A combined data set was obtained.
Setting: Avon Longitudinal Study of Parents and Children (ALSPAC), Bristol, UK.
Subjects: Teenagers (n 5334) with adequate dietary data.
Results: Four patterns were obtained using principal components analysis: a ‘Traditional/health-conscious’ pattern, a ‘Processed’ pattern, a ‘Snacks/sugared drinks’ pattern and a ‘Vegetarian’ pattern. The ‘Traditional/health-conscious’ pattern was the most nutrient-rich, having high positive correlations with many nutrients. The ‘Processed’ and ‘Snacks/sugared drinks’ patterns showed little association with important nutrients but were positively associated with energy, fats and sugars. There were clear gender and sociodemographic differences across the patterns. Lower scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in males and in those with younger and less educated mothers. Higher scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in girls and in those whose mothers had higher levels of education.
Conclusions: It is important to establish healthy eating patterns by the teenage years. However, this is a time when it is difficult to accurately establish dietary intake from a single source, since teenagers consume increasing amounts of foods outside the home. Further dietary pattern studies should focus on teenagers and the source of dietary data collection merits consideration.


2006 ◽  
Vol 23 (3) ◽  
pp. 106-118 ◽  
Author(s):  
Gordon E. Sarty ◽  
Kinwah Wu

Abstract: The ratios of hydrogen Balmer emission line intensities in cataclysmic variables are signatures of the physical processes that produce them. To quantify those signatures relative to classifications of cataclysmic variable types, we applied the multivariate statistical analysis methods of principal components analysis and discriminant function analysis to the spectroscopic emission data set of Williams (1983). The two analysis methods reveal two different sources of variation in the ratios of the emission lines. The source of variation seen in the principal components analysis was shown to be correlated with the binary orbital period. The source of variation seen in the discriminant function analysis was shown to be correlated with the equivalent width of the Hβ line. Comparison of the data scatterplot with scatterplots of theoretical models shows that Balmer line emission from T CrB systems is consistent with the photoionization of a surrounding nebula. Otherwise, models that we considered do not reproduce the wide range of Balmer decrements, including ‘inverted’ decrements, seen in the data.


1983 ◽  
Vol 40 (10) ◽  
pp. 1752-1760 ◽  
Author(s):  
Michael A. Gates ◽  
Ann P. Zimmerman ◽  
W. Gary Sprules ◽  
Roy Knoechel

We introduce a method, based on principal components analysis, for studying temporal changes in biomass allocation among 16 size–category compartments of lake plankton. Applied to data from a series of 12 Ontario lakes over three sampling seasons, the technique provides a simple means of visualizing shifts in patterns of biomass allocation, and it allows comparative analyses of biomass fluctuations in different lakes. Each of the primary component axes is interpretable. Furthermore, a large proportion of the variance in both the mean position of a lake and its movement along these axes is interpreted as a function of lake physicochemistry. The analysis also provides weighted scores for use in hypothesis testing which are an improvement over mean biomass values alone, because they take into account the structure of variation in the data set.


2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyze large microarray data-sets. At times, where the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and in which ways it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.
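As a rough illustration of the dimension reduction this abstract refers to, the following is a minimal sketch of PCA computed through a singular value decomposition of the centered data. It is not the article's own procedure: the function name `pca`, the synthetic matrix `X` and its size (50 samples by 200 variables) are assumptions made only for the example.

```python
import numpy as np

def pca(X, n_components):
    """PCA of an (n_samples, n_features) array via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = s**2 / (X.shape[0] - 1)       # eigenvalues of the covariance matrix
    ratio = explained_var / explained_var.sum()   # share of total variance per component
    components = Vt[:n_components]                # rows are the principal axes
    scores = Xc @ components.T                    # samples projected onto those axes
    return scores, components, ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical stand-in for a large data set: 200 variables driven by 2 latent factors.
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 200))
X = latent @ loadings + 0.1 * rng.normal(size=(50, 200))

scores, components, ratio = pca(X, n_components=2)
print(scores.shape, ratio)
```

Here two components summarize nearly all of the variation because the synthetic data were built from two latent factors; real data sets rarely reduce this cleanly.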


2002 ◽  
Vol 45 (4-5) ◽  
pp. 227-235 ◽  
Author(s):  
J. Lennox ◽  
C. Rosen

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
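To make the idea of PCA-based monitoring concrete, here is a minimal sketch of a plain, non-adaptive, single-scale PCA monitor. It is not the AdMSPCA algorithm described above: there is no multiscale decomposition and no model adaptation. The squared prediction error (SPE) statistic, the empirical 99% control limit, the function names and the synthetic five-sensor data are all assumptions chosen for illustration.

```python
import numpy as np

def fit_pca_monitor(X_train, n_components, quantile=0.99):
    """Fit a PCA model on normal-operation data and set an empirical SPE limit."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                    # autoscale each variable
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                       # loadings, (n_features, n_components)
    resid = Z - Z @ P @ P.T                       # part not explained by the model
    spe = np.sum(resid**2, axis=1)
    return {"mean": mean, "std": std, "P": P, "limit": np.quantile(spe, quantile)}

def spe_statistic(model, X):
    """Squared prediction error of new samples under the fitted model."""
    Z = (np.atleast_2d(X) - model["mean"]) / model["std"]
    resid = Z - Z @ model["P"] @ model["P"].T
    return np.sum(resid**2, axis=1)

rng = np.random.default_rng(1)
# Hypothetical process: 5 correlated sensors driven by 2 underlying factors.
mixing = np.array([[1.0, 0.8, 0.6, 0.4, 0.2],
                   [0.2, 0.4, 0.6, 0.8, 1.0]])

def simulate(n):
    return rng.normal(size=(n, 2)) @ mixing + 0.1 * rng.normal(size=(n, 5))

model = fit_pca_monitor(simulate(500), n_components=2)

X_normal = simulate(200)
X_fault = simulate(200)
X_fault[:, 0] += 3.0                              # sensor bias that breaks the correlations

spe_normal = spe_statistic(model, X_normal)
spe_fault = spe_statistic(model, X_fault)
print((spe_normal > model["limit"]).mean(), (spe_fault > model["limit"]).mean())
```

An adaptive variant would, under the same assumptions, periodically refit the model on recent samples that did not raise an alarm, which is the kind of behavior the abstract attributes to adaptive PCA.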


2019 ◽  
Vol 5 ◽  
pp. 237802311881872 ◽  
Author(s):  
Ryan Compton

Sociological research typically involves exploring theoretical relationships, but the emergence of “big data” enables alternative approaches. This work shows the promise of data-driven machine-learning techniques involving feature engineering and predictive model optimization to address a sociological data challenge. The author’s group develops improved generalizable models to identify at-risk families. Principal-components analysis and decision tree modeling are used to predict six main dependent variables in the Fragile Families Challenge, successfully modeling one binary variable but no continuous dependent variables in the diagnostic data set. This indicates that some binary dependent variables are more predictable using a reduced set of uncorrelated independent variables, and continuous dependent variables demand more complexity.


2010 ◽  
Vol 67 (7) ◽  
pp. 1149-1158 ◽  
Author(s):  
Bryan A. Black ◽  
Isaac D. Schroeder ◽  
William J. Sydeman ◽  
Steven J. Bograd ◽  
Peter W. Lawson

Chronologies developed from annual growth-increment widths of splitnose rockfish (Sebastes diploproa) and yelloweye rockfish (Sebastes ruberrimus) otoliths were compared with time series of lay date and fledgling success for the common murre (Uria aalge) and Cassin’s auklet (Ptychoramphus aleuticus) in the north-central California Current. All time series were exactly dated and spanned 1972 through 1994. In a principal components analysis, the leading principal component (PC1bio) accounted for 64% of the variance in the data set. By entering the upwelling index, the Northern Oscillation index, sea surface temperatures, and the multivariate ENSO (El Niño Southern Oscillation) index into principal components analysis, a time series of environmental variability PC1env was developed for each month of the year. Over the interval 1972 through 1994, PC1bio most strongly correlated with PC1env for February and, to a lesser extent, January and March. Moreover, when each of the six biological time series was related to the 12 PC1env through stepwise multiple regression, February was always the most significant (p < 0.01). The same was true if upwelling index was substituted for PC1env. As upper-trophic predators, rockfish and seabirds independently corroborate that wintertime ocean conditions are critical for productivity in the California Current ecosystem.
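The comparison of a leading principal component with monthly environmental series can be sketched as follows. This is a simplified illustration on synthetic data, not the study's analysis: the arrays `bio` and `env`, the noise levels, and the use of a single environmental series per month (rather than a monthly PC1env built from four indices) are assumptions.

```python
import numpy as np

def leading_pc(X):
    """Leading principal component scores of standardized columns, plus its variance share."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0], s[0]**2 / np.sum(s**2)

rng = np.random.default_rng(2)
n_years = 23                                      # 1972 through 1994
# Hypothetical monthly environmental series: one column per month, January first.
env = rng.normal(size=(n_years, 12))
# Six hypothetical biological series that all track the February column plus noise.
bio = env[:, [1]] + 0.5 * rng.normal(size=(n_years, 6))

pc1_bio, var_share = leading_pc(bio)

# The sign of a principal component is arbitrary, so compare absolute correlations.
abs_corr = np.array([abs(np.corrcoef(pc1_bio, env[:, m])[0, 1]) for m in range(12)])
best_month = int(np.argmax(abs_corr))
print(var_share, best_month)
```

Because the synthetic biological series were constructed from the February column, `best_month` comes out as index 1; with real data, the strength of any such relationship would need a significance test.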


2004 ◽  
Vol 16 (11) ◽  
pp. 2459-2481 ◽  
Author(s):  
Ezequiel López-Rubio ◽  
Juan Miguel Ortiz-de-Lazcano-Lobato ◽  
José Muñoz-Pérez ◽  
José Antonio Gómez-Ruiz

We present a new neural model that extends classical competitive learning by performing a principal components analysis (PCA) at each neuron. This model represents an improvement with respect to known local PCA methods, because the entire data set need not be presented to the network at each computing step. This allows fast execution while retaining the dimensionality-reduction properties of PCA. Furthermore, every neuron is able to modify its behavior to adapt to the local dimensionality of the input distribution. Hence, our model has a dimensionality estimation capability. The experimental results we present show the dimensionality-reduction capabilities of the model with multisensor images.


2016 ◽  
Vol 23 (4) ◽  
pp. 621-637
Author(s):  
Marek Błaś ◽  
Żaneta Polkowska ◽  
Vasil Simeonov ◽  
Stefan Tsakovski ◽  
Mieczysław Sobik ◽  
...  

Abstract: Snow samples were collected during winter 2011/2012 at three posts in the Western Sudety Mountains (Poland) in three consecutive phases of snow cover development, i.e. stabilisation (Feb 1st), growth (Mar 15th) and ablation (Mar 27th). To maintain a fixed number of samples, each snow profile was divided into six layers, but hydrochemical indications were made for each 10 cm section of core. In the first run of chemometric data interpretation, the complete data set was subjected to Cluster Analysis as well as Principal Components Analysis. Further, Self-Organizing Maps, a type of neural network described by Kohonen, were used for visualization and interpretation of large high-dimensional data sets. For each site, clustering of the chemical variables was done using the hierarchical Ward’s method of linkage, squared Euclidean distance as the similarity measure, standardized raw data, and a cluster significance test according to Sneath’s criterion. Afterwards this grouping of the chemical variables was confirmed by the results from Principal Components Analysis. The major conclusion is that, for the whole system of three sampling sites, four patterns of variable groupings are observed: the first is related to the mineral salt impact; the second to the impact of secondary emissions and organic pollutants; the third to the dissolved matter effect; and the last to oxidative influence, again in relation to anthropogenic activities such as smog, coal burning and traffic. It might also be concluded that the specificity of the samples is determined by the factors responsible for the data set structure and not by particular individual or time factors.

