Multivariate statistical approaches as applied to environmental physics studies

Open Physics ◽  
2006 ◽  
Vol 4 (2) ◽  
Author(s):  
Vasil Lovchinov ◽  
Stefan Tsakovski

Abstract: The present communication deals with the application of the most important environmetric approaches, such as cluster analysis, principal components analysis and principal components regression (apportioning models), to environmental systems of substantial interest for environmental physics: surface waters, aerosols, and coastal sediments. Using various case studies, we identify the latent factors responsible for the data set structure and construct models showing the contribution of each identified source (anthropogenic or natural) to the total measure of the pollution. In this way the information obtained from the monitoring data becomes broader and more informative, which helps in problem solving in environmental physics.

2002 ◽  
Vol 45 (4-5) ◽  
pp. 227-235 ◽  
Author(s):  
J. Lennox ◽  
C. Rosen

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades, proving successful in many applications. In this paper we develop a new monitoring algorithm based on Principal Components Analysis (PCA). It can be seen equivalently as making Multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive Multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time-scales. Adaptation of scale PCA models over time permits them to follow the evolution of the process, inputs or disturbances. Performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes. This was mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily, but seemed to depend on the characteristics of the faults/disturbances. The results of the algorithms were similar for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
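
The adaptive part of the idea described above can be illustrated with a minimal single-scale sketch. It is not the authors' AdMSPCA algorithm: the multiscale (wavelet) decomposition is omitted, the class name `AdaptivePCAMonitor`, the forgetting factor, the empirical quantile control limit and the synthetic data are all assumptions made for illustration. The sketch only shows the rule stated in the abstract, namely that the model adapts whenever it does not signal an abnormal event.

```python
import numpy as np


class AdaptivePCAMonitor:
    """Single-scale adaptive PCA monitor (hypothetical helper, not AdMSPCA itself)."""

    def __init__(self, X_train, n_components=2, forgetting=0.99, quantile=0.99):
        self.k = n_components
        self.lam = forgetting
        self.mean = X_train.mean(axis=0)
        self.cov = np.cov(X_train, rowvar=False)
        self._refit()
        # Assumed control limit: a high empirical quantile of the training SPE.
        train_spe = np.array([self.spe(x) for x in X_train])
        self.limit = float(np.quantile(train_spe, quantile))

    def _refit(self):
        # PCA on the correlation matrix implied by the current covariance.
        self.scale = np.sqrt(np.diag(self.cov))
        corr = self.cov / np.outer(self.scale, self.scale)
        eigvals, eigvecs = np.linalg.eigh(corr)
        top = np.argsort(eigvals)[::-1][: self.k]
        self.P = eigvecs[:, top]

    def spe(self, x):
        # Squared prediction error: squared norm of the part of the
        # standardized sample that lies outside the retained PC subspace.
        z = (x - self.mean) / self.scale
        residual = z - self.P @ (self.P.T @ z)
        return float(residual @ residual)

    def step(self, x):
        alarm = self.spe(x) > self.limit
        if not alarm:
            # Adapt only on samples judged normal (approximate exponentially
            # weighted update of mean and covariance), then refit the PCs.
            d = x - self.mean
            self.mean = self.mean + (1.0 - self.lam) * d
            self.cov = self.lam * self.cov + (1.0 - self.lam) * np.outer(d, d)
            self._refit()
        return bool(alarm)


# Synthetic stand-in for plant data: five variables driven by two latent factors.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 0.5]])
X_train = rng.standard_normal((500, 2)) @ A.T + 0.1 * rng.standard_normal((500, 5))

monitor = AdaptivePCAMonitor(X_train)
normal_alarm = monitor.step(np.zeros(5))                 # consistent with the model
fault_alarm = monitor.step(np.array([5.0, 0, 0, 0, 0]))  # breaks the correlations
```

In this sketch a sample that raises an alarm is excluded from the update, so a fault cannot pull the model towards itself. A multiscale version would run one such model per wavelet scale.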


2005 ◽  
Vol 3 (1) ◽  
pp. 1-9 ◽  
Author(s):  
Vasil Simeonov ◽  
Juergen Einax ◽  
Stefan Tsakovski ◽  
Joerg Kraft

Abstract: This study deals with the application of several multivariate statistical methods (cluster analysis, principal components analysis, multiple regression on absolute principal components scores) for assessment of soil pollution by heavy metals. The sampling was performed in a heavily polluted region, and the chemometric analysis revealed four latent factors responsible for the data structure, which together describe 84.5 % of the total variance of the system. These factors, whose identity was also confirmed by cluster analysis, were conditionally named “ore specific”, “metal industrial”, “cement industrial”, and “steel production” factors. Further, the contribution of each identified factor to the total pollution of the soil by each metal pollutant under consideration was determined.
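
Regression on absolute principal components scores (APCS) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's actual workflow: the function name `apcs_mlr`, the two invented source profiles and the omission of factor rotation (varimax rotation is commonly applied in practice) are all assumptions. The scores are shifted by the score of an artificial sample with zero concentration of every pollutant, and each pollutant is then regressed on the shifted scores.

```python
import numpy as np


def apcs_mlr(X, n_factors):
    """Apportion mean concentrations in X (samples x pollutants) to PCA factors.

    Hypothetical helper: unrotated PCA on the correlation matrix, then
    ordinary least squares of every pollutant on the absolute scores.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std

    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    W = eigvecs[:, np.argsort(eigvals)[::-1][:n_factors]]

    # Score of an artificial sample with zero concentration everywhere.
    z0 = (0.0 - mean) / std
    apcs = Z @ W - z0 @ W

    design = np.column_stack([np.ones(len(X)), apcs])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    intercept = coef[0]                                   # unexplained part, per pollutant
    contributions = coef[1:] * apcs.mean(axis=0)[:, None]  # factors x pollutants
    return intercept, contributions


# Synthetic example: two sources with assumed emission profiles for four metals.
rng = np.random.default_rng(0)
strengths = rng.gamma(2.0, 1.0, size=(200, 2))
profiles = np.array([[5.0, 1.0, 0.5, 2.0],
                     [0.2, 3.0, 4.0, 1.0]])
X = strengths @ profiles + rng.normal(0.0, 0.05, size=(200, 4))

intercept, contributions = apcs_mlr(X, n_factors=2)
```

Because each regression includes an intercept, the intercept plus the summed factor contributions reproduces the mean concentration of each pollutant, which is what allows the contributions to be read as shares of the total.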


2006 ◽  
Vol 53 (4) ◽  
pp. 427-437 ◽  
Author(s):  
Mirko Savic

Over the last decade, the employment structure has been one of the fastest-changing aspects of Eastern Europe. This paper explores the best methodology for comparing the employment situations of the countries in this region. Multivariate statistical analyses are very reliable in portraying the full picture of the problem. Principal components analysis is one of the simplest multivariate methods, and it can produce very useful information about Eastern European employment in an easy and understandable way.


2013 ◽  
Vol 17 (7) ◽  
pp. 1476-1485 ◽  
Author(s):  
Kate Northstone ◽  
Andrew DAC Smith ◽  
Victoria L Cribb ◽  
Pauline M Emmett

Abstract
Objective: To derive dietary patterns using principal components analysis from separate FFQ completed by mothers and their teenagers and to assess associations with nutrient intakes and sociodemographic variables.
Design: Two distinct FFQ were completed by 13-year-olds and their mothers, with some overlap in the foods covered. A combined data set was obtained.
Setting: Avon Longitudinal Study of Parents and Children (ALSPAC), Bristol, UK.
Subjects: Teenagers (n 5334) with adequate dietary data.
Results: Four patterns were obtained using principal components analysis: a ‘Traditional/health-conscious’ pattern, a ‘Processed’ pattern, a ‘Snacks/sugared drinks’ pattern and a ‘Vegetarian’ pattern. The ‘Traditional/health-conscious’ pattern was the most nutrient-rich, having high positive correlations with many nutrients. The ‘Processed’ and ‘Snacks/sugared drinks’ patterns showed little association with important nutrients but were positively associated with energy, fats and sugars. There were clear gender and sociodemographic differences across the patterns. Lower scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in males and in those with younger and less educated mothers. Higher scores were seen on the ‘Traditional/health-conscious’ and ‘Vegetarian’ patterns in girls and in those whose mothers had higher levels of education.
Conclusions: It is important to establish healthy eating patterns by the teenage years. However, this is a time when it is difficult to accurately establish dietary intake from a single source, since teenagers consume increasing amounts of foods outside the home. Further dietary pattern studies should focus on teenagers and the source of dietary data collection merits consideration.


Open Medicine ◽  
2012 ◽  
Vol 7 (4) ◽  
pp. 465-474 ◽  
Author(s):  
Ventzislav Bardarov ◽  
Pavlina Simeonova ◽  
Ludmila Neikova ◽  
Krum Bardarov ◽  
Vasil Simeonov ◽  
...  

Abstract: An attempt is made to assess a set of biochemical, kinetic and anthropometric data for patients suffering from alcohol abuse (alcoholics) and healthy patients (non-alcoholics). The main goal is to identify the data set structure, finding groups of similarity among the clinical parameters or among the patients. Multivariate statistical methods (cluster analysis and principal components analysis) were used to assess the data collection. Several significant patterns of related parameters were found to be representative of the role of the liver function, kinetic and anthropometric indicators (conditionally named “liver function factor”, “ethanol metabolism factor”, “body weight factor”, and “acetaldehyde metabolic factor”). An effort is made to connect the role of kinetic parameters for acetaldehyde metabolism with biochemical, ethanol kinetic and anthropometric data in parallel.


1984 ◽  
Vol 18 (11) ◽  
pp. 2471-2478 ◽  
Author(s):  
J. Smeyers-Verbeke ◽  
J.C. Den Hartog ◽  
W.H. Dehker ◽  
D. Coomans ◽  
L. Buydens ◽  
...  

1994 ◽  
Vol 2 (4) ◽  
pp. 185-198 ◽  
Author(s):  
Joseph G. Montalvo ◽  
Steven E. Buco ◽  
Harmon H. Ramey

In Part I of this series, both cotton fibre property and reflectance spectra data on 185 US cottons including four Pimas were analysed by descriptive statistics. In this paper, principal components regression (PCR) models for measuring six properties from the cotton's vis/NIR reflectance spectra are critically examined. These properties are upper-half mean length (UHM), uniformity index (UI), bundle strength (STR), micronaire (MIC) and colour (Rd and +b). The spectra were recorded with a scanning spectrophotometer in the wavelength range from 400 to 2498 nm. A variety of spectral processing options, some of which give improved PCR analysis results, were applied prior to the regressions and allowed for testing of over 100 PCR models. All PCR model results are based on the PRESS statistic by one-out-rotation, a fast approximation of the PRESS statistic (to reduce computer time) or on cluster analysis using separate calibration and validation data sets. The standard error of prediction (SEP) of all the properties except UHM compared well to the reference method precision. The precision of the UHM measure by reflectance spectroscopy was strongly influenced by the sample repack error. The SEP of UHM, UI and STR was improved by excluding the Pimas from the data set.
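
Principal components regression scored by a leave-one-out PRESS statistic can be sketched as follows. The data are synthetic stand-ins for spectra and a fibre property, and the function names (`pcr_fit`, `pcr_predict`, `press_loo`), the noise levels and the range of component counts are assumptions made for illustration. The spectral preprocessing options and the fast PRESS approximation mentioned in the abstract are not reproduced.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Regress y on the first n_components principal component scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # loadings, variables x components
    gamma, *_ = np.linalg.lstsq(Xc @ V, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ gamma             # coefficients in the original variables


def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta


def press_loo(X, y, n_components):
    """Prediction error sum of squares with each sample left out in turn."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        model = pcr_fit(X[keep], y[keep], n_components)
        press += (y[i] - pcr_predict(model, X[i:i + 1])[0]) ** 2
    return press


# Synthetic "spectra": 40 samples, 60 wavelengths, three underlying components.
rng = np.random.default_rng(1)
latent = rng.standard_normal((40, 3))
spectra = latent @ rng.standard_normal((3, 60)) + 0.01 * rng.standard_normal((40, 60))
prop = latent @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(40)

press_by_k = {k: press_loo(spectra, prop, k) for k in range(1, 7)}
best_k = min(press_by_k, key=press_by_k.get)
sep = np.sqrt(press_by_k[best_k] / len(prop))    # cross-validated standard error of prediction
```

Refitting the model inside the loop keeps each left-out sample fully independent of the loadings used to predict it, at the cost of one SVD per sample.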


2006 ◽  
Vol 23 (3) ◽  
pp. 106-118 ◽  
Author(s):  
Gordon E. Sarty ◽  
Kinwah Wu

Abstract: The ratios of hydrogen Balmer emission line intensities in cataclysmic variables are signatures of the physical processes that produce them. To quantify those signatures relative to classifications of cataclysmic variable types, we applied the multivariate statistical analysis methods of principal components analysis and discriminant function analysis to the spectroscopic emission data set of Williams (1983). The two analysis methods reveal two different sources of variation in the ratios of the emission lines. The source of variation seen in the principal components analysis was shown to be correlated with the binary orbital period. The source of variation seen in the discriminant function analysis was shown to be correlated with the equivalent width of the Hβ line. Comparison of the data scatterplot with scatterplots of theoretical models shows that Balmer line emission from T CrB systems is consistent with the photoionization of a surrounding nebula. Otherwise, models that we considered do not reproduce the wide range of Balmer decrements, including ‘inverted’ decrements, seen in the data.


2019 ◽  
Author(s):  
Fred L. Bookstein

Abstract: Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are actually fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure bioscientific inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically (it is never authoritative), and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
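
The first pathology described above can be reproduced in a small simulation. This is a sketch under stated assumptions, not the paper's own analysis: the function name `bgpca_between_fraction`, the group sizes, the seed and the summary statistic (the share of score variance lying between groups) are all chosen here for illustration. Every specimen is drawn from the same Gaussian, so any group separation in the scores is fictitious.

```python
import numpy as np


def bgpca_between_fraction(p, n_per_group=10, n_groups=3, seed=0):
    """Share of bgPCA score variance that lies between groups under a null model."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_groups * n_per_group, p))   # no group structure at all
    labels = np.repeat(np.arange(n_groups), n_per_group)

    Xc = X - X.mean(axis=0)
    means = np.vstack([Xc[labels == g].mean(axis=0) for g in range(n_groups)])

    # bgPCA: principal axes of the group means, then project every specimen.
    _, _, Vt = np.linalg.svd(means - means.mean(axis=0), full_matrices=False)
    scores = Xc @ Vt[: n_groups - 1].T

    group_scores = np.vstack([scores[labels == g].mean(axis=0) for g in range(n_groups)])
    between = n_per_group * np.sum(group_scores ** 2)
    return between / np.sum(scores ** 2)


high_pn = bgpca_between_fraction(p=300)   # ten times more variables than specimens
low_pn = bgpca_between_fraction(p=3)      # far fewer variables than specimens
```

With 300 variables and 30 specimens, nearly all of the variance on the bgPCA axes is between-group variance even though the groups are identical by construction. With only three variables the same statistic stays small.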


2013 ◽  
Vol 43 (1) ◽  
pp. 57-78
Author(s):  
Masakazu Fujiwara ◽  
Tomohiro Minamidani ◽  
Isamu Nagai ◽  
Hirofumi Wakaki
