Principal components analysis of employment in Eastern Europe

2006 ◽  
Vol 53 (4) ◽  
pp. 427-437 ◽  
Author(s):  
Mirko Savic

Over the last decade, employment structure has been one of the fastest-changing aspects of Eastern European economies. This paper explores the best methodology for comparing the employment situations of the countries in this region. Multivariate statistical analyses are well suited to portraying the full picture of the problem. Principal components analysis is one of the simplest multivariate methods, and it can produce very useful information about Eastern European employment in an accessible, understandable way.
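
As a minimal illustration of the approach the abstract describes, the sketch below runs a principal components analysis on a hypothetical country-by-sector matrix of employment shares. The country names and figures are invented for the example, not taken from the paper.

```python
# A minimal PCA sketch on hypothetical employment-structure data:
# rows are countries, columns are employment shares by sector.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

countries = ["Poland", "Hungary", "Czechia", "Romania", "Bulgaria", "Serbia"]
# Columns: agriculture, industry, services (percent of total employment).
# All values are illustrative.
shares = np.array([
    [18.0, 30.5, 51.5],
    [ 6.2, 33.1, 60.7],
    [ 4.5, 39.8, 55.7],
    [32.6, 26.2, 41.2],
    [11.3, 32.7, 56.0],
    [23.9, 27.4, 48.7],
])

# Standardize so each sector contributes on a comparable scale,
# then extract the two leading components.
X = StandardScaler().fit_transform(shares)
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

for name, (pc1, pc2) in zip(countries, scores):
    print(f"{name:10s} PC1={pc1:+.2f}  PC2={pc2:+.2f}")
print("explained variance ratios:", pca.explained_variance_ratio_)
```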

2019 ◽  
Author(s):  
Fred L. Bookstein

Abstract Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as "high p/n," where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are actually fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure bioscientific inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically (it is never authoritative), and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
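
The first pathology is easy to reproduce in simulation. The sketch below, written for this listing rather than taken from the paper, draws patternless Gaussian data, assigns equal-size groups arbitrarily, runs bgPCA (a PCA of the group means followed by projection of all specimens), and reports how far apart the group centroids land relative to the within-group spread. Under this null model any apparent separation is fictitious; the dimensions and counts are illustrative.

```python
# Simulation sketch of the null-model bgPCA pathology: patternless
# Gaussian data, arbitrary equal-size groups, yet bgPCA shows the
# groups as well separated.
import numpy as np

rng = np.random.default_rng(0)
p, n_groups, per_group = 300, 3, 20   # high p/n: 300 variables, 60 specimens
X = rng.standard_normal((n_groups * per_group, p))
labels = np.repeat(np.arange(n_groups), per_group)

# bgPCA: principal components of the group means, then project everyone.
means = np.array([X[labels == g].mean(axis=0) for g in range(n_groups)])
centered = means - means.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
scores = (X - means.mean(axis=0)) @ Vt[:2].T

# Compare the distance between group centroids to the within-group
# spread: the separation looks real even though the model is null.
for g in range(n_groups):
    s = scores[labels == g]
    print(f"group {g}: centroid {s.mean(axis=0).round(2)}, "
          f"sd {s.std(axis=0).round(2)}")
```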


2015 ◽  
Vol 11 (1) ◽  
pp. 1-7 ◽  
Author(s):  
Ned Kock

Simpson's paradox is a phenomenon arising from multivariate statistical analyses that often leads to paradoxical conclusions in the field of e-collaboration as well as many other fields where multivariate methods are employed. This work derives a general inequality for the occurrence of Simpson's paradox in path models with or without latent variables. The inequality is then used to estimate the probability that Simpson's paradox would occur at random in path models with two predictors and one criterion variable. This probability is found to be approximately 12.8 percent, slightly higher than 1 occurrence per 8 path models. This estimate suggests that Simpson's paradox is likely to occur in empirical studies, in the field of e-collaboration and other fields, frequently enough to be a source of concern.
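
Kock derives the 12.8 percent figure analytically; a Monte Carlo version of the same question is sketched below. The sign criterion (a path coefficient that disagrees in sign with its bivariate correlation) follows the paper, but the scheme for sampling "random" models is an assumption of this sketch, so the simulated frequency need not match the analytic estimate exactly.

```python
# Monte Carlo sketch: how often does Simpson's paradox occur in a
# random path model with two predictors (x1, x2) and one criterion (y)?
import numpy as np

rng = np.random.default_rng(0)
trials, hits = 100_000, 0

for _ in range(trials):
    # Random positive-definite covariance, then its correlation matrix.
    A = rng.standard_normal((3, 3))
    cov = A @ A.T
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)
    r12, r1y, r2y = corr[0, 1], corr[0, 2], corr[1, 2]

    # Standardized path coefficients for y regressed on x1 and x2.
    denom = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom

    # Simpson's paradox: a path coefficient and the corresponding
    # correlation have different signs.
    if b1 * r1y < 0 or b2 * r2y < 0:
        hits += 1

print(f"estimated probability: {hits / trials:.3f}")
```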


2002 ◽  
Vol 45 (4-5) ◽  
pp. 227-235 ◽  
Author(s):  
J. Lennox ◽  
C. Rosen

Fault detection and isolation (FDI) are important steps in the monitoring and supervision of industrial processes. Biological wastewater treatment (WWT) plants are difficult to model, and hence to monitor, because of the complexity of the biological reactions and because plant influent and disturbances are highly variable and/or unmeasured. Multivariate statistical models have been developed for a wide variety of situations over the past few decades and have proved successful in many applications. In this paper we develop a new monitoring algorithm based on principal components analysis (PCA). It can be viewed equivalently as making multiscale PCA (MSPCA) adaptive, or as a multiscale decomposition of adaptive PCA. Adaptive multiscale PCA (AdMSPCA) exploits the changing multivariate relationships between variables at different time scales. Adapting the PCA model at each scale over time permits the models to follow the evolution of the process, its inputs, and its disturbances. The performance of AdMSPCA and adaptive PCA on a real WWT data set is compared and contrasted. The most significant difference observed was the ability of AdMSPCA to adapt to a much wider range of changes, mainly due to the flexibility afforded by allowing each scale model to adapt whenever it did not signal an abnormal event at that scale. Relative detection speeds were examined only summarily but appeared to depend on the characteristics of the faults or disturbances. The two algorithms behaved similarly for sudden changes, but AdMSPCA appeared more sensitive to slower changes.
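
The core monitoring step that AdMSPCA builds on can be sketched briefly: fit a PCA model on normal operating data, then flag samples whose squared prediction error (Q statistic) exceeds a control limit. The sketch below uses synthetic data and an empirical-quantile limit, and it omits the wavelet multiscale decomposition and per-scale adaptation that distinguish the full algorithm.

```python
# Minimal PCA fault-detection sketch: Q statistic against a control
# limit estimated from normal operating data. Data are synthetic.
import numpy as np

rng = np.random.default_rng(1)

# "Normal" training data: 500 samples of 6 correlated process variables.
latent = rng.standard_normal((500, 2))
loadings = rng.standard_normal((2, 6))
X_train = latent @ loadings + 0.1 * rng.standard_normal((500, 6))

mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z = (X_train - mean) / std

# PCA via SVD; retain 2 components.
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:2].T                          # loadings matrix (6 x 2)

def q_statistic(x):
    """Squared prediction error of sample x against the PCA model."""
    z = (x - mean) / std
    residual = z - P @ (P.T @ z)
    return float(residual @ residual)

# Control limit: 99th percentile of Q on the training data (empirical).
q_train = np.array([q_statistic(x) for x in X_train])
limit = np.quantile(q_train, 0.99)

# A faulty sample: break the correlation structure on one variable.
fault = X_train[0].copy()
fault[3] += 5 * std[3]
print(f"Q={q_statistic(fault):.2f}, limit={limit:.2f}, "
      f"fault detected: {q_statistic(fault) > limit}")
```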


2020 ◽  
Vol 25 (1-2) ◽  
pp. 35-56
Author(s):  
Vasil Simeonov

Abstract The present introductory course of lectures summarizes the principles and algorithms of several widely used multivariate statistical methods: cluster analysis, principal components analysis, principal components regression, N-way principal components analysis, partial least squares regression, and self-organizing maps, with respect to their possible application to the intelligent analysis, classification, modelling, and interpretation of environmental monitoring data. The target audience is master's students in programs such as environmental chemistry, analytical chemistry, and environmental modelling and risk assessment.
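
As a taste of the lecture material, the sketch below applies two of the listed methods, cluster analysis and principal components analysis, to a hypothetical monitoring matrix of sampling sites by measured parameters. Parameter names and values are illustrative only.

```python
# Sketch: hierarchical clustering and PCA on a hypothetical
# environmental monitoring matrix (sites x parameters).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# 12 sites x 5 parameters (e.g. pH, conductivity, NO3, PO4, turbidity).
X = rng.normal(loc=[7.2, 450, 2.1, 0.15, 3.0],
               scale=[0.4, 120, 0.9, 0.05, 1.2],
               size=(12, 5))

Z = StandardScaler().fit_transform(X)

# Hierarchical clustering (Ward linkage) cut into three site groups.
groups = fcluster(linkage(Z, method="ward"), t=3, criterion="maxclust")

# PCA scores give a two-dimensional overview of the same sites.
scores = PCA(n_components=2).fit_transform(Z)
for i, (g, (pc1, pc2)) in enumerate(zip(groups, scores), start=1):
    print(f"site {i:2d}: cluster {g}, PC1={pc1:+.2f}, PC2={pc2:+.2f}")
```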


2019 ◽  
Vol 46 (4) ◽  
pp. 271-302 ◽  
Author(s):  
Fred L. Bookstein

Abstract Good empirical applications of geometric morphometrics (GMM) typically involve several times more variables than specimens, a situation the statistician refers to as “high p/n,” where p is the count of variables and n the count of specimens. This note calls your attention to two predictable catastrophic failures of one particular multivariate statistical technique, between-groups principal components analysis (bgPCA), in this high-p/n setting. The more obvious pathology is this: when applied to the patternless (null) model of p identically distributed Gaussians over groups of the same size, both bgPCA and its algebraic equivalent, partial least squares (PLS) analysis against group, necessarily generate the appearance of huge equilateral group separations that are fictitious (absent from the statistical model). When specimen counts by group vary greatly or when any group includes fewer than about ten specimens, an even worse failure of the technique obtains: the smaller the group, the more likely a bgPCA is to fictitiously identify that group as the end-member of one of its derived axes. For these two reasons, when used in GMM and other high-p/n settings the bgPCA method very often leads to invalid or insecure biological inferences. This paper demonstrates and quantifies these and other pathological outcomes both for patternless models and for models with one or two valid factors, then offers suggestions for how GMM practitioners should protect themselves against the consequences for inference of these lamentably predictable misrepresentations. The bgPCA method should never be used unskeptically—it is always untrustworthy, never authoritative—and whenever it appears in partial support of any biological inference it must be accompanied by a wide range of diagnostic plots and other challenges, many of which are presented here for the first time.
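
The second failure mode, the small-group end-member effect, can also be simulated directly. In the sketch below (illustrative dimensions, not the paper's), the same patternless model is used but one group has far fewer specimens; its mean vector is the noisiest, so it tends to land farthest out on a derived bgPCA axis.

```python
# Simulation sketch of the small-group pathology: a null model with
# very unequal group sizes, where the smallest group's noisy mean is
# pulled out as the extreme of a bgPCA axis.
import numpy as np

rng = np.random.default_rng(42)
p = 300
sizes = [40, 40, 40, 5]                # one very small group
labels = np.repeat(np.arange(len(sizes)), sizes)
X = rng.standard_normal((sum(sizes), p))

means = np.array([X[labels == g].mean(axis=0) for g in range(len(sizes))])
centered = means - means.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
mean_scores = centered @ Vt[:2].T

# The small group's centroid tends to sit farthest from the origin,
# i.e. at the end-member position of a derived axis.
for g, s in enumerate(mean_scores):
    print(f"group {g} (n={sizes[g]}): |score| = {np.linalg.norm(s):.2f}")
```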


1979 ◽  
Vol 57 (8) ◽  
pp. 1693-1709 ◽  
Author(s):  
Bernadette Pinel-Alloul ◽  
Pierre Legendre ◽  
Etienne Magnin

From June through October 1973, 335 samples of limnetic plankton were collected from 46 lakes and 17 rivers of the James Bay area. Sixty zooplanktonic species were identified (20 Copepoda, 27 Cladocera, and 13 Rotifera). The most common and widespread species are cold stenotherms (Leptodiaptomus minutus, Diacyclops bicuspidatus thomasi, Epischura lacustris, Holopedium gibberum, Bosmina longirostris, Daphnia longiremis, and Kellicottia longispina). To study the typology of the lake samples, the data were subjected to three types of statistical analysis: principal components analysis, single-linkage clustering, and complete-linkage clustering. Five groups of lakes emerged from these analyses: types IV and V are located in the northeastern portion of the studied area, whereas types I and II occur in the western portion, corresponding with the area formerly occupied by the Tyrrell glacial sea. Type III fills an intermediate position. Types II, III, and V are small lakes. The characteristic zooplanktonic communities of each group are described, and the principal components and diversity components are correlated with the environmental data.
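
For readers unfamiliar with the pipeline named here, the sketch below pairs principal components analysis with single- and complete-linkage clustering on a hypothetical lakes-by-species abundance matrix; the counts are simulated, not the survey's data.

```python
# Sketch of the named pipeline: PCA ordination plus single- and
# complete-linkage clustering of a hypothetical abundance matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
abundance = rng.poisson(lam=4.0, size=(15, 8)).astype(float)  # 15 lakes, 8 taxa

# Log-transform counts (a common pre-treatment), then ordinate with PCA.
X = np.log1p(abundance)
scores = PCA(n_components=2).fit_transform(X)

# Two clusterings of the same lakes, each cut into five groups as in
# the paper's typology.
single = fcluster(linkage(X, method="single"), t=5, criterion="maxclust")
complete = fcluster(linkage(X, method="complete"), t=5, criterion="maxclust")
for i in range(15):
    print(f"lake {i + 1:2d}: single={single[i]}, complete={complete[i]}, "
          f"PC1={scores[i, 0]:+.2f}")
```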

