Theoretical Analysis of Principal Components in an Umbrella Model of Intraspecific Evolution

Mapping Intimacies ◽

10.1101/2021.11.28.470252 ◽

2021 ◽

Author(s):

Maxime Estavoyer ◽

Olivier Francois

Keyword(s):

Principal Components ◽

Rare Variants ◽

Isolation By Distance ◽

Geographic Distance ◽

Principal Component ◽

Ancestral Population ◽

Alternative Theory ◽

Multilocus Genotype ◽

Coalescent Simulations ◽

Largest Eigenvalues

Principal component analysis (PCA) is one of the most frequently used approaches to describe population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, however, interpretations of PCA have been questioned, as there is uncertainty about the wave-like patterns observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model, the umbrella model, for the diffusion of genetic variants. The model is based on a hierarchy of splits from an ancestral population without any particular geographical structure. In the umbrella model, splits occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of the eigenvalues and eigenvectors of the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors take the form of cosine functions of increasing periodicity, reproducing the wave-like patterns observed in equilibrium isolation-by-distance models. When rare variants are included in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift alone, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.
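
The abstract contrasts PCA with and without variants carried by a single sampled sequence. As a minimal sketch, assuming haploid sequences coded as a 0/1 matrix (rows are sequences, columns are variants), the filter and the PCA itself might look like this; the function name `pca_sequences` is hypothetical and this is not the authors' code, nor a simulation of the umbrella model.

```python
import numpy as np

def pca_sequences(G, drop_singletons=True, n_components=2):
    """PCA of a (sequences x variants) 0/1 matrix (hypothetical helper).

    If drop_singletons is True, variants carried by exactly one sampled
    sequence are removed before the analysis.
    """
    G = np.asarray(G, dtype=float)
    if drop_singletons:
        G = G[:, G.sum(axis=0) != 1]
    # Centre each variant (column) on its sample mean.
    X = G - G.mean(axis=0)
    # Thin SVD: rows of Vt are variant loadings, U * s are sequence scores.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigenvalues = s**2 / (X.shape[0] - 1)
    scores = U[:, :n_components] * s[:n_components]
    return eigenvalues[:n_components], scores, G.shape[1]
```

Plotting `scores[:, k]` against sequence order (for example, split order) is where wave-like shapes would be inspected.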


Genetic divergence correlates with the contemporary landscape in populations of Slimy Salamander (Plethodon glutinosus) species complex across the lower Piedmont and Coastal Plain of the southeastern United States

Canadian Journal of Zoology ◽

10.1139/cjz-2018-0050 ◽

2018 ◽

Vol 96 (11) ◽

pp. 1244-1254 ◽

Cited By ~ 1

Author(s):

Walter H. Smith ◽

Jessica A. Wooten ◽

Carlos D. Camp ◽

Dirk J. Stevenson ◽

John B. Jensen ◽

...

Keyword(s):

South Carolina ◽

Genetic Distance ◽

Principal Components ◽

Coastal Plain ◽

Genetic Divergence ◽

Species Complex ◽

Isolation By Distance ◽

Geographic Distance ◽

Plethodon Glutinosus ◽

Isolation By Environment

A primary goal of landscape genetics is to elucidate factors associated with genetic structure among populations. Among the important patterns identified have been isolation by distance (IBD), isolation by barrier (IBB), and isolation by environment (IBE). We tested hypotheses relating each of these possible patterns to genetic divergence in the Slimy Salamander (Plethodon glutinosus (Green, 1818)) species complex across the lower Piedmont and Coastal Plain of Georgia, USA, and adjacent areas of South Carolina, USA. We sequenced 2148 bp in total, including three regions of the mitochondrial genome and a nuclear intron, and related genetic distance to GIS-derived surrogate variables representing possible IBD (geographic distance), IBE (principal components of 19 climate variables, watershed, and normalized difference vegetation index (NDVI)), and IBB (streams of fourth order and higher). Multiple matrix regression with randomization analysis indicated significant relationships between genetic distance and two principal components of climate, as well as NDVI. These results support a role for environment (IBE) in driving genetic divergence in this group of salamanders. The absence of a significant influence of IBD and IBB was surprising. It is possible that the signals of geographic distance and barriers in genetic divergence have been erased by more recent responses to the environment.
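
Multiple matrix regression with randomization (MMRR) regresses pairwise genetic distances on pairwise predictor distances and obtains p-values by permuting population labels. The sketch below is a simplified illustration, not the authors' implementation: it fits ordinary least squares on the upper-triangle entries and, as a simplification, compares coefficient magnitudes rather than t-statistics under permutation. The name `mmrr` and its parameters are hypothetical.

```python
import numpy as np

def mmrr(Y, Xs, n_perm=999, seed=0):
    """Simplified MMRR sketch.

    Y  : (n x n) symmetric genetic-distance matrix.
    Xs : list of (n x n) predictor matrices (e.g. geographic or climate distance).
    Returns OLS coefficients (intercept first) and a permutation p-value
    for each predictor.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    iu = np.triu_indices(n, k=1)  # each pair of populations counted once
    X = np.column_stack([np.ones(len(iu[0]))] + [M[iu] for M in Xs])

    def fit(Ymat):
        beta, *_ = np.linalg.lstsq(X, Ymat[iu], rcond=None)
        return beta

    beta_obs = fit(Y)
    exceed = np.zeros_like(beta_obs)
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Permute rows and columns of Y together, preserving its internal structure.
        exceed += np.abs(fit(Y[np.ix_(p, p)])) >= np.abs(beta_obs)
    pvals = (exceed + 1) / (n_perm + 1)
    return beta_obs, pvals[1:]
```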


Principals about principal components in statistical genetics

Briefings in Bioinformatics ◽

10.1093/bib/bby081 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2200-2216 ◽

Cited By ~ 4

Author(s):

Fentaw Abegaz ◽

Kridsadakorn Chaichoompu ◽

Emmanuelle Génin ◽

David W Fardo ◽

Inke R König ◽

...

Keyword(s):

Principal Components ◽

Rare Variants ◽

Association Studies ◽

Meta Analysis ◽

Principal Component ◽

Statistical Genetics ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Study Results

Abstract. Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables derived from an initial pool of variables while explaining as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding is required of the different implementations of PCA and of their impact on study results compared with alternative approaches. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variant analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
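
The definition above (uncorrelated variables that explain as much variance as possible) can be checked numerically. In this minimal sketch, which uses the covariance eigendecomposition as one of several possible implementations and a hypothetical function name, the covariance matrix of the PC scores is diagonal and its entries are the eigenvalues in decreasing order.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (descending), loadings and scores of an (n x p) data matrix."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Xc @ eigvec  # PC scores: uncorrelated new variables
    return eigval, eigvec, scores
```

The proportion of variance explained by each PC is then `eigval / eigval.sum()`.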


Structuring Assessments of Psychopathology

Journal of Individual Differences ◽

10.1027/1614-0001.27.2.87 ◽

2006 ◽

Vol 27 (2) ◽

pp. 87-92 ◽

Cited By ~ 2

Author(s):

Willem K.B. Hofstee ◽

Dick P.H. Barelds ◽

Jos M.F. Ten Berge

Keyword(s):

Principal Components ◽

Personality Assessment ◽

Clinical Sample ◽

Principal Component ◽

Normal Sample ◽

Normative Sample ◽

Assessment Data ◽

Obsessive Compulsive ◽

Oblique Rotation ◽

Two Samples

Hofstee and Ten Berge (2004a) have proposed a new look at personality assessment data, based on a bipolar proportional (-1, …, 0, …, +1) scale, a corresponding coefficient of raw-scores likeness L = ΣXY/N, and raw-scores principal component analysis. In a normal sample, the approach resulted in a structure dominated by a first principal component, according to which most people are faintly to mildly socially desirable. We hypothesized that a more differentiated structure would arise in a clinical sample. We analyzed the scores of 775 psychiatric clients on the 132 items of the Dutch Personality Questionnaire (NPV). Compared with a normative sample (N = 3140), the eigenvalue of the first principal component was 1.7 times smaller, indicating that such clients have less personality (social desirability) in common. Still, the match between the structures in the two samples was excellent after oblique rotation of the loadings. We applied the abridged m-dimensional circumplex design, in which persons are typed by their two highest scores on the principal components, to the scores on the first four principal components. We identified five types: Indignant (1-), Resilient (1-2+), Nervous (1-2-), Obsessive-Compulsive (1-3-), and Introverted (1-4-), covering 40% of the psychiatric sample. Some 26% of the individuals had negligible scores on all type vectors. We discuss the potential and the limitations of our approach in a clinical context.
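
The raw-scores likeness L = ΣXY/N differs from a covariance in that item means are not subtracted. As a minimal sketch, assuming scores already lie on the -1 to +1 scale (the helper names are hypothetical), raw-scores PCA is then an eigendecomposition of the item-by-item L matrix. The usage case illustrates why a shared tendency to answer in one direction shows up as a dominant first component.

```python
import numpy as np

def likeness_matrix(scores):
    """Raw-scores likeness L = sum(X*Y)/N for every pair of items.

    scores: (N persons x p items) on a bipolar -1..+1 scale; no centring.
    """
    S = np.asarray(scores, dtype=float)
    return S.T @ S / S.shape[0]

def raw_score_pca(scores):
    """Eigenvalues (descending) and loadings of the likeness matrix."""
    eigval, eigvec = np.linalg.eigh(likeness_matrix(scores))
    order = np.argsort(eigval)[::-1]
    return eigval[order], eigvec[:, order]
```

If every person scores +0.5 on all four items, centred PCA finds no variance at all, whereas raw-scores PCA yields a single non-zero eigenvalue of 1.0.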


Comparison of Principal Component Solutions in Two Populations

Methodology ◽

10.1027/1614-2241/a000099 ◽

2016 ◽

Vol 12 (1) ◽

pp. 11-20 ◽

Cited By ~ 1

Author(s):

Gregor Sočan

Keyword(s):

Simulation Study ◽

Principal Components ◽

Common Factor ◽

Principal Component ◽

Component Model ◽

Bootstrap Procedure ◽

Factor Loadings ◽

Component Loadings ◽

Principal Component Model ◽

Two Populations

Abstract. When principal component solutions are compared across two groups, a question arises whether the extracted components have the same interpretation in both populations. The problem can be approached by testing null hypotheses stating that the congruence coefficients between pairs of vectors of component loadings are equal to 1. Chan, Leung, Chan, Ho, and Yung (1999) proposed a bootstrap procedure for testing the hypothesis of perfect congruence between vectors of common factor loadings. We demonstrate that the procedure by Chan et al. is both theoretically and empirically inadequate for the application on principal components. We propose a modification of their procedure, which constructs the resampling space according to the characteristics of the principal component model. The results of a simulation study show satisfactory empirical properties of the modified procedure.
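
The null hypothesis above concerns congruence coefficients equal to 1. In this literature that usually means Tucker's coefficient, which is 1 exactly when two loading vectors are positively proportional. A minimal sketch follows (hypothetical function name). Neither the bootstrap procedure of Chan et al. nor the proposed modification is reproduced here.

```python
import math

def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den
```

Unlike a Pearson correlation, the coefficient does not centre the loadings, so it is sensitive to their overall sign and level.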


Stormwater inflow prediction using radar rainfall data compressed by principal component analysis

Water Practice & Technology ◽

10.2166/wpt.2006.017 ◽

2006 ◽

Vol 1 (1) ◽

Author(s):

K. Katayama ◽

K. Kimijima ◽

O. Yamanaka ◽

A. Nagaiwa ◽

Y. Ono

Keyword(s):

Principal Component Analysis ◽

Prediction Model ◽

Principal Components ◽

Prediction Method ◽

Principal Component ◽

Component Analysis ◽

Rainfall Data ◽

Radar Rainfall ◽

Input Variables ◽

Inflow Prediction

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input of a prediction model constructed by system identification. The aim is to construct a compact system by reducing the dimension of the input data. Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired in a combined sewer system. This study shows that a few principal components of the radar rainfall data are suitable as input variables for the stormwater inflow prediction model. Consequently, we have established a stormwater inflow prediction procedure that uses a few principal components of radar rainfall data.
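
The compression step can be illustrated as follows. Each radar frame is flattened into one row, projected onto the first `k` PCs, and the resulting scores serve as model inputs. This is a sketch under stated assumptions. The paper's system-identification model is not specified here, so ordinary least squares stands in as a placeholder, and both function names are hypothetical.

```python
import numpy as np

def compress_frames(frames, k):
    """Project flattened rainfall frames (T x cells) onto the first k PCs.

    Returns the (T x k) scores and the fraction of variance they retain.
    """
    mean = frames.mean(axis=0)
    centred = frames - mean
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ Vt[:k].T
    retained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, retained

def fit_inflow_model(scores, inflow):
    """Placeholder inflow model: least squares of inflow on PC scores."""
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, inflow, rcond=None)
    return coef
```

A dynamic model would normally also include lagged scores and lagged inflow. These are omitted here to keep the dimension-reduction step visible.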


Evaluation functions for integral mapping

Geodesy and Cartography ◽

10.22389/0016-7126-2017-921-3-24-29 ◽

2017 ◽

Vol 921 (3) ◽

pp. 24-29 ◽

Cited By ~ 2

Author(s):

S.I. Lesnykh ◽

A.K. Cherkashin

Keyword(s):

Principal Component Analysis ◽

Environmental Factors ◽

Principal Components ◽

Principal Component ◽

Evaluation Function ◽

Final Value ◽

Evaluation Functions ◽

Geographical Environment ◽

Environmental Background ◽

Integral Mapping

The proposed procedure of integral mapping is based on the calculation of evaluation functions on integral indicators (II) that take into account the features of the local geographical environment, since geosystems in the same state but in different environments receive different estimates. II is calculated by applying Principal Component Analysis to a forest database, which allows the weight of each indicator (attribute) to be taken into account in II. The final value of II is equal to the difference between the first (condition of the geosystem) and the second (condition of the environmental background) principal components. Evaluation functions are calculated from this value for various integral-mapping problems. Because environmental factors of variability are excluded from the final value of II, it becomes possible to find an invariant evaluation function and to determine its coefficients. Concepts and functions from reliability theory are used to produce evaluation maps of the hazard of functioning and of the stability of geosystems.
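
The core calculation, II as the difference between the first and second principal component scores, might be sketched as below. Several choices here are assumptions rather than details from the abstract: attributes are standardized column-wise, and because eigenvector signs are arbitrary, each component is oriented so that its loadings sum to a non-negative value. The function name is hypothetical.

```python
import numpy as np

def integral_indicator(indicators):
    """II = PC1 score minus PC2 score for each site (sketch).

    indicators: (sites x attributes) table; columns are standardized here.
    """
    Z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]
    W = eigvec[:, order[:2]]  # loadings act as attribute weights
    # Assumed sign convention: orient each PC so its loadings sum >= 0.
    W = W * np.where(W.sum(axis=0) < 0, -1.0, 1.0)
    pc1, pc2 = (Z @ W).T
    return pc1 - pc2
```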


EXPRESS: Exploration of Principal Component Analysis: Deriving PCA Visually Using Spectra

Applied Spectroscopy ◽

10.1177/0003702820987847 ◽

2021 ◽

pp. 000370282098784

Author(s):

James Renwick Beattie ◽

Francis Esmonde-White

Keyword(s):

Principal Components ◽

Principal Component ◽

Original Data ◽

Specific Chemical ◽

Successive Refinement ◽

Minimal Loss ◽

Components Analysis ◽

The Mathematical Model ◽

Application Specific ◽

Reconstructed Data

Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal Components Analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning PCA is not well understood by many of the applied analytical scientists and spectroscopists who use PCA, and the meaning of features identified through PCA is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind PCA, with each step illustrated by simulated spectra. PCA relies solely on the information within the spectra; consequently, the mathematical model depends on the nature of the data itself. The direct links between model and spectra allow a concrete spectroscopic explanation of PCA, such as the scores representing ‘concentration’ or ‘weights’. The principal components (loadings) are by definition hidden, repeated and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between the PC-reconstructed data and the original data. Understanding the data-led development of a PCA model shows how to interpret the application-specific chemical meaning of the PCA loadings and how to analyze scores. A critical benefit of PCA is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.
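
The "successive refinement" can be made concrete by reconstructing the spectra from the first k components and tracking the residual, which never increases as k grows. This minimal sketch mean-centres the spectra first. That is a common choice, but not the only one used in spectroscopy. The function name is hypothetical.

```python
import numpy as np

def reconstruction_errors(spectra, max_k):
    """Residual sum of squares after reconstructing mean-centred spectra
    from the first k principal components, for k = 0..max_k."""
    Xc = spectra - spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    errors = []
    for k in range(max_k + 1):
        scores = Xc @ Vt[:k].T   # 'concentrations' or weights
        recon = scores @ Vt[:k]  # loadings are the rows of Vt
        errors.append(float(((Xc - recon) ** 2).sum()))
    return errors
```

For simulated spectra built from two bands, the residual would be expected to fall to roughly the noise level at k = 2.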


The factor structure of the GHQ-60 in a community sample

Psychological Medicine ◽

10.1017/s0033291700002038 ◽

1988 ◽

Vol 18 (1) ◽

pp. 211-218 ◽

Cited By ~ 25

Author(s):

J. L. Vazquez-Barquero ◽

P. Williams ◽

J. F. Diez-Manrique ◽

J. Lequerica ◽

A. Arenal

Keyword(s):

Factor Structure ◽

Principal Components ◽

Good Description ◽

Social Performance ◽

Community Sample ◽

General Health Questionnaire ◽

Principal Component ◽

Northern Spain ◽

Relative Operating Characteristic ◽

Component Structure

Synopsis: The factor structure of the 60-item version of the General Health Questionnaire (GHQ) was explored, using data collected in a community study in a rural area of northern Spain. Six principal components, similar to those previously reported with this instrument, were found to provide a good description of the data structure. The 30-item and 12-item versions of the GHQ were then disembedded from the parent version, and further principal components analyses were carried out. Again, the results were similar to previous studies: in each of the three versions analysed here, the two most important components represented a disturbance of mood (‘general dysphoria’), including aspects of anxiety, depression and irritability, and a disturbance of social performance (‘social function/optimism’). The principal component structure of the GHQ-60 was then used to calculate factor scores, and these were compared with PSE ratings using Relative Operating Characteristic (ROC) analysis. While four of the six factors discriminated well (area under the ROC curve of 0.75 or more) between PSE ‘cases’ and ‘non-cases’, only one, depressive thoughts, was a good discriminator between depressed and non-depressed PSE ‘cases’.
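
The area under the ROC curve used as the discrimination criterion equals the probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as one half. A minimal sketch of that rank-based formulation follows. The function name is hypothetical, and the study's own software is not implied.

```python
def roc_auc(case_scores, noncase_scores):
    """AUC as P(case score > non-case score); ties count as 1/2."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))
```

Under the criterion above, a factor whose scores give `roc_auc(...) >= 0.75` would count as discriminating well.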


Introducing Asymmetry into Interneuron Learning

Neural Computation ◽

10.1162/neco.1995.7.6.1191 ◽

1995 ◽

Vol 7 (6) ◽

pp. 1191-1205 ◽

Cited By ~ 16

Author(s):

Colin Fyfe

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Negative Feedback ◽

Principal Components ◽

Network Architecture ◽

Hebbian Learning ◽

Principal Component ◽

Neural Network Architecture ◽

Weight Decay ◽

Artificial Neural Network Architecture

A review is given of a new artificial neural network architecture in which the weights converge to the principal component subspace. The weights are trained by simple Hebbian learning alone, yet require no clipping, normalization or weight decay. The network self-organizes using negative feedback of activation from a set of "interneurons" to the input neurons. By allowing this negative feedback from the interneurons to act on other interneurons, we can introduce the necessary asymmetry to cause convergence to the actual principal components. Simulations and analysis confirm such convergence.
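
The symmetric form of the negative feedback network can be sketched as follows. The interneurons compute y = Wx, the input layer receives the residual e = x - Wᵀy, and each weight is updated by the plain Hebbian product of y and e. Algebraically this matches Oja's subspace rule, so the rows of W should become an orthonormal basis of the principal subspace. The asymmetric interneuron-to-interneuron feedback that recovers the actual components is not reproduced. The function name and learning-rate values are assumptions.

```python
import numpy as np

def negative_feedback_pca(X, n_out, lr=0.005, epochs=20, seed=0):
    """Symmetric negative feedback network (sketch): rows of W converge
    to an orthonormal basis of the principal subspace of centred data X."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(n_out, X.shape[1]))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = W @ x                 # feedforward activation of interneurons
            e = x - W.T @ y           # negative feedback to the input neurons
            W += lr * np.outer(y, e)  # simple Hebbian update, no normalization
    return W
```

No explicit normalization or weight decay is applied. The feedback term alone keeps the weights bounded.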


The Design of Index about Corporate Governance Based on PCA Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.4085 ◽

2014 ◽

Vol 926-930 ◽

pp. 4085-4088

Author(s):

Chuan Jun Li

Keyword(s):

Principal Component Analysis ◽

Corporate Governance ◽

Principal Components ◽

Principal Component ◽

Component Analysis ◽

Contribution Rate ◽

Variance Contribution ◽

Pca Method

This article uses principal component analysis (PCA) to evaluate the level of corporate governance. PCA is used to analyze the correlation among 10 original indicators and to extract a few principal components that retain most of the information in the original indicators. A corporate governance index is then obtained by weighting the principal components according to their variance contribution rates, which allows corporate governance to be evaluated comprehensively.
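
A composite index weighted by variance contribution rates could be sketched as follows. The correlation matrix of the indicators is decomposed, enough components are kept to reach a cumulative contribution threshold, and their scores are averaged with weights proportional to each component's contribution. The 85% threshold is a common convention assumed here, not a value taken from the article. Component signs are arbitrary and may need orienting before interpretation. The function name is hypothetical.

```python
import numpy as np

def governance_index(indicators, cum_threshold=0.85):
    """Composite index from PCs weighted by variance contribution (sketch).

    indicators: (firms x indicators) table. Returns the index and the number
    of retained components.
    """
    Z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(indicators, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    contrib = eigval / eigval.sum()  # variance contribution rate of each PC
    m = int(np.searchsorted(np.cumsum(contrib), cum_threshold) + 1)
    m = min(m, len(eigval))
    scores = Z @ eigvec[:, :m]  # component scores F_1..F_m
    weights = contrib[:m] / contrib[:m].sum()
    return scores @ weights, m
```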
