Theoretical Analysis of Principal Components in an Umbrella Model of Intraspecific Evolution

2021
Author(s):
Maxime Estavoyer
Olivier Francois

Principal component analysis (PCA) is one of the most frequently used approaches for describing population structure from multilocus genotype data. Regarding geographic range expansions of modern humans, interpretations of PCA have, however, been questioned, as there is uncertainty about the wave-like patterns that have been observed in principal components. It has indeed been argued that wave-like patterns are mathematical artifacts that arise generally when PCA is applied to data in which genetic differentiation increases with geographic distance. Here, we present an alternative theory for the observation of wave-like patterns in PCA. We study a coalescent model, the umbrella model, for the diffusion of genetic variants. The model is based on a hierarchy of splits from an ancestral population without any particular geographical structure. In the umbrella model, splits occur almost continuously in time, giving birth to small daughter populations at a regular pace. Our results provide detailed mathematical descriptions of eigenvalues and eigenvectors for the PCA of sampled genomic sequences under the model. When variants uniquely represented in the sample are removed, the PCA eigenvectors are defined as cosine functions of increasing periodicity, reproducing wave-like patterns observed in equilibrium isolation-by-distance models. When rare variants are included in the analysis, the eigenvectors corresponding to the largest eigenvalues exhibit complex wave shapes. The accuracy of our predictions is further investigated with coalescent simulations. Our analysis supports the hypothesis that highly structured wave-like patterns could arise from genetic drift only, and may not always be artificial outcomes of spatially structured data. Genomic data related to the peopling of the Americas are reanalyzed in the light of our new theory.

2018
Vol 96 (11)
pp. 1244-1254
Author(s):
Walter H. Smith
Jessica A. Wooten
Carlos D. Camp
Dirk J. Stevenson
John B. Jensen
...  

A primary goal of landscape genetics is to elucidate factors associated with genetic structure among populations. Among the important patterns identified have been isolation by distance (IBD), isolation by barrier (IBB), and isolation by environment (IBE). We tested hypotheses relating each of these possible patterns to genetic divergence in the Slimy Salamander (Plethodon glutinosus (Green, 1818)) species complex across the lower Piedmont and Coastal Plain of Georgia, USA, and adjacent areas of South Carolina, USA. We sequenced 2148 total bp, including three regions of the mitochondrial genome and a nuclear intron, and related genetic distance to GIS-derived surrogate variables representing possible IBD (geographic distance), IBE (principal components of 19 climate variables, watershed, and normalized difference vegetation index (NDVI)), and IBB (streams of fourth order and higher). Multiple matrix regression with randomization analysis indicated significant relationships between genetic distance and two principal components of climate, as well as NDVI. These results support roles for environment (IBE) in helping to drive genetic divergence in this group of salamanders. The absence of a significant influence of IBD and IBB was surprising. It is possible that the signal effects of geographic distance and barriers on genetic divergence may have been erased by more recent responses to the environment.


2018
Vol 20 (6)
pp. 2200-2216
Author(s):
Fentaw Abegaz
Kridsadakorn Chaichoompu
Emmanuelle Génin
David W Fardo
Inke R König
...  

Abstract. Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables, derived from an initial pool of variables, that explain as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and of their impact on study results, compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.


2006
Vol 27 (2)
pp. 87-92
Author(s):
Willem K.B. Hofstee
Dick P.H. Barelds
Jos M.F. Ten Berge

Hofstee and Ten Berge (2004a) have proposed a new look at personality assessment data, based on a bipolar proportional (-1, ..., 0, ..., +1) scale, a corresponding coefficient of raw-scores likeness L = ΣXY/N, and raw-scores principal component analysis. In a normal sample, the approach resulted in a structure dominated by a first principal component, according to which most people are faintly to mildly socially desirable. We hypothesized that a more differentiated structure would arise in a clinical sample. We analyzed the scores of 775 psychiatric clients on the 132 items of the Dutch Personality Questionnaire (NPV). In comparison to a normative sample (N = 3140), the eigenvalue for the first principal component appeared to be 1.7 times as small, indicating that such clients have less personality (social desirability) in common. Still, the match between the structures in the two samples was excellent after oblique rotation of the loadings. We applied the abridged m-dimensional circumplex design, by which persons are typed by their two highest scores on the principal components, to the scores on the first four principal components. We identified five types: Indignant (1-), Resilient (1-2+), Nervous (1-2-), Obsessive-Compulsive (1-3-), and Introverted (1-4-), covering 40% of the psychiatric sample. Some 26% of the individuals had negligible scores on all type vectors. We discuss the potential and the limitations of our approach in a clinical context.


Methodology
2016
Vol 12 (1)
pp. 11-20
Author(s):
Gregor Sočan

Abstract. When principal component solutions are compared across two groups, the question arises whether the extracted components have the same interpretation in both populations. The problem can be approached by testing null hypotheses stating that the congruence coefficients between pairs of vectors of component loadings are equal to 1. Chan, Leung, Chan, Ho, and Yung (1999) proposed a bootstrap procedure for testing the hypothesis of perfect congruence between vectors of common factor loadings. We demonstrate that the procedure by Chan et al. is both theoretically and empirically inadequate for application to principal components. We propose a modification of their procedure, which constructs the resampling space according to the characteristics of the principal component model. The results of a simulation study show satisfactory empirical properties of the modified procedure.
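To make the quantity under test concrete, the sketch below computes a congruence coefficient between two loading vectors. It assumes the usual Tucker definition (the uncentered cosine between the vectors), which the abstract does not spell out; the function name `congruence` is illustrative, and neither the bootstrap procedure of Chan et al. nor the proposed modification is reproduced here.

```python
import math


def congruence(a, b):
    """Tucker-style congruence coefficient between two loading vectors.

    Assumed definition: sum(a*b) / sqrt(sum(a^2) * sum(b^2)),
    i.e. the cosine of the angle between the (uncentered) vectors.
    """
    if len(a) != len(b):
        raise ValueError("loading vectors must have the same length")
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if den == 0.0:
        raise ValueError("congruence is undefined for an all-zero vector")
    return num / den
```

A value of 1 means the two vectors are proportional with a positive factor, which is the "perfect congruence" stated in the null hypothesis; the coefficient is unaffected by rescaling either vector.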


2006
Vol 1 (1)
Author(s):
K. Katayama
K. Kimijima
O. Yamanaka
A. Nagaiwa
Y. Ono

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input of a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. In this paper, Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired in a combined sewer system. This study reveals that a few principal components of radar rainfall data can serve as the input variables of the stormwater inflow prediction model. Consequently, we have established a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.


2017
Vol 921 (3)
pp. 24-29
Author(s):
S.I. Lesnykh
A.K. Cherkashin

The proposed integral mapping procedure is based on calculating evaluation functions from integral indicators (II) that take into account the features of the local geographical environment, so that geosystems in the same state but in different surroundings receive different estimates. The II is calculated by applying Principal Component Analysis to a forest database, which makes it possible to account for the weight of each indicator (attribute) in the II. The final value of the II is equal to the difference between the first principal component (condition of the geosystem) and the second (condition of the environmental background). The evaluation functions are calculated from this value for various integral mapping problems. Because the environmental factors of variability are excluded from the final value of the II, it becomes possible to find an invariant evaluation function and to determine its coefficients. Concepts and functions from the theory of reliability are used to produce evaluation maps of the hazard of functioning and the stability of geosystems.


2021
pp. 000370282098784
Author(s):  
James Renwick Beattie ◽  
Francis Esmonde-White

Spectroscopy rapidly captures a large amount of data that is not directly interpretable. Principal Components Analysis (PCA) is widely used to simplify complex spectral datasets into comprehensible information by identifying recurring patterns in the data with minimal loss of information. The linear algebra underpinning PCA is not well understood by many applied analytical scientists and spectroscopists who use PCA. The meaning of features identified through PCA is often unclear. This manuscript traces the journey of the spectra themselves through the operations behind PCA, with each step illustrated by simulated spectra. PCA relies solely on the information within the spectra; consequently, the mathematical model depends on the nature of the data itself. The direct links between model and spectra allow a concrete spectroscopic explanation of PCA, such as the scores representing ‘concentration’ or ‘weights’. The principal components (loadings) are by definition hidden, repeated and uncorrelated spectral shapes that linearly combine to generate the observed spectra. They can be visualized as subtraction spectra between extreme differences within the dataset. Each PC is shown to be a successive refinement of the estimated spectra, improving the fit between PC-reconstructed data and the original data. Understanding the data-led development of a PCA model shows how to interpret the application-specific chemical meaning of the PCA loadings and how to analyze scores. A critical benefit of PCA is its simplicity and the succinctness of its description of a dataset, making it powerful and flexible.
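The "successive refinement" idea can be illustrated with a short NumPy sketch. It is not the authors' code: it assumes mean-centred PCA computed through a singular value decomposition, and the function name `pca_reconstruction_errors` is hypothetical. For each number of components k it rebuilds the spectra from the first k scores and loadings and records the residual norm.

```python
import numpy as np


def pca_reconstruction_errors(spectra):
    """Residual norm after rebuilding `spectra` from 0, 1, 2, ... PCs.

    `spectra` is an (n_spectra, n_points) array. Assumption: PCA is done
    on mean-centred data via SVD; loadings are the rows of Vt and scores
    are U * s.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
    errors = []
    for k in range(len(s) + 1):
        scores = U[:, :k] * s[:k]          # 'weights' of each loading per spectrum
        approx = mean + scores @ Vt[:k]    # spectra rebuilt from the first k PCs
        errors.append(float(np.linalg.norm(spectra - approx)))
    return errors
```

On simulated mixtures of two bands plus a little noise, the residual should drop sharply over the first two components and then flatten, since the remaining components mostly fit noise.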


1988
Vol 18 (1)
pp. 211-218
Author(s):
J. L. Vazquez-Barquero
P. Williams
J. F. Diez-Manrique
J. Lequerica
A. Arenal

Synopsis. The factor structure of the 60-item version of the General Health Questionnaire was explored, using data collected in a community study in a rural area of northern Spain. Six principal components, similar to those previously reported with this instrument, were found to provide a good description of the data structure. The 30-item and 12-item versions of the GHQ were then disembedded from the parent version, and further principal components analyses carried out. Again, the results were similar to previous studies: in each of the three versions analysed here, the two most important components represented a disturbance of mood (‘general dysphoria’), including aspects of anxiety, depression and irritability, and a disturbance of social performance (‘social function/optimism’). The principal component structure of the GHQ-60 was then utilized to calculate factor scores, and these were compared with PSE ratings using Relative Operating Characteristic (ROC) analysis. While four of the six factors discriminated well (area under the ROC curve 0.75 or more) between PSE ‘cases’ and ‘non-cases’, only one, depressive thoughts, was a good discriminator between depressed and non-depressed PSE ‘cases’.


1995
Vol 7 (6)
pp. 1191-1205
Author(s):
Colin Fyfe

A review is given of a new artificial neural network architecture in which the weights converge to the principal component subspace. The weights learn by only simple Hebbian learning yet require no clipping, normalization or weight decay. The net self-organizes using negative feedback of activation from a set of "interneurons" to the input neurons. By allowing this negative feedback from the interneurons to act on other interneurons we can introduce the necessary asymmetry to cause convergence to the actual principal components. Simulations and analysis confirm such convergence.
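As a rough illustration of the first mechanism described above, the following NumPy sketch trains a linear network in which the output activations are fed back negatively to the inputs before a plain Hebbian update. The specific update (y = Wx, e = x - Wᵀy, ΔW = η y eᵀ) is an assumption about the architecture, not a transcription of the paper; the names `train_negative_feedback_net` and `subspace_gap`, the learning rate, and the epoch count are all illustrative. The further interneuron-to-interneuron feedback that yields the actual principal components is not sketched.

```python
import numpy as np


def train_negative_feedback_net(X, n_outputs, lr=0.01, epochs=10, seed=0):
    """Hebbian learning with negative feedback (assumed update rule).

    X is an (n_samples, n_inputs) array of zero-mean inputs.
    Returns W with shape (n_outputs, n_inputs).
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_outputs, X.shape[1]))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = W @ x                  # feedforward activation of the interneurons
            e = x - W.T @ y            # negative feedback subtracted at the inputs
            W += lr * np.outer(y, e)   # simple Hebbian update on the residual input
    return W


def subspace_gap(W, X, k):
    """Frobenius distance between the projector onto the rows of W and
    the projector onto the top-k eigenvectors of the sample covariance."""
    _, vecs = np.linalg.eigh(np.cov(X, rowvar=False))  # eigenvalues ascending
    V = vecs[:, -k:]
    Q, _ = np.linalg.qr(W.T)           # orthonormal basis of the learned subspace
    return float(np.linalg.norm(Q @ Q.T - V @ V.T))
```

No explicit normalization or weight decay appears in the loop: because the Hebbian term acts on the residual `e` rather than on `x`, the weights are self-limiting. With a constant learning rate they keep fluctuating slightly around the principal subspace, so `subspace_gap` is expected to be small rather than exactly zero.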


2014
Vol 926-930
pp. 4085-4088
Author(s):  
Chuan Jun Li

This article uses principal component analysis (PCA) to evaluate the level of corporate governance. PCA is used to analyze the correlation among 10 original indicators and to extract a few principal components that retain most of the information in the original indicators. A corporate governance index is then formulated by weighting each principal component by its variance contribution rate, which allows corporate governance to be evaluated comprehensively.
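A composite index of this kind might be computed as in the sketch below. Several details are assumptions, since the abstract does not give them: indicators are standardized and PCA is run on their correlation matrix, components are kept until a cumulative variance threshold (85% here, an arbitrary choice) is reached, and each eigenvector's sign is fixed so that its loadings sum to a non-negative value. The function name `pca_composite_index` is hypothetical.

```python
import numpy as np


def pca_composite_index(X, var_threshold=0.85):
    """Composite score per row of X, weighting PCs by variance contribution.

    X is an (n_firms, n_indicators) array. Returns (index, weights, k),
    where k is the number of retained components (threshold is an assumption).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)      # standardize indicators
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]                     # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Sign convention (assumption): make each loading vector sum to >= 0.
    eigvecs = eigvecs * np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    ratio = eigvals / eigvals.sum()                       # variance contribution rates
    k = min(int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1, len(ratio))
    weights = ratio[:k] / ratio[:k].sum()                 # renormalize over kept PCs
    scores = Z @ eigvecs[:, :k]                           # principal component scores
    return scores @ weights, weights, k
```

Because eigenvector signs are arbitrary, the direction of such an index depends on the sign convention chosen; in practice one would check that higher scores correspond to better governance on the original indicators before ranking firms.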

