Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations

2021, Vol 11 (1)
Author(s): Sridevi Padakanti, Khong-Loon Tiong, Yan-Bin Chen, Chen-Hsiang Yeang

Abstract Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci, whose removal from the data degrades the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations in allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated in particular genomic locations and show functional enrichment. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
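The statement that population identities along each PC are approximately weighted sums of alleles over informative loci follows from how PCA projections are computed. The minimal NumPy sketch below, on simulated data (two hypothetical populations, 200 hypothetical SNPs coded as minor-allele counts 0/1/2), shows that an individual's PC score is exactly a weighted sum of centred allele counts, with the SNP loadings as weights. It does not implement the authors' algorithm for selecting necessary informative loci.

```python
import numpy as np

def pca_projections(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of minor-allele counts (0, 1, 2).

    Returns (scores, loadings, means): each score is a weighted sum of the
    centred allele counts, using the SNP loadings as weights.
    """
    g = np.asarray(genotypes, dtype=float)
    means = g.mean(axis=0)                # per-SNP mean allele count
    centred = g - means
    # SVD of the centred matrix: rows of vt are the SNP loading vectors
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T        # SNPs x components
    scores = centred @ loadings           # individuals x components
    return scores, loadings, means

rng = np.random.default_rng(0)
# Two hypothetical populations with different allele frequencies at 200 SNPs
freq_a = rng.uniform(0.05, 0.5, 200)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.3, 200), 0.01, 0.99)
geno = np.vstack([rng.binomial(2, freq_a, size=(30, 200)),
                  rng.binomial(2, freq_b, size=(30, 200))])

scores, loadings, means = pca_projections(geno)
# PC1 score of individual 0, written out as an explicit weighted sum over loci
manual = np.sum((geno[0] - means) * loadings[:, 0])
print(np.isclose(manual, scores[0, 0]))
```

Restricting the sum to a subset of loci (for example those with the largest absolute loadings) gives a partial projection; how the paper chooses its informative loci is not reproduced here.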


Mathematics, 2021, Vol 9 (4), pp. 351
Author(s): Malay Banerjee, Sergei V. Petrovskii, Vitaly Volpert

Dynamics of human populations can be affected by various socio-economic factors through their influence on the natality and mortality rates, and on the migration intensity and directions. In this work we study an economic–demographic model which takes into account the dependence of the wealth production rate on the available resources. In the case of nonlocal consumption of resources, the homogeneous-in-space wealth–population distribution is replaced by a periodic-in-space distribution for which the total wealth increases. For the global consumption of resources, if the wealth redistribution is small enough, then the homogeneous distribution is replaced by a heterogeneous one with a single wealth accumulation center. Thus, economic and demographic characteristics of nonlocal and global economies can be quite different in comparison with the local economy.


2019, Vol 9 (1)
Author(s): Andrew J. Pakstis, William C. Speed, Usha Soundararajan, Haseena Rajeevan, Judith R. Kidd, ...

Abstract The benefits of ancestry informative SNP (AISNP) panels can best accrue, and be properly evaluated, only once sufficient reference population data become readily accessible. Ideally, the set of reference populations should approximate the genetic diversity of human populations worldwide. The Kidd and Seldin AISNP sets are two panels that have, so far, separately accumulated the largest and most diverse collections of data on human reference populations from the major continental regions. A recent tally in the ALFRED allele frequency database finds 164 reference populations available for all 55 Kidd AISNPs and 132 reference populations for all 128 Seldin AISNPs. Although much more of the genetic diversity in human populations around the world still needs to be documented, 81 populations have genotype data available for all 170 AISNPs in the union of the Kidd and Seldin panels. In this report, we examine admixture and principal component analyses of these 81 worldwide populations, and of some regional subsets of these reference populations, to determine how well the combined panel illuminates population relationships. Analyses of this dataset that focused on Native American populations revealed very strong cluster patterns associated with many of the individual populations studied.


Author(s): Timothy Jinam, Yosuke Kawai, Yoichiro Kamatani, Shunro Sonoda, Kanro Makisumi, ...

Abstract The “Dual Structure” model of the formation of the modern Japanese population assumes that the indigenous hunter-gatherer population (symbolized as the Jomon people) admixed with a rice-farming population (symbolized as the Yayoi people) who migrated from the Asian continent after the Yayoi period started. The Jomon component remained high in both the Ainu and Okinawa people, who mainly reside in northern and southern Japan, respectively, while the Yayoi component is higher in the mainland Japanese (Yamato people). The model has been well supported by genetic data, but the Yamato population was mostly represented by people from the Tokyo area. We generated new genome-wide SNP data using the Japonica Array for 45 individuals in Izumo City of Shimane Prefecture and for 72 individuals in Makurazaki City of Kagoshima Prefecture in southern Kyushu, and compared these data with those of other human populations in East Asia, including BioBank Japan data. Using principal component analysis, a phylogenetic network, and f4 tests, we found that the Izumo, Makurazaki, and Tohoku populations are slightly differentiated from those of the Kanto (including Tokyo), Tokai, and Kinki regions. These results suggest that the substructure within mainland Japanese may be caused by multiple migration events from the Asian continent following the Jomon period, and we propose a modified version of the “Dual Structure” model called the “Inner-Dual Structure” model.
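The f4 tests mentioned above compare allele-frequency differences between two pairs of populations: f4(A,B;C,D) is the mean over SNPs of (pA − pB)(pC − pD), which is expected to be zero when A,B and C,D form separate clades of a tree. A minimal sketch, assuming allele frequencies are already estimated, with a delete-one-block jackknife Z-score in which blocks are simply contiguous runs of SNP indices (real analyses usually use genomic-distance blocks). The drift simulation is a crude Gaussian toy, not the authors' data or software.

```python
import numpy as np

def f4_with_z(pa, pb, pc, pd, n_blocks=20):
    """f4(A,B;C,D) = mean_j (pA_j - pB_j)(pC_j - pD_j), with a block-jackknife Z-score.

    Blocks are contiguous runs of SNP indices (a simplification of genomic blocks).
    """
    per_snp = (pa - pb) * (pc - pd)
    estimate = per_snp.mean()
    blocks = np.array_split(np.arange(per_snp.size), n_blocks)
    # Delete-one-block estimates of the mean
    loo = np.array([np.delete(per_snp, idx).mean() for idx in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se

rng = np.random.default_rng(1)
n_snps = 20000

def drift(p, sd):
    """Crude Gaussian drift on allele frequencies, clipped to [0, 1] (toy model)."""
    return np.clip(p + rng.normal(0.0, sd, p.size), 0.0, 1.0)

p_root = rng.uniform(0.1, 0.9, n_snps)
p_ab, p_cd = drift(p_root, 0.1), drift(p_root, 0.1)   # internal branches
pa, pb = drift(p_ab, 0.05), drift(p_ab, 0.05)
pc, pd = drift(p_cd, 0.05), drift(p_cd, 0.05)

# Tree ((A,B),(C,D)): expect f4(A,B;C,D) near 0 but f4(A,C;B,D) > 0
print(f4_with_z(pa, pb, pc, pd))
print(f4_with_z(pa, pc, pb, pd))
```

A significantly non-zero f4(A,B;C,D) is the kind of signal read as gene flow or substructure that a simple tree does not explain.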


2019, Vol 14 (6), pp. 711-717
Author(s): Gustavo Monnerat, Alex S. Maior, Marcio Tannure, Lia K.F.C. Back, Caleb G.M. Santos

Purpose: Soccer is one of the most popular sports worldwide, a physical activity of great physiological demand and complexity. Currently, numerous trials involving physiological responses such as hypertrophy, energy expenditure, vasodilation, cardiac output, VO2max, and recovery have supported the possibility that genomic predictors affect performance. As a complement to association studies with single nucleotide polymorphisms (SNPs), the objective was to evaluate whether population genetics data from human-genomics databases can provide information for a better understanding of the relationship between heritability and sport performance. Methods: The study included 25 healthy male professional soccer players (25.5 [4.3] y, 177.4 [6.4] cm, 76.4 [6.4] kg, body fat 10.5% [4.3%]) from a Brazilian first-division soccer club. Anthropometric measurements and field and isokinetic tests were performed to evaluate the performance and physiologic parameters of subjects. Moreover, 10 genetic polymorphisms previously related to performance were genotyped. The genotypes of the same polymorphisms were obtained for 2504 individuals from the populations deposited in the 1000 Genomes database. A principal component analysis and a genetic-distance matrix (Fst) were computed. Results: As expected, the admixed Brazilian population shows numerous genetic similarities with the European and American populations from genomic databases. Although the African component is well recognized in genomes from the Brazilian population, when the specific performance-related SNPs were used, the African population was surprisingly one of the most genetically distant from the players (P < .00001). Conclusions: These early results suggest a selective pressure on genes of elite soccer players, possibly related simultaneously to physical-performance, environmental, cognitive, and sociocultural aspects.
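The abstract does not say which Fst estimator was used. As a hedged sketch of a pairwise Fst matrix, the code below uses Hudson's estimator combined across SNPs as a ratio of averages (the approach recommended by Bhatia et al., 2013), with sample sizes counted in chromosomes (2 per diploid individual). The allele frequencies and group sizes are hypothetical placeholders, not data from the study.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst over SNPs, combined as a ratio of averages.

    p1, p2: sample allele frequencies per SNP; n1, n2: sampled chromosomes.
    The subtracted terms remove finite-sample noise from the squared difference,
    so nearly identical groups can give small negative estimates.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

def fst_matrix(freqs, sizes):
    """Symmetric pairwise Fst matrix with zeros on the diagonal."""
    k = len(freqs)
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            out[i, j] = out[j, i] = hudson_fst(freqs[i], freqs[j], sizes[i], sizes[j])
    return out

# Hypothetical allele frequencies at three SNPs for three hypothetical groups
freqs = [np.array([0.9, 0.1, 0.5]),
         np.array([0.1, 0.9, 0.5]),
         np.array([0.85, 0.15, 0.5])]
sizes = [50, 200, 200]  # chromosomes, e.g. 25 diploid individuals -> 50
m = fst_matrix(freqs, sizes)
print(np.round(m, 3))
```

A matrix like this can then be inspected directly or fed to clustering or ordination to see which reference groups sit closest to the study sample.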


2021, Vol 22 (1)
Author(s): Jonas Meisner, Anders Albrechtsen, Kristian Hanghøj

Abstract Background: Identification of selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has created a need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal components analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data. Materials and methods: We have extended two principal component analysis based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project for populations with European and East Asian ancestry to detect signals of selection in samples with continuous population structure. Results: Here, we present two selection statistics which we have implemented in the framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low- and/or variable-coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high-quality called genotypes. Conclusion: We show that selection scans of low-coverage sequencing data of populations with similar ancestry perform on par with those obtained from high-quality genotype data. Moreover, we demonstrate that the implemented statistics outperform selection statistics obtained from called genotypes from low-coverage sequencing data, without the need for ad hoc filtering.
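The abstract names PCA-based selection statistics without giving formulas. As a minimal sketch on called genotypes (so it deliberately omits the genotype-likelihood extension that is the paper's contribution), the code computes a squared-loading statistic in the spirit of FastPCA-style scans: each SNP's unit-norm loading on a PC, squared and scaled by the number of SNPs, compared against a chi-square distribution with one degree of freedom. The normalisation used by published tools may differ, and the simulated data (two groups, five planted high-differentiation SNPs) are hypothetical.

```python
import math
import numpy as np

def pca_selection_scan(genotypes, n_components=1):
    """Squared-loading selection statistic from called genotypes (individuals x SNPs, 0/1/2).

    For SNP j and PC k the statistic is M * V[j, k]**2, where M is the number of
    polymorphic SNPs and V holds unit-norm SNP loadings, so the statistic averages
    exactly 1 over SNPs; p-values assume an approximate chi-square(1) null.
    """
    g = np.asarray(genotypes, float)
    p = g.mean(axis=0) / 2.0                        # sample allele frequencies
    keep = (p > 0) & (p < 1)                        # drop monomorphic SNPs
    x = (g[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    m = x.shape[1]
    stats = m * vt[:n_components].T ** 2            # SNPs x components
    # Upper tail of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2))
    pvals = np.vectorize(lambda s: math.erfc(math.sqrt(s / 2.0)))(stats)
    return stats, pvals, keep

rng = np.random.default_rng(2)
n_snps, n_per_pop = 500, 50
base = rng.uniform(0.2, 0.8, n_snps)
shift = rng.normal(0.0, 0.05, n_snps)          # mild genome-wide differentiation
freq_1 = np.clip(base + shift, 0.02, 0.98)
freq_2 = np.clip(base - shift, 0.02, 0.98)
freq_1[:5], freq_2[:5] = 0.1, 0.9              # five hypothetical "selected" SNPs
geno = np.vstack([rng.binomial(2, freq_1, (n_per_pop, n_snps)),
                  rng.binomial(2, freq_2, (n_per_pop, n_snps))])

stats, pvals, keep = pca_selection_scan(geno)
top = np.flatnonzero(keep)[np.argsort(stats[:, 0])[::-1][:5]]
print(sorted(top))
```

SNPs whose differentiation along a PC far exceeds the genome-wide average stand out with large statistics; this is the signal such scans look for in continuous populations.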


2020
Author(s): Ruth W. Waineina, Kiplangat Ngeno, Tobias O. Otieno, Evans D. Ilatsia

Abstract Population structure and relationship information among goats is critical for genetic improvement, utilization, and conservation. This study explored population structure and the level of gene intermixing among four goat genotypes in Kenya: Alpine (n = 30), Toggenburg (n = 28), Saanen (n = 24), and Galla (n = 12). Population structuring and relatedness were estimated using principal component analysis of the allele frequencies of the SNP markers. Genotype relationships were evaluated based on calculated Reynolds genetic distances. A phylogenetic tree was constructed to represent genotype clustering using the iTOL software. Population structure was investigated using model-based clustering (ADMIXTURE). Genotype relationships revealed four distinct clusters: Alpine, Galla, Saanen, and Toggenburg. The ADMIXTURE results revealed some level of gene intermixing of Alpine, Toggenburg, and Saanen with Galla. Saanen goats were the most admixed genotype, with 84%, 7%, and 4% of their genome derived from Galla, Alpine, and Toggenburg, respectively. Alpine and Toggenburg goats showed some association with the Galla goat: 10% and 1%, respectively. The association of Galla with the other genotypes was anticipated, since the Galla goat was used as the founder population for crossbreeding with the Saanen, Alpine, and Toggenburg breeds. The observed genetic variation among the goat genotypes will provide a good opportunity for sustainable utilization, conservation, and future genetic improvement programs for goat genotypes in Kenya.
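The Reynolds genetic distance used above is computed from allele frequencies. For biallelic SNPs with frequencies p and q in two populations, one common unweighted form (without sample-size correction) reduces to θ = Σ(p − q)² / Σ(p + q − 2pq) and D = −ln(1 − θ). The sketch below assumes that form; the breed labels and frequencies are hypothetical, not the study's data.

```python
import math
import numpy as np

def reynolds_distance(p, q):
    """Reynolds, Weir & Cockerham coancestry distance for biallelic SNPs.

    theta = sum_l (p_l - q_l)^2 / sum_l (p_l + q_l - 2 p_l q_l),  D = -ln(1 - theta).
    Undefined if every locus is fixed for opposite alleles (theta = 1).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    theta = np.sum((p - q) ** 2) / np.sum(p + q - 2 * p * q)
    return -math.log(1.0 - theta)

def distance_matrix(freqs):
    """Pairwise Reynolds distances for a dict of name -> per-SNP frequencies."""
    names = list(freqs)
    d = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                d[i, j] = d[j, i] = reynolds_distance(freqs[a], freqs[b])
    return names, d

# Hypothetical alternate-allele frequencies at four SNPs (not data from the study)
freqs = {
    "breed_1": [0.9, 0.2, 0.6, 0.4],
    "breed_2": [0.5, 0.6, 0.5, 0.4],
    "breed_3": [0.8, 0.3, 0.6, 0.5],
}
names, d = distance_matrix(freqs)
print(names)
print(np.round(d, 3))
```

A distance matrix of this kind is the usual input to a tree-building method, whose result can then be drawn with a viewer such as iTOL.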


2020, Vol 498 (3), pp. 4021-4032
Author(s): Emir Uzeirbegovic, James E Geach, Sugata Kaviraj

ABSTRACT We demonstrate how galaxy morphologies can be represented by weighted sums of ‘eigengalaxies’ and how eigengalaxies can be used in a probabilistic framework to enable principled and simplified approaches in a variety of applications. Eigengalaxies can be derived from a Principal Component Analysis (PCA) of sets of single- or multiband images. They encode the image-space equivalent of basis vectors that can be combined to describe the structural properties of large samples of galaxies in a massively reduced manner. As an illustration, we show how a sample of 10,243 galaxies in the Hubble Space Telescope CANDELS survey can be represented by just 12 eigengalaxies. We show in some detail how this image space may be derived and tested. We also describe a probabilistic extension to PCA (PPCA), which enables the eigengalaxy framework to assign probabilities to galaxies. We present four practical applications of the probabilistic eigengalaxy framework that are particularly relevant for the next generation of large imaging surveys: we (i) show how low-likelihood galaxies are natural candidates for outlier detection; (ii) demonstrate how missing data can be predicted; (iii) show how a similarity search can be performed on exemplars; and (iv) demonstrate how unsupervised clustering of objects can be implemented.
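The eigengalaxy idea combines PCA (a basis of "eigen-images") with probabilistic PCA, which assigns each image a Gaussian likelihood. A minimal sketch, assuming flattened single-band images and the closed-form maximum-likelihood PPCA solution of Tipping and Bishop (1999); the 8x8 "galaxies" are synthetic Gaussian blobs and 3 components stand in for the paper's 12, so this is an illustration of the technique, not the authors' pipeline.

```python
import numpy as np

class ProbabilisticPCA:
    """Closed-form maximum-likelihood PPCA (Tipping & Bishop, 1999)."""

    def __init__(self, n_components):
        self.q = n_components

    def fit(self, x):
        x = np.asarray(x, float)
        n, d = x.shape
        self.mean_ = x.mean(axis=0)
        _, s, vt = np.linalg.svd(x - self.mean_, full_matrices=False)
        eig = np.zeros(d)
        eig[: s.size] = s ** 2 / n                  # covariance eigenvalues
        self.sigma2_ = eig[self.q:].mean()          # ML noise variance: mean discarded eigenvalue
        self.components_ = vt[: self.q]             # the "eigen-images"
        w = self.components_.T * np.sqrt(np.maximum(eig[: self.q] - self.sigma2_, 0.0))
        self.cov_ = w @ w.T + self.sigma2_ * np.eye(d)
        return self

    def encode(self, x):
        """Weights of each image on the eigen-images."""
        return (np.atleast_2d(x) - self.mean_) @ self.components_.T

    def reconstruct(self, weights):
        return self.mean_ + weights @ self.components_

    def log_likelihood(self, x):
        """Gaussian log-density of each row under N(mean, W W^T + sigma^2 I)."""
        xc = np.atleast_2d(x) - self.mean_
        _, logdet = np.linalg.slogdet(self.cov_)
        maha = np.sum(xc * np.linalg.solve(self.cov_, xc.T).T, axis=1)
        return -0.5 * (xc.shape[1] * np.log(2 * np.pi) + logdet + maha)

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:8, 0:8]

def blob(cx, cy, width):
    """Hypothetical Gaussian light profile on an 8x8 pixel grid, flattened."""
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * width ** 2)).ravel()

templates = np.stack([blob(3.5, 3.5, 1.0), blob(3.5, 3.5, 2.5), blob(2.0, 5.0, 1.5)])
images = rng.uniform(0.5, 2.0, (500, 3)) @ templates + rng.normal(0, 0.05, (500, 64))

model = ProbabilisticPCA(n_components=3).fit(images)
recon = model.reconstruct(model.encode(images))
outlier = rng.normal(0, 1.0, 64)          # a deliberately unusual "image"
print(model.log_likelihood(outlier)[0] < model.log_likelihood(images).min())
```

Ranking objects by this log-likelihood gives the outlier-detection use case (i); the same Gaussian model also supports conditional prediction of masked pixels, which underlies use case (ii).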


2018, Vol 37 (4), pp. 111-129
Author(s): Soufiane Boukarta, Ewa Berezowska-Azzag

Abstract Households are the major energy consumer and contributor to the emission of greenhouse gases. The Algerian energy-conservation policy has improved building energy efficiency since 1994 by introducing thermal regulation (DTR). However, energy consumption is still increasing instead of decreasing, mainly because of occupants’ behaviour, which is difficult to estimate and predict. This paper explores the impact of household and housing characteristics on residential gas and electricity consumption in the 36 municipalities of the department of Djelfa (Algeria), which is located in an arid and semi-arid climate zone. The paper is based on GIS and statistical techniques. It considers the yearly gas and electricity consumption (2013) of the municipalities of the department of Djelfa. The method is organised in four steps: (a) multiple linear regression is used to construct two estimative models, one for gas and one for electricity, each with more than 93% accuracy; (b) gas and electricity consumption for 2008 is estimated with the developed models; (c) the 2008 census data are organised in five dimensions: the population distribution, household characteristics, housing type and occupancy, and household appliance ownership; (d) a set of sensitivity analyses is performed based on Principal Component Analysis (PCA) and Pearson’s bivariate correlation, and finally a path analysis based on a Structural Equation Model (SEM) assesses the importance of each variable. The overall impact of all these variables indicates that increasing household size is the primary factor reducing electricity and gas consumption, followed by housing surface, density, room occupancy, and older households, while increasing education level and appliance ownership boosts both per-capita gas and electricity consumption.
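Steps (a) and (b) amount to fitting a multiple linear regression on one year and applying it to another year's predictors. A minimal least-squares sketch, assuming "accuracy" is summarised as R² (the abstract does not define the metric); the predictors, coefficients, and values are hypothetical and generated for illustration only.

```python
import numpy as np

def fit_ols(features, target):
    """Multiple linear regression by least squares; returns (coefficients, R^2)."""
    X = np.asarray(features, float)
    y = np.asarray(target, float)
    design = np.column_stack([np.ones(len(X)), X])   # intercept column first
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

def predict(beta, features):
    """Apply fitted coefficients (intercept first) to new predictor rows."""
    return beta[0] + np.asarray(features, float) @ beta[1:]

rng = np.random.default_rng(3)
n = 36  # one row per municipality
# Hypothetical predictors: household size (persons), dwelling area (m^2), appliance ownership rate
X = np.column_stack([rng.uniform(4, 8, n), rng.uniform(60, 120, n), rng.uniform(0.3, 0.9, n)])
# Hypothetical per-capita consumption generated from made-up coefficients plus noise
y = 50 - 8 * X[:, 0] + 0.5 * X[:, 1] + 40 * X[:, 2] + rng.normal(0, 2, n)

beta, r2 = fit_ols(X, y)
print(np.round(beta, 2), round(r2, 3))
# Step (b): apply the fitted model to another year's (hypothetical) predictor values
print(predict(beta, X[:3] * [1.05, 0.95, 1.0]))
```

Coefficient signs then read directly as the direction of each factor's effect, which is what the later sensitivity and path analyses examine in more depth.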


2019, Vol 37 (1), pp. 2-10
Author(s): Luke Anderson-Trocmé, Rick Farouni, Mathieu Bourgey, Yoichiro Kamatani, Koichiro Higasa, ...

Abstract Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, these batch effects propagate: we find that they also lead to incorrect imputation by leading imputation servers and to a small number of suspicious GWAS associations. Lower-quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.
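One generic quality-control check for this kind of batch effect (a common approach, not the authors' procedure) is to compare alternate-allele counts at each site between sequencing batches drawn from the same population: a 2x2 Pearson chi-square test flags sites whose frequencies differ more than sampling alone explains. The site names and counts below are hypothetical.

```python
import math

def batch_chi2(alt1, ref1, alt2, ref2):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 allele-count table.

    Rows are batches, columns are alternate/reference allele counts.
    """
    n = alt1 + ref1 + alt2 + ref2
    row1, row2 = alt1 + ref1, alt2 + ref2
    col_alt, col_ref = alt1 + alt2, ref1 + ref2
    if 0 in (row1, row2, col_alt, col_ref):
        return 0.0, 1.0          # empty batch or monomorphic site: no evidence of a difference
    chi2 = n * (alt1 * ref2 - ref1 * alt2) ** 2 / (row1 * row2 * col_alt * col_ref)
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p

# Hypothetical allele counts (alt_batch1, ref_batch1, alt_batch2, ref_batch2)
sites = {"site_a": (20, 180, 22, 178), "site_b": (5, 195, 60, 140)}
for name, counts in sites.items():
    chi2, p = batch_chi2(*counts)
    print(f"{name}: chi2={chi2:.2f}, p={p:.2e}")
```

Sites that differ strongly between batches of the same population, such as the hypothetical site_b here, are candidates for technical artefacts rather than biology and warrant inspection before they feed into imputation panels or association tests.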

