Major sex differences in allele frequencies for X chromosome variants in the 1000 Genomes Project data

2021
Author(s): Zhong Wang, Lei Sun, Andrew D Paterson

An unexpectedly high proportion of SNPs on the X chromosome in the 1000 Genomes Project phase 3 data showed significant sex differences in minor allele frequencies (sdMAF). sdMAF persisted for many of these SNPs in the recently released high-coverage whole-genome sequence data, and it was consistent across the five super-populations. Among the 245,825 common biallelic SNPs in the phase 3 data presumed to be of high quality, 2,039 have genome-wide significant sdMAF (p < 5 × 10⁻⁸). The proportion of SNPs with sdMAF varied by region: 0.83% in the non-pseudo-autosomal region (NPR), 0.29% in pseudo-autosomal region 1 (PAR1), 13.1% in PAR2, and 0.85% in PAR3. These SNPs were clustered at the NPR-PAR boundaries, among other locations. sdMAF at the NPR-PAR boundaries is biologically expected due to sex linkage, but it has generally been ignored in association studies. For comparison, similar analyses found only 6, 1, and 0 SNPs with significant sdMAF on chromosomes 1, 7, and 22, respectively. Future X chromosome analyses need to take sdMAF into account.
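The abstract does not state which test produced the sdMAF p-values. As a minimal sketch, assuming a Pearson chi-square test (1 df) on a 2×2 table of minor versus major allele counts in males and females, the comparison can be written as below. The function name `sdmaf_chi2` is hypothetical. How alleles are counted is also an assumption left to the caller: one allele per hemizygous male in the NPR, two per sample in the PARs.

```python
import math


def sdmaf_chi2(male_minor, male_total, female_minor, female_total):
    """Pearson chi-square test (1 df) for a sex difference in minor allele frequency.

    Arguments are allele counts, not genotype counts. Hypothetical helper: in the
    NPR a hemizygous male contributes one allele, in the PARs every sample two.
    """
    table = [
        [male_minor, male_total - male_minor],
        [female_minor, female_total - female_minor],
    ]
    row_totals = [male_total, female_total]
    col_totals = [table[0][c] + table[1][c] for c in range(2)]
    n = male_total + female_total
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0  # a sex group is empty or the site is monomorphic: nothing to test

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected

    # Upper tail of a chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


GENOME_WIDE_ALPHA = 5e-8  # threshold quoted in the abstract
```

For example, `sdmaf_chi2(500, 1000, 500, 2000)` compares a minor allele frequency of 0.50 among male alleles with 0.25 among female alleles. Its p-value falls far below `GENOME_WIDE_ALPHA`.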

2021
Author(s): Tamara Soledad Frontanilla, Guilherme Valle Silva, Jesus Ayala, Celso Teixeira Mendes

Accurate STR genotyping from next-generation sequencing (NGS) data has been challenging. HipSTR (Haplotype inference and phasing for STRs) was developed specifically to handle genotyping errors and obtain reliable STR genotypes from whole-genome sequencing datasets. The objective of this investigation was to perform a comprehensive genotyping analysis of a set of STRs of broad forensic interest in the 1000 Genomes populations, and to release a reliable open-access STR database to the forensic genetics community. A set of 22 STR markers was analyzed using the CRAM files of the 1000 Genomes Project Phase 3 high-coverage (30x) dataset generated by the New York Genome Center (NYGC). HipSTR was used to call genotypes for 2,504 samples from 26 populations organized into five groups: African, East Asian, European, South Asian, and admixed American. The D21S11 marker could not be detected in the present study. Moreover, Hardy-Weinberg equilibrium analysis, combined with a comprehensive analysis of allele frequencies, revealed that HipSTR could not identify the longer Penta E alleles (and, to a lesser extent, the longer Penta D alleles). This is probably due to the limited length of the sequencing reads available for genotype calling, which results in heterozygote deficiency. Nevertheless, AMOVA, a clustering analysis using STRUCTURE, and a Principal Coordinates Analysis revealed a clear-cut separation between the four major ancestries sampled by the 1000 Genomes Consortium (AFR, EUR, EAS, SAS). In addition, the AMOVA results corroborated previous reports that most of the variance (97.12%) is observed within populations. Taken together, these analyses showed that, except for the larger Penta D and Penta E alleles, the allele frequencies and genotypes called by HipSTR from the 1000 Genomes Project phase 3 data, offered as an open-access database, are consistent and highly reliable.
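The heterozygote deficiency mentioned above can be illustrated concretely. If the reads do not span a long Penta E allele, a true heterozygote is called as a homozygote for the shorter allele. Observed heterozygosity then falls below the value expected under Hardy-Weinberg equilibrium. The sketch below is a hypothetical illustration and is not part of HipSTR. For one multi-allelic locus, it compares observed heterozygosity with the HWE expectation 1 − Σ p_i², using genotypes given as pairs of repeat counts.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Observed vs. HWE-expected heterozygosity for one multi-allelic STR locus.

    genotypes: list of (allele_a, allele_b) repeat-count calls, one per sample.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n

    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Hypothetical calls: the true genotypes were (10, 20), (11, 20) and (10, 11),
# but the long allele 20 dropped out, so the first two look homozygous.
called = [(10, 10), (11, 11), (10, 11)]
h_obs, h_exp = heterozygosity(called)
# h_obs is 1/3 while h_exp is 0.5: an apparent heterozygote deficiency.
```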


Author(s): Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley J. Abel, ...

Abstract The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world. It was based on a combination of technologies, including low-coverage WGS (mean depth 7.4X), high-coverage whole-exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new high-coverage WGS resource that covers the original 2,504 1kGP samples, plus 698 additional related samples that yield 602 complete trios in the 1kGP cohort. We sequenced this expanded cohort of 3,202 samples to a target depth of 30X using Illumina NovaSeq 6000 instruments. We called SNVs and INDELs against the GRCh38 reference with GATK's HaplotypeCaller. We also generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, bringing the 1kGP dataset up to current state-of-the-art standards. Using this strategy, we identified over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across all 3,202 samples, with estimated false discovery rates (FDR) of 0.3%, 1.0%, and 1.8%, respectively. Compared with the low-coverage phase 3 callset, both variant discovery and estimated FDR improved substantially, owing to high-coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the families in the cohort to achieve superior haplotype phasing accuracy. We also demonstrate the improvements that the high-coverage panel brings, especially for INDEL imputation.
We make all data generated as part of this project publicly available, and we envision this updated 1kGP callset becoming the new de facto public resource for the worldwide scientific community working on genomics and genetics.
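The abstract says the 602 trios were used to improve phasing, but it does not describe any family-based checks in detail. One simple use of trios is to test whether a child's biallelic autosomal genotype can be explained by one allele from each parent. Here genotypes are coded as 0, 1 or 2 copies of the alternate allele. This is a hedged sketch, not the project's pipeline, and the function names are hypothetical.

```python
from itertools import product


def transmissible_alleles(dosage):
    """Alleles (0 = ref, 1 = alt) a parent with this alt-allele dosage can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[dosage]


def is_mendelian_consistent(child, mother, father):
    """True if the child's dosage equals one maternal plus one paternal allele."""
    return any(
        m + f == child
        for m, f in product(transmissible_alleles(mother), transmissible_alleles(father))
    )


def mendelian_error_rate(trios):
    """Fraction of (child, mother, father) genotype triples that are inconsistent."""
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return errors / len(trios)
```

De novo mutations are rare at any single site. An elevated error rate for a site or a sample is therefore more likely to reflect genotyping error.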


2021, Vol 11 (3), pp. 231
Author(s): Faven Butler, Ali Alghubayshi, Youssef Roman

Gout is an inflammatory condition caused by elevated serum urate (SU), a condition known as hyperuricemia (HU). Genetic variations, including single nucleotide polymorphisms (SNPs), can alter the function of urate transporters, leading to differences in HU and gout prevalence across populations. In the United States (U.S.), gout disproportionately affects certain racial groups. The objective of this analysis is to compare the frequency of urate-related genetic risk alleles between Europeans (EUR) and the following major racial groups from the 1000 Genomes Project: Africans in Southwest U.S. (ASW), Han Chinese (CHS), Japanese (JPT), and Mexican (MXL). The Ensembl genome browser of the 1000 Genomes Project was used to compare allele frequencies across populations for 11 SNPs in 11 genes that are physiologically involved in, and significantly associated with, SU levels and gout risk. Gene/SNP pairs included: ABCG2 (rs2231142), SLC2A9 (rs734553), SLC17A1 (rs1183201), SLC16A9 (rs1171614), GCKR (rs1260326), SLC22A11 (rs2078267), SLC22A12 (rs505802), INHBC (rs3741414), RREB1 (rs675209), PDZK1 (rs12129861), and NRXN2 (rs478607). Allele frequencies were compared with EUR using the chi-square test or Fisher's exact test, as appropriate. A Bonferroni correction for multiple comparisons was applied, with p < 0.0045 (0.05/11) considered statistically significant. Risk alleles were defined as the alleles associated with baseline or higher HU and gout risk. The cumulative HU or gout risk allele index across the 11 SNPs was estimated for each population. The prevalence of HU and gout in U.S. and non-U.S. populations was evaluated using published epidemiological data and a literature review. Compared with EUR, allele frequencies differed significantly for 7/11 SNPs in ASW, 9/11 in MXL, 9/11 in JPT, and 11/11 in CHS. HU or gout risk allele indices were 5, 6, 9, and 11 in ASW, MXL, CHS, and JPT, respectively. Out of the 11 SNPs, the percentage of risk alleles in CHS and JPT was 100%.
Compared with non-U.S. populations, the prevalence of HU and gout appears to be higher in Western countries. Compared with EUR, the CHS and JPT populations had the highest HU or gout risk allele frequencies, followed by MXL and ASW. These results suggest that individuals of Asian descent are at higher risk of HU and gout. This may partly explain the nearly three-fold higher gout prevalence among Asians than among Caucasians in ambulatory care settings. Furthermore, gout remains a disease of developed countries, and its prevalence is rising markedly worldwide.
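The significance threshold above is a Bonferroni correction: 0.05 divided by the 11 SNPs tested. The sketch below pairs that threshold with a two-sided Fisher's exact test on a 2×2 table of allele counts, with risk versus other allele in the rows' columns and the target population versus EUR in the rows. It is written from the standard definition: it sums the hypergeometric probabilities of every table with the observed margins that is no more likely than the observed table. The function name is hypothetical. The abstract does not say when Fisher's test was preferred over chi-square, so that choice is left to the caller.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


ALPHA = 0.05
N_TESTS = 11  # number of SNPs compared
BONFERRONI_THRESHOLD = ALPHA / N_TESTS  # 0.004545..., reported as p < 0.0045
```

In use, a SNP would be flagged as differing from EUR when `fisher_exact_two_sided(risk_target, other_target, risk_eur, other_eur) < BONFERRONI_THRESHOLD`. The counts would come from the 1000 Genomes data and are not reproduced here.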


2017, Vol 7 (1)
Author(s): Mathias Gorski, Peter J. van der Most, Alexander Teumer, Audrey Y. Chu, Man Li, ...

Abstract HapMap-imputed genome-wide association studies (GWAS) have revealed >50 loci at which common variants with minor allele frequency >5% are associated with kidney function. GWAS using more complete reference sets for imputation, such as those from the 1000 Genomes Project, promise to identify novel loci that previous efforts have missed. To investigate the value of such a more complete variant catalog, we conducted a GWAS meta-analysis of kidney function based on the estimated glomerular filtration rate (eGFR) in 110,517 participants of European ancestry using 1000 Genomes-imputed data. We identified 10 novel loci with p < 5 × 10⁻⁸ that were previously missed by HapMap-based GWAS. Six of these loci (HOXD8, ARL15, PIK3R1, EYA4, ASTN2, and EPB41L3) are tagged by common SNPs unique to the 1000 Genomes reference panel. Using pathway analysis, we identified 39 significant genes (FDR < 0.05) and 127 significantly enriched gene sets (FDR < 0.05) that were missed by our previous analyses. Among these, the 10 novel genes are part of pathways of kidney development, carbohydrate metabolism, cardiac septum development, and glucose metabolism. These results highlight the utility of re-imputing from denser reference panels until whole-genome sequencing becomes feasible in large samples.
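The pathway results above are reported at FDR < 0.05, but the abstract does not say how the FDR was controlled. The Benjamini-Hochberg step-up procedure is a standard choice. It is sketched below as an assumption, not as the authors' pipeline, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected at false discovery rate q (Benjamini-Hochberg).

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * q / m, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank  # keep the largest passing rank (step-up)
    return sorted(order[:cutoff])
```

When applied to the p-values of all tested genes or gene sets, the returned indices are the ones declared significant at FDR < 0.05. The step-up rule can also reject a hypothesis whose own rank fails the threshold, provided a larger rank passes.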


2021, Vol 12
Author(s): Gang Shi, Qingmin Kuang

With advances in sequencing technology, an increasing number of populations have been sequenced to study the histories of worldwide populations, including their divergence, admixture, migration, and effective sizes. The variants detected in sequencing studies are largely rare and mostly population-specific. Population-specific variants are often recent mutations and are informative for revealing substructure and admixture in populations; however, computational methods and tools to analyze them are still lacking. In this work, we propose using reference populations and single nucleotide polymorphisms (SNPs) specific to those reference populations. We propose ancestral information, the best linear unbiased estimator (BLUE) of the ancestral proportion. It can be used to infer ancestral proportions in recently admixed target populations and to measure how well the reference populations serve as proxies for the admixing sources. Because it is based on the same panel of SNPs, ancestral information is comparable across samples from different studies. It is also not affected by genetic outliers, related samples, or the sample sizes of the admixed target populations. In addition, the ancestral spectrum is useful for detecting genetic outliers and for exploring co-ancestry between study samples and the reference populations. The methods are implemented in a program, Ancestral Spectrum Analyzer (ASA), and are applied to high-coverage sequencing data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP). In the analyses of American populations from the 1000 Genomes Project, we demonstrate that recent admixture can be separated from ancient admixture. This is done by comparing ancestral spectra computed with and without indigenous Americans in the reference populations.
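The abstract does not give the estimator's formula. The simplified sketch below shows why population-specific SNPs make an ancestry proportion straightforward to estimate. It assumes each SNP j is specific to one reference population, where it has frequency f_j. Its frequency y_j in an admixed target is then q·f_j plus independent noise of equal variance, where q is that population's ancestry share. Under these assumed Gauss-Markov conditions, least squares through the origin is the best linear unbiased estimator: q_hat = Σ f_j·y_j / Σ f_j². This illustrates the BLUE idea only; it is not ASA's implementation, and the function name is hypothetical.

```python
def ancestry_proportion(ref_freqs, target_freqs):
    """Least-squares estimate of q in target_freq_j = q * ref_freq_j + error_j.

    Hypothetical sketch: assumes every SNP is specific to the one reference
    population, so only that ancestry carries the allele into the target.
    With uncorrelated, equal-variance errors, OLS through the origin is the
    best linear unbiased estimator (Gauss-Markov).
    """
    num = sum(f * y for f, y in zip(ref_freqs, target_freqs))
    den = sum(f * f for f in ref_freqs)
    return num / den
```

For example, if a target population's frequencies at three such SNPs are exactly one quarter of the reference frequencies, the estimate is 0.25.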


2016
Author(s): Suyash S. Shringarpure, Carlos D. Bustamante, Kenneth L. Lange, David H. Alexander

Background: A number of large genomic datasets are being generated for studies of human ancestry and disease. The ADMIXTURE program is commonly used to infer individual ancestry from genomic data. Results: We describe two improvements to the ADMIXTURE software. The first enables ADMIXTURE to infer ancestry for a new set of individuals using cluster allele frequencies from a reference set of individuals. Using data from the 1000 Genomes Project, we show that this allows ADMIXTURE to infer ancestry for 10,920 individuals in a few hours (a 5x speedup). This mode also allows ADMIXTURE to correctly estimate individual ancestry and allele frequencies from a set of related individuals. The second modification allows ADMIXTURE to correctly handle X-chromosome (and other haploid) data from both males and females. Using this extension, we demonstrate increased power to detect sex-biased admixture in African-American individuals from the 1000 Genomes Project. Conclusions: These modifications make ADMIXTURE more efficient and versatile, allowing users to extract more information from large genomic datasets.
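The two improvements above can be illustrated in one small sketch. The first fixes the cluster allele frequencies learned from a reference set and estimates only each new individual's ancestry proportions. The second lets haploid X-chromosome genotypes from males contribute one allele instead of two. The sketch does not use ADMIXTURE's own optimizer; it substitutes a plain expectation-maximization update for the ancestry proportions. Its function and argument names are hypothetical, and cluster frequencies are assumed to lie strictly between 0 and 1.

```python
def project_ancestry(genotypes, ploidies, freqs, n_iter=200):
    """EM estimate of one individual's ancestry proportions with fixed cluster frequencies.

    genotypes[j]: count of the counted allele at SNP j (0..ploidies[j]).
    ploidies[j]: 2 for autosomes or female X, 1 for male X (haploid).
    freqs[k][j]: counted-allele frequency in cluster k, held fixed (0 < f < 1).
    """
    K = len(freqs)
    q = [1.0 / K] * K
    total_alleles = sum(ploidies)
    for _ in range(n_iter):
        expected = [0.0] * K
        for j, (g, n) in enumerate(zip(genotypes, ploidies)):
            wa = [q[k] * freqs[k][j] for k in range(K)]
            wb = [q[k] * (1.0 - freqs[k][j]) for k in range(K)]
            sa, sb = sum(wa), sum(wb)
            for k in range(K):
                # E-step: expected share of the g counted alleles and the
                # n - g other alleles that came from cluster k.
                if g:
                    expected[k] += g * wa[k] / sa
                if n - g:
                    expected[k] += (n - g) * wb[k] / sb
        # M-step: new proportions are expected allele shares per cluster.
        q = [e / total_alleles for e in expected]
    return q
```

Because the cluster frequencies stay fixed, each individual is estimated independently. This is what makes projecting a new set of samples cheap, and it keeps related individuals from influencing one another's estimates.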


2015
Author(s): Shane McCarthy, Sayantan Das, Warren Kretzschmar, Olivier Delaneau, Andrew R. Wood, ...

We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs, constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. This resource enables accurate genotype imputation at minor allele frequencies as low as 0.1%. It also greatly increases the number of SNPs that can be tested in association studies and can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
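The abstract does not name its accuracy metric. A common way to quantify imputation accuracy at low minor allele frequencies is the squared Pearson correlation between imputed dosages and true (for example, masked) genotypes. The sketch below computes that r² under this assumption; the function name is hypothetical.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed dosages."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return sxy * sxy / (sxx * syy)
```

At rare variants, most samples carry genotype 0. A few badly imputed carriers can therefore pull r² down sharply, which makes this a demanding test at a MAF of 0.1%.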


2020
Author(s): Nathan S. Harris, Alan R. Rogers

Abstract Signals of selection are not often shared between populations. When a shared signal is detected, it is often unknown whether selection occurred before or after the populations split. Here we develop a method to detect genomic regions at which selection has favored different haplotypes in two populations. The method was verified through simulations and tested on small regions of the genome. It was then extended to scan the phase 3 genomes of the 1000 Genomes Project populations for regions in which the evidence for independent selection is strongest. We identify several genes that likely underwent selection independently in different populations.


2021, Vol 36 (Supplement_1)
Author(s): A Groff, A Korkidakis, D Sakkas, D Page

Abstract
Study question: What role does the X chromosome play in early embryo metabolism? Does X chromosome copy number contribute to sex differences in early embryonic metabolism?
Summary answer: Chromosome X contains several metabolism-related genes that are expressed prior to X-inactivation, suggesting that their dosage plays a role in the sex-biased regulation of embryo metabolism.
What is known already: Published reports indicate that sex differences in preimplantation embryo metabolism exist across mammalian species, including humans. Two observations support this: male embryos reach the blastocyst stage earlier than female embryos, and glucose uptake and processing are thought to be higher in female than in male embryos. It has been hypothesized that these differences reflect the location on Chromosome X of the metabolism gene G6PD, which encodes the rate-limiting enzyme of the Pentose Phosphate Pathway.
Study design, size, duration: This study is a reanalysis of publicly available RNA-seq data, including 1176 single cells from 59 blastocysts (24 E5, 18 E6, 17 E7) published in one study (Petropoulos et al 2016).
Participants/materials, setting, methods: Cells were subjected to a digital karyotype inference algorithm, and aneuploid samples were removed from the dataset. Sex-differential gene expression (DE) analyses were then performed in euploid trophectoderm (TE) cells: 233 XY cells from 16 embryos and 180 XX cells from 12 embryos. There were too few inner cell mass (ICM) cells for a comparison.
Main results and the role of chance: Analysis of XX and XY TE cells revealed 618 significantly differentially expressed genes (DEGs; 507 upregulated in XX cells and 111 upregulated in XY cells). These genes are spread across the autosomes and sex chromosomes. Interestingly, G6PD is not significantly more highly expressed in XX cells. Gene Ontology (GO) analysis of the XX-biased DEGs revealed a transcriptional sex bias in metabolism-related GO categories, including “mitochondrial ATP synthesis coupled electron transport” and “respiratory chain complex I”. Gene-level assessment showed that the genes driving these enrichments are spread across the genome, but 28/64 reside on Chromosome X (hypergeometric p-value = 5.984473e-27). They include NDUFA1, NDUFB11, and COX7B (components of the electron transport chain) and SLC25A5 (an ATP/ADP transporter involved in maintaining mitochondrial membrane potential). This indicates a direct role for multiple X-linked genes in the sex-biased regulation of embryo metabolism. Metabolic genes that are not sex-biased are distributed across the genome, with no significant enrichment on Chromosome X (76/266, hypergeometric p-value = 0.607). Together, these data indicate that the X enrichment of metabolic GO terms is a feature of sex-biased expression, not a result of metabolism-related genes accumulating on the X.
Limitations, reasons for caution: This analysis draws on publicly available data, so we were unable to perform orthogonal validation of the karyotype calls. Additionally, although the initial dataset is large, the quality-filtered dataset (euploid XX and XY TE cells) is small, and single-cell data are notoriously variable. Further data collection is required.
Wider implications of the findings: Our analysis of sex-biased gene expression in early human embryos suggests that the X chromosome plays a more important role in modulating sex biases in early embryo metabolism than previously recognized. This study provides insight into the mechanisms underlying the development of metabolic sex differences throughout the lifespan.
Trial registration number: NA
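The hypergeometric p-values above ask how surprising it is that 28 of the 64 genes driving the XX-biased metabolic enrichment lie on Chromosome X, given how many background genes are X-linked. The abstract does not report the background gene counts. The sketch below is written from the textbook upper-tail formula and leaves those counts as parameters. The function and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background; successes: background genes on chrX;
    draws: genes in the list being tested; k: X-linked genes observed in the list.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # comb() returns 0 when asked for more items than exist, so impossible
    # values of x contribute nothing to the sum.
    return sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    ) / total
```

Given the study's background counts, which are not reported here, the test would presumably correspond to `hypergeom_upper_tail(28, n_background, n_x_linked, 64)`.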


2015, Vol 117 (suppl_1)
Author(s): Jingyuan Li, Yuichiro Itoh, Xuqi Chen, Arthur Arnold, Mansoureh Eghbali

Introduction: Sex differences in susceptibility to ischemia/reperfusion (I/R) injury have mostly been attributed to sex hormones. We recently examined the role of sex chromosomes in sex differences in myocardial I/R injury. We found that gonadectomized mice with two X chromosomes (XX or XXY) have ~50% larger infarct size after I/R injury than mice with one X chromosome (XY or XO). Only a few X genes escape X inactivation and are expressed more highly in XX than in XY individuals. Here we examined the role of the “X escapee” histone demethylase Kdm6a, which is important in cardiac development. Methods: Female mice with a heterozygous global knockout of Kdm6a (Kdm6a+/-) and with 2 copies of Kdm6a (Kdm6a+/+, regular WT) were used. Isolated mouse hearts were subjected to 30 min of global normothermic ischemia followed by 60 min of reperfusion. RNA-Seq analysis compared gene expression in the hearts of Kdm6a+/+ vs. Kdm6a+/- females at baseline, before ischemia. We calculated an unbiased composite score of relevance that combined three inputs: the significance of the Kdm6a effect on expression (p value), the size of the KDM6A effect on expression (fold change), and the amount of H3K27me3 mark on each gene in the heart, taken from online ChIP-Seq data. Two-way ANOVA was used for statistical analysis. P<0.05 was considered statistically significant. Values are expressed as mean ± SE. Results: Kdm6a+/+ female mice had significantly lower heart functional recovery than their Kdm6a+/- littermates (LVDP: 46.7±9.8% vs. 79.8±3.5%; RPP: 44.1±10.5% vs. 76.2±8.5%, n=6-8 mice/group, p<0.01). We integrated our RNA-Seq data from the hearts of female mice with 2 vs. 1 copy of Kdm6a (n=4 samples per group) with online datasets on the H3K27me3 mark, on sex differences in expression in humans and mice, and on involvement in ischemic heart failure. This integration identified carbonic anhydrase-3 (Car3) as the most promising candidate, upregulated ~7-fold at baseline in the hearts of Kdm6a+/+ vs. Kdm6a+/- female mice. Car3 encodes an isoform of carbonic anhydrase involved in pH regulation, which is a critical part of I/R injury. Conclusion: Histone demethylase KDM6A contributes to X chromosome-dependent I/R injury via epigenetic regulation.

