1000 genomes
Recently Published Documents


Total documents: 313 (five years: 116)
H-index: 36 (five years: 7)

2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, the Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2022 ◽  
Vol 12 ◽  
Author(s):  
Lu Cao ◽  
Ruixue Zhang ◽  
Yirui Wang ◽  
Xia Hu ◽  
Liang Yong ◽  
...  

The important role of MHC in the pathogenesis of vitiligo and SLE has been confirmed in various populations. To map the most significant MHC variants associated with the risk of vitiligo and SLE, we conducted fine mapping analysis using 1117 vitiligo cases, 1046 SLE cases and 1693 healthy control subjects in the Han-MHC reference panel and 1000 Genomes Project phase 3. rs113465897 (P=1.03×10⁻¹³, OR=1.64, 95%CI=1.44–1.87) and rs3129898 (P=4.21×10⁻¹⁷, OR=1.93, 95%CI=1.66–2.25) were identified as being most strongly associated with vitiligo and SLE, respectively. Stepwise conditional analysis revealed additional independent signals at rs3130969 (P=1.48×10⁻⁷, OR=0.69, 95%CI=0.60–0.79) and HLA-DPB1*03:01 (P=1.07×10⁻⁶, OR=1.94, 95%CI=1.49–2.53) linked to vitiligo, and at HLA-DQB1*03:01 (P=4.53×10⁻⁷, OR=0.62, 95%CI=0.52–0.75) linked to SLE. Considering that epidemiological studies have confirmed comorbidities of vitiligo and SLE, we used the GCTA tool to analyse the genetic correlation between these two diseases in the HLA region. The correlation coefficient was 0.79 (P=5.99×10⁻¹⁰, SE=0.07), confirming their similar genetic backgrounds. Our findings highlight the value of the MHC region in vitiligo and SLE and provide a new perspective for comorbidities among autoimmune diseases.
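As a reading aid, the reported odds ratios, confidence intervals and P values can be checked against each other. The sketch below is not the authors' analysis: it assumes a normal (Wald) approximation on the log-odds scale, and the function name `p_from_or_ci` is hypothetical. It recovers the standard error of log(OR) from the width of the 95% CI and converts the resulting z score into a two-sided P value.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided P value implied by an odds ratio and its 95% CI.

    Assumes a Wald-type interval, i.e. exp(log(OR) +/- z_crit * SE).
    """
    # Standard error of log(OR), recovered from the CI width on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Values reported above for rs113465897 (vitiligo).
print(f"{p_from_or_ci(1.64, 1.44, 1.87):.2e}")
```

For rs113465897 this gives a value on the order of 10⁻¹³, in line with the reported P value. Exact agreement is not expected, because the published interval is rounded and the original test may have used a different model.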


Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 44
Author(s):  
Iago Maceda ◽  
Oscar Lao

The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other areas. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independent of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.


2021 ◽  
Author(s):  
Zhiyi Xia ◽  
Shi Huang

Human genetic diversity remains to be better understood. We here analyzed data from the 1000 Genomes Project and defined group specific fixed alleles (GSFAs) as those that are likely fixed in one ethnic group but non-fixed in at least one other group. The fraction of derived alleles in GSFAs indicates relative distance to apes because such alleles are absent in apes. Our results show that different groups differed in GSFA numbers consistent with known genetic diversity patterns, but also differed in the fraction of derived alleles in GSFAs throughout the entire genome, with East Asians having the largest fraction, followed by South Asians, Europeans, Native Americans, and Africans. Fast evolving sites such as intergenic regions were enriched with derived alleles and showed greater differences in GSFA numbers between East Asians and Africans. Furthermore, GSFAs in East Asians are mostly not fixed in other groups especially Africans, which was particularly more pronounced for fast evolving noncoding variants, while GSFAs in Africans are mostly also fixed in East Asians. Finally, variants that are likely non-neutral such as those leading to stop codon gain/loss and splice donor/acceptor gain/loss showed patterns similar to those of fast-evolving noncoding variants. These results can be accounted for by the maximum genetic diversity theory but not by the neutral theory or its inference that Eurasians suffered bottlenecks, and have implications for better management of group specific genetic diseases.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12502
Author(s):  
Nikita Moshkov ◽  
Aleksandr Smetanin ◽  
Tatiana V. Tatarinova

Summary: We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores compared to whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations. Availability and implementation: The source code and installation manual are available at https://github.com/smetam/pylae.


Genes ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1979
Author(s):  
Francesco Musacchia ◽  
Marianthi Karali ◽  
Annalaura Torella ◽  
Steve Laurie ◽  
Valeria Policastro ◽  
...  

Homozygous deletions (HDs) may be the cause of rare diseases and cancer, and their discovery in targeted sequencing is a challenging task. Different tools have been developed to disentangle HD discovery but a sensitive caller is still lacking. We present VarGenius-HZD, a sensitive and scalable algorithm that leverages breadth-of-coverage for the detection of rare homozygous and hemizygous single-exon deletions (HDs). To assess its effectiveness, we detected both real and synthetic rare HDs in fifty exomes from the 1000 Genomes Project obtaining higher sensitivity in comparison with state-of-the-art algorithms that each missed at least one event. We then applied our tool on targeted sequencing data from patients with Inherited Retinal Dystrophies and solved five cases that still lacked a genetic diagnosis. We provide VarGenius-HZD either stand-alone or integrated within our recently developed software, enabling the automated selection of samples using the internal database. Hence, it could be extremely useful for both diagnostic and research purposes.


Author(s):  
Arman A. Bashirova ◽  
Wanjing Zheng ◽  
Marjan Akdag ◽  
Danillo G. Augusto ◽  
Nicolas Vince ◽  
...  

Human immunoglobulin G (IgG) molecules, IgG1, IgG2 and IgG3, exhibit substantial inter-individual variation in their constant heavy chain regions, as discovered by serological methods. This polymorphism is encoded by the IGHG1, IGHG2, and IGHG3 genes and may influence antibody function. We sequenced the coding fragments of these genes in 95 European Americans, 94 African Americans, and 94 Black South Africans. Striking differences were observed between the population groups, including extremely low amino acid sequence variation in IGHG1 among South Africans, and higher IGHG2 and IGHG3 diversity in individuals of African descent compared to individuals of European descent. Molecular definition of the loci illustrates a greater level of allelic polymorphism than previously described, including the presence of common IGHG2 and IGHG3 variants that were indistinguishable serologically. Comparison of our data with the 1000 Genomes Project sequences indicates overall agreement between the datasets, although some inaccuracies in the 1000 Genomes Project are likely. These data represent the most comprehensive analysis of IGHG polymorphisms across major populations, which can now be applied to deciphering their functional impact.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fadilla Wahyudi ◽  
Farhang Aghakhanian ◽  
Sadequr Rahman ◽  
Yik-Ying Teo ◽  
Michał Szpak ◽  
...  

Background: In population genomics, polymorphisms that are highly differentiated between geographically separated populations are often suggestive of Darwinian positive selection. Genomic scans have highlighted several such regions in African and non-African populations, but only a handful of these have functional data that clearly associates candidate variations driving the selection process. Fine-Mapping of Adaptive Variation (FineMAV) was developed to address this in a high-throughput manner using population based whole-genome sequences generated by the 1000 Genomes Project. It pinpoints positively selected genetic variants in sequencing data by prioritizing high frequency, population-specific and functional derived alleles. Results: We developed stand-alone software that implements the FineMAV statistic. To graphically visualise the FineMAV scores, it outputs the statistics as bigWig files, a common file format supported by many genome browsers. It is available with both a command-line and a graphical user interface. The software was tested by replicating the FineMAV scores obtained using 1000 Genomes Project African, European, East and South Asian populations and subsequently applied to whole-genome sequencing datasets from Singapore and China to highlight population specific variants that can be subsequently modelled. The software tool is publicly available at https://github.com/fadilla-wahyudi/finemav. Conclusions: The software tool described here determines genome-wide FineMAV scores, using low or high-coverage whole-genome sequencing datasets, that can be used to prioritize a list of population specific, highly differentiated candidate variants for in vitro or in vivo functional screens. The tool displays these scores on the human genome browsers for easy visualisation, annotation and comparison between different genomic regions in worldwide human populations.


Heredity ◽  
2021 ◽  
Author(s):  
Qian S. Zhang ◽  
Jérôme Goudet ◽  
Bruce S. Weir

The two alleles an individual carries at a locus are identical by descent (ibd) if they have descended from a single ancestral allele in a reference population, and the probability of such identity is the inbreeding coefficient of the individual. Inbreeding coefficients can be predicted from pedigrees with founders constituting the reference population, but estimation from genetic data is not possible without data from the reference population. Most inbreeding estimators that make explicit use of sample allele frequencies as estimates of allele probabilities in the reference population are confounded by average kinships with other individuals. This means that the ranking of those estimates depends on the scope of the study sample and we show the variation in rankings for common estimators applied to different subdivisions of 1000 Genomes data. Allele-sharing estimators of within-population inbreeding relative to average kinship in a study sample, however, do have invariant rankings across all studies including those individuals. They are unbiased with a large number of SNPs. We discuss how allele sharing estimates are the relevant quantities for a range of empirical applications.
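The rank-invariance property can be illustrated with a minimal sketch of an allele-sharing inbreeding estimator. This is an assumed form in the spirit of the abstract, not code from the paper: the function name `allele_sharing_inbreeding` and the toy genotypes are hypothetical, genotypes are coded as allele dosages 0/1/2 at biallelic SNPs, and no missing data are handled. Each individual's within-individual sharing (the proportion of homozygous loci) is rescaled by the average between-individual sharing in the sample.

```python
from itertools import combinations

def allele_sharing_inbreeding(genotypes):
    """Allele-sharing inbreeding estimates relative to average sample kinship.

    genotypes: list of individuals, each a list of dosages (0, 1 or 2).
    Assumes at least two individuals and complete data (illustrative only).
    """
    n_loci = len(genotypes[0])
    # Within-individual sharing: 1 at homozygous loci, 0 at heterozygous loci.
    self_sharing = [
        sum(1.0 for g in ind if g != 1) / n_loci for ind in genotypes
    ]
    # Between-individual sharing per locus: (1 + (x - 1)(y - 1)) / 2,
    # i.e. 1 for matching homozygotes, 0 for opposite homozygotes, else 0.5.
    pair_sharing = [
        sum((1 + (x - 1) * (y - 1)) / 2 for x, y in zip(a, b)) / n_loci
        for a, b in combinations(genotypes, 2)
    ]
    avg_pair = sum(pair_sharing) / len(pair_sharing)
    # Rescale so that the estimate is relative to average kinship in the sample.
    return [(a - avg_pair) / (1 - avg_pair) for a in self_sharing]

# Hypothetical toy data: 4 individuals, 6 SNPs.
sample = [
    [0, 2, 2, 0, 2, 0],
    [1, 1, 2, 0, 1, 1],
    [0, 1, 2, 2, 1, 0],
    [1, 1, 1, 1, 1, 2],
]
print(allele_sharing_inbreeding(sample))
print(allele_sharing_inbreeding(sample[:3]))  # values change, ranking does not
```

Because the rescaling is the same increasing transformation for every individual (as long as average between-individual sharing is below 1), the ordering of individuals depends only on their own homozygosity, so it is the same in the full sample and in any subset.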


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e12294
Author(s):  
Neeraj Bharti ◽  
Ruma Banerjee ◽  
Archana Achalere ◽  
Sunitha Manjari Kasibhatla ◽  
Rajendra Joshi

Objectives: Reliable identification of population-specific variants is important for building the single nucleotide polymorphism (SNP) profile. In this study, genomic variation using allele frequency differences of pharmacologically important genes for Gujarati Indians in Houston (GIH) and Indian Telugu in the U.K. (ITU) from the 1000 Genomes Project vis-à-vis global population data was studied to understand its role in drug response. Methods: A joint genotyping approach was used to derive variants of GIH and ITU independently. SNPs of both these populations with significant allele frequency variation (minor allele frequency ≥ 0.05) relative to super-populations from the 1000 Genomes Project and gnomAD were identified, based on the Chi-square distribution with a p-value of ≤ 0.05 and Bonferroni’s multiple adjustment tests. Population stratification and fixation index analysis was carried out to understand genetic differentiation. Functional annotation of variants was carried out using SnpEff, VEP and CADD score. Results: Population stratification of VIP genes revealed four clusters: a single cluster of GIH and ITU and one cluster each for the East Asian, European and African populations; the Admixed American population was found to be admixed. A total of 13 SNPs belonging to ten pharmacogenes were identified to have significant allele frequency variation in both GIH and ITU populations as compared to one or more super-populations. These SNPs belong to VKORC1 (rs17708472, rs2359612, rs8050894) involved in Vitamin K cycle, cytochrome P450 isoforms CYP2C9 (rs1057910), CYP2B6 (rs3211371), CYP2A2 (rs4646425) and CYP2A4 (rs4646440); ATP-binding cassette (ABC) transporter ABCB1 (rs12720067), DPYD1 (rs12119882, rs56160474) involved in pyrimidine metabolism, methyltransferase COMT (rs9332377) and transcriptional factor NR1I2 (rs6785049). SNPs rs1544410 (VDR), rs2725264 (ABCG2), rs5215 and rs5219 (KCNJ11) show a high fixation index (≥ 0.5) with either the EAS or AFR populations. The missense variants rs1057910 (CYP2C9), rs1801028 (DRD2), rs1138272 (GSTP1) and rs116855232 (NUDT15) and the intronic variants rs1131341 (NQO1) and rs115349832 (DPYD) were identified as ‘deleterious’. Conclusions: Analysis of SNPs pertaining to pharmacogenes in GIH and ITU populations using population structure, fixation index and allele frequency variation provides a premise for understanding the role of genetic diversity in drug response in Asian Indians.
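The allele-frequency comparison described in the Methods can be sketched as a chi-square test on allele counts followed by a Bonferroni adjustment. This is an illustration under stated assumptions rather than the authors' pipeline: the function names and the counts in the example are hypothetical, the test is a plain Pearson chi-square on a 2×2 table (1 degree of freedom, no continuity correction), and both alleles are assumed to be observed at least once.

```python
import math

def allele_count_chi2(alt_a, total_a, alt_b, total_b):
    """Pearson chi-square (1 df) comparing alternate-allele counts in two populations.

    alt_*: number of alternate alleles; total_*: number of chromosomes sampled.
    Returns (chi2, p_value). Assumes every expected cell count is non-zero.
    """
    observed = [[alt_a, total_a - alt_a], [alt_b, total_b - alt_b]]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [alt_a + alt_b, n - alt_a - alt_b]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[r][c] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail of chi-square is erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical counts: 20/200 alternate alleles in one population, 80/200 in another.
chi2, p = allele_count_chi2(20, 200, 80, 200)
print(chi2, bonferroni(p, n_tests=1000))
```

A SNP would then be retained when its adjusted P value is at or below the chosen threshold (0.05 in the abstract). The number of tests used for the adjustment is a study-specific choice that the abstract does not state.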

