WHOLE GENOME IDENTITY-BY-DESCENT DETERMINATION

2013 ◽  
Vol 11 (02) ◽  
pp. 1350002 ◽  
Author(s):  
HADI SABAA ◽  
ZHIPENG CAI ◽  
YINING WANG ◽  
RANDY GOEBEL ◽  
STEPHEN MOORE ◽  
...  

High-throughput single nucleotide polymorphism genotyping assays conveniently produce genotype data for genome-wide genetic linkage and association studies. For pedigree datasets, the unphased genotype data is used to infer the haplotypes for individuals, according to Mendelian inheritance rules. Linkage studies can then locate putative chromosomal regions based on the haplotype allele sharing among the pedigree members and their disease status. Most existing haplotyping programs require rather strict pedigree structures and return a single inferred solution for downstream analysis. In this research, we relax the pedigree structure to contain ungenotyped founders and present a cubic time whole genome haplotyping algorithm to minimize the number of zero-recombination haplotype blocks. With or without explicitly enumerating all the haplotyping solutions, the algorithm determines all distinct haplotype allele identity-by-descent (IBD) sharings among the pedigree members, in linear time in the total number of haplotyping solutions. Our algorithm is implemented as a computer program iBDD. Extensive simulation experiments using 2 sets of 16 pedigree structures from previous studies showed that, in general, there are trillions of haplotyping solutions, but only up to a few thousand distinct haplotype allele IBD sharings. iBDD is able to return all these sharings for downstream genome-wide linkage and association studies.
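As an illustration of the Mendelian inheritance rules that pedigree haplotyping relies on, the following minimal Python sketch checks a single father-mother-child trio at one marker and lists the phasings of the child's genotype that are compatible with the parents. It is not the iBDD algorithm; the function names and the allele encoding (unordered pairs of allele labels) are assumptions made for this example.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the father and one from the mother."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def phased_child_options(father, mother, child):
    """List every (paternal_allele, maternal_allele) phasing of the child
    that is compatible with the parental genotypes at this marker."""
    child_sorted = tuple(sorted(child))
    return sorted(
        {
            (f, m)
            for f, m in product(father, mother)
            if tuple(sorted((f, m))) == child_sorted
        }
    )
```

When both parents and the child are heterozygous, two phasings remain possible at that marker. Ambiguity of this kind, multiplied across many markers and pedigree members, is one reason the number of haplotyping solutions can grow very large.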

2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Gabriel Costa Monteiro Moreira ◽  
Clarissa Boschiero ◽  
Aline Silva Mello Cesar ◽  
James M. Reecy ◽  
Thaís Fernanda Godoy ◽  
...  

2020 ◽  
Vol 27 (9) ◽  
pp. 1425-1430
Author(s):  
Inès Krissaane ◽  
Carlos De Niz ◽  
Alba Gutiérrez-Sacristán ◽  
Gabor Korodi ◽  
Nneka Ede ◽  
...  

Objective: Advancements in human genomics have generated a surge of available data, fueling the growth and accessibility of databases for more comprehensive, in-depth genetic studies. Methods: We provide a straightforward and innovative methodology to optimize cloud configuration in order to conduct genome-wide association studies. We utilized Spark clusters on both Google Cloud Platform and Amazon Web Services, as well as Hail (http://doi.org/10.5281/zenodo.2646680), for analysis and exploration of genomic variant datasets. Results: Comparative evaluation of numerous cloud-based cluster configurations demonstrates a successful and unprecedented compromise between speed and cost for performing genome-wide association studies on 4 distinct whole-genome sequencing datasets. Results are consistent across the 2 cloud providers and could be highly useful for accelerating research in genetics. Conclusions: We present a timely piece for one of the most frequently asked questions when moving to the cloud: what is the trade-off between speed and cost?
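The speed-versus-cost question can be framed as a small Pareto comparison. The sketch below is a hedged illustration, not the authors' methodology: it assumes cost is simply workers × hourly price × runtime, and every configuration name, price, and runtime in the usage note is a made-up placeholder rather than a result from the study.

```python
def cluster_cost(n_workers, price_per_worker_hour, runtime_hours):
    """Simplified cost model (an assumption): every worker is billed
    for the full runtime at a flat hourly price."""
    return n_workers * price_per_worker_hour * runtime_hours


def pareto_front(configs):
    """configs: list of (name, runtime_hours, cost) tuples.
    Return the names of configurations that no other configuration
    beats or ties on both runtime and cost while strictly beating
    on at least one of them."""
    front = []
    for name, runtime, cost in configs:
        dominated = any(
            (r2 <= runtime and c2 <= cost) and (r2 < runtime or c2 < cost)
            for _, r2, c2 in configs
        )
        if not dominated:
            front.append(name)
    return front
```

For example, with hypothetical configurations `("small", 10, cluster_cost(4, 0.5, 10))`, `("medium", 3, cluster_cost(16, 0.5, 3))`, `("large", 2, cluster_cost(64, 0.5, 2))` and `("wasteful", 4, cluster_cost(64, 0.5, 4))`, only the last is dropped, because "medium" is both faster and cheaper. The remaining three represent genuine trade-offs between speed and cost.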


Genome ◽  
2010 ◽  
Vol 53 (11) ◽  
pp. 967-972 ◽  
Author(s):  
Robbie Waugh ◽  
David Marshall ◽  
Bill Thomas ◽  
Jordi Comadran ◽  
Joanne Russell ◽  
...  

We have previously shown that linkage disequilibrium (LD) in the elite cultivated barley (Hordeum vulgare) gene pool extends, on average, for <1–5 cM. Based on this information, we have developed a platform for whole genome association studies that comprises a collection of elite lines that we have characterized at 3060 genome-wide single nucleotide polymorphism (SNP) marker loci. Interrogating this data set shows that significant population substructure is present within the elite gene pool and that diversity and LD vary considerably across each of the seven barley chromosomes. However, we also show that a subpopulation comprised of only the two-rowed spring germplasm is less structured and well suited to whole genome association studies without the need for extensive statistical intervention to account for structure. At the current marker density, the two-rowed spring population is suited for fine mapping simple traits that are located outside of the genetic centromeres with a resolution that is sufficient for candidate gene identification by exploiting conservation of synteny with fully sequenced model genomes and the emerging barley physical map.


2019 ◽  
Author(s):  
Margaret A Taub ◽  
Matthew P Conomos ◽  
Rebecca Keener ◽  
Kruthika R Iyer ◽  
Joshua S Weinstock ◽  
...  

Telomeres shorten in replicating somatic cells, and telomere length (TL) is associated with age-related diseases [1,2]. To date, 17 genome-wide association studies (GWAS) have identified 25 loci for leukocyte TL [3–19], but were limited to European and Asian ancestry individuals and relied on laboratory assays of TL. In this study from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, we used whole genome sequencing (WGS) of whole blood for variant genotype calling and the bioinformatic estimation of TL in n=109,122 trans-ethnic (European, African, Asian and Hispanic/Latino) individuals. We identified 59 sentinel variants (p-value < 5×10⁻⁹) from 36 loci (20 novel, 13 replicated in external datasets). There was little evidence of effect heterogeneity across populations, and 10 loci had >1 independent signal. Fine-mapping at OBFC1 indicated the independent signals colocalized with cell-type specific eQTLs for OBFC1 (STN1). We further identified two novel genes, DCLRE1B (SNM1B) and PARN, using a multi-variant gene-based approach.


2020 ◽  
Author(s):  
Yixin An ◽  
Lin Chen ◽  
Yongxiang Li ◽  
Chunhui Li ◽  
Yunsu Shi ◽  
...  

Background: Kernel row number (KRN) is an important trait for the domestication and improvement of maize. Exploring the genetic basis of KRN is of great research significance and can provide valuable information for molecular-assisted selection. Results: In this study, one single-locus method (MLM) and six multi-locus methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB and ISIS EM-BLASSO) of genome-wide association studies (GWASs) were used to identify significant quantitative trait nucleotides (QTNs) for KRN in an association panel including 639 maize inbred lines that were genotyped by the MaizeSNP50 BeadChip. In three phenotyping environments and with best linear unbiased prediction (BLUP) values, the seven GWAS methods revealed different numbers of KRN-associated QTNs, ranging from 11 to 177. Based on these results, seven important regions for KRN located on chromosomes 1, 2, 3, 5, 9, and 10 were identified by at least three methods and in at least two environments. Moreover, 49 genes from the seven regions were expressed in different maize tissues. Among the 49 genes, ARF29 (Zm00001d026540, encoding auxin response factor 29) and CKO4 (Zm00001d043293, encoding cytokinin oxidase protein) were significantly related to KRN based on expression analysis and candidate gene association mapping. Whole-genome prediction (WGP) for KRN was also performed, and we found that the KRN-associated tagSNPs achieved a high prediction accuracy. The best strategy was to integrate the total KRN-associated tagSNPs identified by all GWAS models. Conclusions: These results aid in our understanding of the genetic architecture of KRN and provide useful information for genomic selection for KRN in maize breeding.


2021 ◽  
Author(s):  
Marsha M. Wheeler ◽  
Adrienne M Stilp ◽  
Shuquan Rao ◽  
Bjarni V Halldorsson ◽  
Doruk V Beyter ◽  
...  

Genome-wide association studies (GWAS) have identified thousands of single nucleotide variants and small indels that contribute to the genetic architecture of hematologic traits. While structural variants (SVs) are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of SVs to quantitative blood cell trait variation is unknown. Here we utilized SVs detected from whole genome sequencing (WGS) in ancestrally diverse participants of the NHLBI TOPMed program (N=50,675). Using single variant tests, we assessed the association of common and rare SVs with red cell-, white cell-, and platelet-related quantitative traits. The results show 33 independent SVs (23 common and 10 rare) reaching genome-wide significance. The majority of significant association signals (N=27) replicated in independent datasets from deCODE genetics and the UK BioBank. Moreover, most trait-associated SVs (N=24) are within 1Mb of previously-reported GWAS loci. SV analyses additionally discovered an association between a complex structural variant on 17p11.2 and white blood cell-related phenotypes. Based on functional annotation, the majority of significant SVs are located in non-coding regions (N=26) and predicted to impact regulatory elements and/or local chromatin domain boundaries in blood cells. We predict that several trait-associated SVs represent the causal variant. This is supported by genome-editing experiments which provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.


2017 ◽  
Author(s):  
Clare Bycroft ◽  
Colin Freeman ◽  
Desislava Petkova ◽  
Gavin Band ◽  
Lloyd T. Elliott ◽  
...  

The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data, such as population structure and relatedness, that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.


2019 ◽  
Author(s):  
Sankar Subramanian ◽  
Umayal Ramasamy ◽  
David Chen

Over the past decades, a number of software programs have been developed to deduce the phylogenetic relationship between populations. However, these programs are not suited for large-scale whole genome data. Recently, a few standalone or web applications have been developed to handle genome-wide data, but they were either computationally intensive, dependent on third-party software, or required significant time and resources of a web server. In the post-genomic era, researchers are able to obtain bioinformatically processed, high-quality, publication-ready whole genome data for many individuals in a population from next generation sequencing companies, owing to the reduction in the cost of sequencing and analysis. Such genotype data is typically presented in the Variant Call Format (VCF), and there is no simple software available that uses this data to construct the phylogeny of populations in a short time. To address this limitation, we have developed a one-click, user-friendly software tool, VCF2PopTree, that uses genome-wide SNPs to construct and display phylogenetic trees in seconds to minutes. For example, it reads a 1 GB VCF file and draws a tree in less than 5 minutes. VCF2PopTree accepts genotype data from a local machine, constructs a tree using the UPGMA and Neighbour-Joining algorithms, and displays it in a web browser. It also produces a pairwise-diversity matrix in MEGA and PHYLIP file formats, as well as trees in the Newick format, which can be used directly by other popular phylogenetic software programs. The software, including the source code, a test VCF input file and short documentation, is available at: https://github.com/sansubs/vcf2pop.
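The general idea of turning genotypes into a pairwise distance matrix and then into a UPGMA tree in Newick format can be sketched in a few lines of Python. This is a minimal illustration, not VCF2PopTree's code: the input encoding (alternate-allele counts of 0, 1, or 2 per SNP), the distance measure (mean absolute difference per SNP), and the function names are all assumptions made for the example, and branch lengths are omitted.

```python
def pairwise_distance(genotypes):
    """genotypes: dict mapping a label to a list of alternate-allele
    counts (0, 1, or 2), one per SNP. Returns a dict mapping
    frozenset({a, b}) to the mean absolute difference per SNP."""
    names = sorted(genotypes)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = [abs(x - y) for x, y in zip(genotypes[a], genotypes[b])]
            dist[frozenset((a, b))] = sum(diffs) / len(diffs)
    return dist


def upgma(dist, names):
    """Cluster labels by UPGMA and return the topology as a Newick
    string (no branch lengths)."""
    sizes = {n: 1 for n in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken by label for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        new_size = sizes[a] + sizes[b]
        for c in sizes:
            if c in (a, b):
                continue
            # Size-weighted average of the distances to the two merged clusters.
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (sizes[a] * d_ac + sizes[b] * d_bc) / new_size
        del d[pair]
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes)) + ";"
```

Given three hypothetical labels where `pop1` and `pop2` have similar genotypes and `pop3` differs from both, `upgma(pairwise_distance(genotypes), list(genotypes))` groups the first two before joining the third. The resulting Newick string can be read by standard tree viewers.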

