Exploring effective approaches for haplotype block phasing

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Ziad Al Bkhetan ◽  
Justin Zobel ◽  
Adam Kowalczyk ◽  
Karin Verspoor ◽  
Benjamin Goudey

Background: Knowledge of phase, the specific allele sequence on each copy of homologous chromosomes, is increasingly recognized as critical for detecting certain classes of disease-associated mutations. One approach for detecting such mutations is through phased haplotype association analysis. While the accuracy of methods for phasing genotype data has been widely explored, little attention has been given to phasing accuracy at haplotype block scale. Understanding how the accuracy of the phasing tool and the method used to determine haplotype blocks jointly affect the error rate within the determined blocks is essential for conducting accurate haplotype analyses. Results: We present a systematic study exploring the relationship between seven widely used phasing methods and two common methods for determining haplotype blocks. The evaluation focuses on the number of haplotype blocks that are incorrectly phased. Insights from these results are used to develop a haplotype estimator based on a consensus of three tools. The consensus estimator achieved the most accurate phasing in all applied tests. Individually, EAGLE2, BEAGLE and SHAPEIT2 alternate in being the best performing tool in different scenarios. Determining haplotype blocks based on linkage disequilibrium leads to more correctly phased blocks than a sliding window approach. We find that there is little difference between phasing sections of a genome (e.g. a gene) and phasing entire chromosomes. Finally, we show that the location of phasing errors varies when the tools are applied to the same data several times, a finding that could be important for downstream analyses. Conclusions: The choice of phasing and block determination algorithms, and their interaction, affects the accuracy of phased haplotype blocks. This work provides guidance and evidence for the different design choices needed for analyses using haplotype blocks. The study highlights a number of issues that may have limited the replicability of previous haplotype analyses.
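The abstract does not say how the consensus of three tools is formed, so the following is only a minimal sketch of one plausible scheme, not the authors' estimator. It assumes each tool reports the allele (0 or 1) carried by one haplotype copy at the same ordered heterozygous sites, and it takes a majority vote on whether the phase switches between consecutive sites. Voting on transitions rather than on alleles avoids the problem that two tools may label the two haplotype copies in opposite order. The function name `consensus_phase` and the string encoding are hypothetical.

```python
from collections import Counter


def flip(allele: str) -> str:
    """Return the complementary allele at a heterozygous site."""
    return "1" if allele == "0" else "0"


def consensus_phase(phasings: list[str]) -> tuple[str, str]:
    """Majority-vote consensus over per-tool phasings (assumed scheme).

    Each string lists the allele on one haplotype copy at the same
    heterozygous sites. Tools vote on whether consecutive sites carry
    the same allele; on a tie the sketch chooses a switch, so an odd
    number of tools is advisable.
    """
    if not phasings:
        raise ValueError("at least one phasing is required")
    n = len(phasings[0])
    if any(len(p) != n for p in phasings):
        raise ValueError("all phasings must cover the same sites")
    if n == 0:
        return "", ""

    # The orientation of the first site is arbitrary; follow the first tool.
    hap = [phasings[0][0]]
    for i in range(1, n):
        votes = Counter(p[i] == p[i - 1] for p in phasings)
        keep_same = votes[True] > votes[False]
        hap.append(hap[-1] if keep_same else flip(hap[-1]))

    first = "".join(hap)
    second = "".join(flip(a) for a in first)
    return first, second


# Hypothetical outputs of three tools for four heterozygous sites.
# The second tool agrees with the first but labels the copies in reverse.
tool_outputs = ["0101", "1010", "0110"]
h1, h2 = consensus_phase(tool_outputs)
print(h1, h2)
```

With three tools there are no ties, and a switch error made by a single tool (the third one here) is outvoted by the other two.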

Plants ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 148
Author(s):  
Camilo E. Valenzuela ◽  
Paulina Ballesta ◽  
Sunny Ahmar ◽  
Sajid Fiaz ◽  
Parviz Heidari ◽  
...  

The agricultural and forestry productivity of Mediterranean ecosystems is strongly threatened by the adverse effects of climate change, including an increase in severe droughts and changes in rainfall distribution. In the present study, we performed a genome-wide association study (GWAS) to identify single-nucleotide polymorphisms (SNPs) and haplotype blocks associated with the growth and wood quality of Eucalyptus cladocalyx, a tree species suitable for low-rainfall sites. The study was conducted in a progeny-provenance trial established in an arid site with Mediterranean patterns located in the southern Atacama Desert, Chile. A total of 87 SNPs and 3 haplotype blocks were significantly associated with the 6 traits under study (tree height, diameter at breast height, slenderness coefficient, first bifurcation height, stem straightness, and pilodyn penetration). In addition, 11 loci were identified as pleiotropic through Bayesian multivariate regression and were mainly associated with wood hardness, height, and diameter. In general, the GWAS revealed associations with genes related to primary metabolism and biosynthesis of cell wall components. Additionally, associations coinciding with stress response genes, such as GEM-related 5 and prohibitin-3, were detected. The findings of this study provide valuable information regarding genetic control of morphological traits related to adaptation to arid environments.


2018 ◽  
Vol 50 (12) ◽  
pp. 1051-1058 ◽  
Author(s):  
Samantha A. Brooks ◽  
John Stick ◽  
Ashley Braman ◽  
Katelyn Palermo ◽  
N. Edward Robinson ◽  
...  

Equine recurrent laryngeal neuropathy (RLN) is a bilateral mononeuropathy with an unknown etiology. In Thoroughbreds (TB), we previously demonstrated that the haplotype association for height (LCORL/NCAPG locus on ECA3, which affects body size) and RLN was coincident. In the present study, we performed a genome-wide association scan (GWAS) for RLN in 458 American Belgian Draft Horses, a breed fixed for the LCORL/NCAPG risk allele. In this breed, RLN risk is associated with sexually dimorphic differences in height, and we identified a novel locus contributing to height in a sex-specific manner: MYPN (ECA1). Yet this specific locus contributes little to RLN risk, suggesting that other growth traits correlated to height may underlie the correlation to this disease. Controlling for height, we identified a locus on ECA15 contributing to RLN risk specifically in males. These results suggest that loci with sex-specific gene expression play an important role in altering growth traits impacting RLN etiology, but not necessarily adult height. These newly identified genes are promising targets for novel preventative and treatment strategies.


Genes ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 551
Author(s):  
Swati Srivastava ◽  
Krishnamoorthy Srikanth ◽  
Sohyoung Won ◽  
Ju-Hwan Son ◽  
Jong-Eun Park ◽  
...  

Hanwoo is the most popular native beef cattle in South Korea. Due to its extensive popularity, research is ongoing to enhance its carcass quality and marbling traits. In this study, we conducted a haplotype-based genome-wide association study (GWAS), constructing haplotype blocks using three methods: number of single nucleotide polymorphisms (SNPs) in a haplotype block (nsnp), length of genomic region in kb (Len) and linkage disequilibrium (LD). Significant haplotype blocks and genes associated with them were identified for carcass traits such as BFT (back fat thickness), EMA (eye muscle area), CWT (carcass weight) and MS (marbling score). Gene-set enrichment analysis and functional annotation of genes in the significantly associated loci revealed candidate genes, including PLCB1 and PLCB4 present on BTA13, coding for phospholipases, which might be important candidates for increasing fat deposition due to their role in lipid metabolism and adipogenesis. CEL (carboxyl ester lipase), a bile-salt activated lipase responsible for the lipid catabolic process, was also identified within the significantly associated haplotype block on BTA11. The results were validated in a different Hanwoo population. The genes and pathways identified in this study may serve as good candidates for improving carcass traits in Hanwoo cattle.
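The first two block definitions named above (nsnp and Len) can be illustrated with a short sketch. This is an assumption-laden illustration, not the pipeline used in the study: it takes sorted SNP positions in base pairs on one chromosome and produces non-overlapping blocks, and the function names `blocks_by_snp_count` and `blocks_by_length` are hypothetical. The LD-based method needs genotype data and is not shown.

```python
def blocks_by_snp_count(positions: list[int], nsnp: int) -> list[list[int]]:
    """Split sorted SNP positions into consecutive blocks of nsnp SNPs.

    The last block may hold fewer than nsnp SNPs.
    """
    if nsnp < 1:
        raise ValueError("nsnp must be at least 1")
    return [positions[i:i + nsnp] for i in range(0, len(positions), nsnp)]


def blocks_by_length(positions: list[int], length_bp: int) -> list[list[int]]:
    """Split sorted SNP positions into blocks spanning less than length_bp.

    A new block starts when a SNP lies length_bp or more past the first
    SNP of the current block.
    """
    if length_bp < 1:
        raise ValueError("length_bp must be at least 1")
    blocks: list[list[int]] = []
    current: list[int] = []
    for pos in positions:
        if current and pos - current[0] >= length_bp:
            blocks.append(current)
            current = []
        current.append(pos)
    if current:
        blocks.append(current)
    return blocks


# Hypothetical SNP positions (bp) on a single chromosome.
snps = [100, 200, 900, 1500, 1600, 3000]
print(blocks_by_snp_count(snps, 2))
print(blocks_by_length(snps, 1000))
```

The two rules generally give different partitions of the same markers, which is one reason a haplotype-based GWAS can report different significant blocks depending on the construction method.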


2014 ◽  
Vol 60 (9) ◽  
pp. 557-568 ◽  
Author(s):  
Heng Xiang ◽  
Ruizhi Zhang ◽  
David De Koeyer ◽  
Guoqing Pan ◽  
Tian Li ◽  
...  

Microsporidia are a group of obligate intracellular eukaryotic parasites that infect a wide variety of species, including humans. Phylogenetic analysis indicates a relationship between the Microsporidia and the Fungi. However, most results are based on the analysis of relatively few genes. DarkHorse analysis involves the transformation of BLAST results into a lineage probability index (LPI) value and allows for the comparison of genes for an entire genome with those of other genomes. Thus, we can see which genes from the microsporidia score most closely based on the LPI with other eukaryotic organisms. In this analysis, we calculated the LPI for each gene from the genomes of 7 Microsporidia, Antonospora locustae, Enterocytozoon bieneusi, Encephalitozoon cuniculi, Encephalitozoon intestinalis, Nosema bombycis, Nosema ceranae, and Nematocida parisii, to analyze the genetic relationships between Microsporidia and other species. It was found that many (91%) genes were most closely correlated with genes from other microsporidial genomes and had the highest mean LPI (0.985), indicating a monophyletic origin of the Microsporidia. In a subsequent analysis, we excluded the other Microsporidia from the analysis to look for relationships before the divergence of Microsporidia, and found that 43% of the microsporidial genes scored highest with fungal genes, and a higher mean LPI was found with Fungi than with other kingdoms, suggesting that Microsporidia is closely related to Fungi at the genomic level. Microsporidial genes were functionally clustered based on the KOG (Eukaryotic COG) database, and the possible lineages for each gene family were discussed in concert with the DarkHorse results.


Genetics ◽  
2020 ◽  
Vol 216 (1) ◽  
pp. 27-41
Author(s):  
Simon Rio ◽  
Laurence Moreau ◽  
Alain Charcosset ◽  
Tristan Mary-Huard

Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks from different origins, and may be of interest to combine favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), modeling the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), modeling group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture while MAGBLUP-RAE can disentangle the variability that is due to main allele effects from the variability that is due to group-specific deviation allele effects. Both models were evaluated for their genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. Based on simulated traits, both models proved their efficiency to improve genomic prediction accuracy compared to standard GBLUP models. For real traits, a clear gain was observed at low marker densities whereas it became limited at high marker densities. The interest of including admixed individuals in multigroup training sets was confirmed using simulated traits, but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.


Science ◽  
2008 ◽  
Vol 322 (5909) ◽  
pp. 1855-1857 ◽  
Author(s):  
Yiping He ◽  
Bert Vogelstein ◽  
Victor E. Velculescu ◽  
Nickolas Papadopoulos ◽  
Kenneth W. Kinzler

Transcription in mammalian cells can be assessed at a genome-wide level, but it has been difficult to reliably determine whether individual transcripts are derived from the plus or minus strands of chromosomes. This distinction can be critical for understanding the relationship between known transcripts (sense) and the complementary antisense transcripts that may regulate them. Here, we describe a technique that can be used to (i) identify the DNA strand of origin for any particular RNA transcript, and (ii) quantify the number of sense and antisense transcripts from expressed genes at a global level. We examined five different human cell types and in each case found evidence for antisense transcripts in 2900 to 6400 human genes. The distribution of antisense transcripts was distinct from that of sense transcripts, was nonrandom across the genome, and differed among cell types. Antisense transcripts thus appear to be a pervasive feature of human cells, which suggests that they are a fundamental component of gene regulation.


1998 ◽  
Vol 8 (1) ◽  
pp. 23-29 ◽  
Author(s):  
Lucy Yardley

This review examines the relationship between dysequilibrium, falling and anxiety, and their combined impact on the lives of elderly people. More than one in four people aged over 69 fall each year, and a higher proportion of those over 74. Although only one in ten incurs serious injury as a direct result of the fall, fear of falling can often lead not only to psychological distress but also to restriction of activity and an unnecessary and undesirable loss of independence. Naturally, symptoms of imbalance constitute a key risk factor for falling.


2005 ◽  
Vol 03 (05) ◽  
pp. 1021-1038
Author(s):  
AO YUAN ◽  
GUANJIE CHEN ◽  
CHARLES ROTIMI ◽  
GEORGE E. BONNEY

The existence of haplotype blocks transmitted from parents to offspring has been suggested recently. This has created an interest in the inference of the block structure and length. The motivation is that well-characterized haplotype blocks will make it easier to quickly map all the genes carrying human diseases. To study the inference of haplotype blocks systematically, we propose a statistical framework. In this framework, the optimal haplotype block partitioning is formulated as a problem of statistical model selection; missing data can be handled in a standard statistical way; population strata can be implemented; block structure inference/hypothesis testing can be performed; and prior knowledge, if present, can be incorporated to perform a Bayesian inference. The algorithm is linear in the number of loci, whereas many such algorithms are NP-hard. We illustrate the applications of our method to both simulated and real data sets.


2016 ◽  
Author(s):  
Thijs Janzen ◽  
Arne W. Nolte ◽  
Arne Traulsen

When species originate through hybridization, the genomes of the ancestral species are blended together. Over time genomic blocks that originate from either one of the ancestral species accumulate in the hybrid genome through genetic recombination. Modeling the accumulation of ancestry blocks can elucidate processes and patterns of genomic admixture. However, previous models have ignored ancestry block dynamics for chromosomes that consist of a discrete, finite number of chromosomal elements. Here we present an analytical treatment of the dynamics of the mean number of blocks over time, for continuous and discrete chromosomes, in finite and infinite populations. We describe the mean number of haplotype blocks as a universal function dependent on population size, the number of genomic elements per chromosome, the number of recombination events, and the initial relative frequency of the ancestral species.

