haplotype reconstruction
Recently Published Documents

TOTAL DOCUMENTS: 106 (last five years: 25)
H-INDEX: 19 (last five years: 2)

2021 ◽  
Author(s):  
Claire Oget-Ebrad ◽  
Naveen Kumar Kadri ◽  
Gabriel Costa Monteiro Moreira ◽  
Latifa Karim ◽  
Wouter Coppieters ◽  
...  

Background: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available, but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). Here, we took advantage of whole-genome sequence data available for a Holstein cattle pedigree of 264 individuals, including 98 trios, to evaluate several population-based phasing methods. These data represent a typical livestock population, with a low effective population size, high levels of relatedness, and long-range linkage disequilibrium.

Results: After stringent filtering of the sequence data, we evaluated several population-based phasing programs, including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle, and FImpute. For validation we used the 98 individuals with both parents sequenced: their haplotypes reconstructed from Mendelian segregation rules were taken as the gold standard against which the population-based methods were assessed in two scenarios. In the first, only these 98 individuals were phased; in the second, all 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. Phasing accuracy was assessed with switch error counts (SEC) and rates (SER), the lengths of correctly phased haplotype blocks, and pairwise SNP phasing accuracy (the probability that a pair of SNPs is correctly phased as a function of the distance between them). For most metrics and scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both achieving particularly high phasing accuracy. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb in the second scenario. These statistics are remarkable given that the methods were evaluated on a map of 8,400,000 SNPs: this corresponds to only one switch error every 40,000 phased informative markers. When more relatives were included in the data, FImpute3.0 reconstructed extremely long segments without errors.

Conclusions: We report extremely high phasing accuracy in a typical livestock sample of 100 sequenced individuals. ShapeIT4.1 and Beagle5.2 proved the most accurate, particularly for phasing long segments. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
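
As a concrete illustration of the switch-error metrics used here, the sketch below (a hypothetical switch_errors helper, not the authors' evaluation code) counts a switch wherever the phase orientation between a test haplotype and the trio-derived gold standard flips between consecutive informative markers.

```python
def switch_errors(truth_hap, test_hap):
    """Switch error count (SEC) and rate (SER) for one individual.

    truth_hap, test_hap: 0/1 alleles of ONE haplotype at the same ordered,
    informative (heterozygous) markers; the trio-derived haplotype serves
    as the gold standard. Toy helper, not the authors' evaluation code.
    """
    # Phase orientation per marker: does the test haplotype carry the same
    # allele as the gold standard (True) or the complementary one (False)?
    orientation = [t == g for g, t in zip(truth_hap, test_hap)]
    # A switch error is a flip of orientation between consecutive markers.
    sec = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    ser = sec / max(len(orientation) - 1, 1)
    return sec, ser

# Toy example: the phase flips once, after the third informative marker.
truth = [0, 1, 1, 0, 1]
test  = [0, 1, 1, 1, 0]
print(switch_errors(truth, test))   # -> (1, 0.25)
```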


2021 ◽  
Author(s):  
Peter Bradbury ◽  
Terry Casstevens ◽  
Sarah E Jensen ◽  
Lynn C Johnson ◽  
Zachary R Miller ◽  
...  

Motivation: Pangenomes provide novel insights for population and quantitative genetics, genomics, and breeding that are not available from studying a single reference genome: a species is better represented by a pangenome, i.e., a collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low-density sequence or variant data.

Results: The Practical Haplotype Graph (PHG) is a pangenome pipeline, database (PostgreSQL & SQLite), data model (Java, Kotlin, or R), and Breeding API (BrAPI) web service. The PHG has already been able to accurately represent diversity in four major crops, including maize, one of the most genomically diverse species, with up to 1000-fold data compression. Using simulated data, we show that imputation from coverage as low as 0.1X, with appropriate reads and sequence alignment, results in extremely accurate haplotype reconstruction. The PHG is a platform and environment for the understanding and application of genomic diversity.

Availability: All resources listed here are freely available. The PHG Docker image used to generate the simulation results is available from https://hub.docker.com/ as maizegenetics/phg:0.0.27. PHG source code is at https://bitbucket.org/bucklerlab/practicalhaplotypegraph/src/master/. The code used for the analysis of simulated data is at https://bitbucket.org/bucklerlab/phg-manuscript/src/master/. The PHG database of NAM parent haplotypes is in the CyVerse data store (https://de.cyverse.org/de/), named /iplant/home/shared/panzea/panGenome/PHG_db_maize/phg_v5Assemblies_20200608.db.
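
The imputation step can be pictured with a small, hedged sketch: a hidden Markov model whose states are founder haplotypes anchored to reference ranges, decoded with Viterbi from sparse read-derived allele calls. The founders (hapA/hapB/hapC), the error and recombination parameters, and the impute_path helper are made-up assumptions for illustration, not the PHG's Java/Kotlin implementation or its trellis graph data structures.

```python
import math

# Toy pangenome: each founder haplotype contributes one allele per reference range.
# Hypothetical data; the real PHG stores haplotype nodes in its PostgreSQL/SQLite database.
FOUNDERS = {
    "hapA": [0, 0, 1, 1, 0, 1],
    "hapB": [1, 0, 0, 1, 1, 1],
    "hapC": [0, 1, 1, 0, 0, 0],
}

def impute_path(observed, founders, error=0.01, recomb=0.001):
    """Viterbi over founder haplotypes across reference ranges.

    observed: allele calls (0/1) derived from low-coverage reads, None where
    no read covers the range. Returns the most likely founder path and the
    fully imputed allele sequence along that path.
    """
    names = list(founders)
    n_ranges = len(observed)

    def emit(name, i):
        # Log-probability of the observation given the founder's allele.
        if observed[i] is None:
            return 0.0                       # no read: uninformative
        match = founders[name][i] == observed[i]
        return math.log(1 - error) if match else math.log(error)

    score = {n: math.log(1 / len(names)) + emit(n, 0) for n in names}
    back = []
    for i in range(1, n_ranges):
        new_score, pointers = {}, {}
        for cur in names:
            # Stay on the same founder, or switch (recombination) with a penalty.
            best_prev, best = max(
                ((p, score[p] + (math.log(1 - recomb) if p == cur else math.log(recomb)))
                 for p in names),
                key=lambda t: t[1],
            )
            new_score[cur] = best + emit(cur, i)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)

    # Backtrace the highest-scoring founder path and read off its alleles.
    last = max(score, key=score.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, [founders[h][i] for i, h in enumerate(path)]

# Sparse, 0.1X-like observations: most reference ranges have no read support.
reads = [0, None, None, 1, None, 1]
print(impute_path(reads, FOUNDERS))   # -> (['hapA', ...], [0, 0, 1, 1, 0, 1])
```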


Genetics ◽  
2021 ◽  
Author(s):  
Chaozhi Zheng ◽  
Rodrigo R Amadeu ◽  
Patricio R Munoz ◽  
Jeffrey B Endelman

Abstract In diploid species, many multiparental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase the power of QTL detection in comparison with marker-based association analysis. However, such software tools for polyploid species are few and limited to a single biparental F1 population. In this paper, a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents, regardless of the number of parents or mating design. Given a genetic or physical map of markers, PolyOrigin first phases parental genotypes, then refines the input marker map, and finally reconstructs offspring haplotypes. PolyOrigin can utilize single nucleotide polymorphism (SNP) data coming from arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype calling errors at low depth. With extensive simulations, we show that PolyOrigin is robust to errors in the input genotypic data and marker map. It works well for various population designs with ≥ offspring per parent and for sequences with read depth as low as 10x. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3 × 3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
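
Why read counts are preferred over hard genotype calls at low depth can be seen with a small sketch. The binomial model and the dosage_likelihoods helper below are illustrative assumptions, not PolyOrigin's actual likelihood: a tetraploid genotype is a dosage of 0 to 4 alternate alleles, and at 10x depth several dosages can remain plausible, information a hard call would throw away.

```python
from math import comb

def dosage_likelihoods(ref_reads, alt_reads, ploidy=4, error=0.005):
    """Posterior over alternate-allele dosage (0..ploidy) at one SNP given
    bi-allelic read counts, under a simple binomial model with a flat prior.
    Toy illustration, not PolyOrigin's likelihood."""
    n = ref_reads + alt_reads
    liks = []
    for dosage in range(ploidy + 1):
        # Expected fraction of alt reads for this dosage, allowing sequencing error.
        p_alt = (dosage / ploidy) * (1 - error) + (1 - dosage / ploidy) * error
        liks.append(comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**(n - alt_reads))
    total = sum(liks)
    return [lik / total for lik in liks]

# At 10x depth, a 7:3 ref:alt split leaves genuine ambiguity between dosages 1 and 2,
# which a hard genotype call would hide from downstream haplotype reconstruction.
print([round(p, 3) for p in dosage_likelihoods(7, 3)])   # approx. [0.0, 0.68, 0.31, 0.01, 0.0]
```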


Author(s):  
Marta Pelizzola ◽  
Merle Behr ◽  
Housen Li ◽  
Axel Munk ◽  
Andreas Futschik

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shilpa Garg

Abstract High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not directly yield haplotype information spanning whole chromosomes. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress and discuss perspectives in these areas.
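
As a concrete picture of what assembling shorter haplotype fragments involves, here is a minimal greedy sketch for the diploid case; the assemble_haplotypes helper and its toy fragments are hypothetical and stand in for the minimum-error-correction style assembly that the reviewed tools perform far more carefully.

```python
def assemble_haplotypes(fragments, n_sites):
    """Greedy assignment of read fragments to the two haplotypes of a diploid genome.

    fragments: list of dicts {site_index: allele (0/1)} covering a few
    heterozygous sites each. Returns two haplotype lists (None = uncovered).
    Illustrative only; real phasers optimise a global objective such as MEC.
    """
    hap0, hap1 = [None] * n_sites, [None] * n_sites
    for frag in fragments:
        # Agreement of this fragment with each partially built haplotype;
        # ties go to hap0 (the global phase is arbitrary anyway).
        agree0 = sum(hap0[i] == a for i, a in frag.items() if hap0[i] is not None)
        agree1 = sum(hap1[i] == a for i, a in frag.items() if hap1[i] is not None)
        target, other = (hap0, hap1) if agree0 >= agree1 else (hap1, hap0)
        for i, a in frag.items():
            if target[i] is None:
                target[i] = a          # extend the chosen haplotype
            if other[i] is None:
                other[i] = 1 - a       # heterozygous site: complement on the other copy
    return hap0, hap1

# Overlapping fragments sampled from the two chromosome copies of one individual.
frags = [{0: 0, 1: 1}, {1: 1, 2: 0}, {0: 1, 1: 0}, {2: 1, 3: 1}, {2: 0, 3: 0}]
print(assemble_haplotypes(frags, 4))   # -> ([0, 1, 0, 0], [1, 0, 1, 1])
```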


Author(s):  
Julia Markowski ◽  
Rieke Kempfer ◽  
Alexander Kukalev ◽  
Ibai Irastorza-Azcarate ◽  
Gesa Loof ◽  
...  

Abstract Motivation: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM’s ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools for investigating the effects of genetic variants on many cellular phenotypes, including chromatin conformation, but they require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes.

Results: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimise phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess the correctness and completeness of the reconstructed haplotypes and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data.

Availability: GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear
Maintainer: [email protected]
Supplementary information: Supplementary information is available at Bioinformatics online.
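
The co-observation idea can be sketched as follows; the phase_from_gam_profiles helper and its toy profiles are illustrative assumptions, not GAMIBHEAR's probabilistic model of haplotype capture. Alleles detected in the same thin nuclear section tend to come from the same homologue, so counting equal versus opposite co-observations across profiles yields a phasing signal that a simple greedy pass can turn into genome-wide haplotypes.

```python
from collections import defaultdict

def phase_from_gam_profiles(profiles, n_sites):
    """Greedy phasing from allelic co-observation in GAM nuclear profiles.

    profiles: list of dicts {site_index: observed allele (0/1)}, one per
    nuclear profile. Alleles seen in the same profile are assumed to lie on
    the same homologue (a simplification of GAMIBHEAR's probabilistic model).
    Returns a phase vector: phase[i] is the allele assigned to haplotype 1.
    """
    # Co-observation evidence: +1 when two sites were seen with equal alleles
    # in a profile, -1 when seen with opposite alleles.
    evidence = defaultdict(int)
    for prof in profiles:
        sites = sorted(prof)
        for a in range(len(sites)):
            for b in range(a + 1, len(sites)):
                i, j = sites[a], sites[b]
                evidence[(i, j)] += 1 if prof[i] == prof[j] else -1

    # Greedy left-to-right phasing, anchored arbitrarily at site 0.
    phase = [None] * n_sites
    phase[0] = 0
    for j in range(1, n_sites):
        vote = sum(
            (1 if phase[i] == 0 else -1) * evidence.get((i, j), 0)
            for i in range(j)
        )
        phase[j] = 0 if vote >= 0 else 1
    return phase

# Toy hybrid cell line: allele 0 at sites 0, 1, 3 and allele 1 at site 2 share a homologue.
profiles = [{0: 0, 1: 0}, {1: 0, 2: 1}, {2: 1, 3: 0}, {0: 0, 3: 0}, {1: 0, 3: 0}]
print(phase_from_gam_profiles(profiles, 4))   # -> [0, 0, 1, 0]
```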


Author(s):  
Shilpa Garg

High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not yield haplotype information that spans whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress in these areas and discuss perspectives that could enable routine high-quality haplotype reconstruction in clinical and evolutionary studies.


Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance and are thus not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to provide quality output properly formatted for downstream evolutionary analyses.
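
The cost of collapsing a viral population to a single consensus, which motivates haplotype reconstruction, can be shown with a short sketch; the consensus_with_minor_variants helper and the toy reads below are hypothetical and are not part of HAPHPIPE.

```python
from collections import Counter

def consensus_with_minor_variants(aligned_reads, min_freq=0.05):
    """Per-position majority consensus from aligned reads of equal length,
    plus the minority variants (above min_freq) that the consensus discards.
    Toy illustration, not HAPHPIPE's assembly modules."""
    consensus, discarded = [], []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(read[pos] for read in aligned_reads)
        total = sum(counts.values())
        major, _ = counts.most_common(1)[0]
        consensus.append(major)
        for base, n in counts.items():
            if base != major and n / total >= min_freq:
                discarded.append((pos, base, n / total))
    return "".join(consensus), discarded

# Two intra-host haplotypes at 70/30 frequency; the consensus hides the 30% variant.
reads = ["ACGT"] * 7 + ["ACTT"] * 3
cons, lost = consensus_with_minor_variants(reads)
print(cons)   # ACGT
print(lost)   # [(2, 'T', 0.3)]
```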


Author(s):  
Chaozhi Zheng ◽  
Rodrigo R. Amadeu ◽  
Patricio R. Munoz ◽  
Jeffrey B. Endelman

Abstract In diploid species, many multi-parental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase QTL detection power in comparison with the marker-based association analysis. To realize similar benefits in tetraploid species (and eventually higher ploidy levels), a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents. Haplotype reconstruction proceeds in two steps: first, parental genotypes are phased based on multi-locus linkage analysis; second, genotype probabilities for the parental alleles are inferred in the progeny. PolyOrigin can utilize genetic marker data from single nucleotide polymorphism (SNP) arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype call errors at low depth. To account for errors in the input map, PolyOrigin includes functionality for filtering markers, inferring inter-marker distances, and refining local marker ordering. Simulation studies were used to investigate the effect of several variables on the accuracy of haplotype reconstruction, including the mating design, the number of parents, population size, and sequencing depth. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3×3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
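
The second step, inferring which parental alleles a progeny inherited, rests on how tetraploid gametes are formed. The sketch below (hypothetical gamete_dosages and offspring_dosages helpers, not PolyOrigin's algorithm) enumerates gametes from a phased autotetraploid parent under random bivalent pairing without double reduction, each gamete carrying 2 of the 4 homologues, and combines two parents to give the prior offspring dosage distribution at one SNP.

```python
from itertools import combinations
from collections import Counter

def gamete_dosages(parent_phase):
    """Dosage distribution of gametes from a phased autotetraploid parent.

    parent_phase: alleles (0/1) on the four homologues at one locus, e.g. (0, 0, 1, 1).
    Assumes random bivalent pairing and no double reduction: a gamete receives
    2 of the 4 homologues, each of the 6 possible pairs with equal probability.
    """
    counts = Counter(sum(pair) for pair in combinations(parent_phase, 2))
    total = sum(counts.values())            # always 6
    return {dosage: n / total for dosage, n in counts.items()}

def offspring_dosages(parent1_phase, parent2_phase):
    """Prior distribution of offspring dosage (0..4) from two phased parents."""
    g1, g2 = gamete_dosages(parent1_phase), gamete_dosages(parent2_phase)
    dist = Counter()
    for d1, p1 in g1.items():
        for d2, p2 in g2.items():
            dist[d1 + d2] += p1 * p2
    return dict(sorted(dist.items()))

# Duplex (0,0,1,1) x simplex (0,0,0,1) parents at one SNP.
print(offspring_dosages((0, 0, 1, 1), (0, 0, 0, 1)))
# -> approx. {0: 0.083, 1: 0.417, 2: 0.417, 3: 0.083}
```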

