haplotype reconstruction
Recently Published Documents

TOTAL DOCUMENTS: 106 (last five years: 25)
H-INDEX: 19 (last five years: 2)

2021 ◽  
Author(s):  
Claire Oget-Ebrad ◽  
Naveen Kumar Kadri ◽  
Gabriel Costa Monteiro Moreira ◽  
Latifa Karim ◽  
Wouter Coppieters ◽  
...  

Background: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available, but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). Here, we took advantage of whole-genome sequence data available for a Holstein cattle pedigree of 264 individuals, including 98 trios, to evaluate several population-based phasing methods. These data represent a typical livestock population, with a low effective population size, high levels of relatedness, and long-range linkage disequilibrium.

Results: After stringent filtering of the sequence data, we evaluated several population-based phasing programs, including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle, and FImpute. For validation we used the 98 individuals with both parents sequenced: their haplotypes reconstructed from Mendelian segregation rules were taken as the gold standard against which the population-based methods were assessed in two scenarios. In the first, only these 98 individuals were phased; in the second, all 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. Phasing accuracy was assessed with switch error counts (SEC) and rates (SER), the lengths of correctly phased haplotype blocks, and pairwise SNP phasing accuracy (the probability that a pair of SNPs is correctly phased as a function of the distance between them). For most metrics and scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both achieving particularly high phasing accuracy. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb in the second scenario. These statistics are remarkable given that the methods were evaluated on a map of 8,400,000 SNPs: this corresponds to only one switch error every 40,000 phased informative markers. When more relatives were included in the data, FImpute3.0 reconstructed extremely long segments without errors.

Conclusions: We report extremely high phasing accuracy in a typical livestock sample of 100 sequenced individuals. ShapeIT4.1 and Beagle5.2 proved the most accurate, particularly for phasing long segments. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
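
As a concrete illustration of the switch-error metrics used here, the sketch below (a hypothetical switch_errors helper, not the authors' evaluation code) counts a switch wherever the phase orientation between a test haplotype and the trio-derived gold standard flips between consecutive informative markers.

```python
def switch_errors(truth_hap, test_hap):
    """Switch error count (SEC) and rate (SER) for one individual.

    truth_hap, test_hap: 0/1 alleles of ONE haplotype at the same ordered,
    informative (heterozygous) markers; the trio-derived haplotype serves
    as the gold standard. Toy helper, not the authors' evaluation code.
    """
    # Phase orientation per marker: does the test haplotype carry the same
    # allele as the gold standard (True) or the complementary one (False)?
    orientation = [t == g for g, t in zip(truth_hap, test_hap)]
    # A switch error is a flip of orientation between consecutive markers.
    sec = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    ser = sec / max(len(orientation) - 1, 1)
    return sec, ser

# Toy example: the phase flips once, after the third informative marker.
truth = [0, 1, 1, 0, 1]
test  = [0, 1, 1, 1, 0]
print(switch_errors(truth, test))   # -> (1, 0.25)
```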


2021 ◽  
Author(s):  
Peter Bradbury ◽  
Terry Casstevens ◽  
Sarah E Jensen ◽  
Lynn C Johnson ◽  
Zachary R Miller ◽  
...  

Motivation: Pangenomes provide novel insights for population and quantitative genetics, genomics, and breeding that are not available from studying a single reference genome: a species is better represented by a pangenome, i.e., a collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low-density sequence or variant data.

Results: The Practical Haplotype Graph (PHG) is a pangenome pipeline, database (PostgreSQL & SQLite), data model (Java, Kotlin, or R), and Breeding API (BrAPI) web service. The PHG has already been able to accurately represent diversity in four major crops, including maize, one of the most genomically diverse species, with up to 1000-fold data compression. Using simulated data, we show that imputation from coverage as low as 0.1X, with appropriate reads and sequence alignment, results in extremely accurate haplotype reconstruction. The PHG is a platform and environment for the understanding and application of genomic diversity.

Availability: All resources listed here are freely available. The PHG Docker image used to generate the simulation results is available from https://hub.docker.com/ as maizegenetics/phg:0.0.27. PHG source code is at https://bitbucket.org/bucklerlab/practicalhaplotypegraph/src/master/. The code used for the analysis of simulated data is at https://bitbucket.org/bucklerlab/phg-manuscript/src/master/. The PHG database of NAM parent haplotypes is in the CyVerse data store (https://de.cyverse.org/de/), named /iplant/home/shared/panzea/panGenome/PHG_db_maize/phg_v5Assemblies_20200608.db.
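
The imputation step can be pictured with a small, hedged sketch: a hidden Markov model whose states are founder haplotypes anchored to reference ranges, decoded with Viterbi from sparse read-derived allele calls. The founders (hapA/hapB/hapC), the error and recombination parameters, and the impute_path helper are made-up assumptions for illustration, not the PHG's Java/Kotlin implementation or its trellis graph data structures.

```python
import math

# Toy pangenome: each founder haplotype contributes one allele per reference range.
# Hypothetical data; the real PHG stores haplotype nodes in its PostgreSQL/SQLite database.
FOUNDERS = {
    "hapA": [0, 0, 1, 1, 0, 1],
    "hapB": [1, 0, 0, 1, 1, 1],
    "hapC": [0, 1, 1, 0, 0, 0],
}

def impute_path(observed, founders, error=0.01, recomb=0.001):
    """Viterbi over founder haplotypes across reference ranges.

    observed: allele calls (0/1) derived from low-coverage reads, None where
    no read covers the range. Returns the most likely founder path and the
    fully imputed allele sequence along that path.
    """
    names = list(founders)
    n_ranges = len(observed)

    def emit(name, i):
        # Log-probability of the observation given the founder's allele.
        if observed[i] is None:
            return 0.0                       # no read: uninformative
        match = founders[name][i] == observed[i]
        return math.log(1 - error) if match else math.log(error)

    score = {n: math.log(1 / len(names)) + emit(n, 0) for n in names}
    back = []
    for i in range(1, n_ranges):
        new_score, pointers = {}, {}
        for cur in names:
            # Stay on the same founder, or switch (recombination) with a penalty.
            best_prev, best = max(
                ((p, score[p] + (math.log(1 - recomb) if p == cur else math.log(recomb)))
                 for p in names),
                key=lambda t: t[1],
            )
            new_score[cur] = best + emit(cur, i)
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)

    # Backtrace the highest-scoring founder path and read off its alleles.
    last = max(score, key=score.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, [founders[h][i] for i, h in enumerate(path)]

# Sparse, 0.1X-like observations: most reference ranges have no read support.
reads = [0, None, None, 1, None, 1]
print(impute_path(reads, FOUNDERS))   # -> (['hapA', ...], [0, 0, 1, 1, 0, 1])
```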


Genetics ◽  
2021 ◽  
Author(s):  
Chaozhi Zheng ◽  
Rodrigo R Amadeu ◽  
Patricio R Munoz ◽  
Jeffrey B Endelman

Abstract In diploid species, many multiparental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase the power of QTL detection in comparison with marker-based association analysis. However, such software tools for polyploid species are few and limited to a single biparental F1 population. In this paper, a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents, regardless of the number of parents or mating design. Given a genetic or physical map of markers, PolyOrigin first phases parental genotypes, then refines the input marker map, and finally reconstructs offspring haplotypes. PolyOrigin can utilize single nucleotide polymorphism (SNP) data coming from arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype calling errors at low depth. With extensive simulations, we show that PolyOrigin is robust to errors in the input genotypic data and marker map. It works well for various population designs with ≥ offspring per parent and for sequences with read depth as low as 10x. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3 × 3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
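
Why read counts are preferred over hard genotype calls at low depth can be seen with a small sketch. The binomial model and the dosage_likelihoods helper below are illustrative assumptions, not PolyOrigin's actual likelihood: a tetraploid genotype is a dosage of 0 to 4 alternate alleles, and at 10x depth several dosages can remain plausible, information a hard call would throw away.

```python
from math import comb

def dosage_likelihoods(ref_reads, alt_reads, ploidy=4, error=0.005):
    """Posterior over alternate-allele dosage (0..ploidy) at one SNP given
    bi-allelic read counts, under a simple binomial model with a flat prior.
    Toy illustration, not PolyOrigin's likelihood."""
    n = ref_reads + alt_reads
    liks = []
    for dosage in range(ploidy + 1):
        # Expected fraction of alt reads for this dosage, allowing sequencing error.
        p_alt = (dosage / ploidy) * (1 - error) + (1 - dosage / ploidy) * error
        liks.append(comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**(n - alt_reads))
    total = sum(liks)
    return [lik / total for lik in liks]

# At 10x depth, a 7:3 ref:alt split leaves genuine ambiguity between dosages 1 and 2,
# which a hard genotype call would hide from downstream haplotype reconstruction.
print([round(p, 3) for p in dosage_likelihoods(7, 3)])   # approx. [0.0, 0.68, 0.31, 0.01, 0.0]
```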


Author(s):  
Marta Pelizzola ◽  
Merle Behr ◽  
Housen Li ◽  
Axel Munk ◽  
Andreas Futschik

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Shilpa Garg

Abstract High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not directly yield haplotype information spanning whole chromosomes. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress and discuss perspectives in these areas.
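
As a concrete picture of what assembling shorter haplotype fragments involves, here is a minimal greedy sketch for the diploid case; the assemble_haplotypes helper and its toy fragments are hypothetical and stand in for the minimum-error-correction style assembly that the reviewed tools perform far more carefully.

```python
def assemble_haplotypes(fragments, n_sites):
    """Greedy assignment of read fragments to the two haplotypes of a diploid genome.

    fragments: list of dicts {site_index: allele (0/1)} covering a few
    heterozygous sites each. Returns two haplotype lists (None = uncovered).
    Illustrative only; real phasers optimise a global objective such as MEC.
    """
    hap0, hap1 = [None] * n_sites, [None] * n_sites
    for frag in fragments:
        # Agreement of this fragment with each partially built haplotype;
        # ties go to hap0 (the global phase is arbitrary anyway).
        agree0 = sum(hap0[i] == a for i, a in frag.items() if hap0[i] is not None)
        agree1 = sum(hap1[i] == a for i, a in frag.items() if hap1[i] is not None)
        target, other = (hap0, hap1) if agree0 >= agree1 else (hap1, hap0)
        for i, a in frag.items():
            if target[i] is None:
                target[i] = a          # extend the chosen haplotype
            if other[i] is None:
                other[i] = 1 - a       # heterozygous site: complement on the other copy
    return hap0, hap1

# Overlapping fragments sampled from the two chromosome copies of one individual.
frags = [{0: 0, 1: 1}, {1: 1, 2: 0}, {0: 1, 1: 0}, {2: 1, 3: 1}, {2: 0, 3: 0}]
print(assemble_haplotypes(frags, 4))   # -> ([0, 1, 0, 0], [1, 0, 1, 1])
```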


Author(s):  
Julia Markowski ◽  
Rieke Kempfer ◽  
Alexander Kukalev ◽  
Ibai Irastorza-Azcarate ◽  
Gesa Loof ◽  
...  

Abstract Motivation: Genome Architecture Mapping (GAM) was recently introduced as a digestion- and ligation-free method to detect chromatin conformation. Orthogonal to existing approaches based on chromatin conformation capture (3C), GAM’s ability to capture both inter- and intra-chromosomal contacts from low amounts of input data makes it particularly well suited for allele-specific analyses in a clinical setting. Allele-specific analyses are powerful tools for investigating the effects of genetic variants on many cellular phenotypes, including chromatin conformation, but they require the haplotypes of the individuals under study to be known a priori. So far, however, no algorithm exists for haplotype reconstruction and phasing of genetic variants from GAM data, hindering the allele-specific analysis of chromatin contact points in non-model organisms or individuals with unknown haplotypes.

Results: We present GAMIBHEAR, a tool for accurate haplotype reconstruction from GAM data. GAMIBHEAR aggregates allelic co-observation frequencies from GAM data and employs a GAM-specific probabilistic model of haplotype capture to optimise phasing accuracy. Using a hybrid mouse embryonic stem cell line with known haplotype structure as a benchmark dataset, we assess the correctness and completeness of the reconstructed haplotypes and demonstrate the power of GAMIBHEAR to infer accurate genome-wide haplotypes from GAM data.

Availability: GAMIBHEAR is available as an R package under the open-source GPL-2 license at https://bitbucket.org/schwarzlab/gamibhear
Maintainer: [email protected]
Supplementary information: Supplementary information is available at Bioinformatics online.
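
The co-observation idea can be sketched as follows; the phase_from_gam_profiles helper and its toy profiles are illustrative assumptions, not GAMIBHEAR's probabilistic model of haplotype capture. Alleles detected in the same thin nuclear section tend to come from the same homologue, so counting equal versus opposite co-observations across profiles yields a phasing signal that a simple greedy pass can turn into genome-wide haplotypes.

```python
from collections import defaultdict

def phase_from_gam_profiles(profiles, n_sites):
    """Greedy phasing from allelic co-observation in GAM nuclear profiles.

    profiles: list of dicts {site_index: observed allele (0/1)}, one per
    nuclear profile. Alleles seen in the same profile are assumed to lie on
    the same homologue (a simplification of GAMIBHEAR's probabilistic model).
    Returns a phase vector: phase[i] is the allele assigned to haplotype 1.
    """
    # Co-observation evidence: +1 when two sites were seen with equal alleles
    # in a profile, -1 when seen with opposite alleles.
    evidence = defaultdict(int)
    for prof in profiles:
        sites = sorted(prof)
        for a in range(len(sites)):
            for b in range(a + 1, len(sites)):
                i, j = sites[a], sites[b]
                evidence[(i, j)] += 1 if prof[i] == prof[j] else -1

    # Greedy left-to-right phasing, anchored arbitrarily at site 0.
    phase = [None] * n_sites
    phase[0] = 0
    for j in range(1, n_sites):
        vote = sum(
            (1 if phase[i] == 0 else -1) * evidence.get((i, j), 0)
            for i in range(j)
        )
        phase[j] = 0 if vote >= 0 else 1
    return phase

# Toy hybrid cell line: allele 0 at sites 0, 1, 3 and allele 1 at site 2 share a homologue.
profiles = [{0: 0, 1: 0}, {1: 0, 2: 1}, {2: 1, 3: 0}, {0: 0, 3: 0}, {1: 0, 3: 0}]
print(phase_from_gam_profiles(profiles, 4))   # -> [0, 0, 1, 0]
```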


Author(s):  
Shilpa Garg

High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short-read sequencing does not yield haplotype information that spans whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress in these areas and discuss perspectives that could enable routine high-quality haplotype reconstruction in clinical and evolutionary studies.


Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next-generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance and are thus not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated on NGS platforms and to provide quality output properly formatted for downstream evolutionary analyses.
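
The cost of collapsing a viral population to a single consensus, which motivates haplotype reconstruction, can be shown with a short sketch; the consensus_with_minor_variants helper and the toy reads below are hypothetical and are not part of HAPHPIPE.

```python
from collections import Counter

def consensus_with_minor_variants(aligned_reads, min_freq=0.05):
    """Per-position majority consensus from aligned reads of equal length,
    plus the minority variants (above min_freq) that the consensus discards.
    Toy illustration, not HAPHPIPE's assembly modules."""
    consensus, discarded = [], []
    for pos in range(len(aligned_reads[0])):
        counts = Counter(read[pos] for read in aligned_reads)
        total = sum(counts.values())
        major, _ = counts.most_common(1)[0]
        consensus.append(major)
        for base, n in counts.items():
            if base != major and n / total >= min_freq:
                discarded.append((pos, base, n / total))
    return "".join(consensus), discarded

# Two intra-host haplotypes at 70/30 frequency; the consensus hides the 30% variant.
reads = ["ACGT"] * 7 + ["ACTT"] * 3
cons, lost = consensus_with_minor_variants(reads)
print(cons)   # ACGT
print(lost)   # [(2, 'T', 0.3)]
```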


Author(s):  
Chaozhi Zheng ◽  
Rodrigo R. Amadeu ◽  
Patricio R. Munoz ◽  
Jeffrey B. Endelman

Abstract In diploid species, many multi-parental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase QTL detection power in comparison with the marker-based association analysis. To realize similar benefits in tetraploid species (and eventually higher ploidy levels), a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents. Haplotype reconstruction proceeds in two steps: first, parental genotypes are phased based on multi-locus linkage analysis; second, genotype probabilities for the parental alleles are inferred in the progeny. PolyOrigin can utilize genetic marker data from single nucleotide polymorphism (SNP) arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype call errors at low depth. To account for errors in the input map, PolyOrigin includes functionality for filtering markers, inferring inter-marker distances, and refining local marker ordering. Simulation studies were used to investigate the effect of several variables on the accuracy of haplotype reconstruction, including the mating design, the number of parents, population size, and sequencing depth. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3×3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
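
The second step, inferring which parental alleles a progeny inherited, rests on how tetraploid gametes are formed. The sketch below (hypothetical gamete_dosages and offspring_dosages helpers, not PolyOrigin's algorithm) enumerates gametes from a phased autotetraploid parent under random bivalent pairing without double reduction, each gamete carrying 2 of the 4 homologues, and combines two parents to give the prior offspring dosage distribution at one SNP.

```python
from itertools import combinations
from collections import Counter

def gamete_dosages(parent_phase):
    """Dosage distribution of gametes from a phased autotetraploid parent.

    parent_phase: alleles (0/1) on the four homologues at one locus, e.g. (0, 0, 1, 1).
    Assumes random bivalent pairing and no double reduction: a gamete receives
    2 of the 4 homologues, each of the 6 possible pairs with equal probability.
    """
    counts = Counter(sum(pair) for pair in combinations(parent_phase, 2))
    total = sum(counts.values())            # always 6
    return {dosage: n / total for dosage, n in counts.items()}

def offspring_dosages(parent1_phase, parent2_phase):
    """Prior distribution of offspring dosage (0..4) from two phased parents."""
    g1, g2 = gamete_dosages(parent1_phase), gamete_dosages(parent2_phase)
    dist = Counter()
    for d1, p1 in g1.items():
        for d2, p2 in g2.items():
            dist[d1 + d2] += p1 * p2
    return dict(sorted(dist.items()))

# Duplex (0,0,1,1) x simplex (0,0,0,1) parents at one SNP.
print(offspring_dosages((0, 0, 1, 1), (0, 0, 0, 1)))
# -> approx. {0: 0.083, 1: 0.417, 2: 0.417, 3: 0.083}
```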

