WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads

Author(s):  
Murray Patterson ◽  
Tobias Marschall ◽  
Nadia Pisanti ◽  
Leo van Iersel ◽  
Leen Stougie ◽  
...  
2015 ◽  
Vol 22 (6) ◽  
pp. 498-509 ◽  

2016 ◽  
Vol 17 (S11) ◽  
Author(s):  
Andrea Bracciali ◽  
Marco Aldinucci ◽  
Murray Patterson ◽  
Tobias Marschall ◽  
Nadia Pisanti ◽  
...  

2016 ◽  
Author(s):  
Sarah O. Fischer ◽  
Tobias Marschall

Abstract: Haplotype assembly, or read-based phasing, is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is therefore controlled in a pre-processing step that selects the reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under the MIT licence from https://bitbucket.org/whatshap/whatshap.
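
To make the MEC objective named in the abstract concrete, the following is a minimal sketch in Python, not the WhatsHap implementation. It assumes an unweighted variant of MEC without the all-heterozygous constraint, and the read representation (a dict from variant index to allele 0/1) and the function names `mec_cost` and `brute_force_mec` are hypothetical, chosen only for illustration. The exhaustive search is exponential in the number of reads and is only meant for toy inputs.

```python
from itertools import product

def mec_cost(reads, partition):
    """Unweighted MEC cost of assigning each read to haplotype side 0 or 1.

    reads: list of dicts {variant_index: allele}, allele in {0, 1} (assumed format).
    partition: sequence of 0/1, one entry per read.
    """
    cost = 0
    positions = {pos for read in reads for pos in read}
    for pos in positions:
        for side in (0, 1):
            # Alleles observed at this position by reads assigned to this side.
            alleles = [r[pos] for r, s in zip(reads, partition)
                       if s == side and pos in r]
            ones = sum(alleles)
            # Flip the minority allele so that all reads on this side agree.
            cost += min(ones, len(alleles) - ones)
    return cost

def brute_force_mec(reads):
    """Try every bipartition of the reads; feasible only for a handful of reads."""
    best_cost, best_partition = None, None
    for partition in product((0, 1), repeat=len(reads)):
        c = mec_cost(reads, partition)
        if best_cost is None or c < best_cost:
            best_cost, best_partition = c, partition
    return best_cost, best_partition

# Toy example: four reads over two variants; the last read carries one error.
reads = [{0: 0, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 0}, {0: 1, 1: 0}]
cost, partition = brute_force_mec(reads)
```

In this toy input no bipartition is error-free, so the optimum requires correcting a single allele. The practical motivation for read selection follows from the same picture: the more reads cover a position, the more assignments an exact method has to consider there.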


2019 ◽  
Author(s):  
Alberto Magi

Abstract
Background: Human genomes are diploid, which means they have two homologous copies of each chromosome, and the assignment of heterozygous variants to each chromosome copy, the haplotype assembly problem, is of fundamental importance for medical and population genetics. While short reads from second generation sequencing platforms drastically limit haplotype reconstruction, as the great majority of reads do not allow many variants to be linked together, novel long reads from third generation sequencing can span several variants along the genome, allowing much longer haplotype blocks to be inferred. However, the great majority of haplotype assembly algorithms, originally devised for short sequences, fail when they are applied to noisy long-read data, and although novel algorithms have been developed to deal with the properties of this new generation of sequences, these methods can only manage datasets with limited coverage.
Results: To overcome the limits of currently available algorithms, I propose a novel formulation of the single individual haplotype assembly problem, based on maximum allele co-occurrence (MAC), and I develop an ultra-fast algorithm that is capable of reconstructing the haplotype structure of a diploid genome from low- and high-coverage long read datasets with high accuracy. I test my algorithm (MAtCHap) on synthetic and real PacBio and Nanopore human datasets and I compare its results with those of eight other state-of-the-art algorithms. All the results obtained by these analyses show that MAtCHap outperforms other methods in terms of accuracy, contiguity, completeness and computational speed.
Availability: MAtCHap is publicly available at https://sourceforge.net/projects/matchap/.
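
The abstract does not spell out the MAC objective, so the following Python sketch only illustrates the underlying notion of allele co-occurrence between pairs of heterozygous sites. It is not the MAtCHap algorithm. The read representation (a dict from variant index to allele 0/1) and the names `cooccurrence_counts` and `pair_phase` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(reads):
    """Count, for every pair of sites covered by the same read, which allele pairs occur.

    reads: list of dicts {variant_index: allele}, allele in {0, 1} (assumed format).
    Returns {(i, j): Counter({(allele_i, allele_j): count})} with i < j.
    """
    counts = {}
    for read in reads:
        for (i, a), (j, b) in combinations(sorted(read.items()), 2):
            counts.setdefault((i, j), Counter())[(a, b)] += 1
    return counts

def pair_phase(counter):
    """Majority vote: 'cis' if equal alleles co-occur more often, else 'trans'."""
    cis = counter[(0, 0)] + counter[(1, 1)]
    trans = counter[(0, 1)] + counter[(1, 0)]
    return "cis" if cis >= trans else "trans"

# Toy example: three reads link sites 0 and 1, one read links sites 1 and 2.
reads = [{0: 0, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 1}, {1: 0, 2: 1}]
counts = cooccurrence_counts(reads)
```

Long reads that span several variants contribute to many site pairs at once, which is why a co-occurrence view suits the long-read setting the abstract describes.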


2013 ◽  
Vol 33 (3) ◽  
pp. 685-704 ◽  
Author(s):  
Benjamin C. Kirkup ◽  
Steven Mahlen ◽  
George Kallstrom

2012 ◽  
Vol 42 (9) ◽  
pp. 8
Author(s):  
PETER HULICK
