WhatsHap: Haplotype Assembly for Future-Generation Sequencing Reads

Lecture Notes in Computer Science - Research in Computational Molecular Biology ◽

10.1007/978-3-319-05269-4_19 ◽

2014 ◽

pp. 237-249 ◽

Cited By ~ 15

Author(s):

Murray Patterson ◽

Tobias Marschall ◽

Nadia Pisanti ◽

Leo van Iersel ◽

Leen Stougie ◽

...

Keyword(s):

Future Generation ◽

Haplotype Assembly ◽

Generation Sequencing

WhatsHap: Weighted Haplotype Assembly for Future-Generation Sequencing Reads

Journal of Computational Biology ◽

10.1089/cmb.2014.0157 ◽

2015 ◽

Vol 22 (6) ◽

pp. 498-509 ◽

Cited By ~ 103

Author(s):

Murray Patterson ◽

Tobias Marschall ◽

Nadia Pisanti ◽

Leo van Iersel ◽

Leen Stougie ◽

...

Keyword(s):

Future Generation ◽

Haplotype Assembly ◽

Generation Sequencing

PWHATSHAP: efficient haplotyping for future generation sequencing

BMC Bioinformatics ◽

10.1186/s12859-016-1170-y ◽

2016 ◽

Vol 17 (S11) ◽

Cited By ~ 10

Author(s):

Andrea Bracciali ◽

Marco Aldinucci ◽

Murray Patterson ◽

Tobias Marschall ◽

Nadia Pisanti ◽

...

Keyword(s):

Future Generation ◽

Generation Sequencing

Selecting Reads for Haplotype Assembly

10.1101/046771 ◽

2016 ◽

Cited By ~ 5

Author(s):

Sarah O. Fischer ◽

Tobias Marschall

Keyword(s):

Next Generation Sequencing ◽

Error Correction ◽

Heuristic Algorithm ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Minimum Error ◽

Haplotype Assembly ◽

Maximum Coverage ◽

Processing Step ◽

Generation Sequencing

Abstract:

Haplotype assembly or read-based phasing is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is hence controlled in a pre-processing step that selects reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under MIT licence from https://bitbucket.org/whatshap/whatshap.
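To make the MEC objective mentioned in this abstract concrete, the following is a minimal sketch of the unweighted problem on a toy instance. It is not WhatsHap's algorithm: it enumerates every bipartition of the reads, so it is exponential in the number of reads rather than in the maximum coverage. The read representation (a dict from variant position to allele 0/1) and the function names `mec_cost` and `solve_mec_bruteforce` are assumptions made for illustration only.

```python
from itertools import product


def mec_cost(reads, assignment):
    """Number of allele corrections needed for a given bipartition.

    reads: list of dicts mapping variant position -> allele (0 or 1);
           positions a read does not cover are simply absent.
    assignment: tuple with one entry (0 or 1) per read, naming its haplotype.
    """
    positions = sorted({p for read in reads for p in read})
    cost = 0
    for p in positions:
        for side in (0, 1):
            # Alleles observed at position p among reads assigned to this side.
            alleles = [r[p] for r, a in zip(reads, assignment) if a == side and p in r]
            ones = sum(alleles)
            zeros = len(alleles) - ones
            # The haplotype takes the majority allele; the minority is corrected.
            cost += min(ones, zeros)
    return cost


def solve_mec_bruteforce(reads):
    """Try all 2^n bipartitions; feasible only for a handful of reads."""
    best_cost, best_assignment = None, None
    for assignment in product((0, 1), repeat=len(reads)):
        cost = mec_cost(reads, assignment)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


if __name__ == "__main__":
    # Hypothetical toy data: five reads over three heterozygous positions.
    toy_reads = [
        {0: 0, 1: 0},
        {0: 0, 1: 0, 2: 1},
        {1: 1, 2: 0},
        {0: 1, 1: 1, 2: 0},
        {0: 0, 1: 1},  # conflicts with reads on both haplotypes
    ]
    print(solve_mec_bruteforce(toy_reads))
```

The brute-force search illustrates why practical tools bound the problem size: each added read doubles the number of bipartitions here, which is the kind of growth that a cap on coverage is meant to keep under control.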

MAtCHap: an ultra fast algorithm for solving the single individual haplotype assembly problem

10.1101/860262 ◽

2019 ◽

Author(s):

Alberto Magi

Keyword(s):

Fast Algorithm ◽

Great Majority ◽

Single Individual ◽

High Coverage ◽

Haplotype Assembly ◽

Long Reads ◽

Second Generation Sequencing ◽

Assembly Problem ◽

Sequencing Platforms ◽

Generation Sequencing

Abstract:

Background: Human genomes are diploid, which means they have two homologous copies of each chromosome, and the assignment of heterozygous variants to each chromosome copy, the haplotype assembly problem, is of fundamental importance for medical and population genetics. Short reads from second generation sequencing platforms drastically limit haplotype reconstruction, as the great majority of reads cannot link many variants together, whereas novel long reads from third generation sequencing can span several variants along the genome, allowing much longer haplotype blocks to be inferred. However, the great majority of haplotype assembly algorithms, originally devised for short sequences, fail when they are applied to noisy long-read data, and although novel algorithms have been developed to deal with the properties of this new generation of sequences, these methods can handle only datasets with limited coverage.

Results: To overcome the limits of currently available algorithms, I propose a novel formulation of the single individual haplotype assembly problem, based on maximum allele co-occurrence (MAC), and I develop an ultra-fast algorithm that is able to reconstruct the haplotype structure of a diploid genome from low- and high-coverage long-read datasets with high accuracy. I test my algorithm (MAtCHap) on synthetic and real PacBio and Nanopore human datasets and I compare its results with those of eight other state-of-the-art algorithms. All the results obtained by these analyses show that MAtCHap outperforms other methods in terms of accuracy, contiguity, completeness and computational speed.

Availability: MAtCHap is publicly available at https://sourceforge.net/projects/matchap/.
