WGA-LP: a pipeline for Whole Genome Assembly of contaminated reads
Summary: Whole Genome Assembly (WGA) of bacterial genomes with short reads is a quite common task as DNA sequencing has become cheaper with the advances of its technology. The process of assembling a genome has no absolute golden standard (Del Angel et al. (2018)) and it requires to perform a sequence of steps each of which can involve combinations of many different tools. However, the quality of the final assembly is always strongly related to the quality of the input data. With this in mind we built WGA-LP, a package that connects state-of-art programs and novel scripts to check and improve the quality of both samples and resulting assemblies. WGA-LP, with its conservative decontamination approach, has shown to be capable of creating high quality assemblies even in the case of contaminated reads. Availability and Implementation: WGA-LP is available on GitHub (https://github.com/redsnic/WGA-LP) and Docker Hub (https://hub.docker.com/r/redsnic/wgalp). The web app for node visualization is hosted by shinyapps.io (https://redsnic.shinyapps.io/ContigCoverageVisualizer/). Contact: Nicolò Rossi, [email protected] Supplementary information: Supplementary data are available at bioRxiv online.