Assessing long-read sequencing with Nanopore R9, R10, and PacBio CCS to obtain high-quality metagenome assembled genomes from complex microbial communities
Short-read DNA sequencing has driven massive growth of genome databases, but mainly through highly fragmented metagenome-assembled genomes from environmental systems. This fragmentation results from closely related species, strains, and genome repeats that cannot be resolved with short reads. To confidently explore the functional potential of a microbial community, high-quality reference genomes are needed. In this study, we evaluated different combinations of short-read (Illumina) and long-read (Nanopore R9.4, R10.3, and PacBio CCS) technologies for recovering high-quality metagenome-assembled genomes (HQ MAGs) from a complex microbial community (an anaerobic digester). Depending on the sequencing approach, 33 to 86 HQ MAGs (encompassing up to 34 % of the assembly and 49 % of the reads) were recovered using long reads, with Nanopore R9 featuring the lowest sequencing cost per HQ MAG recovered. PacBio CCS was also found to be an effective platform for genome-centric metagenomics (74 HQ MAGs) and, as a stand-alone technology, produced the HQ MAGs with the lowest fragmentation (median of 9 contigs). Using PacBio CCS MAGs as a reference, we show that, although a high number of HQ MAGs can be generated using Nanopore R9, systematic indel errors are still present, which can lead to truncated gene calls. However, polishing the Nanopore MAGs with short-read Illumina data enabled recovery of MAGs similar in quality to those from PacBio CCS.