Chromosome-Scale Genome Assembly of Fusarium oxysporum Strain Fo47, a Fungal Endophyte and Biocontrol Agent

2020 ◽  
Vol 33 (9) ◽  
pp. 1108-1111 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Here, we report a chromosome-level genome assembly of Fusarium oxysporum Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired-end, and high-throughput chromosome conformation capture sequencing data. Although F. oxysporum causes vascular wilt in over 100 plant species, the strain Fo47 is classified as an endophyte and is widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared with the reference genome of F. oxysporum f. sp. lycopersici Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants as well as for deciphering the genome evolution of the F. oxysporum species complex.
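The contig N50 quoted above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation; the function name `n50` and the toy contig lengths are illustrative assumptions, not data from the Fo47 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not taken from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about base-level accuracy or correct ordering of the contigs.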

2020 ◽  
Author(s):  
Bo Wang ◽  
Houlin Yu ◽  
Yanyan Jia ◽  
Quanbin Dong ◽  
Christian Steinberg ◽  
...  

Here, we report a chromosome-level genome assembly of Fusarium oxysporum strain Fo47 (12 pseudomolecules; contig N50: 4.52 Mb), generated using a combination of PacBio long-read, Illumina paired-end, and Hi-C sequencing data. Although F. oxysporum causes vascular wilt in over 100 plant species, the strain Fo47 is classified as an endophyte and is widely used as a biocontrol agent for plant disease control. The Fo47 genome carries a single accessory chromosome of 4.23 Mb, compared to the reference genome of F. oxysporum f. sp. lycopersici strain Fol4287. The high-quality assembly and annotation of the Fo47 genome will be a valuable resource for studying the mechanisms underlying the endophytic interactions between F. oxysporum and plants, as well as for deciphering the genome evolution of the F. oxysporum species complex.


2020 ◽  
Author(s):  
Mohamed Awad ◽  
Xiangchao Gan

High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we propose GALA (Gap-free long-read assembler), a chromosome-by-chromosome assembly method implemented through a multi-layer computer graph that identifies mis-assemblies within preliminary assemblies or chimeric raw reads and partitions the data into chromosome-scale linkage groups. The subsequent independent assembly of each linkage group generates a gap-free assembly free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, a reference genome and even motif analyses, to generate gap-free chromosome-scale assemblies. We de novo assembled the C. elegans and A. thaliana genomes using combined PacBio and Nanopore sequencing data from publicly available datasets. We also demonstrated the new method’s applicability with a gap-free assembly of a human genome with the help of a reference genome. In addition, GALA showed promising performance for PacBio high-fidelity long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.


2021 ◽  
pp. gr.275325.121
Author(s):  
Rodrigo P. Baptista ◽  
Yiran Li ◽  
Adam Sateriale ◽  
Karen L. Brooks ◽  
Alan Tracey ◽  
...  

Cryptosporidiosis is a leading cause of waterborne diarrheal disease globally and an important contributor to mortality in infants and the immunosuppressed. Despite its importance, the Cryptosporidium community has only had access to a good, but incomplete, Cryptosporidium parvum IOWA reference genome sequence. Incomplete reference sequences hamper annotation, experimental design and interpretation. We have generated a new C. parvum IOWA genome assembly supported by PacBio and Oxford Nanopore long-read technologies and a new comparative and consistent genome annotation for three closely related species: C. parvum, Cryptosporidium hominis and Cryptosporidium tyzzeri. We made 1,926 C. parvum annotation updates based on experimental evidence. They include new transporters, ncRNAs, introns and altered gene structures. The new assembly and annotation revealed a complete Dnmt2 methylase ortholog. Comparative annotation between C. parvum, C. hominis and C. tyzzeri revealed that most "missing" orthologs are found, suggesting that the biological differences between the species must result from gene copy number variation, differences in gene regulation and single nucleotide variants (SNVs). Using the new assembly and annotation as reference, 190 genes are identified as evolving under positive selection, including many not detected previously. The new C. parvum IOWA reference genome assembly is larger, gap free and lacks ambiguous bases. This chromosomal assembly recovers all 16 chromosome ends, 13 of which are contiguously assembled. The three remaining chromosome ends are provisionally placed. These ends represent duplication of entire chromosome ends, including subtelomeric regions, revealing a new level of genome plasticity that will both inform and impact future research.


Author(s):  
David Porubsky ◽  
Peter Ebert ◽  
Peter A. Audano ◽  
Mitchell R. Vollger ◽  
...  

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
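To put the "quality value > 40" figure in concrete terms, the sketch below converts between a quality value and a per-base error probability. It assumes the quality value is Phred-scaled (QV = -10 log10 p), which is the usual convention for assembly accuracy but is not stated explicitly in the abstract; the function names are illustrative.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Phred-scaled quality value for a per-base error probability."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Under the Phred assumption, QV 40 is one expected error per 10,000 bases.
print(qv_to_error_rate(40))
print(error_rate_to_qv(1e-4))
```

Under this assumption, a quality value above 40 means fewer than one expected error per 10,000 assembled bases.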


2020 ◽  
Vol 10 (8) ◽  
pp. 2801-2809 ◽  
Author(s):  
Tingting Zhao ◽  
Zhongqu Duan ◽  
Georgi Z. Genchev ◽  
Hui Lu

Despite continuous updates of the human reference genome, there are still hundreds of unresolved gaps which account for about 5% of the total sequence length. Given the availability of whole genome de novo assemblies, especially those derived from long-read sequencing data, gap-closing sequences can be determined. By comparing 17 de novo long-read sequencing assemblies with the human reference genome, we identified a total of 1,125 gap-closing sequences for 132 (16.9% of 783) gaps and added up to 2.2 Mb novel sequences to the human reference genome. More than 90% of the non-redundant sequences could be verified by unmapped reads from the Simons Genome Diversity Project dataset. In addition, 15.6% of the non-reference sequences were found in at least one of four non-human primate genomes. We further demonstrated that the non-redundant sequences had high content of simple repeats and satellite sequences. Moreover, 43 (32.6%) of the 132 closed gaps were shown to be polymorphic; such sequences may play an important biological role and can be useful in the investigation of human genetic diversity.


GigaScience ◽  
2020 ◽  
Vol 9 (10) ◽  
Author(s):  
Willem de Koning ◽  
Milad Miladi ◽  
Saskia Hiltemann ◽  
Astrid Heikema ◽  
John P Hays ◽  
...  

Background: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies–based long-read sequencing “nanopore” platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. Results: The Galaxy platform provides a user-friendly interface to computational command line–based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed “NanoGalaxy” is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. Conclusions: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


2017 ◽  
Author(s):  
Jia-Xing Yue ◽  
Gianni Liti

Long-read sequencing technologies have become increasingly popular in genome projects due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast, Saccharomyces cerevisiae, has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here we present LRSDAY, the first one-stop solution to streamline this process. LRSDAY can produce chromosome-level end-to-end genome assembly and comprehensive annotations for various genomic features (including centromeres, protein-coding genes, tRNAs, transposable elements and telomere-associated elements) that are ready for downstream analysis. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable for virtually any eukaryotic organism. Applying LRSDAY to a S. cerevisiae strain takes ∼43 hrs to generate a complete and well-annotated genome from ∼100X Pacific Biosciences (PacBio) reads using four threads.


2020 ◽  
Author(s):  
Yuxuan Yuan ◽  
Philipp E. Bayer ◽  
Robyn Anderson ◽  
HueyTyng Lee ◽  
Chon-Kit Kenneth Chan ◽  
...  

Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads which can span repetitive regions. However, overlap-based assembly methods routinely used for these data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250-CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at https://github.com/AppliedBioinformatics/RefKA
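The binning idea described above (split a reference into bins, find k-mers unique to each bin, and use them to route reads) can be illustrated with a toy sketch. This is not the RefKA implementation: the function names, the exact-match lookup in place of k-mer alignment, the fixed-size bins, and the toy sequences are all assumptions made for illustration, and reverse complements and k-mers spanning bin boundaries are ignored.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_reference(reference, bin_size):
    """Split the reference into consecutive fixed-size bins."""
    return [reference[i:i + bin_size]
            for i in range(0, len(reference), bin_size)]


def unique_kmers_per_bin(bins, k):
    """For each bin, keep only k-mers that occur in no other bin."""
    per_bin = [kmers(b, k) for b in bins]
    bins_containing = Counter()
    for bin_kmers in per_bin:
        bins_containing.update(bin_kmers)
    return [{km for km in bin_kmers if bins_containing[km] == 1}
            for bin_kmers in per_bin]


def assign_read(read, unique_sets, k):
    """Index of the bin sharing the most unique k-mers with the read,
    or None if the read shares no unique k-mer with any bin."""
    read_kmers = kmers(read, k)
    hits = [len(read_kmers & unique) for unique in unique_sets]
    best = max(range(len(hits)), key=hits.__getitem__)
    return best if hits[best] > 0 else None


# Toy reference and reads (hypothetical sequences).
bins = bin_reference("AAAACCCCGGGGTTTT", bin_size=8)
unique_sets = unique_kmers_per_bin(bins, k=4)
for read in ["AACCCC", "GGTTTT", "ACGTAC"]:
    print(read, assign_read(read, unique_sets, k=4))
```

Once reads are partitioned this way, each bin can be assembled independently, which is what allows the work to be spread across many CPUs. The benchmark figures in the abstract are consistent with that picture: 168 wall hours on 250 CPUs corresponds to 168 × 250 = 42,000 CPU hours.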


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Enhua Xia ◽  
Fangdong Li ◽  
Wei Tong ◽  
Hua Yang ◽  
Songbo Wang ◽  
...  

2021 ◽  
Author(s):  
R. Alan Harris ◽  
Muthuswamy Raveendran ◽  
Dustin T Lyfoung ◽  
Fritz J Sedlazeck ◽  
Medhat Mahmoud ◽  
...  

Background: The Syrian hamster (Mesocricetus auratus) has been suggested as a useful mammalian model for a variety of diseases and infections, including infection with respiratory viruses such as SARS-CoV-2. The MesAur1.0 genome assembly was published in 2013 using whole-genome shotgun sequencing with short-read sequence data. More advanced sequencing technologies and assembly methods now permit the generation of near-complete genome assemblies with higher quality and higher continuity. Findings: Here, we report an improved assembly of the M. auratus genome (BCM_Maur_2.0) using Oxford Nanopore Technologies long-read sequencing to produce a chromosome-scale assembly. The total length of the new assembly is 2.46 Gbp, similar to the 2.50 Gbp length of a previous assembly of this genome, MesAur1.0. BCM_Maur_2.0 exhibits significantly improved continuity, with a scaffold N50 that is 6.7 times greater than that of MesAur1.0. Furthermore, 21,616 protein-coding genes and 10,459 noncoding genes were annotated in BCM_Maur_2.0, compared to 20,495 protein-coding genes and 4,168 noncoding genes in MesAur1.0. This new assembly also reduces the unresolved regions as measured by nucleotide ambiguities: approximately 17.11% of bases in MesAur1.0 were unresolved, whereas in BCM_Maur_2.0 the proportion of unresolved bases is reduced to 3.00%. Conclusions: Access to a more complete reference genome with improved accuracy and continuity will facilitate more detailed, comprehensive, and meaningful research results for a wide variety of future studies using Syrian hamsters as models.

