High-quality Arabidopsis thaliana Genome Assembly with Nanopore and HiFi Long Reads

Author(s):  
Bo Wang ◽  
Xiaofei Yang ◽  
Yanyan Jia ◽  
Yu Xu ◽  
Peng Jia ◽  
...  
2021 ◽  
Author(s):  
Bo Wang ◽  
Yanyan Jia ◽  
Peng Jia ◽  
Quanbin Dong ◽  
Xiaofei Yang ◽  
...  

Here, we report a high-quality (HQ) and almost complete genome assembly of the model plant Arabidopsis thaliana ecotype Columbia (Col-0), with a single gap and a quality value (QV) larger than 60, generated using a combination of Oxford Nanopore Technology (ONT) ultra-long reads, high-fidelity (HiFi) reads and Hi-C data. The total genome assembly size is 133,877,291 bp (chr1: 32,659,241 bp, chr2: 22,712,559 bp, chr3: 26,161,332 bp, chr4: 22,250,686 bp and chr5: 30,093,473 bp), and it introduces 14.73 Mb of novel sequence compared with the TAIR10.1 reference genome, 96% of which lies in centromeres. All five chromosomes of our HQ assembly are highly accurate, with QVs larger than 60 (ranging from QV62 to QV68), which is significantly higher than the TAIR10.1 reference (44-51) and a recently published genome (41-43). We completely resolved chr3 and chr5 from telomere to telomere. Chr2 and chr4 are completely resolved apart from the nucleolar organizing regions, which are composed of long, highly repetitive DNA. Centromere 1 has been reported to be about 9 Mb long, and it is hard to assemble because it contains tens of thousands of CEN180 satellite repeats. Based on these cutting-edge sequencing data, we assembled about 4 Mb of continuous sequence of centromere 1. We found different identity patterns across the five centromeres, and all centromeres were significantly enriched with CENH3 ChIP-seq signals, confirming the accuracy of the assembly. We obtained four clusters of CEN180 repeats and found that CENH3 showed a strong preference for cluster 3. Moreover, we observed hypomethylation patterns in CENH3-enriched regions. This high-quality genome assembly will be a valuable reference for understanding the global pattern of centromeric polymorphism, both genetic and epigenetic, in naturally inbred lines of Arabidopsis thaliana.
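QV is Phred-scaled: QV = -10 * log10(E), where E is the per-base error rate, so QV60 corresponds to roughly one error per million bases. The Python sketch below checks that the five chromosome lengths listed above sum to the reported total and converts QVs into approximate error counts. It assumes errors are spread uniformly at a single QV across the whole assembly, which real assemblies do not satisfy, so the counts are orders of magnitude only.

```python
# Chromosome lengths (bp) as listed in the abstract above.
chrom_lengths = {
    "chr1": 32_659_241,
    "chr2": 22_712_559,
    "chr3": 26_161_332,
    "chr4": 22_250_686,
    "chr5": 30_093_473,
}


def qv_to_error_rate(qv: float) -> float:
    """Invert QV = -10 * log10(E) to get the per-base error rate E."""
    return 10 ** (-qv / 10)


def expected_errors(length_bp: int, qv: float) -> float:
    """Expected base errors in `length_bp` bases, assuming a uniform QV (a simplification)."""
    return length_bp * qv_to_error_rate(qv)


total = sum(chrom_lengths.values())
print(f"total assembly size: {total:,} bp")

# Compare the QV range reported for TAIR10.1 (44-51) with this assembly (62-68).
for qv in (44, 51, 62, 68):
    print(f"QV{qv}: error rate {qv_to_error_rate(qv):.1e}, "
          f"~{expected_errors(total, qv):,.0f} errors genome-wide")
```

Because the scale is logarithmic, each 10-point QV gain is a tenfold drop in error rate.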


Plants ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 2681
Author(s):  
Ilya Kirov ◽  
Pavel Merkulov ◽  
Maxim Dudnikov ◽  
Ekaterina Polkhovskaya ◽  
Roman A. Komakhin ◽  
...  

Long-read data are a powerful tool for discovering new active transposable elements (TEs). However, no ready-to-use tools were available to extract this information from low-coverage ONT datasets. Here, we developed a novel pipeline, nanotei, that detects TE-containing structural variants, including individual TE transpositions. We used this pipeline to identify TE insertions in the Arabidopsis thaliana genome. Using nanotei, we identified tens of TE copies, including copies of the well-characterized ONSEN retrotransposon family, that were hidden in genome assembly gaps. The results demonstrate that some TEs are inaccessible for analysis with the current A. thaliana (TAIR10.1) genome assembly. We further explored the mobilome of the ddm1 mutant, which has elevated TE activity. Nanotei captured all TEs previously known to be active in ddm1 and also identified transpositions of non-autonomous TEs. One of these, a non-autonomous TE derived from AT5TE33540, belongs to the TR-GAG retrotransposons, which have a single open reading frame (ORF) encoding the GAG protein. These results provide the first direct evidence that TR-GAGs and other non-autonomous LTR retrotransposons can transpose in the plant genome despite lacking most of the encoded proteins. In summary, nanotei is a useful tool for detecting active TEs and their insertions in plant genomes using low-coverage data from Nanopore genome sequencing.
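The abstract does not describe nanotei's internal algorithm. As a generic illustration only, one common first step when calling insertions from low-coverage long reads is to pool read-level insertion signals and cluster nearby breakpoints, so that a handful of independent reads can jointly support one event. The Python sketch below does that; the `InsertionCall` record and the `window`, `min_support` and `min_length` thresholds are hypothetical, and it is not nanotei's implementation. A real pipeline would also compare the inserted sequences against a TE library.

```python
from dataclasses import dataclass


@dataclass
class InsertionCall:
    """One read-level insertion signal (hypothetical input record)."""
    chrom: str
    pos: int       # reference coordinate of the insertion breakpoint
    length: int    # length of the inserted sequence in bp
    read_id: str


def cluster_insertions(calls, window=50, min_support=2, min_length=500):
    """Cluster read-level insertions whose breakpoints lie within `window` bp.

    Insertions shorter than `min_length` are ignored (an illustrative cut-off
    for TE-sized events). Returns (chrom, first_pos, last_pos, n_reads) for each
    cluster supported by at least `min_support` distinct reads.
    """
    kept = sorted((c for c in calls if c.length >= min_length),
                  key=lambda c: (c.chrom, c.pos))
    groups, current = [], []
    for call in kept:
        # Start a new group on a chromosome change or a gap larger than `window`
        # (single-linkage chaining on sorted breakpoints).
        if current and (call.chrom != current[-1].chrom
                        or call.pos - current[-1].pos > window):
            groups.append(current)
            current = []
        current.append(call)
    if current:
        groups.append(current)

    results = []
    for group in groups:
        n_reads = len({c.read_id for c in group})
        if n_reads >= min_support:
            results.append((group[0].chrom, group[0].pos, group[-1].pos, n_reads))
    return results


calls = [
    InsertionCall("chr5", 1_000, 5_000, "read1"),
    InsertionCall("chr5", 1_020, 4_950, "read2"),
    InsertionCall("chr5", 9_000, 5_100, "read3"),  # single read: not reported
    InsertionCall("chr1", 200, 80, "read4"),       # too short: filtered out
]
print(cluster_insertions(calls))  # [('chr5', 1000, 1020, 2)]
```

At low coverage, `min_support` is the main trade-off: lowering it recovers more rare insertions at the cost of more false positives.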


2021 ◽  
Vol 13 (2) ◽  
Author(s):  
Linlin Zhao ◽  
Shengyong Xu ◽  
Zhiqiang Han ◽  
Qi Liu ◽  
Wensi Ke ◽  
...  

Abstract Argyrosomus japonicus is an economically and ecologically important fish species in the family Sciaenidae with a wide distribution in the world’s oceans. Here, we report a high-quality, chromosome-level genome assembly of A. japonicus based on PacBio and Hi-C sequencing technologies. A 673.7-Mb genome containing 282 contigs with a contig N50 length of 18.4 Mb was obtained from PacBio long reads. These contigs were further ordered and clustered into 24 chromosome groups based on Hi-C data. In addition, a total of 217.2 Mb of sequence (32.24% of the assembled genome) was identified as repeat elements, and 23,730 protein-coding genes were predicted using multiple approaches. More than 97% of BUSCO genes were identified in the A. japonicus genome. The high-quality genome assembled in this work not only provides a valuable genomic resource for future population genetics, conservation biology and selective breeding studies of A. japonicus but also lays a solid foundation for studying the evolution of Sciaenidae.
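The contig N50 quoted above is a length-weighted median. It is the length L such that contigs of length L or longer together make up at least half of the assembly. A minimal Python sketch, assuming only a plain list of contig lengths (the toy values are invented, not the A. japonicus data):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe test for running >= total / 2
            return length


contigs = [10, 8, 5, 3, 2]  # toy contig lengths in Mb
print(n50(contigs))  # 8
```

Here the two longest contigs (10 + 8 = 18 Mb) already exceed half of the 28 Mb total, so N50 is the length of the second one.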


2021 ◽  
Author(s):  
Lauren Coombe ◽  
Janet X Li ◽  
Theodora Lo ◽  
Johnathan Wong ◽  
Vladimir Nikolic ◽  
...  

Background Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions than short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which uses lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies than the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
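ntLink is described above as joining contigs with "lightweight minimizer mappings". In a (w, k)-minimizer scheme, the k-mer with the smallest hash is kept from every window of w consecutive k-mers. Two sequences that share a long exact stretch therefore also share most of the minimizers sampled from it, which makes shared minimizers a cheap proxy for alignment. The Python sketch below illustrates only that general idea. The hash function (BLAKE2b), the forward-strand-only k-mers and the default k and w are assumptions, not ntLink's actual hashing, parameters or code.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer (BLAKE2b, an illustrative choice)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int):
    """Return (position, k-mer) pairs for the (w, k)-minimizers of `seq`.

    From every window of `w` consecutive k-mers, the k-mer with the smallest
    hash is kept (leftmost on ties). Only the forward strand is considered.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return []
    hashes = [kmer_hash(seq[i:i + k]) for i in range(n_kmers)]
    picked = set()
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w), key=lambda i: (hashes[i], i))
        picked.add(best)
    return [(i, seq[i:i + k]) for i in sorted(picked)]


def shared_minimizers(a: str, b: str, k: int = 15, w: int = 10):
    """k-mers that are minimizers in both sequences: cheap evidence of overlap."""
    return {m for _, m in minimizers(a, k, w)} & {m for _, m in minimizers(b, k, w)}


contig_end = "ACGTTGCAAGGCTTACCGATGCATTGACCTAGGCTAACGTTAGCCAT"
read = "TTTT" + contig_end[10:] + "GGGG"  # toy read overlapping the contig end
print(len(shared_minimizers(contig_end, read, k=7, w=5)))
```

Only a small fraction of all k-mers is stored and compared, which is what keeps this kind of mapping lightweight compared with full alignment.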


2021 ◽  
Author(s):  
Brandon D. Pickett ◽  
Jessica R. Glass ◽  
Perry G. Ridge ◽  
John S. K. Kauwe

ABSTRACT Caranx ignobilis, commonly known as the kingfish or giant trevally, is a large, reef-associated apex predator. It is a prized sportfish, targeted heavily throughout its tropical and subtropical range in the Indian and Pacific Oceans, and it has drawn significant interest in aquaculture due to an unusual tolerance for freshwater. In this study, we present a high-quality nuclear genome assembly of a C. ignobilis individual from Hawaiian waters, which have recently been shown to host a genetically distinct population. The assembly has a contig NG50 of 7.3 Mbp and a scaffold NG50 of 46.3 Mbp. Twenty-five of the 203 scaffolds contain 90% of the genome. We also present the raw Pacific Biosciences continuous long reads from which the assembly was created. A Hi-C dataset (Dovetail Genomics Omni-C) and Illumina-based RNA-seq from eight tissues are also presented; the latter can be particularly useful for annotation and for studies of freshwater tolerance. Overall, this genome assembly and its supporting data are a valuable tool for ecological and comparative genomics studies of kingfish and other carangoid fishes.
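NG50 differs from N50 only in its denominator. The 50% threshold is taken from an (estimated) genome size rather than from the assembly's own total length, which makes assemblies of different total size comparable. The same cumulative scan also answers "how many scaffolds hold 90% of the genome", the kind of figure reported above for 25 of the 203 scaffolds (usually called LG90). A minimal Python sketch, assuming a list of scaffold lengths and a genome-size estimate; the toy values are invented.

```python
def ngx(lengths, genome_size, x=50):
    """Return (NGx, LGx) for a list of sequence lengths.

    NGx is the length of the sequence at which the running total (longest
    first) first reaches x% of `genome_size`; LGx is how many sequences that
    took. Returns (None, None) if the assembly never reaches the threshold.
    """
    target = genome_size * x / 100
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return length, count
    return None, None


scaffolds = [50, 30, 10, 5, 5]  # toy scaffold lengths in Mb
print(ngx(scaffolds, genome_size=100))        # (50, 1)
print(ngx(scaffolds, genome_size=100, x=90))  # (10, 3)
```

If the genome-size estimate exceeds twice the assembly length, NG50 is undefined, which the sketch reports as `(None, None)`.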


2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Todd P. Michael ◽  
Florian Jupe ◽  
Felix Bemm ◽  
S. Timothy Motley ◽  
Justin P. Sandoval ◽  
...  

Author(s):  
Luca Degradi ◽  
Valeria Tava ◽  
Andrea Kunova ◽  
Paolo Cortesi ◽  
Marco Saracchi ◽  
...  

Fusarium musae van Hove causes crown rot of banana and is also associated with clinical fusariosis. A chromosome-level genome assembly of Fusarium musae F31, obtained by combining Nanopore long reads and Illumina paired-end reads, resulted in 12 chromosomes plus one contig with an overall N50 of 4.36 Mb, and is presented together with its mitochondrial genome (58,072 bp). The F31 genome includes telomeric regions for 11 of the 12 chromosomes, making it the most complete genome available in the Fusarium fujikuroi species complex. The high-quality assembly of the F31 genome will be a valuable resource for studying the pathogenic interactions occurring between F. musae and banana. Moreover, it represents an important resource for understanding genome evolution in the Fusarium fujikuroi species complex.
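Claims such as "telomeric regions for 11 of the 12 chromosomes" are commonly checked by scanning each chromosome end for a tandem telomeric repeat. The Python sketch below does this with a regular expression. It assumes the TTAGGG repeat found in many filamentous fungi; for Arabidopsis, the plant-type TTTAGGG motif would be passed instead. The `min_copies` and `end_window` thresholds are illustrative assumptions, not values from either paper.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def telomere_ends(seq, motif="TTAGGG", min_copies=10, end_window=1000):
    """Return (left_has_telomere, right_has_telomere) for one contig.

    A telomere is called when at least `min_copies` tandem copies of the repeat
    occur within `end_window` bp of a contig end: the reverse complement of
    `motif` (CCCTAA for TTAGGG) at the left end and `motif` itself at the right.
    """
    seq = seq.upper()
    left_pattern = f"(?:{reverse_complement(motif)}){{{min_copies},}}"
    right_pattern = f"(?:{motif}){{{min_copies},}}"
    left = re.search(left_pattern, seq[:end_window]) is not None
    right = re.search(right_pattern, seq[-end_window:]) is not None
    return left, right


contigs = {
    "chr_a": "CCCTAA" * 12 + "ACGT" * 500 + "TTAGGG" * 12,  # both ends capped
    "chr_b": "ACGT" * 500 + "TTAGGG" * 12,                  # right end only
}
for name, s in contigs.items():
    print(name, telomere_ends(s))
```

A chromosome counts as telomere-to-telomere only when both values are `True`.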


2021 ◽  
Author(s):  
Alexandre Wagner Silva Hilsdorf ◽  
Marcela Uliano-Silva ◽  
Luiz Lehmann Coutinho ◽  
Horácio Montenegro ◽  
Vera Maria Fonseca Almeida-Val ◽  
...  

ABSTRACT Colossoma macropomum, known as “tambaqui”, is the largest Characiformes fish in the Amazon River Basin and a leading species in Brazilian aquaculture and fisheries. Good-quality meat and great adaptability to culture systems are among its remarkable farming features. To support studies into the genetics and genomics of the tambaqui, we have produced the first high-quality genome for the species. We combined Illumina and PacBio sequencing technologies to generate a reference genome, assembled with 39X coverage of long reads and polished to a QV of 36 with 130X coverage of short reads. The genome was assembled into 1,269 scaffolds totaling 1,221,847,006 bases, with a scaffold N50 of 40 Mb; 93% of all assembled bases were placed in the largest 54 scaffolds, which correspond to the diploid karyotype of the tambaqui. Furthermore, the NCBI Annotation Pipeline annotated genes, pseudogenes and non-coding transcripts using the RefSeq database as evidence, ensuring a high-quality annotation. A Genome Data Viewer for the tambaqui was also produced, which will benefit any group interested in exploring unique genomic features of the species. The availability of a highly accurate genome assembly for tambaqui provides the foundation for novel insights into ecological and evolutionary facets of the species and is a helpful resource for aquaculture purposes.
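Sequencing depth ("39X", "130X") is the total number of sequenced bases divided by the genome size. The Python sketch below turns the depths quoted above into approximate read volumes. It assumes depth was computed against the 1,221,847,006-bp assembly rather than an independent genome-size estimate, which the abstract does not state.

```python
def coverage(total_read_bases: int, genome_size: int) -> float:
    """Mean sequencing depth (X) = total sequenced bases / genome size."""
    return total_read_bases / genome_size


def bases_needed(target_depth: float, genome_size: int) -> int:
    """Total read bases required to reach `target_depth` over `genome_size`."""
    return round(target_depth * genome_size)


assembly_size = 1_221_847_006  # assembled tambaqui genome size from the abstract
for depth in (39, 130):  # long-read and short-read depths from the abstract
    print(f"{depth}X ~ {bases_needed(depth, assembly_size) / 1e9:.1f} Gb of reads")
```

Under that assumption, 39X corresponds to about 47.7 Gb of PacBio reads and 130X to about 158.8 Gb of Illumina reads.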


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lauren Coombe ◽  
Janet X. Li ◽  
Theodora Lo ◽  
Johnathan Wong ◽  
Vladimir Nikolic ◽  
...  

Abstract Background Generating high-quality de novo genome assemblies is foundational to the genomic study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions than short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which uses lightweight minimizer mappings to join contigs. LongStitch was tested on short- and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies than the state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects.
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.


2019 ◽  
Author(s):  
Joshua V. Peñalba ◽  
Yuan Deng ◽  
Qi Fang ◽  
Leo Joseph ◽  
Craig Moritz ◽  
...  

Abstract The superb fairy-wren, Malurus cyaneus, is one of the most iconic Australian passerine species. This species belongs to an endemic Australasian clade, Meliphagides, which diversified early in the evolution of the oscine passerines. Today, the oscine passerines comprise almost half of all avian species diversity. Despite the rapid increase in available bird genome assemblies, this part of the avian tree has not yet been represented by a high-quality reference. To rectify that, we present the first chromosome-scale genome assembly of a Meliphagides representative: the superb fairy-wren. We combined Illumina shotgun and mate-pair sequences, PacBio long reads, and a genetic linkage map from an intensively sampled pedigree of a wild population to generate this genome assembly. Of the final assembled 1.07 Gb genome, 894 Mb (84.8%) was anchored onto 25 chromosomes, resulting in a final scaffold N50 of 68.11 Mb. This high-quality bird genome assembly is also one of only a handful that are accompanied by a genetic map and recombination landscape. In comparison with other pedigree-based bird genetic maps, we find that the zebra finch (Taeniopygia) genetic map resembles the fairy-wren map more closely than does the map from the more closely related Ficedula flycatcher. Lastly, we also provide a predicted gene and repeat annotation of the genome assembly. This new high-quality, annotated genome assembly will be an invaluable resource not only for the superb fairy-wren and its relatives but also broadly across the avian tree, providing a new reference point for comparative genomic analyses.
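A recombination landscape of the kind mentioned above is often summarised by comparing each marker's genetic position (cM, from the linkage map) with its physical position on the assembly. The local slope in cM/Mb estimates the recombination rate. The Python sketch below computes raw interval rates between adjacent markers. The `(bp, cM)` input format and the toy markers are hypothetical, and published analyses typically smooth these values (for example with splines fitted to a Marey map) rather than using raw intervals.

```python
def recombination_rates(markers):
    """Local recombination rate (cM/Mb) between adjacent markers on one chromosome.

    `markers` is a list of (physical_position_bp, genetic_position_cM) pairs.
    Returns (start_bp, end_bp, cM_per_Mb) for each interval between consecutive
    markers in physical order. Negative rates flag markers whose genetic order
    disagrees with the assembly order.
    """
    ordered = sorted(markers)
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(ordered, ordered[1:]):
        if bp2 == bp1:
            continue  # co-located markers: no physical distance to divide by
        rates.append((bp1, bp2, (cm2 - cm1) / ((bp2 - bp1) / 1e6)))
    return rates


toy_map = [(0, 0.0), (2_000_000, 4.0), (5_000_000, 4.6)]  # invented markers
for start, end, rate in recombination_rates(toy_map):
    print(f"{start:>9,}-{end:>9,} bp: {rate:.2f} cM/Mb")
```

Comparing such profiles between species is one way to quantify how much two genetic maps resemble each other.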

