Genome assembly and annotation of the tambaqui (Colossoma macropomum): an emblematic fish of the Amazon River basin

2021 ◽  
Author(s):  
Alexandre Wagner Silva Hilsdorf ◽  
Marcela Uliano-Silva ◽  
Luiz Lehmann Coutinho ◽  
Horácio Montenegro ◽  
Vera Maria Fonseca Almeida-Val ◽  
...  

Abstract Colossoma macropomum, known as “tambaqui”, is the largest Characiformes fish in the Amazon River Basin and a leading species in Brazilian aquaculture and fisheries. Good-quality meat and great adaptability to culture systems are among its most notable farming features. To support studies of the genetics and genomics of the tambaqui, we have produced the first high-quality genome for the species. We combined Illumina and PacBio sequencing technologies to generate a reference genome, assembled with 39X coverage of long reads and polished to a QV=36 with 130X coverage of short reads. The genome was assembled into 1,269 scaffolds totalling 1,221,847,006 bases, with a scaffold N50 size of 40 Mb; 93% of all assembled bases were placed in the largest 54 scaffolds, which correspond to the diploid karyotype of the tambaqui. Furthermore, the NCBI Annotation Pipeline annotated genes, pseudogenes, and non-coding transcripts using the RefSeq database as evidence, supporting a high-quality annotation. A Genome Data Viewer for the tambaqui was produced, which benefits any group interested in exploring unique genomic features of the species. The availability of a highly accurate genome assembly for tambaqui provides the foundation for novel insights into ecological and evolutionary facets and is a helpful resource for aquaculture purposes.
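The two assembly statistics quoted in the abstract above, scaffold N50 and consensus QV, follow standard definitions that can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names `n50` and `qv_to_error_rate` are hypothetical, and the scaffold lengths in the usage note are toy values rather than data from the tambaqui assembly. QV is assumed to be Phred-scaled, so QV=36 corresponds to an error probability of 10^(-3.6), roughly 2.5 errors per 10,000 bases.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    toy_scaffolds = [40, 30, 20, 10]  # hypothetical lengths, arbitrary units
    print("N50:", n50(toy_scaffolds))
    print("Error rate at QV=36:", qv_to_error_rate(36))
```

With the toy lengths, the two longest scaffolds (40 and 30) already cover more than half of the 100 total units, so the N50 is the length of the second one.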

Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-14

2020 ◽  
Vol 16 (11) ◽  
pp. e1008325
Author(s):  
Hyungtaek Jung ◽  
Tomer Ventura ◽  
J. Sook Chung ◽  
Woo-Jin Kim ◽  
Bo-Hye Nam ◽  
...  

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions, where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes, which can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes, and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


Author(s):  
Valentina Peona ◽  
Mozes P.K. Blom ◽  
Luohao Xu ◽  
Reto Burri ◽  
Shawn Sullivan ◽  
...  

Abstract Genome assemblies are currently being produced at an impressive rate by consortia and individual laboratories. The low costs and increasing efficiency of sequencing technologies have opened up a whole new world of genomic biodiversity. Although these technologies generate high-quality genome assemblies, there are still genomic regions that are difficult to assemble, like repetitive elements and GC-rich regions (genomic “dark matter”). In this study, we compare the efficiency of currently used sequencing technologies (short/linked/long reads and proximity ligation maps) and combinations thereof in assembling genomic dark matter starting from the same sample. By adopting different de novo assembly strategies, we were able to compare each individual draft assembly to a curated multiplatform one and identify the nature of the previously missing dark matter with a particular focus on transposable elements, multi-copy MHC genes, and GC-rich regions. Thanks to this multiplatform approach, we demonstrate the feasibility of producing a high-quality chromosome-level assembly for a non-model organism (paradise crow) for which only suboptimal samples are available. Our approach was able to reconstruct complex chromosomes like the repeat-rich W sex chromosome and several GC-rich microchromosomes. Telomere-to-telomere assemblies are not a reality yet for most organisms, but by leveraging technology choice it is possible to minimize genome assembly gaps for downstream analysis. We provide a roadmap to tailor sequencing projects around the completeness of both the coding and non-coding parts of the genomes.


2021 ◽  
Author(s):  
Xinxin Yi ◽  
Jing Liu ◽  
Shengcai Chen ◽  
Hao Wu ◽  
Min Liu ◽  
...  

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying the genetic underpinnings of its unique traits. In this study, we constructed a high-quality de novo assembly of an elite soybean cultivar, Jidou 17 (JD17), with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis based on the reference genome of JD17 with three published soybean genomes (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984 - 46,912 PAVs spanning 13.1 - 46.9 Mb in size, and 5 - 53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741 - 3,664,629 SNPs and 446,689 - 800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects of these variants were further evaluated. It was found that the coding sequences of 9 nitrogen fixation-related genes were greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.


2021 ◽  
Author(s):  
Suyog Chaudhari ◽  
Erik Brown ◽  
Raul Quispe-Abad ◽  
Emilio Moran ◽  
Norbert Mueller ◽  
...  

Given the ongoing and planned hydropower development projects in the Amazon River basin, appalling losses in biodiversity, river ecology and river connectivity are inevitable. These hydropower projects are proposed to be built in exceptionally endemic sites, setting records in environmental losses by impeding fish movement, altering flood pulse, causing large-scale deforestation, and increasing greenhouse gas emissions. With the burgeoning energy demand combined with the aforementioned negative impacts of conventional hydropower technology, there is an imminent need to re-think the design of hydropower to avoid the potentially catastrophic consequences of large dams. It is certain that the Amazon will undergo some major hydrological changes in the near future because of the compounded effects of climate change and proposed dams, if built with the conventional hydropower technology. In this study, we present a transformative hydropower outlook that integrates low-head hydropower technology (e.g., in-stream turbines) and multiple environmental aspects, such as river ecology and protected areas. We employ a high resolution (~2km) continental scale hydrological model called LEAF-Hydro-Flood (LHF) to assess the in-stream hydropower potential in the Amazon River basin. We particularly focus on quantifying the potential and feasibility of employing in-stream turbines in the Amazon instead of building large dams. We show that a significant portion of the total energy planned to be generated from conventional hydropower in the Brazilian Amazon could be harnessed using in-stream turbines that utilize kinetic energy of water without requiring storage. Further, we also find that implementing in-stream turbines as an alternative to large storage-based dams could prove economically feasible, since most of the environmental and social costs associated with dams are eliminated.
Our results open multiple pathways to achieve sustainable hydropower development in the Amazon to meet the ever-increasing energy demands while minimizing hydrological, social, and ecological impacts. They also provide important insight for sustainable hydropower development in other global regions. The results presented are based on a manuscript under revision for Nature Sustainability.


GigaScience ◽  
2020 ◽  
Vol 9 (7) ◽  
Author(s):  
Sina Majidian ◽  
Fritz J Sedlazeck

Abstract Background: The detection of which mutations are occurring on the same DNA molecule is essential to predict their consequences. This can be achieved by phasing the genomic variations. Nevertheless, state-of-the-art haplotype phasing is currently a black box in which the accuracy and quality of the reconstructed haplotypes are hard to assess. Findings: Here we present PhaseME, a versatile method to provide insights into and improvement of sample phasing results based on linkage data. We showcase the performance and the importance of PhaseME by comparing phasing information obtained from Pacific Biosciences including both continuous long reads and high-quality consensus reads, Oxford Nanopore Technologies, 10x Genomics, and Illumina sequencing technologies. We found that 10x Genomics and Oxford Nanopore phasing can be significantly improved while retaining a high N50 and completeness of phase blocks. PhaseME generates reports and summary plots to provide insights into phasing performance and correctness. We observed unique phasing issues for each of the sequencing technologies, highlighting the necessity of quality assessments. PhaseME is able to decrease the Hamming error rate significantly by 22.4% on average across all 5 technologies. Additionally, a significant improvement is obtained in the reduction of long switch errors. Especially for high-quality consensus reads, the improvement is 54.6% in return for only a 5% decrease in phase block N50 length. Conclusions: PhaseME is a universal method to assess the phasing quality and accuracy and improves the quality of phasing using linkage information. The package is freely available at https://github.com/smajidian/phaseme.
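The abstract above reports improvements in Hamming error rate and switch errors without defining them. The sketch below shows one common way these metrics are computed for a single phase block when a ground-truth haplotype is available. It is an illustration under stated assumptions, not PhaseME's implementation: the function names are hypothetical, haplotypes are assumed to be equal-length strings of `0`/`1` alleles at heterozygous sites, and PhaseME's own definitions (for example, of "long" switch errors) may differ.

```python
def hamming_error_rate(predicted, truth):
    """Fraction of heterozygous sites phased incorrectly within one block.

    Phase is only defined up to swapping the two haplotypes, so the
    mismatch count is taken against the truth or its complement,
    whichever is smaller.
    """
    if len(predicted) != len(truth) or not truth:
        raise ValueError("haplotypes must be non-empty and of equal length")
    mismatches = sum(p != t for p, t in zip(predicted, truth))
    return min(mismatches, len(truth) - mismatches) / len(truth)


def switch_errors(predicted, truth):
    """Count positions where agreement with the truth flips between adjacent sites."""
    if len(predicted) != len(truth):
        raise ValueError("haplotypes must be of equal length")
    agree = [p == t for p, t in zip(predicted, truth)]
    return sum(a != b for a, b in zip(agree, agree[1:]))


if __name__ == "__main__":
    truth = "0000"          # toy ground-truth haplotype
    print(hamming_error_rate("0011", truth), switch_errors("0011", truth))
    print(hamming_error_rate("0100", truth), switch_errors("0100", truth))
```

The two toy cases show why both metrics are reported: a single switch in the middle of a block yields a high Hamming error, whereas an isolated flipped site yields a low Hamming error but two switch points.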

