The Beginning of the End: A Chromosomal Assembly of the New World Malaria Mosquito Ends with a Novel Telomere

Chromosome level assemblies are accumulating in various taxonomic groups including mosquitoes. However, even in the few reference-quality mosquito assemblies, a significant portion of the heterochromatic regions including telomeres remain unresolved. Here we produce a de novo assembly of the New World malaria mosquito, Anopheles albimanus by integrating Oxford Nanopore sequencing, Illumina, Hi-C and optical mapping. This 172.6 Mbps female assembly, which we call AalbS3, is obtained by scaffolding polished large contigs (contig N50 = 13.7 Mbps) into three chromosomes. All chromosome arms end with telomeric repeats, which is the first in mosquito assemblies and represents a significant step toward the completion of a genome assembly. These telomeres consist of tandem repeats of a novel 30-32 bp Telomeric Repeat Unit (TRU) and are confirmed by analyzing the termini of long reads and through both chromosomal in situ hybridization and a Bal31 sensitivity assay. The AalbS3 assembly included previously uncharacterized centromeric and rDNA clusters and more than doubled the content of transposable elements and other repetitive sequences. This telomere-to-telomere assembly, although still containing gaps, represents a significant step toward resolving biologically important but previously hidden genomic components. The comparison of different scaffolding methods will also inform future efforts to obtain reference-quality genomes for other mosquito species.
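For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal Python sketch of how N50 is conventionally computed from a list of contig lengths. The function name and the lengths in the usage example are hypothetical toy values, not data from the AalbS3 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy lengths (arbitrary units), for illustration only.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # → 40
```

In the toy example the total is 150, and the two longest contigs (50 + 40 = 90) are the first to cover half of it, so the N50 is 40.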


The beginning of the end: a chromosomal assembly of the New World malaria mosquito ends with a novel telomere

10.1101/2020.04.17.047084 ◽

2020 ◽

Cited By ~ 2

Author(s):

Austin Compton ◽

Jiangtao Liang ◽

Chujia Chen ◽

Varvara Lukyanchikova ◽

Yumin Qi ◽

...

Keyword(s):

New World ◽

Optical Mapping ◽

Mosquito Species ◽

Malaria Mosquito ◽

Anopheles Albimanus ◽

Telomeric Repeats ◽

Long Reads ◽

Oxford Nanopore ◽

Significant Step ◽

Reference Quality

ABSTRACT Chromosome level assemblies are accumulating in various taxonomic groups including mosquitoes. However, even in the few reference-quality mosquito assemblies, a significant portion of the heterochromatic regions including telomeres remain unresolved. Here we produce a de novo assembly of the New World malaria mosquito, Anopheles albimanus by integrating Oxford Nanopore sequencing, Illumina, Hi-C and optical mapping. This 172.6 Mbps female assembly, which we call AalbS3, is obtained by scaffolding polished large contigs (contig N50=13.7 Mbps) into three chromosomes. All chromosome arms end with telomeric repeats, which is the first in mosquito assemblies and represents a significant step towards the completion of a genome assembly. These telomeres consist of tandem repeats of a novel 30-32 bp telomeric repeat unit (TRU) and are confirmed by analysing the termini of long reads and through both chromosomal in situ hybridization and a Bal31 sensitivity assay. The AalbS3 assembly included previously uncharacterized centromeric and rDNA clusters and more than doubled the content of transposable elements and other repetitive sequences. This telomere-to-telomere assembly, although still containing gaps, represents a significant step towards resolving biologically important but previously hidden genomic components. The comparison of different scaffolding methods will also inform future efforts to obtain reference-quality genomes for other mosquito species.

100-word Article Summary: We report AalbS3, a telomere-to-telomere assembly of the Anopheles albimanus genome produced by integrating advancing technologies including Oxford Nanopore and Bionano optical mapping. AalbS3 features much of the difficult-to-assemble genomic ‘dark matters’ including previously missed transposons, centromeres and rDNA clusters. We describe novel telomeric repeats that are confirmed by analysis of long reads and by telomere hybridization assays. This reference-quality assembly represents a significant step towards completing the genomic puzzle pieces and informs efforts to improve the assembly of other mosquito species. Future research into the relationship between telomere and mosquito life span may have significant implications to disease control.
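The abstract mentions confirming telomeres by analysing the termini of long reads. The sketch below is not the authors' pipeline; it is a simplified illustration of one way to check whether the end of a sequence looks like a tandem repeat, by finding the smallest shift at which the terminal window matches itself. The function name, window size, and identity threshold are all assumptions, and a fixed-period test like this would not cope with a repeat unit that varies in length (such as the 30-32 bp unit described above) or with indel-rich reads.

```python
def terminal_repeat_period(seq, min_unit=2, max_unit=50,
                           window=120, min_identity=0.9):
    """Return (period, identity) for the smallest period at which the
    last `window` bases of `seq` are self-similar, or None if no period
    in [min_unit, max_unit] reaches `min_identity`."""
    tail = seq[-window:]
    for period in range(min_unit, max_unit + 1):
        # Require at least two full copies of a candidate unit.
        if len(tail) < 2 * period:
            break
        pairs = len(tail) - period
        # Count positions that match the base one period downstream.
        matches = sum(tail[i] == tail[i + period] for i in range(pairs))
        identity = matches / pairs
        if identity >= min_identity:
            return period, identity
    return None


# Hypothetical toy read: a unique prefix followed by 30 copies of a 5-bp unit.
toy_read = "ACGTACGGTCA" + "ACGTT" * 30
print(terminal_repeat_period(toy_read))  # → (5, 1.0)
```

Returning the smallest qualifying period is a deliberate simplification: it reports the basic unit rather than a multiple of it, but it can be fooled by units that contain their own internal periodicity.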


Highly Contiguous Nanopore Genome Assembly of Chlamydomonas reinhardtii CC-1690

Microbiology Resource Announcements ◽

10.1128/mra.00726-20 ◽

2020 ◽

Vol 9 (37) ◽

Author(s):

Samuel O’Donnell ◽

Frederic Chaux ◽

Gilles Fischer

Keyword(s):

Chlamydomonas Reinhardtii ◽

Genome Size ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Content Type ◽

Oxford Nanopore ◽

A Genome ◽

Reference Quality

ABSTRACT The current Chlamydomonas reinhardtii reference genome remains fragmented due to gaps stemming from large repetitive regions. To overcome the vast majority of these gaps, publicly available Oxford Nanopore Technology data were used to create a new reference-quality de novo genome assembly containing only 21 contigs, 30/34 telomeric ends, and a genome size of 111 Mb.


Diverse origins of high copy tandem repeats in grass genomes

10.7287/peerj.preprints.2314 ◽

2016 ◽

Author(s):

Paul Bilinski ◽

Yonghua Han ◽

Matthew B Hufford ◽

Anne Lorant ◽

Pingdong Zhang ◽

...

Keyword(s):

Tandem Repeat ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

De Novo ◽

Repetitive Sequences ◽

Read Mapping ◽

Hybridization Data ◽

Centromeric Repeat ◽

Tandem Repeat Sequences

In studying genomic architecture, highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Here, we utilize sequence assembly and read mapping to identify and quantify the genomic abundance of different tandem repeat sequences. Previous research has posited that the highest abundance tandem repeat in eukaryotic genomes is often the centromeric repeat, and we pair our bioinformatic pipeline with fluorescent in-situ hybridization data to test this hypothesis. We find that de novo assembly and bioinformatic filters can successfully identify repeats with homology to known tandem repeats. Fluorescent in-situ hybridization, however, shows that de novo assembly fails to identify novel centromeric repeats, instead identifying other potentially important repetitive sequences. Together, our results test the applicability and limitations of using de novo repeat assembly of tandem repeats to identify novel centromeric repeats. Building on our findings of genomic composition, we also set forth a method for exploring the repetitive regions of non-model genomes whose diversity limits the applicability of established genetic resources.


De novo identification of satellite DNAs in the sequenced genomes of Drosophila virilis and D. americana using the RepeatExplorer and TAREAN pipelines

10.1101/781146 ◽

2019 ◽

Author(s):

Bráulio S.M.L. Silva ◽

Pedro Heringer ◽

Guilherme B. Dias ◽

Marta Svartman ◽

Gustavo C.S. Kuhn

Keyword(s):

Transposable Elements ◽

Tandem Repeat ◽

Tandem Repeats ◽

De Novo ◽

Chromosome Mapping ◽

Drosophila Virilis ◽

Satellite Dnas ◽

Bioinformatic Tools ◽

A Genome ◽

Genome Assemblies

Abstract Satellite DNAs are among the most abundant repetitive DNAs found in eukaryote genomes, where they participate in a variety of biological roles, from being components of important chromosome structures to gene regulation. Experimental methodologies used before the genomic era were laborious and time-consuming, yet still insufficient to recover the full collection of satDNAs from a genome. Today, the availability of whole sequenced genomes combined with the development of specific bioinformatic tools is expected to foster the identification of virtually all of the “satellitome” of a particular species. While whole genome assemblies are important to obtain a global view of genome organization, most assemblies are incomplete and lack repetitive regions. Here, we applied short-read sequencing and similarity clustering to perform a de novo identification of the most abundant satellite families in two Drosophila species from the virilis group: Drosophila virilis and D. americana. These species were chosen because they have been used as a model to understand satDNA biology since the early 1970s. We combined computational tandem repeat detection via similarity-based read clustering (implemented in the Tandem Repeat Analyzer pipeline, “TAREAN”) with data from the literature and chromosome mapping to obtain an overview of satDNAs in D. virilis and D. americana. The fact that all of the abundant tandem repeats we detected were previously identified in the literature allowed us to evaluate the efficiency of TAREAN in correctly identifying true satDNAs. Our results indicate that raw sequencing reads can be efficiently used to detect satDNAs, but that abundant tandem repeats present in dispersed arrays or associated with transposable elements are frequent false positives. We demonstrate that TAREAN, together with its parent method RepeatExplorer, may be used as a resource to detect tandem repeats associated with transposable elements and also to reveal families of dispersed tandem repeats.


Genome, Transcriptome, and Germplasm Sequencing Uncovers Functional Variation in the Warm-Season Grain Legume Horsegram Macrotyloma uniflorum (Lam.) Verdc.

Frontiers in Plant Science ◽

10.3389/fpls.2021.758119 ◽

2021 ◽

Vol 12 ◽

Author(s):

H. B. Mahesh ◽

M. K. Prasannakumar ◽

K. G. Manasa ◽

Sampath Perumal ◽

Yogendra Khedikar ◽

...

Keyword(s):

Genome Wide Association Study ◽

De Novo ◽

Repetitive Sequences ◽

Warm Season ◽

Grain Legume ◽

Dna Repeats ◽

Functional Variation ◽

Food Ingredient ◽

Total Size ◽

A Genome

Horsegram is a grain legume with excellent nutritional and remedial properties and good climate resilience, able to adapt to harsh environmental conditions. Here, we used a combination of short- and long-read sequencing technologies to generate a genome sequence of 279.12Mb, covering 83.53% of the estimated total size of the horsegram genome, and we annotated 24,521 genes. De novo prediction of DNA repeats showed that approximately 25.04% of the horsegram genome was made up of repetitive sequences, the lowest among the legume genomes sequenced so far. The major transcription factors identified in the horsegram genome were bHLH, ERF, C2H2, WRKY, NAC, MYB, and bZIP, suggesting that horsegram is resistant to drought. Interestingly, the genome is abundant in Bowman–Birk protease inhibitors (BBIs), which can be used as a functional food ingredient. The results of maximum likelihood phylogenetic and estimated synonymous substitution analyses suggested that horsegram is closely related to the common bean and diverged approximately 10.17 million years ago. The double-digested restriction associated DNA (ddRAD) sequencing of 40 germplasms allowed us to identify 3,942 high-quality SNPs in the horsegram genome. A genome-wide association study with powdery mildew identified 10 significant associations similar to the MLO and RPW8.2 genes. The reference genome and other genomic information presented in this study will be of great value to horsegram breeding programs. In addition, keeping the increasing demand for food with nutraceutical values in view, these genomic data provide opportunities to explore the possibility of horsegram for use as a source of food and nutraceuticals.


Genome-wide profiling of heritable and de novo STR variations

10.1101/077727 ◽

2016 ◽

Cited By ~ 7

Author(s):

Thomas Willems ◽

Dina Zielinski ◽

Assaf Gordon ◽

Melissa Gymrek ◽

Yaniv Erlich

Keyword(s):

Tandem Repeats ◽

High Throughput Sequencing ◽

De Novo ◽

Genetic Diseases ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Short Tandem

Abstract Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.


Transposable elements are constantly exchanged by horizontal transfer reshaping mosquito genomes

10.1101/2020.06.23.166744 ◽

2020 ◽

Cited By ~ 2

Author(s):

Elverson Soares de Melo ◽

Gabriel da Luz Wallau

Keyword(s):

Transposable Elements ◽

Genome Size ◽

Horizontal Transfer ◽

De Novo ◽

Mosquito Species ◽

Wuchereria Bancrofti ◽

Model Organisms ◽

A Genome ◽

Horizontal Spread ◽

Horizontal Transfers

Abstract Transposable elements (TEs) are a set of mobile elements within a genome. Due to their complexity, an in-depth TE characterization is only available for a handful of model organisms. In the present study, we performed a de novo and homology-based characterization of TEs in the genomes of 24 mosquito species and investigated their mode of inheritance. More than 40% of the genome of Aedes aegypti, Aedes albopictus, and Culex quinquefasciatus is composed of TEs, while TE content varies substantially among Anopheles species (0.13%–19.55%). Class I TEs are the most abundant among mosquitoes and at least 24 TE superfamilies were found. Interestingly, TEs have been continuously exchanged by horizontal transfer (212 TE families of 18 different superfamilies) among mosquitoes since 30 million years ago, representing around 6% of the genome in Aedes genomes and a small fraction in Anopheles genomes. Most of these horizontally transferred TEs are from the three ubiquitous LTR superfamilies: Gypsy, Bel-Pao and Copia. Searching more than 32,000 genomes, we also uncover transfers between mosquitoes and two different phyla (Cnidaria and Nematoda) and two subphyla (Chelicerata and Crustacea), identifying a vector, the worm Wuchereria bancrofti, that enabled the horizontal spread of a Tc1-mariner element of the irritans subfamily among various Anopheles species. These data also allowed us to reconstruct the horizontal transfer network of this TE involving more than 40 species. In summary, our results suggest that TEs are constantly exchanged by common phenomena of horizontal transfers among mosquitoes, influencing genome variation and contributing to genome size expansion.

Author Summary: Most eukaryotes have DNA fragments inside their genome that can multiply by inserting themselves in other regions of the genome, generating variability. These fragments are called Transposable Elements (TEs). Since they are a constituent part of eukaryote genomes, these pieces of DNA are usually inherited vertically by the offspring. To avoid damage to the genome caused by the replication and insertion of TEs, organisms usually control them, leading to their inactivation. However, TEs sometimes get out of control and invade other species through a horizontal transfer mechanism. This dynamic is not known in mosquitoes, a group of organisms that act as vectors of many human diseases. We collected mosquito genomes available in public databases and characterized the whole content of TEs. Using a statistically supported method, we investigate TE relations among mosquitoes and discover that horizontal transfers of transposons are common and have occurred in the last 30 million years among these species. Although not as common as transfers among closely related species, transposon transfer to distant species also occurs. We also identify a parasite, a filarial worm, that may have facilitated the transfer of TEs to many mosquitoes. Together, horizontally transferred TEs contribute to increasing mosquito genome size and variation.


Efficient iterative Hi-C scaffolder based on N-best neighbors

BMC Bioinformatics ◽

10.1186/s12859-021-04453-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Dengfeng Guan ◽

Shane A. McCarthy ◽

Zemin Ning ◽

Guohua Wang ◽

Yadong Wang ◽

...

Keyword(s):

De Novo ◽

A Priori ◽

Sequencing Technology ◽

Current Standard ◽

A Genome ◽

Eukaryotic Species ◽

Long Read ◽

Reference Quality ◽

Comparable Accuracy ◽

Chromosomal Profile

Abstract Background: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long read data itself is unlikely to create a chromosome-scale assembly for most eukaryotic species, the inexpensive Hi-C sequencing technology, capable of capturing the chromosomal profile of a genome, is now widely used to complete the task. However, the existing Hi-C based scaffolding tools either require a priori chromosome number as input, or lack the ability to build highly continuous scaffolds. Results: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which takes advantage of contact information from Hi-C reads to construct a scaffolding graph iteratively based on N-best neighbors of contigs. Subsequent to scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. Through our tests on three long read based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools, and it can generate much more continuous scaffolds, while achieving a higher or comparable accuracy. Conclusions: Pin_hic is an efficient Hi-C based scaffolding tool, which can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and make a meaningful contribution.
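To make the phrase "N-best neighbors of contigs" more concrete, here is a toy Python sketch of the general idea: for each contig, keep only its N most strongly linked partners and use the surviving links as candidate scaffold-graph edges. This is not pin_hic's actual implementation. The function name, the input format, the mutual-neighbor filter, and the contact counts are all assumptions made for illustration; the abstract does not describe how pin_hic weights, normalizes, or orients contigs.

```python
from collections import defaultdict


def n_best_neighbor_edges(contacts, n_best=2):
    """Build candidate scaffold-graph edges from pairwise contact counts.

    contacts: dict mapping (contig_a, contig_b) -> Hi-C link count.
    An edge is kept only if each contig is among the other's
    `n_best` most strongly linked partners (an assumed filter).
    """
    per_contig = defaultdict(list)
    for (a, b), count in contacts.items():
        per_contig[a].append((count, b))
        per_contig[b].append((count, a))

    best = {}
    for contig, links in per_contig.items():
        # Strongest links first; ties broken by name for determinism.
        links.sort(key=lambda link: (-link[0], link[1]))
        best[contig] = {name for _, name in links[:n_best]}

    edges = set()
    for a, neighbors in best.items():
        for b in neighbors:
            if a in best[b]:
                edges.add(tuple(sorted((a, b))))
    return sorted(edges)


# Hypothetical contact counts between four contigs.
toy_contacts = {
    ("c1", "c2"): 90,
    ("c2", "c3"): 80,
    ("c3", "c4"): 70,
    ("c1", "c3"): 5,
    ("c1", "c4"): 2,
}
print(n_best_neighbor_edges(toy_contacts, n_best=2))
# → [('c1', 'c2'), ('c2', 'c3'), ('c3', 'c4')]
```

In the toy data the weak c1-c3 and c1-c4 links are dropped because they are not mutual top-2 choices, leaving a simple chain c1-c2-c3-c4. A real scaffolder would also need to handle contig orientation, length normalization, and iterative merging.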


Twelve quick steps for genome assembly and annotation in the classroom

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008325 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008325

Author(s):

Hyungtaek Jung ◽

Tomer Ventura ◽

J. Sook Chung ◽

Woo-Jin Kim ◽

Bo-Hye Nam ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Repetitive Sequences ◽

Genome Project ◽

Model Organisms ◽

High Quality ◽

Sequencing Technologies ◽

A Genome ◽

Sequencing Platforms ◽

High Quality Genome

Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.


Development of Microsatellites: A Powerful Genetic Marker

The Agriculturists ◽

10.3329/agric.v13i1.26559 ◽

2016 ◽

Vol 13 (1) ◽

pp. 152-172 ◽

Cited By ~ 3

Author(s):

M Moniruzzaman ◽

R Khatun ◽

Zahira Yaakob ◽

M S Khan ◽

A A Mintoo

Keyword(s):

Tandem Repeats ◽

Genomic Library ◽

De Novo ◽

Genome Mapping ◽

Variable Number ◽

Genomic Libraries ◽

Microsatellite Isolation ◽

Major Disadvantage ◽

A Genome ◽

Variable Number Tandem Repeats

Tandem repeats of conserved short DNA segments, which are found in all prokaryotic and eukaryotic genomes, are called microsatellites. They are also known as variable number tandem repeats (VNTRs), simple sequence repeats (SSRs) and short tandem repeats (STRs). Microsatellites are present in both coding and non-coding regions of a genome. Their high polymorphism makes them powerful genetic markers for genome mapping in many organisms. They are also suitable for ancient and forensic DNA studies, population genetics, and conservation of biological resources. The major disadvantage of microsatellites is that they must be isolated de novo from most species being examined for the first time. Microsatellite isolation is cumbersome in terms of effort and time because it traditionally involves screening of genomic libraries. The general methods of microsatellite isolation are cross-species amplification, mining microsatellites from nucleotide sequence data, and genomic library-based methods. The cross-species method may not be effective for all species, and data mining is not applicable when little or no DNA sequence data is available. The genomic library-based method is the best choice. Traditional protocol, primer extension protocol, selective hybridization, and Fast Isolation by AFLP of Sequences Containing repeats (FIASCO) are the genomic library-based protocols for microsatellite development. FIASCO is the best protocol ever developed.

The Agriculturists 2015; 13(1): 152-172
