MobiSeq: De novo SNP discovery in model and non‐model species through sequencing the flanking region of transposable elements

Abstract: In recent years, the availability of reduced representation library (RRL) methods has catalysed an expansion of genome-scale studies to characterize both model and non-model organisms. Most of these methods rely on the use of restriction enzymes to obtain DNA sequences at a genome-wide level. These approaches have been widely used to sequence thousands of markers across individuals for many organisms at a reasonable cost, revolutionizing the field of population genomics. However, there are still some limitations associated with these methods, in particular, the high molecular weight DNA required as starting material, the reduced number of common loci among investigated samples, and the short length of the sequenced site-associated DNA. Here, we present MobiSeq, an RRL protocol that exploits simple laboratory techniques and generates genomic data based on PCR targeted-enrichment of transposable elements and the sequencing of the associated flanking region. We validate its performance across 103 DNA extracts derived from three mammalian species: grey wolf (Canis lupus), red deer complex (Cervus sp.), and brown rat (Rattus norvegicus). MobiSeq enables the sequencing of hundreds of thousands of loci across the genome, and performs SNP discovery with relatively low rates of clonality. Given the ease and flexibility of the MobiSeq protocol, the method has the potential to be implemented for marker discovery and population genomics across a wide range of organisms, enabling the exploration of diverse evolutionary and conservation questions.

RepeatModeler2: automated genomic discovery of transposable element families

10.1101/856591 ◽

2019 ◽

Cited By ~ 12

Author(s):

Jullien M. Flynn ◽

Robert Hubley ◽

Clément Goubert ◽

Jeb Rosen ◽

Andrew G. Clark ◽

...

Keyword(s):

Transposable Elements ◽

De Novo ◽

False Positive Rate ◽

Fruit Fly ◽

Sequence Coverage ◽

Genome Sequences ◽

Model Species ◽

Link Type ◽

Eukaryotic Species ◽

Ltr Retroelements

Abstract: The accelerating pace of genome sequencing throughout the tree of life is driving the need for improved unsupervised annotation of genome components such as transposable elements (TEs). Because the types and sequences of TEs are highly variable across species, automated TE discovery and annotation are challenging and time-consuming tasks. A critical first step is the de novo identification and accurate compilation of sequence models representing all the unique TE families dispersed in the genome. Here we introduce RepeatModeler2, a new pipeline that greatly facilitates this process. This new program brings substantial improvements over the original version of RepeatModeler, one of the most widely used tools for TE discovery. In particular, this version incorporates a module for structural discovery of complete LTR retroelements, which are widespread in eukaryotic genomes but recalcitrant to automated identification because of their size and sequence complexity. We benchmarked RepeatModeler2 on three model species with diverse TE landscapes and high-quality, manually curated TE libraries: Drosophila melanogaster (fruit fly), Danio rerio (zebrafish), and Oryza sativa (rice). In these three species, RepeatModeler2 identified approximately three times more consensus sequences matching with >95% sequence identity and sequence coverage to the manually curated sequences than the original RepeatModeler. As expected, the greatest improvement is for LTR retroelements. The program had an extremely low false positive rate when applied to simulated genomes devoid of TEs. Thus, RepeatModeler2 represents a valuable addition to the genome annotation toolkit that will enhance the identification and study of TEs in eukaryotic genome sequences. RepeatModeler2 is available as source code or a containerized package under an open license (https://github.com/Dfam-consortium/RepeatModeler, https://github.com/Dfam-consortium/TETools).

Significance: Genome sequences are being produced for more and more eukaryotic species. The bulk of these genomes is composed of parasitic, self-mobilizing transposable elements (TEs) that play important roles in organismal evolution. Thus there is a pressing need for developing software that can accurately identify the diverse set of TEs dispersed in genome sequences. Here we introduce RepeatModeler2, an easy-to-use package for the curation of reference TE libraries which can be applied to any eukaryotic species. Through several major improvements over the previous version, RepeatModeler2 is able to produce libraries that recapitulate the known composition of three model species with some of the most complex TE landscapes. Thus RepeatModeler2 will greatly enhance the discovery and annotation of TEs in genome sequences.
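
The benchmark criterion quoted in the abstract (consensus sequences matching a curated sequence with >95% identity and >95% coverage) can be illustrated with a small filter. This is only a sketch of that thresholding idea, not RepeatModeler2's own evaluation code: the `Hit` record, the function names, and the family names in the usage example are all hypothetical, and the abstract does not say how identity and coverage were computed.

```python
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One consensus-to-curated alignment (hypothetical record layout)."""
    family: str       # curated family the consensus aligned to
    identity: float   # percent sequence identity, 0-100
    coverage: float   # percent of the curated sequence covered, 0-100


def is_close_match(hit: Hit, min_identity: float = 95.0,
                   min_coverage: float = 95.0) -> bool:
    """True when both identity and coverage strictly exceed the thresholds."""
    return hit.identity > min_identity and hit.coverage > min_coverage


def count_recovered_families(hits: Iterable[Hit]) -> int:
    """Count distinct curated families with at least one close match."""
    return len({h.family for h in hits if is_close_match(h)})


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = [
        Hit("familyA", 98.2, 99.0),
        Hit("familyA", 96.0, 97.0),   # same family, counted once
        Hit("familyB", 94.9, 99.0),   # identity too low
        Hit("familyC", 97.0, 95.0),   # coverage not strictly above 95
    ]
    print(count_recovered_families(example))
```

Counting distinct families rather than raw hits avoids rewarding a tool for producing several redundant consensus sequences for the same element; whether the original benchmark counted this way is an assumption here.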

Faculty Opinions recommendation of Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.718185441.793487292 ◽

2013 ◽

Author(s):

Hunter Fraser

Keyword(s):

De Novo ◽

Snp Discovery ◽

Inexpensive Method ◽

Model Species

Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species

PLoS ONE ◽

10.1371/journal.pone.0037135 ◽

2012 ◽

Vol 7 (5) ◽

pp. e37135 ◽

Cited By ~ 1502

Author(s):

Brant K. Peterson ◽

Jesse N. Weber ◽

Emily H. Kay ◽

Heidi S. Fisher ◽

Hopi E. Hoekstra

Keyword(s):

De Novo ◽

Snp Discovery ◽

Inexpensive Method ◽

Model Species

De novo whole-genome assembly in Chrysanthemum seticuspe, a model species of Chrysanthemums, and its application to genetic and gene discovery analysis

DNA Research ◽

10.1093/dnares/dsy048 ◽

2019 ◽

Vol 26 (3) ◽

pp. 195-203 ◽

Cited By ~ 19

Author(s):

Hideki Hirakawa ◽

Katsuhiko Sumitomo ◽

Tamotsu Hisamatsu ◽

Soichiro Nagano ◽

Kenta Shirasawa ◽

...

Keyword(s):

Genome Assembly ◽

De Novo ◽

Gene Discovery ◽

Whole Genome ◽

Model Species

Software Evaluation for de novo Detection of Transposons

10.1101/2021.02.08.430290 ◽

2021 ◽

Author(s):

Matias Rodriguez ◽

Wojciech Makałowski

Keyword(s):

Transposable Elements ◽

Genome Evolution ◽

De Novo ◽

Simulated Data ◽

Genomic Sequences ◽

Software Evaluation ◽

Easy Task ◽

Eukaryotic Genomes

Abstract: Transposable elements (TEs) are major genomic components in most eukaryotic genomes and play an important role in genome evolution. However, despite their relevance, identifying TEs is not an easy task, and a number of tools have been developed to tackle this problem. To better understand how they perform, we tested several widely used tools for de novo TE detection and compared their performance on both simulated data and well-curated genomic sequences. The results will be helpful for identifying common issues associated with TE annotation and for evaluating how comparable the results obtained with different tools are.

An optimized approach for local de novo assembly of overlapping paired-end RAD reads from multiple individuals

Royal Society Open Science ◽

10.1098/rsos.171589 ◽

2018 ◽

Vol 5 (2) ◽

pp. 171589 ◽

Cited By ~ 4

Author(s):

Yu-Long Li ◽

Dong-Xiu Xue ◽

Bai-Dong Zhang ◽

Jin-Xian Liu

Keyword(s):

Data Reduction ◽

De Novo Assembly ◽

Genetic Variance ◽

Restriction Site ◽

De Novo ◽

Optimal Number ◽

Rad Sequencing ◽

Conservation Genomics ◽

Model Species

Restriction site-associated DNA (RAD) sequencing is revolutionizing studies in ecological, evolutionary and conservation genomics. However, the assembly of paired-end RAD reads with random-sheared ends is still challenging, especially for non-model species with high genetic variance. Here, we present an efficient optimized approach with a pipeline software, RADassembler, which makes full use of paired-end RAD reads with random-sheared ends from multiple individuals to assemble RAD contigs. RADassembler integrates the algorithms for choosing the optimal number of mismatches within and across individuals at the clustering stage, and then uses a two-step assembly approach at the assembly stage. RADassembler also uses data reduction and parallelization strategies to promote efficiency. Compared with other tools, assembly results on both simulated and real RAD datasets showed that RADassembler consistently assembled an appropriate number of high-quality contigs, and that more read pairs were properly mapped to the assembled contigs. This approach provides an optimal tool for dealing with the complexity in the assembly of paired-end RAD reads with random-sheared ends for non-model species in ecological, evolutionary and conservation studies. RADassembler is available at https://github.com/lyl8086/RADscripts.

A genome resource for green millet Setaria viridis enables discovery of agronomically valuable loci

Nature Biotechnology ◽

10.1038/s41587-020-0681-2 ◽

2020 ◽

Vol 38 (10) ◽

pp. 1203-1210 ◽

Cited By ~ 2

Author(s):

Sujan Mamidi ◽

Adam Healey ◽

Pu Huang ◽

Jane Grimwood ◽

Jerry Jenkins ◽

...

Keyword(s):

Complex Traits ◽

De Novo ◽

Foxtail Millet ◽

Wild Plant ◽

Leaf Angle ◽

Setaria Viridis ◽

Model Species ◽

Seed Shattering ◽

A Genome ◽

Wild Accessions

Abstract: Wild and weedy relatives of domesticated crops harbor genetic variants that can advance agricultural biotechnology. Here we provide a genome resource for the wild plant green millet (Setaria viridis), a model species for studies of C4 grasses, and use the resource to probe domestication genes in the close crop relative foxtail millet (Setaria italica). We produced a platinum-quality genome assembly of S. viridis and de novo assemblies for 598 wild accessions and exploited these assemblies to identify loci underlying three traits: response to climate; a ‘loss of shattering’ trait that permits mechanical harvest; and leaf angle, a predictor of yield in many grass crops. With CRISPR–Cas9 genome editing, we validated Less Shattering1 (SvLes1) as a gene whose product controls seed shattering. In S. italica, this gene was rendered nonfunctional by a retrotransposon insertion in the domesticated loss-of-shattering allele SiLes1-TE (transposable element). This resource will enhance the utility of S. viridis for dissection of complex traits and biotechnological improvement of panicoid crops.

Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Genome Biology ◽

10.1186/s13059-019-1905-y ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 26

Author(s):

Shujun Ou ◽

Weijia Su ◽

Yi Liao ◽

Kapeel Chougule ◽

Jireh R. A. Agda ◽

...

Keyword(s):

Transposable Elements ◽

Animal Species ◽

Performance Metrics ◽

De Novo ◽

Terminal Inverted Repeat ◽

Miniature Inverted Transposable Elements ◽

Sensitivity Specificity ◽

Genomic Regions ◽

Assembly Algorithms ◽

Eukaryotic Genomes

Abstract: Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
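
The six performance metrics named in the abstract have standard definitions in terms of true/false positives and negatives. The sketch below computes them from a confusion matrix; it is a generic illustration, not code from EDTA. The `Confusion` container and `annotation_metrics` function are hypothetical names, and the abstract does not state what unit is counted (for example base pairs or elements), so the counts here are treated as unit-agnostic.

```python
from dataclasses import dataclass
from math import nan
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Outcome counts of an annotation against a curated reference (hypothetical)."""
    tp: int  # annotated as TE, truly TE
    fp: int  # annotated as TE, not TE
    tn: int  # not annotated, not TE
    fn: int  # not annotated, truly TE


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division: undefined ratios are reported as NaN instead of raising."""
    return numerator / denominator if denominator else nan


def annotation_metrics(c: Confusion) -> Dict[str, float]:
    """Standard definitions of the six metrics listed in the abstract."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),       # TP / (TP + FN)
        "specificity": _ratio(c.tn, c.tn + c.fp),       # TN / (TN + FP)
        "accuracy": _ratio(c.tp + c.tn, total),         # (TP + TN) / all
        "precision": _ratio(c.tp, c.tp + c.fp),         # TP / (TP + FP)
        "fdr": _ratio(c.fp, c.tp + c.fp),               # FP / (TP + FP) = 1 - precision
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn), # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in annotation_metrics(Confusion(tp=80, fp=20, tn=890, fn=10)).items():
        print(f"{name}: {value:.4f}")
```

Reporting several metrics together matters for TE annotation because non-TE sequence usually dominates a genome: accuracy and specificity can look high simply from the large true-negative count, while precision, FDR, and F1 ignore true negatives and expose over-annotation.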

Special Issue: Genomic Analyses of Avian Evolution

Diversity ◽

10.3390/d11100178 ◽

2019 ◽

Vol 11 (10) ◽

pp. 178

Author(s):

Peter Houde

Keyword(s):

Genome Organization ◽

De Novo ◽

State Of The Art ◽

Phylogenetic Inference ◽

Special Issue ◽

Model Species ◽

De Novo Genome Assembly ◽

Avian Evolution ◽

New Methods ◽

Genomic Analyses

“Genomic Analyses of Avian Evolution” is a “state of the art” showcase of the varied and rapidly evolving fields of inquiry enabled and driven by powerful new methods of genome sequencing and assembly as they are applied to some of the world’s most familiar and charismatic organisms—birds. The contributions to this Special Issue are as eclectic as avian genomics itself, but loosely interrelated by common underpinnings of phylogenetic inference, de novo genome assembly of non-model species, and genome organization and content.
