Exon capture optimization in large-genome amphibians

2015 ◽  
Author(s):  
Evan McCartney-Melstad ◽  
Genevieve G. Mount ◽  
H. Bradley Shaffer

Background: Gathering genomic-scale data efficiently is challenging for non-model species with large, complex genomes. Transcriptome sequencing is accessible for even large-genome organisms, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in large-genome non-model organisms. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for large-genome amphibians.
Methods: We enriched 53 genomic libraries from salamanders for a custom set of 8,706 exons under differing conditions. Libraries were prepared using pools of DNA from 3 different salamanders with approximately 30 gigabase genomes: California tiger salamander (Ambystoma californiense), barred tiger salamander (Ambystoma mavortium), and an F1 hybrid between the two. We enriched libraries using different amounts of c0t-1 blocker, individual input DNA, and total reaction DNA. Enriched libraries were sequenced with 150 bp paired-end reads on an Illumina HiSeq 2500, and the efficiency of target enrichment was quantified using unique read mapping rates and average depth across targets. The different enrichment treatments were evaluated to determine if c0t-1 and input DNA significantly impact enrichment efficiency in large-genome amphibians.
Results: Increasing the amounts of c0t-1 and individual input DNA both reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9%. We also found that post-enrichment DNA concentrations and qPCR enrichment verification were useful for predicting the success of enrichment.
Conclusions: Increasing the amount of individual sample input DNA and the amount of c0t-1 blocker both increased the efficiency of target capture in large-genome salamanders. By reducing PCR duplication rates, the number of unique reads mapping to targets increased, making target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen large-genome vertebrate taxa including amphibians.

2020 ◽  
Author(s):  
Brendan N. Reid ◽  
Rachel L. Moran ◽  
Christopher J. Kopack ◽  
Sarah W. Fitzpatrick

Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.


2018 ◽  
Author(s):  
Tobias Andermann ◽  
Angela Cano ◽  
Alexander Zizka ◽  
Christine Bacon ◽  
Alexandre Antonelli

Evolutionary biology has entered an era of unprecedented amounts of DNA sequence data, as new sequencing platforms such as Massive Parallel Sequencing (MPS) can generate billions of nucleotides within less than a day. The current bottleneck is how to efficiently handle, process, and analyze such large amounts of data in an automated and reproducible way. To tackle these challenges we introduce the Sequence Capture Processor (SECAPR) pipeline for processing raw sequencing data into multiple sequence alignments for downstream phylogenetic and phylogeographic analyses. SECAPR is user-friendly and we provide an exhaustive tutorial intended for users with no prior experience with analyzing MPS output. SECAPR is particularly useful for the processing of sequence capture (= hybrid enrichment) datasets for non-model organisms, as we demonstrate using an empirical dataset of the palm genus Geonoma (Arecaceae). Various quality control and plotting functions help the user to decide on the most suitable settings for even challenging datasets. SECAPR is an easy-to-use, free, and versatile pipeline, aimed to enable efficient and reproducible processing of MPS data for many samples in parallel.


2019 ◽  
Vol 15 ◽  
pp. 117693431987479 ◽  
Author(s):  
Hao Yuan ◽  
Calder Atta ◽  
Luke Tornabene ◽  
Chenhong Li

Exon capture across species has been one of the most broadly applied approaches to acquire multi-locus data in phylogenomic studies of non-model organisms. Methods for assembling loci from short-read sequences (e.g., Illumina platforms) that rely on mapping reads to a reference genome may not be suitable for studies comprising species across a wide phylogenetic spectrum; thus, de novo assembly methods are more generally applied. Current approaches for assembling targeted exons from short reads are not particularly optimized as they cannot (1) assemble loci with low read depth, (2) handle large files efficiently, and (3) reliably address issues with paralogs. Thus, we present Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads. We tested our method using reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb), which were captured using baits designed from the genome sequences of Lepisosteus oculatus and Oreochromis niloticus, respectively. We compared the performance of Assexon to PHYLUCE and HybPiper, which are commonly used pipelines to assemble ultra-conserved element (UCE) and Hyb-seq data. A custom exon capture analysis pipeline (CP) developed by Yuan et al. was compared as well. Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence. Assexon ran at least twice as fast as PHYLUCE and HybPiper. The number of loci assembled using CP was comparable with Assexon in both tests, while Assexon ran at least 7 times faster than CP. In addition, some steps of CP require the user's interaction and are not fully automated, and this user time was not counted in our calculation. Both Assexon and CP retrieved no paralogs in the testing runs, but PHYLUCE and HybPiper did.
In conclusion, Assexon is a tool for accurate and efficient assembly of large read sets from exon capture experiments. Furthermore, Assexon includes scripts to filter poorly aligned coding regions and flanking regions, calculate summary statistics of loci, and select loci with reliable phylogenetic signal. Assexon is available at https://github.com/yhadevol/Assexon.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Anita Mann ◽  
Naresh Kumar ◽  
Ashwani Kumar ◽  
Charu Lata ◽  
Arvind Kumar ◽  
...  

Soil salinity is one of the major limiting factors for crop productivity across the world. Halophytes have recently attracted attention as a means of exploring survival and tolerance mechanisms under extreme saline conditions. Urochondra setulosa is an obligate grass halophyte that can survive in up to 1000 mM NaCl. The de novo transcriptome of Urochondra leaves at different salt concentrations of 300–500 mM NaCl was generated on Illumina HiSeq. Approximately 352.78 million high quality reads with an average contig length of 1259 bp were assembled de novo. A total of 120,231 unigenes were identified. On average, 65% of unigenes were functionally annotated to known proteins. Approximately 35% of unigenes were specific to Urochondra. Differential expression revealed significant enrichment (P < 0.05) of transcription factors, transporters and metabolites, suggesting the transcriptional regulation of ion homeostasis and signalling at high salt concentrations in this grass. Also, about 143 unigenes were biologically related to salt stress responsive genes. Randomly selected genes of important pathways were validated for functional characterization. This study provides useful information to understand gene regulation at extremely saline levels. The study offers the first comprehensive evaluation of the Urochondra setulosa leaf transcriptome. Examining non-model organisms that can survive in harsh environments can provide novel insights into stress coping mechanisms, which can be useful for developing improved agricultural crops.


2018 ◽  
Author(s):  
Tobias Andermann ◽  
Angela Cano ◽  
Alexander Zizka ◽  
Christine Bacon ◽  
Alexandre Antonelli

Evolutionary biology has entered an era of unprecedented amounts of DNA sequence data, as new sequencing platforms such as Massive Parallel Sequencing (MPS) can generate billions of nucleotides within less than a day. The current bottleneck is how to efficiently handle, process, and analyze such large amounts of data in an automated and reproducible way. To tackle these challenges we introduce the Sequence Capture Processor (SECAPR) pipeline for processing raw sequencing data into multiple sequence alignments for downstream phylogenetic and phylogeographic analyses. SECAPR is user-friendly and we provide an exhaustive empirical data tutorial intended for users with no prior experience with analyzing MPS output. SECAPR is particularly useful for the processing of sequence capture (synonyms: target or hybrid enrichment) datasets for non-model organisms, as we demonstrate using an empirical sequence capture dataset of the palm genus Geonoma (Arecaceae). Various quality control and plotting functions help the user to decide on the most suitable settings for even challenging datasets. SECAPR is an easy-to-use, free, and versatile pipeline, aimed to enable efficient and reproducible processing of MPS data for many samples in parallel.


2015 ◽  
Author(s):  
Daniel Portik ◽  
Lydia Smith ◽  
Ke Bi

Custom sequence capture experiments are becoming an efficient approach for gathering large sets of orthologous markers with targeted levels of informativeness in non-model organisms. Transcriptome-based exon capture utilizes transcript sequences to design capture probes, often with the aid of a reference genome to identify intron-exon boundaries and exclude shorter exons (< 200 bp). Here, we test an alternative approach that directly uses transcript sequences for probe design, which are often composed of multiple exons of varying lengths. Based on a selection of 1,260 orthologous transcripts, we conducted sequence captures across multiple phylogenetic scales for frogs, including species up to ~100 million years divergent from the focal group. After several conservative filtering steps, we recovered a large phylogenomic data set consisting of sequence alignments for 1,047 of the 1,260 transcriptome-based loci (~630,000 bp) and a large quantity of highly variable regions flanking the exons in transcripts (~70,000 bp). We recovered high numbers of both shorter (< 100 bp) and longer exons (> 200 bp), with no major reduction in coverage towards the ends of exons. We observed significant differences in the performance of blocking oligos for target enrichment and non-target depletion during captures, and observed differences in PCR duplication rates that can be attributed to the number of individuals pooled for capture reactions. We explicitly tested the effects of phylogenetic distance on capture sensitivity, specificity, and missing data, and provide a baseline estimate of expectations for these metrics based on nuclear pairwise differences among samples. We provide recommendations for transcriptome-based exon capture design based on our results, and describe multiple pipelines for data assembly and analysis.


Author(s):  
Tobias Andermann ◽  
Maria Fernanda Torres Jimenez ◽  
Pável Matos-Maraví ◽  
Romina Batista ◽  
José L Blanco-Pastor ◽  
...  

High-throughput DNA sequencing techniques enable time- and cost-effective sequencing of large portions of the genome. Instead of sequencing and annotating whole genomes, many phylogenetic studies focus sequencing efforts on large sets of pre-selected loci, which further reduces costs and bioinformatic challenges while increasing sequencing depth. One common approach that enriches loci before sequencing is often referred to as target sequence capture. This technique has been shown to be applicable to phylogenetic studies of greatly varying evolutionary depth and has proven to produce powerful, large multi-locus DNA sequence datasets of selected loci, suitable for phylogenetic analyses. However, target capture requires careful theoretical and practical considerations, which will greatly affect the success of the experiment. Here we provide an easy-to-follow flowchart for adequately designing phylogenomic target capture experiments, and we discuss necessary considerations and decisions from the first steps in the lab to the final bioinformatic processing of the sequence data. We particularly discuss issues and challenges related to the taxonomic scope, sample quality, and available genomic resources of target capture projects and how these issues affect all steps from bait design to the bioinformatic processing of the data. Altogether this review outlines a roadmap for future target capture experiments and is intended to assist researchers with making informed decisions for designing and carrying out successful phylogenetic target capture studies.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10864
Author(s):  
Dongfang Zhao ◽  
Chunchun Zheng ◽  
Fengming Shi ◽  
Yabei Xu ◽  
Shixiang Zong ◽  
...  

Pine beetles are well known in North America for their widespread devastation of pine forests. However, Dendroctonus valens LeConte is also an important invasive forest pest in China. Adults and larvae of this bark beetle mainly overwinter in the trunks and roots of Pinus tabuliformis and Pinus sylvestris; larvae, in particular, cause pine weakening or even death. Since the species was introduced from the United States to Shanxi in 1998, its distribution has spread northward. In 2017, it invaded a large area at the junction of Liaoning, Inner Mongolia and Hebei provinces, showing strong cold tolerance. To identify genes relevant to cold tolerance and the process of overwintering, we sequenced the transcriptomes of wintering and non-wintering adult and larval D. valens using the Illumina HiSeq platform. Differential expression analysis methods for other non-model organisms were used to compare transcript abundances in adults and larvae at two time periods, followed by the identification of functions and metabolic pathways related to genes associated with cold tolerance. We detected 4,387 and 6,091 differentially expressed genes (DEGs) between sampling dates in larvae and adults, respectively, and 1,140 common DEGs, including genes encoding protein phosphatase, very long-chain fatty acids protein, cytochrome P450, and putative leucine-rich repeat-containing proteins. In a Gene Ontology (GO) enrichment analysis, 1,140 genes were assigned to 44 terms, with significant enrichment for cellulase activity, hydrolase activity, and carbohydrate metabolism. Kyoto Encyclopedia of Genes and Genomes (KEGG) classification and enrichment analyses showed that the lysosomal and purine metabolism pathways involved the most DEGs; the highly enriched terms included autophagy - animal, pentose and glucuronate interconversions, and lysosomal processes.
We identified 140 candidate genes associated with cold tolerance, including genes with established roles in this trait (e.g., genes encoding trehalose transporter, fructose-1,6-bisphosphatase, and trehalase). Our comparative transcriptome analysis of adult and larval D. valens in different conditions provides basic data for the discovery of key genes and molecular mechanisms underlying cold tolerance.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jian-ye Chen ◽  
Fang-fang Xie ◽  
Yan-ze Cui ◽  
Can-bin Chen ◽  
Wang-jin Lu ◽  
...  

Pitaya (Hylocereus) is the most economically important fleshy-fruited tree of the Cactaceae family that is grown worldwide, and it has attracted significant attention because of its betalain-abundant fruits. Nonetheless, the lack of a pitaya reference genome significantly hinders studies focused on its evolution, as well as the potential for genetic improvement of this crop. Herein, we employed various sequencing approaches, namely, PacBio-SMRT, Illumina HiSeq paired-end, 10× Genomics, and Hi-C (high-throughput chromosome conformation capture) to provide a chromosome-level genomic assembly of ‘GHB’ pitaya (H. undatus, 2n = 2x = 22 chromosomes). The size of the assembled pitaya genome was 1.41 Gb, with a scaffold N50 of ~127.15 Mb. In total, 27,753 protein-coding genes and 896.31 Mb of repetitive sequences in the H. undatus genome were annotated. Pitaya has undergone a WGT (whole-genome triplication), and a recent WGD (whole-genome duplication) occurred after the gamma event, which is common to the other species in Cactaceae. A total of 29,328 intact LTR-RTs (~696.45 Mb) were obtained in H. undatus, of which two significantly expanded lineages, Ty1/copia and Ty3/gypsy, were the main drivers of the expanded genome. A high-density genetic map of F1 hybrid populations of ‘GHB’ × ‘Dahong’ pitayas (H. monacanthus) and their parents was constructed, and a total of 20,872 bin markers were identified (56,380 SNPs) for 11 linkage groups. More importantly, through transcriptomic and WGCNA (weighted gene coexpression network analysis), a global view of the gene regulatory network, including structural genes and the transcription factors involved in pitaya fruit betalain biosynthesis, was presented. Our data present a valuable resource for facilitating molecular breeding programs of pitaya and shed novel light on its genomic evolution, as well as the modulation of betalain biosynthesis in edible fruits.


2021 ◽  
Author(s):  
Jérôme Delroisse ◽  
Marie Bonneel ◽  
Mélanie Demeuldre ◽  
Igor Eeckhaut ◽  
Patrick Flammang

In non-model organisms, Next Generation Sequencing (NGS) technology improves our ability to analyze gene expression and identify new genes or transcripts of interest. In this research, paired-end Illumina HiSeq sequencing was used to describe a composite transcriptome based on two libraries generated from dorsal and ventral integuments of the European sea cucumber Holothuria forskali (Holothuroidea, Echinodermata). A total of 43,044,977 HQ reads were initially generated. After de novo assembly, a total of 111,194 unigenes were predicted. Of all predicted unigenes, 32,569 show significant matches with genes/proteins present in the reference databases. Around 50% of annotated unigenes were significantly similar to sequences from the purple sea urchin Strongylocentrotus purpuratus genome. Annotation analyses were performed on predicted unigenes using public reference databases. These RNA-seq data provide an interesting resource for researchers with a broad interest in sea cucumber biology.

