Successful exome capture and sequencing in lemurs using human baits

2018 ◽  
Author(s):  
Timothy H. Webster ◽  
Elaine E. Guevara ◽  
Richard R. Lawler ◽  
Brenda J. Bradley

ABSTRACT Objectives: We assessed the efficacy of exome capture in lemurs using commercially available human baits. Materials and Methods: We used two human kits (Nimblegen SeqCap EZ Exome Probes v2.0; IDT xGen Exome Research Panel v1.0) to capture and sequence the exomes of wild Verreaux’s sifakas (Propithecus verreauxi, n = 8), a lemur species distantly related to humans. For comparison, we also captured exomes of a primate species more closely related to humans (Macaca mulatta, n = 4). We mapped reads to both the human reference assembly and the most closely related reference for each species before calling variants. We used measures of mapping quality and read coverage to compare capture success. Results: We observed high and comparable mapping qualities for both species when mapped to their respective nearest-relative reference genomes. When investigating breadth of coverage, we found greater capture success in macaques than sifakas using both nearest-relative and human assemblies. Exome capture in sifakas was still highly successful, with more than 90% of annotated coding sequence in the sifaka reference genome captured, and 80% sequenced to a depth greater than 7x using Nimblegen baits. However, this success depended on probe design: the use of IDT probes resulted in substantially less callable sequence at low-to-moderate depths. Discussion: Overall, we demonstrate successful exome capture in lemurs using human baits, though success differed between kits tested. These results indicate that exome capture is an effective and economical genomic method of broad utility to evolutionary primatologists working across the entire primate order.
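The breadth-of-coverage figures quoted above (fraction of coding sequence captured, and fraction sequenced to a depth greater than 7x) can be illustrated with a minimal sketch. It assumes per-base read depths are already available as a list of integers; the function name `breadth_of_coverage` and the example depths are hypothetical and are not taken from the study.

```python
from typing import Iterable


def breadth_of_coverage(depths: Iterable[int], min_depth: int = 1) -> float:
    """Return the fraction of target bases covered by at least `min_depth` reads."""
    depths = list(depths)
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths over ten coding bases (illustrative only).
example_depths = [0, 3, 8, 12, 9, 7, 15, 8, 0, 10]

# Fraction of bases captured at all (depth of at least 1).
captured = breadth_of_coverage(example_depths, min_depth=1)

# Fraction of bases with depth greater than 7x (integer depth of at least 8).
deep = breadth_of_coverage(example_depths, min_depth=8)

print(captured, deep)
```

In practice the depths would come from a coverage tool restricted to annotated coding regions; the thresholds are simply the parameter `min_depth`.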

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nae-Chyun Chen ◽  
Brad Solomon ◽  
Taher Mun ◽  
Sheila Iyer ◽  
Ben Langmead

Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
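Reference bias is commonly summarized at heterozygous sites as the share of overlapping reads that carry the reference allele, with 0.5 expected in the absence of bias. The sketch below illustrates that general idea only; it is not the evaluation code of the paper, and the function names and the `tolerance` threshold are assumptions chosen for illustration.

```python
def ref_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no overlapping reads")
    return ref_reads / total


def is_biased(ref_reads: int, alt_reads: int, tolerance: float = 0.1) -> bool:
    """Flag a site whose allele balance departs from 0.5 by more than `tolerance`.

    The tolerance is a hypothetical cutoff, not a value from the source.
    """
    return abs(ref_allele_fraction(ref_reads, alt_reads) - 0.5) > tolerance


# Hypothetical read counts: 30 reference reads vs 10 alternate reads.
print(ref_allele_fraction(30, 10))
print(is_biased(30, 10))
```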


2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.


2015 ◽  
Author(s):  
Farzana Rahman ◽  
Mehedi Hassan ◽  
Alona Kryshchenko ◽  
Inna Dubchak ◽  
Tatiana V Tatarinova ◽  
...  

In the last decade a number of algorithms and associated software were developed to align next generation sequencing (NGS) reads to relevant reference genomes. The results of these programs may vary significantly, especially when the NGS reads contain mutations not found in the reference genome. Yet there is no standard way to compare these programs and assess their biological relevance. We propose a benchmark to assess the accuracy of short-read mapping based on the pre-computed global alignment of closely related genome sequences. In this paper we outline the method and also present a short report of an experiment performed on five popular alignment tools.
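One way to picture a benchmark built on a pre-computed global alignment is sketched below. Reads with known positions in one genome are mapped to a closely related genome, and the alignment supplies the expected coordinate for each read. This is an illustrative sketch under those assumptions, not the authors' implementation; the names `coordinate_map` and `mapping_accuracy` and the toy alignment are hypothetical.

```python
from typing import Dict, List, Tuple


def coordinate_map(aligned_src: str, aligned_dst: str) -> Dict[int, int]:
    """Map 0-based source positions to destination positions.

    Both arguments are rows of one global alignment, with '-' marking gaps.
    Source bases aligned to a gap have no entry in the returned map.
    """
    if len(aligned_src) != len(aligned_dst):
        raise ValueError("alignment rows must have equal length")
    mapping: Dict[int, int] = {}
    src_pos = dst_pos = 0
    for s, d in zip(aligned_src, aligned_dst):
        if s != "-" and d != "-":
            mapping[src_pos] = dst_pos
        if s != "-":
            src_pos += 1
        if d != "-":
            dst_pos += 1
    return mapping


def mapping_accuracy(
    reads: List[Tuple[int, int]], mapping: Dict[int, int], tolerance: int = 0
) -> float:
    """Fraction of reads placed within `tolerance` bases of the expected position.

    Each read is (true start in source genome, start reported on destination).
    Reads whose true start falls in an unaligned region are not scored.
    """
    scorable = [(t, r) for t, r in reads if t in mapping]
    if not scorable:
        raise ValueError("no reads fall in aligned regions")
    correct = sum(1 for t, r in scorable if abs(mapping[t] - r) <= tolerance)
    return correct / len(scorable)


# Toy alignment: the destination genome has one inserted base.
toy_map = coordinate_map("AC-GT", "ACTGT")
# Three hypothetical reads; the last one is placed one base off.
toy_reads = [(0, 0), (2, 3), (3, 3)]
print(toy_map)
print(mapping_accuracy(toy_reads, toy_map))
```

Running the same scoring over the output of several aligners would give a common yardstick for comparing them, which is the kind of comparison the abstract describes.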


mSystems ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. e00010-19
Author(s):  
Sigal Leviatan ◽  
Eran Segal

ABSTRACT Shotgun sequencing of samples taken from the human microbiome often reveals only partial mapping of the sequenced metagenomic reads to existing reference genomes. Such partial mappability indicates that many genomes are missing in our reference genome set. This is particularly true for non-Western populations and for samples that do not originate from the gut. Pasolli et al. (E. Pasolli, F. Asnicar, S. Manara, M. Zolfo, et al., Cell, 2019, https://doi.org/10.1016/j.cell.2019.01.001) perform a grand effort to expand the reference set, and to better classify its members, revealing a wider pangenome of existing species as well as identifying new species of previously unknown taxonomic branches.


2019 ◽  
Vol 9 (10) ◽  
pp. 3409-3421 ◽  
Author(s):  
Dario I. Ojeda ◽  
Tiina M. Mattila ◽  
Tom Ruttink ◽  
Sonja T. Kujala ◽  
Katri Kärkkäinen ◽  
...  

Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications have not been fully investigated. Here, we assessed three assembly strategies for short-reads data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Gokhan Yavas ◽  
Huixiao Hong ◽  
Wenming Xiao

Abstract Background Accurate de novo genome assembly has become reality with the advancements in sequencing technology. With the ever-increasing number of de novo genome assembly tools, assessing the quality of assemblies has become of great importance in genome research. Although many quality metrics have been proposed and software tools for calculating those metrics have been developed, the existing tools do not produce a unified measure to reflect the overall quality of an assembly. Results To address this issue, we developed the de novo Assembly Quality Evaluation Tool (dnAQET) that generates a unified metric for benchmarking the quality assessment of assemblies. Our framework first calculates individual quality scores for the scaffolds/contigs of an assembly by aligning them to a reference genome. Next, it computes a quality score for the assembly using its overall reference genome coverage, the quality score distribution of its scaffolds and the redundancy identified in it. Using synthetic assemblies randomly generated from the latest human genome build, various builds of the reference genomes for five organisms and six de novo assemblies for sample NA24385, we tested dnAQET to assess its capability for benchmarking quality evaluation of genome assemblies. For synthetic data, our quality score increased with decreasing number of misassemblies and redundancy and increasing average contig length and coverage, as expected. For genome builds, dnAQET quality score calculated for a more recent reference genome was better than the score for an older version. To compare with some of the most frequently used measures, 13 other quality measures were calculated. The quality score from dnAQET was found to be better than all other measures in terms of consistency with the known quality of the reference genomes, indicating that dnAQET is reliable for benchmarking quality assessment of de novo genome assemblies. 
Conclusions The dnAQET is a scalable framework designed to evaluate a de novo genome assembly based on the aggregated quality of its scaffolds (or contigs). Our results demonstrated that the dnAQET quality score is reliable for benchmarking quality assessment of genome assemblies. The dnAQET can help researchers to identify the most suitable assembly tools and to select high quality assemblies generated.


2016 ◽  
Vol 34 (2_suppl) ◽  
pp. 484-484 ◽  
Author(s):  
Gurudatta Naik ◽  
Dongquan Chen ◽  
Michael Crowley ◽  
David Crossman ◽  
Katherine C. Sexton ◽  
...  

484 Background: Molecular alterations and drivers of PSCC, an orphan malignancy, remain unclear. The Cancer Genome Atlas is not studying PSCC and the Catalogue of Somatic Mutations in Cancer has performed targeted analyses only. We report WES of PSCC tumors from a group of patients (pts). Methods: Fresh-frozen macrodissected PSCC tumor tissue and adjacent normal tissue samples were procured from the Cooperative Human Tissue Network. DNA was isolated from tissue sections by phenol-chloroform extraction. Exome capture was performed with the Agilent SureSelect clinical research exome kit and whole exome-seq was done on the Illumina HiSeq2500 with paired end 100bp chemistry. Raw sequence data in Fastq format were aligned to the human reference genome, quantified, and compared using a local instance of Galaxy (galaxy.uabgrid.uab.edu). These data were analyzed for mutations (SNPs) with Partek Genomic Suite/Flow (PGS, Partek, St. Louis, MO), with variant calling against the human reference genome (hg19) as referenced to dbSNP, and for copy number variants (cnv) with the FishingCNV tool together with picard tools/samtools/GATK. We focused on missense mutations and amplifications among ≥ 2 tumor samples but not in normal samples as they may cause upregulation of gene/protein function, which may be therapeutically actionable. Results: PSCC tumors were available from 11 patients and adjacent normal tissue from 3 patients. The 10 most common genes with > 4 missense mutations among ≥ 2 tumor samples overall were the following in decreasing order of frequency: MUC4, HLA-DPA1, MUC16, XIRP2, SSPO, TTN, FCGBP, PABPC3, ALPK2 and MKI67. The top upstream transcriptional regulators were PIH1D3, PRDM5, PTK2, Coup-Tf and NBEAL2. When examining candidate actionable genes, recurrent missense alterations were seen in PIK3C2A and PIK3C2G. Additional analysis will study alterations in functional domains and cnv.
Conclusions: WES identified a relatively high mutation burden in PSCC with recurrent missense mutations in multiple genes, notably including the PI3K gene among potentially actionable genes. Validation of these findings and further study of downstream effects is required.


Author(s):  
Tao Zhou ◽  
Liang Lu ◽  
Chenhong Li

A combination of next-generation sequencing technologies and mate-pair libraries of large insert sizes is used as a standard method to generate genome assemblies with high contiguity. Third-generation sequencing techniques are also used to improve the quality of assembled genomes. However, both mate-pair libraries and third-generation libraries require high-molecular-weight DNA, making the use of these libraries inappropriate for samples with only degraded DNA. An in silico method that generates mate-pair libraries using a reference genome was devised for the task of assembling target genomes. Although the contiguity and completeness of assembled genomes were significantly improved by this method, a high level of errors manifested in the assembly, and the methods for using reference genomes were not optimized. Here, we tested different strategies for using reference genomes to generate in silico mate-pairs. The results showed that using a closely related reference genome from the same genus was more effective than using divergent references. Conservation of in silico mate-pairs by comparing two references and using those to guide genome assembly reduced the number of misassemblies (18.6% – 46.1%) and increased the contiguity of assembled genomes (9.7% – 70.7%), while maintaining gene completeness at a level that was either similar or marginally lower than that obtained via the current method. Finally, we compared the optimized method with another reference-guided assembler, RaGOO. We found that RaGOO produced longer scaffolds (17.8 Mbp vs 3.0 Mbp), but resulted in a much higher misassembly rate (85.68%) than our optimized in silico mate-pair method.


2016 ◽  
Author(s):  
Afif Elghraoui ◽  
Samuel J Modlin ◽  
Faramarz Valafar

Abstract The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of its virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analysis, however, relies heavily on the accuracy of the sequences. While the H37Rv reference genome has had several corrections to date, that of H37Ra is unmodified since its original publication. Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals that the number of H37Ra-specific variants is less than half of what the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly-used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. We discuss how our results change the picture of virulence attenuation and the power of SMRT sequencing for producing high-quality reference genomes.


Author(s):  
Nae-Chyun Chen ◽  
Brad Solomon ◽  
Taher Mun ◽  
Sheila Iyer ◽  
Ben Langmead

Abstract Most sequencing data analyses start by aligning sequencing reads to a linear reference genome. But failure to account for genetic variation causes reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the “reference flow” alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance, but with 14% of the memory footprint and 5.5 times the speed.

