What’s all the hype about HybSeq? A brief history and introduction to target enrichment in Compositae

Abstract

Background: COVID-19 had spread quickly, causing an international public health emergency with an alarming global shortage of COVID-19 diagnostic tests. We developed and clinically validated a next-generation sequencing (NGS)-based target enrichment assay with the COVID-DX Software tailored for the detection, characterization, and surveillance of the SARS-CoV-2 viral genome.

Methods: The SARS-CoV-2 NGS assay consists of components including library preparation, target enrichment, sequencing, and a COVID-DX Software analysis tool. The NGS library preparation starts with extracted RNA from nasopharyngeal (NP) swabs followed by cDNA synthesis and conversion to Illumina TruSeq-compatible libraries using the Twist Library Preparation Kit via Enzymatic Fragmentation and Unique Dual Indices (UDI). The library is then enriched for SARS-CoV-2 sequences using a panel of dsDNA biotin-labeled probes, specifically designed to target the SARS-CoV-2 genome, then sequenced on an Illumina NextSeq 550 platform. The COVID-DX Software analyzes sequence results and provides a clinically oriented report, including the presence/absence of SARS-CoV-2 for diagnostic use. An additional research use only report describes the assay performance, estimated viral titer, coverage across the viral genome, genetic variants, and phylogenetic analysis.

Results: The SARS-CoV-2 NGS Assay was validated on 30 positive and 30 negative clinical samples. To measure the sensitivity and specificity of the assay, the positive and negative percent agreement (PPA, NPA) was defined in comparison to an orthogonal EUA RT-PCR assay (PPA [95% CI]: 96.77% [90.56%-100%] and NPA [95% CI]: 100% [100%-100%]). Data reported using our assay defined the limit of detection to be 40 copies/ml using heat-inactivated SARS-CoV-2 viral genome in clinical matrices. In-silico analysis provided >99.9% coverage across the SARS-CoV-2 viral genome and no cross-reactivity with evolutionarily similar respiratory pathogens.

Conclusion: The SARS-CoV-2 NGS Assay powered by the COVID-DX Software can be used to detect the SARS-CoV-2 virus and provide additional insight into viral titer and genetic variants to track transmission, stratify risk, predict outcome and therapeutic response, and control the spread of infectious disease.

Disclosures: Dorottya Nagy-Szakal, MD PhD, Biotia (Employee); Mara Couto-Rodriguez, MS, Biotia (Employee); Joseph Barrows, MS, Biotia, Inc. (Employee, Shareholder); Heather L. Wells, MPH, Biotia (Consultant); Marilyne Debieu, PhD, Biotia (Employee); Courteny Hager, BS, Biotia (Employee); Kristin Butcher, MS, Twist Bioscience (Employee); Siyuan Chen, PhD, Twist Bioscience (Employee); Christopher Mason, PhD, Biotia (Board Member, Employee, Shareholder); Niamh B. O’Hara, PhD, Biotia (Board Member, Employee, Shareholder), Twist (Other Financial or Material Support, I am CEO of Biotia and Biotia has business partnership with Twist)
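The percent-agreement figures in the Results can be illustrated with a short calculation. The sketch below is not taken from the study: the counts are hypothetical, and the use of a Wald (normal-approximation) interval clipped to [0, 1] is an assumption about how such an interval might be computed. The function name `agreement_with_wald_ci` is invented for illustration. As a matter of arithmetic, 30 concordant results out of 31 comparator positives gives 96.77%, but the abstract does not state the underlying counts.

```python
import math


def agreement_with_wald_ci(agree: int, total: int, z: float = 1.96):
    """Return (proportion, lower, upper) for a percent-agreement estimate.

    Uses a Wald (normal-approximation) interval, clipped to [0, 1].
    This interval choice is an assumption, not the study's stated method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = agree / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts, chosen only to illustrate the arithmetic.
ppa, ppa_lo, ppa_hi = agreement_with_wald_ci(agree=30, total=31)
npa, npa_lo, npa_hi = agreement_with_wald_ci(agree=30, total=30)

print(f"PPA {ppa:.2%} [{ppa_lo:.2%}-{ppa_hi:.2%}]")
print(f"NPA {npa:.2%} [{npa_lo:.2%}-{npa_hi:.2%}]")
```

A Wald interval collapses to zero width when agreement is 100%, which is one reason Wilson or exact (Clopper-Pearson) intervals are often preferred for small validation sets.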


SAUTE: sequence assembly using target enrichment

BMC Bioinformatics ◽

10.1186/s12859-021-04174-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Alexandre Souvorov ◽

Richa Agarwala

Keyword(s):

De Bruijn Graph ◽

Challenging Problem ◽

Target Enrichment ◽

RNA-Seq ◽

Sequencing Technology ◽

Insert Size ◽

Systematic Biases ◽

Target Sequences ◽

Genomic Regions ◽

Higher Sensitivity

Abstract

Background: Illumina is the dominant sequencing technology at this time. Short length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well-supported variants for assembled regions.

Results: To facilitate assembly of repeat regions and to report multiple well-supported variants when a user can provide target sequences to assist the assembly, we propose the SAUTE and SAUTE_PROT assemblers. Both assemblers use a de Bruijn graph on reads. Targets can be transcripts or proteins for RNA-seq reads, and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide sequences for SAUTE and protein sequences for SAUTE_PROT.

Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision.
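For readers unfamiliar with de Bruijn graphs on reads, the following is a minimal sketch of the general idea only. It is not SAUTE's implementation: the function names, the greedy extension rule, the `min_count` threshold, and the toy reads are all assumptions made for illustration. Nodes are (k-1)-mers, edges are k-mers weighted by how often they occur in the reads, and a seed taken from a (hypothetical) target sequence is extended while exactly one well-supported edge leaves the current node.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with k-mer counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_from_seed(graph, seed, min_count=2, max_steps=1000):
    """Greedily extend a seed while exactly one well-supported edge exists."""
    contig = seed
    node = seed
    seen = {seed}
    for _ in range(max_steps):
        successors = [
            nxt for nxt, count in graph.get(node, {}).items()
            if count >= min_count
        ]
        # Stop at dead ends, branches (possible variants or repeats), or cycles.
        if len(successors) != 1 or successors[0] in seen:
            break
        node = successors[0]
        seen.add(node)
        contig += node[-1]
    return contig


# Toy reads: three overlapping reads seen twice each, plus one read with
# a low-support k-mer path (e.g. a sequencing error or contamination).
reads = ["ATGGCG", "GGCGTG", "CGTGCA"] * 2 + ["ATGGAG"]
graph = build_de_bruijn(reads, k=4)
contig = extend_from_seed(graph, seed="ATG")
print(contig)
```

In this toy input the singleton path through `TGGA` falls below `min_count` and is ignored, so the extension follows the well-supported path. A real assembler would also have to report alternative paths at branches rather than stopping.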


The best of both worlds: Combining lineage‐specific and universal bait sets in target‐enrichment hybridization reactions

Applications in Plant Sciences ◽

10.1002/aps3.11438 ◽

2021 ◽

Author(s):

Kasper P. Hendriks ◽

Terezie Mandáková ◽

Nikolai M. Hay ◽

Elfy Ly ◽

Alex Hooft van Huysduynen ◽

...

Keyword(s):

Target Enrichment


A target enrichment probe set for resolving the flagellate land plant tree of life

Applications in Plant Sciences ◽

10.1002/aps3.11406 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Jesse W. Breinholt ◽

Sarah B. Carey ◽

George P. Tiley ◽

E. Christine Davis ◽

Lorena Endara ◽

...

Keyword(s):

Land Plant ◽

Tree Of Life ◽

Target Enrichment ◽

Probe Set


Universal human papillomavirus typing by whole genome sequencing following target enrichment: evaluation of assay reproducibility and limit of detection

BMC Genomics ◽

10.1186/s12864-019-5598-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Tengguo Li ◽

Elizabeth R. Unger ◽

Mangalathu S. Rajeevan

Keyword(s):

Human Papillomavirus ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Limit Of Detection ◽

Whole Genome ◽

Target Enrichment


Accurate detection of subclonal single nucleotide variants in whole genome amplified and pooled cancer samples using HaloPlex target enrichment

BMC Genomics ◽

10.1186/1471-2164-14-856 ◽

2013 ◽

Vol 14 (1) ◽

pp. 856 ◽

Cited By ~ 15

Author(s):

Eva C Berglund ◽

Carl Lindqvist ◽

Shahina Hayat ◽

Elin Övernäs ◽

Niklas Henriksson ◽

...

Keyword(s):

Whole Genome ◽

Target Enrichment ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Accurate Detection


Low-Bias RNA Sequencing of the HIV-2 Genome from Blood Plasma

Journal of Virology ◽

10.1128/jvi.00677-18 ◽

2018 ◽

Vol 93 (1) ◽

Cited By ~ 3

Author(s):

Katherine L. James ◽

Thushan I. de Silva ◽

Katherine Brown ◽

Hilton Whittle ◽

Stephen Taylor ◽

...

Keyword(s):

Genetic Diversity ◽

Blood Plasma ◽

De Novo ◽

A Priori ◽

PCR Amplification ◽

HIV Vaccine ◽

Whole Genome ◽

Plasma Samples ◽

Target Enrichment ◽

RNA-Seq

ABSTRACT

Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine: in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target enrichment free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals.

IMPORTANCE

An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies, which are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge is needed to inform robust primer design when employing PCR amplification, a factor that is often lacking when working with tropical diseases localized in developing countries. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.
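The "mean depth of coverage" figure reported in the abstract can be illustrated with a small calculation. This is a generic sketch, not the authors' pipeline: the function name `depth_profile`, the half-open interval convention, and the toy alignments are assumptions. Each remapped read is reduced to a (start, end) interval on the assembled genome, per-base depth is accumulated with a difference array, and the mean is the total aligned bases divided by the genome length.

```python
def depth_profile(alignments, genome_length):
    """Per-base depth from half-open (start, end) read intervals."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        # Clip intervals to the genome; skip reads that fall outside it.
        start = max(0, start)
        end = min(genome_length, end)
        if start >= end:
            continue
        delta[start] += 1
        delta[end] -= 1
    depths = []
    running = 0
    for pos in range(genome_length):
        running += delta[pos]
        depths.append(running)
    return depths


# Toy example: three reads on a 10-base genome (values are illustrative).
depths = depth_profile([(0, 5), (3, 10), (4, 8)], genome_length=10)
mean_depth = sum(depths) / len(depths)
print(depths, mean_depth)
```

Keeping the per-base profile, rather than only the mean, makes it possible to spot regions with little or no coverage that a single average would hide.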


A New Paralog Removal Pipeline Resolves Conflict between RAD-seq and Enrichment

10.1101/2020.10.26.355248 ◽

2020 ◽

Author(s):

Wenbin Zhou ◽

John Soghigian ◽

Qiu-yun (Jenny) Xiang

Keyword(s):

High Throughput Sequencing ◽

Sequence Similarity ◽

Phylogenetic Analyses ◽

Disjunct Distribution ◽

Divergence Times ◽

Target Enrichment ◽

Sequencing Technologies ◽

Duplication Events ◽

The Witch ◽

Phylogenomic Analyses

ABSTRACT

Target enrichment and RAD-seq are well-established high throughput sequencing technologies that have been increasingly used for phylogenomic studies, and the choice between methods is a practical issue for plant systematists studying the evolutionary histories of biodiversity of relatively recent origins. However, few studies have compared the congruence and conflict between results from the two methods within the same group of organisms, especially in plants, where extensive genome duplication events may complicate phylogenomic analyses. Unfortunately, currently widely used pipelines for target enrichment data analysis do not have a vigorous procedure for removing paralogs from Hyb-Seq data. In this study, we employed RAD-seq and Hyb-Seq of Angiosperm 353 genes in phylogenomic and biogeographic studies of Hamamelis (the witch-hazels) and Castanea (chestnuts), two classic examples exhibiting the well-known eastern Asian-eastern North American disjunct distribution. We compared these two methods side by side and developed a new pipeline (PPD) with a more vigorous removal of putative paralogs from Hyb-Seq data. The new pipeline considers both sequence similarity and heterozygous sites at each locus in the identification of paralogs. We used our pipeline to construct robust datasets for comparison between methods and downstream analyses on the two genera. Our results demonstrated that PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times showed significant differences between data from HybPiper and data from our new PPD pipeline, likely due to the error signals from the paralogous genes undetected by HybPiper but trimmed by PPD. We found that phylogenies and divergence times estimated from our RAD-seq and Hyb-Seq-PPD data were largely congruent. We highlight the importance of removing paralogs from enrichment data, and discuss the merits of RAD-seq and Hyb-Seq. Finally, phylogenetic analyses of RAD-seq and Hyb-Seq resulted in well-resolved species relationships, and revealed ancient introgression in both genera. Biogeographic analyses including fossil data revealed a complicated history of each genus involving multiple intercontinental dispersals and local extinctions in areas outside of the taxa’s modern ranges in both the Paleogene and Neogene. Our study demonstrates the value of additional steps for filtering paralogous gene content from Angiosperm 353 data, such as our new PPD pipeline described in this study. [RAD-seq, Hyb-Seq, paralogs, Castanea, Hamamelis, eastern Asia-eastern North America disjunction, biogeography, ancient introgression]
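The idea of using heterozygous sites as a signal of paralogy can be sketched in a few lines. This is not the PPD pipeline and covers only the heterozygosity half of what the abstract describes (the sequence-similarity criterion is omitted). The function names, the two thresholds, and the toy sequences are all hypothetical. The sketch assumes consensus sequences in which heterozygous positions are written as two-base IUPAC ambiguity codes, and flags a locus when most samples show an unusually high fraction of such positions.

```python
# Two-base IUPAC ambiguity codes, used here as a proxy for heterozygous sites.
AMBIGUITY_CODES = set("RYSWKM")


def heterozygosity(seq):
    """Fraction of called positions that carry a two-base ambiguity code."""
    called = [base for base in seq.upper() if base not in "N-?"]
    if not called:
        return 0.0
    return sum(base in AMBIGUITY_CODES for base in called) / len(called)


def flag_putative_paralogs(loci, max_het=0.05, max_sample_fraction=0.5):
    """Return loci where more than max_sample_fraction of samples exceed max_het.

    Both thresholds are illustrative defaults, not values from the study.
    """
    flagged = []
    for locus, sequences in loci.items():
        if not sequences:
            continue
        n_high = sum(heterozygosity(s) > max_het for s in sequences.values())
        if n_high / len(sequences) > max_sample_fraction:
            flagged.append(locus)
    return flagged


# Toy data: locus_B has ambiguity codes in every sample, locus_A has none.
loci = {
    "locus_A": {"sample1": "ACGTACGTAC", "sample2": "ACGTACGTAC"},
    "locus_B": {"sample1": "ACRTAYGTAC", "sample2": "ACRTAYGKAC"},
}
flagged = flag_putative_paralogs(loci)
print(flagged)
```

The reasoning behind this kind of filter is that reads from two diverged gene copies, when assembled into one consensus, tend to produce many apparent heterozygous sites shared across samples.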


Comparison of target enrichment strategies for ancient pathogen DNA

10.1101/2020.07.09.195065 ◽

2020 ◽

Author(s):

Anja Furtwängler ◽

Judith Neukamm ◽

Lisa Böhme ◽

Ella Reiter ◽

Melanie Vollstedt ◽

...

Keyword(s):

Ancient DNA ◽

Treponema Pallidum ◽

Mycobacterium Leprae ◽

Target Enrichment ◽

Hybridization Capture ◽

Research Outcomes ◽

Pathogen DNA ◽

Different Characteristics ◽

RNA and DNA ◽

Better Than

Abstract

In ancient DNA research, the degraded nature of the samples generally results in poor yields of highly fragmented DNA, and targeted DNA enrichment is thus required to maximize research outcomes. The three commonly used methods – (1) array-based hybridization capture and in-solution capture using either (2) RNA or (3) DNA baits – have different characteristics that may influence the capture efficiency, specificity, and reproducibility. Here, we compared their performance in enriching pathogen DNA of Mycobacterium leprae and Treponema pallidum of 11 ancient and 19 modern samples. We find that in-solution approaches are the most effective method in ancient and modern samples of both pathogens, and RNA baits usually perform better than DNA baits.

Method summary

We compared three targeted DNA enrichment strategies used in ancient DNA research for the specific enrichment of pathogen DNA regarding their efficiency, specificity, and reproducibility for ancient and modern Mycobacterium leprae and Treponema pallidum samples. Array-based capture and in-solution capture with RNA and DNA baits were all tested in three independent replicates.


Species delimitation and phylogenetic reconstruction of the sinipercids (Perciformes: Sinipercidae) based on target enrichment of thousands of nuclear coding sequences

Molecular Phylogenetics and Evolution ◽

10.1016/j.ympev.2017.03.014 ◽

2017 ◽

Vol 111 ◽

pp. 44-55 ◽

Cited By ~ 16

Author(s):

Shuli Song ◽

Jinliang Zhao ◽

Chenhong Li

Keyword(s):

Species Delimitation ◽

Phylogenetic Reconstruction ◽

Target Enrichment ◽

Coding Sequences
