Identification of community-consensus clinically relevant variants and development of single molecule molecular inversion probes using the CIViC database

Abstract Clinical targeted sequencing panels are important for identifying actionable variants in patients with cancer; however, there are currently no strategies for creating impartial, rationally designed panels that can accommodate the rapidly growing knowledge within the field. Here we use the Clinical Interpretations of Variants in Cancer database (CIViC) in conjunction with single-molecule molecular inversion probe (smMIP) capture to identify clinically relevant variants in cancer and design probes that target them. In total, 2,027 smMIPs were designed to target 111 eligible CIViC variants, and the total genomic region covered by the CIViC smMIP reagent was 61.5 kb. When compared with existing genome or exome sequencing results (n = 27 cancer samples from 5 tumor types), CIViC smMIP sequencing demonstrated 95% sensitivity for variant detection (n = 61/64 variants). Variant allele frequencies for variants identified on both sequencing platforms were highly concordant (Pearson correlation = 0.885; n = 61 variants). Moreover, for individuals with paired tumor/normal samples (n = 12), CIViC smMIP sequencing discovered 182 clinically relevant variants that the original sequencing had missed. This design paradigm demonstrates the utility of an open-source database built on attendant community contributions, with peer-reviewed interpretations for each variant. Use of a public repository for variant identification, probe development, and variant annotation could provide a transparent approach to building a dynamic next-generation sequencing–based oncology panel.
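
Both headline metrics in this abstract are simple to reproduce. Sensitivity is detected / expected (61/64 ≈ 0.953, reported as 95%), and VAF concordance is Pearson's r over the variants found on both platforms. The standard-library sketch below illustrates the arithmetic; the paired VAF values are invented placeholders, not the study's measurements.

```python
import math


def sensitivity(detected: int, expected: int) -> float:
    """Fraction of orthogonally confirmed variants that the smMIP panel also detected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return detected / expected


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Counts reported in the abstract: 61 of 64 variants detected.
print(f"sensitivity = {sensitivity(61, 64):.3f}")  # sensitivity = 0.953

# Hypothetical paired VAFs (orthogonal sequencing vs. smMIP); NOT the study's data.
orthogonal_vaf = [0.12, 0.35, 0.48, 0.05, 0.22]
smmip_vaf = [0.10, 0.38, 0.45, 0.07, 0.25]
print(f"Pearson r = {pearson_r(orthogonal_vaf, smmip_vaf):.3f}")
```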

Open-Sourced CIViC Annotation Pipeline to Identify and Annotate Clinically Relevant Variants Using Single-Molecule Molecular Inversion Probes

JCO Clinical Cancer Informatics · 10.1200/cci.19.00077 · 2019 · pp. 1-12

Author(s): Erica K. Barnell, Adam Waalkes, Matt C. Mosior, Kelsi Penewit, Kelsy C. Cotto, ...

Keyword(s): Single Molecule, Annotation Pipeline, Patients With Cancer, Open Access Database, Design Paradigm, Molecular Inversion Probes, Sequencing Platforms, Tumor Types, Molecular Inversion Probe, Actionable Variants

PURPOSE Clinical targeted sequencing panels are important for identifying actionable variants for patients with cancer; however, existing approaches do not provide transparent and rationally designed clinical panels to accommodate the rapidly growing knowledge within oncology. MATERIALS AND METHODS We used the Clinical Interpretations of Variants in Cancer (CIViC) database to develop an Open-Sourced CIViC Annotation Pipeline (OpenCAP). OpenCAP provides methods to identify variants within the CIViC database, build probes for variant capture, use probes on prospective samples, and link somatic variants to CIViC clinical relevance statements. OpenCAP was tested using a single-molecule molecular inversion probe (smMIP) capture design on 27 cancer samples from 5 tumor types. In total, 2,027 smMIPs were designed to target 111 eligible CIViC variants (61.5 kb of genomic space). RESULTS When compared with orthogonal sequencing, CIViC smMIP sequencing demonstrated a 95% sensitivity for variant detection (n = 61 of 64 variants). Variant allele frequencies for variants identified on both sequencing platforms were highly concordant (Pearson’s r = 0.885; n = 61 variants). Moreover, for individuals with paired tumor and normal samples (n = 12), 182 clinically relevant variants missed by orthogonal sequencing were discovered by CIViC smMIP sequencing. CONCLUSION The OpenCAP design paradigm demonstrates the utility of an open-source and open-access database built on attendant community contributions with peer-reviewed interpretations. Use of a public repository for variant identification, probe development, and variant interpretation provides a transparent approach to build dynamic next-generation sequencing–based oncology panels.
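
The final OpenCAP step described above, linking somatic variants to CIViC clinical relevance statements, amounts to a keyed lookup. A minimal sketch, assuming variants are identified by gene symbol plus protein-level name and that CIViC content has been cached locally as a dictionary; the cache layout, the hypothetical `annotate` function, and the 2% VAF floor are illustrative assumptions, not OpenCAP's actual interface, and the statement strings are placeholders rather than real CIViC evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SomaticCall:
    gene: str
    variant: str  # protein-level variant name, e.g. "V600E"
    vaf: float


# Hypothetical stand-in for a locally cached export of CIViC variants.
# The statement strings are placeholders, not real CIViC evidence.
CIVIC_LOOKUP: dict[tuple[str, str], list[str]] = {
    ("BRAF", "V600E"): ["<placeholder clinical relevance statement>"],
    ("KRAS", "G12D"): ["<placeholder clinical relevance statement>"],
}


def annotate(calls: list[SomaticCall],
             lookup: dict[tuple[str, str], list[str]],
             min_vaf: float = 0.02) -> list[tuple[SomaticCall, list[str]]]:
    """Pair each call above a VAF floor with any matching CIViC statements.

    min_vaf is an illustrative threshold, not a value taken from OpenCAP.
    """
    annotated = []
    for call in calls:
        if call.vaf < min_vaf:
            continue  # too low to report in this sketch
        statements = lookup.get((call.gene.upper(), call.variant.upper()), [])
        if statements:
            annotated.append((call, statements))
    return annotated


calls = [
    SomaticCall("BRAF", "V600E", 0.31),
    SomaticCall("TP53", "R175H", 0.42),  # not in the toy lookup
    SomaticCall("KRAS", "G12D", 0.01),   # below the VAF floor
]
for call, statements in annotate(calls, CIVIC_LOOKUP):
    print(call.gene, call.variant, len(statements))
```

Keeping the lookup as plain data means the panel's interpretation layer can be refreshed from a new CIViC export without touching the calling code, which is the kind of transparency the abstract argues for.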

Genome-Wide Identification of 5-Methylcytosine Sites in Bacterial Genomes By High-Throughput Sequencing of MspJI Restriction Fragments

10.1101/2021.02.10.430591 · 2021

Author(s): Brian P. Anton, Alexey Fomenkov, Victoria Wu, Richard J. Roberts

Keyword(s): Single Molecule, DNA Sequences, High Throughput Sequencing, Cost Effective, Restriction Enzymes, Specific Sequence, Genome Wide, Cost Effective Alternative, Simple Column, Sequencing Platforms

ABSTRACT Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.
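
MFRE-Seq aims to determine methylation motifs. Once a candidate motif has been inferred from the sequenced fragments, a natural follow-up is to list where it occurs in a genome. The helper below is a generic sketch for that step, not part of the published MFRE-Seq protocol: it expands IUPAC ambiguity codes into a regular expression and scans both strands. The motifs and sequences in the usage lines are arbitrary examples.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers ambiguity codes (e.g. R <-> Y, B <-> V).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of an IUPAC-coded sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_sites(genome: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for motif matches on both strands.

    Palindromic motifs are reported once per strand at the same start position.
    """
    genome = genome.upper()
    hits = []
    for strand, m in (("+", motif.upper()), ("-", revcomp(motif))):
        # A zero-width lookahead lets overlapping matches be found.
        pattern = re.compile("(?=" + "".join(IUPAC[b] for b in m) + ")")
        hits.extend((match.start(), strand) for match in pattern.finditer(genome))
    return sorted(hits)


print(motif_sites("GAAGTTCTTC", "GAAG"))  # [(0, '+'), (6, '-')]
print(motif_sites("GCAGCTTGCTGC", "GCNGC"))
```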

Full-Length Transcriptome Sequences by a Combination of Sequencing Platforms Applied to Isoflavonoid and Triterpenoid Saponin Biosynthesis of Astragalus mongholicus Bunge

10.21203/rs.3.rs-127368/v1 · 2020

Author(s): Minzhen Yin, Shanshan Chu, Tingyu Shan, Liangping Zha, Huasheng Peng

Keyword(s): Single Molecule, Molecular Genetic, Gene Families, Triterpenoid Saponin, SMRT Sequencing, Triterpenoid Saponins, Saponin Biosynthesis, Medicinal Value, Long Time, Sequencing Platforms

Abstract Background: Astragalus mongholicus Bunge is an important medicinal plant, rich in isoflavonoids and triterpenoid saponins, with a long history of use in traditional Chinese medicine. Although these active constituents of A. mongholicus have long been known, the molecular genetic basis of the isoflavonoid and triterpenoid saponin biosynthesis pathways is virtually unknown because no reference genome is available. Combining next-generation sequencing (NGS) with single-molecule real-time (SMRT) sequencing is a widely recognized approach for analyzing genes involved in the biosynthetic pathways of secondary metabolites in medicinal plants. Results: In this study, NGS, SMRT sequencing, and targeted compound measurements were combined to investigate the association between isoflavonoid and triterpenoid saponin content and gene expression in the roots, stems, and leaves of A. mongholicus. Four main isoflavonoids and four astragalosides (a class of triterpenoid saponins) were measured, and 44 differentially expressed genes (DEGs) from nine gene families and 44 DEGs from 16 gene families, encoding enzymes involved in isoflavonoid and triterpenoid saponin biosynthesis, respectively, were identified. Additionally, transcription factors (TFs) associated with isoflavonoid and triterpenoid saponin biosynthesis were analyzed, including 72 MYBs, 53 bHLHs, 64 AP2-EREBPs, and 11 bZIPs. These transcripts showed different expression trends across organs. Conclusions: Our study provides important genetic information on the essential genes of isoflavonoid and triterpenoid saponin biosynthesis in A. mongholicus and provides a basis for developing its medicinal value.

Genome-wide identification of 5-methylcytosine sites in bacterial genomes by high-throughput sequencing of MspJI restriction fragments

PLoS ONE · 10.1371/journal.pone.0247541 · 2021 · Vol 16 (5) · pp. e0247541

Author(s): Brian P. Anton, Alexey Fomenkov, Victoria Wu, Richard J. Roberts

Keyword(s): Single Molecule, DNA Sequences, High Throughput Sequencing, Cost Effective, Restriction Enzymes, Specific Sequence, Genome Wide, Cost Effective Alternative, Simple Column, Sequencing Platforms

Investigating Serum and Tissue Expression Identified a Cytokine/Chemokine Signature as a Highly Effective Melanoma Marker

Cancers · 10.3390/cancers12123680 · 2020 · Vol 12 (12) · pp. 3680

Author(s): Marco Cesati, Francesca Scatozza, Daniela D’Arcangelo, Gian Carlo Antonini-Cappellini, Stefania Rossi, ...

Keyword(s): Single Molecule, Learning Algorithm, Profile Analysis, Pearson Correlation, Tissue Expression, Support Vector, Expression Data, Serum Samples, Tissue Samples, Mortality And Morbidity

The identification of reliable and quantitative melanoma biomarkers may aid early diagnosis and may directly affect melanoma mortality and morbidity. The aim of the present study was to identify effective biomarkers by investigating the expression of 27 cytokines/chemokines in melanoma compared with healthy controls, in both serum and tissue samples. Serum samples were obtained from 232 patients recruited at the IDI-IRCCS hospital, and expression of the 27 cytokines/chemokines was quantified by xMAP technology and compared with control sera. RNA expression data for the same 27 molecules were obtained from 511 melanoma and healthy tissue samples in the GENT2 database. Statistical analysis followed a 3-step approach: analysis of single molecules by the Mann–Whitney test, analysis of paired molecules by Pearson correlation, and profile analysis by the Support Vector Machine (SVM) machine learning algorithm. Single-molecule analysis of serum expression identified IL-1b, IL-6, IP-10, PDGF-BB, and RANTES as differentially expressed in melanoma (p < 0.05). Expression of IL-8, GM-CSF, MCP-1, and TNF-α was significantly correlated with Breslow thickness. Eotaxin and MCP-1 were differentially expressed between male and female patients. Tissue expression analysis identified very effective marker/predictor genes, namely IL-1Ra, IL-7, MIP-1a, and MIP-1b, with individual AUC values of 0.88, 0.86, 0.93, and 0.87, respectively. SVM analysis of the tissue expression data identified the combination of these four molecules as the most effective signature for discriminating melanoma patients (AUC = 0.98). Validation on an additional 1019 independent samples from the GEPIA2 database fully confirmed these observations. The present study demonstrates, for the first time, that the IL-1Ra, IL-7, MIP-1a, and MIP-1b gene signature discriminates melanoma from control tissues with extremely high efficacy. We therefore propose this 4-molecule combination as an effective melanoma marker.
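
The Mann–Whitney test and the per-gene AUC values reported above are closely linked: for a single marker, the ROC AUC equals the Mann–Whitney U statistic divided by the product of the two group sizes, with ties counted as one half. A minimal sketch with invented expression values (not data from the study); here AUC is the probability that a random case scores higher than a random control, so a marker that is lower in melanoma would give AUC < 0.5.

```python
def auc_mann_whitney(cases: list[float], controls: list[float]) -> float:
    """ROC AUC for one marker, computed as U / (n_cases * n_controls).

    U counts case-control pairs in which the case scores higher, with ties
    counted as one half. This O(n*m) loop is fine for small cohorts.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    u = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                u += 1.0
            elif c == k:
                u += 0.5
    return u / (len(cases) * len(controls))


# Invented expression values for illustration only.
melanoma = [5.1, 6.3, 4.8, 7.0]
control = [3.9, 4.8, 2.7]
print(f"AUC = {auc_mann_whitney(melanoma, control):.3f}")  # AUC = 0.958
```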

BRCA Testing by Single-Molecule Molecular Inversion Probes

Clinical Chemistry · 10.1373/clinchem.2016.263897 · 2017 · Vol 63 (2) · pp. 503-512

Author(s): Kornelia Neveling, Arjen R Mensenkamp, Ronny Derks, Michael Kwint, Hicham Ouchene, ...

Keyword(s): Genetic Testing, Copy Number Variation, Single Molecule, Copy Number, Single Gene, Work Flow, Analytical Sensitivity, BRCA1 And BRCA2, Number Variation, Molecular Inversion Probes

Abstract BACKGROUND Despite advances in next-generation DNA sequencing (NGS), NGS-based single-gene tests for diagnostic purposes require improvements in completeness, quality, speed, and cost. Single-molecule molecular inversion probes (smMIPs) are a technology with unrealized potential in clinical genetic testing. In this proof-of-concept study, we selected 2 frequently requested gene tests, those for the breast cancer genes BRCA1 and BRCA2, and developed an automated workflow based on smMIPs. METHODS The BRCA1 and BRCA2 smMIPs were validated using 166 human genomic DNA samples with known variant status. A generic automated workflow was built to perform smMIP-based enrichment and sequencing for BRCA1, BRCA2, and the checkpoint kinase 2 (CHEK2) c.1100del variant. RESULTS Pathogenic and benign variants were analyzed in a subset of 152 previously BRCA-genotyped samples, yielding an analytical sensitivity and specificity of 100%. After automation, blind analysis of 65 in-house samples and 267 Norwegian samples correctly identified all true-positive variants (>3000), with no false positives. Following process optimization, turnaround times were reduced by 60%, to 10–15 days. Copy number variants were detected with an analytical sensitivity of 100% and an analytical specificity of 88%. CONCLUSIONS smMIP-based genetic testing enables automated and reliable analysis of the coding sequences of BRCA1 and BRCA2. The use of single-molecule tags, double-tiled targeted enrichment, and capture and sequencing in duplicate, combined with automated library preparation and data analysis, results in a robust process and reduces routine turnaround times. Furthermore, smMIP-based copy number variation analysis could make independent copy number variation tools such as multiplex ligation-dependent probe amplification dispensable.
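
Single-molecule tags are what allow smMIP reads to be deduplicated: reads carrying the same tag derive from the same captured molecule, so they can be collapsed into one consensus, which keeps PCR copies from being double-counted and lets random errors be out-voted. A minimal sketch, assuming reads have already been split into (tag, sequence) pairs and are equal in length within each tag family; the minimum family size of 2 and the strict-majority rule are illustrative choices, not the published BRCA workflow.

```python
from collections import Counter, defaultdict


def collapse_by_tag(reads: list[tuple[str, str]],
                    min_family_size: int = 2) -> dict[str, str]:
    """Collapse reads sharing a single-molecule tag into one consensus sequence."""
    families: dict[str, list[str]] = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus: dict[str, str] = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to out-vote PCR or sequencing errors
        length = len(seqs[0])
        if any(len(s) != length for s in seqs):
            continue  # sketch simplification: skip families of unequal length
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Keep the base only with a strict majority; otherwise call "N".
            bases.append(base if 2 * count > len(seqs) else "N")
        consensus[tag] = "".join(bases)
    return consensus


reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACTTAC"),  # one read with an error at position 2
    ("CCTA", "GGTTCA"),  # singleton family, dropped by min_family_size
]
print(collapse_by_tag(reads))  # {'AAGT': 'ACGTAC'}
```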

SMRT Genome Assembly Corrects Reference Errors, Resolving the Genetic Basis of Virulence in Mycobacterium tuberculosis

10.1101/064840 · 2016

Author(s): Afif Elghraoui, Samuel J Modlin, Faramarz Valafar

Keyword(s): Mycobacterium Tuberculosis, Single Molecule, Genetic Basis, Reference Genome, Reference Sequence, SMRT Sequencing, Virulence Attenuation, Sequencing Platforms, Genome Comparisons, Reference Genomes

Abstract The genetic basis of virulence in Mycobacterium tuberculosis has been investigated through genome comparisons of its virulent (H37Rv) and attenuated (H37Ra) sister strains. Such analyses, however, rely heavily on the accuracy of the sequences. While the H37Rv reference genome has received several corrections to date, the H37Ra genome has remained unmodified since its original publication. Here, we report the assembly and finishing of the H37Ra genome from single-molecule, real-time (SMRT) sequencing. Our assembly reveals fewer than half as many H37Ra-specific variants as the Sanger-based H37Ra reference sequence indicates, undermining and, in some cases, invalidating the conclusions of several studies. PE_PPE family genes, which are intractable to commonly used sequencing platforms because of their repetitive and GC-rich nature, are overrepresented in the set of genes in which all reported H37Ra-specific variants are contradicted. We discuss how our results change the picture of virulence attenuation and the power of SMRT sequencing for producing high-quality reference genomes.
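
The claims about contradicted variants come down to a set comparison between previously reported strain-specific variants and those supported by a new assembly, grouped by gene. A sketch, assuming variants are keyed as (gene, position, ref, alt); the gene names and coordinates in the usage lines are hypothetical.

```python
from collections import defaultdict

# (gene, position, ref, alt); all identifiers below are hypothetical.
Variant = tuple[str, int, str, str]


def retained_fraction(reported: set[Variant], confirmed: set[Variant]) -> float:
    """Share of previously reported strain-specific variants supported by the new assembly."""
    return len(reported & confirmed) / len(reported) if reported else 0.0


def fully_contradicted_genes(reported: set[Variant], confirmed: set[Variant]) -> set[str]:
    """Genes in which none of the previously reported variants is supported."""
    by_gene: dict[str, list[Variant]] = defaultdict(list)
    for v in reported:
        by_gene[v[0]].append(v)
    return {gene for gene, vs in by_gene.items() if not any(v in confirmed for v in vs)}


reported = {
    ("geneA", 1200, "C", "T"),
    ("geneA", 1250, "G", "A"),
    ("geneB", 8000, "T", "G"),
    ("geneC", 9100, "A", "G"),
}
confirmed = {("geneB", 8000, "T", "G")}
print(retained_fraction(reported, confirmed))                  # 0.25
print(sorted(fully_contradicted_genes(reported, confirmed)))   # ['geneA', 'geneC']
```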

Characterizing glycosyltransferases by a combination of sequencing platforms applied to the leaf tissues of Stevia rebaudiana

10.21203/rs.3.rs-24957/v1 · 2020

Author(s): Shaoshan Zhang, Qiong Liu, Chengcheng Lyu, Jinsong Chen, Renfeng Xiao, ...

Keyword(s): Single Molecule, Phylogenetic Trees, Leaf Tissue, Stevia Rebaudiana, Full Length, Steviol Glycosides, SMRT Sequencing, Leaf Tissues, Sequencing Platforms, Insight Into

Abstract Background: Stevia rebaudiana (Bertoni) is considered one of the most valuable plants because of the steviol glycosides (SGs) that can be extracted from its leaves. Glycosyltransferases (GTs), which transfer sugar moieties from activated sugar donors onto saccharide and nonsaccharide acceptors, are widely distributed in the genome of S. rebaudiana and play important roles in the synthesis of steviol glycosides. Results: Six stevia genotypes with significantly different SG concentrations were obtained by induction through various mutagenic methods, and the leaf contents of seven glycosides (steviolbioside, Reb B, ST, Reb A, Reb F, Reb D, and Reb M) differed considerably among them. NGS and single-molecule real-time (SMRT) sequencing were then combined to analyse leaf tissue from these six genotypes and generate a more complete and correct full-length transcriptome of S. rebaudiana. Two phylogenetic trees of S. rebaudiana glycosyltransferases (SrUGTs) were constructed by the neighbour-joining method and successfully predicted the functions of SrUGTs involved in SG biosynthesis. To gain further insight into the SrUGTs involved in SG biosynthesis, weighted gene co-expression network analysis (WGCNA) was used to characterize the relationships between SrUGTs and SGs, ultimately yielding forty-four potential SrUGTs, including SrUGT85C2, SrUGT74G1, SrUGT76G1, and one SrUGT91D2, which have already been reported to be involved in the glucosylation of steviol glycosides, illustrating the reliability of our results. Conclusion: Combining the results of previous studies with those of this work, we systematically characterized glycosyltransferases in S. rebaudiana and obtained forty-four candidate SrUGTs involved in the glycosylation of steviol glycosides. Moreover, the complete and correct full-length transcriptome obtained in this study will provide valuable support for further research on S. rebaudiana.
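
WGCNA, used above to relate SrUGTs to SG content, starts from a weighted co-expression network. In its common unsigned form, the adjacency between genes i and j is the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power β, a_ij = |cor(x_i, x_j)|^β, and a gene's connectivity is the sum of its adjacencies. The pure-Python sketch below shows only that first step; the default β = 6, the gene names, and the expression values are illustrative assumptions, and module detection and module-trait correlation (handled in practice by the WGCNA R package) are omitted.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr: dict[str, list[float]],
                             beta: int = 6) -> dict[tuple[str, str], float]:
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, stored for both (i, j) and (j, i)."""
    genes = sorted(expr)
    adj: dict[tuple[str, str], float] = {}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[(gi, gj)] = adj[(gj, gi)] = a
    return adj


def connectivity(adj: dict[tuple[str, str], float], gene: str) -> float:
    """Sum of a gene's adjacencies to all other genes."""
    return sum(a for (g1, _), a in adj.items() if g1 == gene)


# Hypothetical expression profiles across six samples (e.g., one per genotype).
expr = {
    "UGT_x": [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],
    "UGT_y": [1.2, 1.9, 3.1, 3.9, 5.3, 6.1],
    "UGT_z": [4.0, 1.0, 3.5, 2.0, 4.4, 1.5],
}
adj = soft_threshold_adjacency(expr)
for g in expr:
    print(g, round(connectivity(adj, g), 3))
```

Raising correlations to a power shrinks weak correlations much faster than strong ones, which is why the soft threshold emphasizes tightly co-expressed genes without imposing a hard cutoff.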

MrHAMER yields highly accurate single molecule viral sequences enabling analysis of intra-host evolution

10.1101/2021.01.27.428469 · 2021

Author(s): CM Gallardo, S Wang, DJ Montiel-Garcia, SJ Little, DM Smith, ...

Keyword(s): Long Range, Single Molecule, Vaccine Development, Complex Mixture, Therapeutic Benefit, Read Length, Clinical Samples, Viral Genomes, Immune Pressure, Sequencing Platforms

Abstract Technical challenges remain in sequencing RNA viruses because of their high intra-host diversity. This bottleneck is particularly pronounced when interrogating long-range co-evolution, given the read-length limitations of next-generation sequencing platforms, and it has hampered direct observation of the long-range genetic interactions that code for protein-protein interfaces relevant to both drug and vaccine development. Here we overcome these technical limitations by developing a nanopore-based long-range viral sequencing pipeline that yields accurate single-molecule sequences of circulating virions from clinical samples. We demonstrate its utility in observing the evolution of individual HIV Gag-Pol genomes in response to antiviral pressure. Our pipeline, called Multi-read Hairpin Mediated Error-correction Reaction (MrHAMER), yields thousands of viral genomes per sample at 99.9% accuracy, maintains the original proportions of sequenced virions present in a complex mixture, and allows detection of rare viral genomes, with their associated mutations, present at <1% frequency. This method facilitates scalable investigation of genetic correlates of resistance to both antiviral therapy and immune pressure, and enables the identification of novel host-viral and viral-viral interfaces that can be modulated for therapeutic benefit.
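
Because each MrHAMER read is an error-corrected sequence of a single virion spanning Gag-Pol, co-occurrence of two distant mutations can be counted directly on the same molecule rather than inferred statistically. The sketch below summarizes this with the classic linkage disequilibrium coefficient D = p12 − p1·p2; the choice of D, the dict-per-molecule representation, and the positions in the usage lines are assumptions for illustration, not the authors' analysis.

```python
def linkage(molecules: list[dict[int, str]],
            site1: int, alt1: str, site2: int, alt2: str) -> dict[str, float]:
    """Per-molecule frequencies of two mutations and their linkage disequilibrium D.

    Each molecule maps genome position -> base; only molecules covering both
    sites are informative.
    """
    informative = [m for m in molecules if site1 in m and site2 in m]
    n = len(informative)
    if n == 0:
        raise ValueError("no molecule covers both sites")
    p1 = sum(m[site1] == alt1 for m in informative) / n
    p2 = sum(m[site2] == alt2 for m in informative) / n
    p12 = sum(m[site1] == alt1 and m[site2] == alt2 for m in informative) / n
    return {"p1": p1, "p2": p2, "p12": p12, "D": p12 - p1 * p2}


# Hypothetical single-molecule genotypes at two sites far apart in Gag-Pol.
molecules = [
    {100: "A", 2900: "G"},
    {100: "A", 2900: "G"},
    {100: "C", 2900: "T"},
    {100: "C", 2900: "T"},
    {100: "C"},  # does not cover site 2900, so it is ignored
]
print(linkage(molecules, 100, "A", 2900, "G"))
# {'p1': 0.5, 'p2': 0.5, 'p12': 0.5, 'D': 0.25}
```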

Enabling metagenomic surveillance for bacterial tick-borne pathogens using nanopore sequencing with adaptive sampling

10.1101/2021.08.17.456696 · 2021

Author(s): Evan J. Kipp, Laramie L. Lindsey, Benedict S. Khoo, Christopher Faulk, Jonathan D. Oliver, ...

Keyword(s): Real Time, Single Molecule, Adaptive Sampling, Sequence Data, Ixodes Scapularis, Fold Increase, Borrelia Miyamotoi, Nucleotide Sequence Data, Oxford Nanopore, Sequencing Platforms

Technological and computational advances in genomics and bioinformatics are providing exciting new opportunities for pathogen discovery and surveillance. In particular, single-molecule nucleotide sequence data from Oxford Nanopore Technologies (ONT) sequencing platforms can be leveraged bioinformatically, in real time, for enhanced biosurveillance of a vast array of zoonoses. The recently released nanopore adaptive sampling (NAS) pipeline maps individual nucleic acid molecules (i.e., DNA, cDNA, and RNA) to a given reference as each molecule is sequenced. User-defined thresholds then allow specific molecules to be retained or rejected, based on the real-time mapping results, while they are physically passing through a sequencing nanopore. Here, we show how NAS can be used to selectively sequence entire genomes of bacterial tick-borne pathogens circulating in wild populations of the blacklegged tick vector, Ixodes scapularis. NAS provided a two-fold increase in targeted pathogen sequences, successfully enriching for Borrelia (Borreliella) burgdorferi s.s., Borrelia (Borrelia) miyamotoi, Anaplasma phagocytophilum, and Ehrlichia muris eauclairensis genomic DNA within our I. scapularis samples. Our results indicate that NAS has strong potential for real-time sequence-based pathogen surveillance.
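
The retain-or-reject logic of adaptive sampling can be sketched as a per-read decision function. Real NAS aligns the growing read against a reference in real time; here a crude k-mer overlap stands in for that alignment, and the decision labels, k, minimum prefix length, and hit-fraction threshold are all hypothetical settings rather than ONT parameters.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"    # too little sequence yet; keep reading
    SEQUENCE = "sequence"  # on target: keep the molecule in the pore
    UNBLOCK = "unblock"    # off target: eject it and free the pore


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def adaptive_decision(prefix: str, target_kmers: set[str], k: int = 15,
                      min_prefix: int = 200,
                      min_hit_fraction: float = 0.2) -> Decision:
    """Enrichment-style decision from the first bases of a read (toy version)."""
    if len(prefix) < min_prefix:
        return Decision.PROCEED
    read_kmers = kmers(prefix, k)
    if not read_kmers:
        return Decision.PROCEED  # prefix shorter than k; nothing to compare
    hit_fraction = len(read_kmers & target_kmers) / len(read_kmers)
    return Decision.SEQUENCE if hit_fraction >= min_hit_fraction else Decision.UNBLOCK


# Hypothetical target "genome" built from a deterministic pattern.
target = "".join("ACGT"[(7 * i + i // 3) % 4] for i in range(1000))
target_index = kmers(target, 15)
print(adaptive_decision(target[100:400], target_index))  # Decision.SEQUENCE
print(adaptive_decision("A" * 300, target_index))        # Decision.UNBLOCK
```

Rejecting off-target molecules early frees pores for new molecules, which is how enrichment of target sequences arises without any wet-lab capture step.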
