Strategy and Performance Evaluation of Low-Frequency Variant Calling for SARS-CoV-2 Using Targeted Deep Illumina Sequencing

The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of diagnostic and wastewater samples has been explored as an epidemiological surveillance method for the competent authorities. Currently, only the consensus genome sequence of the most abundant strain is taken into consideration for analysis, but multiple variant strains are now circulating in the population. Consequently, in diagnostic samples, potential co-infection(s) by several different variants can occur or quasispecies can develop during an infection in an individual. In wastewater samples, multiple variant strains will often be simultaneously present. Currently, quality criteria are mainly available for constructing the consensus genome sequence, and some guidelines exist for the detection of co-infections and quasispecies in diagnostic samples. The performance of detection and quantification of low-frequency variants using whole genome sequencing (WGS) of SARS-CoV-2 remains largely unknown. Here, we evaluated the detection and quantification of mutations present at low abundances using the mutations defining the SARS-CoV-2 lineage B.1.1.7 (alpha variant) as a case study. Real sequencing data were in silico modified by introducing mutations of interest into raw wild-type sequencing data, or by mixing wild-type and mutant raw sequencing data, to construct mixed samples subjected to WGS using a tiling amplicon-based targeted metagenomics approach and Illumina sequencing. 
As anticipated, higher variation and lower sensitivity were observed at lower coverages and allelic frequencies. We found that detection of all low-frequency variants at an abundance of 10, 5, 3, and 1% requires a sequencing coverage of at least 250, 500, 1500, and 10,000×, respectively. Although the variability of estimated allelic frequencies increased at decreasing coverages and lower allelic frequencies, its impact on reliable quantification was limited. This study provides a highly sensitive low-frequency variant detection approach, which is publicly available at https://galaxy.sciensano.be, and specific recommendations for minimum sequencing coverages to detect clade-defining mutations at certain allelic frequencies. This approach will be useful to detect and quantify low-frequency variants in both diagnostic (e.g., co-infections and quasispecies) and wastewater [e.g., multiple variants of concern (VOCs)] samples.
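The coverage recommendations in the abstract above can be read as a small lookup table. The following is a minimal sketch, not the authors' pipeline: the function names and the `min_reads` threshold are hypothetical, and the binomial model ignores sequencing error and amplification bias. It shows how many variant-supporting reads each recommended coverage yields on average, and how likely a simple read-count threshold is to be met.

```python
import math

# Minimum coverage per allelic frequency, as reported in the abstract.
MIN_COVERAGE = {0.10: 250, 0.05: 500, 0.03: 1500, 0.01: 10_000}


def required_coverage(allelic_frequency):
    """Return the reported minimum coverage for a target allelic frequency.

    Uses the largest tabulated frequency that does not exceed the target,
    which is the conservative choice. Returns None below 1%, where the
    abstract gives no recommendation.
    """
    eligible = [af for af in MIN_COVERAGE if af <= allelic_frequency]
    if not eligible:
        return None
    return MIN_COVERAGE[max(eligible)]


def expected_variant_reads(coverage, allelic_frequency):
    """Average number of reads carrying the variant at a position."""
    return coverage * allelic_frequency


def prob_at_least(coverage, allelic_frequency, min_reads):
    """Binomial probability of seeing at least `min_reads` variant reads.

    Assumption: reads are independent draws and error-free; `min_reads`
    is a hypothetical caller threshold, not a value from the study.
    """
    p_below = sum(
        math.comb(coverage, k)
        * allelic_frequency ** k
        * (1 - allelic_frequency) ** (coverage - k)
        for k in range(min_reads)
    )
    return 1 - p_below
```

Under these assumptions, each recommended coverage corresponds to roughly 25 to 100 expected variant reads (for example, 10,000× at 1% gives about 100), which suggests the thresholds leave headroom above pure sampling noise.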

Strategy and performance evaluation of low-frequency variant calling for SARS-CoV-2 in wastewater using targeted deep Illumina sequencing

10.1101/2021.07.02.21259923 ◽

2021 ◽

Author(s):

Laura A. E. Van Poelvoorde ◽

Thomas Delcourt ◽

Wim Coucke ◽

Philippe Herman ◽

Sigrid C. J. De Keersmaecker ◽

...

Keyword(s):

Illumina Sequencing ◽

Immune Escape ◽

Low Frequency ◽

Epidemiological Surveillance ◽

Lower Sensitivity ◽

Wild Type ◽

Sequencing Data ◽

Allelic Frequencies ◽

Wastewater Samples ◽

Detection And Quantification

The ongoing COVID-19 pandemic, caused by SARS-CoV-2, constitutes a tremendous global health issue. Continuous monitoring of the virus has become a cornerstone to make rational decisions on implementing societal and sanitary measures to curtail the virus spread. Additionally, emerging SARS-CoV-2 variants have increased the need for genomic surveillance to detect particular strains because of their potentially increased transmissibility, pathogenicity and immune escape. Targeted SARS-CoV-2 sequencing of wastewater has been explored as an epidemiological surveillance method for the competent authorities. Few quality criteria are however available when sequencing wastewater samples, and those available typically only pertain to constructing the consensus genome sequence. Multiple variants circulating in the population can however be simultaneously present in wastewater samples. The performance, including detection and quantification of low-abundant variants, of whole genome sequencing (WGS) of SARS-CoV-2 in wastewater samples remains largely unknown. Here, we evaluated the detection and quantification of mutations present at low abundances using the SARS-CoV-2 lineage B.1.1.7 (alpha variant) defining mutations as a case study. Real sequencing data were in silico modified by introducing mutations of interest into raw wild-type sequencing data, or by mixing wild-type and mutant raw sequencing data, to mimic wastewater samples subjected to WGS using a tiling amplicon-based targeted metagenomics approach and Illumina sequencing. As anticipated, higher variation, lower sensitivity and more false negatives, were observed at lower coverages and allelic frequencies. We found that detection of all low-frequency variants at an abundance of 10%, 5%, 3% and 1%, requires at least a sequencing coverage of 250X, 500X, 1500X and 10,000X, respectively. 
Although increasing variability of estimated allelic frequencies at decreasing coverages and lower allelic frequencies was observed, its impact on reliable quantification was limited. This study provides a highly sensitive low-frequency variant detection approach, which is publicly available at https://galaxy.sciensano.be, and specific recommendations for minimum sequencing coverages to detect clade-defining mutations at specific allelic frequencies.

Complete Genome Sequence of the Autotrophic Acetogen Clostridium formicaceticum DSM 92T Using Nanopore and Illumina Sequencing Data

Genome Announcements ◽

10.1128/genomea.00423-17 ◽

2017 ◽

Vol 5 (21) ◽

Cited By ~ 4

Author(s):

Michael M. Karl ◽

Anja Poehlein ◽

Frank R. Bengelsdorf ◽

Rolf Daniel ◽

Peter Dürre

Keyword(s):

Carbon Monoxide ◽

Genome Sequence ◽

Illumina Sequencing ◽

Complete Genome Sequence ◽

Complete Genome ◽

Sequencing Data ◽

Circular Chromosome ◽

Content Type ◽

Illumina Sequencing Data

ABSTRACT Here, we report the closed genome sequence of Clostridium formicaceticum, an Rnf- and cytochrome-containing autotrophic acetogen that is able to convert carbon monoxide to acetate using the Wood-Ljungdahl pathway. The genome consists of a circular chromosome (4.59 Mb).

Contamination as a major factor in poor Illumina assembly of microbial isolate genomes

10.1101/081885 ◽

2016 ◽

Cited By ~ 5

Author(s):

Haeyoung Jeong ◽

Jae-Goo Pan ◽

Seung-Hwan Park

Keyword(s):

Illumina Sequencing ◽

De Novo ◽

Repetitive Sequences ◽

Low Frequency ◽

Read Depth ◽

16S Rrna Genes ◽

Rrna Genes ◽

Sequencing Error ◽

Sequencing Data ◽

Long Reads

ABSTRACT The nonhybrid hierarchical assembly of PacBio long reads is becoming the most preferred method for obtaining genomes for microbial isolates. On the other hand, among massive numbers of Illumina sequencing reads produced, there is a slim chance of re-evaluating failed microbial genome assembly (high contig number, large total contig size, and/or the presence of low-depth contigs). We generated Illumina-type test datasets with various levels of sequencing error, pretreatment (trimming and error correction), repetitive sequences, contamination, and ploidy from both simulated and real sequencing data and applied k-mer abundance analysis to quickly detect possible diagnostic signatures of poor assemblies. Contamination was the only factor leading to poor assemblies for the test dataset derived from haploid microbial genomes, resulting in an extraordinary peak within low-frequency k-mer range. When thirteen Illumina sequencing reads of microbes belonging to genera Bacillus or Paenibacillus from a single multiplexed run were subjected to a k-mer abundance analysis, all three samples leading to poor assemblies showed peculiar patterns of contamination. Read depth distribution along the contig length indicated that all problematic assemblies suffered from too many contigs with low average read coverage, where 1% to 15% of total reads were mapped to low-coverage contigs. We found that subsampling or filtering out reads having rare k-mers could efficiently remove low-level contaminants and greatly improve the de novo assemblies. An analysis of 16S rRNA genes recruited from reads or contigs and the application of read classification tools originally designed for metagenome analyses can help identify the source of a contamination.
The unexpected presence of proteobacterial reads across multiple samples, which had no relevance to our lab environment, implies that such prevalent contamination might have occurred after the DNA preparation step, probably at the place where sequencing service was provided.
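The k-mer abundance analysis and rare-k-mer read filtering described in this abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the tool the authors used: the function names, the choice of `k`, and the `min_abundance` cutoff are all hypothetical, and real pipelines use dedicated k-mer counters and canonical (strand-independent) k-mers.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads (no reverse-complement merging)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_spectrum(counts):
    """Map abundance -> number of distinct k-mers seen that many times.

    A contaminant at low depth shows up as extra mass at low abundance.
    """
    return Counter(counts.values())


def drop_reads_with_rare_kmers(reads, k, min_abundance):
    """Keep only reads whose k-mers all reach `min_abundance`.

    Reads shorter than k contain no k-mers and are kept unchanged.
    """
    counts = kmer_counts(reads, k)
    return [
        read
        for read in reads
        if all(
            counts[read[i:i + k]] >= min_abundance
            for i in range(len(read) - k + 1)
        )
    ]
```

With five copies of one read and a single unrelated read, the unrelated read contributes only abundance-1 k-mers and is removed by a cutoff of 2, which mirrors the low-frequency contamination peak the abstract describes.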

Increased yields of duplex sequencing data by a series of quality control tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab002 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Gundula Povysil ◽

Monika Heinzl ◽

Renato Salazar ◽

Nicholas Stoler ◽

Anton Nekrutenko ◽

...

Keyword(s):

Low Frequency ◽

Variant Calling ◽

Data Loss ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Consensus Sequences ◽

Sequencing Errors ◽

Data Output ◽

Reverse Strand ◽

Duplex Sequencing

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition with the purpose to understand data loss and implement modifications to maximize the data output for the variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides different summary data that categorizes the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, that increases substantially the sequencing depth for variant calling, a particular important advantage for low-input samples or low-coverage regions.
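The family grouping and duplex consensus step mentioned in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' toolset: the read representation `(tag, strand, sequence)`, the strand labels `"fwd"` and `"rev"`, and the `min_family_size` default are assumptions, and reverse-strand reads are assumed to be already oriented like the forward strand and of equal length.

```python
from collections import Counter, defaultdict


def group_families(reads):
    """Group sequences by (tag, strand); each group is one read family."""
    families = defaultdict(list)
    for tag, strand, seq in reads:
        families[(tag, strand)].append(seq)
    return families


def majority_consensus(seqs):
    """Single-strand consensus: most common base at each position."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def duplex_consensus(reads, min_family_size=3):
    """Build a duplex consensus per tag from its two strand families.

    Tags lacking a large enough family on either strand are skipped,
    which is the kind of data loss the abstract refers to. Positions
    where the two strand consensuses disagree are masked with 'N'.
    """
    families = group_families(reads)
    result = {}
    for tag in {tag for tag, _ in families}:
        fwd = families.get((tag, "fwd"), [])
        rev = families.get((tag, "rev"), [])
        if len(fwd) < min_family_size or len(rev) < min_family_size:
            continue
        fwd_cons = majority_consensus(fwd)
        rev_cons = majority_consensus(rev)
        result[tag] = "".join(
            a if a == b else "N" for a, b in zip(fwd_cons, rev_cons)
        )
    return result
```

Requiring agreement between both strands is what suppresses single-strand artefacts; the price is that any tag without a full family on both strands yields no consensus at all.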

RUNX1 and REXO2 are associated with the heterogeneity and prognosis of IDH wild type lower grade glioma

Scientific Reports ◽

10.1038/s41598-021-91382-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Haiwei Wang ◽

Xinrui Wang ◽

Liangpu Xu ◽

Ji Zhang ◽

Hua Cao

Keyword(s):

Transcription Factor ◽

Low Frequency ◽

Tumor Grade ◽

The Cancer Genome Atlas ◽

Lower Grade ◽

Wild Type ◽

Lower Grade Glioma ◽

Cancer Genome Atlas ◽

Worse Prognosis ◽

Genome Atlas

Abstract Based on isocitrate dehydrogenase (IDH) alterations, lower grade glioma (LGG) is divided into IDH mutant and wild type subgroups. However, the further classification of IDH wild type LGG was unclear. Here, IDH wild type LGG patients in The Cancer Genome Atlas and Chinese Glioma Genome Atlas were divided into two sub-clusters using non-negative matrix factorization. IDH wild type LGG patients in sub-cluster2 had prolonged overall survival and low frequency of CDKN2A alterations and low immune infiltrations. Differentially expressed genes in sub-cluster1 were positively correlated with RUNX1 transcription factor. Moreover, IDH wild type LGG patients with higher stromal score or immune score were positively correlated with RUNX1 transcription factor. RUNX1 and its target gene REXO2 were up-regulated in sub-cluster1 and associated with the worse prognosis of IDH wild type LGG. RUNX1 and REXO2 were associated with the higher immune infiltrations. Furthermore, RUNX1 and REXO2 were correlated with the worse prognosis of LGG or glioma. IDH wild type LGG in sub-cluster2 was hyper-methylated. REXO2 hyper-methylation was associated with the favorable prognosis of LGG or glioma. At last, we showed that, age, tumor grade and REXO2 expression were independent prognostic factors in IDH wild type LGG.

IMMU-35. TRANSCRIPTIONALLY DEFINED IMMUNE CONTEXTURE IN HUMAN GLIOMAS AT SINGLE-CELL RESOLUTION

Neuro-Oncology ◽

10.1093/neuonc/noaa215.465 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii112-ii112

Author(s):

Pravesh Gupta ◽

Minghao Dang ◽

Krishna Bojja ◽

Tuan Tran M ◽

Huma Shehwana ◽

...

Keyword(s):

Dendritic Cell ◽

Single Cell ◽

Rna Sequencing ◽

Immune Cell ◽

Wild Type ◽

Sequencing Data ◽

Human Gliomas ◽

Cell Clusters ◽

Recurrent Gliomas ◽

Antigen Presenting

Abstract The brain tumor immune microenvironment (TIME) continuously evolves during glioma progression and a comprehensive understanding of the glioma-centric immune cell repertoire beyond a priori cell types and/or states is uncharted. Consequently, we performed single-cell RNA-sequencing on ~123,000 tumor-derived immune cells from 17-pathologically stratified, IDH (isocitrate dehydrogenase)-differential primary, recurrent human gliomas, and non-glioma brains. Our analysis delineated predominant 34-myeloid cell clusters (~75%) over 28-lymphoid cell clusters (~25%) reflecting enormous heterogeneity within and across gliomas. The glioma immune diversity spanned functionally imprinted phagocytic, antigen-presenting, hypoxia, angiogenesis, and tumoricidal myeloid to classical cytotoxic lymphoid subpopulations. Specifically, IDH-mutant gliomas were enriched for brain-resident microglial subpopulations in contrast to enhanced bone marrow-derived infiltrates in IDH-wild type, especially in a recurrent setting. Microglia attrition in IDH-wild type primary and recurrent gliomas was concomitant with invading monocyte-derived cells with semblance to dendritic cell and macrophage/microglia-like transcriptomic features. Additionally, microglial functional diversification was noted with disease severity and mostly converged to inflammatory states in IDH-wild type recurrent gliomas. Beyond dendritic cells, multiple antigen-presenting cellular states expanded with glioma severity, especially in IDH-wild type primary and recurrent gliomas. Furthermore, we noted differential microglia and dendritic cell inherent antigen presentation axis viz, osteopontin, and classical HLAs in IDH subtypes and, glioma-wide non-PD1 checkpoints associations in T cells like Galectin9 and Tim-3.
As a general utility, our immune cell deconvolution approach with single-cell-matched bulk RNA sequencing data faithfully resolved 58-cell states which provides glioma specific immune reference for digital cytometry application to genomics datasets. Resultantly, we identified prognosticator immune cell-signatures from TCGA cohorts as one of many potential immune responsiveness applications of the curated signatures for basic and translational immune-genomics efforts. Thus, we not only provide an unprecedented insight of glioma TIME but also present an immune data resource that can be exploited to guide pragmatic glioma immunotherapy designs.

Roles of rpoN, fliA, and flgR in Expression of Flagella in Campylobacter jejuni

Journal of Bacteriology ◽

10.1128/jb.183.9.2937-2942.2001 ◽

2001 ◽

Vol 183 (9) ◽

pp. 2937-2942 ◽

Cited By ~ 59

Author(s):

Aparna Jagannathan ◽

Chrystala Constantinidou ◽

Charles W. Penn

Keyword(s):

Campylobacter Jejuni ◽

Genome Sequence ◽

Sigma Factor ◽

Transcriptional Activator ◽

Alternative Sigma Factor ◽

Wild Type ◽

Electron Microscopic ◽

Global Regulation ◽

Mutant Strains ◽

Microscopic Studies

ABSTRACT Three potential regulators of flagellar expression present in the genome sequence of Campylobacter jejuni NCTC 11168, the genes rpoN, flgR, and fliA, which encode the alternative sigma factor σ54, the σ54-associated transcriptional activator FlgR, and the flagellar sigma factor σ28, respectively, were investigated for their role in global regulation of flagellar expression. The three genes were insertionally inactivated in C. jejuni strains NCTC 11168 and NCTC 11828. Electron microscopic studies of the wild-type and mutant strains showed that the rpoN and flgR mutants were nonflagellate and that the fliA mutant had truncated flagella. Immunoblotting experiments with the three mutants confirmed the roles of rpoN, flgR, and fliA in the expression of flagellin.

PATH-40. SPORADIC NF2 WILD-TYPE MULTIPLE MENINGIOMAS HARBOR DISTINCT DRIVER MUTATIONS

Neuro-Oncology ◽

10.1093/neuonc/noab196.492 ◽

2021 ◽

Vol 23 (Supplement_6) ◽

pp. vi124-vi124

Author(s):

Insa Prilop ◽

Thomas Pinzer ◽

Daniel Cahill ◽

Priscilla Brastianos ◽

Gabriele Schackert ◽

...

Keyword(s):

Hot Spot ◽

Low Frequency ◽

Neurofibromatosis Type ◽

Driver Mutations ◽

Wild Type ◽

Driver Genes ◽

Who Grade ◽

Genetic Changes ◽

Histologic Subtype ◽

Multiple Meningiomas

Abstract OBJECTIVE Multiple meningiomas (MM) are rare and present a unique management challenge. While the mutational landscape of single meningiomas has been extensively studied, understanding the molecular pathogenesis of sporadic MM remains incomplete. The objective of this study is to elucidate the genetic features of sporadic MM. METHODS We identified nine patients with MM (n=19) defined as ≥2 spatially separated synchronous or metachronous meningiomas. We profiled genetic changes in these tumors using next-generation sequencing (NGS) assay that covers a large number of targetable and frequently mutated genes in meningiomas including AKT1, KLF4, NF2, PIK3CA/PIK3R1, POLR2A, SMARCB1, SMO, SUFU, TRAF7, and the TERT promoter. RESULTS Most of MM were WHO grade 1 (n= 16, 84.2%). Within individual patients, no driver mutation was shared between separate tumors. All but two cases harbored different hot spot mutations in known meningioma-driver genes like TRAF7 (n= 5), PIK3CA (n= 4), AKT1 (n= 3), POLR2A (n=1) and SMO (n= 1). Moreover, individual tumors differed in histologic subtype in 8/9 patients. The low frequency of NF2 mutations in our series stands in contrast to previous studies that included hereditary cases arising in the setting of neurofibromatosis type 2 (NF2). CONCLUSIONS Our findings provide evidence for genomic inter-tumor heterogeneity and an independent molecular origin of sporadic NF2 wild-type MM. Furthermore, these findings suggest that genetic characterization of each lesion is warranted in sporadic MM.

Mutation at autosomal loci of Chinese hamster ovary cells: involvement of a high-frequency event silencing two linked alleles

Molecular and Cellular Biology ◽

10.1128/mcb.3.7.1172-1181.1983 ◽

1983 ◽

Vol 3 (7) ◽

pp. 1172-1181

Author(s):

W E Bradley

Keyword(s):

Cell Lines ◽

Chinese Hamster Ovary ◽

High Frequency ◽

Chinese Hamster Ovary Cells ◽

Chinese Hamster ◽

Low Frequency ◽

Class Ii ◽

Class I ◽

Wild Type ◽

Ovary Cells

Two classes of cell lines heterozygous at the galactokinase (glk) locus have been isolated from Chinese hamster ovary cells. Class I, selected by plating nonmutagenized wild-type cells at low density in medium containing 2-deoxygalactose at a partially selective concentration, underwent subsequent mutation to the glk-/- genotype at a low frequency (approximately 10^-6 per cell), which was increased by mutagenesis. Class II heterozygotes, isolated by sib selection from mutagenized wild-type cells, had a higher spontaneous frequency of mutation to the homozygous state (approximately 10^-4 per cell), which was not affected by mutagenesis. About half of the glk-/- mutants derived from a class II heterozygote, but not the heterozygote itself, were functionally hemizygous at the syntenic thymidine kinase (tk) locus. Similarly, a tk+/- heterozygote with characteristics analogous to the class II glk+/- cell lines underwent high-frequency mutation to tk-/-, and most of these mutants, but not the tk+/- heterozygote, were functionally hemizygous at the glk locus. A model is proposed, similar to that for the mutational events at the adenine phosphoribosyl transferase locus (W. E. C. Bradley and D. Letovanec, Somatic Cell Genet. 8:51-66, 1982), of two different events, high and low frequency, being responsible for mutation at either of the linked loci tk and glk. The low-frequency event may be a point mutation, but the high-frequency event, in many instances, involves coordinated inactivation of a portion of a chromosome carrying the two linked alleles. Class II heterozygotes would be generated as a result of a low-frequency event at one allele, and class I heterozygotes would be generated by a high-frequency event. Supporting this model was the demonstration that all class I glk+/- lines examined were functionally hemizygous at tk.

Distribution of drug-metabolizing enzymes coding genes CYP2D6, CYP3A4, CYP3A5 alleles in a group of healthy Turkish population

Turkish Journal of Biochemistry ◽

10.1515/tjb-2017-0226 ◽

2018 ◽

Vol 44 (2) ◽

pp. 142-146

Author(s):

İsmail Ün ◽

İ. Ömer Barlas ◽

Nisa Uyar ◽

Bahar Taşdelen ◽

Naci Tiftik

Keyword(s):

Drug Therapy ◽

Adverse Drug Reactions ◽

Low Frequency ◽

Turkish Population ◽

Allele Frequencies ◽

Drug Reactions ◽

Metabolizing Enzymes ◽

Frequency Distributions ◽

Allelic Frequencies ◽

Hardy Weinberg Equilibrium

Abstract Objective: Variant alleles in specific ethnic groups are important for personalized drug therapy regimens and adverse drug reactions. Therefore, the aim of this study was to investigate allelic frequencies of the CYP2D6*1, CYP3A4*5, CYP3A4*18, CYP3A5*2 and CYP3A5*4 in a Turkish population group. Materials and methods: Three hundred and six unrelated healthy subjects who were accepted as blood donors to the Mersin University Blood Bank were included in the study after informed consent. Allelic frequencies of the CYP2D6*1 (rs3892097), CYP3A4*5 (rs55901263), CYP3A4*18 (rs28371759), CYP3A5*2 (rs28365083) and CYP3A5*4 (rs56411402) were determined by using polymerase chain reaction-restriction fragment length polymorphism assays. Results: The CYP2D6 allele frequency in the studied group was 100% for CYP2D6*1 (WT/WT). CYP3A4 allele frequencies of subjects were 100% for CYP3A4*5 (C/C) and CYP3A4*18 (T/T). CYP3A5 alleles were in Hardy-Weinberg equilibrium for CYP3A5*2 (p=0.142), and frequencies for the C and A alleles were 91% and 9%, respectively. The CYP3A5 allele frequency of subjects was 100% for CYP3A5*4 (WT/WT). Conclusion: Screening of low frequency alleles by pharmacogenetic testing must not be omitted to optimize pharmacotherapy and avoid severe drug toxicities. Frequency distributions of the identified polymorphisms in the present study may contribute to the personalized drug therapy regimens and prediction of possible adverse drug reactions in the Turkish population.
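The Hardy-Weinberg check reported for CYP3A5*2 can be illustrated with a standard chi-square goodness-of-fit test. This is a generic sketch, not necessarily the test the authors ran: the abstract gives allele frequencies but not genotype counts, so the function takes counts as input and any counts passed to it are the caller's own. The function name and return layout are hypothetical.

```python
import math


def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic locus.

    Takes observed genotype counts (AA, AB, BB) and returns
    (freq_a, freq_b, chi2, p_value). The p-value uses one degree of
    freedom, for which the chi-square survival function equals
    erfc(sqrt(chi2 / 2)).
    """
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1 - freq_a
    expected = (
        freq_a * freq_a * n,
        2 * freq_a * freq_b * n,
        freq_b * freq_b * n,
    )
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic loci) to avoid
    # dividing by zero; such loci trivially satisfy equilibrium.
    chi2 = sum(
        (obs - exp) ** 2 / exp for obs, exp in zip(observed, expected) if exp > 0
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return freq_a, freq_b, chi2, p_value
```

A p-value above the chosen significance level, as with the reported p=0.142, means the observed genotype counts do not deviate detectably from the proportions expected under equilibrium.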
