single nucleotide variants
Recently Published Documents

PLoS ONE, 2022, Vol 17 (1), pp. e0261014
Carlos Arana, Chaoying Liang, Matthew Brock, Bo Zhang, Jinchun Zhou

High viral transmission in the COVID-19 pandemic has enabled SARS‐CoV‐2 to acquire new mutations that may impact genome sequencing methods. The ARTIC.v3 primer pool, which amplifies short amplicons in a multiplex-PCR reaction, is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA, followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19 positive patients. ARTIC data covered 94.47% of the virus genome fraction in the positive control and patient samples. Variant analysis in the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions and deletions. The long-amplicon data, by contrast, detected 156 mutations, of which 80% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data improved the genomic coverage to 97.03% and identified 214 high confidence mutations. The combined final set of 214 mutations included 203 SNVs, 8 deletions and 3 insertions. Analysis showed 26 SARS-CoV-2 lineage defining mutations, including 4 known variant-of-concern mutations (K417N, E484K, N501Y, P618H) in the spike gene. Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in ARTIC data. For example, the G172V mutation in the ORF3a protein and the A2A mutation in the Membrane protein were missed by the ARTIC assay. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
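The concordance figure quoted in this abstract is, at its core, a set comparison between two call sets. The sketch below shows one way such a comparison could be computed. It is not the authors' pipeline: the `Variant` tuple, the function name and the toy coordinates are all hypothetical, and real variant calls would first need to be normalized so that identical events compare equal.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: position plus reference and alternate alleles."""
    pos: int
    ref: str
    alt: str


def compare_call_sets(short_calls, long_calls):
    """Compare two variant call sets and report overlap and assay-specific calls."""
    short_set, long_set = set(short_calls), set(long_calls)
    shared = short_set & long_set
    return {
        "shared": shared,
        "short_only": short_set - long_set,
        "long_only": long_set - short_set,
        "combined": short_set | long_set,
        # Fraction of long-amplicon calls that were also made from the short amplicons.
        "long_concordance": len(shared) / len(long_set) if long_set else 0.0,
    }


# Toy data with made-up coordinates (not taken from the study).
short_amplicon = [Variant(100, "A", "G"), Variant(250, "C", "T"), Variant(400, "G", "A")]
long_amplicon = [Variant(250, "C", "T"), Variant(400, "G", "A"),
                 Variant(900, "T", "C"), Variant(950, "G", "T")]
summary = compare_call_sets(short_amplicon, long_amplicon)
```

Here `summary["long_only"]` holds the calls that only the long-amplicon data contributed, which corresponds to the kind of variant the abstract describes as missed by the short-amplicon assay.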

F1000Research, 2022, Vol 11, pp. 33
Alexandr Boytsov, Sergey Abramov, Vsevolod J. Makeev, Ivan V. Kulakovskiy

The commonly accepted model to quantify the specificity of transcription factor binding to DNA is the position weight matrix, also called the position-specific scoring matrix. Position weight matrices are used in thousands of projects and computational tools in regulatory genomics, including prediction of the regulatory potential of single-nucleotide variants. Yet, Yan et al. recently presented a new experimental method for the analysis of regulatory variants and, based on its results, reported that "the position weight matrices of most transcription factors lack sufficient predictive power". Here, we re-analyze the rich experimental dataset obtained by Yan et al. and show that appropriately selected position weight matrices can in fact successfully quantify transcription factor binding to alternative alleles.
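To make the idea of scoring alternative alleles with a position weight matrix concrete, here is a minimal sketch. It assumes a log-odds matrix built from a count matrix with a uniform background and a pseudocount, and it scans the forward strand only. The function names and the toy three-column motif are hypothetical and are not taken from the paper or from any motif database.

```python
import math

BASES = "ACGT"


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_score(pwm, seq):
    """Highest PWM score over all forward-strand windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_score_change(pwm, left_flank, ref, alt, right_flank):
    """Best score with the alternate allele minus best score with the reference allele.

    Flanks of length (motif width - 1) ensure every scanned window overlaps the variant.
    """
    ref_score = best_score(pwm, left_flank + ref + right_flank)
    alt_score = best_score(pwm, left_flank + alt + right_flank)
    return alt_score - ref_score


# Toy count matrix for a made-up motif with consensus TGA.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = counts_to_pwm(toy_counts)
change = allele_score_change(toy_pwm, "CT", "G", "C", "AC")
```

A negative `change` means the alternate allele weakens the best predicted site. A fuller treatment would also scan the reverse complement and convert scores to P-values, both of which are omitted here for brevity.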

2022
Lev I. Levitsky, Ksenia Kuznetsova, Anna A. Kliuchnikova, Irina Y. Ilina, Anton O. Goncharov

Mass spectrometry-based proteome analysis usually implies matching mass spectra of proteolytic peptides to amino acid sequences predicted from nucleic acid sequences. However, because the method is stochastic in proteome-wide analysis, where only a fraction of peptides are selected for sequencing, the completeness of protein sequence identification is undermined. Likewise, the reliability of peptide variant identification in proteogenomic studies suffers. We propose a way to interpret shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, just as is done in nucleic acid sequencing for the calling of single nucleotide variants. Multiple reads for each position in a sequence can be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a much lower false discovery rate than the conventional 1%. The sources of overlapping distinct peptides are, first, miscleaved tryptic peptides in combination with their properly cleaved counterparts and, second, peptides generated by several proteases with different specificities after the same specimen is subjected to parallel digestion and analyzed separately. We illustrate this approach using publicly available multiprotease proteomic datasets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC and GluC proteases. From 5000 to 8000 protein groups are identified for each digest, corresponding to up to 30% of the whole proteome coverage. Most of this coverage was provided by a single read, while up to 7% of the observed protein sequences were covered two-fold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 peptide variants associated with SNPs, seven of which were supported by multiple reads.
The efficiency of the multiple reads approach depends strongly on the depth of proteome analysis and on digestion features such as the level of miscleavage, and it will increase with the number of different proteases used in parallel proteome digestion.
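The "coverage by multiple reads" idea can be illustrated with a per-residue depth count over one protein. The sketch below is an assumption-laden illustration and not the authors' software: it matches peptides to the protein by exact substring search, counts each distinct peptide sequence once, and uses a made-up protein and peptide list.

```python
def coverage_depth(protein, peptides):
    """Count, for every residue, how many distinct peptides cover it."""
    depth = [0] * len(protein)
    for peptide in set(peptides):  # identical sequences are counted once
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                depth[i] += 1
            start = protein.find(peptide, start + 1)
    return depth


def fraction_covered(depth, min_reads=1):
    """Fraction of residues covered by at least min_reads distinct peptides."""
    return sum(d >= min_reads for d in depth) / len(depth)


# Toy example: two overlapping peptides, as might arise from a miscleavage
# or from digestion with a second protease (sequences are invented).
protein = "MKTAYIAKQR"
peptides = ["MKTAYIAK", "TAYIAKQR", "MKTAYIAK"]
depth = coverage_depth(protein, peptides)
single = fraction_covered(depth, min_reads=1)
double = fraction_covered(depth, min_reads=2)
```

Residues with a depth of two or more are the ones that, in the abstract's terms, are confirmed by multiple reads.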

2022
Yoo-Jin Ha, Jisoo Kim, Seungseok Kang, Junhan Kim, Se-Young Jo

Abstract The rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants, including germline, somatic, and mosaic mutations. However, unlike for the former two mutations, the best practices for mosaic variant calling still remain chaotic due to the technical and conceptual difficulties faced in evaluation. Here, we present our benchmark of nine feasible strategies for mosaic variant detection based on a systematically designed reference standard that mimics mosaic samples, with 390,153 control positive and 35,208,888 negative single-nucleotide variants and insertion–deletion mutations. We identified the condition-dependent strengths and weaknesses of the current strategies, instead of a single winner, regarding variant allele frequencies, variant sharing, and the usage of control samples. Moreover, feature-level investigation directs the way for immediate to prolonged improvements in mosaic variant calling. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.

2022, Vol 13 (1)
John K. L. Wong, Christian Aichmüller, Markus Schulze, Mario Hlevnjak, Shaymaa Elgaafary

Abstract Cancer driving mutations are difficult to identify, especially in the non-coding part of the genome. Here, we present sigDriver, an algorithm dedicated to calling driver mutations. Using 3813 whole-genome sequenced tumors from the International Cancer Genome Consortium, The Cancer Genome Atlas Program, and a childhood pan-cancer cohort, we employ mutational signatures based on single-base substitution in the context of tri- and penta-nucleotide motifs for hotspot discovery. Knowledge-based annotations on mutational hotspots reveal enrichment in coding regions and regulatory elements for 6 mutational signatures, including APOBEC and somatic hypermutation signatures. APOBEC activity is associated with 32 hotspots, of which 11 are known and 11 are putative regulatory drivers. Somatic single nucleotide variant clusters detected at hypermutation-associated hotspots are distinct from translocation or gene amplifications. Patients carrying APOBEC-induced PIK3CA driver mutations show lower occurrence of signature SBS39. In summary, sigDriver uncovers mutational processes associated with known and putative tumor drivers and hotspots, particularly in the non-coding regions of the genome.

Genes, 2022, Vol 13 (1), pp. 125
Jakub Skorupski

In this paper, a complete mitochondrial genome of the critically endangered European mink Mustela lutreola L., 1761 is reported. The mitogenome was 16,504 bp in length and encoded the typical 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes, and harboured a putative control region. The A+T content of the entire genome was 60.06% (A > T > C > G), and the AT-skew and GC-skew were 0.093 and −0.308, respectively. The encoding-strand identity of genes and their order were consistent with a collinear gene order characteristic for vertebrate mitogenomes. The start codons of all protein-coding genes were the typical ATN. Eight of these genes ended with complete stop codons, while five had incomplete termination codons (TA or T). All tRNAs had a typical cloverleaf secondary structure, except tRNASer(AGC) and tRNALys, which lacked the DHU stem and had a reduced DHU loop, respectively. Both rRNAs were capable of folding into complex secondary structures containing unmatched base pairs. Eighty-one single nucleotide variants (substitutions and indels) were identified. Comparative interspecies analyses confirmed the close phylogenetic relationship of the European mink to the so-called ferret group, clustering the European polecat, the steppe polecat and the black-footed ferret. The obtained results are expected to provide useful molecular data, informing and supporting effective conservation measures to save M. lutreola.
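The composition statistics reported in this abstract follow standard definitions: AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C). The sketch below computes them for an arbitrary sequence. The function name and the toy sequence are illustrative only, and ambiguous bases such as N are simply ignored, which is an assumption rather than something the abstract specifies.

```python
from collections import Counter


def composition_stats(sequence):
    """Return A+T content, AT-skew and GC-skew for a DNA sequence.

    Only A, C, G and T are counted; the sequence is assumed to contain
    at least one A or T and at least one G or C.
    """
    counts = Counter(sequence.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence (not the M. lutreola mitogenome): 3 A, 2 T, 1 G, 4 C.
stats = composition_stats("AAATTGCCCC")
```

A positive AT-skew together with a negative GC-skew, as reported for this mitogenome, means that the analysed strand has more A than T and more C than G, which matches the stated base order A > T > C > G.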

Anja Harder

Abstract Non-pathogenic mismatch repair (MMR) gene variants can be associated with decreased MMR capacity in several settings. Due to an increased mutation rate, reduced MMR capacity leads to accumulation of somatic sequence changes in tumour suppressor genes such as the neurofibromatosis type 1 (NF1) gene. Patients with autosomal dominant NF1 typically develop neurofibromas, ranging in number from a single one to thousands. The number of neurofibromas an NF1 patient will develop is still not predictable. A few studies suggested that germline non-pathogenic MMR gene variants modify the number of neurofibromas in NF1 and by this mechanism may promote the extent of neurofibroma manifestation. This review presents the first evidence that specific non-pathogenic single nucleotide variants of MMR genes act as a modifier of neurofibroma manifestation in NF1, highlighting MSH2 rs4987188 as the best analysed non-pathogenic variant so far. In summary, besides MSH2 promoter methylation, specific non-pathogenic germline MSH2 variants are associated with the extent of neurofibroma manifestation. Those variants can serve as a biomarker to facilitate better mentoring of NF1 patients at risk.

2022, Vol 22 (1)
Hany E. Marei, Asmaa Althani, Nahla Afifi, Anwarul Hasan, Thomas Caceci

Abstract Background Glioblastoma multiforme (GBM) is a heterogeneous CNS neoplasm which causes significant morbidity and mortality. One reason for the poor prognostic outcome of GBM is the presence of cancer stem cells (CSC), which confer resistance against standard chemo- and radiotherapeutic modalities. Two types of GBM-associated CSC were isolated from the same patient: tumor core- (c-CSC) and peritumor tissue-derived cancer stem cells (p-CSC). Our experiments are focused on glioblastoma–IDH-wild type, and no disease-defining alterations were present in histone, BRAF or other genes. Methods In the present study, potential differences in genetic variants between c-CSC and p-CSC derived from four GBM patients were investigated with the aims of (1) comparing the exome sequences between all the c-CSC or p-CSC to identify the common variants; (2) identifying the variants affecting the function of genes known to be involved in cancer origin and development. Results By comparative analyses, we identified common single nucleotide variants (SNVs) in all GBM c-CSC and p-CSC. A potentially deleterious variant was a frameshift deletion at Gln461fs in the MLLT1 gene, which was encountered only in p-CSC samples, with differing allelic frequencies. Conclusions We discovered a potentially harmful frameshift deletion at Gln461fs in the MLLT1 gene. Further investigation is required to confirm the presence of the identified mutations in patient tissue samples, as well as the significance of the frameshift mutation in the MLLT1 gene for GBM biology and response to therapy, based on genomic functional experiments.

Jiadai Xu, Yue Wang, Zheng Wei, Jingli Zhuang, Jing Li

This study attempted to investigate how clonal structure evolves, along with potential regulatory networks, as a result of multiline therapies in relapsed/refractory multiple myeloma (RRMM). Eight whole exome sequencing (WES) analyses and one single cell RNA sequencing (scRNA-seq) analysis were performed in order to assess dynamic genomic changes in temporally consecutive samples of one RRMM patient from the time of diagnosis to death (about 37 months). The 63-year-old female patient who suffered from MM (P1) had disease progression (PD) nine times from July 2017 [newly diagnosed (ND)] to Aug 2020 (death), and the force driving branching-pattern evolution of malignant PCs was found to be sustained. The mutant-allele tumor heterogeneity (MATH) and tumor mutation burden (TMB) initially trended downward and then upward over the course of the disease. Various somatic single nucleotide variants (SNVs) that had disappeared after the previous treatment were observed to reappear in later stages. Chromosomal instability (CIN) and homologous recombination deficiency (HRD) scores were observed to increase during all periods of progression, especially in the period of extramedullary plasmacytoma. Finally, by combining WES and scRNA-seq of P1-PD9 (the ninth PD), the intra-heterogeneity and gene regulatory networks of MM cells were deciphered. As verified by the overall survival of MM patients in the MMRF CoMMpass and GSE24080 datasets, RUNX3 was identified as a potential driver for RRMM.

2022
Nicholas Cardillo, Eric Devor, Silvana Pedra Nobre, Andreea Newtson, Kimberly Leslie

Abstract Background: Advanced high grade serous (HGSC) ovarian cancer is treated with either primary surgery followed by chemotherapy or neoadjuvant chemotherapy followed by interval surgery. The decision to proceed with surgery either primarily or after chemotherapy is based on a surgeon’s clinical assessment and prediction of an optimal outcome. Optimal surgery is correlated with improved overall survival. This clinical assessment results in an optimal surgery approximately 70% of the time. We hypothesize that this prediction can be improved by using biological tumor data to predict optimal cytoreduction. Methods: With access to a large biobank of ovarian cancer tumors, we obtained genomic data on 83 patients encompassing gene expression, exon expression, long non-coding RNA, micro RNA, single nucleotide variants, copy number variation, DNA methylation, and fusion transcripts. We then used machine learning to incorporate this data with pre-operative clinical information to create predictive models which successfully predicted whether or not a patient’s cytoreductive surgery would have an optimal outcome. These models were then validated within The Cancer Genome Atlas (TCGA) HGSC database. Results: Of the 124 models created and validated, 21 performed at least as well as, if not better than, our historical clinical rate of optimal debulking in advanced-stage HGSC (78%), which served as the control. Conclusions: This is the first time tumor genomic data has been used to predict surgical outcome in ovarian cancer. Prospective validation of these models could improve our ability to objectively predict which patients will undergo optimal cytoreduction and, therefore, improve our ovarian cancer outcomes.
