Insights on early mutational events in SARS-CoV-2 virus reveal founder effects across geographical regions

Author(s):  
Carlos Farkas ◽  
Francisco Fuentes-Villalobos ◽  
José Luis Garrido ◽  
Jody J Haigh ◽  
María Inés Barría

Abstract Here we aim to describe early mutational events across samples from publicly available SARS-CoV-2 sequences from the Sequence Read Archive repository. Up until March 27, 2020, we downloaded 53 Illumina datasets, mostly from China, USA (Washington DC) and Australia (Victoria). Of 30 high-quality datasets, 27 datasets (90%) contain at least a single founder mutation and most of the variants are missense (over 63%). Five point mutations with clonal (founder) effect were found in USA sequencing samples. Sequencing samples from USA in GenBank present this signature with 50% allele frequencies among samples. Australian mutation signatures were more diverse than USA samples, but still, clonal events were found in those samples. Mutations in the helicase and orf1a coding regions from SARS-CoV-2 were predominant, among others, suggesting that these proteins are prone to evolve by natural selection. Finally, we firmly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of the reverse transcribed quantitative PCR (RT-qPCR) based viral testing.

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9255 ◽  
Author(s):  
Carlos Farkas ◽  
Francisco Fuentes-Villalobos ◽  
Jose Luis Garrido ◽  
Jody Haigh ◽  
María Inés Barría

Here we aim to describe early mutational events across samples from publicly available SARS-CoV-2 sequences from the Sequence Read Archive and GenBank repositories. Up until 27 March 2020, we downloaded 50 Illumina datasets, mostly from China, USA (WA State) and Australia (VIC). A total of 30 datasets (60%) contain at least a single founder mutation and most of the variants are missense (over 63%). Five point mutations with clonal (founder) effect were found in USA next-generation sequencing samples. Sequencing samples from North America in GenBank (22 April 2020) present this signature with up to 39% allele frequencies among samples (n = 1,359). Australian variant signatures were more diverse than USA samples, but still, clonal events were found in these samples. Mutations in the helicase, encoded by the ORF1ab gene in SARS-CoV-2, were predominant, among others, suggesting that these regions are actively evolving. Finally, we firmly urge that primer sets for diagnosis be carefully designed, since rapidly occurring variants would affect the performance of the reverse transcribed quantitative PCR (RT-qPCR) based viral testing.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Jean-Marc Aury ◽  
Benjamin Istace

Abstract Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture between the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and they typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.


1994 ◽  
Vol 14 (6) ◽  
pp. 3971-3980
Author(s):  
Y Lu ◽  
C M Alarcon ◽  
T Hall ◽  
L V Reddy ◽  
J E Donelson

We previously described a bloodstream Trypanosoma rhodesiense clone, MVAT5-Rx2, whose isolation was based on its cross-reactivity with a monoclonal antibody (MAb) directed against a metacyclic variant surface glycoprotein (VSG). When the duplicated, expressed VSG gene in MVAT5-Rx2 was compared with its donor (basic copy) gene, 11 nucleotide differences were found in the respective 1.5-kb coding regions (Y. Lu, T. Hall, L. S. Gay, and J. E. Donelson, Cell 72:397-406, 1993). Here we describe a characterization of two additional bloodstream trypanosome clones, MVAT5-Rx1 and MVAT5-Rx3, whose VSGs are expressed from duplicated copies of the same donor VSG gene. The three trypanosome clones each react with the MVAT5-specific MAb, but they have different cross-reactivities with a panel of other MAbs, suggesting that their surface epitopes are similar but nonidentical. Each of the three gene duplication events occurs at a different 5' crossover site within a 76-bp repeat and is associated with a different set of point mutations. The 35, 11, and 28 point mutations in the duplicated VSG coding regions of Rx1, Rx2, and Rx3, respectively, exhibit a strand bias. In the sense strand, of the 74 total mutations generated in the three duplications, 54% are A-to-G or G-to-A (A:G) transitions and 7% are C:T transitions, while 26% are C:A transversions and 13% are C:G transversions. No T:G or T:A transversions occurred. Possible models for the generation of these point mutations are discussed.
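The transition/transversion breakdown reported in this abstract can be cross-checked with a short sketch. The code below is only an illustration, not the authors' analysis: the function and variable names are hypothetical, and the only data used are the counts and percentages quoted in the abstract. It classifies a substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion (purine to pyrimidine or the reverse), then tallies the reported figures.

```python
# Illustrative sketch only; names are hypothetical and not from the paper.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Point-mutation counts per clone, as quoted in the abstract.
mutations_per_clone = {"Rx1": 35, "Rx2": 11, "Rx3": 28}
total_mutations = sum(mutations_per_clone.values())

# Sense-strand percentages per unordered base pair, as quoted in the abstract.
reported_percent = {
    ("A", "G"): 54,
    ("C", "T"): 7,
    ("C", "A"): 26,
    ("C", "G"): 13,
    ("T", "G"): 0,
    ("T", "A"): 0,
}

transition_percent = sum(
    pct for (a, b), pct in reported_percent.items()
    if classify_substitution(a, b) == "transition"
)
transversion_percent = sum(
    pct for (a, b), pct in reported_percent.items()
    if classify_substitution(a, b) == "transversion"
)

print(total_mutations, transition_percent, transversion_percent)
```

Under these figures the per-clone counts add up to the 74 mutations stated in the abstract, and the six substitution classes account for all of them, with transitions making up the majority.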


2001 ◽  
Vol 68 (3) ◽  
pp. 617-626 ◽  
Author(s):  
Magali Periquet ◽  
Christoph B. Lücking ◽  
Jenny R. Vaughan ◽  
Vincenzo Bonifati ◽  
Alexandra Dürr ◽  
...  

2009 ◽  
Vol 46 (5) ◽  
pp. 541-544 ◽  
Author(s):  
Akhtar Ali ◽  
Subodh Kumar Singh ◽  
Rajiva Raman

Objective: Evaluation of the IRF6 gene in Van der Woude syndrome cases from an Indian population. Subjects: Nine affected and four unaffected individuals from seven families with Van der Woude syndrome as well as five normal controls (with no history of Van der Woude or any other congenital malformation and belonging to the same geographical area as the families with Van der Woude syndrome). Method: Direct sequencing of all coding regions and exon-intron boundaries of the IRF6 gene. Results: Five novel variants: IVS1+3900 A>G, 191 T>C, IVS4+775 C>T, IVS8+218 C>T, 1511 T>A (Ser 416 Arg) and two known variants: IVS6+27 C>G, 1083 G>A (V274I) were detected. Except for one, all were in noncoding regions either in 3′UTR or in introns. There was only one mutation in the coding region, detected in a normal control. Conclusion: The present report indicates that point mutations in the coding region of the IRF6 gene may not be a major cause of Van der Woude syndrome in Indian populations.


1990 ◽  
Vol 172 (6) ◽  
pp. 1717-1727 ◽  
Author(s):  
S G Lebecque ◽  
P J Gearhart

To investigate why somatic mutations are spatially restricted to a region around the rearranged V(D)J immunoglobulin gene, we compared the distribution of mutations flanking murine V gene segments that had rearranged next to either proximal or distal J gene segments. 124 nucleotide substitutions, nine deletions, and two insertions were identified in 32,481 bp of DNA flanking the coding regions from 17 heavy and kappa light chain genes. Most of the mutations occurred within a 2-kb region centered around the V(D)J gene, regardless of which J gene segment was used, suggesting that the structural information for mutation is located in sequences around and within the V(D)J gene, and not in sequences downstream of the J gene segments. The majority of mutations were found within 300 bp of DNA flanking the 5' side of the V(D)J gene and 850 bp flanking the 3' side at a frequency of 0.8%, which was similar to the frequency in the coding region. The frequency of flanking mutations decreased as a function of distance from the gene. There was no evidence for hot spots in that every mutation was unique and occurred at a different position. No mutations were found upstream of the promoter region, suggesting that the promoter delimits a 5' boundary, which provides strong evidence that transcription is necessary to generate mutation. The 3' boundary was approximately 1 kb from the V(D)J gene and was not associated with a DNA sequence motif. Occasional mutations were located in the nuclear matrix association and enhancer regions. The pattern of substitutions suggests that there is discrimination between the two DNA strands during mutation, in that the four bases were mutated with different frequencies on each strand. The high frequency of mutations in the 3' flanking region and the uniqueness of each mutation argue against templated gene conversion as a mechanism for generating somatic diversity in murine V(D)J genes. Rather, the data support a model for random point mutations where the mechanism is linked to the transcriptional state of the gene.


2021 ◽  
Vol 10 (8) ◽  
pp. 1682
Author(s):  
Tamás Major ◽  
Réka Gindele ◽  
Gábor Balogh ◽  
Péter Bárdossy ◽  
Zsuzsanna Bereczky

A founder effect can result from the establishment of a new population by individuals from a larger population or bottleneck events. Certain alleles may be found at much higher frequencies because of genetic drift immediately after the founder event. We provide a systematic literature review of the sporadically reported founder effects in hereditary hemorrhagic telangiectasia (HHT). All publications from the ACVRL1, ENG and SMAD4 Mutation Databases and publications searched for terms “hereditary hemorrhagic telangiectasia” and “founder” in PubMed and Scopus, respectively, were extracted. Following duplicate removal, 141 publications were searched for the terms “founder” and “founding” and the etymon “ancest”. Finally, 67 publications between 1992 and 2020 were reviewed. Founder effects were graded upon shared area of ancestry/residence, shared core haplotypes, genealogy and prevalence. Twenty-six ACVRL1 and 12 ENG variants with a potential founder effect were identified. The bigger the cluster of families with a founder mutation, the more remarkable is its influence on the population-level ACVRL1/ENG ratio, affecting HHT phenotype. Being aware of founder effects might simplify the diagnosis of HHT by establishing local genetic algorithms. Families sharing a common core haplotype might serve as a basis to study potential second-hits in the etiology of HHT.


Blood ◽  
2009 ◽  
Vol 114 (22) ◽  
pp. 144-144
Author(s):  
Vera Grossmann ◽  
Alexander Kohlmann ◽  
Claudia Haferlach ◽  
Hans-Ulrich Klein ◽  
Martin Dugas ◽  
...  

Abstract 144. PicoTiterPlate (PTP) pyrosequencing allows the detection of low-abundance oncogene aberrations in complex samples even with low tumor content. Here, we compared deep sequencing data of two Next-Generation Sequencing (NGS) assays to detect molecular mutations using a PCR-based strategy and, in addition, to uncover inversions, translocations, and insertions in a targeted sequence enrichment workflow (454 Life Sciences, Roche Diagnostics Corporation, Branford, CT). First, we studied 95 patients (CMML, n=81; AML, n=6; MDS, n=3; MPS, n=3; ET, n=2) using the amplicon approach and investigated seven candidate genes with relevance in oncogenesis of myeloid malignancies: TET2, RUNX1, JAK2, MPL, KRAS, NRAS, and CBL. 43 primer pairs were designed to cover the complete coding regions of TET2, RUNX1 (beta isoform), and hotspot regions of the latter genes. In total, 4128 individual PCR reactions were performed with DNA isolated from bone marrow mononuclear cells, followed by product purification, fluorometric quantitation, and equimolar pooling of the corresponding 43 amplicon products to generate one single sequence library per patient. For sequencing, a 454 8-lane PTP was used applying standard FLX chemistry and representing one patient per lane. The median number of base pairs sequenced per patient was 9.23 Mb. For each amplicon a median of 840 reads was generated (coverage range: 485–1929 reads). As initial proof-of-concept analysis 27 of the 95 patients with known mutations (n=32) as detected by conventional sequencing or melting curve analyses were investigated (range of cells carrying the respective mutation: 1.1% for JAK2 V617F to 98.14% for TET2 C1464X). In all cases, 454 NGS confirmed results from routine diagnostic methods (GS Amplicon Variant Analyzer software version 2.0.01). We then investigated the remaining 69 CMML patients: In median, 2 variances (range 1–8 variances), i.e. differences in comparison to the reference sequence, per patient were detected. These variances included both point mutations in all candidate genes and large deletions (12-19 bp) in CBL, RUNX1, and TET2. Only 20/81 patients of the CMML-cohort (24.69%) were without any detectable mutation. Secondly, in a cohort of six AML bone marrow specimens a custom NimbleGen array (385K format; Madison, WI) was used to perform a targeted DNA sequence enrichment procedure. In total, capture probes spanning 1.91 Mb were designed to represent all coding regions of 92 target genes (1559 exons) with relevance in hematological malignancies (e.g. KIT, NF1, TP53, BCR, ABL1, NPM1, or FLT3). In addition, the complete genomic regions were targeted for RUNX1, CBFB, and MLL. For sequencing, 454 Titanium chemistry was applied, loading three patients per lane on a 2-lane PTP including three molecular identifiers (MIDs) each. Data analysis was performed using the GS Reference Mapper software version 2.0.01. For the enrichment assay, the median enrichment of the targeted genomic loci was 207-fold, as assessed by ligation-mediated PCR (LM-PCR). Overall, 1,098,132 reads were generated in the two lanes, yielding a total sequence length of 386,097,740 bases. In median, 96.52% of the sequenced bases mapped against the human genome, and 66.0% were derived from the customized NimbleGen array capture probes, resulting in a median coverage of 18.7-fold. With this method it was possible to detect and confirm point mutations (KIT, FLT3-TKD, and KRAS) and insertions (FLT3-ITD). Moreover, by capturing chimeric DNA fragments and generating reads mapping to both fusion partners this approach detected balanced aberrations, i.e. inv(16)(p13q22) and the translocations t(8;21)(q22;q22) or t(9;11)(p22;q23). In conclusion, both assays to specifically sequence targeted regions with oncogenic relevance on a NGS platform demonstrated promising results and are feasible. The amplicon approach is more suitable for detection of mutations in a routine setting and is ideally suited for large genes such as TET2, ATM, and NF1, which are labor-intensive to sequence conventionally. The array-based capturing assay is characterized by a complex and time-consuming workflow with low throughput. However, the ability to detect balanced genomic aberrations which are detectable thus far only by cytogenetics and FISH has the potential to become an important diagnostic assay, especially in tumors in which cytogenetics cannot be applied successfully. Disclosures: Grossmann: MLL Munich Leukemia Laboratory: Employment. Kohlmann: MLL Munich Leukemia Laboratory: Employment. Haferlach: MLL Munich Leukemia Laboratory: Equity Ownership. Dicker: MLL Munich Leukemia Laboratory: Employment. Kazak: MLL Munich Leukemia Laboratory: Employment. Schindela: MLL Munich Leukemia Laboratory: Employment. Schnittger: MLL Munich Leukemia Laboratory: Equity Ownership. Kern: MLL Munich Leukemia Laboratory: Equity Ownership. Haferlach: MLL Munich Leukemia Laboratory: Equity Ownership.

