A short plus long-amplicon based sequencing approach improves genomic coverage and variant detection in the SARS-CoV-2 genome

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0261014
Author(s):  
Carlos Arana ◽  
Chaoying Liang ◽  
Matthew Brock ◽  
Bo Zhang ◽  
Jinchun Zhou ◽  
...  

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that may impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19 positive patients. ARTIC data covered 94.47% of the virus genome fraction in the positive control and patient samples. Variant analysis in the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions and deletions. On the other hand, long-amplicon data detected 156 mutations, of which 80% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data improved the genomic coverage to 97.03% and identified 214 high confidence mutations. The combined final set of 214 mutations included 203 SNVs, 8 deletions and 3 insertions. Analysis showed 26 SARS-CoV-2 lineage defining mutations, including four known variant-of-concern mutations (K417N, E484K, N501Y, and P681H) in the spike gene. Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in ARTIC data. For example, the G172V mutation in the ORF3a protein and the A2A mutation in the Membrane protein were missed by the ARTIC assay. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
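The coverage and concordance figures in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the depth threshold (`min_depth=10`), the rule that a position counts as covered when either assay reaches that depth, the function names, and all toy depths and variant calls are assumptions made for illustration.

```python
def genome_fraction(depth, min_depth=10):
    """Fraction of positions with read depth >= min_depth (threshold is an assumption)."""
    return sum(d >= min_depth for d in depth) / len(depth)


def combined_fraction(depth_a, depth_b, min_depth=10):
    """Fraction of positions covered by at least one of two assays."""
    covered = [a >= min_depth or b >= min_depth for a, b in zip(depth_a, depth_b)]
    return sum(covered) / len(covered)


def concordance(calls, reference_calls):
    """Fraction of `calls` that also appear in `reference_calls`."""
    calls, reference_calls = set(calls), set(reference_calls)
    return len(calls & reference_calls) / len(calls) if calls else 0.0


# Toy per-position depths for a 10-nt "genome" (made-up numbers).
short_depth = [50, 40, 0, 0, 30, 25, 3, 60, 70, 80]
long_depth = [0, 12, 20, 5, 0, 0, 18, 0, 0, 0]

# Toy variant calls as (position, ref, alt) tuples (made-up call sets).
short_calls = {(241, "C", "T"), (3037, "C", "T"), (14408, "C", "T"), (23403, "A", "G")}
long_calls = {(3037, "C", "T"), (14408, "C", "T"), (23403, "A", "G"), (25907, "G", "T")}

print(genome_fraction(short_depth))                # short-amplicon coverage alone
print(combined_fraction(short_depth, long_depth))  # short plus long coverage
print(concordance(long_calls, short_calls))        # long-amplicon calls confirmed by short calls
```

In this toy example the long-amplicon depths fill two of the three gaps left by the short amplicons, which mirrors the kind of complementation the abstract describes.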

2021 ◽  
Author(s):  
Carlos Arana ◽  
Chaoying Liang ◽  
Matthew Brock ◽  
Bo Zhang ◽  
Jinchun Zhou ◽  
...  

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from five COVID-19 positive patients. ARTIC data covered >90% of the virus genome fraction in the positive control and four of the five patient samples. Variant analysis in the ARTIC data detected 67 mutations, including 66 single nucleotide variants (SNVs) and one deletion in ORF10. Of 66 SNVs, five were present in the spike gene, including nt22093 (M177I), nt23042 (S494P), nt23403 (D614G), nt23604 (P681H), and nt23709 (T716I). The D614G mutation is a common variant that has been shown to alter the fitness of SARS-CoV-2. Two spike protein mutations, P681H and T716I, which are represented in the B.1.1.7 lineage of SARS-CoV-2, were also detected in one patient. Long-amplicon data detected 58 variants, of which 70% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data revealed 22 mutations that were either ambiguous (17) or not called at all (5) in ARTIC data due to poor sequencing coverage. For example, a common mutation in the ORF3a gene at nt25907 (G172V) was missed by the ARTIC assay. Hybrid data analysis improved sequencing coverage overall and identified 59 high confidence mutations for phylogenetic analysis. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
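The nucleotide positions and amino-acid numbers quoted in this abstract are related by simple coordinate arithmetic, sketched below in Python. The coding-sequence start coordinates are an assumption taken from the NC_045512.2 reference annotation (spike at nt 21563, ORF3a at nt 25393), and the `codon_number` helper is hypothetical, not part of any tool named in the abstract.

```python
# 1-based CDS start coordinates, assumed from the NC_045512.2 annotation.
CDS_START = {"S": 21563, "ORF3a": 25393}


def codon_number(nt_pos, cds_start):
    """Return the 1-based codon (amino-acid) number that contains genomic position nt_pos."""
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the coding sequence")
    return (nt_pos - cds_start) // 3 + 1


# Genomic positions quoted in the abstract, paired with their gene.
sites = [("S", 22093), ("S", 23042), ("S", 23403),
         ("S", 23604), ("S", 23709), ("ORF3a", 25907)]

for gene, nt in sites:
    print(gene, nt, codon_number(nt, CDS_START[gene]))
```

Under these assumed coordinates the sketch reproduces the residue numbers given in the abstract (177, 494, 614, 681 and 716 in spike, and 172 in ORF3a).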


Author(s):  
Liang Cheng ◽  
Xudong Han ◽  
Zijun Zhu ◽  
Changlu Qi ◽  
Ping Wang ◽  
...  

Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the COVID-19 pandemic has spread rapidly worldwide. Because few virus strains were available, early studies observed few of the key mutations that matter for the evolutionary trends of the virus genome. Here, we downloaded 1809 SARS-CoV-2 strain sequences from GISAID before April 2020 to identify mutations and the functional alterations they cause. In total, we identified 1017 nonsynonymous and 512 synonymous mutations by alignment to the reference genome NC_045512, none of which were observed in the receptor-binding domain (RBD) of the spike protein. On average, each strain acquired about 1.75 new mutations per month. The current mutations may have little impact on antibodies. Although the whole genome shows purifying selection, ORF3a, ORF8 and ORF10 were under positive selection. Only the 36 mutations that occurred in 1% or more of virus strains were further analyzed to reveal linkage disequilibrium (LD) variants and dominant mutations. As a result, we observed five dominant mutations: three nonsynonymous mutations (C28144T, C14408T and A23403G) and two synonymous mutations (T8782C and C3037T). These five mutations occurred in almost all strains in April 2020. We also observed two potentially dominant nonsynonymous mutations, C1059T and G25563T, which occurred in most of the strains in April 2020. Further functional analysis shows that these mutations largely decreased protein stability, which could lead to a significant reduction of virus virulence. In addition, the A23403G mutation increases the spike-ACE2 interaction and ultimately enhances infectivity. Together, these results show that SARS-CoV-2 is evolving toward enhanced infectivity and reduced virulence.
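The 1% frequency filter mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each strain is represented as a set of mutation labels, the helper name `frequent_mutations` is hypothetical, and the strain counts below are invented toy values rather than data from the study.

```python
from collections import Counter


def frequent_mutations(strains, min_fraction=0.01):
    """Return {mutation: frequency} for mutations found in at least min_fraction of strains."""
    counts = Counter(m for muts in strains for m in set(muts))
    n = len(strains)
    return {m: c / n for m, c in counts.items() if c / n >= min_fraction}


# Toy data: 200 hypothetical strains, each a set of mutation labels.
strains = [set() for _ in range(200)]
for s in strains[:150]:
    s.add("A23403G")        # present in 75% of the toy strains
for s in strains[:3]:
    s.add("C1059T")         # present in 1.5% of the toy strains
strains[0].add("G25563T")   # present in 0.5% of the toy strains, below the cutoff

kept = frequent_mutations(strains)
print(sorted(kept.items()))
```

Counting each mutation once per strain (via `set(muts)`) keeps a duplicated label within one strain from inflating its frequency.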


2021 ◽  
Author(s):  
Justin Landis ◽  
Razia Moorad ◽  
Linda J. Pluta ◽  
Carolina Caro-Vegas ◽  
Ryan P. McNamara ◽  
...  

Variants of concern (VOC) in SARS-CoV-2 refer to viral genomes that differ significantly from the ancestor virus and that show the potential for higher transmissibility and/or worse clinical progression. VOC have the potential to disrupt ongoing public health measures and vaccine efforts. Yet, little is known regarding how frequently different viral variants emerge and under what circumstances. We report a longitudinal study to determine the degree of SARS-CoV-2 sequence evolution in 94 COVID-19 cases and to estimate the frequency at which highly diverse variants emerge. Two cases accumulated 9 single-nucleotide variants (SNVs) over a two-week period and one case accumulated 23 SNVs over a three-week period, including three non-synonymous mutations in the Spike protein (D138H, E554D, D614G). We estimate that in 2% of COVID cases, viral variants with multiple mutations, including in the Spike glycoprotein, can become the dominant strains in as little as one month of persistent in-patient virus replication. This suggests the continued local emergence of VOC independent of travel patterns. Surveillance by sequencing for (i) viremic COVID-19 patients, (ii) patients suspected of re-infection, and (iii) patients with diminished immune function may offer broad public health benefits.


Viruses ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1806
Author(s):  
Ueric José Borges de Souza ◽  
Raíssa Nunes dos Santos ◽  
Fabrício Souza Campos ◽  
Karine Lima Lourenço ◽  
Flavio Guimarães da Fonseca ◽  
...  

Brazil was considered one of the emerging epicenters of the coronavirus pandemic in 2021, experiencing over 3000 daily deaths caused by the virus at the peak of the second wave. In total, the country had more than 20.8 million confirmed cases of COVID-19, including over 582,764 fatalities. A set of emerging variants arose in the country, some of them posing new challenges for COVID-19 control. The goal of this study was to describe mutational events across samples from Brazilian SARS-CoV-2 sequences publicly obtainable on Global Initiative on Sharing Avian Influenza Data-EpiCoV (GISAID-EpiCoV) platform and to generate indexes of new mutations by each genome. A total of 16,953 SARS-CoV-2 genomes were obtained, which were not proportionally representative of the five Brazilian geographical regions. A comparative sequence analysis was conducted to identify common mutations located at 42 positions of the genome (38 were in coding regions, whereas two were in 5′ and two in 3′ UTR). Moreover, 11 were synonymous variants, 27 were missense variants, and more than 44.4% were located in the spike gene. Across the total of single nucleotide variations (SNVs) identified, 32 were found in genomes obtained from all five Brazilian regions. While a high genomic diversity has been reported in Europe given the large number of sequenced genomes, Africa has demonstrated high potential for new variants. In South America, Brazil, and Chile, rates have been similar to those found in South Africa and India, providing enough “space” for new mutations to arise. Genomic surveillance is the central key to identifying the emerging variants of SARS-CoV-2 in Brazil and has shown that the country is one of the “hotspots” in the generation of new variants.


2021 ◽  
Vol 12 ◽  
Author(s):  
Eun-Ha Hwang ◽  
Hoyin Chung ◽  
Green Kim ◽  
Hanseul Oh ◽  
You Jung An ◽  
...  

Recently, newly emerging variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been continuously reported worldwide. However, precise evaluation of SARS-CoV-2 microevolution in the host is very limited because the exact genetic information of the infecting virus cannot be acquired in human studies. In this report, we performed deep sequencing of the seed virus and of SARS-CoV-2 isolated from eight cynomolgus and rhesus macaques at 3 days postinoculation and evaluated single-nucleotide polymorphisms (SNPs) in SARS-CoV-2 by variant analysis. A total of 69 single-nucleotide variants (SNVs) were present in the 5′-untranslated region (UTR), 3′-UTR, ORF1ab, S, ORF3a, ORF8, and N genes of the seed virus passaged in VERO cells. Between those present in the seed virus and those in each SARS-CoV-2 isolate from the lungs of the macaques, a total of 29 variants were identified in four protein-coding regions (ORF1ab, S, ORF3a, and N) and in non-coding regions (5′- and 3′-UTR). The number of variants differed significantly between individuals and ranged from 2 to 11. Moreover, the average major frequency variation was identified at six sites between the cynomolgus monkeys and rhesus macaques. As with the diverse SNPs in SARS-CoV-2, the viral titers in the lungs differed significantly between individuals and species. Our study is the first to reveal that the genomes of SARS-CoV-2 differ between individuals and species despite infection with the identical virus in non-human primates (NHPs). These results are important for the interpretation of longitudinal studies evaluating the evolution of SARS-CoV-2 in human beings and for the development of new diagnostics, vaccines, and therapeutics targeting SARS-CoV-2.


2021 ◽  
Author(s):  
Shuyi Fang ◽  
Sheng Liu ◽  
Jikui Shen ◽  
Alex Z Lu ◽  
Yucheng Zhang ◽  
...  

Since its outbreak in December 2019, COVID-19 has caused 100,5844,555 cases and 2,167,313 deaths as of Jan 27, 2021. In comparison with our previous study of SARS-CoV-2 single nucleotide variants (SNVs) before June 2020, we found that the SNV clustering had changed considerably since June 2020. While the group of SNVs represented by two non-synonymous mutations, A23403G (S: D614G) and C14408T (ORF1ab: P4715L), became dominant and was carried by over 95% of genomes, a few emerging groups of SNVs were recognized with sharply increased monthly occurrence ratios of up to 70% in November 2020. Further investigation revealed that several SNVs were strongly associated with mortality, but they showed distinct distributions in specific countries, e.g., Brazil, USA, Saudi Arabia, India, and Italy. SNVs including G25088T, T25A, G29861T and G29864A were used in a regularized logistic regression model to predict mortality status in Brazil with an AUC of 0.84. Protein structure analysis showed that the emerging subgroups of non-synonymous SNVs and the mortality-related ones in Brazil were located on the protein surface. The clashes in protein structure introduced by these mutations might in turn affect virus pathogenesis through conformation changes, leading to differences in transmission and virulence. In particular, we found that SNVs tended to occur in intrinsically disordered regions (IDRs) of Spike (S) and ORF1ab, suggesting a critical role of SNVs in protein IDRs in determining protein folding and immune evasion.
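The AUC quoted in this abstract measures how well a model's predicted scores rank positive cases above negative ones. A minimal Python sketch of that metric follows. It does not reproduce the authors' regularized logistic regression; the `auc` helper, the labels and the scores are hypothetical toy values chosen only to show the calculation.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy outcome labels (1 = died, 0 = survived) and made-up model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]

print(round(auc(labels, scores), 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large data sets.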


2020 ◽  
Author(s):  
Melissa Leija-Salazar ◽  
Alan Pittman ◽  
Katya Mokretar ◽  
Huw Morris ◽  
Anthony HV Schapira ◽  
...  

Background: Somatic mutations occur in neurons but their role in synucleinopathies is unknown. Aim: We aimed to identify disease-relevant low-level somatic single nucleotide variants (SNVs) in brains from sporadic patients with synucleinopathies and a monozygotic twin carrying LRRK2 G2019S, whose penetrance could be explained by somatic variation. Methods and Results: We included different brain regions from 26 Parkinson's disease (PD), 1 Incidental Lewy body, 3 multiple system atrophy cases and 12 controls. The whole SNCA locus and exons of other genes associated with PD and neurodegeneration were deeply sequenced using molecular barcodes to improve accuracy. We selected 21 variants at 0.33-5% allele frequencies for validation using accurate methods for somatic variant detection. Conclusions: We could not detect disease-relevant somatic SNVs; however, we cannot exclude their presence at earlier stages of degeneration. Our results support that coding somatic SNVs in neurodegeneration are rare, but other types of somatic variants may hold pathological consequences in synucleinopathies.


2021 ◽  
Author(s):  
Alvaro Santibanez ◽  
Roberto Luraschi ◽  
Carlos Barrera-Avalos ◽  
Eva Vallejos-Vidal ◽  
Javiera Alarcon ◽  
...  

The COVID-19 pandemic has generated a huge challenge and threat to public health throughout the world population. Reverse transcription associated with real-time Polymerase Chain Reaction (RT-qPCR) has been the gold-standard molecular tool for diagnosis and detection of SARS-CoV-2. Currently, it is used as the main strategy for testing, traceability, and control of positive cases. For this reason, the high demand for reagents has produced stock-outs on several occasions, and the only alternative to maintain population diagnosis has been the use of different RT-qPCR kits. Therefore, we evaluated the performance of three of the commercial RT-qPCR kits currently in use for SARS-CoV-2 diagnosis in Chile: TaqMan 2019-nCoV Assay Kit v1 (Thermo), Real-Time Fluorescent RT-PCR Kit for Detecting SARS-CoV-2 (BGI), and LightCycler Multiplex RNA Virus Master (Roche). Results of quantification cycle (Cq) and relative fluorescence units (RFU) obtained from their RT-qPCR reactions revealed important discrepancies in the total RNA required for the identification of SARS-CoV-2 genes and diagnosis. Marked differences between kits were observed in samples with Cq values between 30 and 34. Samples with positive diagnoses for COVID-19 using the Thermo Fisher kit had different results when the same samples were evaluated with the Roche and BGI kits. The displacement of the Cq value for SARS-CoV-2 identification between the three different RT-qPCR kits was also evident when the presence of single nucleotide variants was evaluated in the context of genomic surveillance. Taken together, this study emphasizes that all laboratories must take special care in adjusting the RT-qPCR reaction conditions of the different kits before carrying out the detection of SARS-CoV-2 genes from total RNA of nasopharyngeal swab (NPS) samples.


2022 ◽  
Author(s):  
Yoo-Jin Ha ◽  
Jisoo Kim ◽  
Seungseok Kang ◽  
Junhan Kim ◽  
Se-Young Jo ◽  
...  

The rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants, including germline, somatic, and mosaic mutations. However, unlike for the former two mutations, the best practices for mosaic variant calling still remain chaotic due to the technical and conceptual difficulties faced in evaluation. Here, we present our benchmark of nine feasible strategies for mosaic variant detection based on a systematically designed reference standard that mimics mosaic samples, with 390,153 control positive and 35,208,888 negative single-nucleotide variants and insertion–deletion mutations. We identified the condition-dependent strengths and weaknesses of the current strategies, instead of a single winner, regarding variant allele frequencies, variant sharing, and the usage of control samples. Moreover, feature-level investigation directs the way for immediate to prolonged improvements in mosaic variant calling. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.


2018 ◽  
Author(s):  
Riccardo Panero ◽  
Maddalena Arigoni ◽  
Martina Olivero ◽  
Francesca Cordero ◽  
Alessandro Weisz ◽  
...  

Background: RNA-seq represents an attractive methodology for the detection of functional genomic variants because it allows the integration of variant frequency and their expression. However, although specific statistical frameworks have been designed to detect SNVs/INDELs/gene fusions in RNA-seq data, very little has been done to understand the effect of library preparation protocols on transcript variant detection in RNA-seq data. Results: Here, we compared RNA-seq results obtained on a short-read sequencing platform with two protocols: one based on polyA+ RNA selection (POLYA) and the other based on exonic region capture (ACCESS). Our data indicate that ACCESS detects 10% more coding SNV/INDELs than POLYA, making this protocol more suitable for this goal. Furthermore, ACCESS requires fewer reads for coding SNV detection than POLYA. On the other hand, if the analysis aims at identifying SNV/INDELs also in the 5′ and 3′ UTRs, POLYA is definitely the preferred method. No particular advantage comes from the usage of ACCESS or POLYA in the detection of fusion transcripts. Conclusion: Data show that a careful selection of the “wet” protocol adds specific features that cannot be obtained with bioinformatics alone.

