A Short Plus Long-Amplicon Based Sequencing Approach Improves Genomic Coverage and Variant Detection In the SARS-CoV-2 Genome

Author(s):  
Carlos Arana ◽  
Chaoying Liang ◽  
Matthew Brock ◽  
Bo Zhang ◽  
Jinchun Zhou ◽  
...  

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from five COVID-19 positive patients. ARTIC data covered >90% of the virus genome fraction in the positive control and four of the five patient samples. Variant analysis in the ARTIC data detected 67 mutations, including 66 single nucleotide variants (SNVs) and one deletion in ORF10. Of 66 SNVs, five were present in the spike gene, including nt22093 (M177I), nt23042 (S494P), nt23403 (D614G), nt23604 (P681H), and nt23709 (T716I). The D614G mutation is a common variant that has been shown to alter the fitness of SARS-CoV-2. Two spike protein mutations, P681H and T716I, which are represented in the B.1.1.7 lineage of SARS-CoV-2, were also detected in one patient. Long-amplicon data detected 58 variants, of which 70% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data revealed 22 mutations that were either ambiguous (17) or not called at all (5) in ARTIC data due to poor sequencing coverage. For example, a common mutation in the ORF3a gene at nt25907 (G172V) was missed by the ARTIC assay. Hybrid data analysis improved sequencing coverage overall and identified 59 high confidence mutations for phylogenetic analysis. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.

PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0261014
Author(s):  
Carlos Arana ◽  
Chaoying Liang ◽  
Matthew Brock ◽  
Bo Zhang ◽  
Jinchun Zhou ◽  
...  

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that may impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from fifteen COVID-19 positive patients. ARTIC data covered 94.47% of the virus genome fraction in the positive control and patient samples. Variant analysis in the ARTIC data detected 217 mutations, including 209 single nucleotide variants (SNVs) and eight insertions and deletions. Long-amplicon data, by comparison, detected 156 mutations, of which 80% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data improved the genomic coverage to 97.03% and identified 214 high confidence mutations. The combined final set of 214 mutations included 203 SNVs, 8 deletions and 3 insertions. Analysis showed 26 SARS-CoV-2 lineage defining mutations, including 4 known variants of concern, K417N, E484K, N501Y, and P681H, in the spike gene. Hybrid analysis identified 7 nonsynonymous and 5 synonymous mutations across the genome that were either ambiguous or not called in ARTIC data. For example, the G172V mutation in the ORF3a protein and the A2A mutation in the Membrane protein were missed by the ARTIC assay. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.
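The concordance figure reported above can be read as the share of long-amplicon calls that also appear in the short-amplicon call set. The sketch below illustrates that reading under stated assumptions: the `Variant` type, the `concordance` function, and every call in the two toy sets are hypothetical and are not taken from the authors' pipeline or data.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int   # 1-based genome position
    ref: str   # reference allele
    alt: str   # alternate allele


def concordance(query: set[Variant], reference: set[Variant]) -> float:
    """Fraction of `query` calls that also appear in `reference`."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


# Toy call sets (hypothetical, for illustration only).
artic = {
    Variant(241, "C", "T"),
    Variant(3037, "C", "T"),
    Variant(23403, "A", "G"),
}
long_amplicon = {
    Variant(241, "C", "T"),
    Variant(23403, "A", "G"),
    Variant(25907, "G", "T"),
    Variant(14408, "C", "T"),
}

# Share of long-amplicon calls confirmed by the short-amplicon assay.
shared = concordance(long_amplicon, artic)
# Calls seen only in the long-amplicon data, sorted by position.
only_long = sorted(long_amplicon - artic)

print(f"concordant fraction: {shared:.2f}")
print("long-amplicon only:", [v.pos for v in only_long])
```

Matching on position, reference allele, and alternate allele together is a design choice made for this sketch; a real comparison would also need to normalize how insertions and deletions are represented before comparing.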


Author(s):  
Sen Zhao ◽  
Oleg Agafonov ◽  
Abdulrahman Azab ◽  
Tomasz Stokowy ◽  
Eivind Hovig

Abstract: Advances in next-generation sequencing technology have enabled whole genome sequencing (WGS) to be widely used for identification of causal variants in a spectrum of genetic-related disorders, and have provided new insight into how genetic polymorphisms affect disease phenotypes. The development of different bioinformatics pipelines has continuously improved the variant analysis of WGS data. However, a systematic performance comparison of these pipelines is needed to provide guidance on the application of WGS-based scientific and clinical genomics. In this study, we evaluated the performance of three variant calling pipelines (GATK, DRAGEN™ and DeepVariant) using Genome in a Bottle Consortium, “synthetic-diploid” and simulated WGS datasets. DRAGEN™ and DeepVariant show better accuracy in SNP and indel calling, with no significant differences in their F1-scores. The DRAGEN™ platform offers accuracy, flexibility and a highly efficient running speed, and therefore has an advantage in the analysis of WGS data on a large scale. The combination of DRAGEN™ and DeepVariant also provides a good balance of accuracy and efficiency as an alternative solution for germline variant detection in further applications. Our results facilitate the standardization of benchmarking analysis of bioinformatics pipelines for reliable variant detection, which is critical in genetics-based medical research and clinical application.
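The F1-score used to compare pipelines is the harmonic mean of precision and recall. As a minimal sketch, assuming a caller's output has already been compared against a truth set and reduced to counts of true positives, false positives, and false negatives, it can be computed as follows. The function name and the counts are hypothetical and do not come from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-call counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants the caller missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a caller cannot score well by maximizing only one of precision or recall.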


Author(s):  
E. V. Naidenova ◽  
V. G. Dedkov ◽  
D. A. Agafonov ◽  
A. M. Senichkina ◽  
M. V. Safonova ◽  
...  

The aim of the study was to develop and assess the efficacy of a method for Lujo virus RNA detection in clinical and biological samples using one-step real-time RT-PCR. Materials and methods. To select conserved regions of the genome, we used the Lujo virus sequences available in the GenBank database (https://www.ncbi.nlm.nih.gov/genbank), aligned in the BioEdit 7.2.5 software package (IbisBiosciences, USA). Reverse transcriptase and TaqF polymerase were used for one-round RT-PCR. A recombinant Escherichia coli strain, XL1-Blue, containing the pGEM-T plasmid with an inserted synthetic fragment of the virus genome, was produced to make a positive control sample (PCS). The constructed recombinant plasmids were used to create an RNA-containing PCS with the protective protein shell of MS2 phage. Specificity of the developed method was determined using a control panel of RNA and DNA from 23 viral strains belonging to 10 families; sensitivity was determined using a panel of biological samples artificially contaminated with PCS. Further testing was carried out at the laboratory of the Russian-Guinean Center for Epidemiology and Prevention of Infectious Diseases (Kindia, Republic of Guinea) on 265 blood sera from apparently healthy persons, 110 blood sera of cattle, 83 suspensions of ticks, and 165 suspensions of organs of small mammals collected in the territory of Guinea. Results and discussion. Two conserved polymerase gene fragments were chosen as targets for Lujo virus RNA detection using RT-PCR. The combination of primers and probes was selected experimentally, the optimum composition of the PCR reaction mixture and the RT-PCR set-up mode were established, and control samples (C+, internal control, positive control sample) were developed. The sensitivity of the proposed method is 5·10³ GE/ml, and the specificity is 100%.


Author(s):  
Jack Kuipers ◽  
Aashil A Batavia ◽  
Kim Philipp Jablonski ◽  
Fritz Bayer ◽  
Nico Borgsmüller ◽  
...  

Abstract: SARS-CoV-2, the virus responsible for the current COVID-19 pandemic, is evolving into different genetic variants by accumulating mutations as it spreads globally. In addition to this diversity of consensus genomes across patients, RNA viruses can also display genetic diversity within individual hosts, and co-existing viral variants may affect disease progression and the success of medical interventions. To systematically examine the intra-patient genetic diversity of SARS-CoV-2, we processed a large cohort of 3939 publicly-available deeply sequenced genomes with specialised bioinformatics software, along with 749 recently sequenced samples from Switzerland. We found that the distribution of diversity across patients and across genomic loci is very unbalanced with a minority of hosts and positions accounting for much of the diversity. For example, the D614G variant in the Spike gene, which is present in the consensus sequences of 67.4% of patients, is also highly diverse within hosts, with 29.7% of the public cohort being affected by this coexistence and exhibiting different variants. We also investigated the impact of several technical and epidemiological parameters on genetic heterogeneity and found that age, which is known to be correlated with poor disease outcomes, is a significant predictor of viral genetic diversity.

Author Summary: Since it arose in late 2019, the new coronavirus (SARS-CoV-2) behind the COVID-19 pandemic has mutated and evolved during its global spread. Individual patients may host different versions, or variants, of the virus, hallmarked by different mutations. We examine the diversity of genetic variants coexisting within patients across a cohort of 3939 publicly accessible samples and 749 recently sequenced samples from Switzerland. We find that a small number of patients carry most of the diversity, and that patients with more diversity tend to be older. We also find that most of the diversity is concentrated in certain regions and positions of the virus genome. In particular, we find that a variant reported to increase infectivity is among the most diverse positions. Our study provides a large-scale survey of within-patient diversity of the SARS-CoV-2 genome.


2021 ◽  
Author(s):  
Vincenzo Forgetta ◽  
Lai Jiang ◽  
Nicholas Vulpescu ◽  
Meganq Hogan ◽  
Siyuan Chen ◽  
...  

Abstract Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in proximity of GWAS-associated single nucleotide variants (SNVs). We then performed quantitative assessment of the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei), to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both with cross-validation on the training set, and an independently derived set for type 2 diabetes. Key predictive features included coding or transcript altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable approach to prioritize genes at GWAS loci for functional follow-up and drug development, and provides a systematic strategy for prioritization of GWAS target genes.


2021 ◽  
Vol 12 ◽  
Author(s):  
Eun-Ha Hwang ◽  
Hoyin Chung ◽  
Green Kim ◽  
Hanseul Oh ◽  
You Jung An ◽  
...  

Recently, newly emerging variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been continuously reported worldwide. However, precise evaluation of SARS-CoV-2 microevolution within the host is very limited because the exact genetic information of the infecting virus cannot be acquired in human studies. In this report, we performed deep sequencing of the seed virus and of SARS-CoV-2 isolated from eight cynomolgus and rhesus macaques at 3 days postinoculation and evaluated single-nucleotide polymorphisms (SNPs) in SARS-CoV-2 by variant analysis. A total of 69 single-nucleotide variants (SNVs) were present in the 5′-untranslated region (UTR), 3′-UTR, ORF1ab, S, ORF3a, ORF8, and N genes of the seed virus passaged in VERO cells. Comparing the variants present in the seed virus with those in each SARS-CoV-2 isolate from the lungs of the macaques, a total of 29 variants were identified in 4 protein-coding regions (ORF1ab, S, ORF3a, and N) and in non-coding regions (5′- and 3′-UTR). The number of variants differed significantly between individuals and ranged from 2 to 11. Moreover, average major frequency variation was identified at six sites between the cynomolgus monkeys and rhesus macaques. As with the diverse SNPs in SARS-CoV-2, viral titers in the lungs differed significantly between individuals and species. Our study is the first to reveal that the genomes of SARS-CoV-2 differ between individuals and species despite infection with the identical virus in non-human primates (NHPs). These results are important for the interpretation of longitudinal studies evaluating the evolution of SARS-CoV-2 in humans and for the development of new diagnostics, vaccines, and therapeutics targeting SARS-CoV-2.


2021 ◽  
Author(s):  
Julia Alcoba-Florez ◽  
Jose M. Lorenzo-Salazar ◽  
Helena Gil-Campesino ◽  
Antonio Íñigo-Campos ◽  
Diego García-Martínez de Artola ◽  
...  

Abstract: Starting in December 2020, a sharp increase of COVID-19 cases occurred in Tenerife compared to the rest of the Canary Islands (Spain). Because of the direct touristic connections between Tenerife and the UK, and the rapid transmission and dominance of the SARS-CoV-2 B.1.1.7 variant of concern (VOC-202012/01) by the end of November 2020 in South England, here we measured the proportion of B.1.1.7 cases occurring between the 18th of December 2020 and the 25th of February 2021. Out of the 2,091 COVID-19 positive nasopharyngeal swab samples assessed, 226 showed a spike gene target failure (SGTF). Subsequent viral genome sequencing further confirmed that 93.2% of them corresponded to the B.1.1.7 lineage. Furthermore, a rapid increase in the proportion of SGTF variants was detected in up to 10.7% of positive cases during the Christmas season, even though stricter measures to contain transmission were imposed in Tenerife during that period. These results support the local transmission of the SARS-CoV-2 B.1.1.7 lineage in Tenerife since late December 2020, although it is not yet dominant.


Author(s):  
Dr. Debdutta Bhattacharya ◽  
Dr. Debaprasad Parai ◽  
Usha Kiran Rout ◽  
Rashmi Ranjan Nanda ◽  
Dr. Srikanta Kanungo ◽  
...  

Background: Almost nine months into the COVID-19 pandemic, there is still no sign of its spread stopping. Rapid and early detection of the virus is the key to halting the rapid spread and breaking the human transmission chain. Very few studies have looked for an alternative, convenient diagnostic specimen that can substitute for the nasopharyngeal swab (NPS) in detecting SARS-CoV-2. We aimed to assess the feasibility of using saliva in place of NPS for the diagnosis of SARS-CoV-2 and the agreement between the two specimen types. Methods: A total of 74 patients were enrolled in this study. We analysed and compared NPS and saliva specimens collected within 48 h after symptom onset. We used real-time quantitative polymerase chain reaction (RT-qPCR) and gene sequencing for the detection and determination of SARS-CoV-2 specific genes. A phylogenetic tree was constructed to establish the isolation of viral RNA from saliva. We used Bland-Altman analysis to assess the agreement between the two sampling methods. Findings: This study shows a lower mean CT value for the detection of the SARS-CoV-2 ORF1 gene in saliva (27.07; 95% CI, 25.62 to 28.52) than in NPS (28.24; 95% CI, 26.62 to 29.85). Bland-Altman analysis showed a relatively small bias and high agreement between the two specimen types. Phylogenetic analysis with the RdRp and Spike genes confirmed the presence of SARS-CoV-2 in the saliva samples. Interpretation: In conclusion, our study highlights that saliva represents a promising tool in COVID-19 diagnosis and would reduce the exposure risk of frontline health workers, which is one of the biggest concerns in primary healthcare settings.
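A Bland-Altman analysis summarizes agreement between two measurement methods by the mean of the paired differences (the bias) and the limits of agreement around it, conventionally bias ± 1.96 standard deviations of the differences. The sketch below shows that calculation on paired CT values. The function name and the values are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def bland_altman(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences a[i] - b[i]
    limits = bias -/+ 1.96 * sample standard deviation of the differences
    """
    if len(a) != len(b):
        raise ValueError("both methods must measure the same samples")
    if len(a) < 2:
        raise ValueError("at least two paired measurements are required")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired CT values from the same patients, for illustration only.
saliva_ct = [26.0, 28.0, 27.0, 30.0]
nps_ct = [27.0, 29.0, 29.0, 30.0]

bias, lower, upper = bland_altman(saliva_ct, nps_ct)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A bias close to zero with narrow limits indicates that the two specimen types give similar CT values for the same patient.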


2020 ◽  
Author(s):  
Melissa Leija-Salazar ◽  
Alan Pittman ◽  
Katya Mokretar ◽  
Huw Morris ◽  
Anthony HV Schapira ◽  
...  

Background: Somatic mutations occur in neurons but their role in synucleinopathies is unknown. Aim: We aimed to identify disease-relevant low-level somatic single nucleotide variants (SNVs) in brains from sporadic patients with synucleinopathies and a monozygotic twin carrying LRRK2 G2019S, whose penetrance could be explained by somatic variation. Methods and Results: We included different brain regions from 26 Parkinson's disease (PD), 1 incidental Lewy body, 3 multiple system atrophy cases and 12 controls. The whole SNCA locus and exons of other genes associated with PD and neurodegeneration were deeply sequenced using molecular barcodes to improve accuracy. We selected 21 variants at 0.33-5% allele frequencies for validation using accurate methods for somatic variant detection. Conclusions: We could not detect disease-relevant somatic SNVs; however, we cannot exclude their presence at earlier stages of degeneration. Our results support that coding somatic SNVs in neurodegeneration are rare, but other types of somatic variants may hold pathological consequences in synucleinopathies.


2021 ◽  
Author(s):  
Alvaro Santibanez ◽  
Roberto Luraschi ◽  
Carlos Barrera-Avalos ◽  
Eva Vallejos-Vidal ◽  
Javiera Alarcon ◽  
...  

The COVID-19 pandemic has posed a huge challenge and threat to public health worldwide. Reverse transcription coupled with real-time polymerase chain reaction (RT-qPCR) has been the gold-standard molecular tool for diagnosis and detection of SARS-CoV-2. Currently, it is used as the main strategy for testing, traceability, and control of positive cases. For this reason, the high demand for reagents has caused stock-outs on several occasions, and the only alternative for maintaining population diagnosis has been the use of different RT-qPCR kits. Therefore, we evaluated the performance of three commercial RT-qPCR kits currently in use for SARS-CoV-2 diagnosis in Chile: TaqMan 2019-nCoV Assay Kit v1 (Thermo), Real-Time Fluorescent RT-PCR Kit for Detecting SARS-CoV-2 (BGI), and LightCycler Multiplex RNA Virus Master (Roche). Results of quantification cycle (Cq) and relative fluorescence units (RFU) obtained from their RT-qPCR reactions revealed important discrepancies in the total RNA required for the identification of SARS-CoV-2 genes and diagnosis. Marked differences between kits were observed in samples with Cq values between 30 and 34. Samples with positive diagnoses for COVID-19 using the Thermo Fisher kit had different results when the same samples were evaluated with the Roche and BGI kits. The shift in Cq value for SARS-CoV-2 identification between the three RT-qPCR kits was also evident when the presence of single nucleotide variants was evaluated in the context of genomic surveillance. Taken together, this study emphasizes that all laboratories must take special care in adjusting the RT-qPCR reaction conditions of the different kits before carrying out the detection of SARS-CoV-2 genes from total RNA of nasopharyngeal swab (NPS) samples.

