sequencing coverage

2022 ◽  
Author(s):  
Hosoon Choi ◽  
Munok Hwang ◽  
Dhammika Navarathna ◽  
Jing Xu ◽  
Janell Lukey ◽  
...  

Whole-genome sequencing (WGS) of SARS-CoV-2 has been performed extensively and is playing a crucial role in fighting the COVID-19 pandemic. Obtaining sufficient WGS data from clinical samples is often challenging, especially from samples with low viral load. We evaluated two SARS-CoV-2 sequencing protocols for their efficiency, accuracy, and limitations. Sequence coverage of >95% was obtained by the Swift normalase amplicon SARS-CoV-2 panels (SNAP) protocol for all samples with Ct ≤ 35 and by the COVIDSeq protocol for 97% of samples with Ct ≤ 30. Sample RNA quantitation obtained using digital PCR provided more precise cutoff values. The quantitative digital PCR cutoff values for obtaining 95% coverage are 10.5 copies/μL for the SNAP protocol and 147 copies/μL for the COVIDSeq protocol. Combining FASTQ files obtained from the two protocols improved the outcome of sequence analysis by compensating for missing amplicon regions. This process resulted in an increase in sequencing coverage and lineage call precision.
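The idea of coverage gaps in one protocol being compensated by another can be illustrated with a small sketch. It assumes per-position read depths have already been computed for each protocol against the same reference; the function names, the toy depth vectors, and the minimum depth of 10 reads are illustrative assumptions and are not taken from the study.

```python
from typing import List, Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 10) -> float:
    """Fraction of reference positions covered by at least min_depth reads.

    min_depth=10 is an assumed threshold, not a value from the abstract.
    """
    if len(depths) == 0:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def combine_depths(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Per-position depth after pooling reads from two runs of one sample."""
    if len(a) != len(b):
        raise ValueError("depth vectors must describe the same reference")
    return [x + y for x, y in zip(a, b)]


# Hypothetical 10-position reference; each protocol drops a different region.
snap = [50, 50, 50, 0, 0, 50, 50, 50, 50, 50]
covidseq = [40, 40, 40, 40, 40, 40, 40, 0, 0, 0]
pooled = combine_depths(snap, covidseq)

print(breadth_of_coverage(snap))      # 0.8
print(breadth_of_coverage(covidseq))  # 0.7
print(breadth_of_coverage(pooled))    # 1.0
```

In this toy case neither run alone reaches 95% breadth, but the pooled depths do, which mirrors the effect the abstract describes for combined FASTQ files.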


2021 ◽  
Author(s):  
Sumit Tarafder ◽  
Mazharul Islam ◽  
Swakkhar Shatabda ◽  
Atif Rahman

Motivation: Advances in sequencing technologies have led to sequencing of genomes of a multitude of organisms. However, draft genomes of many of these organisms contain a large number of gaps due to repeats in genomes, low sequencing coverage and limitations in sequencing technologies. Although there exist several tools for filling gaps, many of these do not utilize all information relevant to gap filling. Results: Here, we present a probabilistic method for filling gaps in draft genome assemblies using second-generation reads based on a generative model for sequencing that takes into account information on insert sizes and sequencing errors. Our method is based on the expectation-maximization (EM) algorithm, unlike the graph-based methods adopted in the literature. Experiments on real biological datasets show that this novel approach can fill up large portions of gaps with a small number of errors and misassemblies compared to other state-of-the-art gap filling tools. Availability and Implementation: The method is implemented using C++ in a software named "Filling Gaps by Iterative Read Distribution (Figbird)", which is available at: https://github.com/SumitTarafder/Figbird.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tao Jiang ◽  
Shiqi Liu ◽  
Shuqi Cao ◽  
Yadong Liu ◽  
Zhe Cui ◽  
...  

Background: With the rapid development of long-read sequencing technologies, it is possible to reveal the full spectrum of genetic structural variation (SV). However, the high cost, finite read length, and high sequencing error rate of long-read data greatly limit the widespread adoption of SV calling. Therefore, it is urgent to establish guidance concerning sequencing coverage, read length, and error rate to maintain high SV yields and to achieve the lowest cost simultaneously. Results: In this study, we generated a full range of simulated error-prone long-read datasets containing various sequencing settings and comprehensively evaluated the performance of SV calling with state-of-the-art long-read SV detection methods. The benchmark results demonstrate that almost all SV callers perform better when the long-read data reach 20× coverage, 20 kbp average read length, and approximately 10–7.5% or below 1% error rates. Furthermore, high sequencing coverage is the most influential factor in promoting SV calling, while it also directly determines the cost. Conclusions: Based on the comprehensive evaluation results, we provide important guidelines for selecting long-read sequencing settings for efficient SV calling. We believe these recommended settings of long-read sequencing will have extraordinary guiding significance in cutting-edge genomic studies and clinical practices.
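To make the link between coverage and cost concrete, mean sequencing coverage is commonly estimated as the number of reads times the mean read length, divided by the genome size. The sketch below applies that standard relation to the 20× and 20 kbp figures from the abstract; the genome size of 3.1 Gbp is an assumed round value for a human genome and does not come from the study.

```python
import math


def mean_coverage(n_reads: int, mean_read_len: int, genome_size: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def reads_needed(target_coverage: float, mean_read_len: int, genome_size: int) -> int:
    """Smallest read count whose expected mean depth reaches target_coverage."""
    return math.ceil(target_coverage * genome_size / mean_read_len)


GENOME_SIZE = 3_100_000_000  # assumed human genome size in bp (illustrative)

n = reads_needed(20, 20_000, GENOME_SIZE)
print(n)                                      # 3100000
print(mean_coverage(n, 20_000, GENOME_SIZE))  # 20.0
```

Because the read count scales linearly with the target coverage, doubling the coverage doubles the sequencing yield required, which is why coverage dominates the cost.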


2021 ◽  
Author(s):  
Hongjie Yu ◽  
Zhiyuan Chen ◽  
Andrew Azman ◽  
Xinhua Chen ◽  
Junyi Zou ◽  
...  

Genomic surveillance has shaped our understanding of SARS-CoV-2 variants, which have proliferated globally in 2021. We collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data from multiple public repositories, and aggregated publicly available variant data. Then, different proxies were used to estimate the sequencing coverage and public availability extent of genomic data, in addition to describing the global dissemination of variants. We found that the COVID-19 global epidemic clearly featured increasing circulation of Alpha since the start of 2021, which was rapidly replaced by the Delta variant starting around May 2021. SARS-CoV-2 genomic surveillance and sequencing availability varied markedly across countries, with 63 countries performing routine genomic surveillance and 79 countries with high availability of SARS-CoV-2 sequencing. We also observed a marked heterogeneity of sequencing coverage across regions and countries. Across different variants, 21-46% of countries with explicit reporting on variants shared less than half of their variant sequences in public repositories. Our findings indicated an urgent need to expand sequencing capacity of virus isolates, enhance the sharing of sequences, the standardization of metadata files, and supportive networks for countries with no sequencing capability.


2021 ◽  
Author(s):  
Anne E Watt ◽  
Norelle L Sherry ◽  
Patiyan Andersson ◽  
Courtney R Lane ◽  
Sandra Johnson ◽  
...  

Background: COVID-19 has resulted in many infections in healthcare workers (HCWs) globally. We performed state-wide SARS-CoV-2 genomic epidemiological investigations to identify HCW transmission dynamics and provide recommendations to optimise healthcare system preparedness for future outbreaks. Methods: Genome sequencing was attempted on all COVID-19 cases in Victoria, Australia. We combined genomic and epidemiologic data to investigate the source of HCW infections across multiple healthcare facilities (HCFs) in the state. Phylogenetic analysis and fine-scale hierarchical clustering were performed for the entire Victorian dataset including community and healthcare cases. Facilities provided standardised epidemiological data and putative transmission links. Findings: Between March and October 2020, approximately 1,240 HCW COVID-19 infection cases were identified; 765 are included here. Genomic sequencing was successful for 612 (80%) cases. Thirty-six investigations were undertaken across 12 HCFs. Genomic analysis revealed that multiple introductions of COVID-19 into facilities (31/36) were more common than single introductions (5/36). Major contributors to HCW acquisitions included mobility of staff and patients between wards and facilities, and characteristics and behaviours of individual patients including super-spreading events. Key limitations at the HCF level were identified. Interpretation: Genomic epidemiological analyses enhanced understanding of HCW infections, revealing unsuspected clusters and transmission networks. Combined analysis of all HCWs and patients in a HCF should be conducted, supported by high rates of sequencing coverage for all cases in the population. Established systems for integrated genomic epidemiological investigations in healthcare settings will improve HCW safety in future pandemics.


2021 ◽  
Author(s):  
Zhiyuan Chen ◽  
Andrew S. Azman ◽  
Xinhua Chen ◽  
Junyi Zou ◽  
Yuyang Tian ◽  
...  

Background: Genomic surveillance has shaped our understanding of SARS-CoV-2 variants, which have proliferated globally in 2021. Characterizing global genomic surveillance, sequencing coverage, and the extent of publicly available genomic data, coupled with traditional epidemiologic data, can provide evidence to inform SARS-CoV-2 surveillance and control strategies. Methods: We collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data, and aggregated publicly available variant data. We divided countries into three levels of genomic surveillance and sequencing availability based on predefined criteria. We downloaded the merged and deduplicated SARS-CoV-2 sequences from multiple public repositories, and used different proxies to estimate the sequencing coverage and public availability extent of genomic data, in addition to describing the global dissemination of variants. Findings: Since the start of 2021, the COVID-19 global epidemic clearly featured increasing circulation of Alpha, which was rapidly replaced by the Delta variant starting around May 2021 and reaching a global prevalence of 96.6% at the end of July 2021. SARS-CoV-2 genomic surveillance and sequencing availability varied markedly across countries, with 63 countries performing routine genomic surveillance and 79 countries with high availability of SARS-CoV-2 sequencing. Less than 3.5% of confirmed SARS-CoV-2 infections were sequenced globally since September 2020, with the lowest sequencing coverage in the WHO regions of Eastern Mediterranean, South East Asia, and Africa. Across different variants, 28-52% of countries with explicit reporting on variants shared less than half of their variant sequences in public repositories. More than 60% of demographic and 95% of clinical data were absent in GISAID metadata accompanying sequences. Interpretation: Our findings indicated an urgent need to expand sequencing capacity of virus isolates, enhance the sharing of sequences, the standardization of metadata files, and supportive networks for countries with no sequencing capability.

Research in context

Evidence before this study: On September 3, 2021, we searched PubMed for articles in any language published after January 1, 2020, using the following search terms: (“COVID-19” OR “SARS-CoV-2”) AND (“Global” OR “Region”) AND (“genomic surveillance” OR “sequencing” OR “spread”). Among 43 papers identified, few papers discussed the global diversity in genomic surveillance, sequencing, public availability of genomic data, as well as the global spread of SARS-CoV-2 variants. A paper from Furuse employed the publicly available GISAID data to evaluate the SARS-CoV-2 sequencing effort by country from the perspectives of “fraction”, “timeliness”, and “openness”. Another viewpoint paper by Case Western Reserve University’s team discussed the impediments of genomic surveillance in several countries during the COVID-19 pandemic. The paper by Campbell and colleagues used the GISAID data to present the global spread and estimated transmissibility of recently emerged SARS-CoV-2 variants. We also found several studies that reported the country-level genomic surveillance and spread of variants. To our knowledge, no research has quantitatively depicted the global SARS-CoV-2 genomic surveillance, sequencing ability, and public availability extent of genomic data.

Added value of this study: This study collected country-specific data on SARS-CoV-2 genomic surveillance, sequencing capabilities, public genomic data, and aggregated publicly available variant data as of 20 August 2021. We found that genomic surveillance strategies and sequencing availability are globally diverse. Less than 3.5% of confirmed SARS-CoV-2 infections were sequenced globally since September 2020. Our analysis of publicly deposited SARS-CoV-2 sequences and officially reported number of variants implied that the public availability extent of genomic data is low in some countries, and more than 60% of demographic and 95% of clinical data were absent in GISAID metadata accompanying sequences. We also described the pandemic dynamics shaped by VOCs.

Implications of all the available evidence: Our study provides a landscape for global sequencing coverage and public availability extent of sequences, as well as the evidence for rapid spread of SARS-CoV-2 variants. The pervasive spread of Alpha and Delta variants further highlights the threat of SARS-CoV-2 mutations despite the availability of vaccines in many countries. It raised an urgent need to do more work on defining the ideal sampling schemes for different purposes (e.g., identifying new variants) with an additional call to share these data in public repositories to allow for further rapid scientific discovery.
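In this surveillance sense, "sequencing coverage" is a proportion of cases rather than a read depth. A minimal sketch of the two proxies mentioned above follows: the share of confirmed infections that were sequenced, and whether a country deposited less than half of its reported variant sequences publicly. The function names and the example counts are hypothetical and are not data from the study.

```python
def sequencing_proportion(n_sequenced: int, n_confirmed: int) -> float:
    """Share of confirmed infections with a sequenced genome."""
    if n_confirmed <= 0:
        raise ValueError("n_confirmed must be positive")
    return n_sequenced / n_confirmed


def shares_less_than_half(n_public: int, n_reported: int) -> bool:
    """True if fewer than half of the reported variant sequences are public."""
    return 2 * n_public < n_reported


# Hypothetical country: 100,000 confirmed cases, 3,000 genomes sequenced.
prop = sequencing_proportion(3_000, 100_000)
print(f"{prop:.1%}")                       # 3.0%
print(shares_less_than_half(400, 1_000))   # True
print(shares_less_than_half(500, 1_000))   # False
```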


2021 ◽  
Author(s):  
Melivoia Rapti ◽  
Jenny Meylan Merlini ◽  
Emmanuelle Ranza ◽  
Stylianos E. Antonarakis ◽  
Federico A. Santoni

CoverageMaster (CoM) is a Copy Number Variation (CNV) calling algorithm based on depth-of-coverage maps designed to detect CNVs of any size in exome (WES) and genome (WGS) data. The core of the algorithm is the compression of sequencing coverage data in a multiscale Wavelet space and the analysis through an iterative Hidden Markov Model (HMM). CoM processes WES and WGS data at nucleotide scale resolution and accurately detects and visualizes full size range CNVs, including single or partial exon deletions and duplications. The results obtained with this approach support the possibility for coverage-based CNV callers to replace probe-based methods such as array CGH and MLPA in the near future.
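As a rough illustration of what a multiscale view of a coverage signal looks like, the sketch below repeatedly averages adjacent positions, which corresponds to the approximation part of an unnormalized Haar wavelet transform. The choice of the Haar wavelet, the function name, and the toy depths are assumptions made here for illustration; the abstract does not state which wavelet family CoM uses, and the HMM step is not shown.

```python
from typing import List, Sequence


def haar_approximations(coverage: Sequence[float], levels: int) -> List[List[float]]:
    """Return the signal at successively coarser scales.

    Element 0 is the input; each further element halves the resolution by
    averaging adjacent pairs (a trailing unpaired value is dropped).
    """
    current = [float(c) for c in coverage]
    scales = [current]
    for _ in range(levels):
        if len(current) < 2:
            break
        current = [
            (current[2 * i] + current[2 * i + 1]) / 2
            for i in range(len(current) // 2)
        ]
        scales.append(current)
    return scales


# Toy depth profile with a two-position drop to half depth,
# as might be seen over a heterozygous deletion.
depth = [30, 30, 30, 30, 15, 15, 30, 30]
scales = haar_approximations(depth, levels=2)
print(scales[1])  # [30.0, 30.0, 15.0, 30.0]
print(scales[2])  # [30.0, 22.5]
```

The drop remains visible at the first coarse scale and is diluted at the second, which hints at why events of different sizes are best examined at different scales.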


2021 ◽  
Author(s):  
Carlos Arana ◽  
Chaoying Liang ◽  
Matthew Brock ◽  
Bo Zhang ◽  
Jinchun Zhou ◽  
...  

High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from five COVID-19 positive patients. ARTIC data covered >90% of the virus genome fraction in the positive control and four of the five patient samples. Variant analysis in the ARTIC data detected 67 mutations, including 66 single nucleotide variants (SNVs) and one deletion in ORF10. Of 66 SNVs, five were present in the spike gene, including nt22093 (M177I), nt23042 (S494P), nt23403 (D614G), nt23604 (P681H), and nt23709 (T716I). The D614G mutation is a common variant that has been shown to alter the fitness of SARS-CoV-2. Two spike protein mutations, P681H and T716I, which are represented in the B.1.1.7 lineage of SARS-CoV-2, were also detected in one patient. Long-amplicon data detected 58 variants, of which 70% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data revealed 22 mutations that were either ambiguous (17) or not called at all (5) in ARTIC data due to poor sequencing coverage. For example, a common mutation in the ORF3a gene at nt25907 (G172V) was missed by the ARTIC assay. Hybrid data analysis improved sequencing coverage overall and identified 59 high confidence mutations for phylogenetic analysis. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementation of poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.


2021 ◽  
Vol 22 (11) ◽  
pp. 5802
Author(s):  
Jiayin Wang ◽  
Liubin Chen ◽  
Xuanping Zhang ◽  
Yao Tong ◽  
Tian Zheng

Open chromatin regions (OCRs) are special regions of the human genome that can be accessed by DNA regulatory elements. Several studies have reported that a series of OCRs are associated with mechanisms involved in human diseases, such as cancers. Identifying OCRs using ATAC-seq or DNase-seq is often expensive. It has become popular to detect OCRs from plasma cell-free DNA (cfDNA) sequencing data, because both the fragmentation modes of cfDNA and the sequencing coverage in OCRs are significantly different from those in other regions. However, it is a challenging computational problem to accurately detect OCRs from plasma cfDNA-seq data, as multiple factors—e.g., sequencing and mapping bias, insufficient read depth, etc.—often mislead the computational model. In this paper, we propose a novel bioinformatics pipeline, OCRDetector, for detecting OCRs from whole-genome cfDNA sequencing data. The pipeline calculates the window protection score (WPS) waveform and the cfDNA sequencing coverage. To validate the proposed pipeline, we compared the percentage overlap of our OCRs with those obtained by other methods. The experimental results show that 81% of the TSS regions of housekeeping genes are detected, and our results have obvious tissue specificity. In addition, the overlap percentage between our OCRs and the high-confidence OCRs obtained by ATAC-seq or DNase-seq is greater than 70%.
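The window protection score can be sketched in a few lines. The version below follows the commonly used definition: for a window centered on a position, count the fragments that span the whole window and subtract the fragments that have an endpoint inside it. The 120 bp window, the inclusive coordinate convention, the function name, and the toy fragments are assumptions for illustration; the abstract does not give OCRDetector's exact parameters.

```python
from typing import Iterable, Tuple


def window_protection_score(
    fragments: Iterable[Tuple[int, int]], center: int, window: int = 120
) -> int:
    """WPS at `center` for fragments given as inclusive (start, end) pairs.

    +1 for each fragment extending beyond both window edges,
    -1 for each fragment with a start or end inside the window,
    0 for fragments that do not touch the window.
    """
    half = window // 2
    w_start, w_end = center - half, center + half
    score = 0
    for start, end in fragments:
        if start < w_start and end > w_end:
            score += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            score -= 1
    return score


# Hypothetical cfDNA fragments around position 150 (window 90..210).
frags = [(0, 300), (50, 250), (100, 160), (140, 400), (300, 400)]
print(window_protection_score(frags, center=150))      # 0
print(window_protection_score(frags[:2], center=150))  # 2
```

Computing this score at every position yields the WPS waveform; low or negative stretches indicate positions where fragments tend to end, which is the kind of signal the pipeline combines with sequencing coverage.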


Diversity ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 151
Author(s):  
Roberto Carlos Frias-Soler ◽  
Lilian Villarín Pildaín ◽  
Michael Wink ◽  
Franz Bairlein

This work presents an updated and more complete version of the transcriptome of a long-distance migrant, the Northern Wheatear (Oenanthe oenanthe). The improved transcriptome was produced from the independent mRNA sequencing of adipose tissue, brain, intestines, liver, skin, and muscle tissues sampled during the autumnal migratory season. This new transcriptome has better sequencing coverage and is more representative of the species’ migratory phenotype. We assembled 20,248 transcripts grouped into 16,430 genes, of which 78% were successfully annotated. All the standard assembly quality parameters were improved in the second transcriptome version.

