unCOVERApp: an interactive graphical application for clinical assessment of sequence coverage at the base-pair level

2020 ◽  
Author(s):  
Emanuela Iovino ◽  
Marco Seri ◽  
Tommaso Pippucci

Abstract Motivation Next Generation Sequencing (NGS) is increasingly adopted in clinical practice, largely thanks to concurrent advancements in bioinformatic tools for variant detection and annotation. Despite improvements in available approaches, the need to assess sequencing quality down to the base-pair level still poses challenges for diagnostic accuracy. One of the most popular quality parameters of diagnostic NGS is the percentage of targeted bases characterized by low depth of coverage (DoC). These regions potentially hide a clinically relevant variant, but no annotation is usually returned for them. However, visualizing low-DoC data with their potential functional and clinical consequences may be useful to prioritize inspection of specific regions before re-sequencing all coverage gaps or making assertions about completeness of the diagnostic test. To meet this need, we have developed unCOVERApp, an interactive application for graphical inspection and clinical annotation of low-DoC genomic regions containing genes. Results unCOVERApp is a suite of graphical and statistical tools to support clinical assessment of low-DoC regions. Its interactive plots display gene sequence coverage down to the base-pair level, and functional and clinical annotations of sites below a user-defined DoC threshold can be downloaded in a user-friendly spreadsheet format. Moreover, unCOVERApp provides a simple statistical framework to evaluate whether DoC is sufficient for the detection of somatic variants, where the usual 20x DoC threshold used for germline variants is not adequate. A maximum credible allele frequency calculator is also available, allowing users to set allele frequency cut-offs based on assumptions about the genetic architecture of the disease instead of applying a general one (e.g. 5%).
In conclusion, unCOVERApp is an original tool designed to identify sites of potential clinical interest that may be hidden in diagnostic sequencing data. Availability unCOVERApp is a freely available application written in R, developed with Shiny packages, and hosted on GitHub.

Author(s):  
Emanuela Iovino ◽  
Marco Seri ◽  
Tommaso Pippucci

Abstract Motivation Next-generation sequencing is increasingly adopted in clinical practice, largely thanks to concurrent advancements in bioinformatic tools for variant detection and annotation. However, the need to assess sequencing quality at the base-pair level still poses challenges for diagnostic accuracy. One of the most popular quality parameters is the percentage of targeted bases characterized by low depth of coverage (DoC). These regions potentially ‘hide’ clinically relevant variants, but no annotation is usually returned with them. Nevertheless, visualizing low-DoC data with their potential functional and clinical consequences may be useful to prioritize inspection of specific regions before re-sequencing all coverage gaps or making assertions about completeness of the diagnostic test. To meet this need, we have developed unCOVERApp, an interactive application for graphical inspection and clinical annotation of low-DoC genomic regions containing genes. Results unCOVERApp interactive plots display gene sequence coverage down to the base-pair level, and functional and clinical annotations of sites below a user-defined DoC threshold can be downloaded in a user-friendly spreadsheet format. Moreover, unCOVERApp provides a simple statistical framework to evaluate whether DoC is sufficient for the detection of somatic variants. A maximum credible allele frequency calculator is also available, allowing users to set allele frequency cut-offs based on assumptions about the genetic architecture of the disease. In conclusion, unCOVERApp is an original tool designed to identify sites of potential clinical interest that may be ‘hidden’ in diagnostic sequencing data. Availability and implementation unCOVERApp is a free application developed with Shiny packages and available in GitHub (https://github.com/Manuelaio/uncoverappLib). Supplementary information Supplementary data are available at Bioinformatics online.
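
One way to reason about whether a given DoC is sufficient for somatic variants is a binomial model of read sampling. The sketch below illustrates that idea only; it is not unCOVERApp's own code, and the function names, the minimum number of variant-supporting reads, and the 0.95 power default are assumptions introduced for illustration.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """Probability of seeing at least `min_alt_reads` variant-supporting reads.

    Assumes the number of variant reads follows Binomial(depth, allele_fraction),
    i.e. reads are sampled independently and sequencing error is ignored.
    """
    # Sum the probabilities of observing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


def min_depth_for_power(allele_fraction, min_alt_reads, power=0.95, max_depth=10000):
    """Smallest depth at which the detection probability reaches `power`.

    Returns None if no depth up to `max_depth` is sufficient.
    """
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, allele_fraction, min_alt_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    # Hypothetical scenario: a somatic variant at 5% allele fraction,
    # requiring at least 3 supporting reads to call it.
    for depth in (20, 100, 200):
        print(depth, round(detection_probability(depth, 0.05, 3), 3))
    print(min_depth_for_power(0.05, 3))
```

Under this model, the depth needed grows quickly as the expected allele fraction drops, which is consistent with the abstract's point that a germline-oriented threshold may not suit somatic variants.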


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takumi Miura ◽  
Satoshi Yasuda ◽  
Yoji Sato

Abstract Background Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 giga base pairs (Gbp) of sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, and the p-value of this association decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with 15 Gbp or more of sequencing data was estimated to be between 5 and 10%. Conclusions For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
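
The LOD definition above (the allele frequency at which RSD reaches 30%, read off a moving average curve) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the window size, the smoothing of both axes with the same window, and the linear interpolation between smoothed points are all hypothetical choices.

```python
from statistics import mean, stdev


def relative_sd(values):
    """Relative standard deviation in percent (sample SD divided by mean)."""
    return 100.0 * stdev(values) / mean(values)


def moving_average(values, window):
    """Simple unweighted moving average over consecutive windows."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def estimate_lod(replicate_afs, rsd_cutoff=30.0, window=3):
    """Estimate the allele frequency at which the smoothed RSD equals the cutoff.

    replicate_afs: one list per mutation, holding the allele frequency (%)
    measured for that mutation in each replicate run.
    Returns None when the smoothed curve never crosses the cutoff.
    """
    # One (mean AF, RSD) point per mutation, ordered by increasing AF.
    points = sorted((mean(afs), relative_sd(afs)) for afs in replicate_afs)
    af_smooth = moving_average([p[0] for p in points], window)
    rsd_smooth = moving_average([p[1] for p in points], window)

    # Scan from high to low AF for the first interval where the RSD
    # rises above the cutoff, then interpolate linearly inside it.
    for i in range(len(af_smooth) - 1, 0, -1):
        lo_af, hi_af = af_smooth[i - 1], af_smooth[i]
        lo_rsd, hi_rsd = rsd_smooth[i - 1], rsd_smooth[i]
        if hi_rsd <= rsd_cutoff < lo_rsd:
            t = (lo_rsd - rsd_cutoff) / (lo_rsd - hi_rsd)
            return lo_af + t * (hi_af - lo_af)
    return None
```

With the default window of 3, at least four mutations are needed to produce two smoothed points and therefore a possible crossing.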


2003 ◽  
Vol 68 (2) ◽  
Author(s):  
Boris Mergell ◽  
Mohammad R. Ejtehadi ◽  
Ralf Everaers

2021 ◽  
Author(s):  
Ryan O Schenck ◽  
Gabriel Brosula ◽  
Jeffrey West ◽  
Simon Leedham ◽  
Darryl Shibata ◽  
...  

Gattaca provides the first base-pair resolution artificial genomes for tracking somatic mutations within agent-based modeling. Through the incorporation of human reference genomes, mutational context, and sequence coverage/error information, Gattaca is able to realistically provide comparable sequence data for in-silico comparative evolution studies with human somatic evolution studies. This user-friendly method, incorporated into each in-silico cell, allows us to fully capture somatic mutation spectra and evolution.


2018 ◽  
Author(s):  
Susanne Tilk ◽  
Alan Bergland ◽  
Aaron Goodman ◽  
Paul Schmidt ◽  
Dmitri Petrov ◽  
...  

Abstract Evolve-and-resequence (E+R) experiments leverage next-generation sequencing technology to track the allele frequency dynamics of populations as they evolve. While previous work has shown that adaptive alleles can be detected by comparing frequency trajectories from many replicate populations, this power comes at the expense of high-coverage (>100x) sequencing of many pooled samples, which can be cost-prohibitive. Here, we show that accurate estimates of allele frequencies can be achieved with very shallow sequencing depths (<5x) via inference of known founder haplotypes in small genomic windows. This technique can be used to efficiently estimate frequencies for any number of bi-allelic SNPs in populations of any model organism founded with sequenced homozygous strains. Using both experimentally-pooled and simulated samples of Drosophila melanogaster, we show that haplotype inference can improve allele frequency accuracy by orders of magnitude for up to 50 generations of recombination, and is robust to moderate levels of missing data, as well as different selection regimes. Finally, we show that a simple linear model generated from these simulations can predict the accuracy of haplotype-derived allele frequencies in other model organisms and experimental designs. To make these results broadly accessible for use in E+R experiments, we introduce HAF-pipe, an open-source software tool for calculating haplotype-derived allele frequencies from raw sequencing data. Ultimately, by reducing sequencing costs without sacrificing accuracy, our method facilitates E+R designs with higher replication and resolution, and thereby, increased power to detect adaptive alleles.
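
The core idea of a haplotype-derived allele frequency can be sketched in a few lines: once founder haplotype frequencies have been inferred for a genomic window, the frequency of a bi-allelic SNP is the frequency-weighted sum of the founders' known alleles. This is a minimal illustration, not HAF-pipe's implementation; the function name, the dictionary inputs, and the renormalization over founders with a known genotype are assumptions.

```python
def haplotype_derived_af(founder_freqs, founder_alleles):
    """Allele frequency at one SNP from inferred founder haplotype frequencies.

    founder_freqs: founder name -> inferred frequency in the window.
    founder_alleles: founder name -> 1 (alternate), 0 (reference), or
        None when the founder genotype is missing at this SNP.
    Founders with a missing genotype are dropped and the remaining
    frequencies renormalized (an assumption of this sketch).
    """
    known = {
        name: allele
        for name, allele in founder_alleles.items()
        if allele is not None and name in founder_freqs
    }
    total = sum(founder_freqs[name] for name in known)
    if total == 0:
        return None  # no informative founders in this window
    return sum(founder_freqs[name] * allele for name, allele in known.items()) / total


if __name__ == "__main__":
    # Hypothetical window with three homozygous founder strains.
    freqs = {"strain_A": 0.5, "strain_B": 0.3, "strain_C": 0.2}
    alleles = {"strain_A": 1, "strain_B": 0, "strain_C": 1}
    print(haplotype_derived_af(freqs, alleles))
```

Because every SNP in a window shares the same inferred founder frequencies, the estimate draws on all reads in the window rather than only those covering the SNP itself, which is why shallow sequencing can still give accurate frequencies.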


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ting Hon ◽  
Kristin Mars ◽  
Greg Young ◽  
Yu-Chih Tsai ◽  
Joseph W. Karalius ◽  
...  

Abstract The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.


2011 ◽  
Vol 27 (14) ◽  
pp. 1922-1928 ◽  
Author(s):  
Huanying Ge ◽  
Kejun Liu ◽  
Todd Juan ◽  
Fang Fang ◽  
Matthew Newman ◽  
...  
