unCOVERApp: an interactive graphical application for clinical assessment of sequence coverage at the base-pair level

Author(s):  
Emanuela Iovino ◽  
Marco Seri ◽  
Tommaso Pippucci

Abstract Motivation Next-generation sequencing is increasingly adopted in clinical practice, largely thanks to concurrent advancements in bioinformatic tools for variant detection and annotation. However, the need to assess sequencing quality at the base-pair level still poses challenges for diagnostic accuracy. One of the most popular quality parameters is the percentage of targeted bases characterized by low depth of coverage (DoC). These regions potentially ‘hide’ clinically relevant variants, but no annotation is usually returned with them. Visualizing low-DoC data together with their potential functional and clinical consequences may help to prioritize inspection of specific regions before re-sequencing all coverage gaps or making assertions about completeness of the diagnostic test. To meet this need, we have developed unCOVERApp, an interactive application for graphical inspection and clinical annotation of low-DoC genomic regions containing genes. Results unCOVERApp interactive plots display gene sequence coverage down to the base-pair level, and functional and clinical annotations of sites below a user-defined DoC threshold can be downloaded in a user-friendly spreadsheet format. Moreover, unCOVERApp provides a simple statistical framework to evaluate whether DoC is sufficient for the detection of somatic variants. A maximum credible allele frequency calculator is also available, allowing users to set allele frequency cut-offs based on assumptions about the genetic architecture of the disease. In conclusion, unCOVERApp is an original tool designed to identify sites of potential clinical interest that may be ‘hidden’ in diagnostic sequencing data. Availability and implementation unCOVERApp is a free application developed with Shiny packages and available on GitHub (https://github.com/Manuelaio/uncoverappLib). Supplementary information Supplementary data are available at Bioinformatics online.
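
The abstract does not spell out the statistical framework used to judge whether DoC is sufficient for somatic variants. As a rough illustration only, the sketch below assumes a plain binomial read-sampling model, which is a common simplification and not necessarily what unCOVERApp implements. The function name and the parameters `depth`, `allele_fraction` and `min_alt_reads` are hypothetical.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, allele_fraction, min_alt_reads):
    """Probability of seeing at least `min_alt_reads` variant reads.

    Assumption (not taken from the abstract): each of the `depth` reads
    independently carries the variant with probability `allele_fraction`,
    so the variant read count is Binomial(depth, allele_fraction).
    """
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must lie in [0, 1]")
    # Sum the probability of observing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Compare two depths for a hypothetical low-fraction somatic variant.
    for depth in (20, 200):
        p = prob_at_least_k_alt_reads(depth, allele_fraction=0.05, min_alt_reads=3)
        print(f"depth {depth}x: P(>=3 variant reads) = {p:.3f}")
```

Under this assumed model, a site is adequately covered only if the probability exceeds a confidence level chosen by the user; that level is a user decision and is not specified in the abstract.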

2020 ◽  
Author(s):  
Emanuela Iovino ◽  
Marco Seri ◽  
Tommaso Pippucci

Abstract Motivation Next Generation Sequencing (NGS) is increasingly adopted in clinical practice, largely thanks to concurrent advancements in bioinformatic tools for variant detection and annotation. Despite improvements in available approaches, the need to assess sequencing quality down to the base-pair level still poses challenges for diagnostic accuracy. One of the most popular quality parameters of diagnostic NGS is the percentage of targeted bases characterized by low depth of coverage (DoC). These regions potentially hide a clinically relevant variant, but no annotation is usually returned for them. However, visualizing low-DoC data with their potential functional and clinical consequences may be useful to prioritize inspection of specific regions before re-sequencing all coverage gaps or making assertions about completeness of the diagnostic test. To meet this need, we have developed unCOVERApp, an interactive application for graphical inspection and clinical annotation of low-DoC genomic regions containing genes. Results unCOVERApp is a suite of graphical and statistical tools to support clinical assessment of low-DoC regions. Its interactive plots display gene sequence coverage down to the base-pair level, and functional and clinical annotations of sites below a user-defined DoC threshold can be downloaded in a user-friendly spreadsheet format. Moreover, unCOVERApp provides a simple statistical framework to evaluate whether DoC is sufficient for the detection of somatic variants, where the usual 20x DoC threshold used for germline variants is not adequate. A maximum credible allele frequency calculator is also available, allowing users to set allele frequency cut-offs based on assumptions about the genetic architecture of the disease instead of applying a general one (e.g. 5%). In conclusion, unCOVERApp is an original tool designed to identify sites of potential clinical interest that may be hidden in diagnostic sequencing data. Availability unCOVERApp is a freely available application written in R, developed with Shiny packages and available on GitHub.


2019 ◽  
Vol 36 (7) ◽  
pp. 2173-2180 ◽  
Author(s):  
Jui Wan Loh ◽  
Caitlin Guccione ◽  
Frances Di Clemente ◽  
Gregory Riedlinger ◽  
Shridar Ganesan ◽  
...  

Abstract Summary Clinical sequencing aims to identify somatic mutations in cancer cells for accurate diagnosis and treatment. However, most widely used clinical assays lack patient-matched control DNA, and additional analysis is needed to distinguish somatic and unfiltered germline variants. Such computational analyses require accurate assessment of tumor cell content in individual specimens. Histological estimates often do not agree with results from computational methods that are primarily designed for normal–tumor matched data and can be confounded by genomic heterogeneity and the presence of sub-clonal mutations. Allele-frequency-based imputation of tumor (All-FIT) is an iterative weighted least squares method to estimate specimen tumor purity based on the allele frequencies of variants detected in high-depth, targeted, clinical sequencing data. Using simulated and clinical data, we demonstrate All-FIT’s accuracy and improved performance against leading computational approaches, highlighting the importance of interpreting purity estimates based on expected biology of tumors. Availability and implementation Freely available at http://software.khiabanian-lab.org. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Darawan Rinchai ◽  
Jessica Roelands ◽  
Mohammed Toufiq ◽  
Wouter Hendrickx ◽  
Matthew C Altman ◽  
...  

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed and describe here an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takumi Miura ◽  
Satoshi Yasuda ◽  
Yoji Sato

Abstract Background Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 giga base pairs (Gbp) of sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, and its p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with sequencing data of 15 Gbp or more was estimated to be between 5 and 10%. Conclusions For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold for low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
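
The LOD definition above (the allele frequency at which the relative standard deviation reaches 30%, read off a moving average of RSD against allele frequency) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the window size, the use of the sample standard deviation, and the rule of returning the lowest smoothed allele frequency whose RSD is at or below the cutoff are all choices made here, and the function and parameter names are hypothetical.

```python
from statistics import mean, stdev


def relative_sd(values):
    """Relative standard deviation in percent (sample SD divided by mean)."""
    return 100.0 * stdev(values) / mean(values)


def estimate_lod(replicates, window=3, rsd_cutoff=30.0):
    """Estimate a limit of detection from replicate allele-frequency calls.

    `replicates` maps a mutation identifier to the allele frequencies
    measured for it in independent runs (at least two runs each).
    Returns the lowest moving-average allele frequency whose smoothed RSD
    is at or below `rsd_cutoff`, or None if no window qualifies.
    """
    # One (mean AF, RSD) point per mutation, ordered by allele frequency.
    points = sorted((mean(v), relative_sd(v)) for v in replicates.values())
    # Simple moving average over consecutive points (assumed smoothing).
    smoothed = []
    for start in range(len(points) - window + 1):
        chunk = points[start:start + window]
        smoothed.append((mean(p[0] for p in chunk), mean(p[1] for p in chunk)))
    for allele_frequency, rsd in smoothed:
        if rsd <= rsd_cutoff:
            return allele_frequency
    return None
```

A call would pass, for each pre-validated mutation, the allele frequencies measured across the repeated WES runs; the returned value serves as an in-house cutoff in the sense described in the abstract.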


Author(s):  
Givanna H Putri ◽  
Irena Koprinska ◽  
Thomas M Ashhurst ◽  
Nicholas J C King ◽  
Mark N Read

Abstract Motivation Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. Results We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. Availability and implementation Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench. Supplementary information Supplementary data are available at Bioinformatics online.
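
The core idea of a Pareto front, keeping every clustering solution that no other solution beats on all metrics at once, can be illustrated with a minimal sketch. It assumes that every metric is oriented so that higher is better; the solution labels and scores in the usage note are invented placeholders, and this is not the code from the ParetoBench repository.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` on every metric
    and strictly better on at least one (higher is better, by assumption)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions):
    """Return the names of non-dominated solutions.

    `solutions` maps a label (for example one algorithm run with one
    parameter setting) to a tuple of metric scores.
    """
    return [
        name
        for name, scores in solutions.items()
        if not any(dominates(other, scores) for other in solutions.values())
    ]
```

For example, with the placeholder scores `{"run_1": (0.9, 0.5), "run_2": (0.6, 0.8), "run_3": (0.5, 0.4)}`, `run_3` is dominated by `run_1`, while `run_1` and `run_2` trade off against each other and both remain on the front.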


Author(s):  
Tomasz Zok

Abstract Motivation Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders the analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results BioCommons is an open-source, Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation The library is available in Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons Supplementary information Supplementary data are available at Bioinformatics online.


2010 ◽  
Vol 26 (17) ◽  
pp. 2101-2108 ◽  
Author(s):  
Jiří Macas ◽  
Pavel Neumann ◽  
Petr Novák ◽  
Jiming Jiang

Abstract Motivation: Satellite DNA makes up a significant portion of many eukaryotic genomes, yet it is relatively poorly characterized even in extensively sequenced species. This is, in part, due to methodological limitations of traditional methods of satellite repeat analysis, which are based on multiple alignments of monomer sequences. Therefore, we employed an alternative, alignment-free approach utilizing k-mer frequency statistics, which is in principle more suitable for analyzing large sets of satellite repeat data, including sequence reads from next generation sequencing technologies. Results: k-mer frequency spectra were determined for two sets of rice centromeric satellite CentO sequences, including 454 reads from ChIP-sequencing of CENH3-bound DNA (7.6 Mb) and the whole genome Sanger sequencing reads (5.8 Mb). k-mer frequencies were used to identify the most conserved sequence regions and to reconstruct consensus sequences of complete monomers. Reconstructed consensus sequences as well as the assessment of overall divergence of k-mer spectra revealed high similarity of the two datasets, suggesting that CentO sequences associated with functional centromeres (CENH3-bound) do not significantly differ from the total population of CentO, which includes both centromeric and pericentromeric repeat arrays. On the other hand, considerable differences were revealed when these methods were used for comparison of CentO populations between individual chromosomes of the rice genome assembly, demonstrating preferential sequence homogenization of the clusters within the same chromosome. k-mer frequencies were also successfully used to identify and characterize smRNAs derived from CentO repeats. Supplementary information: Supplementary data are available at Bioinformatics online.
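
A k-mer frequency spectrum of the kind described above can be computed without any alignment by sliding a window of length k over each read and counting the words seen. The sketch below is a generic illustration, not the authors' pipeline; the function name, the normalization to relative frequencies and the choice to skip windows containing non-ACGT characters are assumptions made here.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Relative frequency of every k-mer observed in a collection of reads.

    Windows containing characters other than A, C, G or T (for example N)
    are skipped; that filtering rule is an assumption of this sketch.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k across the read, one base at a time.
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

Two spectra computed this way, for instance one per read set, can then be compared k-mer by k-mer to judge how similar the underlying repeat populations are.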


2003 ◽  
Vol 68 (2) ◽  
Author(s):  
Boris Mergell ◽  
Mohammad R. Ejtehadi ◽  
Ralf Everaers

2018 ◽  
Vol 35 (13) ◽  
pp. 2326-2328 ◽  
Author(s):  
Tobias Jakobi ◽  
Alexey Uvarovskii ◽  
Christoph Dieterich

Abstract Motivation Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights. Results Here, we present circtools, a modular, Python-based framework for computational circRNA analyses. The software includes modules for circRNA detection, internal sequence reconstruction, quality checking, statistical testing, screening for enrichment of RBP binding sites, differential exon RNase R resistance and circRNA-specific primer design. circtools supports researchers with visualization options and data export into commonly used formats. Availability and implementation circtools is available via https://github.com/dieterich-lab/circtools and http://circ.tools under GPLv3.0. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Abstract Summary Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. Availability and implementation AStrap is available for download at https://github.com/BMILAB/AStrap. Supplementary information Supplementary data are available at Bioinformatics online.

