MALVIRUS: an integrated web application for viral variant calling

2020 ◽  
Author(s):  
Simone Ciccolella ◽  
Luca Denti ◽  
Paola Bonizzoni ◽  
Gianluca Della Vedova ◽  
Yuri Pirola ◽  
...  

Abstract Being able to efficiently call variants from the increasing amount of sequencing data produced daily from multiple viral strains is of the utmost importance, as demonstrated during the COVID-19 pandemic, in order to track the spread of the viral strains across the globe. We present MALVIRUS, an easy-to-install and easy-to-use web application that assists users in two tasks: computing a variant catalog consisting of a set of population SNP loci from the population sequences, and efficiently calling variants of the catalog from a read sample. Tests on Illumina and Nanopore samples demonstrate the efficiency and effectiveness of MALVIRUS in genotyping SARS-CoV-2 strain samples with respect to GISAID data.
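The first task, building a catalog of population SNP loci, can be illustrated with a minimal sketch. It assumes the population sequences are already aligned to the reference and have the same length; the function name `snp_catalog` and the 0-based position convention are illustrative choices, not part of MALVIRUS.

```python
def snp_catalog(reference, aligned_sequences):
    """Collect positions where at least one aligned sequence differs from the reference.

    Returns a dict mapping 0-based position -> sorted list of alternate bases.
    Gaps and ambiguous bases (anything outside A, C, G, T) are ignored.
    """
    catalog = {}
    for seq in aligned_sequences:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference length")
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if ref_base in "ACGT" and base in "ACGT" and base != ref_base:
                catalog.setdefault(pos, set()).add(base)
    return {pos: sorted(alts) for pos, alts in sorted(catalog.items())}


if __name__ == "__main__":
    reference = "ACGTAC"
    population = ["ACGTAC", "ATGTAC", "ATGTCC", "ACNTAC"]
    print(snp_catalog(reference, population))
```

Genotyping a read sample against such a catalog is a separate step and is not covered by this sketch.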

2020 ◽  
Author(s):  
Axel Fürstberger ◽  
Nensi Ikonomi ◽  
Angelika M.R. Kestler ◽  
Ralf Marienfeld ◽  
Thomas Seufferlein ◽  
...  

Abstract Background: Providing suitable treatment strategies that take into account cancer-specific alterations is a crucial task for successful cancer treatment. To this end, molecular tumor boards (MTBs), which bring together clinicians as well as scientists with diverse expertise, are increasingly established in the clinical routine for therapeutic interventions. Molecular profiling from sequencing data is an integral part of the decision-making process of an MTB. To debate variant calling results from next-generation sequencing (NGS) analyses, detailed information about the detected mutations is mandatory. Further, these results need to be combined with knowledge and up-to-date evidence from databases. At the moment, few tools are available that aim at managing this amount of required information. As a result, the whole process of analysis and documentation of patient data becomes time consuming and difficult to manage for MTBs. Results: To overcome these limitations, we developed an interactive web application, AMBAR (Alteration annotations for Molecular tumor BoARds), to visualize not only annotated mutations, but also evidence for possible therapeutic drug targets. Found mutations can be evaluated, discussed and exported to clinical information systems. The application is based on R shiny and allows customization, interactive filtering and visualization. Conclusion: AMBAR is an interactive application that not only supports MTBs in decision making, but also acts as an interface between results of NGS analyses, result visualization and export into clinical information systems.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
Nicholas Stoler ◽  
Anton Nekrutenko ◽  
...  

Abstract Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we developed a tool that re-examines variant calls from raw reads and provides different summary data that categorize the confidence level of a variant call using a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.
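The grouping of reads into tag families can be sketched as follows. This is a simplified illustration, not the authors' toolset: it assumes each read has already been reduced to a `(tag, strand, sequence)` triple with the tag normalized so that both strands of one molecule share it, and the labels `"ab"`/`"ba"` and the three category names are assumptions made for the example.

```python
from collections import Counter, defaultdict


def group_into_families(reads):
    """Group (tag, strand, sequence) reads by tag, keeping the two strands apart."""
    families = defaultdict(lambda: {"ab": [], "ba": []})
    for tag, strand, seq in reads:
        families[tag][strand].append(seq)
    return dict(families)


def classify_family(family):
    """Label a family by how much strand information it carries."""
    n_ab, n_ba = len(family["ab"]), len(family["ba"])
    if n_ab and n_ba:
        return "duplex"         # both strands present: a DCS could be built
    if n_ab + n_ba == 1:
        return "singleton"      # a read without a family
    return "single_strand"      # several reads, but from one strand only


def summarize(reads):
    """Count families per category to see where data would be lost."""
    families = group_into_families(reads)
    return Counter(classify_family(f) for f in families.values())


if __name__ == "__main__":
    reads = [
        ("AAT", "ab", "ACGT"), ("AAT", "ba", "ACGT"),
        ("CCG", "ab", "ACGA"),
        ("GGT", "ab", "ACGT"), ("GGT", "ab", "ACGT"),
    ]
    print(summarize(reads))
```

A summary of this kind makes visible how many tags never reach duplex status, which is the data loss the abstract describes.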


Author(s):  
Shilpa Nadimpalli Kobren ◽  
Dustin Baldridge ◽  
Matt Velinder ◽  
Joel B. Krier ◽  
...  

Abstract Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Kelley Paskov ◽  
Jae-Yoon Jung ◽  
Brianna Chrisman ◽  
Nate T. Stockham ◽  
Peter Washington ◽  
...  

Abstract Background: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since they allow us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results: We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. Conclusion: Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
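The raw signal this approach builds on, a Mendelian error in a trio, can be detected with a few lines of code. The sketch below assumes biallelic autosomal sites with genotypes encoded as alternate-allele counts (0, 1 or 2); it only counts observed Mendelian errors and does not reproduce the authors' estimation of per-sample precision and recall. All function names are illustrative.

```python
def transmittable(genotype):
    """Alternate-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(mother, father, child):
    """True if the child genotype cannot arise from one allele of each parent."""
    possible = {m + f for m in transmittable(mother) for f in transmittable(father)}
    return child not in possible


def mendelian_error_rate(trio_sites):
    """Fraction of (mother, father, child) genotype triples that are inconsistent."""
    sites = list(trio_sites)
    if not sites:
        return 0.0
    return sum(is_mendelian_error(*site) for site in sites) / len(sites)


if __name__ == "__main__":
    sites = [(0, 0, 1), (1, 1, 2), (2, 2, 1), (0, 2, 1)]
    print(mendelian_error_rate(sites))
```

Only some genotyping errors produce a Mendelian inconsistency, which is why the abstract speaks of observing a fraction of sequencing errors this way.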


2021 ◽  
Author(s):  
H. Serhat Tetikol ◽  
Kubra Narci ◽  
Deniz Turgut ◽  
Gungor Budak ◽  
Ozem Kalay ◽  
...  

Abstract Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference for capturing the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based bioinformatics toolkits, how to curate genomic variants and subsequently construct genome graphs remains an understudied problem that inevitably determines the effectiveness of the end-to-end bioinformatics pipeline. In this study, we discuss major obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and test the proposed approach on whole-genome samples of African ancestry. Our results show that, as more representative alternatives to linear or generic graph references, population-specific graphs can achieve significantly lower read mapping errors and increased variant calling sensitivity, and provide the improvements of joint variant calling without the need for computationally intensive post-processing steps.


2019 ◽  
Author(s):  
Elena Nabieva ◽  
Satyarth Mishra Sharma ◽  
Yermek Kapushev ◽  
Sofya K. Garushyants ◽  
Anna V. Fedotova ◽  
...  

Abstract High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele in a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies. Code and training data availability: https://github.com/bazykinlab/ML-maternal-cell-contamination
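Why contamination breaks the 50% assumption can be made concrete with a small worked example. The sketch below is not the authors' Bayesian model: it assumes the MCC fraction is the proportion of reads that come from maternal cells, uses a plain binomial likelihood with a flat prior over fetal genotypes, ignores the father, and clamps probabilities with an arbitrary `eps` to stand in for sequencing error. Genotypes are alternate-allele counts (0, 1 or 2), and all names are illustrative.

```python
import math


def expected_alt_fraction(fetal_gt, maternal_gt, mcc):
    """Expected share of alternate-allele reads in a fetal sample with contamination."""
    return (1 - mcc) * fetal_gt / 2 + mcc * maternal_gt / 2


def log_binomial_likelihood(alt, depth, p, eps=1e-3):
    """Log-probability of seeing `alt` alternate reads out of `depth` at rate p."""
    p = min(max(p, eps), 1 - eps)  # avoid log(0); eps is an assumed error floor
    log_choose = (math.lgamma(depth + 1) - math.lgamma(alt + 1)
                  - math.lgamma(depth - alt + 1))
    return log_choose + alt * math.log(p) + (depth - alt) * math.log(1 - p)


def call_fetal_genotype(alt, depth, maternal_gt, mcc):
    """Pick the fetal genotype whose expected allele fraction best explains the reads."""
    scores = {
        gt: log_binomial_likelihood(
            alt, depth, expected_alt_fraction(gt, maternal_gt, mcc))
        for gt in (0, 1, 2)
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # 15 of 100 reads carry the alternate allele; the mother is heterozygous
    # and 30% of reads are assumed to be maternal.
    print(call_fetal_genotype(alt=15, depth=100, maternal_gt=1, mcc=0.3))
```

In this example a homozygous-reference fetus with a heterozygous mother is expected to show 15% alternate reads, a signal that a contamination-unaware caller could mistake for a heterozygous site.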


2020 ◽  
Author(s):  
Philipp Sievers ◽  
Martin Sill ◽  
Daniel Schrimpf ◽  
Damian Stichel ◽  
David E. Reuss ◽  
...  

Abstract Background: Malignant astrocytic gliomas in children show a remarkable biological and clinical diversity. Small in-frame insertions or missense mutations in the EGFR gene have recently been identified in a distinct subset of pediatric bithalamic gliomas with a unique DNA methylation pattern. Methods: Here, we investigated an epigenetically homogeneous cohort of malignant gliomas (n=58) distinct from other subtypes and enriched for pediatric cases and thalamic location, in order to elucidate the overlap with this recently identified subtype of pediatric bithalamic gliomas. Results: EGFR gene amplification was detected in 16/58 (27%) tumors, and missense mutations or small in-frame insertions in EGFR were found in 20/30 tumors with available sequencing data (67%; five of them co-occurring with EGFR amplification). Additionally, eight of the 30 tumors (27%) harbored an H3.1 or H3.3 K27M mutation (six of them with a concomitant EGFR alteration). All tumors tested showed loss of H3K27me3 staining, with evidence of EZHIP overexpression in the H3 wildtype cases. Although some tumors indeed showed a bithalamic growth pattern, a significant proportion of tumors occurred in the unilateral thalamus or in other (predominantly midline) locations. Conclusions: Our findings present a distinct molecular class of pediatric malignant gliomas largely overlapping with the recently reported bithalamic gliomas characterized by EGFR alteration, but additionally showing a broader spectrum of EGFR alterations and tumor localization. Global H3K27me3 loss in this group appears to be mediated by either H3 K27 mutation or EZHIP overexpression. EGFR inhibition may represent a potential therapeutic strategy in these highly aggressive gliomas. Key points: This study confirms a distinct new subset of pediatric diffuse midline glioma with H3K27me3 loss, with or without H3 K27 mutation. The poor outcome of these tumors is in line with the broader family of pediatric diffuse midline gliomas with H3 K27 mutation or EZHIP overexpression. Frequent EGFR alterations in these tumors may represent a therapeutic target in this subset. Importance of the Study: Malignant astrocytic gliomas in children show a remarkable biological and clinical diversity. Here, we highlight a distinct molecular class of pediatric malignant gliomas characterized by EGFR alteration and global H3K27me3 loss that appears to be mediated by either H3 K27 mutation or EZHIP overexpression. EGFR inhibition may represent a potential therapeutic strategy in these highly aggressive gliomas.


2020 ◽  
Author(s):  
Stevenn Volant ◽  
Pierre Lechat ◽  
Perrine Woringer ◽  
Laurence Motreff ◽  
Christophe Malabat ◽  
...  

Abstract Background: Comparing the composition of microbial communities among groups of interest (e.g., patients vs healthy individuals) is a central aspect of microbiome research. It typically involves sequencing, data processing, statistical analysis and graphical representation of the detected signatures. Such an analysis is normally obtained by using a set of different applications that require specific expertise for installation and data processing and, in some cases, programming skills. Results: Here, we present SHAMAN, an interactive web application we developed in order to facilitate the use of (i) a bioinformatic workflow for metataxonomic analysis, (ii) reliable statistical modelling and (iii) one of the largest panels of interactive visualizations among the options that are currently available. SHAMAN is specifically designed for non-expert users who may benefit from using an integrated version of the different analytic steps underlying a proper metagenomic analysis. The application is freely accessible at http://shaman.pasteur.fr/, and may also work as a standalone application with a Docker container (aghozlane/shaman), conda and R. The source code is written in R and is available at https://github.com/aghozlane/shaman. Using two datasets (a mock community sequencing and published 16S rRNA metagenomic data), we illustrate the strengths of SHAMAN in quickly performing a complete metataxonomic analysis. Conclusions: We aim with SHAMAN to provide the scientific community with a platform that simplifies reproducible quantitative analysis of metagenomic data.


2018 ◽  
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Haplotype-based variant callers, which consider physical linkage between variant sites, are currently among the best tools for germline variation discovery and genotyping from short-read sequencing data. However, almost all such tools were designed specifically for detecting common germline variation in diploid populations, and give sub-optimal results in other scenarios. Here we present Octopus, a versatile haplotype-based variant caller that uses a polymorphic Bayesian genotyping model capable of modeling sequencing data from a range of experimental designs within a unified haplotype-aware framework. We show that Octopus accurately calls de novo mutations in parent-offspring trios and germline variants in individuals, including SNVs, indels, and small complex replacements such as microinversions. In addition, using a carefully designed synthetic-tumour data set derived from clean sequencing data from a sample with known germline haplotypes, and observed mutations in a large cohort of tumour samples, we show that Octopus accurately characterizes germline and somatic variation in tumours, both with and without a paired normal sample. Sequencing reads and prior information are combined to phase called genotypes of arbitrary ploidy, including those with somatic mutations. Octopus also outputs realigned evidence BAMs to aid validation and interpretation.


2021 ◽  
Vol 37 (1) ◽  
pp. 77-84
Author(s):  
Yanbo Huang ◽  
D. K. Fisher

Highlights: A web application for guiding data calculated from distributed weather data through open-source cloud service. A design scheme of portable weather stations built from inexpensive open-source electronics. Integration of open-source hardware and software for online guiding data to avoid drift caused by temperature inversion. Abstract: It is important for agricultural chemical applicators to follow proper spray procedures to prevent susceptible crops, animals, people, or other living organisms from being injured far downwind. Spraying during stable atmospheric conditions should be avoided to prevent surface-temperature inversion-induced off-target drift of crop protection materials. Previous statistical analysis determined times of high likelihood of stable atmospheric conditions, which are unfavorable for spraying, during the day under clear and cloudy conditions in hot summer months in the Mississippi Delta. Results validated the thresholds of temperature increase in the morning and temperature drop in the afternoon with wind speeds and the transition between stable and unstable atmospheric conditions. With this information, an algorithm was developed to calculate if atmospheric conditions were favorable for spraying based on field temperature and wind speed at any instant. With this algorithm, a web application was built to provide real-time determination of atmospheric stability and hourly online recommendation of whether aerial applications were appropriate for a location and time in the Mississippi Delta. This study further developed another web application specifically for Stoneville, Mississippi, with data measured from weather stations constructed from inexpensive open-source electronics, accessories, and software for more accurate online guidance for site-specific drift management. The web application is adapted for access on mobile terminals, such as smartphones and tablets, and provides timely guidance for aerial applicators and producers to avoid spray drift and air quality issues long distances downwind in the area. Keywords: Open-source hardware, Open-source software, Spray drift, Temperature inversion, Web application.

