Vcfanno: fast, flexible annotation of genetic variants

Mapping Intimacies ◽

10.1101/041863 ◽

2016 ◽

Author(s):

Brent S. Pedersen ◽

Ryan M. Layer ◽

Aaron R. Quinlan

Keyword(s):

Genetic Variants ◽

Source Code ◽

Variant Annotation ◽

Link Type ◽

File Formats ◽

Whole Exome ◽

Wide Range ◽

Reference Databases ◽

Scripting Language ◽

Genome Annotations

Abstract. Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel “chromosome sweeping” algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources. Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats. Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
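The idea of sweeping along a chromosome and appending the result to the INFO field can be illustrated with a small single-threaded sketch. This is not vcfanno's implementation (which is parallel and handles many file formats); the function names `sweep_annotate` and `append_info`, the half-open interval convention, and the `region` key are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (start, end, label); assumed half-open [start, end) and sorted by start
Interval = Tuple[int, int, str]


def sweep_annotate(positions: List[int], intervals: List[Interval]) -> Dict[int, List[str]]:
    """Map each variant position to the labels of the intervals containing it.

    Both inputs must be sorted by start coordinate and lie on one chromosome.
    """
    hits: Dict[int, List[str]] = {}
    active: List[Interval] = []  # intervals that may still overlap later positions
    i = 0
    for pos in positions:
        # Pull in every interval that starts at or before this position.
        while i < len(intervals) and intervals[i][0] <= pos:
            active.append(intervals[i])
            i += 1
        # Drop intervals that end at or before this position; they cannot match again.
        active = [iv for iv in active if iv[1] > pos]
        hits[pos] = [iv[2] for iv in active]
    return hits


def append_info(info: str, key: str, values: List[str]) -> str:
    """Append key=v1,v2 to a VCF INFO string; return it unchanged if values is empty."""
    if not values:
        return info
    entry = f"{key}={','.join(values)}"
    # An INFO field of "." means "no data", so the new entry replaces it.
    return entry if info in ("", ".") else f"{info};{entry}"


if __name__ == "__main__":
    hits = sweep_annotate([5, 15, 25], [(0, 10, "a"), (12, 20, "b"), (14, 30, "c")])
    print(append_info("DP=10", "region", hits[15]))
```

Because both inputs are consumed in sorted order, each interval is added to the active set once, which is what makes a sweep cheaper than testing every variant against every interval.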

gFACs: Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks

10.1101/402396 ◽

2018 ◽

Cited By ~ 1

Author(s):

Madison Caballero ◽

Jill Wegrzyn

Keyword(s):

Gene Prediction ◽

Structural Features ◽

Functional Attributes ◽

File Formats ◽

Wide Range ◽

Long Read ◽

Gene Models ◽

Gene Structures ◽

Genome Annotations ◽

Alignment Analysis

Abstract. Published genome annotations are filled with erroneous gene models that represent issues associated with frame, start site identification, splice sites, and related structural features. The source of these inconsistencies can often be traced to translated text file formats designed to describe long read alignments and predicted gene structures. The majority of gene prediction frameworks do not provide downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. In addition, these frameworks lack consideration for functional attributes, such as the presence or absence of protein domains, which can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present gFACs as a software package to filter, analyze, and convert predicted gene models and alignments. gFACs operates across a wide range of alignment, analysis, and gene prediction software inputs with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space.
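The kinds of structural problems named above (frame and start codon issues) can be made concrete with a short check on a coding sequence. This is a minimal sketch, not gFACs' own filter set: the function name `cds_problems`, the specific checks, and the standard stop codon table are assumptions chosen for illustration.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, assumed here


def cds_problems(cds: str) -> List[str]:
    """Return a list of structural problems found in a coding sequence (empty if none)."""
    seq = cds.upper()
    problems: List[str] = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("missing ATG start codon")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing terminal stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("in-frame stop codon before the end")
    return problems


if __name__ == "__main__":
    for candidate in ("ATGAAATAA", "ATGTAAAAATAA", "CTGAAATA"):
        print(candidate, cds_problems(candidate))
```

A filter built this way would keep a gene model only when the returned list is empty, and the collected problem labels could feed the kind of filtering statistics the abstract mentions.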

UKBCC: a cohort curation package for UK Biobank

10.1101/2020.07.12.199810 ◽

2020 ◽

Author(s):

Isabell Kiral ◽

Nathalie Willems ◽

Benjamin Goudey

Keyword(s):

Source Code ◽

Heterogeneous Data ◽

Use Case ◽

Uk Biobank ◽

Link Type ◽

Search Terms ◽

Heterogeneous Data Sources ◽

Wide Range ◽

Critical Resource ◽

The Uk

Abstract. Summary: The UK Biobank (UKB) has quickly become a critical resource for researchers conducting a wide range of biomedical studies (Bycroft et al., 2018). The database is constructed from heterogeneous data sources, employs several different encoding schemes, and is disparately distributed throughout UKB servers. Consequently, querying these data remains complicated, making it difficult to quickly identify participants who meet a given set of criteria. We have developed UK Biobank Cohort Curator (UKBCC), a Python tool that allows researchers to rapidly construct cohorts based on a set of search terms. Here, we describe the UKBCC implementation, critical sub-modules and functions, and outline its usage through an example use case for replicable cohort creation. Availability: UKBCC is available through PyPI (https://pypi.org/project/ukbcc) and as open source code on GitHub (https://github.com/tool-bin/ukbcc).

IMPLEMENTING FUNCTIONAL MODULARITY FOR PROCESSING OF GENERAL PHOTOGRAMMETRIC DATA WITH THE DAMPED BUNDLE ADJUSTMENT TOOLBOX (DBAT)

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w17-69-2019 ◽

2019 ◽

Vol XLII-2/W17 ◽

pp. 69-75 ◽

Cited By ~ 1

Author(s):

N. Börlin ◽

A. Murtiyoso ◽

P. Grussenmeyer

Keyword(s):

Open Source ◽

Bundle Adjustment ◽

Fine Tuning ◽

Simple Extension ◽

Fine Grained ◽

File Formats ◽

Wide Range ◽

Extensible Markup ◽

Scripting Language ◽

High Level

Abstract. The Damped Bundle Adjustment Toolbox (DBAT) is a free, open-source toolbox for bundle adjustment. The purpose of DBAT is to provide an independent, open-source toolkit for statistically rigorous bundle adjustment computations. The capabilities include bundle adjustment, network analysis, point filtering, forward intersection, spatial intersection, plotting functions, and computations of quality indicators such as posterior covariance estimates and parameter correlations. DBAT is written in the high-level Matlab language and includes several processing example files. The input formats have so far been restricted to PhotoModeler export files and Photoscan (Metashape) native files. Fine-tuning of the processing has so far required knowledge of the Matlab language. This paper describes the development of a scripting language based on XML (eXtensible Markup Language) that allows the user fine-grained control over what operations are applied to the input data, while keeping the needed programming skills at a minimum. Furthermore, the scripting language allows a wide range of input formats. Additionally, the XML format allows simple extension of the script file format in terms of adding new operations, adding file formats, or adding parameters to existing operations. Overall, the script files will in principle allow DBAT to process any kind of photogrammetric input and should extend the usability of DBAT as a scientific and teaching tool for photogrammetric computations.

Human mitochondrial variant annotation with HmtNote

10.1101/600619 ◽

2019 ◽

Cited By ~ 3

Author(s):

R. Preste ◽

R. Clima ◽

M. Attimonelli

Keyword(s):

Open Source ◽

Online Resources ◽

Annotation Database ◽

Variant Annotation ◽

Internet Connection ◽

Link Type ◽

Wide Range ◽

Using Data ◽

Cross Reference ◽

Python Package

Abstract. HmtNote is a Python package to annotate human mitochondrial variants from VCF files. Variants are annotated using a wide range of information, which is grouped into basic, cross-reference, variability and prediction subsets so that users can either select specific annotations of interest or use them all together. Annotations are performed using data from HmtVar, a recently published database of human mitochondrial variations, which collects information from several online resources and also offers in-house pathogenicity predictions. HmtNote also allows users to download a local annotation database that can be used to annotate variants offline, without having to rely on an internet connection. HmtNote is a free and open source package, and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).

EXPLANe: An Extensible Framework for Poster Annotation with Mobile Devices

10.1101/121178 ◽

2017 ◽

Author(s):

Nikhil Gopal ◽

Andrew Su ◽

Chunlei Wu ◽

Sean D. Mooney

Keyword(s):

Web Services ◽

Open Source ◽

Mobile Devices ◽

Genetic Variants ◽

Poster Session ◽

Biological Information ◽

Text Recognition ◽

Variant Annotation ◽

Link Type

Abstract. Summary: Scientific posters tend to be brief, unstructured, and generally unsuitable for communication beyond a poster session. This paper describes EXPLANe, a framework for annotating posters using optical text recognition and web services on mobile devices. EXPLANe is demonstrated through an interface to the MyVariant.info variant annotation web services, and provides users with a list of biological information linked with genetic variants (as found via RSIDs extracted from annotated posters). This paper delineates the architecture of the application and includes results of a five-part evaluation we conducted. Researchers and developers can use the existing codebase as a foundation from which to generate their own annotation tabs when analyzing and annotating posters. Availability: Alpha EXPLANe software is available as an open source application at https://github.com/ngopal/EXPLANe. Contact: Sean D. Mooney.

iWhale: a computational pipeline based on Docker and SCons for detection and annotation of somatic variants in cancer WES data

Briefings in Bioinformatics ◽

10.1093/bib/bbaa065 ◽

2020 ◽

Author(s):

Andrea Binatti ◽

Silvia Bresolin ◽

Stefania Bortoluzzi ◽

Alessandro Coppe

Keyword(s):

Operating Systems ◽

Sequence Variants ◽

Computational Pipeline ◽

Variant Call Format ◽

Variant Call ◽

Whole Exome ◽

Wide Range ◽

Powerful Approach ◽

Variant Call Format File ◽

Reference Databases

Abstract. Whole exome sequencing (WES) is a powerful approach for discovering sequence variants in cancer cells, but its time effectiveness is limited by the complexity of WES data analysis and the issues it raises. Here we present iWhale, a customizable pipeline based on Docker and SCons that reliably detects somatic variants using three complementary callers (MuTect2, Strelka2 and VarScan2). The results are combined to obtain a single variant call format file for each sample, and variants are annotated by integrating a wide range of information extracted from several reference databases, ultimately allowing variant and gene prioritization according to different criteria. iWhale allows users to conduct a complex series of WES analyses with a powerful yet customizable and easy-to-use tool, running on most operating systems (macOS, GNU/Linux and Windows). iWhale code is freely available at https://github.com/alexcoppe/iWhale and the Docker image is downloadable from https://hub.docker.com/r/alexcoppe/iwhale.
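Combining the output of several callers into one per-sample call set can be sketched as a union that records which callers support each variant. The abstract does not say how iWhale merges results, so the union strategy, the function name `combine_calls`, and the example variants below are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def combine_calls(calls_by_caller: Dict[str, Iterable[Variant]]) -> Dict[Variant, List[str]]:
    """Union the calls of all callers, recording which callers reported each variant."""
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)
    # Sort variants and caller names so the result is deterministic.
    return {variant: sorted(callers) for variant, callers in sorted(support.items())}


if __name__ == "__main__":
    # Hypothetical calls, not taken from the paper.
    merged = combine_calls({
        "MuTect2": [("chr1", 100, "A", "T"), ("chr2", 50, "G", "C")],
        "Strelka2": [("chr1", 100, "A", "T")],
        "VarScan2": [("chr1", 100, "A", "T"), ("chr3", 7, "C", "G")],
    })
    for variant, callers in merged.items():
        print(variant, callers)
```

Keeping the list of supporting callers makes it possible to prioritize variants later, for example by requiring agreement between at least two callers.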

Analysis workflow to assess de novo genetic variants from human whole-exome sequencing

STAR Protocols ◽

10.1016/j.xpro.2021.100383 ◽

2021 ◽

Vol 2 (1) ◽

pp. 100383

Author(s):

Nicholas S. Diab ◽

Spencer King ◽

Weilai Dong ◽

Garrett Allington ◽

Amar Sheth ◽

...

Keyword(s):

Exome Sequencing ◽

Whole Exome Sequencing ◽

Genetic Variants ◽

De Novo ◽

Whole Exome ◽

Analysis Workflow

Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics

Technologies ◽

10.3390/technologies9010003 ◽

2020 ◽

Vol 9 (1) ◽

pp. 3

Author(s):

Gábor Antal ◽

Zoltán Tóth ◽

Péter Hegedűs ◽

Rudolf Ferenc

Keyword(s):

Software Maintenance ◽

Positive Impact ◽

Source Code ◽

Code Analysis ◽

Static Source ◽

Static Code Analysis ◽

Function Calls ◽

Hybrid Code ◽

Code Metrics ◽

Scripting Language

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics with the addition of hybrid (static and dynamic) code analysis based metrics for the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing the static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.
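The three target measures named above follow directly from the counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure (F1) from confusion-matrix counts.

    tp: buggy functions predicted buggy
    fp: non-buggy functions predicted buggy
    fn: buggy functions predicted non-buggy
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```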

RiboA: a web application to identify ribosome A-site locations in ribosome profiling data

BMC Bioinformatics ◽

10.1186/s12859-021-04068-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Danying Shao ◽

Nabeel Ahmed ◽

Nishant Soni ◽

Edward P. O’Brien

Keyword(s):

Integer Programming ◽

Web Application ◽

Stop Codon ◽

Ribosome Profiling ◽

Programming Method ◽

Analysis Tool ◽

Site Location ◽

Link Type ◽

Wide Range ◽

A Site

Abstract. Background: Translation is a fundamental process in gene expression. Ribosome profiling is a method that enables the study of transcriptome-wide translation. A fundamental, technical challenge in analyzing Ribo-Seq data is identifying the A-site location on ribosome-protected mRNA fragments. Identification of the A-site is essential as it is at this location on the ribosome where a codon is translated into an amino acid. Incorrect assignment of a read to the A-site can lead to lower signal-to-noise ratio and loss of correlations necessary to understand the molecular factors influencing translation. Therefore, an easy-to-use and accurate analysis tool is needed to identify the A-site locations. Results: We present RiboA, a web application that identifies the most accurate A-site location on a ribosome-protected mRNA fragment and generates the A-site read density profiles. It uses an Integer Programming method that reflects the biological fact that the A-site of actively translating ribosomes is generally located between the second codon and stop codon of a transcript, and utilizes a wide range of mRNA fragment sizes in and around the coding sequence (CDS). The web application is containerized with Docker, and it can be easily ported across platforms. Conclusions: The Integer Programming method that RiboA utilizes is the most accurate in identifying the A-site on Ribo-Seq mRNA fragments compared to other methods. RiboA makes it easier for the community to use this method via a user-friendly and portable web application. In addition, RiboA supports reproducible analyses by tracking all the input datasets and parameters, and it provides enhanced visualization to facilitate scientific exploration. RiboA is available as a web service at https://a-site.vmhost.psu.edu/. The code is publicly available at https://github.com/obrien-lab/aip_web_docker under the MIT license.

Identification of rare genetic variants in Juvenile Idiopathic Arthritis using whole exome sequencing

Pediatric Rheumatology ◽

10.1186/1546-0096-13-s1-p144 ◽

2015 ◽

Vol 13 (S1) ◽

Author(s):

E Sanchez ◽

S Grandemange ◽

F Tran Mau-Them ◽

P Louis-Plence ◽

A Carbasse ◽

...

Keyword(s):

Juvenile Idiopathic Arthritis ◽

Exome Sequencing ◽

Whole Exome Sequencing ◽

Genetic Variants ◽

Whole Exome ◽

Rare Genetic Variants