EXPLANe: An Extensible Framework for Poster Annotation with Mobile Devices

Vcfanno: fast, flexible annotation of genetic variants

10.1101/041863 ◽

2016 ◽

Author(s):

Brent S. Pedersen ◽

Ryan M. Layer ◽

Aaron R. Quinlan

Keyword(s):

Genetic Variants ◽

Source Code ◽

Variant Annotation ◽

Link Type ◽

File Formats ◽

Whole Exome ◽

Wide Range ◽

Reference Databases ◽

Scripting Language ◽

Genome Annotations

Abstract. Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel “chromosome sweeping” algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources. Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats. Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
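To make the idea of sweeping over sorted records and appending to the INFO field concrete, here is a minimal Python sketch. It is not vcfanno's implementation: it is serial rather than parallel, handles a single chromosome, and treats every variant as a single base at POS. The function name `annotate_vcf_lines`, the `(start, end, value)` interval tuples (1-based, closed) and the example key `gene` are assumptions made only for illustration.

```python
from typing import List, Tuple

# Hypothetical annotation record: (start, end, value), 1-based closed interval.
Interval = Tuple[int, int, str]


def annotate_vcf_lines(vcf_lines: List[str],
                       intervals: List[Interval],
                       key: str) -> List[str]:
    """Append key=<values> to the INFO column of each overlapping variant.

    Assumes all records are on one chromosome, vcf_lines are sorted by POS,
    and intervals are sorted by start. Each input list is scanned only once.
    """
    out = []
    active: List[Interval] = []  # intervals that may still overlap later variants
    nxt = 0                      # index of the next interval not yet activated
    for line in vcf_lines:
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])     # POS is the second VCF column
        # Activate every interval that starts at or before this variant.
        while nxt < len(intervals) and intervals[nxt][0] <= pos:
            active.append(intervals[nxt])
            nxt += 1
        # Retire intervals that end before this variant; they cannot
        # overlap any later variant because variants are sorted.
        active = [iv for iv in active if iv[1] >= pos]
        if active:
            entry = key + "=" + ",".join(iv[2] for iv in active)
            # INFO is the eighth VCF column; "." marks an empty INFO field.
            if fields[7] in (".", ""):
                fields[7] = entry
            else:
                fields[7] = fields[7] + ";" + entry
        out.append("\t".join(fields))
    return out
```

Because both inputs are sorted, no interval is revisited once it has been retired, which is the property that makes a sweep cheaper than testing every variant against every annotation record.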

Biobtree: A tool to search and map bioinformatics identifiers and special keywords

F1000Research ◽

10.12688/f1000research.17927.3 ◽

2020 ◽

Vol 8 ◽

pp. 145

Author(s):

Tamer Gur

Keyword(s):

Web Services ◽

Open Source ◽

Resource Usage ◽

Web Interface ◽

Bioinformatics Tool ◽

Link Type ◽

As Species ◽

Storage Resource ◽

Executable File ◽

And Storage

Biobtree is a bioinformatics tool to search and map bioinformatics datasets via identifiers or special keywords such as species name. It processes large bioinformatics datasets using a specialized MapReduce-based solution with optimum computational and storage resource usage. It provides uniform, B+ tree-based database output, a web interface, and web services, and it supports chain mapping queries between datasets. It can be used via a single executable file or via the R or Python-based wrapper packages, which are additionally provided for easier integration into existing pipelines. Biobtree is open source and available on GitHub.

Target Gene Notebook: Connecting genetics and drug discovery

10.1101/757690 ◽

2019 ◽

Cited By ~ 2

Author(s):

Mary Pat Reeve ◽

Andrew Kirby ◽

Jamey Wierzbowski ◽

Mark Daly ◽

Janna Hutz

Keyword(s):

Drug Discovery ◽

Open Source ◽

Target Gene ◽

Knowledge Bases ◽

Biological Information ◽

Genetic Associations ◽

Link Type ◽

Intuitive Interfaces ◽

Credible Set ◽

Genome Annotations

Abstract. Target Gene Notebook was developed to enable more efficient linking of genetic associations to functional biological information. This process is essential to translating genetic insights into therapeutic hypotheses and, eventually, drug discovery. Although many public databases provide access to unfiltered genome annotations and genetic results, there was no existing tool to maintain group curation and integration with proprietary experimental data. We provide Target Gene Notebook freely via the MIT open-source license for the purposes of assisting therapeutic target evaluation and the creation of durable institutional and public knowledge bases. Implemented as a Java backend serving mainly JavaScript content derived from gene-specific SQLite databases, Target Gene Notebook enables automated access to the most widely used sources of genetic association, expression and protein QTL data, provides intuitive interfaces to credible set and colocalization information, and enables comprehensive literature review and annotation by multiple users simultaneously to create a consistent target knowledgebase within an organization or across a consortium. Target Gene Notebook is freely available from GitHub (https://github.com/targetgenenotebook/tgn.git) under the MIT open-source end-user license agreement, and a live version of the interface is provided at http://tgn.broadinstitute.org/.

The variant call format provides efficient and robust storage of GWAS summary statistics

Genome Biology ◽

10.1186/s13059-020-02248-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew S. Lyon ◽

Shea J. Andrews ◽

Ben Elsworth ◽

Tom R. Gaunt ◽

Gibran Hemani ◽

...

Keyword(s):

Open Access ◽

Open Source ◽

Genetic Variants ◽

Data Interpretation ◽

Summary Statistics ◽

Variant Call Format ◽

Variant Call ◽

Query Performance ◽

Link Type ◽

Storage Format

Abstract. GWAS summary statistics are fundamental for a variety of research applications, yet no common storage format has been widely adopted. Existing tabular formats ambiguously or incompletely store information about genetic variants and associations, lack essential metadata, and are typically not indexed, yielding poor query performance and increasing the possibility of errors in data interpretation and post-GWAS analyses. To address these issues, we adapted the variant call format to store GWAS summary statistics (GWAS-VCF) and developed open-source tools to use this format in downstream analyses. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (https://gwas.mrcieu.ac.uk).

Human mitochondrial variant annotation with HmtNote

10.1101/600619 ◽

2019 ◽

Cited By ~ 3

Author(s):

R. Preste ◽

R. Clima ◽

M. Attimonelli

Keyword(s):

Open Source ◽

Online Resources ◽

Annotation Database ◽

Variant Annotation ◽

Internet Connection ◽

Link Type ◽

Wide Range ◽

Using Data ◽

Cross Reference ◽

Python Package

Abstract. HmtNote is a Python package to annotate human mitochondrial variants from VCF files. Variants are annotated using a wide range of information, grouped into basic, cross-reference, variability and prediction subsets so that users can either select specific annotations of interest or use them all together. Annotations are performed using data from HmtVar, a recently published database of human mitochondrial variations, which collects information from several online resources and also offers in-house pathogenicity predictions. HmtNote also allows users to download a local annotation database that can be used to annotate variants offline, without having to rely on an internet connection. HmtNote is a free and open source package, and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).

Biobtree: A tool to search and map bioinformatics identifiers and special keywords

F1000Research ◽

10.12688/f1000research.17927.4 ◽

2020 ◽

Vol 8 ◽

pp. 145

Author(s):

Tamer Gur

Keyword(s):

Web Services ◽

Open Source ◽

Resource Usage ◽

Web Interface ◽

Bioinformatics Tool ◽

Link Type ◽

As Species ◽

Storage Resource ◽

Executable File ◽

And Storage

Biobtree is a bioinformatics tool to search and map bioinformatics datasets via identifiers or special keywords such as species name. It processes large bioinformatics datasets using a specialized MapReduce-based solution with optimum computational and storage resource usage. It provides uniform, B+ tree-based database output, a web interface, and web services, and it supports chain mapping queries between datasets. It can be used via a single executable file or via the R or Python-based wrapper packages, which are additionally provided for easier integration into existing pipelines. Biobtree is open source and available on GitHub.

Open source tools for geographic analysis in transport planning

Journal of Geographical Systems ◽

10.1007/s10109-020-00342-2 ◽

2021 ◽

Author(s):

Robin Lovelace

Keyword(s):

User Interface ◽

Open Source ◽

Citizen Participation ◽

Simulation Software ◽

First Century ◽

Transport Planning ◽

Geographic Analysis ◽

Link Type ◽

Twenty First Century ◽

Interactive Map

Abstract. Geographic analysis has long supported transport plans that are appropriate to local contexts. Many incumbent ‘tools of the trade’ are proprietary and were developed to support growth in motor traffic, limiting their utility for transport planners who have been tasked with twenty-first century objectives such as enabling citizen participation, reducing pollution, and increasing levels of physical activity by getting more people walking and cycling. Geographic techniques, such as route analysis, network editing, localised impact assessment and interactive map visualisation, have great potential to support modern transport planning priorities. The aim of this paper is to explore emerging open source tools for geographic analysis in transport planning, with reference to the literature and a review of open source tools that are already being used. A key finding is that a growing number of options exist, challenging the current landscape of proprietary tools. These can be classified as command-line interface, graphical user interface or web-based user interface tools and by the framework in which they were implemented, with numerous tools released as R, Python and JavaScript packages, and QGIS plugins. The review found a diverse and rapidly evolving ‘ecosystem’ of tools, with 25 tools that were designed for geographic analysis to support transport planning outlined in terms of their popularity and functionality based on online documentation. They ranged in size from single-purpose tools such as the QGIS plugin AwaP to sophisticated stand-alone multi-modal traffic simulation software such as MATSim, SUMO and Veins. Building on their ability to re-use the most effective components from other open source projects, developers of open source transport planning tools can avoid ‘reinventing the wheel’ and focus on innovation; the ‘gamified’ A/B Street simulation software (https://github.com/dabreegster/abstreet/#abstreet), based on OpenStreetMap, is a case in point. The paper, the source code of which can be found at https://github.com/robinlovelace/open-gat, concludes that, although many of the tools reviewed are still evolving and further research is needed to understand their relative strengths and barriers to uptake, open source tools for geographic analysis in transport planning already hold great potential to help generate the strategic visions of change and evidence that is needed by transport planners in the twenty-first century.

Using open source software and mobile devices for collecting research data in terrain

Management, Information and Educational Engineering ◽

10.1201/b18558-19 ◽

2015 ◽

pp. 99-102

Keyword(s):

Open Source ◽

Mobile Devices ◽

Open Source Software ◽

Research Data

chewBBACA: A complete suite for gene-by-gene schema creation and strain identification

10.1101/173146 ◽

2017 ◽

Cited By ~ 5

Author(s):

Mickael Silva ◽

Miguel Machado ◽

Diogo N. Silva ◽

Mirko Rossi ◽

Jacob Moran-Gilad ◽

...

Keyword(s):

Open Source ◽

Core Genome ◽

Bacterial Species ◽

Outbreak Detection ◽

Strain Identification ◽

List Type ◽

Whole Genome ◽

Link Type ◽

The Creation ◽

Allele Calling

Abstract. Gene-by-gene approaches are becoming increasingly popular in bacterial genomic epidemiology and outbreak detection. However, there is a lack of open-source scalable software for schema definition and allele calling for these methodologies. The chewBBACA suite was designed to assist users in the creation and evaluation of novel whole-genome or core-genome gene-by-gene typing schemas and subsequent allele calling in bacterial strains of interest. The software can run on a laptop or on high-performance clusters, making it useful for both small laboratories and large reference centers. ChewBBACA is available at https://github.com/B-UMMI/chewBBACA or as a docker image at https://hub.docker.com/r/ummidock/chewbbaca/. Data Summary: Assembled genomes used for the tutorial were downloaded from NCBI in August 2016 by selecting those submitted as Streptococcus agalactiae taxon or sub-taxa. All the assemblies have been deposited as a zip file in FigShare (https://figshare.com/s/9cbe1d422805db54cd52), where a file with the original ftp link for each NCBI directory is also available. Code for the chewBBACA suite is available at https://github.com/B-UMMI/chewBBACA while the tutorial example is found at https://github.com/B-UMMI/chewBBACA_tutorial. I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. Impact Statement: The chewBBACA software offers a computational solution for the creation, evaluation and use of whole genome (wg) and core genome (cg) multilocus sequence typing (MLST) schemas. It allows researchers to develop wg/cgMLST schemes for any bacterial species from a set of genomes of interest. The alleles identified by chewBBACA correspond to potential coding sequences, possibly offering insights into the correspondence between the genetic variability identified and phenotypic variability. The software performs allele calling in a matter of seconds to minutes per strain on a laptop but is easily scalable for the analysis of large datasets of hundreds of thousands of strains using multiprocessing options. The chewBBACA software thus provides an efficient and freely available open source solution for gene-by-gene methods. Moreover, the ability to perform these tasks locally is desirable when the submission of raw data to a central repository or web services is hindered by data protection policies or ethical or legal concerns.

The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses

10.1101/785774 ◽

2019 ◽

Author(s):

Andrew Webb ◽

Jared Knoblauch ◽

Nitesh Sabankar ◽

Apeksha Sukesh Kallur ◽

Jody Hey ◽

...

Keyword(s):

Open Source ◽

Development Time ◽

End Users ◽

File Format ◽

Software Platform ◽

Format Conversion ◽

Link Type ◽

Population Genomic ◽

Genomic Analyses ◽

File Format Conversion

Abstract. Here we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts that would require development time and may be error-prone or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html