Human mitochondrial variant annotation with HmtNote

Abstract: HmtNote is a Python package to annotate human mitochondrial variants from VCF files. Variants are annotated using a wide range of information, grouped into basic, cross-reference, variability and prediction subsets so that users can either select specific annotations of interest or use them all together. Annotations are performed using data from HmtVar, a recently published database of human mitochondrial variations, which collects information from several online resources and offers in-house pathogenicity predictions. HmtNote also allows users to download a local annotation database that can be used to annotate variants offline, without relying on an internet connection. HmtNote is a free and open source package, and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).

Dashing: fast and accurate genomic distances with HyperLogLog

Genome Biology ◽

10.1186/s13059-019-1875-0 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 9

Author(s):

Daniel N. Baker ◽

Ben Langmead

Keyword(s):

Open Source ◽

Software Tool ◽

Estimation Methods ◽

Cardinality Estimation ◽

Link Type ◽

Wide Range

Abstract: Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
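
To make the idea of sketch-based similarity concrete, the following is a minimal Python sketch of a HyperLogLog and a Jaccard estimate built from it. It is not Dashing's implementation: it assumes the textbook HyperLogLog estimator with a small-range correction and plain inclusion-exclusion for the intersection, whereas the abstract says Dashing uses estimators specialized for unions and intersections. The names `HyperLogLog`, `sketch`, `kmers` and `jaccard_estimate` are hypothetical, and treating k-mers as the set elements is an assumption made here for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Tiny HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from SHA-1 so results are stable across runs.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


def sketch(items, p=14):
    """Build a sketch from an iterable of strings."""
    hll = HyperLogLog(p)
    for item in items:
        hll.add(item)
    return hll


def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_estimate(a, b):
    """Estimate |A n B| / |A u B| by inclusion-exclusion on sketch cardinalities."""
    union = a.union(b).cardinality()
    if union == 0:
        return 0.0
    inter = a.cardinality() + b.cardinality() - union
    return max(0.0, inter) / union
```

Two sequences could then be compared with `jaccard_estimate(sketch(kmers(s1, 21)), sketch(kmers(s2, 21)))`. The register-wise maximum in `union` is what makes the sketches cheap to combine: no sequence has to be re-read to estimate the size of a union.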

Vcfanno: fast, flexible annotation of genetic variants

10.1101/041863 ◽

2016 ◽

Author(s):

Brent S. Pedersen ◽

Ryan M. Layer ◽

Aaron R. Quinlan

Keyword(s):

Genetic Variants ◽

Source Code ◽

Variant Annotation ◽

Link Type ◽

File Formats ◽

Whole Exome ◽

Wide Range ◽

Reference Databases ◽

Scripting Language ◽

Genome Annotations

Abstract. Background: The integration of genome annotations and reference databases is critical to the identification of genetic variants that may be of interest in studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic variants in VCF format. Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file. Vcfanno also integrates the Lua scripting language so that users can easily develop custom annotations and metrics. By leveraging a new parallel “chromosome sweeping” algorithm, it enables rapid annotation of both whole-exome and whole-genome datasets. We demonstrate this performance by annotating over 85.3 million variants in less than 17 minutes (>85,000 variants per second) with 50 attributes from 17 commonly used genome annotation resources. Conclusions: Vcfanno is a flexible software package that provides researchers with the ability to annotate genetic variation with a wide range of datasets and reference databases in diverse genomic formats. Availability: The vcfanno source code is available at https://github.com/brentp/vcfanno under the MIT license, and platform-specific binaries are available at https://github.com/brentp/vcfanno/releases. Detailed documentation is available at http://brentp.github.io/vcfanno/, and the code underlying the analyses presented can be found at https://github.com/brentp/vcfanno/tree/master/scripts/paper.
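
The "extract, summarize, append to INFO" step described above can be illustrated with a short Python sketch. This is not vcfanno's code or configuration format. It assumes a single tab-separated VCF data line, annotation intervals given as hypothetical BED-style `(chrom, start, end, value)` tuples with 0-based half-open coordinates, and a made-up INFO key `af_max`. It also does a naive scan over all intervals rather than the parallel chromosome sweep mentioned in the abstract.

```python
def append_info(vcf_line, key, value):
    """Return the VCF data line with key=value appended to its INFO column."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = fields[7]                      # INFO is the 8th VCF column
    entry = f"{key}={value}"
    # "." means "no INFO entries", so it is replaced rather than extended.
    fields[7] = entry if info in (".", "") else f"{info};{entry}"
    return "\t".join(fields)


def annotate(vcf_line, intervals, key, reducer=max):
    """Summarize the values of all overlapping intervals and append the result.

    intervals: iterable of (chrom, start, end, value), 0-based half-open.
    reducer:   function collapsing several values into one (assumed: max).
    """
    fields = vcf_line.split("\t")
    chrom, pos = fields[0], int(fields[1])    # VCF POS is 1-based
    hits = [value for c, start, end, value in intervals
            if c == chrom and start < pos <= end]
    if not hits:
        return vcf_line.rstrip("\n")          # nothing overlaps: leave unchanged
    return append_info(vcf_line, key, reducer(hits))
```

Passing a different `reducer` (for example `min`, or a function computing a mean) changes how multiple overlapping annotations are collapsed into the single value written to INFO.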

DREAMTools: a Python package for scoring collaborative challenges

F1000Research ◽

10.12688/f1000research.7118.2 ◽

2016 ◽

Vol 4 ◽

pp. 1030 ◽

Cited By ~ 5

Author(s):

Thomas Cokelaer ◽

Mukesh Bansal ◽

Christopher Bare ◽

Erhan Bilal ◽

Brian M. Bot ◽

...

Keyword(s):

Computational Methods ◽

Training Data ◽

Model Parameters ◽

Automated Scoring ◽

Statistical Machine Learning ◽

Science And Engineering ◽

Link Type ◽

Wide Range ◽

Improved Methods ◽

Python Package

DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. Availability: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.

Python Package abstcal: An Open-Source Tool for Calculating Abstinence from Timeline Followback Data

Nicotine & Tobacco Research ◽

10.1093/ntr/ntab083 ◽

2021 ◽

Author(s):

Yong Cui ◽

Jason D Robinson ◽

Rudel E Rymer ◽

Jennifer A Minnix ◽

Paul M Cinciripini

Keyword(s):

Smoking Cessation ◽

Open Source ◽

Research Field ◽

Missing Data Imputation ◽

Online Tool ◽

Timeline Followback ◽

Continuous Point ◽

Web App ◽

Using Data ◽

Python Package

Abstract: In smoking cessation clinical trials, timeline followback (TLFB) interviews are widely used to track daily cigarette consumption. However, there are no standard tools for calculating abstinence based on TLFB data. Individual research groups have to develop their own calculation tools, which is not only time- and resource-consuming but might also lead to variability in the data processing and calculation procedures. To address these issues, we developed a novel open-source Python package named abstcal to calculate abstinence using TLFB data. This package provides data verification, duplicate and outlier detection, missing-data imputation, integration of biochemical verification data, and calculation of a variety of definitions of abstinence, including continuous, point-prevalence, and prolonged abstinence. We verified the accuracy of the calculator using data derived from a clinical smoking cessation study. To improve the package’s accessibility, we have made it available as a free web app. The abstcal package is a reliable abstinence calculator with open-source access, providing a shared validated online tool to the addiction research field. We expect that this open-source abstinence calculation tool will improve the rigor and reproducibility of smoking and addiction research by standardizing TLFB-based abstinence calculation.
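
The three abstinence definitions named above can be sketched in a few lines of Python. This is not abstcal's API; every function name and parameter below is hypothetical. The sketch assumes TLFB data for one participant as a dict mapping an integer study day to cigarettes smoked, treats a missing day as not abstinent (no imputation), and uses simplified definitions: point prevalence looks at the `window` days before the assessment, continuous runs from the quit day, and prolonged skips a grace period after the quit day.

```python
def is_abstinent(daily_counts, start_day, end_day):
    """True if every day in [start_day, end_day] is recorded with zero cigarettes.

    A day missing from daily_counts counts as non-abstinent (assumption).
    """
    return all(daily_counts.get(day) == 0 for day in range(start_day, end_day + 1))


def point_prevalence(daily_counts, assessment_day, window=7):
    """No smoking in the `window` days immediately before the assessment."""
    return is_abstinent(daily_counts, assessment_day - window, assessment_day - 1)


def continuous(daily_counts, quit_day, assessment_day):
    """No smoking from the quit day up to the day before the assessment."""
    return is_abstinent(daily_counts, quit_day, assessment_day - 1)


def prolonged(daily_counts, quit_day, assessment_day, grace=14):
    """Like continuous, but smoking during the grace period is ignored."""
    return is_abstinent(daily_counts, quit_day + grace, assessment_day - 1)
```

A participant who smoked once on day 3 after a day-0 quit date would fail `continuous` at a day-30 assessment but pass `prolonged` with a 14-day grace period and pass the 7-day `point_prevalence`, which shows why the choice of definition has to be fixed before results are compared across studies.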

AGEpy: a Python package for computational biology

10.1101/450890 ◽

2018 ◽

Cited By ~ 1

Author(s):

Franziska Metge ◽

Robert Sehlke ◽

Jorge Boucas

Keyword(s):

Computational Biology ◽

Open Source ◽

High Throughput ◽

Biological Data ◽

Command Line ◽

High Throughput Analysis ◽

Throughput Analysis ◽

Link Type ◽

Biological Meaning ◽

Python Package

Abstract. Summary: AGEpy is a Python package focused on the transformation of interpretable data into biological meaning. It is designed to support high-throughput analysis of pre-processed biological data using either local Python-based processing or Python-based API calls to local or remote servers. In this application note we describe its different Python modules as well as its command-line tools aDiff, abed, blasto, david, and obo2tsv. Availability: The open source AGEpy Python package is freely available at https://github.com/mpg-age-bioinformatics/AGEpy.

Supplementary material to "AI4Water v1.0: An open source python package for modeling hydrological time series using data-driven methods"

10.5194/gmd-2021-139-supplement ◽

2021 ◽

Author(s):

Ather Abbas ◽

Laurie Boithias ◽

Yakov Pachepsky ◽

Kyunghyun Kim ◽

Jong Ahn Chun ◽

...

Keyword(s):

Time Series ◽

Open Source ◽

Data Driven ◽

Supplementary Material ◽

Using Data ◽

Python Package

NeuroPycon: An open-source Python toolbox for fast multi-modal and reproducible brain connectivity pipelines

10.1101/789842 ◽

2019 ◽

Author(s):

David Meunier ◽

Annalisa Pascarella ◽

Dmitrii Altukhov ◽

Mainak Jas ◽

Etienne Combrisson ◽

...

Keyword(s):

Open Source ◽

Brain Connectivity ◽

Reproducible Research ◽

Link Type ◽

Large Sets ◽

Current Implementation ◽

Wide Range ◽

Neuroimaging Software ◽

Brain Data ◽

Automatic Removal

Abstract: Recent years have witnessed a massive push towards reproducible research in neuroscience. Unfortunately, this endeavor is often challenged by the large diversity of tools used, project-specific custom code and the difficulty of tracking all user-defined parameters. NeuroPycon is an open-source multi-modal brain data analysis toolkit which provides Python-based template pipelines for advanced multi-processing of MEG, EEG, functional and anatomical MRI data, with a focus on connectivity and graph theoretical analyses. Importantly, it provides shareable parameter files to facilitate replication of all analysis steps. NeuroPycon is based on the NiPype framework, which facilitates data analyses by wrapping many commonly used neuroimaging software tools into a common Python environment. In other words, rather than being brain imaging software with its own implementation of standard algorithms for brain signal processing, NeuroPycon seamlessly integrates existing packages (coded in Python, Matlab or other languages) into a unified Python framework. Importantly, thanks to the multi-threaded processing and computational efficiency afforded by NiPype, NeuroPycon provides an easy option for fast parallel processing, which is critical when handling large sets of multi-dimensional brain data. Moreover, its flexible design allows users to easily configure analysis pipelines by connecting distinct nodes to each other. Each node can be a Python-wrapped module, a user-defined function or a well-established tool (e.g. MNE-Python for MEG analysis, Radatools for graph theoretical metrics, etc.). Last but not least, the ability to use NeuroPycon parameter files to fully describe any pipeline is an important feature for reproducibility, as they can be shared and used for easy replication by others.
The current implementation of NeuroPycon contains two complementary packages. The first, called ephypype, includes pipelines for electrophysiology analysis and a command-line interface for on-the-fly pipeline creation. Current implementations allow for MEG/EEG data import, pre-processing and cleaning by automatic removal of ocular and cardiac artefacts, in addition to sensor or source-level connectivity analyses. The second package, called graphpype, is designed to investigate functional connectivity via a wide range of graph-theoretical metrics, including modular partitions. The present article describes the philosophy, architecture, and functionalities of the toolkit and provides illustrative examples through interactive notebooks. NeuroPycon is available for download via GitHub (https://github.com/neuropycon) and the two principal packages are documented online (https://neuropycon.github.io/ephypype/index.html and https://neuropycon.github.io/graphpype/index.html). Future developments include fusion of multi-modal data (e.g. MEG and fMRI or intracranial EEG and fMRI). We hope that the release of NeuroPycon will attract many users and new contributors, and facilitate the efforts of our community towards open source tool sharing and development, as well as scientific reproducibility.

FastTrack: An open-source software for tracking varying numbers of deformable objects

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008697 ◽

2021 ◽

Vol 17 (2) ◽

pp. e1008697

Author(s):

Benjamin Gallois ◽

Raphaël Candelier

Keyword(s):

Open Source ◽

Cell Tracking ◽

Ad Hoc ◽

Ground Truth ◽

General Purpose ◽

Two Dimensions ◽

Deformable Objects ◽

Tracking Accuracy ◽

Link Type ◽

Wide Range

Analyzing the dynamical properties of mobile objects requires extracting trajectories from recordings, which is often done by tracking objects in movies. We compiled a database of two-dimensional movies for very different biological and physical systems spanning a wide range of length scales and developed a general-purpose, optimized, open-source, cross-platform, easy to install and use, self-updating software called FastTrack. It can handle a changing number of deformable objects in a region of interest, and is particularly suitable for animal and cell tracking in two dimensions. Furthermore, we introduce the probability of incursions as a new measure of a movie’s trackability that does not require knowledge of ground truth trajectories, since it is resilient to small numbers of errors and can be computed on the basis of an ad hoc tracking. We also leveraged the versatility and speed of FastTrack to implement an iterative algorithm determining a set of nearly-optimized tracking parameters, further reducing the amount of human intervention, and demonstrate that FastTrack can be used to explore the space of tracking parameters to optimize the number of swaps for a batch of similar movies. A benchmark shows that FastTrack is orders of magnitude faster than state-of-the-art tracking algorithms, with a comparable tracking accuracy. The source code is available under the GNU GPLv3 at https://github.com/FastTrackOrg/FastTrack and pre-compiled binaries for Windows, Mac and Linux are available at http://www.fasttrack.sh.

Dashing: Fast and Accurate Genomic Distances with HyperLogLog

10.1101/501726 ◽

2018 ◽

Cited By ~ 8

Author(s):

Daniel N Baker ◽

Ben Langmead

Keyword(s):

Open Source ◽

Software Tool ◽

Estimation Methods ◽

Cardinality Estimation ◽

Link Type ◽

Wide Range

SemEHR: A General-purpose Semantic Search System to Surface Semantic Data from Clinical Notes for Tailored Care, Trial Recruitment and Clinical Research

10.1101/235622 ◽

2017 ◽

Cited By ~ 1

Author(s):

Honghan Wu ◽

Giulia Toti ◽

Katherine I. Morley ◽

Zina M. Ibrahim ◽

Amos Folarin ◽

...

Keyword(s):

Information Extraction ◽

Open Source ◽

Language Processing ◽

Semantic Search ◽

Trial Recruitment ◽

College Hospital ◽

Link Type ◽

Semantic Data ◽

Wide Range ◽

The UK

Abstract. Objective: Unlocking the data contained within both structured and unstructured components of Electronic Health Records (EHRs) has the potential to provide a step change in data available for secondary research use, generation of actionable medical insights, hospital management and trial recruitment. To achieve this, we implemented SemEHR, an open source semantic search and analytics tool for EHRs. Methods: SemEHR implements a generic information extraction (IE) and retrieval infrastructure by identifying contextualised mentions of a wide range of biomedical concepts within EHRs. Natural Language Processing (NLP) annotations are further assembled at patient level and extended with EHR-specific knowledge to generate a timeline for each patient. The semantic data is serviced via ontology-based search and analytics interfaces. Results: SemEHR has been deployed to a number of UK hospitals including the Clinical Record Interactive Search (CRIS), an anonymised replica of the EHR of the UK South London and Maudsley (SLaM) NHS Foundation Trust, one of Europe’s largest providers of mental health services. In two CRIS-based studies, SemEHR achieved 93% (Hepatitis C case) and 99% (HIV case) F-Measure results in identifying true positive patients. At King’s College Hospital in London, as part of the CogStack programme (github.com/cogstack), SemEHR is being used to recruit patients into the UK Dept of Health 100k Genome Project (genomicsengland.co.uk). The validation study suggests that the tool can validate previously recruited cases and is very fast in searching phenotypes: the time for checking recruitment criteria was reduced from days to minutes. Validated on MIMIC-III, an open intensive care EHR dataset, the vital signs extracted by SemEHR can achieve around 97% accuracy. Conclusion: Results from the multiple case studies demonstrate SemEHR’s efficiency: weeks or months of work can be done within hours or minutes in some cases.
SemEHR provides a more comprehensive view of a patient, bringing in more and unexpected insight compared to study-oriented bespoke information extraction systems. SemEHR is open source and available at https://github.com/CogStack/SemEHR.
