Pydigree: a python library for manipulation and forward-time simulation of genetic datasets


10.1101/213413 ◽

2017 ◽

Author(s):

James E. Hicks

Keyword(s):

Population Genetics ◽

Data Structures ◽

Genetic Epidemiology ◽

Genetic Data ◽

Link Type ◽

File Formats ◽

Time Simulation ◽

Cross Platform ◽

User Friendly ◽

Python Package

Abstract. The development of software for working with data from population genetics or genetic epidemiology often requires substantial time spent implementing common procedures. Pydigree is a cross-platform Python 3 library that contains efficient, user-friendly implementations for many of these common functions, and support for input from common file formats. Developers can combine the functions and data structures to rapidly implement programs handling genetic data. Pydigree presents a useful environment for development of applications for genetic data or rapid prototyping before reimplementation in a higher-performance language. Pydigree is freely available under an open source license. Stable sources can be found in the Python Package Index at https://pypi.python.org/pypi/pydigree/, and development sources can be downloaded at https://github.com/jameshicks/pydigree/


LevioSAM: Fast lift-over of alternate reference alignments

10.1101/2021.02.05.429867 ◽

2021 ◽

Author(s):

Taher Mun ◽

Nae-Chyun Chen ◽

Ben Langmead

Keyword(s):

Population Genetics ◽

Coordinate System ◽

Data Structures ◽

Succinct Data Structures ◽

Reference Coordinate System ◽

Link Type ◽

A Chain ◽

Time Required ◽

Effective Use

Abstract. Motivation: As more population genetics datasets and population-specific references become available, the task of translating (“lifting”) read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck. Results: LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, levioSAM completes in less than 13% of the time required by an aligner when both are run with 16 threads. Availability: https://github.com/alshai/[email protected], [email protected]
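The coordinate translation described in this abstract can be illustrated with a small sketch. This is not levioSAM's implementation (the abstract states that the tool uses succinct data structures); it is a simplified stand-in that uses sorted lists and binary search. The function names `build_lift_index` and `lift`, the 0-based positions, and the `(pos, ref, alt)` tuple layout are assumptions made for this illustration only. It maps a position on the sequence the variants are defined against to the matching position on the sequence with those variants applied.

```python
from bisect import bisect_right
from itertools import accumulate


def build_lift_index(variants):
    """Build a lookup index from non-overlapping (pos, ref, alt) variants.

    pos is 0-based on the source sequence (an assumption of this sketch).
    Returns two parallel lists: the source position just past each variant,
    and the cumulative length change after each variant.
    """
    variants = sorted(variants)
    ends = [pos + len(ref) for pos, ref, _ in variants]
    deltas = [len(alt) - len(ref) for _, ref, alt in variants]
    return ends, list(accumulate(deltas))


def lift(pos, index):
    """Translate a source position to the variant-applied sequence.

    Positions inside a deleted stretch have no exact counterpart;
    this sketch does not treat them specially.
    """
    ends, offsets = index
    # Count the variants that end at or before pos.
    n_before = bisect_right(ends, pos)
    return pos + (offsets[n_before - 1] if n_before else 0)


# Source "ACGTACGT" with a 2-base insertion after position 2 and a
# 1-base deletion at position 6 becomes "ACGTTTACT".
index = build_lift_index([(2, "G", "GTT"), (5, "CG", "C")])
print(lift(0, index))  # 0
print(lift(3, index))  # 5
print(lift(7, index))  # 8
```

Precomputing cumulative offsets makes each lookup a single binary search, so every read position is translated independently of the others. That independence is what lets this kind of work be split across threads.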


GToTree: a user-friendly workflow for phylogenomics

10.1101/512491 ◽

2019 ◽

Cited By ~ 8

Author(s):

Michael D. Lee

Keyword(s):

Markov Models ◽

Link Type ◽

File Formats ◽

Evolutionary Inference ◽

Computational Work ◽

Command Line Tool ◽

Genome Level ◽

User Friendly ◽

Reference Genomes

Abstract. Summary: Genome-level evolutionary inference (i.e., phylogenomics) is becoming an increasingly essential step in many biologists’ work - such as in the characterization of newly recovered genomes, or in leveraging available reference genomes to guide evolutionary questions. Accordingly, there are several tools available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required - such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering, stitching different tools together, etc. - can be prohibitive. Here I introduce GToTree, a command-line tool that can take any combination of fasta files, GenBank files, and/or NCBI assembly accessions as input and outputs an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on the specified single-copy gene (SCG) set. While GToTree can work with any custom hidden Markov Models (HMMs), also included are 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built based on searches of ~12,000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees. Availability: GToTree is open-source and freely available for download from: github.com/AstrobioMike/GToTree. Documentation: github.com/AstrobioMike/GToTree/wiki. Implementation: GToTree is implemented primarily in bash, with helper scripts written in [email protected]


ThermoRawFileParser: modular, scalable and cross-platform RAW file conversion

10.1101/622852 ◽

2019 ◽

Cited By ~ 2

Author(s):

Niels Hulstaert ◽

Timo Sachsenberg ◽

Mathias Walzer ◽

Harald Barsnes ◽

Lennart Martens ◽

...

Keyword(s):

File Format ◽

Mass Spectrometers ◽

Continuous Growth ◽

Workflow Systems ◽

File Formats ◽

Standard File Format ◽

Cross Platform ◽

Quantitative Results ◽

Cloud Infrastructures ◽

User Friendly

Abstract. The field of computational proteomics is approaching the big data age, driven both by a continuous growth in the number of samples analysed per experiment and by the growing amount of data obtained in each analytical run. In order to process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability, and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built Conda and BioContainers containers around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for those users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.


PubData: search engine for bioinformatics databases worldwide

10.1101/069575 ◽

2016 ◽

Author(s):

Bohdan B. Khomtchouk ◽

Kasra A. Vand ◽

Thor Wahlestedt ◽

Kelly Khomtchouk ◽

Mohammed K. Sayed ◽

...

Keyword(s):

Search Engine ◽

Language Processing ◽

Biomedical Literature ◽

Biomedical Data ◽

File Transfer ◽

Link Type ◽

Bioinformatics Databases ◽

Cross Platform ◽

File Retrieval ◽

User Friendly

Abstract. We propose a search engine and file retrieval system for all bioinformatics databases worldwide. PubData searches biomedical data in a user-friendly fashion similar to how PubMed searches biomedical literature. PubData is built on novel network programming, natural language processing, and artificial intelligence algorithms that can patch into the file transfer protocol servers of any user-specified bioinformatics database, query its contents, retrieve files for download, and adapt to the user’s search preferences. PubData is hosted as a user-friendly, cross-platform graphical user interface program developed using PyQt: http://www.pubdata.bio. The methods are implemented in Python, and are available as part of the PubData project at: https://github.com/Bohdan-Khomtchouk/PubData.


Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python

F1000Research ◽

10.12688/f1000research.17548.1 ◽

2019 ◽

Vol 8 ◽

pp. 42

Author(s):

Klara Kaleb ◽

Alex Warwick Vesztrocy ◽

Adrian Altenhoff ◽

Christophe Dessimoz

Keyword(s):

Link Type ◽

Rest Api ◽

User Friendly ◽

Python Package

The Orthologous Matrix (OMA) is a well-established resource to identify orthologs among many genomes. Here, we present two recent additions to its programmatic interface, namely a REST API, and user-friendly R and Python packages called OmaDB. These should further facilitate the incorporation of OMA data into computational scripts and pipelines. The REST API can be freely accessed at https://omabrowser.org/api. The R OmaDB package is available as part of Bioconductor at http://bioconductor.org/packages/OmaDB/, and the omadb Python package is available from the Python Package Index (PyPI) at https://pypi.org/project/omadb/.


SPECTRE: a Suite of PhylogEnetiC Tools for Reticulate Evolution

10.1101/169177 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sarah Bastkowski ◽

Daniel Mapleson ◽

Andreas Spillner ◽

Taoyang Wu ◽

Monika Balvočiūtė ◽

...

Keyword(s):

Open Source ◽

Data Structures ◽

Phylogenetic Trees ◽

High Performance ◽

Reticulate Evolution ◽

Supplementary Information ◽

Link Type ◽

Data Structures And Algorithms ◽

User Friendly ◽

Split Networks

Abstract. Summary: Split-networks are a generalization of phylogenetic trees that have proven to be a powerful tool in phylogenetics. Various ways have been developed for computing such networks, including split-decomposition, NeighborNet, QNet and FlatNJ. Some of these approaches are implemented in the user-friendly SplitsTree software package. However, to give the user the option to adjust and extend these approaches and to facilitate their integration into analysis pipelines, there is a need for robust, open-source implementations of associated data structures and algorithms. Here we present SPECTRE, a readily available, open-source library of data structures written in Java, that comes complete with new implementations of several pre-published algorithms and a basic interactive graphical interface for visualizing planar split networks. SPECTRE also supports the use of longer running algorithms by providing command line interfaces, which can be executed on servers or in High Performance Computing (HPC) environments. Availability: Full source code is available under the GPLv3 license at: https://github.com/maplesond/SPECTRE. SPECTRE’s core library is available from Maven Central at: https://mvnrepository.com/artifact/uk.ac.uea.cmp.spectre/core. Documentation is available at: http://spectre-suite-of-phylogenetic-tools-for-reticulate-evolution.readthedocs.io/en/latest/[email protected] Information (SI): Supplementary information is available at Bioinformatics online.


BuddySuite: Command-line toolkits for manipulating sequences, alignments, and phylogenetic trees

10.1101/040675 ◽

2016 ◽

Author(s):

Stephen R. Bond ◽

Karl E. Keat ◽

Sofia N. Barreira ◽

Andreas D. Baxevanis

Keyword(s):

Sequence Alignment ◽

Phylogenetic Trees ◽

Phylogenetic Reconstruction ◽

General Purpose ◽

Command Line ◽

Link Type ◽

File Formats ◽

Downstream Analysis ◽

Python Package ◽

Common Sequence

Abstract. The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.


Expanding the Orthologous Matrix (OMA) programmatic interfaces: REST API and the OmaDB packages for R and Python

F1000Research ◽

10.12688/f1000research.17548.2 ◽

2019 ◽

Vol 8 ◽

pp. 42 ◽

Cited By ~ 4

Author(s):

Klara Kaleb ◽

Alex Warwick Vesztrocy ◽

Adrian Altenhoff ◽

Christophe Dessimoz

Keyword(s):

Link Type ◽

Rest Api ◽

User Friendly ◽

Python Package


Pedigree and Pedigree Import Wizard

HortScience ◽

10.21273/hortsci.33.3.552g ◽

1998 ◽

Vol 33 (3) ◽

pp. 552g-553

Author(s):

Shahrokh Khandizadeh

Keyword(s):

Additional Data ◽

File Format ◽

Fruit Crops ◽

Operating Environment ◽

Agronomic Characteristics ◽

Link Type ◽

Plant Characteristics ◽

User Friendly

Pedigree for Windows is a user-friendly program that allows the user to trace agronomic characteristics, draw pedigrees, and view images of several fruit crops, including more than 1400 apple, 800 strawberry, 800 almond, 100 blackberry, 80 blueberry, 790 pear, and 200 raspberry examples. Pedigree Import Wizard®© for Windows is add-on software for users who are interested in importing their research or breeding data records of fruit, flower, and plant characteristics and any related images into Pedigree for Windows. Pedigree for Windows and Pedigree Import Wizard have been designed so that a user familiar with the Windows operating environment should have little need to refer to the documentation provided with the program. Pedigree Import Wizard uses a comma-separated value (csv) file format under the MS Excel environment. This option allows the user to import into the existing database additional data that are already stored in other software such as Lotus, Excel, Access, QuattroPro, WordPerfect, and MS Word tables, as long as that software works under the Windows environment. A free demo version of Pedigree and Pedigree Import Wizard for Windows is available from http://www.pgris.com.


Cross-Platform Real-Time Simulation Models for Li-ion Batteries in Opal-RT and Typhoon-HIL

2021 IEEE Texas Power and Energy Conference (TPEC) ◽

10.1109/tpec51183.2021.9384928 ◽

2021 ◽

Author(s):

Xinlan Jia ◽

Prottay M. Adhikari ◽

Luigi Vanfretti

Keyword(s):

Real Time ◽

Simulation Models ◽

Li Ion Batteries ◽

Real Time Simulation ◽

Time Simulation ◽

Cross Platform ◽

Li Ion
