Pydigree: a Python library for manipulation and forward-time simulation of genetic datasets

2017 ◽  
Author(s):  
James E. Hicks

Abstract: The development of software for working with data from population genetics or genetic epidemiology often requires substantial time spent implementing common procedures. Pydigree is a cross-platform Python 3 library that contains efficient, user-friendly implementations of many of these common functions, and support for input from common file formats. Developers can combine the functions and data structures to rapidly implement programs handling genetic data. Pydigree presents a useful environment for development of applications for genetic data or rapid prototyping before reimplementation in a higher-performance language. Pydigree is freely available under an open source license. Stable sources can be found in the Python Package Index at https://pypi.python.org/pypi/pydigree/, and development sources can be downloaded at https://github.com/jameshicks/pydigree/
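As a rough illustration of what "forward-time simulation" means here, the sketch below follows one biallelic locus through successive generations under neutral Wright-Fisher drift. It uses only the Python standard library and does not use Pydigree's API; the function name `simulate_allele_frequency` and its parameters are hypothetical and chosen only for this example.

```python
import random


def simulate_allele_frequency(pop_size, generations, initial_freq, seed=None):
    """Return the allele-frequency trajectory of one biallelic locus.

    Hypothetical helper (not part of Pydigree): a diploid population of
    `pop_size` individuals is propagated forward in time, one generation
    at a time, under neutral Wright-Fisher drift.
    """
    rng = random.Random(seed)
    n_chromosomes = 2 * pop_size  # diploid: two chromosomes per individual
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each chromosome in the next generation samples its allele
        # from the current generation with probability `freq`.
        count = sum(1 for _ in range(n_chromosomes) if rng.random() < freq)
        freq = count / n_chromosomes
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    traj = simulate_allele_frequency(pop_size=50, generations=20,
                                     initial_freq=0.5, seed=1)
    print("final allele frequency:", traj[-1])
```

The trajectory has one entry per generation plus the starting value. An allele that reaches frequency 0 or 1 stays there, which is the fixation behaviour expected from drift alone.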

2021 ◽  
Author(s):  
Taher Mun ◽  
Nae-Chyun Chen ◽  
Ben Langmead

Abstract
Motivation: As more population genetics datasets and population-specific references become available, the task of translating (“lifting”) read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck.
Results: LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, levioSAM completes in less than 13% of the time required by an aligner when both are run with 16 threads.
Availability: https://github.com/alshai/[email protected], [email protected]
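The idea of lifting a coordinate across variants can be sketched in a few lines. The example below is a toy, not levioSAM's implementation: it scans a sorted list of variants linearly instead of using succinct data structures, handles single positions rather than whole SAM/BAM alignments, and assumes VCF-style alleles that each contain at least one base. The name `lift_position` and the `(position, ref_allele, alt_allele)` tuple layout are assumptions made for this illustration.

```python
def lift_position(pos, variants):
    """Map a 0-based position on the source reference to the coordinate
    system of the sequence obtained by applying `variants`.

    `variants` is a list of (position, ref_allele, alt_allele) tuples,
    sorted by position and non-overlapping (hypothetical layout).
    """
    shift = 0  # net length change contributed by variants left of `pos`
    for var_pos, ref, alt in variants:
        if var_pos > pos:
            break  # remaining variants lie to the right and cannot affect pos
        ref_end = var_pos + len(ref)
        if pos < ref_end:
            # Position falls inside the variant: keep its offset if the
            # alternate allele is long enough, otherwise clamp to the
            # last base of the alternate allele (e.g. a deleted base).
            offset_in_variant = min(pos - var_pos, len(alt) - 1)
            return var_pos + shift + offset_in_variant
        shift += len(alt) - len(ref)
    return pos + shift


if __name__ == "__main__":
    # Source reference: ACGTACGT
    # Variants: insertion of TT after position 2, deletion of the G at 6.
    # Resulting sequence: ACGTTTACT
    variants = [(2, "G", "GTT"), (5, "CG", "C")]
    for p in range(8):
        print(p, "->", lift_position(p, variants))
```

Positions to the right of an insertion move up by the inserted length, positions to the right of a deletion move down, and a position inside a deleted span is clamped to the retained anchor base. A real tool also has to adjust CIGAR strings and mate information, which this sketch leaves out.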


2019 ◽  
Author(s):  
Michael D. Lee

Abstract
Summary: Genome-level evolutionary inference (i.e., phylogenomics) is becoming an increasingly essential step in many biologists’ work - such as in the characterization of newly recovered genomes, or in leveraging available reference genomes to guide evolutionary questions. Accordingly, there are several tools available for the major steps in a phylogenomics workflow. But for the biologist whose main focus is not bioinformatics, much of the computational work required - such as accessing genomic data on large scales, integrating genomes from different file formats, performing required filtering, stitching different tools together, etc. - can be prohibitive. Here I introduce GToTree, a command-line tool that can take any combination of fasta files, GenBank files, and/or NCBI assembly accessions as input and output an alignment file, estimates of genome completeness and redundancy, and a phylogenomic tree based on the specified single-copy gene (SCG) set. While GToTree can work with any custom hidden Markov Models (HMMs), also included are 13 newly generated SCG-set HMMs for different lineages and levels of resolution, built based on searches of ~12,000 bacterial and archaeal high-quality genomes. GToTree aims to give more researchers the capability to make phylogenomic trees.
Availability: GToTree is open-source and freely available for download from: github.com/AstrobioMike/GToTree
Documentation: github.com/AstrobioMike/GToTree/wiki
Implementation: GToTree is implemented primarily in bash, with helper scripts written in [email protected]


2019 ◽  
Author(s):  
Niels Hulstaert ◽  
Timo Sachsenberg ◽  
Mathias Walzer ◽  
Harald Barsnes ◽  
Lennart Martens ◽  
...  

Abstract: The field of computational proteomics is approaching the big data age, driven both by continuous growth in the number of samples analysed per experiment and by the growing amount of data obtained in each analytical run. In order to process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability, and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built Conda and BioContainers containers around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for those users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.


2016 ◽  
Author(s):  
Bohdan B. Khomtchouk ◽  
Kasra A. Vand ◽  
Thor Wahlestedt ◽  
Kelly Khomtchouk ◽  
Mohammed K. Sayed ◽  
...  

Abstract: We propose a search engine and file retrieval system for all bioinformatics databases worldwide. PubData searches biomedical data in a user-friendly fashion similar to how PubMed searches biomedical literature. PubData is built on novel network programming, natural language processing, and artificial intelligence algorithms that can patch into the file transfer protocol servers of any user-specified bioinformatics database, query its contents, retrieve files for download, and adapt to the user’s search preferences. PubData is hosted as a user-friendly, cross-platform graphical user interface program developed using PyQt: http://www.pubdata.bio. The methods are implemented in Python, and are available as part of the PubData project at: https://github.com/Bohdan-Khomtchouk/PubData.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 42
Author(s):  
Klara Kaleb ◽  
Alex Warwick Vesztrocy ◽  
Adrian Altenhoff ◽  
Christophe Dessimoz

The Orthologous Matrix (OMA) is a well-established resource to identify orthologs among many genomes. Here, we present two recent additions to its programmatic interface, namely a REST API, and user-friendly R and Python packages called OmaDB. These should further facilitate the incorporation of OMA data into computational scripts and pipelines. The REST API can be freely accessed at https://omabrowser.org/api. The R OmaDB package is available as part of Bioconductor at http://bioconductor.org/packages/OmaDB/, and the omadb Python package is available from the Python Package Index (PyPI) at https://pypi.org/project/omadb/.


2017 ◽  
Author(s):  
Sarah Bastkowski ◽  
Daniel Mapleson ◽  
Andreas Spillner ◽  
Taoyang Wu ◽  
Monika Balvočiūtė ◽  
...  

Abstract
Summary: Split-networks are a generalization of phylogenetic trees that have proven to be a powerful tool in phylogenetics. Various ways have been developed for computing such networks, including split-decomposition, NeighborNet, QNet and FlatNJ. Some of these approaches are implemented in the user-friendly SplitsTree software package. However, to give the user the option to adjust and extend these approaches and to facilitate their integration into analysis pipelines, there is a need for robust, open-source implementations of associated data structures and algorithms. Here we present SPECTRE, a readily available, open-source library of data structures written in Java, that comes complete with new implementations of several pre-published algorithms and a basic interactive graphical interface for visualizing planar split networks. SPECTRE also supports the use of longer running algorithms by providing command line interfaces, which can be executed on servers or in High Performance Computing (HPC) environments.
Availability: Full source code is available under the GPLv3 license at: https://github.com/maplesond/SPECTRE
SPECTRE’s core library is available from Maven Central at: https://mvnrepository.com/artifact/uk.ac.uea.cmp.spectre/core
Documentation is available at: http://spectre-suite-of-phylogenetic-tools-for-reticulate-evolution.readthedocs.io/en/latest/
Contact: [email protected]
Supplementary Information (SI): Supplementary information is available at Bioinformatics online.


2016 ◽  
Author(s):  
Stephen R. Bond ◽  
Karl E. Keat ◽  
Sofia N. Barreira ◽  
Andreas D. Baxevanis

Abstract: The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools; it is written in the popular programming language Python and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.


HortScience ◽  
1998 ◽  
Vol 33 (3) ◽  
pp. 552g-553
Author(s):  
Shahrokh Khandizadeh

Pedigree for Windows is a user-friendly program that allows the user to trace agronomic characteristics, draw pedigrees, and view images of several fruit crops, including more than 1400 apple, 800 strawberry, 800 almond, 100 blackberry, 80 blueberry, 790 pear, and 200 raspberry examples. Pedigree Import Wizard®© for Windows is add-on software for users who are interested in importing their research or breeding data records of fruit, flower, and plant characteristics and any related images into Pedigree for Windows. Pedigree for Windows and Pedigree Import Wizard have been designed so that a user familiar with the Windows operating environment should have little need to refer to the documentation provided with the program. Pedigree Import Wizard uses a comma-separated value (csv) file format under the MS Excel environment. This option allows the user to add or import into the existing database additional data that are already stored in other software such as Lotus, Excel, Access, QuattroPro, WordPerfect, and MS Word tables, as long as they work under the Windows environment. A free demo version of Pedigree and Pedigree Import Wizard for Windows is available from http://www.pgris.com.

