Raritas: a program for counting high diversity categorical data with highly unequal abundances

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5453
Author(s):  
David B. Lazarus ◽  
Johan Renaudie ◽  
Dorina Lenz ◽  
Patrick Diver ◽  
Jens Klump

Data on the occurrences of many types of difficult-to-identify objects are often still acquired by human observation, for example, in biodiversity and paleontologic research. Existing computer counting programs used to record such data have various limitations, including inflexibility and cost. We describe a new open-source program for this purpose: Raritas. Raritas is written in Python and can be run as a standalone app for recent versions of either MacOS or Windows, or from the command line as easily customized source code. The program explicitly supports a rare category count mode, which makes it easier to collect quantitative data on rare categories, for example, rare species, which are important in biodiversity surveys. Lastly, we describe the file format used by Raritas and propose it as a standard for storing geologic biodiversity data. The ‘Stratigraphic occurrence data’ file format combines extensive sample metadata and a flexible structure for recording occurrence data of species or other categories in a series of samples.
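
The abstract does not spell out how counts from a rare category mode are combined with ordinary counts. As a rough illustration only, the sketch below assumes one plausible scheme: every category is counted over a few "tracks" of a sample, and then only the categories designated as rare are counted over additional tracks, so that rare categories are normalized by a larger search effort. The function name, the notion of tracks, and the arguments are all hypothetical and are not taken from Raritas itself.

```python
def estimate_per_track(full_counts, rare_counts, full_tracks, rare_tracks,
                       rare_categories):
    """Estimate specimens per track for each category.

    Hypothetical scheme (an assumption, not Raritas's documented method):
    - all categories are counted over `full_tracks` tracks;
    - only `rare_categories` are counted over a further `rare_tracks` tracks.
    """
    estimates = {}
    for category in set(full_counts) | set(rare_counts):
        if category in rare_categories:
            # Rare categories were searched for over both phases.
            total = full_counts.get(category, 0) + rare_counts.get(category, 0)
            effort = full_tracks + rare_tracks
        else:
            # Abundant categories were only counted in the full phase.
            total = full_counts.get(category, 0)
            effort = full_tracks
        estimates[category] = total / effort
    return estimates


example = estimate_per_track(
    full_counts={"common sp.": 200, "rare sp. 1": 2},
    rare_counts={"rare sp. 1": 6, "rare sp. 2": 2},
    full_tracks=2,
    rare_tracks=8,
    rare_categories={"rare sp. 1", "rare sp. 2"},
)
```

Under these assumptions, a category seen only a handful of times still gets a density estimate based on ten tracks of search effort rather than two, which is the general motivation for counting rare categories separately.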

2018 ◽  
Author(s):  
David Lazarus ◽  
Johan Renaudie ◽  
Dorina Lenz ◽  
Patrick Diver ◽  
Jens Klump

Data on the occurrences of many types of difficult-to-identify objects are often still acquired by human observation, e.g. in biodiversity and paleontologic research. Existing computer counting programs used to record such data have various limitations, including inflexibility and cost. We describe a pair of new open-source programs for this purpose, Raritas and RaritasVox, which share a similar graphical user interface for mouse-based counting and the same file output format. Raritas is written in Python and can be run as a standalone app for recent versions of either MacOS or Windows, or from the command line as easily customized source code. RaritasVox additionally supports voice-based counting but is written in Java and is more complex to install or modify. Both programs explicitly support a rare category count mode, which makes it easier to collect quantitative data on rare categories, e.g. rare species, which are important in biodiversity surveys. Lastly, as to our knowledge no standards exist yet, we describe a new stratigraphic occurrence data (SOD) unitary file format, which combines extensive metadata and a flexible structure for recording occurrence data of species or other categories in a series of samples.


2015 ◽  
Vol 14 ◽  
pp. CIN.S26470 ◽  
Author(s):  
Richard P. Finney ◽  
Qing-Rong Chen ◽  
Cu V. Nguyen ◽  
Chih Hao Hsu ◽  
Chunhua Yan ◽  
...  

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled for the native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview . The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview .


2019 ◽  
Author(s):  
Charlotte A. Darby ◽  
Ravi Gaddipati ◽  
Michael C. Schatz ◽  
Ben Langmead

Abstract Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these “gold standard” Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM, and vg to align more reads correctly. Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license.
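
To make "heuristic-free" concrete, the sketch below fills the complete dynamic-programming matrix for local alignment with affine gap penalties (the classic Smith-Waterman recurrence with Gotoh's extension), so the returned score is optimal by construction. It is not Vargas's code: it is plain Python, handles only a linear reference, and uses no SIMD or quality-scaled penalties. The function name and the default scoring values are assumptions chosen for illustration; here a gap of length k costs `gap_open + k * gap_extend`.

```python
def local_affine_score(read, ref, match=2, mismatch=-3,
                       gap_open=-5, gap_extend=-2):
    """Return the optimal local alignment score of `read` against `ref`.

    Every cell of the matrix is computed, so no candidate alignment is
    skipped (no seeding, banding, or other heuristics).
    """
    neg_inf = float("-inf")
    m = len(ref)
    h_prev = [0] * (m + 1)        # best score ending at each cell, previous row
    f_prev = [neg_inf] * (m + 1)  # best score ending in a vertical gap, previous row
    best = 0
    for i in range(1, len(read) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending in a horizontal gap
        for j in range(1, m + 1):
            # Extend an existing gap or open a new one from the neighbour cell.
            e = max(e + gap_extend, h_cur[j - 1] + gap_open + gap_extend)
            f_cur[j] = max(f_prev[j] + gap_extend,
                           h_prev[j] + gap_open + gap_extend)
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            # Local mode: a score never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

The cost of this exhaustiveness is one cell update per (read base, reference base) pair, which is why the abstract emphasizes parallelization and vectorized instructions.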


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

Abstract idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV can make calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including personal computers, HPC systems, and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.


Author(s):  
Michael Milton ◽  
Natalie Thorne

Abstract Summary aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information Supplementary data are available at Bioinformatics online.
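
The general idea of deriving a tool definition from help output can be sketched with a small regular-expression parser. This is not aCLImatise's API or parsing strategy; the function name `parse_help`, the pattern, and the sample help text are assumptions made for illustration, and the pattern only covers a common argparse-style layout in which the option and its description are separated by at least two spaces.

```python
import re

# Matches lines such as "  -o, --output FILE     write result to FILE".
FLAG_LINE = re.compile(
    r"^\s*(?:(-\w),\s*)?(--[\w-]+)(?:[ =]([A-Z_]+))?\s{2,}(.*)$"
)


def parse_help(text):
    """Extract a list of option descriptions from command-line help text."""
    flags = []
    for line in text.splitlines():
        found = FLAG_LINE.match(line)
        if found:
            short, long_name, value_name, description = found.groups()
            flags.append({
                "short": short,
                "long": long_name,
                "takes_value": value_name is not None,
                "description": description.strip(),
            })
    return flags


SAMPLE_HELP = """usage: tool [options]
  -h, --help            show this help
  -o, --output FILE     write result to FILE
  --threads N           number of threads
"""

parsed = parse_help(SAMPLE_HELP)
```

A structured list like `parsed` is the kind of intermediate form that could then be serialized into a workflow-language tool definition; real help output is far less regular than this sample, which is a large part of what makes the problem hard.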


Author(s):  
Fábio K Mendes ◽  
Dan Vanderpool ◽  
Ben Fulton ◽  
Matthew W Hahn

Abstract Motivation Genome sequencing projects have revealed frequent gains and losses of genes between species. Previous versions of our software, Computational Analysis of gene Family Evolution (CAFE), have allowed researchers to estimate parameters of gene gain and loss across a phylogenetic tree. However, the underlying model assumed that all gene families had the same rate of evolution, despite evidence suggesting a large amount of variation in rates among families. Results Here, we present CAFE 5, a completely re-written software package with numerous performance and user-interface enhancements over previous versions. These include improved support for multithreading, the explicit modeling of rate variation among families using gamma-distributed rate categories, and command-line arguments that preclude the use of accessory scripts. Availability and implementation CAFE 5 source code, documentation, test data and a detailed manual with examples are freely available at https://github.com/hahnlab/CAFE5/releases. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Aleksandra I Jarmolinska ◽  
Anna Gambin ◽  
Joanna I Sulkowska

Abstract Summary The biggest hurdle in studying topology in biopolymers is the steep learning curve for actually seeing the knots in structure visualization. Knot_pull is a command line utility designed to simplify this process—it presents the user with a smoothing trajectory for provided structures (any number and length of protein, RNA or chromatin chains in PDB, CIF or XYZ format), and calculates the knot type (including presence of any links, and slipknots when a subchain is specified). Availability and implementation Knot_pull works under Python >=2.7 and is system independent. Source code and documentation are available at http://github.com/dzarmola/knot_pull under GNU GPL license and include also a wrapper script for PyMOL for easier visualization. Examples of smoothing trajectories can be found at: https://www.youtube.com/watch?v=IzSGDfc1vAY. Supplementary information Supplementary data are available at Bioinformatics online.

