cual-id: globally unique, correctable, and human-friendly sample identifiers for comparative -omics studies

10.7287/peerj.preprints.1431

2015

Author(s): John H Chase, Evan T Bolyen, Jai Ram Rideout, J Gregory Caporaso

Keyword(s): High Throughput, Project Teams, Command Line, Major Step, Sample Tracking, Sample Data, Command Line Tool, Scientific Results

The number of samples in high-throughput comparative “omics” studies is increasing rapidly due to declining experimental costs. To keep sample data and metadata manageable, and to ensure the integrity of scientific results as the scale of these projects continues to increase, it is essential that we transition to better-designed sample identifiers. Ideally, sample identifiers will be globally unique across projects, project teams, and institutions; short, to facilitate manual transcription; correctable with respect to common types of transcription errors; opaque, meaning they do not contain information about the samples; and compatible with existing standards. We present cual-id, a lightweight command line tool that creates, or mints, sample identifiers that meet these criteria without reliance on centralized infrastructure. cual-id allows users to assign Universally Unique Identifiers, or UUIDs, that are globally unique to their samples. However, UUIDs are too long to be conveniently written on sampling materials such as swabs or microcentrifuge tubes, so cual-id additionally generates human-friendly 4- to 12-character identifiers (CualIDs) that map to their UUIDs and are unique within a project. CualIDs are used by humans when they manually write or enter identifiers, while the longer UUIDs are used by computers to unambiguously reference a sample. The adoption of identifiers that are globally unique, correctable, and easily handwritten or manually entered into a computer will be a major step forward for sample tracking in comparative -omics studies within and across projects and project teams.
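
The abstract attributes two technical properties to a CualID: it maps to a UUID, and it is correctable with respect to transcription errors. The Python sketch below illustrates one way to obtain both together; it is not cual-id's implementation. The alphabet, the default length of 6 (inside the 4- to 12-character range the authors describe), the minimum pairwise distance, and the function names `mint` and `correct` are hypothetical choices made for illustration.

```python
import secrets
import uuid

# Hypothetical alphabet: drops characters that are easily confused when
# handwritten (0/O, 1/I/L, 5/S, 2/Z, 8/B). Not cual-id's actual alphabet.
ALPHABET = "ACDEFGHJKMNPQRTUVWXY34679"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mint(project: dict, length: int = 6, min_distance: int = 3) -> tuple:
    """Mint a (short_id, UUID) pair and record it in `project`.

    `project` maps each short ID to its UUID string. A random candidate is
    rejected unless it differs from every existing short ID at
    >= min_distance positions (hypothetical policy for this sketch).
    """
    while True:
        short = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if all(hamming(short, s) >= min_distance for s in project):
            full = str(uuid.uuid4())  # random, globally unique identifier
            project[short] = full
            return short, full


def correct(project: dict, typed: str, max_errors: int = 1):
    """Return the single short ID within max_errors of `typed`, else None."""
    typed = typed.strip().upper()
    hits = [s for s in project
            if len(s) == len(typed) and hamming(s, typed) <= max_errors]
    return hits[0] if len(hits) == 1 else None
```

Because every pair of short IDs in a project differs at three or more positions, a string with one wrong character is within one substitution of at most one minted ID, so `correct` returns either an unambiguous match or `None`. The cost of this design is that fewer IDs of a given length are available per project.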

cual-id: Globally Unique, Correctable, and Human-Friendly Sample Identifiers for Comparative Omics Studies

mSystems

10.1128/msystems.00010-15

2015

Vol 1 (1)

Cited By ~ 2

Author(s): John H. Chase, Evan Bolyen, Jai Ram Rideout, J. Gregory Caporaso

Keyword(s): Integrated Analysis, Sample Collection, Major Step, Global Uniqueness, Bar Codes, Sample Tracking, Sample Data, Command Line Tool, Sample Management, Scientific Results

ABSTRACT The number of samples in high-throughput comparative “omics” studies is increasing rapidly due to declining experimental costs. To keep sample data and metadata manageable and to ensure the integrity of scientific results as the scale of these projects continues to increase, it is essential that we transition to better-designed sample identifiers. Ideally, sample identifiers should be globally unique across projects, project teams, and institutions; short (to facilitate manual transcription); correctable with respect to common types of transcription errors; opaque, meaning that they do not contain information about the samples; and compatible with existing standards. We present cual-id, a lightweight command line tool that creates, or mints, sample identifiers that meet these criteria without reliance on centralized infrastructure. cual-id allows users to assign universally unique identifiers, or UUIDs, that are globally unique to their samples. UUIDs are too long to be conveniently written on sampling materials, such as swabs or microcentrifuge tubes, however, so cual-id additionally generates human-friendly 4- to 12-character identifiers that map to their UUIDs and are unique within a project. By convention, we use “cual-id” to refer to the software, “CualID” to refer to the short, human-friendly identifiers, and “UUID” to refer to the globally unique identifiers. CualIDs are used by humans when they manually write or enter identifiers, while the longer UUIDs are used by computers to unambiguously reference a sample. Finally, cual-id optionally generates printable label sticker sheets containing Code 128 bar codes and CualIDs for labeling of sample collection and processing materials.

IMPORTANCE The adoption of identifiers that are globally unique, correctable, and easily handwritten or manually entered into a computer will be a major step forward for sample tracking in comparative omics studies. As the fields transition to more-centralized sample management, for example, across labs within an institution, across projects funded under a common program, or in systems designed to facilitate meta- and/or integrated analysis, sample identifiers generated with cual-id will not need to change; thus, costly and error-prone updating of data and metadata identifiers will be avoided. Further, using cual-id will ensure that transcription errors in sample identifiers do not require the discarding of otherwise-useful samples that may have been expensive to obtain. Finally, cual-id is simple to install and use and is free for all use. No centralized infrastructure is required to ensure global uniqueness, so it is feasible for any lab to get started using these identifiers within their existing infrastructure.
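
The label sheets described above carry Code 128 bar codes. The sketch below is not taken from cual-id; it only shows, assuming a CualID is encoded with Code 128 code set B, how the symbol values and the standard modulo-103 check symbol are computed. Drawing the bar pattern for each value is left to a barcode library, and `code128b_values` is a hypothetical helper name.

```python
START_B = 104  # value of the "Start Code B" symbol
STOP = 106     # value of the "Stop" symbol


def code128b_values(text: str) -> list:
    """Symbol values for a Code 128 (code set B) barcode encoding `text`.

    Returns [start, data..., check, stop]; rendering bars is out of scope.
    """
    data = []
    for ch in text:
        if not 32 <= ord(ch) <= 127:
            raise ValueError(f"{ch!r} is not encodable in code set B")
        data.append(ord(ch) - 32)  # code set B maps ASCII 32-127 to 0-95
    # Weighted modulo-103 checksum: the start symbol has weight 1 and the
    # i-th data symbol (1-based) has weight i.
    weighted = START_B + sum(i * v for i, v in enumerate(data, start=1))
    return [START_B] + data + [weighted % 103, STOP]
```

For the string `PJJ123C`, the data values are 48, 42, 42, 17, 18, 19, 35 and the check value is 55. Because the weights are position dependent and 103 is prime, recomputing the sum lets a scanner detect a single misread symbol in short codes such as these.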

FAN-C: A Feature-rich Framework for the Analysis and Visualisation of C data

10.1101/2020.02.03.932517

2020

Cited By ~ 6

Author(s): Kai Kruse, Clemens B. Hug, Juan M. Vaquerizas

Keyword(s): High Throughput, Matrix Analysis, Set Covering, Command Line, Chromosome Conformation, C Storage, Data Formats, Analysis Tools, Command Line Tool, Broad Feature

Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C and its derivatives, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data (https://github.com/vaquerizaslab/fanc). Due to its comprehensiveness and compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.
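
Matrix generation, the first step the abstract lists, amounts to counting read pairs per pair of genomic bins. The pure-Python sketch below shows that step for a single chromosome; it does not use or reproduce FAN-C's API, and the function name `contact_matrix` and its inputs are hypothetical. Real pipelines also filter read pairs and normalize the matrix before analysis.

```python
from collections import Counter


def contact_matrix(pairs, chrom_size: int, bin_size: int):
    """Bin intra-chromosomal read pairs into a symmetric contact matrix.

    `pairs` yields (pos1, pos2), the 0-based positions of the two ends of
    each valid Hi-C read pair on one chromosome.
    """
    n_bins = (chrom_size + bin_size - 1) // bin_size  # ceiling division
    counts = Counter()
    for p1, p2 in pairs:
        i, j = sorted((p1 // bin_size, p2 // bin_size))
        counts[i, j] += 1  # count in the upper triangle only
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), c in counts.items():
        matrix[i][j] = c
        matrix[j][i] = c  # mirror so the matrix is symmetric
    return matrix
```

Counting only the upper triangle and mirroring afterwards reflects the fact that a contact between bins i and j is the same event as one between bins j and i.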

Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files

Cancer Informatics

10.4137/cin.s26470

2015

Vol 14

pp. CIN.S26470

Cited By ~ 2

Author(s): Richard P. Finney, Qing-Rong Chen, Cu V. Nguyen, Chih Hao Hsu, Chunhua Yan, ...

Keyword(s): Graphical User Interface, Reference Genome, Source Code, Software Tool, Command Line, Sequencing Data, Genome Data, Command Line Tool, Portable Software, Microsoft Windows

The name Alview is a contraction of the term Alignment Viewer. Alview is a natively compiled software tool for visualizing the alignment of sequencing data. Its inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format, together with files containing reference genome data; its outputs are visualizations of these aligned short reads. Alview is written in portable C, with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.
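
Any alignment viewer has to walk each read's CIGAR string to place its bases against the reference. Alview is written in C and its code is not reproduced here; the Python sketch below is a hypothetical illustration of that walk for a single SAM record, taking the POS, CIGAR, and SEQ fields as already parsed (reading BAM files would additionally require a BAM decoder). The display symbols `.`, `-`, and `>` are conventions chosen for this sketch.

```python
import re

# SAM CIGAR operations: length followed by one operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def render_read(ref: str, pos: int, cigar: str, seq: str) -> str:
    """Lay out one aligned read under the reference, one column per base.

    `pos` is the 1-based leftmost mapping position (SAM POS). Matches are
    shown as '.', mismatches as the read base, deletions as '-', and skipped
    regions (N) as '>'. Insertions and clips occupy no reference column.
    """
    out = [" "] * (pos - 1)  # pad up to the alignment start
    r, q = pos - 1, 0        # 0-based reference and read offsets
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both read and reference
            for k in range(n):
                base = seq[q + k]
                out.append("." if base == ref[r + k] else base)
            r += n
            q += n
        elif op == "D":      # consumes reference only
            out.append("-" * n)
            r += n
        elif op == "N":      # skipped reference, e.g. an intron
            out.append(">" * n)
            r += n
        elif op in "IS":     # consumes read only
            q += n
        # H and P consume neither read nor reference
    return "".join(out)
```

For example, `render_read("ACGTACGTAC", 3, "2M1D3M", "GTCGA")` returns `"  ..-..A"`: two matching bases, a one-base deletion, two more matches, and a mismatch shown as the read base `A`.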

ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees

Bioinformatics

10.1093/bioinformatics/btaa219

2020

Vol 36 (12)

pp. 3930-3931

Cited By ~ 1

Author(s): Oliver B Scott, A W Edith Chan

Keyword(s): Open Source, High Throughput Screening, Chemical Space, Diversity Analysis, Graph Analysis, Command Line, Molecular Scaffold, Large Sets, Command Line Tool, Scaffold Diversity

Summary: ScaffoldGraph (SG) is an open-source Python library and command-line tool for the generation and analysis of molecular scaffold networks and trees, with the capability of processing large sets of input molecules. With the increase in high-throughput screening data, scaffold graphs have proven useful for the navigation and analysis of chemical space, being used for visualization, clustering, scaffold-diversity analysis and active-series identification. Built on RDKit and NetworkX, SG integrates scaffold graph analysis into the growing scientific/cheminformatics Python stack, increasing the flexibility and extendibility of the tool compared to existing software.

Availability and implementation: SG is freely available and released under the MIT licence at https://github.com/UCLCheminformatics/ScaffoldGraph.

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics

10.1093/bioinformatics/btaa070

2020

Vol 36 (10)

pp. 3263-3265

Cited By ~ 14

Author(s): Lucas Czech, Pierre Barbera, Alexandros Stamatakis

Keyword(s): Phylogenetic Trees, Supplementary Information, Command Line, Supplementary Data, Computationally Efficient, Data Types, Low Level, Phylogenetic Placement, Command Line Tool, High Level

Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven.

Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa.

Supplementary information: Supplementary data are available at Bioinformatics online.
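
Phylogenetic placement results are commonly exchanged in the jplace format, a JSON document whose `placements` entries list candidate edges as `p` rows, with the row columns named by a top-level `fields` array. As a hypothetical illustration of a typical low-level step (genesis and gappa are C++ and their APIs are not shown here), the Python sketch below sums the like-weight ratio of each placement per tree edge. For brevity it weights every placed query equally and ignores name multiplicities.

```python
import json


def edge_masses(jplace_text: str) -> dict:
    """Sum the like_weight_ratio of all placements per edge number."""
    doc = json.loads(jplace_text)
    # Column positions come from "fields", so their order in the file
    # does not matter.
    col = {name: i for i, name in enumerate(doc["fields"])}
    edge_i = col["edge_num"]
    lwr_i = col["like_weight_ratio"]
    masses = {}
    for pquery in doc["placements"]:
        # Each placed query lists its candidate edges; their like-weight
        # ratios sum to at most 1 for that query.
        for row in pquery["p"]:
            edge = row[edge_i]
            masses[edge] = masses.get(edge, 0.0) + row[lwr_i]
    return masses
```

Per-edge masses of this kind can then be compared across samples or mapped back onto the reference tree for visualization.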

Spliceogen: an integrative, scalable tool for the discovery of splice-altering variants

Bioinformatics

10.1093/bioinformatics/btz263

2019

Vol 35 (21)

pp. 4405-4407

Cited By ~ 1

Author(s): Steven Monger, Michael Troup, Eddie Ip, Sally L Dunwoodie, Eleni Giannoulatou

Keyword(s): Supplementary Information, Command Line, Supplementary Data, In Silico Prediction, Single Nucleotide Variants, Single Nucleotide, Prediction Tools, Motif Prediction, Command Line Tool, Genome Scale

Motivation: In silico prediction tools are essential for identifying variants that create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants.

Results: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best-performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer.

Availability and implementation: Spliceogen is available as a command line tool which accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts.

Supplementary information: Supplementary data are available at Bioinformatics online.
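
Because Spliceogen accepts VCF and BED inputs, a pipeline of this kind has to reconcile their coordinate systems: VCF positions are 1-based, while BED intervals are 0-based and half-open. The Python sketch below is not taken from Spliceogen and its function names are hypothetical; it selects the VCF records whose POS falls inside any BED interval.

```python
def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines (0-based start, exclusive end)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        yield chrom, int(start), int(end)


def variants_in_regions(vcf_lines, bed_lines):
    """Return (chrom, pos, ref, alt) for VCF records inside a BED interval.

    A 1-based VCF POS p lies in the BED interval [start, end) exactly when
    start < p <= end. Only POS is tested, so an indel that starts just
    before an interval is not reported.
    """
    regions = {}
    for chrom, start, end in parse_bed(bed_lines):
        regions.setdefault(chrom, []).append((start, end))
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        p = int(pos)
        if any(start < p <= end for start, end in regions.get(chrom, [])):
            hits.append((chrom, p, ref, alt))
    return hits
```

With the BED interval `chr1 99 200`, records at POS 100 and POS 200 are inside, while POS 99 and POS 201 are outside. The linear scan over intervals keeps the sketch short; sorted intervals or an interval tree would be needed at genome scale.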

FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data

Genome Biology

10.1186/s13059-020-02215-9

2020

Vol 21 (1)

Author(s): Kai Kruse, Clemens B. Hug, Juan M. Vaquerizas

Keyword(s): Chromosome Conformation Capture, Matrix Analysis, Set Covering, Command Line, Chromosome Conformation, C Storage, Data Formats, Analysis Tools, Command Line Tool, Broad Feature

Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data (https://github.com/vaquerizaslab/fanc). Due to its compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.

Visualization of circular RNAs and their internal splicing events from transcriptomic data

Bioinformatics

10.1093/bioinformatics/btaa033

2020

Vol 36 (9)

pp. 2934-2935

Cited By ~ 1

Author(s): Yi Zheng, Fangqing Zhao

Keyword(s): Supplementary Information, Circular RNAs, Visualization Tool, Command Line, Supplementary Data, Transcriptomic Data, Command Line Tool, Transcriptome Comparison, Multiple Samples, Splicing Patterns

Summary: Circular RNAs (circRNAs) have been shown to have unique compositions and splicing events distinct from those of canonical mRNAs. However, there is no visualization tool designed for exploring complex splicing patterns in circRNA transcriptomes. Here, we present CIRI-vis, a Java command-line tool for quantifying and visualizing circRNAs by integrating the alignments and junctions of circular transcripts. CIRI-vis can be applied to visualize the internal structure and isoform abundance of circRNAs and to perform circRNA transcriptome comparison across multiple samples.

Availability and implementation: https://sourceforge.net/projects/ciri/files/CIRI-vis.

Supplementary information: Supplementary data are available at Bioinformatics online.
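
Reads that cross a circRNA's back-splice junction are what distinguish circular from linear transcripts in alignment data. The Python sketch below is a simplified, hypothetical classifier for a read split into two forward-strand alignments on the same chromosome; it ignores strand, mapping quality, and splice-site motifs, and it is not CIRI-vis's algorithm (CIRI-vis is a Java tool whose internals the abstract does not describe).

```python
def classify_split_read(seg1, seg2):
    """Classify a read split into two forward-strand alignments.

    seg1 and seg2 are (start, end) 1-based inclusive reference coordinates
    of the read's 5' and 3' portions, in read order. Returns
    ("linear", donor, acceptor) when the 3' portion lies downstream,
    ("back-splice", circle_start, circle_end) when it lies upstream, and
    ("ambiguous", None, None) otherwise.
    """
    (s1, e1), (s2, e2) = seg1, seg2
    if s2 > e1:
        return ("linear", e1, s2)  # ordinary forward splice junction
    if s2 < s1 and e2 < e1:
        # The read leaves a downstream exon at e1 and continues at the
        # start of an upstream exon at s2, so the circle spans s2..e1.
        return ("back-splice", s2, e1)
    return ("ambiguous", None, None)
```

For example, a read whose 5' portion maps to 500-550 and whose 3' portion maps to 200-249 is classified as a back-splice, implying a circle spanning positions 200 to 550.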
