Verification of Arabidopsis stock collections using SNPmatch - an algorithm for genotyping high-plexed samples

Abstract. Large-scale studies such as the Arabidopsis thaliana 1001 Genomes Project aim to understand genetic variation in populations and link it to phenotypic variation. Such studies require routine genotyping of stocks to avoid sample contamination and mix-ups. To genotype samples efficiently and economically, sequencing must be inexpensive and data processing simple. Here we present SNPmatch, a tool that identifies the most likely strain (inbred line, or “accession”) from a SNP database. We tested the tool by performing low-coverage sequencing of over 2000 strains. SNPmatch could readily genotype samples correctly from 1-fold coverage sequencing data, and could also identify the parents of F1 or F2 individuals. SNPmatch can be run either on the command line or through AraGeno (https://arageno.gmi.oeaw.ac.at), a web interface that permits sample genotyping from a user-uploaded VCF or BED file. Availability and implementation: https://github.com/Gregor-Mendel-Institute/SNPmatch.git
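
The idea of picking the most likely accession from a SNP database can be illustrated with a minimal sketch. This is not SNPmatch's actual scoring method; it assumes a simple match-fraction score over the sites that a low-coverage sample happens to cover, and the data layout and the function names `match_scores` and `best_match` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: one accession's calls are {(chromosome, position): allele}.
SnpCalls = Dict[Tuple[str, int], str]


def match_scores(sample: SnpCalls, database: Dict[str, SnpCalls]) -> Dict[str, float]:
    """For each accession, return the fraction of shared SNP sites whose alleles agree."""
    scores: Dict[str, float] = {}
    for accession, calls in database.items():
        # Low-coverage data covers only some sites, so compare shared sites only.
        shared = [site for site in sample if site in calls]
        if not shared:
            scores[accession] = 0.0
            continue
        matches = sum(sample[site] == calls[site] for site in shared)
        scores[accession] = matches / len(shared)
    return scores


def best_match(sample: SnpCalls, database: Dict[str, SnpCalls]) -> Optional[str]:
    """Return the accession with the highest match fraction, or None if the database is empty."""
    scores = match_scores(sample, database)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

A production tool would also need to weigh how many sites were compared, since a perfect score over two sites is weaker evidence than a near-perfect score over thousands.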

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract. idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes clade calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FASTQ files. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including personal computers, HPC clusters, and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.
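
Clade assignment from a clade-defining marker list can be sketched as a set-containment check. The following is an illustration only, not idCOV's implementation: the marker names and clade labels are hypothetical placeholders, and the rule used here (choose the clade with the most defining markers, all of which are observed) is an assumption.

```python
from typing import Dict, Optional, Set

# Hypothetical marker list: clade label -> set of defining mutations.
# These names are placeholders, not real SARS-CoV-2 clade definitions.
MARKERS: Dict[str, Set[str]] = {
    "clade_X": {"C100T"},
    "clade_Y": {"C100T", "A200G"},
    "clade_Z": {"G300A"},
}


def assign_clade(observed: Set[str], markers: Dict[str, Set[str]]) -> Optional[str]:
    """Return the most specific clade whose defining markers are all observed."""
    candidates = [
        (len(mutations), name)
        for name, mutations in markers.items()
        if mutations <= observed  # every defining marker is present in the isolate
    ]
    if not candidates:
        return None
    # More defining markers means a more specific clade; ties break by name.
    return max(candidates)[1]
```

Under these placeholder definitions, an isolate carrying both `C100T` and `A200G` is assigned `clade_Y` rather than its parent `clade_X`.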

PinAPL-Py: A comprehensive web-application for the analysis of CRISPR/Cas9 screens

10.1101/147462 ◽

2017 ◽

Author(s):

Philipp N. Spahn ◽

Tyler Bath ◽

Ryan J. Weiss ◽

Jihoon Kim ◽

Jeffrey D. Esko ◽

...

Keyword(s):

Web Application ◽

Large Scale ◽

Sequencing Data ◽

Bioinformatic Tools ◽

Link Type ◽

Screening Experiments ◽

Independent Analysis ◽

Wide Range ◽

Set Up ◽

Sequence Quality

Abstract. Background: Large-scale genetic screens using CRISPR/Cas9 technology have emerged as a major tool for functional genomics. As these screens have grown in popularity, experimental biologists frequently acquire large sequencing datasets for which they often do not have an easy analysis option. While a few bioinformatic tools have been developed for this purpose, their utility is still hindered by either limited functionality or the requirement for bioinformatic expertise. Results: To make sequencing data analysis of CRISPR/Cas9 screens more accessible to a wide range of scientists, we developed a Platform-independent Analysis of Pooled Screens using Python (PinAPL-Py), which is operated as an intuitive web service. PinAPL-Py implements state-of-the-art tools and statistical models, assembled in a comprehensive workflow covering sequence quality control, automated sgRNA sequence extraction, alignment, sgRNA enrichment/depletion analysis and gene ranking. The workflow is set up to use a variety of popular sgRNA libraries as well as custom libraries that can be easily uploaded. Various analysis options are offered, suitable for analyzing a large variety of CRISPR/Cas9 screening experiments. Analysis output includes ranked lists of sgRNAs and genes, and publication-ready plots. Conclusions: PinAPL-Py helps to advance genome-wide screening efforts by combining comprehensive functionality with user-friendly implementation. PinAPL-Py is freely accessible at http://pinapl-py.ucsd.edu with instructions, documentation and test datasets. The source code is available at https://github.com/LewisLabUCSD/PinAPL-Py
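
The sgRNA enrichment/depletion step can be illustrated with a minimal sketch. It is not PinAPL-Py's statistical model; it assumes a common simplification (counts-per-million normalization followed by a log2 fold change with a pseudocount), and the function names and the pseudocount default are hypothetical.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw sgRNA read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {sgrna: count * 1e6 / total for sgrna, count in counts.items()}


def log2_fold_change(
    control: Dict[str, int],
    treatment: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return log2(treatment / control) per sgRNA after normalization.

    The pseudocount avoids division by zero for sgRNAs with no reads.
    sgRNAs missing from the treatment sample are treated as zero counts.
    """
    ctrl = counts_per_million(control)
    trt = counts_per_million(treatment)
    return {
        sgrna: math.log2((trt.get(sgrna, 0.0) + pseudocount) / (ctrl[sgrna] + pseudocount))
        for sgrna in ctrl
    }


def rank_sgrnas(fold_changes: Dict[str, float]) -> list:
    """Order sgRNAs from most enriched to most depleted."""
    return sorted(fold_changes, key=fold_changes.get, reverse=True)
```

Positive values indicate enrichment in the treatment sample and negative values indicate depletion; a real analysis would add replicate handling and significance testing before ranking genes.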

fluff: exploratory analysis and visualization of high-throughput sequencing data

PeerJ ◽

10.7717/peerj.2209 ◽

2016 ◽

Vol 4 ◽

pp. e2209 ◽

Cited By ~ 28

Author(s):

Georgios Georgiou ◽

Simon J. van Heeringen

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Developmental Stages ◽

Command Line ◽

Clustering Methods ◽

Sequencing Data ◽

Link Type ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

Genome Wide Data

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.

animalcules: interactive microbiome analytics and visualization in R

Microbiome ◽

10.1186/s40168-021-01013-0 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Yue Zhao ◽

Anthony Federico ◽

Tyler Faits ◽

Solaiappan Manimaran ◽

Daniel Segrè ◽

...

Keyword(s):

16S Rrna ◽

Microbial Communities ◽

R Package ◽

Command Line ◽

Data Generation ◽

Sequencing Data ◽

Shotgun Metagenomics ◽

Microbiome Analysis ◽

Link Type ◽

R Shiny

Abstract. Background: Microbial communities that live in and on the human body play a vital role in health and disease. Recent advances in sequencing technologies have enabled the study of microbial communities at unprecedented resolution. However, these advances in data generation have presented novel challenges to researchers attempting to analyze and visualize these data. Results: To address some of these challenges, we have developed animalcules, an easy-to-use interactive microbiome analysis toolkit for 16S rRNA sequencing data, shotgun DNA metagenomics data, and RNA-based metatranscriptomics profiling data. This toolkit combines novel and existing analytics, visualization methods, and machine learning models. For example, the toolkit features traditional microbiome analyses such as alpha/beta diversity and differential abundance analysis, combined with new methods for biomarker identification. In addition, animalcules provides interactive and dynamic figures that enable users to understand their data and discover new insights. animalcules can be used as a standalone command-line R package, or users can explore their data with the accompanying interactive R Shiny interface. Conclusions: We present animalcules, an R package for interactive microbiome analysis through either an interactive interface facilitated by R Shiny or various command-line functions. It is the first microbiome analysis toolkit that supports the analysis of all 16S rRNA, DNA-based shotgun metagenomics, and RNA-sequencing based metatranscriptomics datasets. animalcules can be freely downloaded from GitHub at https://github.com/compbiomed/animalcules or installed through Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/animalcules.html.

PhenoModifier: a genetic modifier database for elucidating the genetic basis of human phenotypic variation

Nucleic Acids Research ◽

10.1093/nar/gkz930 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hong Sun ◽

Yangfan Guo ◽

Xiaoping Lan ◽

Jia Jia ◽

Xiaoshu Cai ◽

...

Keyword(s):

Phenotypic Variation ◽

Large Scale ◽

Clinical Decision Making ◽

Genetic Interaction ◽

Genetic Modifier ◽

Modifier Genes ◽

Genetic Modifiers ◽

Sequencing Data ◽

Comprehensive Overview ◽

Full Spectrum

Abstract. From clinical observations to large-scale sequencing studies, the phenotypic impact of genetic modifiers is evident. To better understand the full spectrum of the genetic contribution to human disease, concerted efforts are needed to construct a useful modifier resource for interpreting the information from sequencing data. Here, we present PhenoModifier (https://www.biosino.org/PhenoModifier), a manually curated database that provides a comprehensive overview of human genetic modifiers. By manually curating over ten thousand published articles, 3078 records of modifier information were entered into the current version of PhenoModifier, related to 288 different disorders, 2126 genetic modifier variants and 843 distinct modifier genes. To help users probe further into the mechanisms of their modifier genes of interest, we extended yeast genetic interaction data and yeast quantitative trait loci to human, and we also integrated GWAS data into PhenoModifier to assist users in evaluating all possible phenotypes associated with a modifier allele. As the first comprehensive resource of human genetic modifiers, PhenoModifier provides a more complete spectrum of genetic factors contributing to human phenotypic variation. The portal has a broad scientific and clinical scope, spanning activities relevant to variant interpretation for research purposes as well as clinical decision making.

ROBUSTNESS OF METABOLIC MAP RECONSTRUCTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000400079x ◽

2004 ◽

Vol 02 (03) ◽

pp. 589-593

Author(s):

DAG G. AHREN ◽

CHRISTOS A. OUZOUNIS

Keyword(s):

Large Scale ◽

Predictive Power ◽

Partial Information ◽

Short Note ◽

Genomic Data ◽

Metabolic Reconstruction ◽

Sequencing Data ◽

Biochemical Pathways ◽

Low Coverage ◽

Genome Projects

With the ever-increasing amount of genomic data available, interest in generating biochemical pathways has grown tremendously. So far, mainly complete genomes have been used to reconstruct biochemical pathways and their associated interactions. However, a large number of low-coverage genomes, as well as other sources of partial genomic data, are currently available for many organisms. To use incomplete data for metabolic reconstruction, the inherent properties of this procedure need to be investigated. In this short note, we describe the robustness and predictive power of metabolic reconstructions using partial information from Schizosaccharomyces pombe. We also discuss the implications of the results for reference genome projects as well as other large-scale sequencing data.

Reveel: large-scale population genotyping using low-coverage sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btv530 ◽

2015 ◽

Vol 32 (11) ◽

pp. 1686-1696 ◽

Cited By ~ 4

Author(s):

Lin Huang ◽

Bo Wang ◽

Ruitang Chen ◽

Sivan Bercovici ◽

Serafim Batzoglou

Keyword(s):

Large Scale ◽

Sequencing Data ◽

Scale Population ◽

Low Coverage

NanoPack: visualizing and processing long read sequencing data

10.1101/237180 ◽

2017 ◽

Cited By ~ 2

Author(s):

Wouter De Coster ◽

Svenn D’Hert ◽

Darrin T. Schultz ◽

Marc Cruts ◽

Christine Van Broeckhoven

Keyword(s):

Web Service ◽

Graphical User Interface ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Oxford Nanopore ◽

Long Read ◽

Oxford Nanopore Technologies

Abstract. Summary: Here we describe NanoPack, a set of tools developed for visualization and processing of long read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. Availability and Implementation: The NanoPack tools are written in Python3 and released under the GNU GPL3.0 Licence. The source code can be found at https://github.com/wdecoster/nanopack, together with links to separate scripts and their documentation. The scripts are compatible with Linux, Mac OS and the MS Windows 10 Subsystem for Linux and are available as a graphical user interface, a web service at http://nanoplot.bioinf.be and command line tools. Contact: [email protected] Supplementary information: Supplementary tables and figures are available at Bioinformatics online.

The Lair: A resource for exploratory analysis of published RNA-Seq data

10.1101/056200 ◽

2016 ◽

Author(s):

Harold Pimentel ◽

Pascal Sturmfels ◽

Nicolas Bray ◽

Páll Melsted ◽

Lior Pachter

Keyword(s):

Large Scale ◽

Exploratory Analysis ◽

Technical Expertise ◽

Rna Seq ◽

Sequencing Data ◽

Short Read ◽

Link Type ◽

Short Read Archive ◽

Published Research

Abstract. Increased emphasis on reproducibility of published research in the last few years has led to the large-scale archiving of sequencing data. While this data can, in theory, be used to reproduce results in papers, it is typically not easily usable in practice. We introduce a series of tools for processing and analyzing RNA-Seq data in the Short Read Archive that together have allowed us to build an easily extendable resource for analysis of data underlying published papers. Our system makes the exploration of data easily accessible and usable without technical expertise. Our database and associated tools can be accessed at The Lair: http://pachterlab.github.io/lair

MetaSRA: normalized sample-specific metadata for the Sequence Read Archive

10.1101/090506 ◽

2016 ◽

Cited By ~ 3

Author(s):

Matthew N. Bernstein ◽

AnHai Doan ◽

Colin N. Dewey

Keyword(s):

Large Scale ◽

Cell Types ◽

Sample Type ◽

Computational Pipeline ◽

Sources Of Information ◽

Sequencing Data ◽

Encode Project ◽

Link Type ◽

Sequence Read Archive ◽

Biological Insight

Abstract. Motivation: The NCBI’s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants, and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues, and cell types present in the SRA. Results: We present MetaSRA, a database of normalized SRA sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline. Availability: The MetaSRA database is available at http://deweylab.biostat.wisc.edu/metasra. Software implementing our computational pipeline is available at https://github.com/deweylab/[email protected]
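
Two of the schema steps described above, mapping free-text values to normalized terms and extracting real-valued properties, can be sketched as follows. This is not the MetaSRA pipeline; the synonym table, the normalized labels, and all function names are hypothetical, and a real system would map to identifiers in biomedical ontologies rather than to plain strings.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical synonym table: lower-cased free text -> normalized term.
SYNONYMS: Dict[str, str] = {
    "liver": "liver",
    "hepatic tissue": "liver",
    "hela": "HeLa cell",
    "hela cells": "HeLa cell",
}

# Matches values such as "45 years" or "2.5 mg".
NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def normalize_term(value: str) -> Optional[str]:
    """Map a free-text value to a normalized term, ignoring case and extra spaces."""
    key = re.sub(r"\s+", " ", value.strip().lower())
    return SYNONYMS.get(key)


def extract_numeric(value: str) -> Optional[Tuple[float, str]]:
    """Parse a value such as '45 years' into (45.0, 'years')."""
    match = NUMERIC.match(value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


def normalize_sample(raw: Dict[str, str]) -> Dict[str, object]:
    """Split raw key/value metadata into normalized terms and real-valued properties."""
    terms = set()
    properties: Dict[str, Tuple[float, str]] = {}
    for key, value in raw.items():
        term = normalize_term(value)
        if term is not None:
            terms.add(term)
            continue
        numeric = extract_numeric(value)
        if numeric is not None:
            properties[key.strip().lower()] = numeric
    return {"terms": sorted(terms), "properties": properties}
```

Values that match neither the synonym table nor the numeric pattern are dropped in this sketch; handling spelling variants and references to outside sources is the harder part that the abstract attributes to the automated pipeline.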
