ANGSD-wrapper: utilities for analyzing next generation sequencing data

10.7287/peerj.preprints.1472 ◽

2016 ◽

Author(s):

Arun Durvasula ◽

Paul J Hoffman ◽

Tyler V Kent ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Principal Component ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

User Friendly

High throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses including esti- mates of nucleotide sequence diversity and performing neutrality tests, principal component analysis, estimation of admixture proportions for individuals samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.

Download Full-text

ANGSD-wrapper

10.7287/peerj.preprints.1472v1 ◽

2015 ◽

Author(s):

Arun Durvasula ◽

Tyler V Kent ◽

Paul J Hoffman ◽

Chaochih Liu ◽

Thomas J Y Kono ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Molecular Ecology ◽

Data Exploration ◽

Full Genome ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

User Friendly ◽

Population Genetic Analyses

High throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. The package is specifically designed to produce more accurate results for samples with low sequencing depth, but it handles a wide array of sampling and experimental designs and makes use of full genome data. Here we present ANGSD-wrapper, a user-friendly interface to ANGSD. ANGSD-wrapper includes a number of ’wrapper’ scripts that facilitate configuration and execution of multi-step analyses. ANGSD- wrapper also provides interactive graphing of ANGSD results, thus enhancing data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Oryza. ANGSD-wrapper is freely available from https://github.com/ arundurvasula/angsd- wrapper.

Download Full-text

SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses

Bioinformatics ◽

10.1093/bioinformatics/bty071 ◽

2018 ◽

Vol 34 (13) ◽

pp. 2292-2294 ◽

Cited By ~ 59

Author(s):

Tomáš Větrovský ◽

Petr Baldrian ◽

Daniel Morais

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Data Analyses ◽

High Throughput Sequencing Data ◽

User Friendly

Download Full-text

Genome-Wide Estimation of Linkage Disequilibrium from Population-Level High-Throughput Sequencing Data

Genetics ◽

10.1534/genetics.114.165514 ◽

2014 ◽

Vol 197 (4) ◽

pp. 1303-1313 ◽

Cited By ~ 16

Author(s):

Takahiro Maruki ◽

Michael Lynch

Keyword(s):

Linkage Disequilibrium ◽

High Throughput ◽

High Throughput Sequencing ◽

Population Level ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Genome Wide

Download Full-text

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

Download Full-text

fluff: exploratory analysis and visualization of high-throughput sequencing data

PeerJ ◽

10.7717/peerj.2209 ◽

2016 ◽

Vol 4 ◽

pp. e2209 ◽

Cited By ~ 28

Author(s):

Georgios Georgiou ◽

Simon J. van Heeringen

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Developmental Stages ◽

Command Line ◽

Clustering Methods ◽

Sequencing Data ◽

Link Type ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

Genome Wide Data

Summary.In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available athttp://fluff.readthedocs.org.Availability.fluff is implemented in Python and runs on Linux. The source code is freely available for download athttps://github.com/simonvh/fluff.

Download Full-text

ORFik: a comprehensive R toolkit for the analysis of translation

BMC Bioinformatics ◽

10.1186/s12859-021-04254-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Håkon Tjeldnes ◽

Kornel Labun ◽

Yamila Torres Cleuren ◽

Katarzyna Chyżyńska ◽

Michał Świrski ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Sequencing Data ◽

Protein Coding ◽

High Throughput Sequencing Data ◽

Upstream Open Reading Frames ◽

Many Core ◽

User Friendly

Abstract Background With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. Results Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5′UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5′ UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. Conclusion In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancement benefiting the entire Bioconductor environment. Availability http://bioconductor.org/packages/ORFik.

Download Full-text

CloudDOE: A User-Friendly Tool for Deploying Hadoop Clouds and Analyzing High-Throughput Sequencing Data with MapReduce

PLoS ONE ◽

10.1371/journal.pone.0098146 ◽

2014 ◽

Vol 9 (6) ◽

pp. e98146 ◽

Cited By ~ 15

Author(s):

Wei-Chun Chung ◽

Chien-Chih Chen ◽

Jan-Ming Ho ◽

Chung-Yen Lin ◽

Wen-Lian Hsu ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

User Friendly

Download Full-text

ORFik: a comprehensive R toolkit for the analysis of translation

10.1101/2021.01.16.426936 ◽

2021 ◽

Author(s):

Håkon Tjeldnes ◽

Kornel Labun ◽

Yamila Torres Cleuren ◽

Katarzyna Chyżyńska ◽

Michał Świrski ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Ribosome Profiling ◽

Open Reading Frames ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

High Throughput Sequencing Data ◽

Upstream Open Reading Frames ◽

User Friendly

ABSTRACT•BackgroundWith the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays.•ResultsHere, we introduce ORFik, a user-friendly R/Bioconductor toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5’UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames. As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5’ UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions.•Availabilityhttp://bioconductor.org/packages/ORFik

Download Full-text

QUARTIC: QUick pArallel algoRithms for high-Throughput sequencIng data proCessing

F1000Research ◽

10.12688/f1000research.22954.1 ◽

2020 ◽

Vol 9 ◽

pp. 240

Author(s):

Frédéric Jarlier ◽

Nicolas Joly ◽

Nicolas Fedy ◽

Thomas Magalhaes ◽

Leonor Sirotti ◽

...

Keyword(s):

High Throughput ◽

Message Passing ◽

High Performance ◽

Message Passing Interface ◽

High Throughput Sequencing ◽

Genome Structure ◽

Sequencing Data ◽

Genome Data ◽

High Throughput Sequencing Data ◽

Time To Delivery

Life science has entered the so-called ’big data era’ where biologists, clinicians and bioinformaticians are overwhelmed with unprecedented amount of data. High-throughput sequencing has revolutionized genomics and offers new insights to decipher the genome structure. However, using these data for daily clinical practice care and diagnosis purposes is challenging as the data are bigger and bigger. Therefore, we implemented software using Message Passing Interface such that the alignment and sorting of sequencing reads can easily scale on high-performance computing architecture. Our implementation makes it possible to reduce the time to delivery to few minutes, even on large whole-genome data using several hundreds of cores.

Download Full-text