RNA modification mapping with JACUSA2

Mapping Intimacies ◽

10.1101/2021.07.02.450888 ◽

2021 ◽

Author(s):

Michael Piechotta ◽

Qi Wang ◽

Janine Altmueller ◽

Christoph Dieterich

Keyword(s):

High Throughput ◽

Comprehensive Analysis ◽

HEK293 Cells ◽

RNA Modification ◽

Published Data ◽

Data Sets ◽

Analysis Framework ◽

Sequencing Data

A whole series of high-throughput, antibody-free methods for detecting RNA modifications from sequencing data has emerged recently. We present JACUSA2 as a versatile software solution and comprehensive analysis framework for RNA modification detection assays that are based on either the Illumina or Nanopore platform. Importantly, JACUSA2 can integrate information from multiple experiments (e.g. replicates and different conditions) and different library types (e.g. first- or second-strand libraries). We demonstrate its utility by example, showing three analysis workflows for m6A detection on published data sets: 1) MazF m6A-sensitive RNA digestion (FTO+ vs FTO-), 2) DART-seq (YTHwt vs YTHmut) and 3) Nanopore profiling (METTL3 +/+ vs -/-). All assays were conducted in HEK293 cells and complement one another.

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting fast accumulation of HTS data, there has been a growing need for, and interest in, tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process depend largely on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Evaluation of Subsampling-Based Normalization Strategies for Tagged High-Throughput Sequencing Data Sets from Gut Microbiomes

Applied and Environmental Microbiology ◽

10.1128/aem.05491-11 ◽

2011 ◽

Vol 77 (24) ◽

pp. 8795-8798 ◽

Cited By ~ 70

Author(s):

Daniel Aguirre de Cárcer ◽

Stuart E. Denman ◽

Chris McSweeney ◽

Mark Morrison

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Sequencing Data ◽

β Diversity ◽

High Throughput Sequencing Data ◽

Minimum Number ◽

Diversity Metrics

Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
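
The difference between subsampling to the minimum and to the median library size can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy counts, and the choice to leave samples at or below the target depth unchanged are all assumptions made for illustration.

```python
import random
from statistics import median


def subsample_counts(counts, depth, rng):
    """Randomly draw `depth` reads without replacement from one sample.

    `counts` maps a taxon label to its read count. Samples that already
    hold `depth` reads or fewer are returned unchanged (an assumption of
    this sketch, not a detail taken from the abstract).
    """
    total = sum(counts.values())
    if total <= depth:
        return dict(counts)
    # Expand the counts into one entry per read, then sample without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    picked = rng.sample(reads, depth)
    result = {taxon: 0 for taxon in counts}
    for taxon in picked:
        result[taxon] += 1
    return result


def normalize(samples, strategy="median", seed=0):
    """Subsample every sample to the minimum or the median library size."""
    rng = random.Random(seed)
    sizes = [sum(c.values()) for c in samples.values()]
    depth = min(sizes) if strategy == "minimum" else int(median(sizes))
    return {name: subsample_counts(c, depth, rng) for name, c in samples.items()}


# Hypothetical toy data: three samples with different sequencing depths.
samples = {
    "s1": {"taxonA": 50, "taxonB": 50},
    "s2": {"taxonA": 150, "taxonB": 50},
    "s3": {"taxonA": 300, "taxonB": 100},
}
to_min = normalize(samples, "minimum")    # every sample reduced to 100 reads
to_median = normalize(samples, "median")  # only s3 is reduced, to 200 reads
```

In this toy case, subsampling to the minimum discards reads from two of the three samples, whereas subsampling to the median only trims the deepest one.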

COPAR: A ChIP-Seq Optimal Peak Analyzer

BioMed Research International ◽

10.1155/2017/5346793 ◽

2017 ◽

Vol 2017 ◽

pp. 1-4

Author(s):

Binhua Tang ◽

Xihan Wang ◽

Victor X. Jin

Keyword(s):

High Throughput ◽

Genomic Feature ◽

Data Sets ◽

Sequencing Data ◽

Genomic Features ◽

Peak Alignment ◽

ChIP Sequencing ◽

Quality Check ◽

User Friendly ◽

High Throughput Experiments

Sequencing data quality and peak alignment efficiency of ChIP-sequencing profiles are directly related to the reliability and reproducibility of NGS experiments. To date, no tool has been specifically designed for optimal peak alignment estimation and quality-related genomic feature extraction for ChIP-sequencing profiles. We developed COPAR, an open-source, user-friendly package, to statistically investigate, quantify, and visualize the optimal peak alignment and inherent genomic features using ChIP-seq data from NGS experiments. It provides a versatile perspective for biologists to perform quality checks for high-throughput experiments and optimize their experimental design. COPAR can process mapped ChIP-seq read files in BED format and output statistically sound results for multiple high-throughput experiments. Together with three public ChIP-seq data sets verified with the developed package, we have deposited COPAR on GitHub under a GNU GPL license.

PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .

Phenomenal: An automatic open source library for 3D shoot architecture reconstruction and analysis for image-based plant phenotyping

10.1101/805739 ◽

2019 ◽

Cited By ~ 2

Author(s):

Simon Artzet ◽

Tsu-Wei Chen ◽

Jérôme Chopard ◽

Nicolas Brichet ◽

Michael Mielewczik ◽

...

Keyword(s):

Open Source ◽

High Throughput ◽

Synthetic Data ◽

3D Models ◽

Plant Phenotyping ◽

Published Data ◽

Data Sets ◽

Quantitative Measurements ◽

Workflow System ◽

Architectural Features

In the era of high-throughput visual plant phenotyping, it is crucial to design fully automated and flexible workflows able to derive quantitative traits from plant images. In recent years, several software packages have supported the extraction of architectural features of shoot systems. Yet currently no end-to-end system is able to extract both the 3D shoot topology and the geometry of plants automatically from images, on large datasets and across a large range of species. In particular, these tools essentially deal with dicotyledons, whose architecture is comparatively easier to analyze than that of monocotyledons. To tackle these challenges, we designed the Phenomenal software, featuring: (i) a completely automatic workflow system including data import, reconstruction of 3D plant architecture for a range of species and quantitative measurements on the reconstructed plants; (ii) an open source library for the development and comparison of new algorithms to perform 3D shoot reconstruction and (iii) an integration framework to couple workflow outputs with existing models towards model-assisted phenotyping. Phenomenal analyzes a large variety of data sets and species, from images of high-throughput phenotyping platform experiments to published data obtained in different conditions and provided in a different format. Phenomenal has been validated both on manual measurements and on synthetic data simulated by 3D models. It has also been tested on other published datasets to reproduce a published semi-automatic reconstruction workflow in an automatic way. Phenomenal is available as open-source software in a public repository.

Assessing Study Reproducibility through M2RI: A Novel Approach for Large-scale High-throughput Association Studies

10.1101/2020.08.18.253740 ◽

2020 ◽

Author(s):

Zeyu Jiao ◽

Yinglei Lai ◽

Jujiao Kang ◽

Weikang Gong ◽

Liang Ma ◽

...

Keyword(s):

Sample Size ◽

RNA Sequencing ◽

High Throughput ◽

Large Scale ◽

Association Studies ◽

Structural MRI ◽

Data Sets ◽

Sequencing Data ◽

Novel Approach ◽

Magnetic Resonance Imaging (MRI)

High-throughput technologies, such as magnetic resonance imaging (MRI) and DNA/RNA sequencing (DNA-seq/RNA-seq), have been increasingly used in large-scale association studies. With these technologies, important biomedical research findings have been generated. The reproducibility of these findings, especially from structural MRI (sMRI) and functional MRI (fMRI) association studies, has recently been questioned. There is an urgent demand for a reliable overall reproducibility assessment for large-scale high-throughput association studies. It is also desirable to understand the relationship between study reproducibility and sample size in an experimental design. In this study, we developed a novel approach: the mixture model reproducibility index (M2RI) for assessing study reproducibility of large-scale association studies. With M2RI, we performed study reproducibility analysis for several recent large sMRI/fMRI data sets. The advantages of our approach and the sample size requirements for different phenotypes were clearly demonstrated, especially when compared to the Dice coefficient (DC). We also applied M2RI to compare two MRI or RNA sequencing data sets. The reproducibility assessment results were consistent with our expectations. In summary, M2RI is a novel and useful approach for assessing study reproducibility, calculating sample sizes and evaluating the similarity between two closely related studies.
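
For orientation, the Dice coefficient used as the comparison baseline has a standard set-based definition, DC = 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two sets of significant findings. It does not implement M2RI, and the function name, the feature labels, and the convention for two empty sets are assumptions made for illustration.

```python
def dice_coefficient(hits_a, hits_b):
    """Dice coefficient between two collections of significant features."""
    a, b = set(hits_a), set(hits_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty result sets agree fully.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical significant features reported by two studies.
study_1 = {"gene1", "gene2", "gene3", "gene4"}
study_2 = {"gene3", "gene4", "gene5"}
dc = dice_coefficient(study_1, study_2)  # 2 * 2 / (4 + 3)
```

Because the Dice coefficient depends on a hard significance threshold in each study, it changes with the cutoff chosen, which is one reason a model-based index may be preferred.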

Compositional analysis: a valid approach to analyze microbiome high-throughput sequencing data

Canadian Journal of Microbiology ◽

10.1139/cjm-2015-0821 ◽

2016 ◽

Vol 62 (8) ◽

pp. 692-703 ◽

Cited By ~ 132

Author(s):

Gregory B. Gloor ◽

Gregor Reid

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Compositional Data ◽

Critical Role ◽

Compositional Data Analysis ◽

Data Sets ◽

Clear Understanding ◽

Sequencing Data ◽

Microbiome Data

A workshop held at the 2015 annual meeting of the Canadian Society of Microbiologists highlighted compositional data analysis methods and the importance of exploratory data analysis for the analysis of microbiome data sets generated by high-throughput DNA sequencing. A summary of the content of that workshop, a review of new methods of analysis, and information on the importance of careful analyses are presented herein. The workshop focussed on explaining the rationale behind the use of compositional data analysis, and a demonstration of these methods for the examination of 2 microbiome data sets. A clear understanding of bioinformatics methodologies and the type of data being analyzed is essential, given the growing number of studies uncovering the critical role of the microbiome in health and disease and the need to understand alterations to its composition and function following intervention with fecal transplant, probiotics, diet, and pharmaceutical agents.
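
The abstract does not say which transforms the workshop covered. As a hedged illustration, one standard tool of compositional data analysis is the centered log-ratio (CLR) transform, which expresses each count relative to the geometric mean of its sample. The function name, the pseudocount value, and the toy counts below are assumptions made for this sketch.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's count vector.

    The pseudocount is added to every count so that zeros have a defined
    logarithm; its value is an illustrative assumption.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


# Hypothetical read counts for four taxa in one sample.
sample = [120, 30, 5, 850]
doubled = [2 * c for c in sample]  # same composition at twice the depth

clr_values = clr(sample)
# Without a pseudocount, CLR values do not depend on sequencing depth.
same_composition = all(
    abs(x - y) < 1e-9
    for x, y in zip(clr(sample, 0.0), clr(doubled, 0.0))
)
```

CLR values sum to zero within a sample, and with no pseudocount they are unchanged when every count is multiplied by the same factor, which is why the transform suits data where only relative abundances are informative.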

Protein Complex-Based Analysis Framework for High-Throughput Data Sets

Science Signaling ◽

10.1126/scisignal.2003629 ◽

2013 ◽

Vol 6 (264) ◽

pp. rs5-rs5 ◽

Cited By ~ 66

Author(s):

A. Vinayagam ◽

Y. Hu ◽

M. Kulkarni ◽

C. Roesel ◽

R. Sopko ◽

...

Keyword(s):

High Throughput ◽

Protein Complex ◽

Data Sets ◽

Analysis Framework ◽

High Throughput Data

Identification of Infectious Agents in High-Throughput Sequencing Data Sets Is Easily Achievable Using Free, Cloud-Based Bioinformatics Platforms

Journal of Clinical Microbiology ◽

10.1128/jcm.01386-19 ◽

2019 ◽

Vol 57 (12) ◽

Cited By ~ 2

Author(s):

Joseph G. Chappell ◽

Timothy Byaruhanga ◽

Theocharis Tsoleridis ◽

Jonathan K. Ball ◽

C. Patrick McClure

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Infectious Agents ◽

Sequencing Data ◽

High Throughput Sequencing Data

Maximizing ecological and evolutionary insight from bisulfite sequencing data sets

10.1101/091488 ◽

2016 ◽

Author(s):

Amanda J. Lea ◽

Tauras P. Vilgalys ◽

Paul A.P. Durst ◽

Jenny Tung

Keyword(s):

DNA Methylation ◽

Population Structure ◽

Effect Size ◽

Evolutionary Biology ◽

Bisulfite Sequencing ◽

Published Data ◽

Data Sets ◽

Cell Type ◽

Sequencing Data ◽

Bisulfite Sequencing Data

The role of DNA methylation in development, divergence, and the response to environmental stimuli is of substantial interest in ecology and evolutionary biology. Measuring genome-wide DNA methylation is increasingly feasible using sodium bisulfite sequencing. Here, we analyze simulated and published data sets to demonstrate how effect size, kinship/population structure, taxonomic differences, and cell type heterogeneity influence the power to detect differential methylation in bisulfite sequencing data sets. Our results reveal that the effect sizes typical of evolutionary and ecological studies are modest, and will thus require data sets larger than those currently in common use. Additionally, our findings emphasize that statistical approaches that ignore the properties of bisulfite sequencing data (e.g., its count-based nature) or key sources of variance in natural populations (e.g., population structure or cell type heterogeneity) often produce false negatives or false positives, thus leading to incorrect biological conclusions. Finally, we provide recommendations for handling common issues that arise in bisulfite sequencing analyses and a freely available R Shiny application for simulating and performing power analyses on bisulfite sequencing data. This app, available at www.tung-lab.org/protocols-and-software.html, allows users to explore the effects of sequencing depth, sample size, population structure, and expected effect size, tailored to their own system.
