tRNA-Derived Small RNAs: Biogenesis, Modification, Function and Potential Impact on Human Disease Development

Transfer RNAs (tRNAs) are abundant small non-coding RNAs that are crucially important for decoding genetic information. Besides fulfilling canonical roles as adaptor molecules during protein synthesis, tRNAs are also the source of a heterogeneous class of small RNAs, tRNA-derived small RNAs (tsRNAs). The occurrence and relatively high abundance of tsRNAs have been noted in many high-throughput sequencing data sets, leading to largely correlative assumptions about their potential as biologically active entities. tRNAs are also the most modified RNAs in any cell type. Mutations in tRNA biogenesis factors, including tRNA modification enzymes, correlate with a variety of human disease syndromes. However, whether it is the lack of tRNAs or the activity of functionally relevant tsRNAs that is causative for human disease development remains to be elucidated. Here, we review the current knowledge of tsRNA biogenesis, including the impact of RNA modifications on tRNA stability, and discuss the existing experimental evidence in support of the seemingly large functional spectrum being proposed for tsRNAs. We also argue that improved methodology allowing exact quantification and specific manipulation of tsRNAs will be necessary before developing these small RNAs into diagnostic biomarkers and when aiming to harness them for therapeutic purposes.


Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.


Evaluation of Subsampling-Based Normalization Strategies for Tagged High-Throughput Sequencing Data Sets from Gut Microbiomes

Applied and Environmental Microbiology ◽

10.1128/aem.05491-11 ◽

2011 ◽

Vol 77 (24) ◽

pp. 8795-8798 ◽

Cited By ~ 70

Author(s):

Daniel Aguirre de Cárcer ◽

Stuart E. Denman ◽

Chris McSweeney ◽

Mark Morrison

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Sequencing Data ◽

β Diversity ◽

High Throughput Sequencing Data ◽

Minimum Number ◽

Diversity Metrics

Several subsampling-based normalization strategies were applied to different high-throughput sequencing data sets originating from human and murine gut environments. Their effects on the data sets' characteristics and normalization efficiencies, as measured by several β-diversity metrics, were compared. For both data sets, subsampling to the median rather than the minimum number appeared to improve the analysis.
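The difference between subsampling to the minimum and to the median read count can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy count table, and the choice to leave samples at or below the target depth unchanged are all assumptions made for illustration.

```python
import numpy as np

def subsample_counts(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's count vector."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= depth:
        # Assumption: samples at or below the target depth are kept unchanged.
        return counts.copy()
    return rng.multivariate_hypergeometric(counts, depth)

def normalize_by_subsampling(table, strategy="median", seed=0):
    """Subsample each row (sample) of a samples-by-taxa count table.

    strategy="minimum" uses the smallest per-sample read total as the depth;
    strategy="median" uses the median per-sample read total.
    """
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=np.int64)
    totals = table.sum(axis=1)
    depth = int(np.median(totals)) if strategy == "median" else int(totals.min())
    return np.vstack([subsample_counts(row, depth, rng) for row in table])

# Hypothetical toy table: 3 samples (rows) x 3 taxa (columns).
table = np.array([[10, 20, 70], [5, 5, 10], [100, 100, 200]])
print(normalize_by_subsampling(table, "minimum").sum(axis=1))
print(normalize_by_subsampling(table, "median").sum(axis=1))
```

Under the "minimum" strategy every sample ends up with the same number of reads, at the cost of discarding most reads from deep samples. Under the "median" strategy, as sketched here, shallow samples keep all their reads, so the resulting table is usually converted to proportions before computing β-diversity.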


PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets

Cancer Informatics ◽

10.4137/cin.s13890 ◽

2014 ◽

Vol 13s1 ◽

pp. CIN.S13890 ◽

Cited By ~ 1

Author(s):

Changjin Hong ◽

Solaiappan Manimaran ◽

William Evan Johnson

Keyword(s):

Quality Control ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Sequencing Data ◽

Computationally Efficient ◽

High Throughput Sequencing Data ◽

Downstream Analysis

Quality control and read preprocessing are critical steps in the analysis of data sets generated from high-throughput genomic screens. In the most extreme cases, improper preprocessing can negatively affect downstream analyses and may lead to incorrect biological conclusions. Here, we present PathoQC, a streamlined toolkit that seamlessly combines the benefits of several popular quality control software approaches for preprocessing next-generation sequencing data. PathoQC provides a variety of quality control options appropriate for most high-throughput sequencing applications. PathoQC is primarily developed as a module in the PathoScope software suite for metagenomic analysis. However, PathoQC is also available as an open-source Python module that can run as a stand-alone application or can be easily integrated into any bioinformatics workflow. PathoQC achieves high performance by supporting parallel computation and is an effective tool that removes technical sequencing artifacts and facilitates robust downstream analysis. The PathoQC software package is available at http://sourceforge.net/projects/PathoScope/ .


SNP genotyping and parameter estimation in polyploids using low-coverage sequencing data

10.1101/120261 ◽

2017 ◽

Author(s):

Paul D. Blischak ◽

Laura S. Kubatko ◽

Andrea D. Wolfe

Keyword(s):

Parameter Estimation ◽

High Throughput Sequencing ◽

Estimation Error ◽

Parameter Estimates ◽

Data Sets ◽

Mixed Mating ◽

Sequencing Data ◽

Ploidy Levels ◽

Sequencing Coverage ◽

High Throughput Sequencing Data

Motivation: Genotyping and parameter estimation using high throughput sequencing data are everyday tasks for population geneticists, but methods developed for diploids are typically not applicable to polyploid taxa. This is due to their duplicated chromosomes, as well as the complex patterns of allelic exchange that often accompany whole genome duplication (WGD) events. For WGDs within a single lineage (autopolyploids), inbreeding can result from mixed mating and/or double reduction. For WGDs that involve hybridization (allopolyploids), alleles are typically inherited through independently segregating subgenomes. Results: We present two new models for estimating genotypes and population genetic parameters from genotype likelihoods for auto- and allopolyploids. We then use simulations to compare these models to existing approaches at varying depths of sequencing coverage and ploidy levels. These simulations show that our models typically have lower levels of estimation error for genotype and parameter estimates, especially when sequencing coverage is low. Finally, we also apply these models to two empirical data sets from the literature. Overall, we show that the use of genotype likelihoods to model non-standard inheritance patterns is a promising approach for conducting population genomic inferences in polyploids. Availability: A C++ program, EBG, is provided to perform inference using the models we describe. It is available under the GNU GPLv3 on GitHub: https://github.com/pblischak/polyploid-genotyping. Contact: [email protected].
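To make the notion of a genotype likelihood in a polyploid concrete, the following Python sketch computes a simple binomial likelihood of the observed read counts for each possible allele dosage at a biallelic site. This is a generic textbook-style model, not the models implemented in EBG; the function name, the symmetric per-read error rate, and the example read counts are assumptions for illustration only.

```python
from math import comb

def genotype_likelihoods(ref_reads, alt_reads, ploidy, error=0.01):
    """Return P(reads | g) for g = 0..ploidy copies of the alternative allele.

    Assumes reads are sampled independently and that each read is
    miscalled as the other allele with probability `error`.
    """
    n = ref_reads + alt_reads
    likelihoods = []
    for g in range(ploidy + 1):
        dosage = g / ploidy
        # Probability that a single read shows the alternative allele.
        p_alt = dosage * (1 - error) + (1 - dosage) * error
        likelihoods.append(
            comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**ref_reads
        )
    return likelihoods

# Hypothetical tetraploid site with 10 reference and 10 alternative reads.
likes = genotype_likelihoods(ref_reads=10, alt_reads=10, ploidy=4)
best = max(range(len(likes)), key=likes.__getitem__)
print(best)  # most likely dosage of the alternative allele
```

With few reads, several dosages have similar likelihoods, which is why working with the full likelihood vector rather than a single called genotype is attractive at low coverage.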


Challenges ahead for matchmaking

it - Information Technology ◽

10.1515/itit-2016-0012 ◽

2016 ◽

Vol 58 (3) ◽

Author(s):

Peter M. Krawitz

Keyword(s):

High Throughput Sequencing ◽

Disease Gene ◽

Monogenic Disease ◽

Data Sets ◽

Sequencing Data ◽

Joint Effort ◽

High Throughput Sequencing Data ◽

Gene Associations ◽

Novel Variants ◽

Deep Phenotyping

With every additional individual whose genome is sequenced, thousands of novel variants enter the scene. It is these variants of unknown clinical significance (VUCS) that represent a great challenge to geneticists who are dealing with high-throughput sequencing data sets. Especially in the diagnostics of patients with unknown monogenic disease, the joint effort of geneticists is required to find new disease gene associations. For this purpose, online platforms for matchmaking have been developed that allow clinician scientists to collaborate worldwide and to share medically relevant data. However, for these tools to succeed, skills in deep phenotyping as well as new statistical approaches will be required.


Identification of Infectious Agents in High-Throughput Sequencing Data Sets Is Easily Achievable Using Free, Cloud-Based Bioinformatics Platforms

Journal of Clinical Microbiology ◽

10.1128/jcm.01386-19 ◽

2019 ◽

Vol 57 (12) ◽

Cited By ~ 2

Author(s):

Joseph G. Chappell ◽

Timothy Byaruhanga ◽

Theocharis Tsoleridis ◽

Jonathan K. Ball ◽

C. Patrick McClure

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Data Sets ◽

Infectious Agents ◽

Sequencing Data ◽

High Throughput Sequencing Data


Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

10.7287/peerj.preprints.27019v2 ◽

2018 ◽

Author(s):

Sten Anslan ◽

Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon data set. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.


A comprehensive co-expression network analysis in Vibrio cholerae

10.1101/2020.02.07.939611 ◽

2020 ◽

Author(s):

Cory D. DuPai ◽

Claus O. Wilke ◽

Bryan W. Davies

Keyword(s):

Network Analysis ◽

Vibrio Cholerae ◽

RNA Sequencing ◽

High Throughput Sequencing ◽

Model Organism ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Virulent Strains ◽

Micro Array ◽

The Impact

Research into the evolution and pathogenesis of Vibrio cholerae has benefited greatly from the generation of high throughput sequencing data to drive molecular analyses. The steady accumulation of these datasets now provides a unique opportunity for in silico hypothesis generation via co-expression analysis. Here we leverage all published V. cholerae RNA-sequencing data, in combination with select data from other platforms, to generate a gene co-expression network that validates known gene interactions and identifies novel genetic partners across the entire V. cholerae genome. This network provides direct insights into genes influencing pathogenicity, metabolism, and transcriptional regulation, further clarifies results from previous sequencing experiments in V. cholerae (e.g. Tn-seq and ChIP-seq), and expands upon micro-array based findings in related gram-negative bacteria. Importance: Cholera is a devastating illness that kills tens of thousands of people annually. Vibrio cholerae, the causative agent of cholera, is an important model organism to investigate both bacterial pathogenesis and the impact of horizontal gene transfer on the emergence and dissemination of new virulent strains. Despite this importance, roughly one third of V. cholerae genes are functionally un-annotated, leaving large gaps in our understanding of this microbe. Through co-expression network analysis of existing RNA-sequencing data, this work develops an approach to uncover novel gene-gene relationships and contextualize genes with no known function, which will advance our understanding of V. cholerae virulence and evolution.


High-throughput sequencing data and the impact of plant gene annotation quality

Journal of Experimental Botany ◽

10.1093/jxb/ery434 ◽

2018 ◽

Vol 70 (4) ◽

pp. 1069-1076 ◽

Cited By ~ 4

Author(s):

Aleksia Vaattovaara ◽

Johanna Leppälä ◽

Jarkko Salojärvi ◽

Michael Wrzaczek

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Gene Annotation ◽

Sequencing Data ◽

Plant Gene ◽

High Throughput Sequencing Data ◽

The Impact ◽

Annotation Quality


Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

10.7287/peerj.preprints.27019v1 ◽

2018 ◽

Author(s):

Sten Anslan ◽

Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Data Set ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of applying different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a specific bioinformatics process largely depend on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon data set. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.
