Analysis of Small RNA Sequencing Data in Plants

Author(s):  
Vanika Garg ◽  
Rajeev K. Varshney

Abstract Over the past decades, next-generation sequencing (NGS) has been employed extensively for investigating the regulatory mechanisms of small RNAs. Several bioinformatics tools are available to help biologists extract meaningful information from the enormous amounts of data generated by NGS platforms. This chapter describes a detailed methodology for analyzing small RNA sequencing data using different open source tools. We elaborate on the various steps involved in the analysis, from processing the raw sequencing reads to identifying miRNAs and their targets and performing differential expression studies.
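The first step named in the abstract, processing the raw sequencing reads, can be illustrated with a minimal Python sketch. It is not the chapter's own workflow, which relies on existing open source tools. The function names, the adapter string, the 8-base match length, and the 18 to 30 nt size window are all assumptions chosen for illustration.

```python
from collections import Counter

# Illustrative 3' adapter; an assumption, replace with the adapter of your library kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_adapter(read, adapter=ADAPTER, min_overlap=8):
    """Cut the read at the first exact match of the adapter's first bases.

    Returns None when no adapter is found, so untrimmed reads can be discarded.
    """
    idx = read.find(adapter[:min_overlap])
    return read[:idx] if idx != -1 else None


def collapse_reads(reads, adapter=ADAPTER, min_len=18, max_len=30):
    """Trim, size-filter, and collapse identical inserts into read counts."""
    counts = Counter()
    for read in reads:
        insert = trim_adapter(read, adapter)
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


if __name__ == "__main__":
    mirna = "TGAGGTAGTAGGTTGTATAGTT"
    raw = [mirna + ADAPTER, mirna + ADAPTER, "ACGT" + ADAPTER, mirna]
    print(collapse_reads(raw))
```

Real trimmers also tolerate mismatches and partial adapters at the read end; exact matching is used here only to keep the sketch short.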

2021 ◽  
Author(s):  
Jose Francisco Sanchez-Herrero ◽  
Raquel Pluvinet ◽  
Antonio Luna-de Haro ◽  
Lauro Sumoy

Abstract Background Next generation sequencing has allowed the discovery of miRNA isoforms, termed isomirs. Some isomirs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or editing of internal bases, resulting in base differences relative to the template DNA sequence. We hypothesized that some component of the isomir variation reported so far could be due to systematic technical noise and not real. Results We have developed the XICRA pipeline to analyze small RNA sequencing data at the isomir level. We exploited its ability to use single or merged reads to compare isomir results derived from paired-end (PE) reads with those from single reads (SR) to address whether detectable sequence differences relative to canonical miRNAs found in isomirs are true biological variations or the result of errors in sequencing. We have detected non-negligible systematic differences between SR and PE data which primarily affect putative internally edited isomirs, and at a much smaller frequency terminal length changing isomirs. This is relevant for the identification of true isomirs in small RNA sequencing datasets. Conclusions We conclude that potential artifacts derived from sequencing errors and/or data processing could result in an overestimation of abundance and diversity of miRNA isoforms. Efforts in annotating the isomirnome should take this into account.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jose Francisco Sanchez Herrero ◽  
Raquel Pluvinet ◽  
Antonio Luna de Haro ◽  
Lauro Sumoy

Abstract Background Next generation sequencing has allowed the discovery of miRNA isoforms, termed isomiRs. Some isomiRs are derived from imprecise processing of pre-miRNA precursors, leading to length variants. Additional variability is introduced by non-templated addition of bases at the ends or editing of internal bases, resulting in base differences relative to the template DNA sequence. We hypothesized that some component of the isomiR variation reported so far could be due to systematic technical noise and not real. Results We have developed the XICRA pipeline to analyze small RNA sequencing data at the isomiR level. We exploited its ability to use single or merged reads to compare isomiR results derived from paired-end (PE) reads with those from single reads (SR) to address whether detectable sequence differences relative to canonical miRNAs found in isomiRs are true biological variations or the result of errors in sequencing. We have detected non-negligible systematic differences between SR and PE data which primarily affect putative internally edited isomiRs, and at a much smaller frequency terminal length changing isomiRs. This is relevant for the identification of true isomiRs in small RNA sequencing datasets. Conclusions We conclude that potential artifacts derived from sequencing errors and/or data processing could result in an overestimation of abundance and diversity of miRNA isoforms. Efforts in annotating the isomiRnome should take this into account.
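The isomiR classes described in this abstract (templated length variants, non-templated additions at the ends, and internal base changes) can be made concrete with a small Python sketch. This is not the XICRA implementation. The function name, the thresholds, and the toy precursor sequence are hypothetical, and only 3' non-templated additions are handled.

```python
def classify_isomir(read, precursor, start, end,
                    max_shift=3, max_tail=3, max_mismatch=1):
    """Classify a read against the canonical miRNA precursor[start:end].

    All thresholds are illustrative assumptions, not values from the paper.
    """
    canonical = precursor[start:end]
    if read == canonical:
        return "canonical"

    # Templated length variant: the read matches the precursor exactly,
    # but its 5' and/or 3' end is shifted by a few bases.
    for s in range(max(0, start - max_shift), start + max_shift + 1):
        if precursor.startswith(read, s) and abs(s + len(read) - end) <= max_shift:
            return "length_variant"

    # Non-templated 3' addition: canonical sequence plus extra bases that
    # do not match the template (templated extensions were caught above).
    extra = len(read) - len(canonical)
    if read.startswith(canonical) and 0 < extra <= max_tail:
        return "nontemplated_3p_addition"

    # Internal edit: same length, few mismatches, none at the terminal bases.
    if len(read) == len(canonical):
        mismatches = [i for i, (a, b) in enumerate(zip(read, canonical)) if a != b]
        if 0 < len(mismatches) <= max_mismatch and all(
            0 < i < len(read) - 1 for i in mismatches
        ):
            return "internal_edit"

    return "unclassified"


if __name__ == "__main__":
    canonical = "TGAGGTAGTAGGTTGTATAGTT"
    precursor = "CCGGG" + canonical + "TTAGGGTCACACC"  # toy precursor
    edited = canonical[:10] + "A" + canonical[11:]
    for r in (canonical, canonical + "T", canonical + "A", canonical[1:], edited):
        print(r, classify_isomir(r, precursor, 5, 27))
```

A single sequencing error in the middle of a read lands in the same class as a true internal edit, which is one reason the single-read versus paired-end comparison in the abstract is informative for that class in particular.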


2020 ◽  
Vol 522 (3) ◽  
pp. 776-782
Author(s):  
Wei-Hao Lee ◽  
Kai-Pu Chen ◽  
Kai Wang ◽  
Hsuan-Cheng Huang ◽  
Hsueh-Fen Juan

2016 ◽  
Vol 13 (5) ◽  
Author(s):  
Matthew Kanke ◽  
Jeanette Baran-Gale ◽  
Jonathan Villanueva ◽  
Praveen Sethupathy

Summary Small non-coding RNAs, in particular microRNAs, are critical for normal physiology and are candidate biomarkers, regulators, and therapeutic targets for a wide variety of diseases. There is an ever-growing interest in the comprehensive and accurate annotation of microRNAs across diverse cell types, conditions, species, and disease states. High-throughput sequencing technology has emerged as the method of choice for profiling microRNAs. Specialized bioinformatic strategies are required to mine as much meaningful information as possible from the sequencing data to provide a comprehensive view of the microRNA landscape. Here we present miRquant 2.0, an expanded bioinformatics tool for accurate annotation and quantification of microRNAs and their isoforms (termed isomiRs) from small RNA-sequencing data. We anticipate that miRquant 2.0 will be useful for researchers interested not only in quantifying known microRNAs but also mining the rich well of additional information embedded in small RNA-sequencing data.


2009 ◽  
Vol 25 (18) ◽  
pp. 2298-2301 ◽  
Author(s):  
D. Langenberger ◽  
C. Bermudez-Santana ◽  
J. Hertel ◽  
S. Hoffmann ◽  
P. Khaitovich ◽  
...  

Genomics Data ◽  
2016 ◽  
Vol 7 ◽  
pp. 46-53 ◽  
Author(s):  
Suyash Agarwal ◽  
Naresh Sahebrao Nagpure ◽  
Prachi Srivastava ◽  
Basdeo Kushwaha ◽  
Ravindra Kumar ◽  
...  

2011 ◽  
Vol 392 (4) ◽  
Author(s):  
Sven Findeiß ◽  
David Langenberger ◽  
Peter F. Stadler ◽  
Steve Hoffmann

Abstract Many aspects of the RNA maturation leave traces in RNA sequencing data in the form of deviations from the reference genomic DNA. This includes, in particular, genomically non-encoded nucleotides and chemical modifications. The latter leave their signatures in the form of mismatches and conspicuous patterns of sequencing reads. Modified mapping procedures focusing on particular types of deviations can help to unravel post-transcriptional modification, maturation and degradation processes. Here, we focus on small RNA sequencing data that is produced in large quantities aimed at the analysis of microRNA expression. Starting from the recovery of many well known modified sites in tRNAs, we provide evidence that modified nucleotides are a pervasive phenomenon in these data sets. Regarding non-encoded nucleotides we concentrate on CCA tails, which surprisingly can be found in a diverse collection of transcripts including sub-populations of mature microRNAs. Although small RNA sequencing libraries alone are insufficient to obtain a complete picture, they can inform on many aspects of the complex processes of RNA maturation.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Santosh Anand ◽  
Eleonora Mangano ◽  
Nadia Barizzone ◽  
Roberta Bordoni ◽  
Melissa Sorosina ◽  
...  

Abstract Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is an alternative cost- and time-effective option in which DNA from several individuals is pooled for sequencing. However, pooling of DNA creates new problems and challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, possibly giving rise to false positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and in-house SNP-genotyping data of individual subjects of pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic in nature and could be easily applied in other Pool-seq experiments.
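The Python sketch below shows why low-frequency alleles and sequencing errors are hard to tell apart in a pool. It is not the Kolmogorov-Smirnov filter proposed by the authors, whose details are not given in this abstract. It assumes diploid individuals, and the function names and the `tolerance` parameter are hypothetical. With 12 diploid individuals per pool, a single variant chromosome corresponds to an expected AF of 1/24, about 0.042, so calls far below that level are more plausibly errors.

```python
def pool_allele_frequency(alt_reads, total_reads):
    """Naive AF estimate for one pool: fraction of reads carrying the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def min_true_frequency(individuals_per_pool, ploidy=2):
    """Expected AF when exactly one chromosome in the pool carries the variant.

    ploidy=2 is an assumption (diploid individuals).
    """
    return 1 / (individuals_per_pool * ploidy)


def is_plausible_variant(alt_reads, total_reads, individuals_per_pool,
                         ploidy=2, tolerance=0.5):
    """Flag calls whose AF is at least `tolerance` times the single-chromosome AF.

    `tolerance` is a hypothetical slack for sampling noise, not a published value.
    """
    af = pool_allele_frequency(alt_reads, total_reads)
    return af >= tolerance * min_true_frequency(individuals_per_pool, ploidy)


if __name__ == "__main__":
    print(min_true_frequency(12))              # one chromosome out of 24
    print(is_plausible_variant(2, 1000, 12))   # AF 0.002
    print(is_plausible_variant(40, 1000, 12))  # AF 0.040
```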

