Integrated annotations and analyses of small RNA-producing loci from 47 diverse plants

2019 ◽  
Author(s):  
Alice Lunardon ◽  
Nathan R. Johnson ◽  
Emily Hagerott ◽  
Tamia Phifer ◽  
Seth Polydore ◽  
...  

Abstract Plant endogenous small RNAs (sRNAs) are important regulators of gene expression. There are two broad categories of plant sRNAs: microRNAs (miRNAs) and endogenous short interfering RNAs (siRNAs). MicroRNA loci are relatively well-annotated but comprise only a small minority of the total sRNA pool; siRNA locus annotations have lagged far behind. Here, we used a large dataset of published and newly generated sRNA sequencing data (1,333 sRNA-seq libraries containing over 20 billion reads) and a uniform bioinformatic pipeline to produce comprehensive sRNA locus annotations of 47 diverse plants, yielding over 2.7 million sRNA loci. The two most numerous classes of siRNA loci produced mainly 24 nucleotide and 21 nucleotide siRNAs, respectively. 24 nucleotide-dominated siRNA loci usually occurred in intergenic regions, especially at the 5’-flanking regions of protein-coding genes. In contrast, 21 nucleotide-dominated siRNA loci were most often derived from double-stranded RNA precursors copied from spliced mRNAs. Genic 21 nucleotide-dominated loci were especially common from disease resistance genes, including from a large number of monocots. Individual siRNA sequences of all types showed very little conservation across species, while mature miRNAs were more likely to be conserved. We developed a web server where our data and several search and analysis tools are freely accessible at http://plantsmallrnagenes.science.psu.edu.

2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Jane E. Freedman ◽  
Mark Gerstein ◽  
Eric Mick ◽  
Joel Rozowsky ◽  
Daniel Levy ◽  
...  

Abstract There is growing appreciation for the importance of non-protein-coding genes in development and disease. Although much is known about microRNAs, limitations in bioinformatic analyses of RNA sequencing have precluded broad assessment of other forms of small-RNAs in humans. By analysing sequencing data from plasma-derived RNA from 40 individuals, here we identified over a thousand human extracellular RNAs including microRNAs, piwi-interacting RNA (piRNA), and small nucleolar RNAs. Using a targeted quantitative PCR with reverse transcription approach in an additional 2,763 individuals, we characterized almost 500 of the most abundant extracellular transcripts including microRNAs, piRNAs and small nucleolar RNAs. The presence in plasma of many non-microRNA small-RNAs was confirmed in an independent cohort. We present comprehensive data to demonstrate the broad and consistent detection of diverse classes of circulating non-cellular small-RNAs from a large population.


2011 ◽  
Vol 18 (9) ◽  
pp. 1075-1082 ◽  
Author(s):  
Eivind Valen ◽  
Pascal Preker ◽  
Peter Refsing Andersen ◽  
Xiaobei Zhao ◽  
Yun Chen ◽  
...  

2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Yuming Zhao ◽  
Fang Wang ◽  
Su Chen ◽  
Jun Wan ◽  
Guohua Wang

MicroRNAs (miRNAs) are short (~22 nucleotides) noncoding RNAs disseminated throughout the genome, either in intergenic regions or in the intronic sequences of protein-coding genes. MiRNAs have been shown to play important roles in regulating gene expression. Hence, understanding the transcriptional mechanism of miRNA genes is a critical step toward uncovering the whole regulatory network. A number of miRNA promoter prediction models have been proposed in the past decade. This review summarizes several of the most popular miRNA promoter prediction models, which use genome sequence features or other features derived from high-throughput sequencing data, for example histone markers, RNA Pol II binding sites, and nucleosome-free regions. Some databases are described as resources for miRNA promoter information. We then comprehensively discuss the prediction and identification of transcription factor-mediated microRNA regulatory networks.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Yan Guo ◽  
Shilin Zhao ◽  
Quanhu Sheng ◽  
Mingsheng Guo ◽  
Brian Lehmann ◽  
...  

The most popular RNA library used for RNA sequencing is the poly(A) captured RNA library. This library captures RNA based on the presence of poly(A) tails at the 3′ end. Another type of RNA library for RNA sequencing is the total RNA library, which differs from the poly(A) library in capture method and price. The total RNA library costs more, and its capture of RNA does not depend on the presence of poly(A) tails. In practice, only ribosomal RNAs and small RNAs are washed out in the total RNA library preparation. To evaluate the ability of both RNA libraries to detect RNA, we designed a study using RNA sequencing data of the same two breast cancer cell lines from both RNA libraries. We found that the RNA expression values captured by both RNA libraries were highly correlated. However, the number of RNAs captured was significantly higher for the total RNA library. Furthermore, we identified several subsets of protein-coding RNAs that were not captured efficiently by the poly(A) library. One of the most noticeable is the histone-encoding genes, which lack the poly(A) tail.


2018 ◽  
Vol 49 (6) ◽  
Author(s):  
Elsahookie et al.

The endosperm in cereals supplies nutrients to the developing kernel and seedling, and it is the primary tissue in which gene imprinting occurs. Developing maize (Zea mays L.) endosperms were analysed for allelic gene expression in both reciprocal crosses of inbreds B73 and Mo17. High-throughput transcriptome sequencing of kernels at 0, 3, and up to 15 days after pollination (DAP) of both reciprocals was performed and revealed a gradual increase in paternal transcript expression in 3 and 5 DAP kernels. Meanwhile, in 7 DAP endosperm, most of the genes tested gave a 2:1 maternal:paternal ratio, suggesting that paternal genes are almost fully activated at 7 DAP. There were 300 PEGs and 499 MEGs identified across endosperm development stages. A set of 63 genes out of 116, 234 exhibiting parent-specific expression was identified at 7, 10 and 15 DAP. Most paternally expressed genes were found at 7 DAP, owing to deviation of paternal allele expression at this stage of development. For imprinted genes, the relative expression of maternal and paternal alleles differed at least fivefold in both crosses. A total of 179 (1.6%) protein-coding genes expressed in the endosperm were imprinted; 68 of them showed maternal preferential expression and 111 paternal expression. In addition, 38 long noncoding RNAs were found to be imprinted and transcribed in either sense or antisense direction from intronic regions of normal protein-coding genes or from intergenic regions. Imprinted genes showed clustering around the genome. A total of 21 imprinted genes in the maize hybrid endosperm had differentially methylated regions (DMRs). All DMRs were found to be hypomethylated in maternal alleles and hypermethylated in paternal alleles. These results confirm that a complex mechanism controls the maize endosperm with respect to imprinting, auxin activity, and developmental regulation. Studying F2 kernels on F1 plants may shed new light on controlling kernel number and weight per unit of area.


2017 ◽  
Author(s):  
Matthieu Legendre ◽  
Elisabeth Fabre ◽  
Olivier Poirot ◽  
Sandra Jeudy ◽  
Audrey Lartigue ◽  
...  

Abstract With DNA genomes up to 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting Pandoraviruses remained the most spectacular viruses since their description in 2013. Our isolation of three new strains from distant locations and environments allowed us to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes, combining transcriptomic, proteomic, and bioinformatic analyses, led to the discovery of many non-coding transcripts while significantly reducing the former set of predicted protein-coding genes. We found that the Pandoraviridae exhibit an open pan genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation is a strong component in the evolution of the giant Pandoravirus genomes.


2021 ◽  
Author(s):  
◽  
Mirko Brüggemann

Most cellular processes are regulated by RNA-binding proteins (RBPs). These RBPs usually use defined binding sites to recognize and directly interact with their target RNA molecule. Individual-nucleotide resolution UV crosslinking and immunoprecipitation (iCLIP) experiments are an important tool to describe such interactions in cell cultures in vivo. This experimental protocol yields millions of individual sequencing reads from which the binding spectrum of the RBP under study can be deduced. In this PhD thesis I studied how RNA processing is driven from RBP binding by analyzing iCLIP-derived sequencing datasets. First, I described a complete data analysis pipeline to detect RBP binding sites from iCLIP sequencing reads. This workflow covers all essential processing steps, from the first quality control to the final annotation of binding sites. I described the accurate integration of biological iCLIP replicates to boost the initial peak calling step while ensuring high specificity through replicate reproducibility analysis. Further, I proposed a routine to level binding site width to streamline downstream analysis processes. This was exemplified in the reanalysis of the binding spectrum of the U2 small nuclear RNA auxiliary factor 2 (U2AF2, U2AF65). I recaptured the known dominance of U2AF65 to bind to intronic sequences of protein-coding genes, where it likely recognizes the polypyrimidine tract as part of the core spliceosome machinery. In the second part of my thesis, I analyzed the binding spectrum of the serine and arginine rich splicing factor 6 (SRSF6) in the context of diabetes. In pancreatic beta-cells, the expression of SRSF6 is regulated by the transcription factor GLIS3, which encodes for a diabetes susceptibility gene. It is known that SRSF6 promotes beta-cell death through the splicing dysregulation of genes essential to beta-cell function and survival.
However, the exact mechanism of how these RNAs are targeted by SRSF6 remains poorly understood. Here, I applied the defined iCLIP processing pipeline to describe the binding landscape of the splicing factor SRSF6 in the human pancreatic beta-cell line EndoC-βH1. The initial binding site definition revealed a predominant binding to coding sequences (CDS) of protein-coding genes. This was followed up by extensive motif analysis, which revealed a purine-rich binding motif so far unknown in human. SRSF6 seemed to specifically recognize repetitions of the triplet GAA. I also showed that the number of contiguous triplets correlated with increasing binding site strength. I further integrated RNA-sequencing data from the same cell type, with SRSF6 in KD and in basal conditions, to analyze SRSF6-related splicing changes. I showed that the exact positioning of SRSF6 on alternatively spliced exons regulates the produced transcript isoforms. This mechanism seemed to control exons in several known susceptibility genes for diabetes. In summary, in my PhD thesis, I presented a comprehensive workflow for the processing of iCLIP-derived sequencing data. I applied this pipeline on a dataset from pancreatic beta-cells to unveil the impact of SRSF6-mediated splicing changes. Thus, my analysis provides novel insights into the regulation of diabetes susceptibility genes.


2019 ◽  
Vol 8 (31) ◽  
Author(s):  
Rikky W. Purbojati ◽  
Daniela I. Drautz-Moses ◽  
Akira Uchida ◽  
Anthony Wong ◽  
Megan E. Clare ◽  
...  

Brevundimonas sp. strain SGAir0440 was isolated from indoor air samples collected in Singapore. Its genome was assembled using single-molecule real-time sequencing data, resulting in one circular chromosome with a length of 3.1 Mbp. The genome consists of 3,033 protein-coding genes, 48 tRNAs, and 6 rRNA operons.


2018 ◽  
Author(s):  
Matthieu Legendre ◽  
Jean-Marie Alempic ◽  
Nadège Philippe ◽  
Audrey Lartigue ◽  
Sandra Jeudy ◽  
...  

Abstract With genomes of up to 2.7 Mb propagated in µm-long oblong particles and initially predicted to encode more than 2000 proteins, members of the Pandoraviridae family display the most extreme features of the known viral world. The mere existence of such giant viruses raises fundamental questions about their origin and the processes governing their evolution. A previous analysis of six newly available isolates, independently confirmed by a study including 3 others, established that the Pandoraviridae pan-genome is open, meaning that each new strain exhibits protein-coding genes not previously identified in other family members. With an average increment of about 60 proteins, the gene repertoire shows no sign of reaching a limit and remains largely coding for proteins without recognizable homologs in other viruses or cells (ORFans). To explain these results, we proposed that most new protein-coding genes were created de novo, from pre-existing non-coding regions of the G+C rich pandoravirus genomes. The comparison of the gene content of a new isolate, P. celtis, closely related (96% identical genome) to the previously described P. quercus is now used to test this hypothesis by studying genomic changes in a microevolution range. Our results confirm that the differences between these two similar gene contents mostly consist of protein-coding genes without known homologs (ORFans), with statistical signatures close to that of intergenic regions. These newborn proteins are under slight negative selection, perhaps to maintain stable folds and prevent protein aggregation pending the eventual emergence of fitness-increasing functions. Our study also unraveled several insertion events mediated by a transposase of the hAT family, 3 copies of which are found in P. celtis and are presumably active. Members of the Pandoraviridae are presently the first viruses known to encode this type of transposase.


mBio ◽  
2018 ◽  
Vol 9 (5) ◽  
Author(s):  
Daniel Dar ◽  
Rotem Sorek

ABSTRACT Prokaryotic genomes encode a plethora of small noncoding RNAs (ncRNAs) that fine-tune the expression of specific genes. The vast majority of known bacterial ncRNAs are encoded from within intergenic regions, where their expression is controlled by promoter and terminator elements, similarly to protein-coding genes. In addition, recent studies have shown that functional ncRNAs can also be derived from gene 3′ untranslated regions (3′UTRs) via an alternative biogenesis pathway, in which the ncRNA segment is separated from the mRNA via RNase cleavage. Here, we report the detection of a large set of decay-generated noncoding RNAs (decRNAs), many of which are completely embedded within protein-coding mRNA regions rather than in the UTRs. We show that these decRNAs are “carved out” of the mRNA through the action of RNase E and that they are predicted to fold into highly stable RNA structures, similar to those of known ncRNAs. A subset of these decRNAs is predicted to interact with Hfq or ProQ or both, which act as ncRNA chaperones, and some decRNAs display evolutionarily conserved sequences and conserved expression patterns between different species. These results suggest that mRNA protein-coding regions may harbor a large set of potentially functional small RNAs. IMPORTANCE Bacteria and archaea utilize regulatory small noncoding RNAs (ncRNAs) to control the expression of specific genetic programs. These ncRNAs are almost exclusively encoded within intergenic regions and are independently transcribed. Here, we report on a large set of ncRNAs that are “carved out” from within the protein-coding regions of Escherichia coli mRNAs by cellular RNases. These protected mRNA fragments fold into energetically stable RNA structures, reminiscent of those of intergenic regulatory ncRNAs.
In addition, a subset of these ncRNAs coprecipitate with the major ncRNA chaperones Hfq and ProQ and display evolutionarily conserved sequences and conserved expression patterns between different bacterial species. Our data suggest that protein-coding genes can potentially act as a reservoir of regulatory ncRNAs.

