Characterization and identification of long non-coding RNAs based on feature relationship

2018 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

ABSTRACT The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations. Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between ORF (open reading frame) length and GC content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.

2019 ◽  
Vol 35 (17) ◽  
pp. 2949-2956 ◽  
Author(s):  
Guangyu Wang ◽  
Hongyan Yin ◽  
Boyang Li ◽  
Chunlei Yu ◽  
Fan Wang ◽  
...  

Abstract

Motivation: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging; it requires prior knowledge of well-established sequences and annotations or species-specific training data, but the reality is that only a limited number of species have high-quality sequences and annotations.

Results: Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the feature relationship between open reading frame length and guanine-cytosine (GC) content presents universally substantial divergence in lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that is able to accurately distinguish lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms by achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study, for the first time, differentially characterizes lncRNAs and protein-coding RNAs based on feature relationship, which is further applied in computational identification of lncRNAs. Taken together, our study represents a significant advance in characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.

Availability and implementation: The LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.

Supplementary information: Supplementary data are available at Bioinformatics online.
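To make the two features named in the abstract concrete, the following minimal Python sketch computes GC content and the length of the longest open reading frame for a single transcript. It is an illustration only and not the LGC algorithm: the abstract does not describe how LGC combines the two features into a score. The function names `gc_content` and `longest_orf_length`, the restriction to the forward strand, and the ATG-to-stop ORF definition are assumptions made for this example.

```python
# Illustrative feature extraction only; this is NOT the LGC scoring method.
# Assumptions: ORFs run from ATG to the first in-frame stop codon, on the
# forward strand only, and the reported length includes the stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Return the length in nucleotides of the longest forward-strand ORF."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTGGGTAACC"  # made-up toy sequence
    print(longest_orf_length(transcript), round(gc_content(transcript), 3))
```

A classifier built on these features would still need a rule relating ORF length to GC content; that rule is exactly what the paper contributes and is deliberately left out here.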


2020 ◽  
Vol 49 (D1) ◽  
pp. D962-D968 ◽  
Author(s):  
Zhao Li ◽  
Lin Liu ◽  
Shuai Jiang ◽  
Qianpeng Li ◽  
Changrui Feng ◽  
...  

Abstract Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, a comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions is still lacking. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes’ expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.


mBio ◽  
2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Xyrus X. Maurer-Alcalá ◽  
Rob Knight ◽  
Laura A. Katz

ABSTRACT Separate germline and somatic genomes are found in numerous lineages across the eukaryotic tree of life, often separated into distinct tissues (e.g., in plants, animals, and fungi) or distinct nuclei sharing a common cytoplasm (e.g., in ciliates and some foraminifera). In ciliates, germline-limited (i.e., micronuclear-specific) DNA is eliminated during the development of a new somatic (i.e., macronuclear) genome in a process that is tightly linked to large-scale genome rearrangements, such as deletions and reordering of protein-coding sequences. Most studies of germline genome architecture in ciliates have focused on the model ciliates Oxytricha trifallax, Paramecium tetraurelia, and Tetrahymena thermophila, for which the complete germline genome sequences are known. Outside of these model taxa, only a few dozen germline loci have been characterized from a limited number of cultivable species, which is likely due to difficulties in obtaining sufficient quantities of “purified” germline DNA in these taxa. Combining single-cell transcriptomics and genomics, we have overcome these limitations and provide the first insights into the structure of the germline genome of the ciliate Chilodonella uncinata, a member of the understudied class Phyllopharyngea. Our analyses reveal the following: (i) large gene families contain a disproportionate number of genes from scrambled germline loci; (ii) germline-soma boundaries in the germline genome are demarcated by substantial shifts in GC content; (iii) single-cell omics techniques provide large-scale quality germline genome data with limited effort, at least for ciliates with extensively fragmented somatic genomes. Our approach provides an efficient means to understand better the evolution of genome rearrangements between germline and soma in ciliates.

IMPORTANCE Our understanding of the distinctions between germline and somatic genomes in ciliates has largely relied on studies of a few model genera (e.g., Oxytricha, Paramecium, Tetrahymena). We have used single-cell omics to explore germline-soma distinctions in the ciliate Chilodonella uncinata, which likely diverged from the better-studied ciliates ~700 million years ago. The analyses presented here indicate that developmentally regulated genome rearrangements between germline and soma are demarcated by rapid transitions in local GC composition and lead to diversification of protein families. The approaches used here provide the basis for future work aimed at discerning the evolutionary impacts of germline-soma distinctions among diverse ciliates.
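The finding that germline-soma boundaries coincide with sharp changes in local GC composition can be illustrated with a sliding-window GC profile. The sketch below is a hypothetical illustration and not the authors' analysis pipeline: the function names `windowed_gc` and `gc_shift_positions`, the non-overlapping 50-nt windows, and the 0.3 threshold are all assumptions chosen for the example.

```python
# Hypothetical illustration of detecting abrupt GC-content shifts.
# The window size, step, and threshold are arbitrary example values,
# not parameters taken from the study.

def windowed_gc(seq: str, window: int = 50, step: int = 50):
    """Return (start, gc_fraction) for each full window along the sequence."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        profile.append((start, gc))
    return profile


def gc_shift_positions(seq: str, window: int = 50, step: int = 50,
                       min_delta: float = 0.3):
    """Return start positions of windows whose GC fraction differs from the
    preceding window by at least min_delta."""
    profile = windowed_gc(seq, window, step)
    return [
        cur_start
        for (_, prev_gc), (cur_start, cur_gc) in zip(profile, profile[1:])
        if abs(cur_gc - prev_gc) >= min_delta
    ]


if __name__ == "__main__":
    # Toy sequence: 100 nt of AT-only followed by 100 nt of GC-only.
    toy = "AT" * 50 + "GC" * 50
    print(gc_shift_positions(toy))
```

On real data, overlapping windows and a threshold calibrated against the genome-wide GC distribution would likely be needed; the toy sequence here is constructed so that the single shift is unambiguous.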


Planta ◽  
2020 ◽  
Vol 252 (5) ◽  
Author(s):  
Li Chen ◽  
Qian-Hao Zhu ◽  
Kerstin Kaufmann

Main conclusion: Long non-coding RNAs modulate gene activity in plant development and stress responses by various molecular mechanisms.

Abstract: Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides without protein-coding potential. Computational approaches have identified numerous lncRNAs in different plant species. Research in the past decade has revealed that plant lncRNAs participate in a wide range of biological processes, including regulation of flowering time and morphogenesis of reproductive organs, as well as abiotic and biotic stress responses. LncRNAs execute their functions by interacting with DNA, RNA and protein molecules, and by modulating the expression level of their targets through epigenetic, transcriptional, post-transcriptional or translational regulation. In this review, we summarize characteristics of plant lncRNAs, discuss recent progress in understanding lncRNA functions, and propose an experimental framework for functional characterization.


Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3695
Author(s):  
Kathleen M. Lucere ◽  
Megan M. R. O’Malley ◽  
Sarah D. Diermeier

Recent technological advancements such as CRISPR/Cas-based systems enable multiplexed, high-throughput screening for new therapeutic targets in cancer. While numerous functional screens have been performed on protein-coding genes to date, long non-coding RNAs (lncRNAs) represent an emerging class of potential oncogenes and tumor suppressors, with only a handful of large-scale screens performed thus far. Here, we review in detail currently available screening approaches to identify new lncRNA drivers of tumorigenesis and tumor progression. We discuss the various approaches of genomic and transcriptional targeting using CRISPR/Cas9, as well as methods to post-transcriptionally target lncRNAs via RNA interference (RNAi), antisense oligonucleotides (ASOs) and CRISPR/Cas13. We discuss potential advantages, caveats and future applications of each method to provide an overview and guide on investigating lncRNAs as new therapeutic targets in cancer.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Fatemeh Salabi ◽  
Hedieh Jafari ◽  
Shahrokh Navidpour ◽  
Ayeh Sadat Sadr

Abstract The potential function of long non-coding RNAs (lncRNAs) in regulating neighboring protein-coding genes has attracted scientists’ attention. Despite the important role of lncRNAs in biological processes, only a limited number of studies have focused on lncRNAs of non-model animals. In this study, we used a stringent step-by-step filtering pipeline and machine learning-based tools to identify lncRNAs specific to Androctonus crassicauda and to analyze the features of the predicted scorpion lncRNAs. The pipeline detected 13,401 lncRNAs in the A. crassicauda transcriptome. BLAST results indicated that the majority of these lncRNA sequences (12,642) have no identifiable orthologs even in closely related species and were therefore considered novel lncRNAs. Comparison with lncRNA prediction tools indicated that our pipeline is a helpful approach for distinguishing protein-coding from non-coding transcripts in RNA sequencing data of species without reference genomes. Moreover, analysis of lncRNA characteristics in A. crassicauda revealed that lower protein-coding potential, lower GC content, shorter transcript length, and fewer isoforms per gene are distinguishing features of A. crassicauda lncRNA transcripts.


2018 ◽  
Author(s):  
Guiling Sun ◽  
Yuxing Xu ◽  
Hui Liu ◽  
Ting Sun ◽  
Jingxiong Zhang ◽  
...  

Dodders (Cuscuta spp., Convolvulaceae) are globally distributed root- and leafless parasitic plants that parasitize a wide range of hosts. The physiology, ecology, and evolution of these obligate parasites are still poorly understood. A high-quality reference genome (size 266.74 Mb and contig N50 of 3.63 Mb) of Cuscuta australis was assembled. Our analyses reveal that Cuscuta experienced accelerated evolution, and that Cuscuta and the convolvulaceous morning glory (Ipomoea) shared a common whole-genome triplication event before their divergence. Importantly, the C. australis genome harbors only 19805 protein-coding genes, and 11.7% of the conserved orthologs in autotrophic plants are lost in C. australis. Many of these gene loss events likely result from the plant’s parasitic lifestyle and large changes in its body plan. Moreover, comparison of the gene expression patterns in Cuscuta prehaustoria/haustoria and various tissues of closely related autotrophic plants suggests that Cuscuta haustorium genes largely evolved from roots. The C. australis genome provides important resources for studying the evolution of parasitism, regressive evolution, and evo-devo in plant parasites.


2021 ◽  
Vol 7 (1) ◽  
pp. 16
Author(s):  
Didem Karakas ◽  
Bulent Ozpolat

Long non-coding RNAs (lncRNAs), a group of non-protein coding RNAs with lengths of more than 200 nucleotides, exert their effects by binding to DNA, mRNA, microRNA, and proteins and regulate gene expression at the transcriptional, post-transcriptional, translational, and post-translational levels. Depending on cellular location, lncRNAs are involved in a wide range of cellular functions, including chromatin modification, transcriptional activation, transcriptional interference, scaffolding and regulation of translational machinery. This review highlights recent studies on lncRNAs in the regulation of protein translation by modulating the translational factors (i.e., eIF4E, eIF4G, eIF4A, 4E-BP1, eEF5A) and signaling pathways involved in this process, as well as their potential roles as tumor suppressors or tumor promoters.


Genome ◽  
2016 ◽  
Vol 59 (4) ◽  
pp. 263-275 ◽  
Author(s):  
Mohammad Reza Bakhtiarizadeh ◽  
Batool Hosseinpour ◽  
Babak Arefnezhad ◽  
Narges Shamabadi ◽  
Seyed Alireza Salami

Long non-coding RNAs (lncRNAs) are transcribed RNA molecules >200 nucleotides in length that do not encode proteins and serve as key regulators of diverse biological processes. Recently, thousands of long intergenic non-coding RNAs (lincRNAs), a type of lncRNA, have been identified in mammals using massively parallel sequencing technologies. The availability of the genome sequence of sheep (Ovis aries) has enabled genomic prediction of non-coding RNAs. This is the first study to identify lincRNAs using RNA-seq data from eight different tissues of sheep: brain, heart, kidney, liver, lung, ovary, skin, and white adipose. A computational pipeline was employed to characterize 325 high-confidence putative lincRNAs from these tissues using different criteria such as GC content, exon number, gene length, co-expression analysis, stability, and tissue-specific scores. Sixty-four putative lincRNAs displayed tissue-specific expression. The highest number of tissue-specific lincRNAs was found in skin and brain. All novel lincRNAs that aligned to human and mouse lincRNAs had conserved synteny. The protein-coding genes closest to the lincRNAs were enriched in 11 significant GO terms such as limb development, appendage development, striated muscle tissue development, and multicellular organismal development. The findings reported here have important implications for the study of the sheep genome.


2016 ◽  
Vol 2 (1) ◽  
pp. 5
Author(s):  
Yu Cuiyun ◽  
Qian Ning ◽  
Zhi-Ping Li ◽  
Wen Huang ◽  
Jia Yu ◽  
...  

Non-coding RNAs (ncRNAs) are RNA molecules without protein-coding functions owing to the lack of an open reading frame (ORF). Based on their length, ncRNAs can be divided into long and short ncRNAs; short ncRNAs include miRNAs and piRNAs. Hepatocellular carcinoma (HCC) is among the most frequent forms of cancer worldwide and its incidence is increasing rapidly. Studies have found that ncRNAs are likely to play a crucial role in a variety of biological processes including the pathogenesis and progression of HCC. In this review, we summarize the regulatory mechanisms and biological functions of ncRNAs in HCC with respect to their application in HCC diagnosis, therapy and prognosis.

