De novo computational prediction of non-coding RNA genes in prokaryotic genomes

2009
Vol 25 (22)
pp. 2897-2905
Author(s):
Thao T. Tran
Fengfeng Zhou
Sarah Marshburn
Mark Stead
Sidney R. Kushner
...

Genes
2020
Vol 11 (3)
pp. 274
Author(s):
Christian Siadjeu
Boas Pucker
Prisca Viehöver
Dirk C. Albach
Bernd Weisshaar

Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 h after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9941 non-coding RNA genes were predicted, and functional annotations were assigned.
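The N50 statistic quoted in the abstract above can be illustrated with a short sketch. N50 is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembly length. The function name `n50` and the example lengths are hypothetical and are not taken from the D. dumetorum assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches at least half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in arbitrary units.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

In this reading, an N50 of over 3.2 Mbp means that at least half of the assembled bases lie in sequences of 3.2 Mbp or longer.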


2020
Author(s):
Christian Siadjeu
Boas Pucker
Prisca Viehöver
Dirk C. Albach
Bernd Weisshaar

Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 hours after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9,941 non-coding RNA genes were predicted and functional annotations were assigned.


2008
Vol 18 (6)
pp. 888-899
Author(s):
P. Larsson
A. Hinas
D. H. Ardell
L. A. Kirsebom
A. Virtanen
...  

2020
Vol 27 (5)
pp. 385-391
Author(s):  
Lin Zhong ◽  
Zhong Ming ◽  
Guobo Xie ◽  
Chunlong Fan ◽  
Xue Piao

In recent years, growing evidence has indicated that long non-coding RNA (lncRNA) plays a significant role in complex biological processes, especially RNA processing, chromatin modification, and cell differentiation, among many others. lncRNA is also closely linked to human diseases such as cancer, so a better understanding of lncRNA function is needed to address these diseases. Because lncRNAs must bind to proteins to perform their biological functions, lncRNA function can be revealed by studying the relationship between lncRNAs and proteins. Owing to the limitations of traditional experiments, researchers often use computational models to predict lncRNA-protein interactions. In this review, we summarize several computational models for lncRNA-protein interaction prediction based on semi-supervised learning published during the past two years, and briefly introduce their advantages and shortcomings. Finally, future research directions for lncRNA-protein interaction prediction are pointed out.


2021
Vol 11 (8)
pp. 1306-1312
Author(s):  
Li Song ◽  
Ningchao Du ◽  
Haitao Luo ◽  
Furong Li

This study aimed to identify the association of protein-coding and long non-coding RNA genes with immunotherapy response in melanoma. Based on RNA sequencing data of melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were calculated using the Kallisto RNA-seq quantification method, and differentially expressed genes were detected using the DESeq2 method. Cox proportional hazards regression was used to evaluate the effects of gene expression on survival. According to the clinical data of 14 patients with drug response and 11 patients without drug response, 18 protein-coding genes and 14 long non-coding RNAs were differentially expressed (fold change > 2 and P < 0.01 after correction), among which the differentially expressed coding genes were significantly enriched in the cell adhesion process (P < 0.01). The results of survival analysis showed that the 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA, taking advantage of their paramagnetism and biocompatibility, and high-throughput transcriptome sequencing was then performed. The method has the advantages of removing dangerous reagents such as phenol and chloroform, replacing inorganic coatings such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes as well as magnetic nanoparticles may serve as potential cancer immune biomarker targets for developing future oncological treatments.
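The selection criterion described above (a greater than two-fold difference and a corrected P below 0.01) can be sketched as follows. The abstract does not state which multiple-testing correction was applied, so the Benjamini-Hochberg procedure is an assumption here, as is treating the two-fold threshold symmetrically for up- and down-regulated genes. The function names, gene labels, and numbers are hypothetical and are not data from the study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "after correction"; BH is used
    here purely for illustration.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(results, min_fold=2.0, alpha=0.01):
    """Keep genes with more than `min_fold` change (either direction)
    and an adjusted p-value below `alpha`.

    `results` is a list of (gene, fold_change, raw_p) tuples, where
    fold_change is a positive ratio (hypothetical input layout).
    """
    adjusted = benjamini_hochberg([p for _, _, p in results])
    threshold = math.log2(min_fold)
    return [
        gene
        for (gene, fold, _), q in zip(results, adjusted)
        if abs(math.log2(fold)) > threshold and q < alpha
    ]


# Hypothetical toy input: (gene label, fold change, raw p-value).
toy_results = [
    ("geneA", 4.0, 0.001),
    ("geneB", 1.2, 0.002),
    ("geneC", 8.0, 0.5),
    ("geneD", 0.2, 0.04),
]
selected = select_genes(toy_results)
```

In practice the adjusted p-values would normally come straight from the differential expression tool; the helper above only spells out the arithmetic behind such a filter.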


2016
Vol 18 (suppl_6)
pp. vi84-vi85
Author(s):  
Siyuan Liu ◽  
Max Horlbeck ◽  
Seung Woo Cho ◽  
Harjus Birk ◽  
Martina Malatesta ◽  
...  

2006
Vol 22 (21)
pp. 2590-2596
Author(s):  
C. Wang ◽  
C. Ding ◽  
R. F. Meraz ◽  
S. R. Holbrook

2021
Author(s):
Abhibhav Sharma
Pinki Dey

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder whose aetiology is currently unknown. Although numerous studies have attempted to identify the genetic risk factor(s) of AD, the interpretability and/or the prediction accuracies achieved by these studies have remained unsatisfactory, reducing their clinical significance. Here, we apply an ensemble of random-forest and regularized regression (LASSO) models to AD-associated microarray datasets from four brain regions (prefrontal cortex, middle temporal gyrus, hippocampus, and entorhinal cortex) to discover novel genetic biomarkers through a machine learning-based feature-selection classification scheme. The proposed scheme revealed the optimal and biologically significant classifiers within each brain region, which achieved by far the highest prediction accuracy of AD in 5-fold cross-validation (99% average). Interestingly, along with novel and prominent biomarkers including CORO1C, SLC25A46, RAE1, ANKIB1, CRLF3, and PDYN, numerous non-coding RNA genes were also observed as discriminators, of which AK057435 and BC037880 are uncharacterized long non-coding RNA genes.


2020
Author(s):
Lei Li
Yanjie Chao

Small proteins shorter than 50 amino acids have long been overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features of smORFs by a systematic analysis of all the known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel algorithm, sPepFinder, that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder to 549 bacterial species has led to the identification of > 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterization.

