De novo computational prediction of non-coding RNA genes in prokaryotic genomes

High contiguity de novo genome sequence assembly of Trifoliate yam (Dioscorea dumetorum) using long read sequencing

10.1101/2020.01.31.928630 ◽

2020 ◽

Author(s):

Christian Siadjeu ◽

Boas Pucker ◽

Prisca Viehöver ◽

Dirk C. Albach ◽

Bernd Weisshaar

Keyword(s):

De Novo ◽

Functional Annotations ◽

Genome Sequence Assembly ◽

Non-Coding RNA ◽

Protein Encoding ◽

Long Read ◽

Encoding Gene ◽

RNA Genes ◽

Orphan Crop ◽

Trifoliate Yam

Abstract: Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 hours after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9,941 non-coding RNA genes were predicted, and functional annotations were assigned.
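
The abstract reports assembly contiguity as an N50 value. As a reading aid, here is a minimal sketch of how N50 is commonly computed from a list of contig lengths: it is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the assembly)
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_lengths)
```

With the toy lengths above, the total is 32, and the two longest contigs (8 + 8) already reach half of it, so the N50 is 8.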

De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: Performance of Markov-dependent genome feature scoring

Genome Research ◽

10.1101/gr.069104.107 ◽

2008 ◽

Vol 18 (6) ◽

pp. 888-899 ◽

Cited By ~ 13

Author(s):

P. Larsson ◽

A. Hinas ◽

D. H. Ardell ◽

L. A. Kirsebom ◽

A. Virtanen ◽

...

Keyword(s):

Dictyostelium Discoideum ◽

De Novo ◽

Genome Feature ◽

Non-Coding RNA ◽

RNA Genes

Recent Advances on the Semi-Supervised Learning for Long Non-Coding RNA-Protein Interactions Prediction: A Review

Protein and Peptide Letters ◽

10.2174/0929866526666191025104043 ◽

2020 ◽

Vol 27 (5) ◽

pp. 385-391

Author(s):

Lin Zhong ◽

Zhong Ming ◽

Guobo Xie ◽

Chunlong Fan ◽

Xue Piao

Keyword(s):

Supervised Learning ◽

Protein Interactions ◽

Computational Models ◽

Prediction Models ◽

Chromatin Modification ◽

Computational Prediction ◽

Human Diseases ◽

Future Research ◽

Non-Coding RNA ◽

Long Non-Coding RNA

Abstract: In recent years, growing evidence has indicated that long non-coding RNA (lncRNA) plays a significant role in complex biological processes, especially RNA processing, chromatin modification, and cell differentiation, as well as many other processes. Surprisingly, lncRNA is closely linked to human diseases such as cancer. Therefore, only by knowing more about the function of lncRNA can we better address human diseases. However, lncRNAs need to bind to proteins to perform their biological functions, so lncRNA function can be revealed by studying the relationship between lncRNAs and proteins. Because of the limitations of traditional experiments, researchers often use computational models to predict lncRNA-protein interactions. In this review, we summarize several computational models for lncRNA-protein interaction prediction based on semi-supervised learning from the past two years, and briefly introduce their advantages and shortcomings. Finally, future research directions for lncRNA-protein interaction prediction are pointed out.

Screening and survival analysis of melanoma immunodrug response-related genes and the function of magnetic nanoparticles in gene extraction

Materials Express ◽

10.1166/mex.2021.2037 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1306-1312

Author(s):

Li Song ◽

Ningchao Du ◽

Haitao Luo ◽

Furong Li

Keyword(s):

Survival Analysis ◽

Magnetic Nanoparticles ◽

Drug Response ◽

High Throughput Sequencing ◽

Cox Proportional Hazards ◽

Sequencing Data ◽

Protein Coding ◽

Non-Coding RNA ◽

Long Non-Coding RNA ◽

RNA Genes

This study aimed to identify the association of protein-coding and long non-coding RNA genes with immunotherapy response in melanoma. Based on RNA sequencing data of melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were calculated using the Kallisto RNA-seq quantification method, and differentially expressed genes were detected using the DESeq2 method. Cox proportional hazards regression was used to evaluate the effects of gene expression on survival. According to the clinical data of 14 patients with drug response and 11 patients without drug response, 18 protein-coding genes and 14 long non-coding RNAs showed differential expression (fold change > 2 and P < 0.01 after correction), among which the differentially expressed coding genes were significantly enriched in the cell adhesion process (P < 0.01). The results of survival analysis showed that 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA, owing to their paramagnetism and biocompatibility, and transcriptome high-throughput sequencing was then performed. The method has the advantages of removing dangerous reagents such as phenol and chloroform, replacing inorganic coatings such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes as well as magnetic nanoparticles may serve as potential cancer immune biomarker targets for developing future oncological treatments.
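
The selection criterion quoted in the abstract (a more than two-fold difference and a corrected P below 0.01) can be illustrated with a small filter. This is only a sketch under stated assumptions: the function name `select_differential`, the tuple layout, and the example values are hypothetical, and the two-fold threshold is assumed to apply in both directions (absolute log2 fold change above 1). It is not the authors' pipeline and does not call DESeq2.

```python
import math


def select_differential(genes, min_fold=2.0, max_padj=0.01):
    """Return names of genes passing fold-change and adjusted-P cutoffs.

    `genes` is an iterable of (name, log2_fold_change, adjusted_p) tuples.
    Assumption: the fold-change cutoff is applied to up- and
    down-regulated genes alike.
    """
    log2_cutoff = math.log2(min_fold)
    return [
        name
        for name, log2_fc, padj in genes
        if abs(log2_fc) > log2_cutoff and padj < max_padj
    ]


# Hypothetical values for illustration only
example_genes = [
    ("geneA", 1.8, 0.001),   # passes both cutoffs
    ("geneB", -2.3, 0.004),  # passes both cutoffs (down-regulated)
    ("geneC", 0.4, 0.0005),  # fold change too small
    ("geneD", 3.1, 0.2),     # adjusted P too large
]
selected = select_differential(example_genes)
```

In this toy input, only `geneA` and `geneB` satisfy both conditions.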

GENT-49. SYSTEMATIC IDENTIFICATION OF ESSENTIAL LONG NON-CODING RNA GENES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/now212.354 ◽

2016 ◽

Vol 18 (suppl_6) ◽

pp. vi84-vi85

Author(s):

Siyuan Liu ◽

Max Horlbeck ◽

Seung Woo Cho ◽

Harjus Birk ◽

Martina Malatesta ◽

...

Keyword(s):

Non-Coding RNA ◽

Systematic Identification ◽

Long Non-Coding RNA ◽

RNA Genes

PSoL: a positive sample only learning algorithm for finding non-coding RNA genes

Bioinformatics ◽

10.1093/bioinformatics/btl441 ◽

2006 ◽

Vol 22 (21) ◽

pp. 2590-2596 ◽

Cited By ~ 56

Author(s):

C. Wang ◽

C. Ding ◽

R. F. Meraz ◽

S. R. Holbrook

Keyword(s):

Learning Algorithm ◽

Positive Sample ◽

Non-Coding RNA ◽

RNA Genes

A Machine Learning Approach to Unmask Novel Gene Signatures and Prediction of Alzheimer’s Disease Within Different Brain Regions

10.1101/2021.03.03.433689 ◽

2021 ◽

Author(s):

Abhibhav Sharma ◽

Pinki Dey

Keyword(s):

Machine Learning ◽

Alzheimer’s Disease ◽

Neurodegenerative Disorder ◽

Brain Regions ◽

Middle Temporal Gyrus ◽

Non-Coding RNA ◽

Machine Learning Approach ◽

Microarray Datasets ◽

RNA Genes

Abstract: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder whose aetiology is currently unknown. Although numerous studies have attempted to identify the genetic risk factor(s) of AD, the interpretability and/or prediction accuracies achieved by these studies have remained unsatisfactory, reducing their clinical significance. Here, we apply an ensemble of random forest and regularized regression (LASSO) models to AD-associated microarray datasets from four brain regions (prefrontal cortex, middle temporal gyrus, hippocampus, and entorhinal cortex) to discover novel genetic biomarkers through a machine learning-based feature-selection classification scheme. The proposed scheme revealed the optimal and most biologically significant classifiers within each brain region, which achieved by far the highest prediction accuracy of AD in 5-fold cross-validation (99% average). Interestingly, along with novel and prominent biomarkers including CORO1C, SLC25A46, RAE1, ANKIB1, CRLF3, and PDYN, numerous non-coding RNA genes were also observed as discriminators, of which AK057435 and BC037880 are uncharacterized long non-coding RNA genes.

sPepFinder expedites genome-wide identification of small proteins in bacteria

10.1101/2020.05.05.079178 ◽

2020 ◽

Author(s):

Lei Li ◽

Yanjie Chao

Keyword(s):

De Novo ◽

Bacterial Species ◽

Computational Prediction ◽

Ribosome Profiling ◽

Support Vector ◽

Initiation Rate ◽

E. coli ◽

Small Proteins ◽

Genome Wide ◽

A Genome

Abstract: Small proteins shorter than 50 amino acids have long been overlooked. A number of small proteins have been identified in several model bacteria using experimental approaches and assigned important functions in diverse cellular processes. The recent development of ribosome profiling technologies has allowed genome-wide identification of small proteins and small ORFs (smORFs), but our incomplete understanding of small proteins hinders de novo computational prediction of smORFs in non-model bacterial species. Here, we have identified several sequence features for smORFs by a systematic analysis of all the known small proteins in E. coli, among which the translation initiation rate is the strongest determinant. By integrating these features into a support vector machine learning model, we have developed a novel algorithm, sPepFinder, that can predict conserved smORFs in bacterial genomes with a high accuracy of 92.8%. De novo prediction in E. coli has revealed several novel smORFs with evidence of translation supported by ribosome profiling. Further application of sPepFinder to 549 bacterial species has led to the identification of > 100,000 novel smORFs, many of which are conserved at the amino acid and nucleotide levels under purifying selection. Overall, we have established sPepFinder as a valuable tool to identify novel smORFs in both model and non-model bacterial organisms, and provided a large resource of small proteins for functional characterization.

Formation of human long intergenic non-coding RNA genes, pseudogenes, and protein genes: Ancestral sequences are key players

PLoS ONE ◽

10.1371/journal.pone.0230236 ◽

2020 ◽

Vol 15 (3) ◽

pp. e0230236 ◽

Cited By ~ 2

Author(s):

Nicholas Delihas

Keyword(s):

Ancestral Sequences ◽

Non-Coding RNA ◽

Key Players ◽

RNA Genes
