Landscape of human miRNA variation and conservation using Annotative Database of miRNA Elements, ADmiRE

Abstract: MicroRNAs (miRNAs) are the most abundant class of non-coding RNAs; they regulate the expression of >60% of genes and are frequently deregulated in many human diseases. Sequence variants in miRNAs are expected to have a high impact on miRNA function. However, the lack of miRNA variant annotation and prioritization guidelines has hampered this analysis in whole genome/exome sequencing (WGS/WES) studies. Through the development of the Annotative Database of miRNA Elements (ADmiRE) workflow, we re-annotated the publicly available gnomAD population dataset of 15,596 WGS and 123,136 WES and describe 26,094 precursor-miRNA variants. ADmiRE annotates twice as many miRNA variants as existing tools, which prioritize variation relative to protein coding regions. We provide the allele frequency distribution of miRNA variation, which is comparable to that of variation in exonic regions. This distribution is similar for miRNAs located in intragenic and intergenic genomic contexts. Moreover, ‘high confidence’ miRNAs (as designated by miRBase) harbor less variation (the majority contributed by rare variants) than the remaining miRNAs. We identify 279 miRNAs that are highly constrained, with little or no variation in gnomAD. We further describe the evolutionary conservation of miRNAs across 100 vertebrates and identify 434 highly conserved miRNAs. We demonstrate that these constraint and conservation metrics (now incorporated into the ADmiRE workflow) characterize miRNAs previously implicated in human diseases. In conclusion, through the development of ADmiRE, we comprehensively analyze the landscape of miRNA sequence variation in large human population datasets and provide miRNA vertebrate conservation scores to aid future studies of miRNA variation in human diseases.

Non-coding RNAs and disease: the classical ncRNAs make a comeback

Biochemical Society Transactions ◽

10.1042/bst20160089 ◽

2016 ◽

Vol 44 (4) ◽

pp. 1073-1078 ◽

Cited By ~ 36

Author(s):

Rogerio Alves de Almeida ◽

Marcin G. Fraczek ◽

Steven Parker ◽

Daniela Delneri ◽

Raymond T. O'Keefe

Keyword(s):

Human Genome ◽

Human Disease ◽

Human Diseases ◽

Protein Coding ◽

Coding Regions ◽

Disease Biology ◽

The Future ◽

Future Potential ◽

Non Coding Rnas ◽

Disproportionate Number

Many human diseases have been attributed to mutations in the protein coding regions of the human genome. The protein coding portion of the human genome, however, is very small compared with the non-coding portion. As such, a disproportionate number of diseases are attributed to the coding rather than the non-coding portion of the genome. It is now clear that the non-coding portion of the genome produces many functional non-coding RNAs, and these RNAs are slowly being linked to human diseases. Here we discuss examples where mutations in classical non-coding RNAs have been attributed to human disease and identify the future potential of the non-coding portion of the genome in disease biology.

Promising Advances in LINC01116 Related to Cancer

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.736927 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yating Xu ◽

Xiao Yu ◽

Menggang Zhang ◽

Qingyuan Zheng ◽

Zongzong Sun ◽

...

Keyword(s):

Colorectal Cancer ◽

Gastric Cancer ◽

Lung Cancer ◽

Cancer Colorectal ◽

Malignant Tumors ◽

Biological Processes ◽

Aberrant Expression ◽

Protein Coding ◽

Future Studies ◽

Non Coding Rnas

Long non-coding RNAs (lncRNAs) are RNAs with a length of no less than 200 nucleotides that are not translated into proteins. Accumulating evidence indicates that lncRNAs are pivotal regulators of biological processes in several diseases, particularly in several malignant tumors. Long intergenic non-protein coding RNA 1116 (LINC01116) is a lncRNA whose aberrant expression is correlated with a variety of cancers, including lung cancer, gastric cancer, colorectal cancer, glioma, and osteosarcoma. LINC01116 plays a crucial role in facilitating cell proliferation, invasion, migration, and apoptosis. In addition, numerous recent studies suggest that LINC01116 is emerging as a novel biomarker for prognosis and therapy in malignant tumors. Consequently, we summarize the clinical significance of LINC01116 associated with biological processes in various tumors and offer a promising direction to guide the clinical treatment of various cancers in future studies.

Watch Out for a Second SNP: Focus on Multi-Nucleotide Variants in Coding Regions and Rescued Stop-Gained

Frontiers in Genetics ◽

10.3389/fgene.2021.659287 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fabien Degalez ◽

Frédéric Jehl ◽

Kévin Muret ◽

Maria Bernard ◽

Frédéric Lecerf ◽

...

Keyword(s):

Companion Paper ◽

Potential Effect ◽

Nucleotide Polymorphisms ◽

Rna Seq ◽

Coding Region ◽

Single Nucleotide ◽

Protein Coding ◽

Variant Annotation ◽

Coding Regions ◽

Reliable Genotypes

Most single-nucleotide polymorphisms (SNPs) are located in non-coding regions, but the fraction usually studied lies in protein-coding regions because potential impacts on proteins are relatively easy to predict with popular tools such as the Variant Effect Predictor. These tools annotate variants independently, without considering the potential effect of grouped or haplotypic variations, often called “multi-nucleotide variants” (MNVs). Here, we used a large RNA-seq dataset to survey MNVs, comprising 382 chicken samples originating from 11 populations analyzed in the companion paper, in which 9.5M SNPs (including 3.3M SNPs with reliable genotypes) were detected. We focused our study on in-codon MNVs and evaluated their potential mis-annotation. Using GATK HaplotypeCaller read-based phasing results, we identified 2,965 MNVs observed in at least five individuals and located in 1,792 genes. We found that 41.1% of them showed a novel impact when compared to the effect of their constituent SNPs analyzed separately. The largest shift in annotated impact concerns consequences originally annotated as stop-gained, of which around 95% were rescued; this is followed by missense consequences, of which 37% were reannotated with a different amino acid. We then present the rescued stop-gained MNVs in more depth and give an illustration in the SLC27A4 gene. As previously shown in human datasets, our results in chicken demonstrate the value of haplotype-aware variant annotation and of considering MNVs in the coding region, particularly when searching for severe functional consequences such as stop-gained variants.
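
To make the "rescued stop-gained" idea concrete, here is a minimal Python sketch. It is not the authors' pipeline and does not reproduce the Variant Effect Predictor; the function names (`apply_snps`, `classify`) and the example codon are hypothetical and chosen only for illustration. It assumes the standard genetic code and two phased SNPs that fall in the same codon.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snps(codon, snps):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for pos, alt in snps:
        bases[pos] = alt
    return "".join(bases)


def classify(ref_codon, alt_codon):
    """Assign a coarse consequence label by comparing translated amino acids."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


# Hypothetical example: reference codon CAG (Gln) carrying two in-codon SNPs
# on the same haplotype.
ref = "CAG"
snp1 = (0, "T")  # alone: CAG -> TAG (stop)
snp2 = (2, "C")  # alone: CAG -> CAC (His)

# SNP-by-SNP annotation, as a tool that treats variants independently would do.
separate = [classify(ref, apply_snps(ref, [s])) for s in (snp1, snp2)]

# Haplotype-aware annotation: both substitutions applied together (TAC, Tyr).
joint = classify(ref, apply_snps(ref, [snp1, snp2]))

print(separate, joint)
```

Annotated one at a time, the first SNP looks like a stop-gained variant; applied together with the second, the codon encodes tyrosine and the apparent stop disappears. This is the kind of re-annotation the abstract reports for most originally stop-gained calls.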

Transposable element-derived sequences in vertebrate development

Mobile DNA ◽

10.1186/s13100-020-00229-5 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Ema Etchegaray ◽

Magali Naville ◽

Jean-Nicolas Volff ◽

Zofia Haftek-Terreau

Keyword(s):

Transposable Element ◽

Genomic Instability ◽

Adaptive Immunity ◽

Insertion Site ◽

Regulatory Sequences ◽

Vertebrate Development ◽

Genomic Context ◽

Protein Coding ◽

Evolutionary Success ◽

Non Coding Rnas

Abstract: Transposable elements (TEs) are major components of all vertebrate genomes that can cause deleterious insertions and genomic instability. However, depending on the specific genomic context of their insertion site, TE sequences can sometimes get positively selected, leading to what are called “exaptation” events. TE sequence exaptation constitutes an important source of novelties for gene, genome and organism evolution, giving rise to new regulatory sequences, protein-coding exons/genes and non-coding RNAs, which can play various roles beneficial to the host. In this review, we focus on the development of vertebrates, which present many derived traits such as bones, adaptive immunity and a complex brain. We illustrate how TE-derived sequences have given rise to developmental innovations in vertebrates and how they thereby contributed to the evolutionary success of this lineage.

RNAsamba: coding potential assessment using ORF and whole transcript sequence information

10.1101/620880 ◽

2019 ◽

Author(s):

Antonio P. Camargo ◽

Vsevolod Sourkov ◽

Marcelo F. Carazzolle

Keyword(s):

High Throughput Sequencing ◽

Model Organisms ◽

Sequence Information ◽

Protein Coding ◽

Rna Molecules ◽

Coding Regions ◽

Sequencing Technologies ◽

Partial Length ◽

Non Coding Rnas ◽

Coding Potential

Abstract Motivation: The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, one of the main tasks being the identification of protein-coding and long non-coding RNAs. Results: We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a deep-learning model that processes both the whole sequence and the ORF to look for patterns that distinguish coding and non-coding RNAs. We evaluated the model in the classification of coding and non-coding transcripts of humans and five other model organisms and show that RNAsamba mostly outperforms other state-of-the-art methods. We also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its model is not dependent on the presence of complete coding regions. RNAsamba is a fast and easy tool that can provide valuable contributions to genome annotation pipelines. Availability and implementation: The source code of RNAsamba is freely available at: https://github.com/apcamargo/RNAsamba.

Prediction of Deleterious Nonsynonymous Single-Nucleotide Polymorphism for Human Diseases

The Scientific World JOURNAL ◽

10.1155/2013/675851 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 28

Author(s):

Jiaxin Wu ◽

Rui Jiang

Keyword(s):

Rare Variants ◽

Fundamental Problem ◽

Nucleotide Polymorphisms ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Protein Coding ◽

Inherited Diseases ◽

Coding Regions ◽

Typical Type ◽

And Function

The identification of genetic variants that are responsible for human inherited diseases is a fundamental problem in human and medical genetics. As a typical type of genetic variation, nonsynonymous single-nucleotide polymorphisms (nsSNPs) occurring in protein coding regions may alter the encoded amino acid, potentially affect protein structure and function, and further result in human inherited diseases. Therefore, it is of great importance to develop computational approaches to facilitate the discrimination of deleterious nsSNPs from neutral ones. In this paper, we review databases that collect nsSNPs and summarize computational methods for the identification of deleterious nsSNPs. We classify the existing methods for characterizing nsSNPs into three categories (sequence based, structure based, and annotation based), and we introduce machine learning models for the prediction of deleterious nsSNPs. We further discuss methods for identifying deleterious nsSNPs in noncoding variants and those for dealing with rare variants.

Non-Coding RNAs and Human Diseases

10.3389/978-2-88963-832-1 ◽

2020 ◽

Keyword(s):

Human Diseases ◽

Non Coding Rnas

Evolutionary Analysis of DNA-Protein-Coding Regions Based on a Genetic Code Cube Metric

Current Topics in Medicinal Chemistry ◽

10.2174/1568026613666131204110022 ◽

2014 ◽

Vol 14 (3) ◽

pp. 407-417

Author(s):

Robersy Sanchez

Keyword(s):

Genetic Code ◽

Evolutionary Analysis ◽

Protein Coding ◽

Coding Regions

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Abstract Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Epigenetic roles of PIWI proteins and piRNAs in colorectal cancer

Cancer Cell International ◽

10.1186/s12935-021-02034-3 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Fatemeh Sadoughi ◽

Seyyed Mehdi Mirhashemi ◽

Zatollah Asemi

Keyword(s):

Colorectal Cancer ◽

Human Diseases ◽

Regulatory Function ◽

Epigenetic Alterations ◽

Gastrointestinal Malignancy ◽

Abnormal Expression ◽

Gene Regulatory ◽

Non Coding Rnas ◽

Entire World

Abstract: Small non-coding RNAs (sncRNAs) are a subgroup of non-coding RNAs that are fewer than 200 nucleotides in length and have no potential for coding proteins. PiRNAs, a class of sncRNAs, were first discovered more than a decade ago and have attracted researchers' attention because of their gene regulatory function in both the nucleus and the cytoplasm. Recent investigations have found that the abnormal expression of these sncRNAs is involved in many human diseases, including cancers. Colorectal cancer (CRC), a common gastrointestinal malignancy, is one of the important causes of cancer-related deaths throughout the entire world and appears to be a consequence of mutation in the genome and epigenetic alterations. The aim of this review is to determine whether there is a relationship between CRC and piRNAs.
