Landscape of human miRNA variation and conservation using Annotative Database of miRNA Elements, ADmiRE
Abstract
MicroRNAs (miRNAs) are the most abundant class of non-coding RNAs; they regulate the expression of >60% of genes and are frequently deregulated in many human diseases. Sequence variants in miRNAs are expected to have a high impact on miRNA function. However, the lack of miRNA variant annotation and prioritization guidelines has hampered the analysis of such variants in whole genome/exome sequencing (WGS/WES) studies. Through the development of the Annotative Database of miRNA Elements (ADmiRE) workflow, we re-annotated the publicly available gnomAD population dataset (15,596 WGS and 123,136 WES) and describe 26,094 precursor-miRNA variants. ADmiRE annotates twice as many miRNA variants as existing tools, which prioritize variation relative to protein-coding regions. We provide the allele frequency distribution of miRNA variation, which is comparable to that of variation in exonic regions. This distribution is similar for miRNAs located in intragenic and intergenic genomic contexts. Moreover, ‘high confidence’ miRNAs (as designated by miRBase) harbor less variation, most of it contributed by rare variants, than the remaining miRNAs. We identify 279 highly constrained miRNAs with little or no variation in gnomAD. We further describe the evolutionary conservation of miRNAs across 100 vertebrates and identify 434 highly conserved miRNAs. We demonstrate that these constraint and conservation metrics (now incorporated into the ADmiRE workflow) characterize miRNAs previously implicated in human diseases. In conclusion, through the development of ADmiRE, we comprehensively analyze the landscape of miRNA sequence variation in large human population datasets and provide miRNA vertebrate conservation scores to aid future studies of miRNA variation in human diseases.