Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Chao-Hsin Chen; Chao-Yu Pan; Wen-chang Lin

doi:10.1038/s41598-019-49802-w


Scientific Reports ◽

10.1038/s41598-019-49802-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Chao-Hsin Chen ◽

Chao-Yu Pan ◽

Wen-chang Lin

Keyword(s):

Human Genome ◽

Expression Profiles ◽

Tissue Expression ◽

Human Protein ◽

Clear Understanding ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Overlapping Gene

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes also exist in the human genome. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 of them overlapped with their adjacent genes; that is, approximately a quarter of all human protein-coding genes were overlapping genes. We observed clusters of overlapping protein-coding genes of different sizes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the 5ʹ-tandem overlapping and 3ʹ-tandem overlapping subtypes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
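The overlap search described in the abstract can be illustrated with a small interval sweep. The following is only a minimal sketch, not the authors' pipeline: the `Gene` record, the `overlap_clusters` function, and the choice to merge genes transitively (gene A overlaps B, B overlaps C, so all three form one cluster) are assumptions made for illustration, and the coordinates are invented rather than taken from Ensembl.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    gene_id: str
    chrom: str
    start: int
    end: int
    strand: str


def overlap_clusters(genes):
    """Return lists of genes whose genomic intervals overlap transitively.

    Genes are compared per chromosome regardless of strand. Genes that
    overlap nothing are omitted from the result.
    """
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters = []
    for chrom_genes in by_chrom.values():
        # Sort by start so that one left-to-right sweep finds all overlaps.
        chrom_genes.sort(key=lambda g: (g.start, g.end))
        current = [chrom_genes[0]]
        current_end = chrom_genes[0].end
        for gene in chrom_genes[1:]:
            if gene.start <= current_end:
                # Gene starts inside the running cluster: extend it.
                current.append(gene)
                current_end = max(current_end, gene.end)
            else:
                # Gap found: keep the cluster only if it has 2+ genes.
                if len(current) > 1:
                    clusters.append(current)
                current = [gene]
                current_end = gene.end
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Invented example coordinates, for illustration only.
example_genes = [
    Gene("A", "chr1", 100, 200, "+"),
    Gene("B", "chr1", 150, 300, "-"),
    Gene("C", "chr1", 400, 500, "+"),
    Gene("D", "chr2", 10, 50, "+"),
    Gene("E", "chr2", 40, 60, "+"),
    Gene("F", "chr2", 55, 90, "-"),
]
example_clusters = overlap_clusters(example_genes)
```

Clusters of length two correspond to what the abstract calls paired overlapping genes. Assigning a pair to one of the four subtypes would additionally require the strand of each gene and the subtype definitions given in the paper itself, which this sketch does not attempt to reproduce.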


The shrinking human protein coding complement: are there fewer than 20,000 genes?

10.1101/001909 ◽

2014 ◽

Cited By ~ 2

Author(s):

Iakes Ezkurdia ◽

David Juan ◽

Jose Manuel Rodriguez ◽

Adam Frankish ◽

Mark Diekhans ◽

...

Keyword(s):

Protein Expression ◽

Human Genome ◽

Genome Annotation ◽

Large Scale ◽

Cellular Protein ◽

Human Protein ◽

Protein Coding ◽

Detection Rates ◽

Protein Coding Genes ◽

Peptide Mass

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry experiments. Here we map the peptides detected in 7 large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We find that conservation across vertebrate species and the age of the gene family are key indicators of whether a peptide will be detected in proteomics experiments. We find peptides for most highly conserved genes and for practically all genes that evolved before bilateria. At the same time, there is almost no evidence of protein expression for genes that have appeared since primates, or for genes that do not have any protein-like features or cross-species conservation. We identify 19 non-protein-like features, such as weak conservation, no protein features, or ambiguous annotations in major databases, that are indicators of low peptide detection rates. We use these features to describe a set of 2,001 genes that are potentially non-coding, and show that many of these genes behave more like non-coding genes than protein-coding genes. We detect peptides for just 3% of these genes. We suggest that many of these 2,001 genes do not code for proteins under normal circumstances and that they should not be included in the human protein-coding gene catalogue. These potential non-coding genes will be revised as part of the ongoing human genome annotation effort.


Faculty Opinions recommendation of Distribution of SR protein exonic splicing enhancer motifs in human protein-coding genes.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1027960.335073 ◽

2005 ◽

Author(s):

Lawrence Chasin

Keyword(s):

Human Protein ◽

Exonic Splicing Enhancer ◽

Sr Protein ◽

Protein Coding ◽

Protein Coding Genes ◽

Splicing Enhancer


Faculty Opinions recommendation of A systematic survey of loss-of-function variants in human protein-coding genes.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13923985.15512056 ◽

2012 ◽

Author(s):

Isaac Kohane

Keyword(s):

Human Protein ◽

Loss Of Function ◽

Protein Coding ◽

Systematic Survey ◽

Protein Coding Genes


Analysis of correlation of local GC level in human protein coding genes

Hereditas (Beijing) ◽

10.3724/sp.j.1005.2008.01169 ◽

2009 ◽

Vol 30 (9) ◽

pp. 1169-1174

Author(s):

Xiang-Gui CHEN

Keyword(s):

Human Protein ◽

Protein Coding ◽

Protein Coding Genes


Systematic analyses of the cancer genome: lessons learned from sequencing most of the annotated human protein-coding genes

Current Opinion in Oncology ◽

10.1097/cco.0b013e3282f31108 ◽

2008 ◽

Vol 20 (1) ◽

pp. 66-71 ◽

Cited By ~ 15

Author(s):

Tobias Sjöblom

Keyword(s):

Cancer Genome ◽

Lessons Learned ◽

Human Protein ◽

Protein Coding ◽

Protein Coding Genes


Biogenic mechanisms and utilization of small RNAs derived from human protein-coding genes

Nature Structural & Molecular Biology ◽

10.1038/nsmb.2091 ◽

2011 ◽

Vol 18 (9) ◽

pp. 1075-1082 ◽

Cited By ~ 69

Author(s):

Eivind Valen ◽

Pascal Preker ◽

Peter Refsing Andersen ◽

Xiaobei Zhao ◽

Yun Chen ◽

...

Keyword(s):

Small Rnas ◽

Human Protein ◽

Protein Coding ◽

Protein Coding Genes


A unified allosteric/torpedo mechanism for transcriptional termination on human protein-coding genes

Genes & Development ◽

10.1101/gad.332833.119 ◽

2019 ◽

Vol 34 (1-2) ◽

pp. 132-145 ◽

Cited By ~ 15

Author(s):

Joshua D. Eaton ◽

Laura Francis ◽

Lee Davidson ◽

Steven West

Keyword(s):

Human Protein ◽

Protein Coding ◽

Protein Coding Genes ◽

Transcriptional Termination


LncExpDB: an expression database of human long non-coding RNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa850 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D962-D968 ◽

Cited By ~ 2

Author(s):

Zhao Li ◽

Lin Liu ◽

Shuai Jiang ◽

Qianpeng Li ◽

Changrui Feng ◽

...

Keyword(s):

Expression Profiles ◽

Biological Functions ◽

Protein Coding ◽

Web Interfaces ◽

Functional Studies ◽

Protein Coding Genes ◽

Genes Expression ◽

Wide Range ◽

Non Coding Rnas ◽

User Friendly

Abstract Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, a comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions is still lacking. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes’ expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.


Predicting Coding Potential from Genome Sequence: Application to Betaherpesviruses Infecting Rats and Mice

Journal of Virology ◽

10.1128/jvi.79.12.7570-7596.2005 ◽

2005 ◽

Vol 79 (12) ◽

pp. 7570-7596 ◽

Cited By ~ 46

Author(s):

Luciano Brocchieri ◽

Thomas N. Kledal ◽

Samuel Karlin ◽

Edward S. Mocarski

Keyword(s):

Genome Annotation ◽

Mrna Splicing ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Coding Regions ◽

Translation Signals ◽

Rats And Mice ◽

Coding Potential ◽

Exon Gene

Abstract Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G+C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.
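To give a rough idea of what a frame-specific G+C measurement involves, the sketch below computes the G+C fraction separately at the first, second, and third codon positions of a chosen reading frame. It is an illustrative assumption about the general technique, not the authors' method: the function name `gc_by_codon_position` and the example sequence are hypothetical, and the published analysis combines such signals with compositional bias and conservation in ways not shown here.

```python
def gc_by_codon_position(seq, frame=0):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    `frame` (0, 1 or 2) is the offset of the first codon in `seq`.
    Characters other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    gc_counts = [0, 0, 0]
    totals = [0, 0, 0]
    for index in range(frame, len(seq)):
        position = (index - frame) % 3  # 0, 1, 2 = codon position 1, 2, 3
        base = seq[index]
        if base in "ACGT":
            totals[position] += 1
            if base in "GC":
                gc_counts[position] += 1
    # Avoid division by zero for very short or empty input.
    return tuple(
        gc / total if total else 0.0
        for gc, total in zip(gc_counts, totals)
    )


# Invented nine-base example: codons ATG, GCC, GCG in frame 0.
example_fractions = gc_by_codon_position("ATGGCCGCG", frame=0)
```

Comparing these three fractions across the three possible frames on each strand is one simple way to see whether a candidate reading frame shows the position-dependent G+C pattern that coding sequence in a G+C-rich genome tends to have.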
