GENCODE Annotation for the Human and Mouse Genome: A User Perspective

Author(s):  
Saleh Musleh ◽  
Meshari Alazmi ◽  
Tanvir Alam
2014 ◽  
Vol 42 (2) ◽  
pp. 500-503 ◽  
Author(s):  
Alaa Shafie ◽  
Mingzhan Xue ◽  
Paul J. Thornalley ◽  
Naila Rabbani

The glyoxalase I gene GLO1 is a hotspot for copy number variation in the human and mouse genomes. The additional copies are often functional, giving rise to 2–4-fold increased glyoxalase I expression and activity. The prevalence of GLO1 copy number increase in the human population appears to be approximately 2% and may be linked to a risk of obesity, diabetes and aging. Increased GLO1 copy number has been found in human tumour cell lines and primary human tumours. The minimum common copy number increase region was approximately 1 Mb and it contained GLO1 and seven other genes. The increased copy number was generally functional, being associated with increased glyoxalase I protein and multidrug resistance in cancer chemotherapy. Glo1 duplication in the mouse genome is found within approximately 0.5 Mb of duplicated DNA. It was claimed to be linked to anxiety phenotypes, but discordant findings in related studies have cast doubt on the association with glyoxalase I, and further investigation is required.


Genome ◽  
2011 ◽  
Vol 54 (2) ◽  
pp. 144-150 ◽  
Author(s):  
Xia Shen ◽  
Haimei Mao ◽  
Shan Miao

CArG cis-elements bound by serum response factor (SRF) are being studied intensively, but little is known about the substitution pattern of functional CArG elements. Here, we have performed the first evolutionary analysis of the CArGome in the human and mouse genomes through bioinformatic methods and statistical tests. We calculated the substitution rate at each site of the functional CArG elements. The results showed that the core sites of the functional CArG elements evolved faster than did the background DNA, indicating that these sites were likely to evolve under positive selection. Moreover, a strong TATA “motif” was evident in the core region within the functional CArG elements in both human and mouse promoters. This motif probably contributes substantially to the formation of the spatial structure that is important for CArG-SRF recognition. Thus, the study further revealed the sequence character and substitution pattern of CArG elements and provided useful information for the study of the SRF-binding efficiencies of CArG promoters in functional assays.
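The per-site substitution rate mentioned in the abstract above can be illustrated with a minimal Python sketch. It assumes the simplest possible estimator, the raw proportion of aligned human/mouse element pairs that differ at each position; the abstract does not state which estimator or statistical tests the authors used. The function name and the toy 10-bp sequences (written to fit the CC(A/T)6GG CArG consensus) are hypothetical and are not data from the study.

```python
def per_site_substitution_rate(pairs):
    """Fraction of aligned sequence pairs that differ at each position.

    pairs: list of (human_seq, mouse_seq) tuples, all gap-free and of
    equal length. This is a raw proportion of differences, an assumed
    stand-in for whatever estimator the study actually applied.
    """
    if not pairs:
        raise ValueError("at least one sequence pair is required")
    length = len(pairs[0][0])
    diffs = [0] * length
    for human, mouse in pairs:
        if len(human) != length or len(mouse) != length:
            raise ValueError("all sequences must have the same length")
        for i, (a, b) in enumerate(zip(human.upper(), mouse.upper())):
            if a != b:
                diffs[i] += 1
    return [d / len(pairs) for d in diffs]


# Hypothetical orthologous CArG elements (human, mouse); toy data only.
toy_pairs = [
    ("CCATATATGG", "CCATATATGG"),
    ("CCTTATATGG", "CCTAATATGG"),
    ("CCAAATAAGG", "CCAAATTAGG"),
    ("CCATAAATGG", "CCATATATGG"),
]
rates = per_site_substitution_rate(toy_pairs)
```

Comparing such per-site rates with the rate in flanking background DNA is the kind of contrast the abstract describes, although a real analysis would need many more elements and a proper substitution model.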


2020 ◽  
Author(s):  
Sehyun Oh ◽  
Jasmine Abdelnabi ◽  
Ragheed Al-Dulaimi ◽  
Ayush Aggarwal ◽  
Marcel Ramos ◽  
...  

Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (mSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ∼3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN, with open development and issue tracking on GitHub and an associated pkgdown site https://waldronlab.io/HGNChelper/.
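The spreadsheet date conversion described in the abstract above can be illustrated with a short Python sketch. This is not HGNChelper (an R package) and does not reproduce its interface or lookup tables; the function names and the three-entry month table are assumptions made for illustration. It handles only the "day-month" display form (for example, "1-Mar") and does not consult the HGNC or MGI databases.

```python
import re

# Illustrative table only: month abbreviations that collide with the
# MARCH, SEPT and DEC gene-symbol families. A real resource would
# rely on a curated symbol database rather than this short mapping.
MONTH_TO_PREFIX = {"mar": "MARCH", "sep": "SEPT", "dec": "DEC"}

# Matches values such as "1-Mar" or "12-Sep" left behind by a spreadsheet.
_DATE_LIKE = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")


def undo_date_conversion(value):
    """Return the symbol a date-like value probably came from, else None."""
    match = _DATE_LIKE.match(value.strip())
    if match is None:
        return None
    prefix = MONTH_TO_PREFIX.get(match.group(2).lower())
    if prefix is None:
        return None
    return f"{prefix}{int(match.group(1))}"


def flag_symbols(symbols):
    """Pair each input with a suggested correction (None if not date-like)."""
    return [(s, undo_date_conversion(s)) for s in symbols]


report = flag_symbols(["TP53", "1-Mar", "2-Sep", "5-Jan"])
```

The suggestion is only the text that was likely typed before the conversion; whether that symbol is currently approved still has to be checked against the official nomenclature resources named in the abstract.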


2003 ◽  
Vol 13 (8) ◽  
pp. 1966-1972 ◽  
Author(s):  
Denis M. Larkin ◽  
Annelie Everts-van der Wind ◽  
Mark Rebeiz ◽  
Peter A. Schweitzer ◽  
Sharon Bachman ◽  
...  

As a step toward the goal of adding the cattle genome to those available for multispecies comparative genome analysis, 40,224 cattle BAC clones were end-sequenced, yielding 60,547 sequences (BAC end sequences, BESs) after trimming with an average read length of 515 bp. Cattle BACs were anchored to the human and mouse genome sequences by BLASTN search, revealing 29.4% and 10.1% significant hits (E < e-5), respectively. More than 60% of all cattle BES hits in both the human and mouse genomes are located within known genes. In order to confirm in silico predictions of orthology and their relative position on cattle chromosomes, 84 cattle BESs with similarity to sequences on HSA11 were mapped using a cattle–hamster radiation hybrid (RH) panel. Resulting RH maps of BTA15 and BTA29 cover ∼85% of HSA11 sequence, revealing a complex patchwork shuffling of segments not explained by a simple translocation followed by internal rearrangements. Overlay of the mouse conserved syntenies onto HSA11 revealed that segmental boundaries appear to be conserved in all three species. The BAC clone-based comparative map provides a foundation for the evolutionary analysis of mammalian karyotypes and for sequencing of the cattle genome.


2003 ◽  
Vol 90 (08) ◽  
pp. 185-193 ◽  
Author(s):  
Roman Szabo ◽  
Qingyu Wu ◽  
Robert Dickson ◽  
Sarah Netzel-Arnett ◽  
Toni Antalis ◽  
...  

The recent availability of human and mouse genome sequences and expressed sequence tag databases facilitated the identification of a large new family of membrane-anchored serine proteases, the type II transmembrane serine proteases or TTSPs. Analyses of human inherited disorders and gene targeting studies in mice have revealed that several members of this new protease family have critical functions in development and health. Preliminary studies also suggest that aberrant expression of type II transmembrane serine proteases may be linked to disease progression. The knowledge gathered thus far of the genetics, physiology, and pathology of this interesting new serine protease family will be reviewed here in brief.


2006 ◽  
Vol 174 (2) ◽  
pp. 169-174 ◽  
Author(s):  
Jürgen Schweizer ◽  
Paul E. Bowden ◽  
Pierre A. Coulombe ◽  
Lutz Langbein ◽  
E. Birgitte Lane ◽  
...  

Keratins are intermediate filament–forming proteins that provide mechanical support and fulfill a variety of additional functions in epithelial cells. In 1982, a nomenclature was devised to name the keratin proteins that were known at that point. The systematic sequencing of the human genome in recent years uncovered the existence of several novel keratin genes and their encoded proteins. Their naming could not be adequately handled in the context of the original system. We propose a new consensus nomenclature for keratin genes and proteins that relies upon and extends the 1982 system and adheres to the guidelines issued by the Human and Mouse Genome Nomenclature Committees. This revised nomenclature accommodates functional genes and pseudogenes, and although designed specifically for the full complement of human keratins, it offers the flexibility needed to incorporate additional keratins from other mammalian species.


2013 ◽  
Vol 368 (1620) ◽  
pp. 20120366 ◽  
Author(s):  
Chih-Hao Hsu ◽  
Ivan Ovcharenko

Lineage-specific regulatory elements underlie adaptation of species and play a role in disease susceptibility. We compared functionally conserved and lineage-specific enhancers by cross-mapping 5042 human and 6564 mouse heart enhancers. Of these, 79 per cent are lineage-specific, lacking a functional orthologue. Heart enhancers tend to cluster and, commonly, there are multiple heart enhancers in a heart locus providing a regulatory stability to the locus. We observed little cross-clustering, however, between lineage-specific and functionally conserved heart enhancers suggesting regulatory function acquisition and development in loci previously lacking heart activity. We also identified 862 human-specific heart enhancers: 417 featuring sequence conservation with mouse (class II) and 445 with neither sequence nor function conservation (class III). Ninety-eight per cent of class III enhancers were deleted from the mouse genome, and we estimated a similar-sized enhancer gain in the human lineage. Human-specific enhancers display no detectable decrease in the negative selection pressure and are strongly associated with genes partaking in the heart regulatory programmes. The loss of a heart enhancer could be compensated by activity of a redundant heart enhancer; however, we observed redundancy in only 15 per cent of class II and III enhancer loci indicating a large-scale reprogramming of the heart regulatory programme in mammals.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1493
Author(s):  
Sehyun Oh ◽  
Jasmine Abdelnabi ◽  
Ragheed Al-Dulaimi ◽  
Ayush Aggarwal ◽  
Marcel Ramos ◽  
...  

Gene symbols are recognizable identifiers for gene names but are unstable and error-prone due to aliasing, manual entry, and unintentional conversion by spreadsheets to date format. Official gene symbol resources such as HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but lack a programmatic interface and correction of symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, in addition to common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (mSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ~3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN.

