Human Chr18: “Stakhanovite” Genes, Missing and uPE1 Proteins in Liver Tissue and HepG2 Cells

10.1101/2020.11.04.358739 ◽

2020 ◽

Author(s):

George S. Krasnov ◽

Sergey P. Radko ◽

Konstantin G. Ptitsyn ◽

Valeriya V. Shapovalova ◽

Olga S. Timoshenko ◽

...

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Quantitative PCR ◽

Liver Tissue ◽

HepG2 Cells ◽

Expression Profiles ◽

Uncharacterized Protein ◽

Illumina HiSeq ◽

Protein Coding ◽

Protein Coding Genes

Abstract Missing (MP) and functionally uncharacterized proteins (uPE1) comprise less than 5% of the total number of proteins encoded by human Chr18 genes. Within half a year of the January 2020 release of NextProt, the number of entries in the MP+uPE1 datasets changed, mainly due to the achievements of antibody-based proteomics. Assuming that the proteome is closely related to the transcriptome scaffold, quantitative PCR, Illumina HiSeq, and Oxford Nanopore Technology were applied to characterize the liver samples of three male donors in comparison with the HepG2 cell line. Data mining of Expression Atlas (EMBL-EBI) and profiling of our biospecimens using orthogonal methods of transcriptome analysis have shown that in HepG2 cells and the liver, the genes encoding functionally uncharacterized proteins (uPE1) are expressed at levels as low as those of the genes encoding missing proteins (less than 1 copy per cell), except for the selected cases of HSBP1L1, TMEM241, C18orf21, and KLHL14. The initial expectation that uPE1 genes might be expressed at higher levels than MP genes was compromised by severe discrepancies in our semi-quantitative gene expression data and in public databanks. These discrepancies forced us to revisit the transcriptome of Chr18, the target of the Russian C-HPP Consortium. A tanglegram of highly expressed genes and further correlation analysis showed a strong dependence on the mRNA extraction method and the analytical platform. Targeted gene expression analysis by quantitative PCR (qPCR) and high-throughput transcriptome profiling (Illumina HiSeq and ONT MinION) for the same set of samples from normal liver tissue and HepG2 cells revealed the detectable expression of 250+ (92%) protein-coding genes of Chr18 (by at least one method). The expression of slightly more than 50% of protein-coding genes was detected simultaneously by all three methods. Correlation analysis of the gene expression profiles showed that the grouping of the datasets depended almost equally on both the type of biological material and the experimental method, particularly cDNA/mRNA isolation and library preparation. The dependence on the choice of bioinformatics analysis pipeline was also noticeable but significantly weaker. Furthermore, Illumina HiSeq and ONT MinION sequencing were combined to validate proteotypic peptides of missing and uPE1 proteins for the heat-shock factor binding protein HSBP1L1 (missing protein, recently transferred to the PE1 category) and the uncharacterized protein C18orf21 (uPE1). We observed that a nonsynonymous SNP led to the loss of a trypsin cleavage site in HSBP1L1. The modified version of HSBP1L1 was included in the sequence database and searched against the MS/MS dataset from Kulak, Geyer & Mann (2017), but delivered no significant identification. Thus, HSBP1L1 is still missing for the MS-pillar of C-HPP, although its existence at the protein level has been confirmed.
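The effect of a nonsynonymous SNP on tryptic peptides, as described for HSBP1L1 above, can be illustrated with a minimal in-silico digestion sketch. It assumes the common simplified rule that trypsin cleaves after K or R unless the next residue is P, with no missed cleavages. The two sequences are hypothetical toy strings, not the real HSBP1L1 sequence or its variant, and `tryptic_peptides` is an illustrative helper, not the authors' pipeline.

```python
import re


def tryptic_peptides(sequence: str) -> list[str]:
    """Digest a protein sequence in silico.

    Simplified rule (an assumption of this sketch): cleave after K or R
    unless the following residue is P; no missed cleavages are modelled.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical toy sequences; the variant replaces R with Q at position 11.
reference = "MAEDLSKAPGRLLQEK"
variant = "MAEDLSKAPGQLLQEK"

ref_peptides = tryptic_peptides(reference)
var_peptides = tryptic_peptides(variant)

# Peptides that disappear from, or appear in, the digest because of the SNP.
lost = sorted(set(ref_peptides) - set(var_peptides))
gained = sorted(set(var_peptides) - set(ref_peptides))

print("reference:", ref_peptides)
print("variant:  ", var_peptides)
print("lost:", lost, "gained:", gained)
```

In this toy case the substitution removes one cleavage site, so two reference peptides merge into a single longer peptide. A search database that contains only the reference sequence would not include that merged peptide, which is why a variant sequence has to be added before searching.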


GC-AG introns features in long non-coding and protein-coding genes suggest their role in gene expression regulation

10.26226/morressier.5ebd45acffea6f735881b039 ◽

2020 ◽

Author(s):

Monah Abou Alezz

Keyword(s):

Gene Expression ◽

Gene Expression Regulation ◽

Expression Regulation ◽

Protein Coding ◽

Protein Coding Genes


Tri-methylation of Histone H3 Lysine 4 Facilitates Gene Expression in Ageing Cells

10.1101/238048 ◽

2017 ◽

Cited By ~ 1

Author(s):

Cristina Cruz ◽

Monica Della Rosa ◽

Christel Krueger ◽

Qian Gao ◽

Lucy Field ◽

...

Keyword(s):

Gene Expression ◽

Budding Yeast ◽

Histone H3 ◽

Specific Factor ◽

Protein Coding ◽

Replicative Lifespan ◽

Protein Coding Genes ◽

Genome Wide ◽

Normal Expression ◽

Transcriptional Induction

Abstract Transcription of protein-coding genes is accompanied by recruitment of COMPASS to promoter-proximal chromatin, which deposits di- and tri-methylation on histone H3 lysine 4 (H3K4) to form H3K4me2 and H3K4me3. Here we determine the importance of COMPASS in maintaining gene expression across lifespan in budding yeast. We find that COMPASS mutations dramatically reduce replicative lifespan and cause widespread gene expression defects. Known repressive functions of H3K4me2 are progressively lost with age, while hundreds of genes become dependent on H3K4me3 for full expression. Induction of these H3K4me3-dependent genes is also impacted in young cells lacking COMPASS components including the H3K4me3-specific factor Spp1. Remarkably, the genome-wide occurrence of H3K4me3 is progressively reduced with age despite widespread transcriptional induction, minimising the normal positive correlation between promoter H3K4me3 and gene expression. Our results provide clear evidence that H3K4me3 is required to attain normal expression levels of many genes across organismal lifespan.


Two waves of transcriptomic changes in periovulatory human granulosa cells

Human Reproduction ◽

10.1093/humrep/deaa043 ◽

2020 ◽

Vol 35 (5) ◽

pp. 1230-1245 ◽

Cited By ~ 2

Author(s):

L C Poulsen ◽

J A Bøtkjær ◽

O Østrup ◽

K B Petersen ◽

C Yding Andersen ◽

...

Keyword(s):

Gene Expression ◽

Growth Factor ◽

Tumour Necrosis Factor ◽

Tumour Necrosis ◽

Fertility Treatment ◽

Protein Coding ◽

Time Points ◽

Protein Coding Genes ◽

Upstream Regulators ◽

Necrosis Factor

Abstract
STUDY QUESTION How does the human granulosa cell (GC) transcriptome change during ovulation?
SUMMARY ANSWER Two transcriptional peaks were observed at 12 h and at 36 h after induction of ovulation, both dominated by genes and pathways known from the inflammatory system.
WHAT IS KNOWN ALREADY The crosstalk between GCs and the oocyte, which is essential for ovulation and oocyte maturation, can be assessed through transcriptomic profiling of GCs. Detailed transcriptional changes during ovulation have not previously been assessed in humans.
STUDY DESIGN, SIZE, DURATION This prospective cohort study comprised 50 women undergoing fertility treatment in a standard antagonist protocol at a university hospital-affiliated fertility clinic in 2016–2018.
PARTICIPANTS/MATERIALS, SETTING, METHODS From each woman, one sample of GCs was collected by transvaginal ultrasound-guided follicle aspiration either before or 12 h, 17 h or 32 h after ovulation induction (OI). A second sample was collected at oocyte retrieval, 36 h after OI. Total RNA was isolated from GCs and analyzed by microarray. Gene expression differences between the five time points were assessed by ANOVA with a random factor accounting for the pairing of samples, and seven clusters of protein-coding genes representing distinct expression profiles were identified. These were used as input for subsequent bioinformatic analyses to identify enriched pathways and suggest upstream regulators. Subsets of genes were assessed to explore specific ovulatory functions.
MAIN RESULTS AND THE ROLE OF CHANCE We identified 13 345 differentially expressed transcripts across the five time points (false discovery rate, <0.01) of which 58% were protein-coding genes. Two clusters of mainly downregulated genes represented cell cycle pathways and DNA repair. Upregulated genes showed one peak at 12 h that resembled the initiation of an inflammatory response, and one peak at 36 h that resembled the effector functions of inflammation such as vasodilation, angiogenesis, coagulation, chemotaxis and tissue remodelling. Genes involved in cell–matrix interactions as a part of cytoskeletal rearrangement and cell motility were also upregulated at 36 h. Predicted activated upstream regulators of ovulation included FSH, LH, transforming growth factor B1, tumour necrosis factor, nuclear factor kappa-light-chain-enhancer of activated B cells, coagulation factor 2, fibroblast growth factor 2, interleukin 1 and cortisol, among others. The results confirmed early regulation of several previously described factors in a cascade inducing meiotic resumption and suggested new factors involved in cumulus expansion and follicle rupture through co-regulation with previously described factors.
LARGE SCALE DATA The microarray data were deposited to the Gene Expression Omnibus (www.ncbi.nlm.nih.gov/gds/, accession number: GSE133868).
LIMITATIONS, REASONS FOR CAUTION The study included women undergoing ovarian stimulation and the findings may therefore differ from a natural cycle. However, the results confirm significant regulation of many well-established ovulatory genes from a series of previous studies such as amphiregulin, epiregulin, tumour necrosis factor alfa induced protein 6, tissue inhibitor of metallopeptidases 1 and plasminogen activator inhibitor 1, which supports the relevance of the results.
WIDER IMPLICATIONS OF THE FINDINGS The study increases our understanding of human ovarian function during ovulation, and the publicly available dataset is a valuable resource for future investigations. Suggested upstream regulators and highly differentially expressed genes may be potential pharmaceutical targets in fertility treatment and gynaecology.
STUDY FUNDING/COMPETING INTEREST(S) The study was funded by EU Interreg ÖKS V through ReproUnion (www.reprounion.eu) and by a grant from the Region Zealand Research Foundation. None of the authors have any conflicts of interest to declare.


Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Scientific Reports ◽

10.1038/s41598-019-49802-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Chao-Hsin Chen ◽

Chao-Yu Pan ◽

Wen-chang Lin

Keyword(s):

Human Genome ◽

Expression Profiles ◽

Tissue Expression ◽

Human Protein ◽

Clear Understanding ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Overlapping Gene

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in the human genome. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distribution of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
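As a rough illustration of how overlapping genes can be found from annotation coordinates, the sketch below sorts genes by start position on each chromosome and flags any gene whose interval shares at least one base with another. The `Gene` record, the `overlapping_genes` helper and the toy coordinates are hypothetical. The sketch also ignores strand and the four overlap subtypes, so it is not the authors' Ensembl-based procedure. The last lines only redo the arithmetic behind "approximately a quarter" using the two counts quoted in the abstract.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive coordinate
    end: int    # inclusive coordinate


def overlapping_genes(genes: list[Gene]) -> set[str]:
    """Return names of genes sharing at least one base with another gene
    on the same chromosome (strand is ignored in this sketch)."""
    by_chrom: dict[str, list[Gene]] = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    flagged: set[str] = set()
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        widest = None  # gene reaching furthest to the right so far
        for gene in chrom_genes:
            if widest is not None and gene.start <= widest.end:
                flagged.update((widest.name, gene.name))
            if widest is None or gene.end > widest.end:
                widest = gene
    return flagged


# Hypothetical toy annotation, not real Ensembl records.
toy = [
    Gene("A", "chr1", 100, 500),
    Gene("B", "chr1", 400, 900),
    Gene("C", "chr1", 1000, 1200),
    Gene("D", "chr2", 100, 500),
]
toy_overlapping = overlapping_genes(toy)
print(sorted(toy_overlapping))

# Counts quoted in the abstract: 4,951 of 19,200 protein-coding genes.
percent_overlapping = round(4951 / 19200 * 100, 1)
print(percent_overlapping)
```

Tracking only the right-most end seen so far is enough to decide whether each gene overlaps any earlier one, so the scan is linear after sorting.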


Identifying inaccuracies in gene expression estimates from unstranded RNA-seq data

Scientific Reports ◽

10.1038/s41598-019-52584-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Mikhail Pomaznoy ◽

Ashu Sethi ◽

Jason Greenbaum ◽

Bjoern Peters

Keyword(s):

Gene Expression ◽

Differential Expression Analysis ◽

Cell Types ◽

Library Preparation ◽

RNA Seq ◽

Protein Coding ◽

Protein Coding Genes ◽

Machine Learning Model ◽

Specific Manner ◽

Library Preparation Protocol

Abstract RNA-seq methods are widely utilized for transcriptomic profiling of biological samples. However, there are known caveats of this technology which can skew the gene expression estimates. Specifically, if the library preparation protocol does not retain RNA strand information, then some genes can be erroneously quantitated. Although strand-specific protocols have been established, a significant portion of RNA-seq data is generated in a non-strand-specific manner. We used a comprehensive stranded RNA-seq dataset of 15 blood cell types to identify genes for which expression would be erroneously estimated if strand information was not available. We found that about 10% of all genes and 2.5% of protein-coding genes have a two-fold or higher difference in estimated expression when strand information of the reads was ignored. We used parameters of read alignments of these genes to construct a machine learning model that can identify which genes in an unstranded dataset might have incorrect expression estimates and which ones do not. We also show that differential expression analysis of genes with biased expression estimates in unstranded read data can be recovered by limiting the reads considered to those which span exonic boundaries. The resulting approach is implemented as a package available at https://github.com/mikpom/uslcount.
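The "two-fold or higher difference" criterion can be sketched as a comparison of per-gene estimates obtained with and without strand information. The code below is a minimal illustration, not the uslcount package or its API. The `flag_discordant` helper, the pseudocount of 1 and the toy counts are all assumptions introduced here.

```python
import math


def flag_discordant(
    stranded: dict[str, float],
    unstranded: dict[str, float],
    fold: float = 2.0,
    pseudocount: float = 1.0,  # assumption: avoids division by zero
) -> dict[str, float]:
    """Return genes whose unstranded estimate differs from the stranded
    one by at least `fold`, mapped to their log2 ratio."""
    threshold = math.log2(fold)
    flagged = {}
    for gene, s in stranded.items():
        u = unstranded[gene]
        log2_ratio = math.log2((u + pseudocount) / (s + pseudocount))
        if abs(log2_ratio) >= threshold:
            flagged[gene] = log2_ratio
    return flagged


# Hypothetical toy counts for three genes.
stranded_counts = {"geneA": 99, "geneB": 49, "geneC": 0}
unstranded_counts = {"geneA": 104, "geneB": 199, "geneC": 7}

discordant = flag_discordant(stranded_counts, unstranded_counts)
for gene, ratio in discordant.items():
    print(f"{gene}: log2(unstranded/stranded) = {ratio:.2f}")
```

Here geneB and geneC are flagged because ignoring strand inflates their estimates, for example through reads from an antisense neighbour, while geneA stays within the two-fold band.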


RNAseq Studies Reveal Distinct Transcriptional Response to Vitamin A (VA) Deficiency in Small Intestine (SI) versus Colon, Discovering Novel VA-regulated Genes (P15-002-19)

Current Developments in Nutrition ◽

10.1093/cdn/nzz037.p15-002-19 ◽

2019 ◽

Vol 3 (Supplement_1) ◽

Author(s):

Zhi Chai ◽

Yafei Lyu ◽

Qiuyan Chen ◽

Cheng-Hsin Wei ◽

Lindsay Snyder ◽

...

Keyword(s):

Gene Expression ◽

Small Intestine ◽

Vitamin A ◽

Target Genes ◽

Expression Profiles ◽

Expression Patterns ◽

Differentially Expressed Gene ◽

Gene Expression Profiles ◽

Illumina HiSeq ◽

The Impact

Abstract
Objectives To characterize and compare the impact of vitamin A (VA) deficiency on gene expression patterns in the small intestine (SI) and the colon, and to discover novel target genes in VA-related biological pathways.
Methods Vitamin A-deficient (VAD) mice were generated by feeding a VAD diet to pregnant C57/BL6 dams and their post-weaning offspring. Total mRNA extracted from SI and colon was sequenced using the Illumina HiSeq 2500 platform. Differentially Expressed Gene (DEG), Gene Ontology (GO) enrichment, and Weighted Gene Co-expression Network Analysis (WGCNA) were performed to characterize expression patterns and co-expression patterns.
Results The comparison between vitamin A sufficient (VAS) and VAD groups detected 49 and 94 DEGs in SI and colon, respectively. According to GO information, DEGs in the SI demonstrated significant enrichment in categories relevant to retinoid metabolic process, molecule binding, and immune function. Immunity-related pathways, such as “humoral immune response” and “complement activation,” were positively associated with VA in SI. In contrast, in colon, “cell division” was the only enriched category and was negatively associated with VA. WGCNA identified modules significantly correlated with VA status in SI and in colon. One of those modules contained five known retinoic acid targets. Therefore we have prioritized the other module members (e.g., Mbl2, Mmp9, Mmp13, Cxcl14 and Pkd1l2) to be investigated as candidate genes regulated by VA. Comparison of co-expression modules between SI and colon indicated distinct VA effects on these two organs.
Conclusions The results show that VA deficiency alters the gene expression profiles in SI and colon quite differently. Some immune-related genes (Mbl2, Mmp9, Mmp13, Cxcl14 and Pkd1l2) may be novel targets under the control of VA in SI.
Funding Sources NIH training grant and NIH research grant.


LncExpDB: an expression database of human long non-coding RNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa850 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D962-D968 ◽

Cited By ~ 2

Author(s):

Zhao Li ◽

Lin Liu ◽

Shuai Jiang ◽

Qianpeng Li ◽

Changrui Feng ◽

...

Keyword(s):

Expression Profiles ◽

Biological Functions ◽

Protein Coding ◽

Web Interfaces ◽

Functional Studies ◽

Protein Coding Genes ◽

Genes Expression ◽

Wide Range ◽

Non Coding RNAs ◽

User Friendly

Abstract Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, a comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions is lacking. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes’ expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.


BCL6 positively regulates AID and germinal center gene expression via repression of miR-155

Journal of Experimental Medicine ◽

10.1084/jem.20121387 ◽

2012 ◽

Vol 209 (13) ◽

pp. 2455-2465 ◽

Cited By ~ 77

Author(s):

Katia Basso ◽

Christof Schneider ◽

Qiong Shen ◽

Antony B. Holmes ◽

Manu Setty ◽

...

Keyword(s):

Gene Expression ◽

B Cells ◽

Germinal Center ◽

Potential Role ◽

Transcriptional Repressor ◽

Substantial Evidence ◽

Protein Coding ◽

Protein Coding Genes

The BCL6 proto-oncogene encodes a transcriptional repressor that is required for germinal center (GC) formation and whose de-regulation is involved in lymphomagenesis. Although substantial evidence indicates that BCL6 exerts its function by repressing the transcription of hundreds of protein-coding genes, its potential role in regulating gene expression via microRNAs (miRNAs) is not known. We have identified a core of 15 miRNAs that show binding of BCL6 in their genomic loci and are down-regulated in GC B cells. Among validated BCL6 targets, miR-155 and miR-361 directly modulate AID expression, indicating that via repression of these miRNAs, BCL6 up-regulates AID. Similarly, the expression of additional genes relevant for the GC phenotype, including SPI1, IRF8, and MYB, appears to be sustained via BCL6-mediated repression of miR-155. These findings identify a novel mechanism by which BCL6, in addition to repressing protein-coding genes, promotes the expression of important GC functions by repressing specific miRNAs.
