4. Proteins

Mapping Intimacies ◽

10.1093/actrade/9780198723882.003.0004 ◽

2016 ◽

Author(s):

Aysha Divan ◽

Janice A. Royds

Keyword(s):

Alternative Splicing ◽

Human Genome ◽

The Body ◽

Biological Functions ◽

Protein Coding ◽

Post Translational Modifications ◽

Protein Coding Genes ◽

Composition And Structure ◽

A Cell ◽

Structure Of Proteins

Biological functions require protein and the protein makeup of a cell determines its behaviour and identity. Proteins, therefore, are the most abundant molecules in the body except for water. The approximately 20,000 protein coding genes in the human genome can, by alternative splicing, multiple translation starts, and post-translational modifications, produce over 1,000,000 different proteins, collectively called ‘the proteome’. It is the size of the proteome and not the genome that defines the complexity of an organism. ‘Proteins’ describes the composition and structure of proteins and how they are studied. What information is required in order to understand how proteins work and what happens when this function is impaired in disease?

Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Scientific Reports ◽

10.1038/s41598-019-49802-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Chao-Hsin Chen ◽

Chao-Yu Pan ◽

Wen-chang Lin

Keyword(s):

Human Genome ◽

Expression Profiles ◽

Tissue Expression ◽

Human Protein ◽

Clear Understanding ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Overlapping Gene

The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
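
The abstract above describes grouping genes whose annotated coordinates overlap into clusters of two or more. A minimal sketch of that idea follows. It is not the authors' pipeline: the input tuple format, the gene names and the coordinates are all hypothetical, strand is ignored (so the four paired subtypes are not distinguished), and genes are chained transitively, so two genes can share a cluster through a third gene without overlapping each other directly.

```python
from collections import defaultdict


def cluster_overlapping_genes(genes):
    """Group genes whose coordinates overlap on the same chromosome.

    genes: iterable of (name, chrom, start, end) tuples with inclusive
    coordinates. This is a hypothetical input format, not the Ensembl schema.
    Returns a list of clusters (lists of gene names) with two or more genes.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    clusters = []
    for chrom in sorted(by_chrom):
        current, current_end = [], None
        # Sweep genes in order of start position.
        for start, end, name in sorted(by_chrom[chrom]):
            if current and start <= current_end:
                # Gene starts before the running cluster ends: same cluster.
                current.append(name)
                current_end = max(current_end, end)
            else:
                # Gap found: keep the finished cluster only if it is a real overlap.
                if len(current) > 1:
                    clusters.append(current)
                current, current_end = [name], end
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Made-up coordinates, for illustration only.
example_genes = [
    ("GENE_A", "1", 100, 500),
    ("GENE_B", "1", 450, 900),
    ("GENE_C", "1", 2000, 2500),
    ("GENE_D", "2", 100, 300),
    ("GENE_E", "2", 250, 400),
    ("GENE_F", "2", 390, 800),
]
clusters = cluster_overlapping_genes(example_genes)
```

With these made-up inputs, GENE_A and GENE_B form a pair, GENE_C stands alone and is dropped, and GENE_D, GENE_E and GENE_F form a three-gene cluster.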

LncExpDB: an expression database of human long non-coding RNAs

Nucleic Acids Research ◽

10.1093/nar/gkaa850 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D962-D968 ◽

Cited By ~ 2

Author(s):

Zhao Li ◽

Lin Liu ◽

Shuai Jiang ◽

Qianpeng Li ◽

Changrui Feng ◽

...

Keyword(s):

Expression Profiles ◽

Biological Functions ◽

Protein Coding ◽

Web Interfaces ◽

Functional Studies ◽

Protein Coding Genes ◽

Genes Expression ◽

Wide Range ◽

Non Coding Rnas ◽

User Friendly

Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, a comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions is lacking. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes’ expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.

The distribution pattern of genetic variation in the transcript isoforms of the alternatively spliced protein-coding genes in the human genome

Molecular BioSystems ◽

10.1039/c5mb00132c ◽

2015 ◽

Vol 11 (5) ◽

pp. 1378-1388 ◽

Cited By ~ 8

Author(s):

Ting Liu ◽

Kui Lin

Keyword(s):

Genetic Variation ◽

Distribution Pattern ◽

Human Genome ◽

Protein Coding ◽

Transcript Isoforms ◽

Protein Coding Genes ◽

Alternatively Spliced

The relationships among the types of transcripts, the classes of coding SNPs and the population frequencies in the human genome.

Pre-mRNA Splicing Mechanisms, Misregulation in Disease, and Therapeutic Strategies

Blood ◽

10.1182/blood.v120.21.sci-13.sci-13 ◽

2012 ◽

Vol 120 (21) ◽

pp. SCI-13-SCI-13

Author(s):

Adrian Krainer

Keyword(s):

Alternative Splicing ◽

Research Funding ◽

Therapeutic Strategies ◽

Mrna Splicing ◽

Protein Isoforms ◽

Current Status ◽

Targeted Therapeutics ◽

Protein Coding ◽

Protein Coding Genes ◽

Normal Gene

Abstract SCI-13. Most eukaryotic protein-coding genes have one or more introns, and their transcripts can undergo alternative splicing, giving rise to multiple isoforms. Accurate splicing is essential for normal gene expression, and alternative splicing is a key mechanism for expanding the proteome and regulating the expression of diverse protein isoforms. This session will review the general mechanisms of pre-mRNA splicing and the regulation of alternative splicing. In addition, the process of how abnormal splicing arises as a result of intronic or exonic mutations in particular genes, or more globally as a result of splicing-factor misregulation, as well as the contribution of splicing misregulation to cancer, will be described. Lastly, the current status of targeted therapeutics development, focusing on antisense approaches to correct abnormal splicing of specific genes or to modulate alternative splicing, will be discussed. Disclosures: Krainer: ISIS Pharmaceuticals: Consultancy, Patents & Royalties, Research Funding.

Mass Spectrometric (MS) Analysis of Proteins and Peptides

Current Protein and Peptide Science ◽

10.2174/1389203721666200726223336 ◽

2020 ◽

Vol 21 ◽

Author(s):

Madhuri Jayathirtha ◽

Emmalyn J. Dupree ◽

Zaen Manzoor ◽

Brianna Larose ◽

Zach Sechrist ◽

...

Keyword(s):

Mass Spectrometry ◽

Alternative Splicing ◽

Human Genome ◽

Mass Spectrometric ◽

Protein Isoforms ◽

Gene Products ◽

Post Translational Modifications ◽

The Past ◽

Transient Interactions ◽

Proteins And Peptides

The human genome has been sequenced and comprises ~30,000 genes, making humans just a little bit more complicated than worms or flies. However, the complexity of humans is given by the proteins that these genes code for, because one gene can produce many proteins, mostly through alternative splicing and tissue-dependent expression of particular proteins. In addition, post-translational modifications (PTMs) in proteins greatly increase the number of gene products or protein isoforms. Furthermore, stable and transient interactions between proteins, protein isoforms/proteoforms and PTM-ed proteins (protein-protein interactions, PPIs) add yet another level of complexity in humans and other organisms. In the past, all of these proteins were analyzed one at a time. Currently, they are analyzed by a less tedious method, mass spectrometry (MS), for two reasons: 1) because of the complexity of proteins, protein PTMs and PPIs and 2) because MS is the only method that can keep up with such a complex array of features. Here, we discuss the applications of mass spectrometry in protein analysis.

Integrative Analysis Reveals the Prognostic Value and Functions of Splicing Factors Implicated in Hepatocellular Carcinoma

10.21203/rs.3.rs-408292/v1 ◽

2021 ◽

Author(s):

Yue Wang ◽

Fan Yang ◽

Jiaqi Shang ◽

Haitao He ◽

Qing Yang

Keyword(s):

Hepatocellular Carcinoma ◽

Prognostic Model ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Splicing Factors ◽

Biological Functions ◽

Clinical Value ◽

Multiple Cancer ◽

Protein Coding ◽

Protein Coding Genes

Splicing factors (SFs) play critical roles in the pathogenesis of various cancers through regulating tumor-associated alternative splicing (AS) events. However, the clinical value and biological functions of SFs in hepatocellular carcinoma (HCC) remain obscure. In this study, we identified 40 dysregulated SFs in HCC and established a prognostic model composed of four SFs (DNAJC6, ZC3H13, IGF2BP3, DDX19B). The predictive efficiency and independence of the prognostic model were confirmed to be satisfactory. Gene Set Enrichment Analysis (GSEA) illustrated that the risk score calculated by our prognostic model was significantly associated with multiple cancer-related pathways and metabolic processes. Furthermore, we constructed the SFs-AS events regulatory network and extracted 108 protein-coding genes from the network for subsequent functional exploration. A protein-protein interaction (PPI) network delineated the potential interactions among these 108 protein-coding genes. GO and KEGG pathway analyses investigated ontology gene sets and canonical pathways enriched by these 108 protein-coding genes. Overlapping the results of GSEA and KEGG, seven pathways were identified as potential pathways regulated by our prognostic model through triggering aberrant AS events in HCC. In conclusion, the present study established an effective prognostic model based on SFs for HCC patients. Functional explorations of SFs and SFs-associated AS events provided directions to explore biological functions and mechanisms of SFs in HCC tumorigenesis.

The spliced leader sequence of Trypanosoma brucei has a potential role as a cap donor structure

Molecular and Cellular Biology ◽

10.1128/mcb.5.9.2487-2490.1985 ◽

1985 ◽

Vol 5 (9) ◽

pp. 2487-2490

Author(s):

M J Lenardo ◽

D M Dorfman ◽

J E Donelson

Keyword(s):

Trypanosoma Brucei ◽

Potential Role ◽

Leader Sequence ◽

The Body ◽

Trypanosoma Brucei Brucei ◽

Protein Coding ◽

Spliced Leader ◽

Protein Coding Genes ◽

Separate Locus

Trypanosoma brucei brucei and other trypanosomatid species are unique among eucaryotes because transcription of their protein-coding genes is discontinuous. The 5' ends of their mRNAs consist of an identical 35-nucleotide spliced leader which is encoded at a separate locus from that for the body of the protein-coding transcript. We show here that the spliced leader transcript contains a 5' cap structure and suggest that at least one function of the spliced leader sequence is to provide a cap structure to trypanosome mRNAs.

Gene Expression Profile in Responsive and Non-Responsive Chronic Myeloid Leukemia Patients Treated with Dasatinib.

Blood ◽

10.1182/blood.v114.22.3260.3260 ◽

2009 ◽

Vol 114 (22) ◽

pp. 3260-3260

Author(s):

Rosana A Silveira ◽

Angela A Fachel ◽

Yuri B Moreira ◽

Marcia T Delamain ◽

Carmino Antonio De Souza ◽

...

Keyword(s):

Gene Expression ◽

Human Genome ◽

Mononuclear Cells ◽

Cytogenetic Response ◽

Post Treatment ◽

Differentially Expressed ◽

Regulation Of Transcription ◽

Protein Coding ◽

Altered Expression ◽

Protein Coding Genes

Abstract 3260. Poster Board III-1. Background: CML treatment with tyrosine kinase inhibitors induces high and durable rates of complete cytogenetic response. Despite treatment efficacy, a significant proportion of patients develop resistance to these drugs. We measured gene expression profiles in an attempt to identify gene pathways that may be associated with dasatinib resistance. Patients and Methods: Mononuclear cells were separated from peripheral blood samples from seven CML patients resistant to imatinib, collected before and after dasatinib treatment. Three patients who achieved partial cytogenetic response (PCyR; Ph-positive cells: 1% - 35%) within 12 months were considered responders (R), whereas four patients who failed to achieve PCyR within 12 months of treatment were classified as non-responders. RNA samples prepared from peripheral mononuclear cells were hybridized to Agilent Technologies 4×44K Whole Human Genome Microarrays (WHGM) and 4×44K intronic-exonic custom oligoarrays. The latter was developed by Verjovski-Almeida's group (Nakaya et al, Genome Biology 2007, 8:R43) and contains sense and antisense probes that map to intronic regions in the human genome representing totally (TIN) and partially (PIN) intronic non-coding RNAs (ncRNAs), in addition to probes for the corresponding protein-coding genes of the same loci. Raw microarray data were normalized with the Affy package in the R statistical language, as implemented in the Bioconductor platform. Each sample was labeled in replicate with Cy3 or Cy5 and the two were considered technical replicates. Two independent statistical approaches, SAM (Significance Analysis of Microarrays) and Golub's discrimination score (SNR, Signal to Noise Ratio, with permutations), were used to identify differentially expressed transcripts between responder and non-responder patients.
For the intronic-exonic platform, the analysis parameters were FDR 10%, SNR>1.5 and p<0.01, and for the WHGM platform the parameters were FDR 5%, SNR>1.5 and p<0.001. For this latter platform, we also performed a patient leave-one-out analysis. Functions of differentially expressed transcripts were annotated and compared using GO Biological Process categories (www.genetools.microarray.ntu.no/egon). Results: We identified 34 ncRNAs with altered expression (26 over- and 8 underexpressed in responders) in pre-treatment samples and 33 ncRNAs (20 over- and 13 underexpressed in responders) in post-treatment samples. Functions associated with protein-coding genes from the same genomic loci as those of the intronic differentially expressed ncRNAs were: regulation of transcription (PRMT5, SOD2, SSBP3, BCL7A, MLL), signal transduction (PRKCB1, RASGRP2, NF1, PXN) and apoptosis (BCL2, PCSK6, TNFAIP8, EIF4G2). WHGM platform data analysis showed 63 and 250 protein-coding genes differentially expressed in pre- and post-treatment samples, respectively. We observed a higher number of protein-coding genes with altered expression after treatment in the following functions: cell communication, immune response and metabolic process (p<0.02). Conclusions: Overall, these findings indicate that protein-coding genes and intronic ncRNAs may be related to dasatinib resistance and response to treatment. In particular, altered expression of ncRNAs transcribed from the introns of ‘regulation of transcription’ genes could be part of an important alternative mechanism of gene expression control during emergence of resistance. Support: FAPESP (2005/60266-8). Disclosures: No relevant conflicts of interest to declare.
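
The abstract above ranks transcripts by Golub's signal-to-noise ratio (SNR) between responders and non-responders. A minimal sketch of that score for one transcript follows, using the usual form (difference of group means divided by the sum of group standard deviations). The expression values are made up, the use of the sample standard deviation is an assumption, and the permutation step and FDR control mentioned in the abstract are omitted.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Golub-style signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b).

    Uses the sample standard deviation (an assumption; the abstract does not
    say which estimator was used). Each group needs at least two values.
    """
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        raise ValueError("both groups have zero variance; SNR is undefined")
    return (mean(group_a) - mean(group_b)) / spread


# Hypothetical log-expression values for one transcript:
# three responders and four non-responders, matching the group sizes above.
responders = [8.0, 9.0, 10.0]
non_responders = [4.0, 5.0, 6.0, 5.0]
snr = signal_to_noise(responders, non_responders)

# The abstract's cutoff of SNR > 1.5 would flag this made-up transcript.
passes_cutoff = abs(snr) > 1.5
```

A positive score here means higher expression in the first group; swapping the arguments flips the sign but not the magnitude.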

Alternative splicing at NAGNAG acceptors in Arabidopsis thaliana SR and SR-related protein-coding genes

BMC Genomics ◽

10.1186/1471-2164-9-159 ◽

2008 ◽

Vol 9 (1) ◽

pp. 159 ◽

Cited By ~ 25

Author(s):

Stefanie Schindler ◽

Karol Szafranski ◽

Michael Hiller ◽

Gul Ali ◽

Saiprasad G Palusa ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Alternative Splicing ◽

Related Protein ◽

Protein Coding ◽

Protein Coding Genes

Extreme purifying selection against point mutations in the human genome

10.1101/2021.08.23.457339 ◽

2021 ◽

Author(s):

Noah Dukler ◽

Mehreen R Mughal ◽

Ritika Ramani ◽

Yi-Fei Huang ◽

Adam Siepel

Keyword(s):

Human Genome ◽

De Novo ◽

Point Mutations ◽

Purifying Selection ◽

Selection Coefficient ◽

Sequencing Data ◽

Protein Coding ◽

Coding Regions ◽

Protein Coding Genes ◽

Selective Effects

Genome sequencing of tens of thousands of human individuals has recently enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects at individual sites in noncoding as well as in coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or "ultraselection" (λs), as the fractional depletion of rare single-nucleotide variants (minor allele frequency <0.1%) in a target set of genomic sites relative to matched sites that are putatively neutrally evolving, in a manner that controls for local variation and neighbor-dependence in mutation rate. We show using simulations that, above an appropriate threshold, λs is closely related to the average site-specific selection coefficient against heterozygous point mutations, as predicted at mutation-selection balance. Applying ExtRaINSIGHT to 71,702 whole genome sequences from gnomAD v3, we find particularly strong evidence of ultraselection in evolutionarily ancient miRNAs and neuronal protein-coding genes, as well as at splice sites. Moreover, our estimated selection coefficient against heterozygous amino-acid replacements across the genome (at 1.4%) is substantially larger than previous estimates based on smaller sample sizes. By contrast, we find weak evidence of ultraselection in other noncoding RNAs and transcription factor binding sites, and only modest evidence in ultraconserved elements and human accelerated regions. We estimate that ~0.3-0.5% of the human genome is ultraselected, with one third to one half of ultraselected sites falling in coding regions. These estimates suggest ~0.3-0.4 lethal or nearly lethal de novo mutations per potential human zygote, together with ~2 de novo mutations that are more weakly deleterious.
Overall, our study sheds new light on the genome-wide distribution of fitness effects for new point mutations by combining deep new sequencing data sets and classical theory from population genetics.
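
The abstract above defines λs as the fractional depletion of rare variants in target sites relative to matched neutral sites. The arithmetic behind a fractional depletion can be sketched as below. This is only a naive illustration with made-up counts: it is not the ExtRaINSIGHT estimator, which according to the abstract also controls for local variation and neighbor-dependence in mutation rate, and the function and parameter names are hypothetical.

```python
def fractional_depletion(target_rare, target_sites, neutral_rare, neutral_sites):
    """Naive fractional depletion of rare variants in target vs. neutral sites.

    Returns 1 - (rare-variant rate in target) / (rare-variant rate in neutral).
    A value near 0 means no depletion; a value near 1 means rare variants are
    almost absent from the target set. Mutation-rate matching is ignored here.
    """
    if target_sites <= 0 or neutral_sites <= 0:
        raise ValueError("site counts must be positive")
    if neutral_rare <= 0:
        raise ValueError("neutral set must contain at least one rare variant")
    target_rate = target_rare / target_sites
    neutral_rate = neutral_rare / neutral_sites
    return 1.0 - target_rate / neutral_rate


# Made-up counts: 60 rare variants per 1000 target sites versus
# 100 rare variants per 1000 putatively neutral sites.
lam = fractional_depletion(60, 1000, 100, 1000)
```

With these made-up counts the target set carries 40% fewer rare variants than the neutral set, so the naive depletion is about 0.4.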
