Hidden in plain sight: What remains to be discovered in the eukaryotic proteome?

2018 ◽  
Author(s):  
Valerie Wood ◽  
Antonia Lock ◽  
Midori A. Harris ◽  
Kim Rutherford ◽  
Jürg Bähler ◽  
...  

Abstract: The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) biological process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalog of proteins’ biological roles.

Open Biology ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 180241 ◽  
Author(s):  
Valerie Wood ◽  
Antonia Lock ◽  
Midori A. Harris ◽  
Kim Rutherford ◽  
Jürg Bähler ◽  
...  

The first decade of genome sequencing stimulated an explosion in the characterization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model organisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes (BP). To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology BP terms to define characterized and uncharacterized proteins for human, budding yeast and fission yeast. We then identify a set of conserved but unstudied proteins in S. pombe, and classify them based on a combination of orthogonal attributes determined by large-scale experimental and comparative methods. Finally, we explore possible reasons why these proteins remain neglected, and propose courses of action to raise their profile and thereby reap the benefits of completing the catalogue of proteins’ biological roles.
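The abstract does not spell out the metric, so the following is only a minimal sketch of one plausible reading: a protein counts as characterized if it carries at least one annotation to an informative GO biological process term. The protein identifiers, the contents of `INFORMATIVE_TERMS`, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

# Root term "biological_process"; an annotation to it says nothing specific.
BP_ROOT = "GO:0008150"

# Assumption: a curated set of informative BP terms (for example, a GO slim).
INFORMATIVE_TERMS: Set[str] = {
    "GO:0006412",  # translation
    "GO:0007049",  # cell cycle
    "GO:0006281",  # DNA repair
}


def classify_proteins(
    annotations: Dict[str, Set[str]],
    informative_terms: Set[str],
) -> Dict[str, str]:
    """Label each protein by whether it has any informative BP annotation."""
    status = {}
    for protein, terms in annotations.items():
        informative = (terms & informative_terms) - {BP_ROOT}
        status[protein] = "characterized" if informative else "uncharacterized"
    return status


def uncharacterized_fraction(status: Dict[str, str]) -> float:
    """Fraction of proteins labelled uncharacterized (0.0 for an empty input)."""
    if not status:
        return 0.0
    unknown = sum(1 for label in status.values() if label == "uncharacterized")
    return unknown / len(status)


# Hypothetical annotation table: protein identifier -> annotated GO BP terms.
example_annotations = {
    "protA": {"GO:0006412"},
    "protB": {"GO:0008150"},  # root term only
    "protC": set(),           # no BP annotation at all
    "protD": {"GO:0007049", "GO:0006281"},
}

example_status = classify_proteins(example_annotations, INFORMATIVE_TERMS)
```

In this toy table, `protB` and `protC` come out as uncharacterized, because a root-only annotation is treated the same as no annotation.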


2019 ◽  
Vol 48 (D1) ◽  
pp. D650-D658 ◽  
Author(s):  
◽  
Julie Agapite ◽  
Laurent-Philippe Albou ◽  
Suzi Aleksander ◽  
Joanna Argasinska ◽  
...  

Abstract: The Alliance of Genome Resources (Alliance) is a consortium of the major model organism databases and the Gene Ontology. Its guiding vision is to facilitate exploration of related genes in human and well-studied model organisms through a highly integrated and comprehensive platform that enables researchers to leverage the extensive body of genetic and genomic studies in these organisms. Initiated in 2016, the Alliance is building a central portal (www.alliancegenome.org) for access to data for the primary model organisms along with gene ontology data and human data. All data types represented in the Alliance portal (e.g. genomic data and phenotype descriptions) have common data models and workflows for curation. All data are open and freely available via a variety of mechanisms. Long-term plans for the Alliance project include coverage of additional model organisms, including those without dedicated curation communities, and the inclusion of new data types, with a particular focus on providing data and tools for the non-model-organism researcher that support enhanced discovery about human health and disease. Here we review current progress and present immediate plans for this new bioinformatics resource.


2011 ◽  
Vol 279 (1726) ◽  
pp. 3-14 ◽  
Author(s):  
Megan L. Porter ◽  
Joseph R. Blasic ◽  
Michael J. Bok ◽  
Evan G. Cameron ◽  
Thomas Pringle ◽  
...  

Opsin proteins are essential molecules in mediating the ability of animals to detect and use light for diverse biological functions. Therefore, understanding the evolutionary history of opsins is key to understanding the evolution of light detection and photoreception in animals. As genomic data have appeared and rapidly expanded in quantity, it has become possible to analyse opsins that functionally and histologically are less well characterized, and thus to examine opsin evolution strictly from a genetic perspective. We have incorporated these new data into a large-scale, genome-based analysis of opsin evolution. We use an extensive phylogeny of currently known opsin sequence diversity as a foundation for examining the evolutionary distributions of key functional features within the opsin clade. This new analysis illustrates the lability of opsin protein-expression patterns, site-specific functionality (i.e. counterion position) and G-protein binding interactions. Further, it demonstrates the limitations of current model organisms, and highlights the need for further characterization of many of the opsin sequence groups with unknown function.


2015 ◽  
Vol 467 (2) ◽  
pp. 345-352 ◽  
Author(s):  
Yosua Adi Kristariyanto ◽  
Soo-Youn Choi ◽  
Syed Arif Abdul Rehman ◽  
Maria Stella Ritorto ◽  
David G Campbell ◽  
...  

Ubiquitylation regulates a multitude of biological processes and this versatility stems from the ability of ubiquitin (Ub) to form topologically different polymers of eight different linkage types. Whereas some linkages have been studied in detail, other linkage types including Lys33-linked polyUb are poorly understood. In the present study, we identify an enzymatic system for the large-scale assembly of Lys33 chains by combining the HECT (homologous to the E6–AP C-terminus) E3 ligase AREL1 (apoptosis-resistant E3 Ub protein ligase 1) with linkage selective deubiquitinases (DUBs). Moreover, this first characterization of the chain selectivity of AREL1 indicates its preference for assembling Lys33- and Lys11-linked Ub chains. Intriguingly, the crystal structure of Lys33-linked diUb reveals that it adopts a compact conformation very similar to that observed for Lys11-linked diUb. In contrast, crystallographic analysis of Lys33-linked triUb reveals a more extended conformation. These two distinct conformational states of Lys33-linked polyUb may be selectively recognized by Ub-binding domains (UBD) and enzymes of the Ub system. Importantly, our work provides a method to assemble Lys33-linked polyUb that will allow further characterization of this atypical chain type.


2016 ◽  
Author(s):  
Harold E. Smith ◽  
Amy S. Fabritius ◽  
Aimee Jaramillo-Lambert ◽  
Andy Golden

Abstract: Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semi-dominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means.


2021 ◽  
Author(s):  
Varun S. Sharma ◽  
Andrea Fossati ◽  
Rodolfo Ciuffa ◽  
Marija Buljan ◽  
Evan G. Williams ◽  
...  

Summary: It is a general assumption of molecular biology that the ensemble of expressed molecules, their activities and interactions determine biological processes, cellular states and phenotypes. The quantitative abundances of transcripts, proteins and metabolites are now routinely measured with considerable depth via an array of “OMICS” technologies, and recently a number of methods have also been introduced for the parallel analysis of the abundance, subunit composition and cell state specific changes of protein complexes. In comparison to the measurement of the molecular entities in a cell, the determination of their function remains experimentally challenging and labor-intensive. This holds particularly true for determining the function of protein complexes, which constitute the core functional assemblies of the cell. Therefore, the tremendous progress in multi-layer molecular profiling has been slow to translate into increased functional understanding of biological processes, cellular states and phenotypes. In this study we describe PCfun, a computational framework for the systematic annotation of protein complex function using Gene Ontology (GO) terms. This work is built upon the use of a word embedding (natural language text embedded into a continuous vector space that preserves semantic relationships) generated from the machine reading of 1 million open access PubMed Central articles. PCfun leverages the embedding for rapid annotation of protein complex function by integrating two approaches: (1) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector, and (2) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database.
PCfun consolidates both approaches by performing a statistical test for the enrichment of the top NN GO terms within the child terms of the GO terms predicted by the RF models. Thus, PCfun amalgamates information learned from the gold-standard protein-complex database, CORUM, with the unbiased predictions obtained directly from the word embedding, thereby enabling PCfun to identify the potential functions of putative protein complexes. The documentation and examples of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
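The unsupervised step described above can be illustrated with a small, self-contained sketch that ranks GO term vectors by their similarity to a query vector. This is not the PCfun API: the function names, the three-dimensional toy vectors and the choice of cosine similarity as the distance measure are all assumptions made for illustration, and the supervised Random Forest step and the enrichment test are not shown.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_go_terms(
    query: Sequence[float],
    term_vectors: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[str]:
    """Return the k GO term names whose vectors are most similar to the query."""
    ranked = sorted(
        term_vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embedding; a real embedding would have hundreds of dimensions.
toy_term_vectors = {
    "proteasomal protein catabolic process": [0.9, 0.1, 0.0],
    "translation": [0.1, 0.9, 0.1],
    "DNA repair": [0.0, 0.2, 0.9],
}

# Hypothetical query vector standing in for a protein complex name.
toy_query = [0.8, 0.2, 0.1]

top_terms = nearest_go_terms(toy_query, toy_term_vectors, k=2)
```

With these toy vectors the query lies closest to the first term, so it is ranked first, followed by "translation".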


2020 ◽  
Author(s):  
Yudao Shen ◽  
Fengling Li ◽  
Magdalena M. Szewczyk ◽  
Levon Halebelian ◽  
Irene Chau ◽  
...  

Abstract: PRMT6 catalyzes monomethylation and asymmetric dimethylation of arginine residues in various proteins, plays important roles in biological processes and is associated with multiple cancers. While there are several reported PRMT6 inhibitors, a highly selective PRMT6 inhibitor has not been reported to date. Furthermore, allosteric inhibitors of protein methyltransferases are rare. Here we report the discovery and characterization of a first-in-class, highly selective allosteric inhibitor of PRMT6, SGC6870. SGC6870 is a potent PRMT6 inhibitor (IC50 = 77 ± 6 nM) with outstanding selectivity for PRMT6 over a broad panel of other methyltransferases and non-epigenetic targets. Notably, the crystal structure of the PRMT6–SGC6870 complex and kinetic studies revealed SGC6870 binds a unique, induced allosteric pocket. Additionally, SGC6870 engages PRMT6 and potently inhibits its methyltransferase activity in cells. Moreover, SGC6870’s enantiomer, SGC6870N, is inactive against PRMT6 and can be utilized as a negative control. Collectively, SGC6870 is a well-characterized PRMT6 chemical probe and valuable tool for further investigating PRMT6 functions in health and disease.


Genome ◽  
2003 ◽  
Vol 46 (6) ◽  
pp. 947-952 ◽  
Author(s):  
Glyn Jenkins

This is an account of the development and use of genetic maps, from humble beginnings at the hands of Thomas Hunt Morgan, to the sophistication of genome sequencing. The review charts the emergence of molecular marker maps exploiting DNA polymorphism, the renaissance of cytogenetics through the use of fluorescence in situ hybridisation, and the discovery and isolation of genes by map-based cloning. The historical significance of sequencing of DNA prefaces a section describing the sequencing of genomes, the ascendancy of particular model organisms, and the utility and limitations of comparative genomic and functional genomic approaches to further our understanding of the control of biological processes. Emphasis is given throughout the treatise to how the structure and biological behaviour of the DNA molecule underpin the technological development and biological applications of maps. Key words: maps, comparative mapping, genome sequencing, functional genomics.


2020 ◽  
Author(s):  
Teresa R. O’Meara ◽  
Matthew J. O’Meara

Abstract: Functional characterization of open reading frames in non-model organisms, such as the common opportunistic fungal pathogen Candida albicans, can be labor intensive. To meet this challenge, we built a comprehensive and unbiased co-expression network for C. albicans, which we call CalCEN, from data collected from 853 RNA sequencing runs from 18 large-scale studies deposited in the NCBI Sequence Read Archive. Retrospectively, CalCEN is highly predictive of known gene function annotations and can be synergistically combined with sequence similarity and interaction networks in Saccharomyces cerevisiae through orthology for additional accuracy in gene function prediction. To prospectively demonstrate the utility of the co-expression network in C. albicans, we predicted the function of under-annotated open reading frames (ORFs) and identified CCJ1 as a novel cell cycle regulator in C. albicans. This study provides a tool for future systems biology analyses of gene function in C. albicans. We provide a computational pipeline for building and analyzing the co-expression network, and CalCEN itself, at http://github.com/momeara/CalCEN.
Importance: Candida albicans is a common and deadly fungal pathogen of humans, yet the genome of this organism contains many genes of unknown function. By determining gene function, we can help identify essential genes, new virulence factors, or new regulators of drug resistance, and thereby give new targets for antifungal development. Here, we use information from large-scale RNAseq studies and generate a C. albicans co-expression network (CalCEN) that is robust and able to predict gene function. We demonstrate the utility of this network in both retrospective and prospective testing, and use CalCEN to predict a role for C4_06590W/CCJ1 in cell cycle. This tool will allow for a better characterization of under-annotated genes in pathogenic yeasts.
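As a rough illustration of what a co-expression network is, the sketch below links two genes when their expression profiles across sequencing runs are strongly correlated. It is not the CalCEN pipeline: the use of Pearson correlation, the 0.9 threshold, the gene names and the expression values are all assumptions chosen for this example, and the published network may use a different association measure.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(
    expression: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Dict[Tuple[str, str], float]:
    """Return gene pairs whose correlation is at or above the threshold."""
    edges = {}
    for gene_1, gene_2 in combinations(sorted(expression), 2):
        r = pearson(expression[gene_1], expression[gene_2])
        if r >= threshold:
            edges[(gene_1, gene_2)] = r
    return edges


# Hypothetical expression values: one number per RNA sequencing run.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
}

toy_edges = coexpression_edges(toy_expression, threshold=0.9)
```

Only the `geneA`/`geneB` pair passes the threshold here. A gene of unknown function that is connected to well-annotated neighbours in such a network can then be assigned a candidate role from those neighbours, which is the guilt-by-association idea behind function prediction from co-expression.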


Author(s):  
Simon Thomas

Technology development for very large scale integrated (VLSI) circuits has trended toward higher component density and smaller dimensions. Device dimensions have been scaled down not only laterally but also in depth. Such efforts in miniaturization bring with them new developments in materials and processing. Successful implementation of these efforts is, to a large extent, dependent on the proper understanding of the material properties, process technologies and reliability issues, through adequate analytical studies. The analytical instrumentation technology has, fortunately, kept pace with the basic requirements of devices with lateral dimensions in the micron/submicron range and depths of the order of nanometers. Often, newer analytical techniques have emerged or the more conventional techniques have been adapted to meet the more stringent requirements. As such, a variety of analytical techniques are available today to aid an analyst in the efforts of VLSI process evaluation. Generally such analytical efforts are divided into the characterization of materials, evaluation of processing steps and the analysis of failures.

