A Computational Strategy for Protein Function Assignment which Addresses the Multidomain Problem

A method is presented for assigning functions to unknown sequences by finding correlations between short signals and functional annotations in a protein database. The approach uses the keyword (KW) and feature (FT) information stored in the SWISS-PROT database. The former refers to particular protein characteristics and the latter locates these characteristics at a specific sequence position. In this way, a keyword is assigned to a sequence only if sequence similarity is found at the position described by the FT field. Exhaustive tests on sequences with homologues (cluster set) and without homologues (singleton set) in the database show that function assignment is much ‘cleaner’ when information about domains (FT field) is used than when only the keywords are used.
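
The position-aware rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Feature` and `Hit` classes, the explicit link between a feature and a keyword, and the `min_cover` threshold are all assumptions made for illustration. The idea shown is only that a keyword is transferred from a database hit when the aligned region overlaps the feature that locates that keyword.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical stand-in for an FT entry tied to one keyword."""
    keyword: str  # keyword (KW) this feature is assumed to support
    start: int    # 1-based, inclusive position on the database sequence
    end: int


@dataclass(frozen=True)
class Hit:
    """Hypothetical similarity hit against one database entry."""
    keywords: FrozenSet[str]       # KW annotations of the database entry
    features: Tuple[Feature, ...]  # FT annotations of the database entry
    aln_start: int                 # aligned region on the database sequence
    aln_end: int


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def assign_keywords(hit: Hit, min_cover: float = 0.5) -> Set[str]:
    """Transfer a keyword only if the alignment covers its feature region.

    min_cover is an assumed threshold: the fraction of the feature that
    must lie inside the aligned region.
    """
    assigned = set()
    for ft in hit.features:
        if ft.keyword not in hit.keywords:
            continue
        length = ft.end - ft.start + 1
        covered = overlap(hit.aln_start, hit.aln_end, ft.start, ft.end)
        if covered / length >= min_cover:
            assigned.add(ft.keyword)
    return assigned


# A two-domain entry where the query aligns only to the first domain.
example = Hit(
    keywords=frozenset({"Kinase", "Zinc-finger"}),
    features=(Feature("Kinase", 10, 100), Feature("Zinc-finger", 300, 330)),
    aln_start=1,
    aln_end=120,
)
print(assign_keywords(example))
```

In this example a keyword-only transfer would assign both annotations to the query, whereas the position-aware rule keeps only the one whose feature lies inside the aligned region, which is the multidomain problem the title refers to.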


De Novo Assembly and Characterization of Oryza officinalis Leaf Transcriptome by Using RNA-Seq

BioMed Research International ◽

10.1155/2015/982065 ◽

2015 ◽

Vol 2015 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Ying Bao ◽

Si Xu ◽

Xiang Jing ◽

Lu Meng ◽

Zongyan Qin

Keyword(s):

Wild Rice ◽

De Novo ◽

Sequence Similarity ◽

Cultivated Rice ◽

Protein Database ◽

Next Generation Sequencing Technology ◽

Functional Annotations ◽

Wild Rice Species ◽

Leaf Transcriptome ◽

Clusters Of Orthologous Groups

Although endeavors have been made to identify useful wild rice genes that can be used to improve cultivated rice, the virtual reservoir of genetic variation hidden within the wild relatives of cultivated rice is largely untapped. Here, using next-generation sequencing technology, we investigated the leaf transcriptome of a wild rice O. officinalis with CC genome. Approximately 23 million reads were produced in the species leaf transcriptome analysis and de novo assembly methods constructed 68,132 unigenes. Functional annotations for the unigenes were conducted using sequence similarity comparisons against the following databases: the nonredundant nucleotide database, the nonredundant protein database, the SWISS-PROT database, the Clusters of Orthologous Groups of proteins database, the Kyoto Encyclopedia of Genes and Genomes database, the Gene Ontology Consortium database, and the InterPro domains database. In addition, a total of 476 unigenes related to disease resistance were identified in O. officinalis, and these unigenes can serve as important genetic resources for cultivated rice breeding and quality improvement. The present study broadens our understanding of the genetic background of non-AA genomic wild rice species and it also provides a bridge to extend studies to other Oryza species with CC genomes.


The proteome: structure, function and evolution

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2005.1802 ◽

2006 ◽

Vol 361 (1467) ◽

pp. 441-451 ◽

Cited By ~ 17

Author(s):

Keiran Fleming ◽

Lawrence A Kelley ◽

Suhail A Islam ◽

Robert M MacCallum ◽

Arne Muller ◽

...

Keyword(s):

Protein Function ◽

Sequence Similarity ◽

Structural Annotation ◽

New Approach ◽

Link Type ◽

University College London ◽

Automated Pipeline ◽

3D Genomics ◽

And Function ◽

Structural Homologue

This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach, PHUNCTIONER, that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family.


Identification of residue inversions in large phylogenies of duplicated proteins

10.1101/2021.11.04.467263 ◽

2021 ◽

Author(s):

Stefano Pascarelli ◽

Paola Laurino

Keyword(s):

Gene Duplication ◽

Protein Sequence ◽

Protein Function ◽

High Throughput Sequencing ◽

Functional Divergence ◽

Growth Factor Receptor ◽

Sequence Evolution ◽

Protein Database ◽

Sequencing Studies ◽

Homology Relationship

Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. Protein database annotation helps to bridge this gap; however, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify "residue inversions", i.e., sites where the relationship between the ancestry and the functional signal is decoupled. Residues in these sites play a role in functional divergence and could indicate a shift in protein function. We develop a method to recognize residue inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 positions that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain.


Role of undergraduate biochemistry education in protein function assignment (618.26)

The FASEB Journal ◽

10.1096/fasebj.28.1_supplement.618.26 ◽

2014 ◽

Vol 28 (S1) ◽

Author(s):

Paul Craig ◽

Greg Dodge ◽

Herbert Bernstein

Keyword(s):

Protein Function ◽

Function Assignment


A Novel Method for Functional Annotation Prediction Based on Combination of Classification Methods

The Scientific World JOURNAL ◽

10.1155/2014/542824 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9

Author(s):

Jaehee Jung ◽

Heung Ki Lee ◽

Gangman Yi

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Controlled Vocabulary ◽

Functional Annotations ◽

Functional Homology ◽

Large Sets ◽

Unknown Protein ◽

Protein Functions ◽

Novel Method ◽

The Relationship

Automated protein function prediction is the assignment of functions to uncharacterized proteins by computational methods. The technique is useful for automatically assigning functional annotations to undefined sequences in next generation genome analysis (NGS). NGS is a popular research method since high-throughput technologies such as DNA sequencing and microarrays have created large sets of genes. These large sequence sets have greatly increased the need for analysis. Previous research has been based on sequence similarity, as this is strongly related to functional homology. However, this study aimed to designate protein functions automatically by utilizing InterPro (IPR), which can represent the properties of protein families and groups of protein functions. Moreover, we used gene ontology (GO), which is the controlled vocabulary used to comprehensively describe protein function. To define the relationship between IPR and GO terms, three pattern recognition techniques have been employed under different conditions, such as feature selection and weighted value, instead of a binary one.


The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens

Genome Biology ◽

10.1186/s13059-019-1835-8 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 41

Author(s):

Naihui Zhou ◽

Yuxiang Jiang ◽

Timothy R. Bergquist ◽

Alexandra J. Lee ◽

Balint Z. Kacsoh ◽

...

Keyword(s):

Protein Function ◽

Functional Annotation ◽

Protein Function Prediction ◽

Mutation Screening ◽

Function Prediction ◽

Long Term Memory ◽

Functional Annotations ◽

Genome Wide ◽

New Development ◽

Working Together

Abstract. Background: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Results: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. Conclusion: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines

Bioinformatics and Biology Insights ◽

10.4137/bbi.s315 ◽

2007 ◽

Vol 1 ◽

pp. BBI.S315 ◽

Cited By ~ 3

Author(s):

Zhi Qun Tang ◽

Hong Huang Lin ◽

Hai Lei Zhang ◽

Lian Yi Han ◽

Xin Chen ◽

...

Keyword(s):

Support Vector Machines ◽

Protein Function ◽

Wide Spectrum ◽

Sequence Similarity ◽

Functional Class ◽

Support Vector ◽

Homologous Proteins ◽

Vector Machines ◽

Derived Properties ◽

Proteins And Peptides

Various computational methods have been used for the prediction of protein and peptide function based on their sequences. A particular challenge is to derive functional properties from sequences that show low or no homology to proteins of known function. Recently, a machine learning method, support vector machines (SVM), has been explored for predicting the functional class of proteins and peptides from properties derived from the amino acid sequence, independent of sequence similarity. It has shown promising potential for a wide spectrum of protein and peptide classes, including some of the low- and non-homologous proteins. This method can thus be explored as a potential tool to complement alignment-based, clustering-based, and structure-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using SVM for predicting the functional class of proteins. The relevant software and web servers are described. The reported prediction performances in the application of these methods are also presented.


Superimposition of Viral Protein Structures: A Means to Decipher the Phylogenies of Viruses

Viruses ◽

10.3390/v12101146 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1146

Author(s):

Janne J. Ravantti ◽

Ane Martinez-Castillo ◽

Nicola G.A. Abrescia

Keyword(s):

Protein Function ◽

Major Capsid Protein ◽

Sequence Similarity ◽

Protein Structures ◽

Structural Homology ◽

Structural Comparison ◽

Emerging Viruses ◽

Enveloped Viruses ◽

Original Observation ◽

Jelly Roll

Superimposition of protein structures is key in unravelling structural homology across proteins whose sequence similarity is lost. Structural comparison provides insights into protein function and evolution. Here, we review some of the original findings and thoughts that have led to the current established structure-based phylogeny of viruses: starting from the original observation that the major capsid proteins of plant and animal viruses possess similar folds, to the idea that each virus has an innate “self”. This latter idea fueled the conceptualization of the PRD1-adenovirus lineage whose members possess a major capsid protein (innate “self”) with a double jelly roll fold. Based on this approach, long-range viral evolutionary relationships can be detected, allowing the virosphere to be classified into four structure-based lineages. However, this process is not without its challenges or limitations. As an example of these hurdles, we finally touch on the difficulty of establishing structural “self” traits for enveloped viruses, showcasing the coronaviruses, but also on the power of structure-based analysis in the understanding of emerging viruses.


Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences

Bioinformatics ◽

10.1093/bioinformatics/bty704 ◽

2018 ◽

Vol 35 (5) ◽

pp. 753-759 ◽

Cited By ~ 8

Author(s):

Aashish Jain ◽

Daisuke Kihara

Keyword(s):

Protein Function ◽

Transfer Functions ◽

Sequence Similarity ◽

Protein Function Prediction ◽

Prediction Method ◽

Query Protein ◽

Function Prediction ◽

Homology Search ◽

Supplementary Information ◽

Phylogenetic Distance

Abstract. Motivation: Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial shortcomings including a low coverage in genome annotation. Results: Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with a low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform prediction programs to date that were ranked top in CAFA2. Availability and implementation: Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php. Supplementary information: Supplementary data are available at Bioinformatics online.
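
The general idea of reranking retrieved sequences by phylogenetic distance before transferring annotations can be sketched as follows. This is an illustrative toy, not Phylo-PFP's actual scoring scheme: the function name `rerank_and_vote`, the `1 / (1 + distance)` weight, and the input tuple layout are assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict
from typing import Iterable, List, Set, Tuple


def rerank_and_vote(
    hits: Iterable[Tuple[str, float, Set[str]]],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score GO terms from retrieved sequences, weighting closer ones more.

    hits: (sequence id, phylogenetic distance to the query, GO terms).
    The weight 1 / (1 + distance) is an assumed, illustrative choice.
    """
    # Rerank so that phylogenetically closer sequences come first.
    ranked = sorted(hits, key=lambda h: h[1])
    scores = defaultdict(float)
    for _seq_id, distance, go_terms in ranked:
        weight = 1.0 / (1.0 + distance)
        for term in go_terms:
            scores[term] += weight
    # Highest score first; ties broken by term id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical hits with made-up GO identifiers and distances.
hits = [
    ("seqA", 0.0, {"GO:1"}),
    ("seqB", 1.0, {"GO:2"}),
    ("seqC", 1.0, {"GO:1", "GO:2"}),
]
print(rerank_and_vote(hits))
```

Unlike transfer from the single top hit, this kind of weighted vote lets distantly related sequences contribute evidence while still favouring those closest to the query.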


In Vivo Action of the HRD Ubiquitin Ligase Complex: Mechanisms of Endoplasmic Reticulum Quality Control and Sterol Regulation

Molecular and Cellular Biology ◽

10.1128/mcb.21.13.4276-4291.2001 ◽

2001 ◽

Vol 21 (13) ◽

pp. 4276-4291 ◽

Cited By ~ 93

Author(s):

Richard G. Gardner ◽

Alexander G. Shearer ◽

Randolph Y. Hampton

Keyword(s):

Quality Control ◽

Endoplasmic Reticulum ◽

Ubiquitin Ligase ◽

Sequence Similarity ◽

High Specificity ◽

Specific Sequence ◽

Ubiquitin Ligase Complex ◽

Ubiquitin Conjugating Enzyme ◽

Conjugating Enzyme

ABSTRACT Ubiquitination is used to target both normal proteins for specific regulated degradation and misfolded proteins for purposes of quality control destruction. Ubiquitin ligases, or E3 proteins, promote ubiquitination by effecting the specific transfer of ubiquitin from the correct ubiquitin-conjugating enzyme, or E2 protein, to the target substrate. Substrate specificity is usually determined by specific sequence determinants, or degrons, in the target substrate that are recognized by the ubiquitin ligase. In quality control, however, a potentially vast collection of proteins with characteristic hallmarks of misfolding or misassembly are targeted with high specificity despite the lack of any sequence similarity between substrates. In order to understand the mechanisms of quality control ubiquitination, we have focused our attention on the first characterized quality control ubiquitin ligase, the HRD complex, which is responsible for the endoplasmic reticulum (ER)-associated degradation (ERAD) of numerous ER-resident proteins. Using an in vivo cross-linking assay, we directly examined the association of the separate HRD complex components with various ERAD substrates. We have discovered that the HRD ubiquitin ligase complex associates with both ERAD substrates and stable proteins, but only mediates ubiquitin-conjugating enzyme association with ERAD substrates. Our studies with the sterol pathway-regulated ERAD substrate Hmg2p, an isozyme of the yeast cholesterol biosynthetic enzyme HMG-coenzyme A reductase (HMGR), indicated that the HRD complex discerns between a degradation-competent “misfolded” state and a stable, tightly folded state. Thus, it appears that the physiologically regulated, HRD-dependent degradation of HMGR is effected by a programmed structural transition from a stable protein to a quality control substrate.
