protein function annotation
Recently Published Documents


2021 ◽  
pp. 107863
Author(s):  
Marcelo B.A. Veras ◽  
Bishnu Sarker ◽  
Sabeur Aridhi ◽  
João P.P. Gomes ◽  
José A.F. Macêdo ◽  
...  

GigaScience ◽  
2021 ◽  
Vol 10 (6) ◽  
Author(s):  
Pedro Queirós ◽  
Francesco Delogu ◽  
Oskar Hickl ◽  
Patrick May ◽  
Paul Wilmes

Abstract Background The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. Results We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. Conclusions Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
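The domain-specific annotation step described above can be illustrated with a small sketch. It assumes that each hit covers a residue range and carries a score, and that the goal is to find the highest-scoring set of non-overlapping hits by depth-first search. The `Hit` type, the additive scoring, and the example identifiers are hypothetical; this is not Mantis's actual implementation.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """A hypothetical domain hit: reference ID, residue span, and a score."""
    ref_id: str
    start: int   # first residue covered (inclusive)
    end: int     # last residue covered (inclusive)
    score: float


def overlaps(a: Hit, b: Hit) -> bool:
    """Return True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def best_combination(hits: List[Hit]) -> Tuple[float, List[Hit]]:
    """Depth-first search over all sets of mutually non-overlapping hits.

    Returns the highest total score and the hits that achieve it.
    Summing scores is an assumption made for illustration only.
    """
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    best: Tuple[float, List[Hit]] = (0.0, [])

    def dfs(index: int, chosen: List[Hit], total: float) -> None:
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        # Try to extend the current combination with each later hit.
        for nxt in range(index, len(ordered)):
            candidate = ordered[nxt]
            if all(not overlaps(candidate, c) for c in chosen):
                chosen.append(candidate)
                dfs(nxt + 1, chosen, total + candidate.score)
                chosen.pop()

    dfs(0, [], 0.0)
    return best


# Toy example: one sequence-wide hit versus several domain-level hits.
hits = [
    Hit("PF_A", 1, 100, 50.0),
    Hit("PF_B", 80, 200, 60.0),
    Hit("PF_C", 120, 220, 40.0),
    Hit("PF_D", 1, 220, 85.0),
]
total, combo = best_combination(hits)
print(total, [h.ref_id for h in combo])
```

In this toy input the single sequence-wide hit `PF_D` scores 85, while the non-overlapping pair `PF_A` and `PF_C` scores 90, so the search prefers the domain-level combination. The search is exponential in the worst case, which is acceptable for the handful of hits a single protein usually has.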


Author(s):  
Yue Cao ◽  
Yang Shen

Abstract Motivation Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability while relying on protein data besides sequences, or lack generalizability to novel sequences, species and functions. Results To overcome aforementioned barriers in applicability and generalizability, we propose a novel deep learning model using only sequence information for proteins, named Transformer-based protein function Annotation through joint sequence–Label Embedding (TALE). For generalizability to novel sequences we use self attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions (tail labels), we embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (1D sequences) in a joint latent space. Combining TALE and a sequence similarity-based method, TALE+ outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method using network information besides sequence, in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low similarity, new species, or rarely annotated functions compared to training data, revealing deep insights into the protein sequence–function relationship. Ablation studies elucidated contributions of algorithmic components toward the accuracy and the generalizability. Availability The data, source codes and models are available at https://github.com/Shen-Lab/TALE Supplementary information Supplementary data are available at Bioinformatics online.
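The idea of a joint latent space can be illustrated with a toy scorer: if sequences and function labels are embedded in the same vector space, any label can be scored against a sequence by similarity, including labels that were rare in training. The sketch below uses a dot product followed by a sigmoid. The vectors, the label names, and the scoring rule are all assumptions for illustration, not TALE's architecture.

```python
import math
from typing import Dict, List


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_labels(
    seq_embedding: List[float],
    label_embeddings: Dict[str, List[float]],
) -> Dict[str, float]:
    """Score every label against one sequence embedding.

    Each score is sigmoid(dot(sequence, label)), so it lies in (0, 1).
    """
    return {
        term: 1.0 / (1.0 + math.exp(-dot(seq_embedding, vec)))
        for term, vec in label_embeddings.items()
    }


# Hypothetical 2-D embeddings; real models use learned, high-dimensional vectors.
seq = [1.0, 0.0]
labels = {
    "term_a": [2.0, 0.0],    # points the same way as the sequence
    "term_b": [-2.0, 1.0],   # points away from the sequence
    "term_c": [0.0, 3.0],    # orthogonal to the sequence
}
scores = score_labels(seq, labels)
for term, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(value, 3))
```

Because a label only needs an embedding to be scored, nothing in this scheme requires that the label had many training examples, which is the intuition behind generalizing to tail labels.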


2020 ◽  
Author(s):  
Pedro Queirós ◽  
Francesco Delogu ◽  
Oskar Hickl ◽  
Patrick May ◽  
Paul Wilmes

Abstract Background The past decades have seen a rapid development of the (meta-)omics fields, producing an unprecedented amount of data. Through the use of well-characterized datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation allows the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, some challenges remain, specifically in terms of speed, flexibility, and reproducibility. In the era of big data it also becomes increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, thus overcoming some limitations in overly relying on computationally generated data. Results We implemented a protein annotation tool, Mantis, which uses text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for total customization of the reference data used, adaptable, and reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which led to an average 0.038 increase in precision when compared to sequence-wide annotation. Mantis is fast, annotating an average genome in 25-40 minutes, whilst also outputting high-quality annotations (average coverage 81.4%, average precision 0.892). Conclusions Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
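For readers unfamiliar with the two figures quoted above, the sketch below shows the standard definitions of coverage and precision. The abstract does not spell out how the authors computed them, so treat these formulas and the example counts as assumptions rather than the preprint's exact procedure.

```python
def coverage(n_annotated: int, n_total: int) -> float:
    """Fraction of input proteins that received at least one annotation."""
    return n_annotated / n_total if n_total else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of assigned annotations that agree with the reference."""
    assigned = true_positives + false_positives
    return true_positives / assigned if assigned else 0.0


# Hypothetical counts chosen only to reproduce values of the same magnitude.
print(coverage(814, 1000))    # 814 of 1000 proteins annotated
print(precision(892, 108))    # 892 correct out of 1000 assigned annotations
```

Under these definitions, an increase of 0.038 in precision means that roughly 38 more of every 1000 assigned annotations match the reference.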


2020 ◽  
Author(s):  
Yue Cao ◽  
Yang Shen

Abstract Motivation Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability while relying on data besides sequences, or lack generalizability to novel sequences, species and functions. Results To overcome aforementioned barriers in applicability and generalizability, we propose a novel deep learning model, named Transformer-based protein function Annotation through joint sequence–Label Embedding (TALE). For generalizability to novel sequences we use self attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions, we also embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (sequences) in a joint latent space. Combining TALE and a sequence similarity-based method, TALE+ outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method using network information besides sequence, in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low homology and never/rarely annotated novel species or functions compared to training data, revealing deep insights into the protein sequence–function relationship. Ablation studies elucidated contributions of algorithmic components toward the accuracy and the generalizability. Availability The data, source codes and models are available at https://github.com/Shen-Lab/[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 27 ◽  
Author(s):  
Veda P. Pandey ◽  
Apoorvi Tyagi ◽  
Shagoofa Ali ◽  
Kusum Yadav ◽  
Anurag Yadav ◽  
...  

Background: Class III plant peroxidases play an important role in a number of physiological processes in plants, such as lignin biosynthesis, suberization, cell wall biosynthesis, reactive oxygen species metabolism and plant defense against pathogens. Peroxidases are also of significance in several industrial applications. In view of this, the production and identification of novel peroxidases resistant to temperature, pH and salts is desirable. Objective: The objective of the present work was to clone and characterize a novel plant peroxidase suitable for industrial application. Methods: A full-length cDNA clone of lemon peroxidase was isolated using PCR and RACE approaches, characterized and heterologously expressed in Escherichia coli using standard protocols. The expressed peroxidase was purified using a Ni-NTA agarose column and biochemically characterized using standard protocols. The peroxidase was also characterized in silico at the nucleotide and protein levels using standard protocols. Results: A full-length cDNA clone of lemon peroxidase was isolated and heterologously expressed in Escherichia coli. The expressed recombinant lemon peroxidase (LPRX) was activated by in vitro refolding and purified. The purified LPRX exhibited pH and temperature optima of pH 7.0 and 50°C, respectively. The LPRX was found to be activated by metal ions (Na+, Ca2+, Mg2+ and Mn2+) at lower concentrations. The expression analysis of the transcripts suggested involvement of lemon peroxidase in plant defense. The lemon peroxidase was modelled in silico and docked with the substrates guaiacol and pyrogallol; the results favour pyrogallol over guaiacol, in agreement with the in vitro findings. The protein function annotation analyses suggested the involvement of lemon peroxidase in the phenylpropanoid biosynthesis pathway and plant defense mechanisms.
Conclusion: Based on the biochemical characterization, the purified peroxidase was found to be resistant to salts and thus might be a good candidate for industrial exploitation. The in silico protein function annotation and transcript analyses highlighted the possible involvement of the lemon peroxidase in the plant defense response.


2019 ◽  
Vol 21 (4) ◽  
pp. 1437-1447 ◽  
Author(s):  
Jiajun Hong ◽  
Yongchao Luo ◽  
Yang Zhang ◽  
Junbiao Ying ◽  
Weiwei Xue ◽  
...  

Abstract Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate the analysis and enhance its accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity for controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.
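The false discovery rate mentioned above is the fraction of reported annotations that are wrong. The abstract does not describe how the proposed method controls it, so the sketch below only shows a generic, assumed procedure: estimate the rate on a labelled validation set at each score cutoff and pick the lowest cutoff that meets a target. The function names and the example data are hypothetical.

```python
from typing import List, Optional, Tuple

# Each prediction is (confidence score, whether the annotation is correct).
Scored = List[Tuple[float, bool]]


def fdr_at(threshold: float, scored: Scored) -> float:
    """False discovery rate among predictions with score >= threshold."""
    called = [is_true for score, is_true in scored if score >= threshold]
    if not called:
        return 0.0
    return called.count(False) / len(called)


def lowest_threshold(scored: Scored, target_fdr: float) -> Optional[float]:
    """Lowest observed score cutoff whose FDR does not exceed the target.

    Returns None if no cutoff satisfies the target.
    """
    for threshold in sorted({score for score, _ in scored}):
        if fdr_at(threshold, scored) <= target_fdr:
            return threshold
    return None


# Hypothetical validation predictions.
validation = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.40, False), (0.30, False),
]
print(lowest_threshold(validation, 0.25))
print(fdr_at(0.70, validation))
```

With this toy data, a cutoff of 0.70 keeps four predictions of which one is wrong, giving a rate of 0.25. The rate is not guaranteed to fall monotonically as the cutoff rises, which is why the sketch checks every observed cutoff instead of using a binary search.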


2019 ◽  
Vol 33 (S1) ◽  
Author(s):  
Mary Jo Ondrechen ◽  
Caitlyn L. Mills ◽  
Lydia A. Ruffner ◽  
Penny J. Beuning
