Sifter-T: A scalable framework for phylogenomic probabilistic protein domain functional annotation

2015 ◽  
Vol 16 (S8) ◽  
Author(s):  
Danillo C Almeida-E-Silva ◽  
Ricardo ZN Vêncio
2021 ◽  
Author(s):  
Carlos P Cantalapiedra ◽  
Ana Hernandez-Plaza ◽  
Ivica Letunic ◽  
Peer Bork ◽  
Jaime Huerta-Cepas

Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows: (i) de novo gene prediction from raw contigs, (ii) built-in pairwise orthology prediction, (iii) fast protein domain discovery, and (iv) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.


2017 ◽  
Vol 2017 ◽  
pp. 1-5 ◽  
Author(s):  
Tengjiao Wang ◽  
Yanghe Feng ◽  
Qi Wang

Uncovering the signaling architecture of protein-protein interactions (PPIs) can benefit the understanding of disease mechanisms and promises to facilitate therapeutic interventions. It is therefore important to reveal the signaling relationship from one protein to another in terms of activation and inhibition. In this study, we propose a new measurement to characterize the regulation relationship of a PPI pair. Using both Gene Ontology (GO) functional annotation and protein domain information, we developed a tool called Prediction of Activation/Inhibition Regulation Signaling Pathway (PAIRS) that takes protein interaction pairs as input and returns both known and predicted results for human protein regulation relationships in terms of activation and inhibition. It provides prognostic regulation information for further signaling pathway reconstruction.


2017 ◽  
Author(s):  
Kokulapalan Wimalanathan ◽  
Iddo Friedberg ◽  
Carson M. Andorf ◽  
Carolyn J. Lawrence-Dill

Summary: We created a new high-coverage, robust, and reproducible functional annotation of maize protein coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein coding genes, respectively, this study provides annotations for 100% of the genes. We also compared the quality of our newly derived annotations with the existing Gramene and Phytozome functional annotation sets by comparing all three to a manually annotated gold standard set of 1,619 genes where annotations were primarily inferred from direct assay or mutant phenotype. Evaluations based on the gold standard indicate that our new annotation set is measurably more accurate than those from Phytozome and Gramene. To derive this new high-coverage, high-confidence annotation set we used sequence-similarity and protein-domain-presence methods as well as mixed-method pipelines that were developed for the Critical Assessment of Function Annotation (CAFA) challenge. Our project to improve maize annotations is called maize-GAMER (GO Annotation Method, Evaluation, and Review) and the newly derived annotations are accessible via MaizeGDB (http://download.maizegdb.org/maize-GAMER) and CyVerse (B73 RefGen_v3 5b+ at doi: doi.org/10.7946/P2S62P and B73 RefGen_v4 Zm00001d.2 at doi: doi.org/10.7946/P2M925).


2022 ◽  
Author(s):  
Dong Xu ◽  
Kangming Jin ◽  
Heling Jiang ◽  
Desheng Gong ◽  
Jinbao Yang ◽  
...  

Sequence alignment is the basis of gene functional annotation for unknown sequences. Selecting closely related species as the reference should be an effective way to improve the accuracy of gene annotation for plants, compared with relying on only one or a few model plants. The limited number of species covered by previous software and websites is therefore a disadvantage for plant gene annotation. Here, we collected the protein sequences of 236 plant species with known genomic information from 63 families. These sequences were then annotated with the Pfam, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases to construct our databases. Furthermore, we developed the software Gene Annotation Software for Plants (GFAP) to perform gene annotation using our databases. GFAP, open-source software that runs on Windows and MacOS systems, is an efficient and network-independent tool. GFAP can search the protein domain, GO and KEGG information for 43,000 genes within 4 minutes. In addition, GFAP can perform sequence alignment, statistical analysis and drawing. The website https://gitee.com/simon198912167815/gfap-database provides the software, databases, testing data and video tutorials for users. GFAP contains a large amount of plant-species information. We believe that it will become a powerful tool for phytologists annotating genes using closely related species.


2019 ◽  
Author(s):  
Georgy Smyshlyaev ◽  
Orsolya Barabas ◽  
Alex Bateman

Background: Tyrosine recombinases perform site-specific genetic recombination in bacteria and archaea. They safeguard genome integrity by resolving chromosome multimers, and they mobilize transposons, phages and integrons, driving dissemination of genetic traits and antibiotic resistance. Despite their abundance and genetic impact, tyrosine recombinase diversity and evolution have not been thoroughly characterized, which greatly hampers their functional classification. Results: Here, we conducted a comprehensive search and comparative analysis of diverse tyrosine recombinases from bacterial, archaeal and phage genomes. We characterized their major phylogenetic groups and show that recombinases of integrons and insertion sequences are closely related to the chromosomal Xer proteins, while integrases of integrative and conjugative elements (ICEs) and phages are more distant. We find that proteins in distinct phylogenetic groups share specific structural features and have characteristic taxonomic distributions. We further trace tyrosine recombinase evolution and propose that phage and ICE integrases originated by acquisition of an N-terminal arm-binding domain. Based on this phylogeny, we classify numerous known ICEs and predict new ones. Conclusions: This work provides a new resource for comparative analysis and functional annotation of tyrosine recombinases. We reconstitute protein evolution and show that adaptation for a role in gene transfer involved acquisition of a specific protein domain, which allows precise regulation of excision and integration.


2017 ◽  
Author(s):  
Stacia K. Wyman ◽  
Aram Avila-Herrera ◽  
Stephen Nayfach ◽  
Katherine S. Pollard

Abstract: The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271 965 protein families from the SFams database and discovered many with no functional annotation, including >118 000 families lacking any known protein domain. From these, we prioritized 6 668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a “most wanted” list of genes to characterize.
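
The prioritization rule stated in this abstract (no annotated domain, at least three sequences, organisms from at least two distinct classes) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Member` record, the `is_prioritized` function, and the way taxonomic class is stored are all hypothetical names chosen for illustration, and the sketch assumes each family member already carries a class label.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Member:
    """One sequence in a protein family (hypothetical record layout)."""
    sequence_id: str
    taxonomic_class: str  # class-level label of the source organism


def is_prioritized(
    members: Sequence[Member],
    has_annotated_domain: bool,
    min_sequences: int = 3,  # "at least three sequences"
    min_classes: int = 2,    # "at least two distinct classes"
) -> bool:
    """Return True if a family meets the thresholds quoted in the abstract."""
    if has_annotated_domain:
        # Families with any known domain are not function-unknown candidates.
        return False
    distinct_classes = {m.taxonomic_class for m in members}
    return len(members) >= min_sequences and len(distinct_classes) >= min_classes
```

A family with three sequences drawn from two classes and no annotated domain would pass this check, whereas three sequences from a single class would not.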


2018 ◽  
Author(s):  
Sarah Klass ◽  
Matthew J. Smith ◽  
Tahoe Fiala ◽  
Jessica Lee ◽  
Anthony Omole ◽  
...  

Herein, we describe a new series of fusion proteins that have been developed to self-assemble spontaneously into stable micelles that are 27 nm in diameter after enzymatic cleavage of a solubilizing protein tag. The sequences of the proteins are based on a human intrinsically disordered protein, which has been appended with a hydrophobic segment. The micelles were found to form across a broad range of pH, ionic strength, and temperature conditions, with critical micelle concentration (CMC) values below 1 µM being observed in some cases. The reported micelles were found to solubilize hydrophobic metal complexes and organic molecules, suggesting their potential suitability for catalysis and drug delivery applications.

