UniProtKB database
Recently Published Documents


Total documents: 8 (five years: 5)
H-index: 1 (five years: 1)

2021, Vol 1
Author(s): Jin Tao, Kelly A. Brayton, Shira L. Broschat

Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniProtKB database to determine whether protein function has been experimentally verified; and a queryable webpage that returns the FASTA headers of sequences from the cluster best matching an input sequence. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.
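
Since the site returns FASTA headers for the user to inspect, a small parser can make those headers easier to filter. The following is a minimal sketch, not the PASS implementation: it assumes headers follow the standard UniProtKB FASTA layout, and the function name `parse_header` and the example headers are hypothetical. Note that the `PE` field describes evidence for the protein's existence, which is not the same thing as experimental verification of its function.

```python
import re

# UniProtKB FASTA header layout:
# >db|Accession|EntryName ProteinName OS=Organism OX=TaxID [GN=Gene] PE=Level SV=Version
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.*?)\s+OS=(?P<organism>.*?)\s+OX=(?P<taxid>\d+)"
    r"(?:\s+GN=(?P<gene>\S+))?\s+PE=(?P<pe>[1-5])\s+SV=(?P<sv>\d+)\s*$"
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["pe"] = int(fields["pe"])            # 1 = evidence at protein level
    fields["reviewed"] = fields["db"] == "sp"   # sp = Swiss-Prot, tr = TrEMBL
    return fields


if __name__ == "__main__":
    # Hypothetical headers, made up for illustration only.
    headers = [
        ">sp|X00001|DEMO_ECOLI Demo protein OS=Escherichia coli OX=562 GN=demA PE=1 SV=1",
        ">tr|X00002|X00002_ECOLI Uncharacterized protein OS=Escherichia coli OX=562 PE=4 SV=1",
    ]
    for header in headers:
        info = parse_header(header)
        print(info["accession"], info["protein_name"], "PE =", info["pe"])
```

Sorting cluster members so that reviewed entries with low `PE` values come first is one plausible way to shortlist candidates before building a sequence similarity network.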


2021, Vol 12
Author(s): Stefani Díaz-Valerio, Anat Lev Hacohen, Raphael Schöppe, Heiko Liesegang

Biopesticide-based crop protection is constantly challenged by insect resistance. Thus, expansion of available biopesticides is crucial for sustainable agriculture. Although Bacillus thuringiensis is the major agent for pesticide bioprotection, the number of bacterial species synthesizing proteins with biopesticidal potential is much higher. The Bacterial Pesticidal Protein Resource Center (BPPRC) offers a database of sequences for the control of insect pests, grouped in structural classes. Here we present IDOPS, a tool that detects novel biopesticidal sequences and analyzes them within their genetic environment. The backbone of the IDOPS detection unit is a curated collection of high-quality hidden Markov models that is in accordance with the BPPRC nomenclature. IDOPS was positively benchmarked against BtToxin_Digger and Cry_Processor. In addition, a scan of the UniProtKB database using the IDOPS models returned an abundance of new pesticidal protein candidates distributed across all of the structural groups. Because gene expression depends on the genomic environment, IDOPS provides a comparative genomics module to investigate the genetic regions surrounding pesticidal genes. This feature enables the investigation of accessory elements and evolutionary traits relevant for optimal toxin expression and functional diversification. IDOPS contributes to and expands our current arsenal of pesticidal proteins used for crop protection.


2020, Vol 11 (1), pp. 24
Author(s): Jin Tao, Kelly Brayton, Shira Broschat

Advances in genome sequencing technology and computing power have brought about the explosive growth of sequenced genomes in public repositories with a concomitant increase in annotation errors. Many protein sequences are annotated using computational analysis rather than experimental verification, leading to inaccuracies in annotation. Confirmation of existing protein annotations is urgently needed before misannotation becomes even more prevalent due to error propagation. In this work we present a novel approach for automatically confirming the existence of manually curated information with experimental evidence of protein annotation. Our ensemble learning method uses a combination of recurrent convolutional neural network, logistic regression, and support vector machine models. Natural language processing in the form of word embeddings is used with journal publication titles retrieved from the UniProtKB database. Importantly, we use recall as our most significant metric to ensure the maximum number of verifications possible; results are reported to a human curator for confirmation. Our ensemble model achieves 91.25% recall, 71.26% accuracy, 65.19% precision, and an F1 score of 76.05% and outperforms the Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT) model with fine-tuning using the same data.
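
The reported metrics can be checked against one another, because the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only and is not the authors' evaluation code; the function names and the confusion-matrix counts in the second example are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract above.
    f1 = f1_score(precision=0.6519, recall=0.9125)
    print(f"F1 = {100 * f1:.2f}%")  # prints "F1 = 76.05%"

    # Hypothetical counts, for illustration only.
    print(precision_recall_f1(tp=8, fp=2, fn=2))
```

Prioritizing recall, as the abstract describes, accepts more false positives (lower precision) in exchange for fewer missed verifications, which is reasonable when a human curator reviews every flagged result.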


mSystems, 2020, Vol 5 (5)
Author(s): Alexander M. Kloosterman, Kyle E. Shelton, Gilles P. van Wezel, Marnix H. Medema, Douglas A. Mitchell

Bioinformatics-powered discovery of novel ribosomal natural products (RiPPs) has historically been hindered by the lack of a common genetic feature across RiPP classes. Herein, we introduce RRE-Finder, a method for identifying RRE domains, which are present in a majority of prokaryotic RiPP biosynthetic gene clusters (BGCs). RRE-Finder identifies RRE domains 3,000 times faster than current methods, which rely on time-consuming secondary structure prediction. Depending on user goals, RRE-Finder can operate in precision mode to accurately identify RREs present in known RiPP classes or in exploratory mode to assist with novel RiPP discovery. Employing RRE-Finder on the UniProtKB database revealed several high-confidence RREs in novel RiPP-like clusters, suggesting that many new RiPP classes remain to be discovered.


2019, Vol 9 (1)
Author(s): Antonio Ginés García-Saura, Rubén Zapata-Pérez, Ana Belén Martínez-Moñino, José Francisco Hidalgo, Asunción Morte, ...

Nudix (for nucleoside diphosphatases linked to other moieties, X) hydrolases are a diverse family of proteins capable of cleaving an enormous variety of substrates, ranging from nucleotide sugars to NAD+-capped RNAs. Although all the members of this superfamily share a common conserved catalytic motif, the Nudix box, their substrate specificity lies in specific sequence traits, which give rise to different subfamilies. Among them, NADH pyrophosphatases or diphosphatases (NADDs) are poorly studied and nothing is known about their distribution. To address this, we designed a Prosite-compatible pattern to identify new NADD sequences. In silico scanning of the UniProtKB database showed that 3% of Nudix proteins were NADDs and displayed 21 different domain architectures, the canonical architecture (NUDIX-like_zf-NADH-PPase_NUDIX) being the most abundant (53%). Interestingly, NADD fungal sequences were prominent among eukaryotes, and were distributed over several Classes, including Pezizomycetes. Unexpectedly, in this last fungal Class, NADDs were found to be present from the most recent common ancestor to Tuberaceae, following a molecular phylogeny distribution similar to that previously described using two thousand single concatenated genes. Finally, when truffle-forming ectomycorrhizal Tuber melanosporum NADD was biochemically characterized, it showed the highest NAD+/NADH catalytic efficiency ratio ever described.
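
Scanning sequences with a Prosite-style pattern amounts to translating the pattern into a regular expression. The sketch below shows that translation for a simplified subset of the PROSITE syntax (`x`, `[..]`, `{..}`, repeat counts, and `<`/`>` anchors on whole elements). The pattern and sequence in the example are toy values chosen for illustration; they are not the NADD pattern designed by the authors, and the function names are hypothetical.

```python
import re

# One pattern element: x, a single residue, [allowed], or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT_RE = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchor at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchor at the C-terminus
            suffix, element = "$", element[:-1]
        match = ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy_pattern = "G-x(5)-E-x(2,4)-{P}-[LIVM]."   # toy pattern, not the NADD motif
    print(prosite_to_regex(toy_pattern))
    print(scan(toy_pattern, "AAGAAAAAEAAALAA"))
```

Because `re.finditer` reports only non-overlapping matches, a production scanner would need a lookahead-based search to recover overlapping hits.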


2018, Vol 2018, pp. 1-11
Author(s): Jian Zhang, Qiqiang Zhang, Xiaofei Chen, Yan Liu, Jiyang Xue, ...

Gandi capsule, a traditional Chinese herbal medicinal formulation that consists of eight herbs, is used as a clinical therapy for diabetic nephropathy. To clarify the potential synergistic mechanism, this study adopted a network pharmacology strategy to screen the action targets that corresponded to the active components in the Gandi capsule. We first constructed a compound database of 315 components in the Gandi capsule and a target database of diabetic nephropathy, which included 155 target proteins. Six representative compounds were selected to dock with 99 proteins found in the UniProtKB database with their PDB code, and interaction networks between the active ingredients of the Gandi capsule and their targets were mapped out. Results revealed 47 proteins with a high affinity with at least one compound molecule in the Gandi capsule. The main action pathways closely related to the development of diabetic nephropathy were the TGF-β1, AMPK, insulin, TNF-α, and lipid metabolism pathways as per network pharmacology analysis. In the interaction network, ACC1, SOD2, COX2, PKC-B, IR, and ROCK1 proteins had the most frequent interactions with the six compounds. We performed visual molecular docking in silico and experimentally confirmed competitive component-protein binding by SPR and an enzyme activity test, which highlighted the relationships of wogonin to COX2 and SOD2, astragaloside IV to ACC1, and morroniside to ACC1. We concluded that the potential synergistic mechanism of the Gandi capsule resulted from high affinities with multiple proteins and intervention in multiple pathways in combination therapy of diabetic nephropathy.


2013, Vol 2013, pp. 1-10
Author(s): Chiquito Crasto, Chandrahas Narne, Mikako Kawai, Landon Wilson, Stephen Barnes

Quantitative proteomics applications in mass spectrometry depend on the knowledge of the mass-to-charge ratio (m/z) values of proteotypic peptides for the proteins under study and their product ions. MRMPath and MRMutation, platform-independent, web-based bioinformatics tools, facilitate the recovery of this information by biologists. MRMPath utilizes publicly available information related to biological pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. All the proteins involved in pathways of interest are recovered and processed in silico to extract information relevant to quantitative mass spectrometry analysis. Peptides may also be subjected to automated BLAST analysis to determine whether they are proteotypic. MRMutation catalogs known (mutant) variants of proteins from the current UniProtKB database and, after processing, makes them available. All these results, drawn via the web from well-maintained, public databases, are written to an Excel spreadsheet, which the user can download and save. MRMPath and MRMutation can be freely accessed. As a system that seeks to allow two or more resources to interoperate, MRMPath represents an advance in bioinformatics tool development. As a practical matter, the MRMPath automated approach represents significant time savings to researchers.
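
The m/z value of a peptide precursor ion follows directly from its sequence: sum the monoisotopic residue masses, add one water for the termini, then add one proton mass per charge and divide by the charge. The sketch below illustrates that arithmetic for unmodified peptides only; it is not the MRMPath code, and the function name `peptide_mz` is hypothetical.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mz(peptide, charge=1):
    """Monoisotopic m/z of an unmodified peptide ion [M + zH]z+."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    try:
        neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    # A short made-up peptide, shown at charge states 1 to 3.
    for z in (1, 2, 3):
        print(z, round(peptide_mz("SAMPLEK", z), 4))
```

Product-ion (b- and y-series) m/z values can be derived the same way by summing residue masses over prefixes and suffixes of the peptide, which is omitted here to keep the sketch short.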


Author(s): S. I. Spivak, O. M. Demchuk, P. A. Karpov, S. P. Ozheriedov, Ya. B. Blume

Aim. The aim of this work is to create a library of 3D models of tubulins from pathogenic helminths. The models will be used as targets for in silico screening of new anthelmintic compounds. Methods. Tubulin sequences were obtained from the UniProtKB database. The spatial structures of the proteins were modeled using the I-TASSER server. The geometry of the three-dimensional models was optimized in Gromacs using the amber03 force field. Results. Based on analysis of the UniProtKB database, 302 amino acid sequences of α- (105), β- (170) and γ-tubulins (27) of pathogenic worms were selected. 3D models of the selected proteins were built using the base protocol. Conclusions. The 3D models of pathogenic worm tubulins were deposited in the CSModDB database (http://csmoddb.ifbg.org.ua/comodore/index.php). The resulting library of 3D structures is suitable for further use in VO CSLabGrid virtual screening for new compounds. Keywords: tubulin, pathogenic worms, database, Grid.

