domain annotation
Recently Published Documents


TOTAL DOCUMENTS

25
(FIVE YEARS 10)

H-INDEX

6
(FIVE YEARS 1)

Viruses ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2426
Author(s):  
Kristen L. Beck ◽  
Edward Seabolt ◽  
Akshay Agarwal ◽  
Gowri Nayar ◽  
Simone Bianco ◽  
...  

SARS-CoV-2 genomic sequencing efforts have scaled dramatically to address the current global pandemic and aid public health. However, autonomous genome annotation of SARS-CoV-2 genes, proteins, and domains is not readily accomplished by existing methods and results in missing or incorrect sequences. To overcome this limitation, we developed a novel semi-supervised pipeline for automated gene, protein, and functional domain annotation of SARS-CoV-2 genomes that differentiates itself by not relying on a single reference genome and by overcoming atypical genomic traits that challenge traditional bioinformatic methods. Using our method, we analyzed an initial corpus of 66,000 SARS-CoV-2 genome sequences collected from labs across the world and identified the comprehensive set of known proteins with 98.5% set membership accuracy and 99.1% accuracy in length prediction, compared to proteome references, including Replicase polyprotein 1ab (with its transcriptional slippage site). Compared to other published tools, such as Prokka (base) and VAPiD, our method yielded a 6.4- and 1.8-fold increase in protein annotations, respectively. Our method generated 13,000,000 gene, protein, and domain sequences, some conserved across time and geography and others representing emerging variants. We observed 3362 non-redundant sequences per protein on average within this corpus and described key D614G and N501Y variants spatiotemporally in the initial genome corpus. For spike glycoprotein domains, we achieved greater than 97.9% sequence identity to references and characterized receptor binding domain variants. We further demonstrated the robustness and extensibility of our method on an additional 4000 variant-diverse genomes containing all named variants of concern and interest as of August 2021. In this cohort, we successfully identified all keystone spike glycoprotein mutations in our predicted protein sequences with greater than 99% accuracy and demonstrated high accuracy of the protein and domain annotations. This work comprehensively presents the molecular targets to refine biomedical interventions for SARS-CoV-2 with a scalable, high-accuracy method to analyze newly sequenced infections as they arise.
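
Variant names such as D614G and N501Y in the abstract above follow the usual convention of reference residue, 1-based position, observed residue. The sketch below shows how such labels could be derived from two protein sequences. It is not the authors' pipeline: the function name `substitutions` and the toy sequences are hypothetical, and the code assumes the two sequences are already aligned and of equal length, with no insertions or deletions.

```python
def substitutions(reference: str, query: str) -> list[str]:
    """Label point substitutions as <ref residue><1-based position><query residue>.

    Assumes both sequences are pre-aligned and of equal length (illustrative only).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        f"{ref_aa}{pos}{qry_aa}"
        for pos, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=1)
        if ref_aa != qry_aa
    ]


# Toy five-residue sequences, not real spike protein fragments.
toy_reference = "MKDLN"
toy_query = "MKGLY"
labels = substitutions(toy_reference, toy_query)
print(labels)
```

On real genomes the alignment step is the hard part, so a label produced this way is only as reliable as the alignment behind it.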


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Tao Yun ◽  
Jionggang Hua ◽  
Weicheng Ye ◽  
Zheng Ni ◽  
Liu Chen ◽  
...  

Duck reovirus (DRV) is a fatal member of the genus Orthoreovirus in the family Reoviridae. The disease caused by DRV leads to huge economic losses to the duck industry. Post-translational modification is an efficient strategy to enhance the immune responses to virus infection. However, the roles of protein phosphorylation in the responses of ducklings to Classic/Novel DRV (C/NDRV) infections are largely unknown. Using high-resolution LC–MS/MS integrated with a highly sensitive immune-affinity antibody method, phosphoproteomes of Cairina moschata spleen tissues under the C/NDRV infections were analyzed, producing a total of 8,504 phosphorylation sites on 2,853 proteins. After normalization with proteomic data, 392 sites on 288 proteins and 484 sites on 342 proteins were significantly changed under the C/NDRV infections, respectively. To characterize the differentially phosphorylated proteins (DPPs), systematic bioinformatics analyses, including Gene Ontology annotation, domain annotation, subcellular localization, and Kyoto Encyclopedia of Genes and Genomes pathway annotation, were performed. Two important serine protease system-related proteins, coagulation factor X and fibrinogen α-chain, were identified as phosphorylated proteins, suggesting an involvement of blood coagulation under the C/NDRV infections. Furthermore, 16 proteins involved in the intracellular signaling pathways of pattern-recognition receptors were identified as phosphorylated proteins. Changes in the phosphorylation levels of MyD88, NF-κB, RIP1, MDA5 and IRF7 suggested a crucial role of protein phosphorylation in host immune responses of C. moschata. Our study provides new insights into the responses of ducklings to the C/NDRV infections at the PTM level.


2020 ◽  
Vol 23 (3) ◽  
pp. 253-268
Author(s):  
Shreya Bhattacharya ◽  
Puja Ghosh ◽  
Debasmita Banerjee ◽  
Arundhati Banerjee ◽  
Sujay Ray

Aim and Objective: One of the challenges to conventional therapies against Mycobacterium tuberculosis is the development of multi-drug resistant pathogenic strains. This study was undertaken to explore new therapeutic targets for antivirulence therapy by utilizing the pathogen’s essential hypothetical proteins that serve as virulence factors, which is the essential first step in novel drug design. Methods: Functional annotations of essential hypothetical proteins from Mycobacterium tuberculosis (H37Rv strain) were performed through domain annotation, Gene Ontology analysis, physicochemical characterization and prediction of subcellular localization. Virulence factors among the essential hypothetical proteins were predicted, among which pathogen-specific drug target candidates, non-homologous to human and gut microbiota proteins, were identified. This was followed by druggability and spectrum analysis of the identified targets. Results and conclusion: The study successfully assigned functions to 83 essential hypothetical proteins of Mycobacterium tuberculosis, among which 25 were identified as virulence factors. Of these 25, 12 virulence factors were identified as potential pathogen-specific drug target candidates. Nine potential targets had druggable properties and the remaining three were considered novel targets. Exploration of these targets will provide new insights into future drug development. Characterization of subcellular localizations revealed that most of the predicted targets were cytoplasmic, which could be ideal for intracellular drugs, while two drug targets were membrane-bound, ideal for vaccines. Spectrum analysis identified one broad-spectrum and 11 narrow-spectrum targets. This study would, therefore, encourage the design of novel therapeutics for antivirulence therapy, which have the potential to replace conventional antibiotic therapies and overcome the lethality of antibiotic-resistant strains.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
R. Berlemont ◽  
N. Winans ◽  
D. Talamantes ◽  
H. Dang ◽  
H-W. Tsai

2020 ◽  
Vol 36 (13) ◽  
pp. 3975-3981
Author(s):  
Laurent David ◽  
Riccardo Vicedomini ◽  
Hugues Richard ◽  
Alessandra Carbone

Motivation: The ever-increasing number of metagenomic sequences accumulating in our databases demands approaches that rapidly ‘explore’ the content of multiple and/or large metagenomic datasets with respect to specific domain targets, avoiding full domain annotation and full assembly. Results: S3A is a fast and accurate domain-targeted assembler designed for rapid functional profiling. It is based on a novel construction and a fast traversal of the Overlap-Layout-Consensus graph, designed to reconstruct coding regions from domain-annotated metagenomic sequence reads. S3A relies on high-quality domain annotation to efficiently assemble metagenomic sequences and on a new confidence measure for fast evaluation of overlapping reads. Its implementation is highly generic and can be applied to any type of annotation. On simulated data, S3A achieves a level of accuracy similar to that of classical metagenomic assembly tools while permitting faster and sensitive profiling of domains of interest. When studying a few dozen functional domains, a typical scenario, S3A is up to an order of magnitude faster than general-purpose metagenomic assemblers, thus enabling the analysis of a larger number of datasets in the same amount of time. S3A opens new avenues for the fast exploration of the rapidly increasing number of metagenomic datasets of ever-increasing size. Availability and implementation: S3A is available at http://www.lcqb.upmc.fr/S3A_ASSEMBLER/. Supplementary information: Supplementary data are available at Bioinformatics online.
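
The overlap step of an Overlap-Layout-Consensus approach, mentioned in the abstract above, can be illustrated with a naive exact suffix-prefix comparison. This is a generic textbook sketch and not S3A's graph construction or its confidence measure. The function names, the `min_len` threshold, and the toy reads are all assumptions made for illustration.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Map (i, j) to the overlap length for each ordered pair of distinct reads that overlap."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = suffix_prefix_overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


# Toy reads invented for this example.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
edges = overlap_graph(reads)
print(edges)
```

The all-pairs comparison is quadratic in the number of reads, which is acceptable for a toy input but is exactly the cost that practical assemblers work to avoid, for example by restricting comparisons to reads that share an annotation.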


Author(s):  
Amit Kumar ◽  
Ajit Kumar Saxena ◽  
Gwo Giun (Chris) Lee ◽  
Amita Kashyap ◽  
G. Jyothsna

Author(s):  
Wang-Ren Qiu ◽  
Ao Xu ◽  
Zhao-Chun Xu ◽  
Chun-Hua Zhang ◽  
Xuan Xiao

2019 ◽  
Vol 20 (5) ◽  
pp. 389-399
Author(s):  
Wangren Qiu ◽  
Chunhui Xu ◽  
Xuan Xiao ◽  
Dong Xu

Background: Ubiquitination, as a post-translational modification, is a crucial biological process in cell signaling, apoptosis, and localization. Identification of ubiquitination proteins is of fundamental importance for understanding the molecular mechanisms in biological systems and diseases. Although high-throughput experimental studies using mass spectrometry have identified many ubiquitination proteins and ubiquitination sites, the vast majority of ubiquitination proteins remain undiscovered, even in well-studied model organisms. Objective: To reduce experimental costs, computational methods have been introduced to predict ubiquitination sites, but their accuracy is unsatisfactory. Predicting whether a protein can be ubiquitinated at all would help in predicting ubiquitination sites. However, all computational methods so far can only predict ubiquitination sites. Methods: In this study, the first computational method for predicting ubiquitination proteins without relying on ubiquitination site prediction has been developed. The method extracts features from sequence conservation information through a grey system model, as well as from functional domain annotation and subcellular localization. Results: Together with feature analysis and application of the Relief feature selection algorithm, 5-fold cross-validation on three datasets achieved a high accuracy of 90.13%, with a Matthews correlation coefficient of 80.34%. Predictions on an independent test dataset achieved an accuracy of 87.71% and a Matthews correlation coefficient of 75.43%, better than the prediction from the best ubiquitination site prediction tool available. Conclusion: Our study may guide experimental design and provide useful insights for studying the mechanisms and modulation of ubiquitination pathways. The code is available at: https://github.com/Chunhuixu/UBIPredic_QWRCHX.
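
The two metrics reported in the abstract above, accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below uses the standard definitions. The counts are invented for illustration and are not taken from the study. MCC ranges from -1 to 1, and the abstract reports it as a percentage.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts: 100 ubiquitinated and 100 non-ubiquitinated test proteins.
tp, fn = 90, 10   # ubiquitinated proteins predicted correctly / missed
tn, fp = 85, 15   # non-ubiquitinated proteins predicted correctly / wrongly flagged

acc_value = accuracy(tp, tn, fp, fn)
mcc_value = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc_value:.2%}, MCC = {mcc_value:.2%}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unbalanced, which is one reason it is often reported alongside accuracy.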


2019 ◽  
Vol 24 (1) ◽  
pp. 1-9
Author(s):  
Arli Aditya Parikesit ◽  
Rizky Nurdiansyah ◽  

Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Minglei Yang ◽  
Wenliang Zhang ◽  
Guocai Yao ◽  
Haiyue Zhang ◽  
Weizhong Li

Iterative homology search has been widely used to identify remotely related proteins. Our previous study found that query-seeded iterative sequence search can reduce homologous over-extension errors and greatly improve selectivity. However, iterative homology search remains challenging in protein functional prediction. More sensitive scoring models are needed to improve the predictive performance of alignment methods, and alignment annotation with better visualization has also become imperative for result interpretation. Here we report an open-source application, PSISearch2D, that runs query-seeded iterative sequence search for remotely related protein detection. PSISearch2D retrieves domain annotation from Pfam, UniProtKB, CDD and PROSITE for the resulting hits and presents combined domain and sequence alignments in novel visualizations. A newly defined scoring model called C-value re-orders hits by considering the combination of sequence and domain alignments. Benchmarking of the C-value indicates that PSISearch2D outperforms the original PSISearch2 tool in terms of both accuracy and specificity. PSISearch2D improves the characterization of unknown proteins in remote protein detection. Our evaluation tests show that PSISearch2D provided annotation for 77 695 of 139 503 unknown bacterial proteins and 140 751 of 352 757 unknown viral proteins in UniProtKB, about 2.3-fold and 1.8-fold more characterization than the original PSISearch2, respectively. Together with advanced features such as an auto-iteration mode to handle large-scale data and optional programs for global and local sequence alignments, PSISearch2D enhances remotely related protein search.
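
The annotation counts in the abstract above are easier to compare as proportions. The short calculation below uses only the four counts quoted in the abstract. The dictionary layout and variable names are illustrative choices.

```python
# (annotated, total) unknown UniProtKB proteins, as quoted in the abstract.
annotated = {
    "bacteria": (77_695, 139_503),
    "virus": (140_751, 352_757),
}

# Fraction of unknown proteins that received an annotation, per group.
fractions = {group: hit / total for group, (hit, total) in annotated.items()}

for group, frac in fractions.items():
    print(f"{group}: {frac:.1%} of unknown proteins annotated")
```

This works out to roughly 55.7% of the unknown bacterial proteins and 39.9% of the unknown viral proteins.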

