PSORTm: a bacterial and archaeal protein subcellular localization prediction tool for metagenomics data

2020 ◽  
Vol 36 (10) ◽  
pp. 3043-3048 ◽  
Author(s):  
Michael A Peabody ◽  
Wing Yin Venus Lau ◽  
Gemma R Hoad ◽  
Baofeng Jia ◽  
Finlay Maguire ◽  
...  

Abstract Motivation: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. Results: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb’s high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm’s read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. Availability and implementation: Documentation, source code and Docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). Supplementary information: Supplementary data are available at Bioinformatics online.
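The evaluation described in this abstract (in silico fragmentation of proteins with known localization, then precision and sensitivity per fragment length) can be illustrated with a minimal Python sketch. It is not PSORTm's code: the function names, the sliding-window scheme, and the convention that a predictor returns `None` when it declines to call a localization are all assumptions made for illustration.

```python
from typing import Iterator, List, Optional, Sequence, Tuple


def fragment_sequence(seq: str, frag_len: int, step: int) -> Iterator[str]:
    """Yield fixed-length windows of `seq`, advancing `step` residues each time.

    Hypothetical helper: a sequence shorter than `frag_len` is yielded once, whole.
    """
    if frag_len <= 0 or step <= 0:
        raise ValueError("frag_len and step must be positive")
    last_start = max(len(seq) - frag_len, 0)
    for start in range(0, last_start + 1, step):
        yield seq[start:start + frag_len]


def precision_and_sensitivity(
    predicted: Sequence[Optional[str]], actual: Sequence[str]
) -> Tuple[float, float]:
    """Score predictions where None means 'no localization called' (assumed).

    precision   = correct calls / calls made
    sensitivity = correct calls / all fragments
    """
    made = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    correct = sum(1 for p, a in made if p == a)
    precision = correct / len(made) if made else 0.0
    sensitivity = correct / len(actual) if actual else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    fragments: List[str] = list(fragment_sequence("ABCDEFGHIJ", frag_len=4, step=3))
    print(fragments)
    print(precision_and_sensitivity(["a", None, "b", "c"], ["a", "a", "b", "b"]))
```

Under this convention a predictor that declines to call short fragments keeps its precision while losing sensitivity, which is consistent with the trend the abstract reports for shorter fragment lengths.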

2020 ◽  
Vol 49 (D1) ◽  
pp. D803-D808
Author(s):  
Wing Yin Venus Lau ◽  
Gemma R Hoad ◽  
Vivian Jin ◽  
Geoffrey L Winsor ◽  
Ashmeet Madyan ◽  
...  

Abstract Protein subcellular localization (SCL) is important for understanding protein function and for genome annotation, and it aids identification of potential cell surface diagnostic markers, drug targets, or vaccine components. PSORTdb comprises ePSORTdb, a manually curated database of experimentally verified protein SCLs, and cPSORTdb, a pre-computed database of PSORTb-predicted SCLs for NCBI’s RefSeq deduced bacterial and archaeal proteomes. We now report PSORTdb 4.0 (http://db.psort.org/). It features a website refresh, in particular a more user-friendly database search. It also addresses the need to uniquely identify proteins from NCBI genomes now that GI numbers have been retired. It further expands both ePSORTdb and cPSORTdb, including additional data about novel secondary localizations, such as proteins found in bacterial outer membrane vesicles. Protein predictions in cPSORTdb have increased along with the number of available microbial genomes, from approximately 13 million when PSORTdb 3.0 was released, to over 66 million currently. Analyses of both complete and draft genomes are now included. This expanded database will be of wide use to researchers developing SCL predictors or studying diverse microbes, including medically, agriculturally and industrially important species that have either classic or atypical cell envelope structures or vesicles.


2019 ◽  
Vol 36 (7) ◽  
pp. 2244-2250 ◽  
Author(s):  
Wei Long ◽  
Yang Yang ◽  
Hong-Bin Shen

Abstract Motivation: The Tissue Atlas of the Human Protein Atlas (HPA) houses immunohistochemistry (IHC) images visualizing protein distribution from the tissue level down to the cell level, providing an important resource for studying the human spatial proteome. In particular, the protein subcellular localization patterns revealed by these images are helpful for understanding protein functions, and differential localization analysis across normal and cancer tissues leads to new cancer biomarkers. However, computational tools for processing images in this database are highly underdeveloped. The recognition of the localization patterns suffers from the variation in image quality and the difficulty in detecting microscopic targets. Results: We propose a deep multi-instance multi-label model, ImPLoc, to predict the subcellular locations from IHC images. In this model, we employ a deep convolutional neural network-based feature extractor to represent image features, and design a multi-head self-attention encoder to aggregate multiple feature vectors for subsequent prediction. We construct a benchmark dataset of 1186 proteins including 7855 images from HPA and 6 subcellular locations. The experimental results show that ImPLoc achieves a significant improvement in prediction accuracy compared with current computational methods. We further apply ImPLoc to a test set of 889 proteins with images from both normal and cancer tissues, and obtain 8 differentially localized proteins at a significance level of 0.05. Availability and implementation: https://github.com/yl2019lw/ImPloc. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Daniel G. Alonso-Reyes ◽  
Maria Eugenia Farias ◽  
Virginia Helena Albarracín

Abstract During evolution, microorganisms exposed to high UV-B doses developed fine-tuned photo-enzymes called “photolyases” to cope with DNA damage by UV-B. These photoreceptors, belonging to the Cryptochrome/Photolyase Family (CPF), were well characterized at the genomic and proteomic level in bacteria isolated from a wide range of environments. In this work, we go further by studying the abundance of CPF in aquatic microbial communities from different geographic regions across the globe. Metagenomics data combined with geo-referenced solar irradiation measurements indicated that the higher the UV-B dose suffered by the microbiome’s environment, the higher the abundance of CPF genes and the lower the microbial diversity. A connection between CPF abundance and radiation intensity/photoperiod was reported. Likewise, cryptochrome-like genes were found to be abundant in the most exposed microbiomes, indicating a complementary role to standard photolyases. Also, we observed that CPFs are more likely to be present in dominant taxa of the highly irradiated microbiomes, suggesting an evolutionary force for survival and dominance under extreme solar exposure. Finally, this work reported three novel CPF clades not identified so far, demonstrating the potential of global metagenomic analyses in detecting novel proteins.


2015 ◽  
Vol 44 (D1) ◽  
pp. D663-D668 ◽  
Author(s):  
Michael A. Peabody ◽  
Matthew R. Laird ◽  
Caitlyn Vlasschaert ◽  
Raymond Lo ◽  
Fiona S.L. Brinkman

2019 ◽  
Author(s):  
Dahan Zhang ◽  
Haiyun Huang ◽  
Xiaogang Bai ◽  
Xiaodong Fang ◽  
Yi Zhang

Abstract Motivation: Subcellular location plays an essential role in protein synthesis, transport, and secretion; determining it is therefore an important step in understanding the mechanisms of trait-related proteins. Generally, homology methods provide reliable results when E-values are small. For proteins that do not share significant homologous domains with known proteins, we must resort to pattern recognition algorithms (SVM, Fisher discriminant, KNN, random forest, etc.). However, satisfying results are seldom obtained. Results: Here, a novel hybrid method, “Basic Local Alignment Search Tool+Smith-Waterman+Needleman-Wunsch” or BLAST+SWNW, has been obtained by integrating a loosened E-value Basic Local Alignment Search Tool (BLAST) with the Smith-Waterman (SW) and Needleman-Wunsch (NW) algorithms, and this method has been introduced to predict protein subcellular localization in eukaryotes. When tested on Dataset I and Dataset II, BLAST+SWNW showed an average accuracy of 97.18% and 99.60%, respectively, surpassing the performance of other algorithms in predicting eukaryotic protein subcellular localization. Availability and Implementation: BLAST+SWNW is an open source collaborative initiative available in the GitHub repository (https://github.com/ZHANGDAHAN/BLAST-SWNW-for-SLP or http://202.206.64.158:80/link/72016CAC26E4298B3B7E0EAF42288935). Contact: [email protected]; [email protected] Supplementary Information: Supplementary data are available at PLOS Computational Biology online.
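For readers unfamiliar with the two alignment algorithms named in this abstract, the following minimal Python sketch computes a global (Needleman-Wunsch) and a local (Smith-Waterman) alignment score. It is only an illustration: the match, mismatch, and linear gap values are arbitrary assumptions, and the abstract does not say how BLAST+SWNW scores alignments or combines the two results with BLAST hits.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    # Row 0: aligning the empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score: cells are floored at 0, answer is the max cell."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    query, subject = "XXGATTACAYY", "GATTACA"
    print("global:", needleman_wunsch_score(query, subject))
    print("local: ", smith_waterman_score(query, subject))
```

The example pair shows why the two scores differ: the local score rewards the shared core regardless of flanking residues, while the global score is penalized for every unaligned flanking position.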


Author(s):  
Yu-Miao Zhang ◽  
Jun Wang ◽  
Tao Wu

In this study, the Agrobacterium infection medium, infection duration, detergent, and cell density were optimized. The sorghum-based infection medium (SbIM), a 10-20 min infection time, addition of 0.01% Silwet L-77, and the optimized Agrobacterium optical density at 600 nm (OD600) improved the competence of onion epidermal cells to support Agrobacterium infection at >90% efficiency. Protein-protein interactions involving cyclin-dependent kinase D-2 (CDKD-2) and cytochrome c-type biogenesis protein (CYCH) were localized. The optimized procedure is a quick and efficient system for examining protein subcellular localization and protein-protein interaction.


2019 ◽  
Vol 24 (34) ◽  
pp. 4013-4022 ◽  
Author(s):  
Xiang Cheng ◽  
Xuan Xiao ◽  
Kuo-Chen Chou

Knowledge of protein subcellular localization is vitally important for both basic research and drug development. With the avalanche of protein sequences emerging in the post-genomic age, it is highly desirable to develop computational tools for timely and effective identification of their subcellular localization based on the sequence information alone. Recently, a predictor called “pLoc-mPlant” was developed for identifying the subcellular localization of plant proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mPlant was trained on an extremely skewed dataset in which some subsets (i.e., the protein numbers for some subcellular locations) were more than 10 times larger than the others. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. To overcome this bias, we have developed a new and bias-free predictor called pLoc_bal-mPlant by balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mPlant, the existing state-of-the-art predictor in identifying the subcellular localization of plant proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mPlant/, by which users can easily get their desired results without the need to go through the detailed mathematics.
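The idea of balancing a skewed training dataset can be sketched in Python as below. The abstract does not describe how pLoc_bal-mPlant balances its data, so this sketch substitutes plain random oversampling of the smaller classes and simplifies to one label per protein; the function name and parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def oversample_to_balance(
    samples: Sequence[str], labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List[str], List[Hashable]]:
    """Randomly duplicate minority-class samples until every class matches
    the size of the largest class (assumed strategy, single-label only)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_label: Dict[Hashable, List[str]] = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    if not by_label:
        return [], []
    target = max(len(items) for items in by_label.values())
    out_samples: List[str] = []
    out_labels: List[Hashable] = []
    for label, items in by_label.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    seqs = ["p1", "p2", "p3", "p4", "p5", "p6"]
    locs = ["chloroplast"] * 5 + ["peroxisome"]
    _, balanced_labels = oversample_to_balance(seqs, locs)
    print(Counter(balanced_labels))
```

Oversampling must be applied to the training folds only; duplicating samples before a cross-validation split would leak copies into the test fold and inflate the measured accuracy.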

