Location proteomics: a systems approach to subcellular location

Systems Biology requires comprehensive systematic data on all aspects and levels of biological organization and function. In addition to information on the sequence, structure, activities and binding interactions of all biological macromolecules, the creation of accurate predictive models of cell behaviour will require detailed information on the distribution of those molecules within cells and the ways in which those distributions change over the cell cycle and in response to mutations or external stimuli. Current information on subcellular location in protein databases is limited to unstructured text descriptions or sets of terms assigned by human curators. These entries do not permit basic operations that are common to other biological databases, such as measurement of the degree of similarity between the distributions of two proteins, and they are not able to fully capture the complexity of protein patterns that can be observed. The field of location proteomics seeks to provide automated, objective high-resolution descriptions of protein location patterns within cells. Methods have been developed to group proteins into statistically indistinguishable location patterns using automated analysis of fluorescence microscope images. The resulting clusters, or location families, are analogous to clusters found for other domains, such as protein sequence families. Preliminary work suggests the feasibility of expressing each unique pattern as a generative model that can be incorporated into comprehensive models of cell behaviour.
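
The similarity measurement and grouping described above can be illustrated with a minimal Python sketch. It assumes each protein is represented by numeric feature vectors computed from individual cell images; the features here are hypothetical. The sketch standardizes every feature across all cells and summarizes each protein by its mean standardized vector. It then merges proteins whose profiles lie closer than a chosen Euclidean distance. The function names, the threshold value and the single-linkage merging rule are illustrative assumptions. They are a simple stand-in, not the statistical clustering procedure used to define location families in the published work.

```python
import math
from itertools import combinations


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid dividing by zero
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def protein_profiles(cells_by_protein):
    """Mean standardized feature vector per protein (features scaled over all cells)."""
    names = list(cells_by_protein)
    scaled = zscore_columns([row for name in names for row in cells_by_protein[name]])
    profiles, start = {}, 0
    for name in names:
        count = len(cells_by_protein[name])
        block = scaled[start:start + count]
        start += count
        profiles[name] = [sum(col) / count for col in zip(*block)]
    return profiles


def group_by_threshold(profiles, threshold):
    """Hypothetical single-linkage grouping: proteins whose profiles are closer
    than `threshold` (Euclidean distance) end up in the same family."""
    parent = {name: name for name in profiles}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if math.dist(profiles[a], profiles[b]) < threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in profiles:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical two-feature measurements for three proteins (two cells each).
example = {
    "protA": [[1.0, 0.1], [1.2, 0.0]],
    "protB": [[1.1, 0.2], [0.9, 0.1]],
    "protC": [[5.0, 3.0], [5.2, 3.1]],
}
print(group_by_threshold(protein_profiles(example), threshold=1.0))
# → [['protA', 'protB'], ['protC']]
```

Single linkage with a fixed threshold is used only because it is short to write. With real data, the grouping criterion would need justification, for example by testing whether two proteins' cell-level feature distributions are statistically distinguishable.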

Learning complex subcellular distribution patterns of proteins via analysis of immunohistochemistry images

Bioinformatics, 2019, Vol 36 (6), pp. 1908-1914
DOI: 10.1093/bioinformatics/btz844
Cited By ~ 4

Author(s): Ying-Ying Xu, Hong-Bin Shen, Robert F Murphy

Keyword(s): Subcellular Location, Distribution Patterns, Cell Types, Supplementary Information, Protein Distribution, Protein Subcellular Location, Location Patterns, Location Proteomics, Human Protein Atlas, Protein Subcellular Locations

Abstract
Motivation: Systematic and comprehensive analysis of protein subcellular location as a critical part of proteomics (‘location proteomics’) has been pursued for many years. However, annotating protein subcellular locations and understanding how location patterns vary across cell types and states remain challenging.
Results: In this work, we used immunohistochemistry images from the Human Protein Atlas as the source of subcellular location information and built classification models for complex protein spatial distributions in normal and cancerous tissues. The models can automatically estimate the fractions of a protein in different subcellular locations and can help to quantify changes in protein distribution from normal to cancer tissues. In addition, we examined the extent to which annotated protein pathways and complexes show similarity in the locations of their member proteins, and then predicted new potential member proteins for these networks.
Availability and implementation: The dataset and code are available at: www.csbio.sjtu.edu.cn/bioinf/complexsubcellularpatterns.
Supplementary information: Supplementary data are available at Bioinformatics online.
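
The abstract above describes models that estimate the fraction of a protein in each subcellular location. The sketch below is a hedged illustration of what such an estimate means, and it is not the classification approach built in the paper. It swaps in a simpler technique: maximum-likelihood estimation of mixing fractions by expectation-maximization (EM). The sketch assumes that each location has a known reference distribution over a few discrete image-feature bins, for example counts of object types. The bins, reference distributions, counts and the function name `estimate_fractions` are all hypothetical.

```python
def estimate_fractions(counts, references, iterations=500):
    """Estimate location fractions by EM for a mixture with fixed components.

    counts:     observed counts over K discrete feature bins for one protein.
    references: per location, a probability distribution over the same K bins
                (hypothetical; assumed known, e.g. from single-location proteins).
    Returns fractions that are non-negative and sum to 1.
    """
    n_loc = len(references)
    weights = [1.0 / n_loc] * n_loc  # start from equal fractions
    for _ in range(iterations):
        expected = [0.0] * n_loc
        for k, n_k in enumerate(counts):
            mixture = sum(w * ref[k] for w, ref in zip(weights, references))
            if mixture == 0.0:
                continue  # bin impossible under the current mixture; skip it
            # E-step: split the n_k observations in bin k among the locations
            for j, ref in enumerate(references):
                expected[j] += n_k * weights[j] * ref[k] / mixture
        # M-step: new fractions are the normalized expected counts
        total = sum(expected)
        weights = [e / total for e in expected]
    return weights


# Hypothetical reference distributions over three object-type bins.
nucleus = [0.8, 0.15, 0.05]
cytoplasm = [0.1, 0.3, 0.6]
observed = [2200, 2100, 3700]  # counts matching a 25% / 75% mixture exactly
fractions = estimate_fractions(observed, [nucleus, cytoplasm])
print([round(f, 3) for f in fractions])  # → [0.25, 0.75]
```

Because the reference distributions are fixed, the log-likelihood is concave in the fractions, so EM approaches the maximum-likelihood estimate. Comparing such estimates for the same protein in normal and cancer tissue is one simple way to express a shift in its distribution.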

Location proteomics: determining the optimal grouping of proteins according to their subcellular location patterns as determined from fluorescence microscope images

Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004 (published 2005)
DOI: 10.1109/acssc.2004.1399085

Author(s): Xiang Chen, R.F. Murphy

Keyword(s): Subcellular Location, Fluorescence Microscope, Microscope Images, Location Patterns, Location Proteomics, Optimal Grouping

Advances in the Prediction of Protein Subcellular Locations with Machine Learning

Current Bioinformatics, 2019, Vol 14 (5), pp. 406-421
DOI: 10.2174/1574893614666181217145156
Cited By ~ 3

Author(s): Ting-He Zhang, Shao-Wu Zhang

Keyword(s): Machine Learning, Feature Fusion, Protein Sequences, Subcellular Location, Automated Analysis, Cellular Level, Machine Learning Algorithms, Feature Representation, Protein Subcellular Location, Protein Subcellular Locations

Background: Revealing the subcellular location of a newly discovered protein can give insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Computational methods that identify protein subcellular locations efficiently and effectively are therefore highly desirable. In particular, the rapidly increasing number of protein sequences entering genome databases calls for automated analysis methods.
Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) construction of protein subcellular location benchmark datasets, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, and v) web servers.
Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from protein sequences and structures. The second is the feature fusion strategy. The third is the design of powerful predictors. The fourth is the prediction of proteins with multiple location sites.
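
Items ii) to iv) of the review can be made concrete with a small Python sketch. The sketch uses:
- amino acid composition, a widely used sequence descriptor;
- a nearest-centroid classifier, standing in for the machine learning algorithms surveyed;
- k-fold cross-validation accuracy as the assessment metric.

The toy sequences and location labels are hypothetical and make no biological claim. The interleaved fold assignment and the function names (`aac`, `fit_centroids`, `predict`, `kfold_accuracy`) are illustrative choices, not taken from the review.

```python
import math

# The 20 standard amino acids, in a fixed order that defines feature positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence):
    """Amino acid composition: fraction of each standard residue (20 features)."""
    residues = [a for a in sequence.upper() if a in AMINO_ACIDS]
    n = len(residues) or 1  # avoid dividing by zero for empty input
    return [residues.count(a) / n for a in AMINO_ACIDS]


def fit_centroids(features, labels):
    """Mean feature vector for each location label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict(centroids, x):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def kfold_accuracy(sequences, labels, k=5):
    """k-fold cross-validation accuracy; sample i is held out in fold i % k."""
    features = [aac(s) for s in sequences]
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(features)) if i % k != fold]
        test = [i for i in range(len(features)) if i % k == fold]
        centroids = fit_centroids([features[i] for i in train],
                                  [labels[i] for i in train])
        correct += sum(predict(centroids, features[i]) == labels[i] for i in test)
    return correct / len(features)


# Hypothetical toy data: K/R-rich versus L/I/V-rich sequences.
seqs = ["KKRKRKAKRK", "RKKRKRAKKR", "KRKRKKAKRR", "KKKRRRAKRK",
        "LLIVLLAVIL", "ILVLLIAVLL", "VLILLVAILL", "LLVIILALVV"]
locs = ["nucleus"] * 4 + ["membrane"] * 4
print(kfold_accuracy(seqs, locs, k=4))  # → 1.0
```

Interleaved fold assignment keeps the sketch short. With real data, folds would normally be stratified by location, and sequence redundancy between folds would be controlled, because homologous sequences split across training and test folds inflate the measured accuracy.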
