Estimation of Amino Acid Residue Substitution Rates at Local Spatial Regions and Application in Protein Function Inference: A Bayesian Monte Carlo Approach

2005 ◽  
Vol 23 (2) ◽  
pp. 421-436 ◽  
Author(s):  
Yan Y. Tseng ◽  
Jie Liang
2020 ◽  
Author(s):  
Tair Shauli ◽  
Nadav Brandes ◽  
Michal Linial

Abstract The characterization of human genetic variation in coding regions is fundamental to the understanding of protein function, structure and evolution. Amino-acid (AA) substitution matrices encapsulate the stochastic nature of such proteomic variation and are widely used in studying protein families and evolutionary processes. The conventional substitution matrices, namely BLOSUM and PAM, were constructed to reflect polymorphism across species. In this study, we analyzed the frequencies of >4.8M single nucleotide variants within the healthy human population to accurately represent proteomic variability within the human species, at codon and AA resolution. Our model exposes various AA substitutions which are observed more frequently in one specific direction than in the opposite direction. We further demonstrate that nucleotide substitution rates only partially determine AA substitution rates. Finally, we investigate AA substitutions in post-translational modification and ion-binding sites, exposing purifying selection over a range of residue-based functions. These novel matrices provide a robust baseline for the analysis of protein variation in health and disease.


Author(s):  
Shinji Chiba ◽  
◽  
Ken Sugawara ◽  

The function of unknown proteins is currently most effectively determined by retrieving similar known sequences, and several effective techniques rely on sequence retrieval. We propose retrieval using a finite state automaton (FSA). The FSA is created from accumulated amino acid residue scores that express a property of a protein family. We calculated the similarity of known and unknown protein sequences using the FSA and used it to determine protein functions. To improve accuracy, we optimized the FSA using a genetic algorithm. Results from determining protein functions indicated that our proposal was superior to general motif analysis.


2009 ◽  
Vol 28 (8) ◽  
pp. 2315-2329 ◽  
Author(s):  
Jonathan Brouillat ◽  
Christian Bouville ◽  
Brad Loos ◽  
Charles Hansen ◽  
Kadi Bouatouch

2020 ◽  
Author(s):  
Tair Shauli ◽  
Nadav Brandes ◽  
Michal Linial

Abstract The characterization of human genetic variation in coding regions is fundamental to our understanding of protein function, structure, and evolution. Amino-acid (AA) substitution matrices such as BLOSUM (BLOcks SUbstitution Matrix) and PAM (Point Accepted Mutations) encapsulate the stochastic nature of such proteomic variation and are used in studying protein families and evolutionary processes. However, these matrices were constructed from protein sequences spanning long evolutionary distances and are not designed to reflect polymorphism within species. To accurately represent proteomic variation within the human population, we constructed a set of human-centric substitution matrices derived from genetic variations by analyzing the frequencies of >4.8M single nucleotide variants (SNVs). These human-specific matrices expose short-term evolutionary trends at both codon and AA resolution and therefore present an evolutionary perspective that differs from that implied by the traditional matrices. Specifically, our matrices consider the directionality of variants, and uncover a set of AA pairs that exhibit a strong tendency to substitute in a specific direction. We further demonstrate that the substitution rates of nucleotides only partially determine AA substitution rates. Finally, we investigate AA substitutions in post-translational modification (PTM) and ion-binding sites. We confirm a strong propensity towards conservation of the identity of the AA that participates in such functions. The empirically derived human-specific substitution matrices expose purifying selection over a range of residue-based protein properties. The new substitution matrices provide a robust baseline for the analysis of protein variations in health and disease. The underlying methodology is available as an open-access resource to the biomedical community.
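To make the idea of a directional substitution matrix concrete, here is a minimal Python sketch. It is not the authors' method: the row normalization, the function names `directional_matrix` and `asymmetry`, and the toy counts are all assumptions made for illustration. It only shows how directed variant counts can yield different rates for a substitution and its reverse.

```python
from collections import defaultdict


def directional_matrix(variants):
    """Row-normalize directed substitution counts.

    variants: iterable of (ref_aa, alt_aa, count) tuples.
    Returns {ref_aa: {alt_aa: fraction of ref_aa's substitutions}}.
    Row normalization is an assumption of this sketch, not the study's scheme.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for ref, alt, n in variants:
        if ref != alt:  # skip entries that do not change the amino acid
            counts[ref][alt] += n
    matrix = {}
    for ref, row in counts.items():
        total = sum(row.values())
        matrix[ref] = {alt: n / total for alt, n in row.items()}
    return matrix


def asymmetry(matrix, a, b):
    """Ratio of the a->b rate to the b->a rate; None if b->a was never observed."""
    forward = matrix.get(a, {}).get(b, 0.0)
    reverse = matrix.get(b, {}).get(a, 0.0)
    return forward / reverse if reverse else None


# Hypothetical toy counts, not data from the study.
toy = [("R", "H", 30), ("R", "C", 10), ("H", "R", 5), ("H", "Y", 15)]
m = directional_matrix(toy)
ratio = asymmetry(m, "R", "H")
```

In this toy input, R to H accounts for three quarters of the substitutions away from R, while H to R accounts for one quarter of those away from H, so the ratio is greater than one. A symmetric matrix of the BLOSUM or PAM kind would not distinguish the two directions.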


2005 ◽  
Vol 280 (25) ◽  
pp. 24104-24112 ◽  
Author(s):  
Shigetarou Mori ◽  
Shigeyuki Kawai ◽  
Feng Shi ◽  
Bunzo Mikami ◽  
Kousaku Murata
