Deterministic Motif Mining in Protein Databases

2009, pp. 2632-2656
Author(s): Pedro Gabriel Ferreira, Paulo Jorge Azevedo

Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications at the structural and functional levels of the proteins. Sequence motif analysis can significantly improve our understanding of the protein sequence-structure-function relationship. In this chapter, we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural-level information patterns is also provided.
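The abstract above describes deterministic motifs as patterns written in an "enhanced regular expression syntax". The following is a minimal sketch that assumes the PROSITE pattern notation as that syntax (the chapter's exact notation may differ). It translates such a pattern into a Python regular expression and computes support, a simple quality measure: the fraction of sequences with at least one match. Support is only one of the possible interest measures, and the helpers `prosite_to_regex` and `support` are hypothetical names introduced here. The sketch also ignores rarer PROSITE forms, such as `>` inside square brackets.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (assumed notation) into a Python regex.

    Handles: x (any residue), [ABC] (allowed set), {ABC} (excluded set),
    repetition counts such as x(2,4) or A(3), and the < / > terminal anchors.
    """
    pattern = pattern.rstrip(".")  # PROSITE patterns may end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repetition count, e.g. "x(2,4)" -> "x", "2", "4"
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            token = "."
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            token = core  # single residue or [ALT] set, already valid regex
        if lo is not None:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


def support(regex: str, sequences: list[str]) -> float:
    """Fraction of sequences containing at least one match of the motif."""
    compiled = re.compile(regex)
    hits = sum(1 for seq in sequences if compiled.search(seq))
    return hits / len(sequences) if sequences else 0.0


if __name__ == "__main__":
    # PROSITE PS00001 (N-glycosylation site) and PS00028 (C2H2 zinc finger)
    print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
    print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
    # Toy sequences (not real proteins): two of the three contain the motif
    toy = ["MKNGSAL", "MKNPSAL", "AANTTQ"]
    print(support(prosite_to_regex("N-{P}-[ST]-{P}"), toy))
```

Because each PROSITE element maps to exactly one regex token, matching a deterministic motif reduces to a standard regex search. Mining algorithms, by contrast, have to search the space of such patterns for those whose support exceeds a chosen threshold.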


2008, pp. 1722-1746
Author(s): Pedro Gabriel Ferreira, Paulo Jorge Azevedo

Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications at the structural and functional levels of the proteins. Sequence motif analysis can significantly improve our understanding of the protein sequence-structure-function relationship. In this chapter, we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural-level information patterns is also provided.


2020
Author(s): Rachel Nadeau, Soroush Shahryari Fard, Amit Scheer, Emily Roth, Dallas Nygard, ...

While the COVID-19 pandemic is causing significant loss of life, knowledge of the effects of the causative SARS-CoV-2 virus on human cells is currently limited. Investigating protein-protein interactions (PPIs) between viral and host proteins can provide a better understanding of the mechanisms exploited by the virus and enable the identification of potential drug targets. We therefore performed an in-depth computational analysis of the SARS-CoV-2-human protein interactome in infected HEK293 cells, published by Gordon et al., to reveal processes that are potentially affected by the virus, as well as putative protein binding sites. Specifically, we performed a set of network-based functional and sequence motif enrichment analyses on SARS-CoV-2-interacting human proteins and on a PPI network generated by supplementing viral-host PPIs with known interactions. Using a novel implementation of our GoNet algorithm, we identified 329 Gene Ontology terms for which the SARS-CoV-2-interacting human proteins are significantly clustered in the network. Furthermore, we present a novel protein sequence motif discovery approach, LESMoN-Pro, that identified 9 amino acid motifs for which the associated proteins are clustered in the network. Together, these results provide insights into the processes and sequence motifs that are putatively implicated in SARS-CoV-2 infection and could lead to potential therapeutic targets.


2006, Vol 52 (3-4), pp. 375-387
Author(s): Edward N. Trifonov

Four fundamentally novel, recent developments form the basis of the Theory of Early Molecular Evolution. The theory outlines the molecular events from the onset of the triplet code to the formation of the earliest sequence/structure/function modules of proteins. These developments are: (1) reconstruction of the evolutionary chart of codons; (2) discovery of omnipresent protein sequence motifs, apparently conserved since the last common ancestor; (3) discovery of closed loops, the standard structural modules of modern proteins; (4) construction of a protein sequence space of module-size fragments, with far-reaching evolutionary implications. The theory generates numerous predictions, confirmed by massive nucleotide and protein sequence analyses, such as the existence of two distinct classes of amino acids and their periodic distribution along the sequences. The emerging picture of the earliest molecular evolutionary events is outlined: consecutive engagement of codons, formation of the earliest short peptides, and growth of the polypeptide chains to the size of loop closure, 25-30 residues.

