Homology-Based Annotation of Large Protein Datasets

Author(s):  
Marco Punta ◽  
Jaina Mistry
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Abstract: Given the amino acid composition of natural proteins, all possible sequences of three or four amino acids could be expected to occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs: short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
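The idea of a motif that is missing from a dataset can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `absent_motifs`, the 20-letter alphabet constant, and the toy sequences are assumptions made for illustration only. The sketch enumerates every possible motif of length k and reports those never observed in the input sequences.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def absent_motifs(sequences, k):
    """Return the sorted list of length-k motifs that occur in none of `sequences`."""
    observed = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k over the sequence and record each motif.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    # All 20**k possible motifs over the standard alphabet.
    all_motifs = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    # Motifs containing non-standard letters (e.g. X) are ignored by the difference.
    return sorted(all_motifs - observed)


# Hypothetical toy input; a real analysis would read a full proteome.
toy_proteins = ["MKVLAAGIV", "ACDEFGHIKLMNPQRSTVWY"]
missing = absent_motifs(toy_proteins, 2)
print(len(missing), "of", 20 ** 2, "dipeptides are absent from the toy set")
```

For k = 4 there are 160,000 possible motifs, so the same set-based approach stays cheap in memory; on a real proteome the interesting output is the short list of motifs that remain absent despite the dataset's size.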


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Bruno Thiago de Lima Nichio ◽  
Aryel Marlus Repula de Oliveira ◽  
Camilla Reginatto de Pierri ◽  
Leticia Graziela Costa Santos ◽  
Alexandre Quadros Lejambre ◽  
...  

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Daniel Griffith ◽  
Alex S Holehouse

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.


F1000Research ◽  
2013 ◽  
Vol 2 ◽  
pp. 190 ◽  
Author(s):  
Alexey V Uversky ◽  
Bin Xue ◽  
Zhenling Peng ◽  
Lukasz Kurgan ◽  
Vladimir N Uversky

Earlier computational and bioinformatics analysis of several large protein datasets across 28 species showed that proteins involved in regulation and execution of programmed cell death (PCD) possess substantial amounts of intrinsic disorder. Based on the comprehensive analysis of these datasets by a wide array of modern bioinformatics tools, it was concluded that disordered regions of PCD-related proteins are involved in a multitude of biological functions and interactions with various partners, possess numerous posttranslational modification sites, and have specific evolutionary patterns (Peng et al. 2013). This study extends our previous work by providing information on the intrinsic disorder status of some of the major players of the three major PCD pathways: apoptosis, autophagy, and necroptosis. We also present a detailed description of the disorder status and interactomes of selected proteins that are involved in the p53-mediated apoptotic signaling pathways.


2012 ◽  
Vol 28 (19) ◽  
pp. 2431-2440 ◽  
Author(s):  
Edvin Fuglebakk ◽  
Julián Echave ◽  
Nathalie Reuter

2018 ◽  
Author(s):  
Bruno Thiago de Lima Nichio ◽  
Aryel Marlus Repula de Oliveira ◽  
Camilla Reginatto de Pierri ◽  
Leticia Graziela Costa Santos ◽  
Ricardo Assunção Vialle ◽  
...  

Abstract: Computational tools that can efficiently predict consistent groups of protein families in large volumes of biological data are still much needed in bioinformatics. It is also difficult to increase speed while keeping computational demands low and reducing the complexity of the information. Established tools such as CD-HIT and UCLUST generate very compact data, which makes data mining difficult, and they are inefficient at detecting homology among proteins, requiring manual intervention; a tool that is also efficient at low similarity is therefore needed. Here we present RAFTS3G, a new approach for data mining and homology analysis in large datasets of protein sequences. We compared RAFTS3G with the most popular clustering tools on the UniProtKB/Swiss-Prot database; RAFTS3G proved to be more than 10 times faster than CD-HIT, and its strategy improves the detection of protein families at low similarity.


2021 ◽  
Author(s):  
Daniel Griffith ◽  
Alex S Holehouse

The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret this data. Among these tools, machine learning approaches are increasingly being utilized due to their ability to infer complex non-linear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT is capable of tackling both classification and regression tasks while only requiring raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid-beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable for a wide array of biological problems.


Author(s):  
H.B. Pollard ◽  
C.E. Creutz ◽  
C.J. Pazoles ◽  
J.H. Scott

Exocytosis is a general concept describing secretion of enzymes, hormones and transmitters that are otherwise sequestered in intracellular granules. Chemical evidence for this concept was first gathered from studies on chromaffin cells in perfused adrenal glands, in which it was found that granule contents, including both large proteins and small molecules such as adrenaline and ATP, were released together while the granule membrane was retained in the cell. A number of exhaustive reviews of this early work have been published and are summarized in Reference 1. The critical experiments demonstrating the importance of extracellular calcium for exocytosis per se were also first performed in this system (2,3), further indicating the substantial service given by chromaffin cells to those interested in secretory phenomena over the years.

