protein space
Recently Published Documents


Total documents: 36 (five years: 9)
H-index: 11 (five years: 0)

2021 ◽  
Author(s):  
Krzysztof Odrzywolek ◽  
Zuzanna Karwowska ◽  
Jan Majta ◽  
Aleksander Byrski ◽  
Kaja Milanowska-Zabel ◽  
...  

Understanding the function of microbial proteins is essential to reveal the clinical potential of the microbiome. High-throughput sequencing technologies allow fast and increasingly cheap acquisition of data from microbial communities. However, many of the inferred protein sequences are novel and not catalogued, so the possibility of predicting their function through conventional homology-based approaches is limited. Here, we leverage a deep-learning-based representation of proteins to assess its utility in alignment-free analysis of microbial proteins. We trained a language model on the Unified Human Gastrointestinal Protein catalogue and validated the resulting protein representation on the bacterial part of the SwissProt database. Finally, we present a use case on proteins involved in short-chain fatty acid (SCFA) metabolism. Results indicate that our model (ArdiMiPE) accurately represents features related to protein structure and function, allowing for alignment-free protein analyses. Technologies such as ArdiMiPE that contextualize metagenomic data are a promising direction for a deeper understanding of the microbiome.
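As a rough illustration of what alignment-free comparison can look like once each protein is represented by a fixed-length embedding, the sketch below ranks annotated proteins by cosine similarity to a query. It assumes the embeddings have already been computed by some language model; the vectors, the protein identifiers, and the function names are hypothetical, and this is not the ArdiMiPE interface.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def nearest_annotated(query_vec, annotated):
    """Return (protein_id, similarity) for the most similar annotated embedding."""
    return max(
        ((pid, cosine_similarity(query_vec, vec)) for pid, vec in annotated.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical 3-dimensional embeddings; real ones typically have hundreds of dimensions.
annotated = {
    "protein_A": [1.0, 0.0, 0.0],
    "protein_B": [0.0, 1.0, 0.0],
}
best_id, best_sim = nearest_annotated([0.9, 0.1, 0.0], annotated)
```

No sequence alignment is involved: the comparison operates only on the vectors, which is why such representations can be applied to sequences without detectable homologs.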


2021 ◽  
Vol 22 (15) ◽  
pp. 7773
Author(s):  
Neann Mathai ◽  
Conrad Stork ◽  
Johannes Kirchmair

Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the “fitness” of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle (“BonMOLière”).
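The idea of optimizing a small compound set towards maximum target coverage can be sketched with a simple greedy heuristic. This is an assumption-laden illustration, not the authors' optimizer or their "fitness" function: it assumes each compound already comes with a set of predicted targets, and the compound and target names are hypothetical.

```python
def greedy_library(predicted_targets, library_size):
    """Greedy maximum-coverage selection.

    At each step, pick the compound that adds the most targets not yet
    covered; ties are broken by the alphabetically smallest compound id.
    """
    covered = set()
    chosen = []
    remaining = dict(predicted_targets)
    while remaining and len(chosen) < library_size:
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        if not remaining[best] - covered:
            break  # no remaining compound adds a new target
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical predicted bioactivities (compound id -> set of target ids).
predicted = {
    "cpd1": {"T1", "T2"},
    "cpd2": {"T2", "T3", "T4"},
    "cpd3": {"T1"},
    "cpd4": {"T5"},
}
library, coverage = greedy_library(predicted, library_size=2)
```

A realistic objective would also weigh drug-likeness and target novelty, as the abstract describes; here only coverage is counted to keep the sketch short.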


Author(s):  
Danilo Gullotto

In the regime of domain classifications, the protein universe reveals a discrete set of folds connected by hierarchical relationships. At sub-domain resolution, by contrast, and because of physical constraints that do not necessarily require evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could link folds that diverged beyond the point where homology is discernible. Accordingly, this paper presents a rationale in which a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from the pre-biotic to the domain world, is used to explore relationships between ancient folds. Statistics of the detected signals are reported, and an example of an emergent sub-network that makes sense from an evolutionary perspective, in which conserved signals retrieved from the assessed protein space have been co-opted, is discussed.


2020 ◽  
Vol 39 (5) ◽  
pp. 472-475
Author(s):  
Jorge A. Vila

Author(s):  
Nir Hecht ◽  
Caroline L Monteil ◽  
Guy Perrière ◽  
Marina Vishkautzan ◽  
Eyal Gur

Biologists have long sought to understand how proteins evolve to perform novel functions. In this regard, two homologous bacterial enzymes, PafA and Dop, pose an insightful case study, as both rely on similar mechanistic properties, yet catalyze different reactions. PafA conjugates a small protein tag to target proteins, whereas Dop removes the tag by hydrolysis. Given that both enzymes present a similar fold and high sequence similarity, we sought to identify the differences in the amino acid sequence and folding responsible for each distinct activity. We tackled this question using analysis of sequence–function relationships, and identified a set of uniquely conserved residues in each enzyme. Reciprocal mutagenesis of the hydrolase, Dop, completely abolished the native activity, at the same time yielding a catalytically active ligase. Based on the available Dop and PafA crystal structures, this change of activity required a conformational change of a critical loop in the vicinity of the active site. We identified the conserved positions essential for stabilization of the alternative loop conformation, and tracked alternative mutational pathways that lead to a change in activity. Remarkably, all these pathways were combined in the evolution of PafA and Dop, despite their redundant effect on activity. Overall, we identified the residues and structural elements in PafA and Dop responsible for their activity differences. This analysis delineated, in molecular terms, the changes required for the emergence of a new catalytic function from a preexisting one.


2020 ◽  
Author(s):  
Kentaro Tomii ◽  
Shravan Kumar ◽  
Degui Zhi ◽  
Steven E. Brenner

Background: Insertion and deletion sequencing errors are relatively common in next-generation sequencing data and produce long stretches of mistranslated sequence. These frameshifting errors can seriously damage downstream analysis of reads. However, it is possible to obtain more precise alignment of DNA sequences by taking into account both the coding frame and the sequencing errors estimated by quality scores. Results: Here we designed and proposed a novel hidden Markov model (HMM)-based pairwise alignment algorithm, Meta-Align, that aligns DNA sequences in the protein space, incorporating quality scores from the DNA sequences and allowing frameshifts caused by insertions and deletions. Our model is based on both an HMM transducer of a pair HMM and profile HMMs for all possible amino acid pairs. A Viterbi algorithm over our model produces the optimal alignment of a pair of metagenomic reads, taking into account all possible translating frames and gap penalties in both the protein space and the DNA space. To reduce the sheer number of states of this model, we also derived and implemented a computationally feasible model, leveraging the degeneracy of the genetic code. In a benchmark test on a diverse set of simulated reads based on BAliBASE, we show that Meta-Align outperforms TBLASTX, which compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database using the BLAST algorithm. We also demonstrate the effects of incorporating quality scores on Meta-Align. Conclusions: Meta-Align will be particularly effective when applied to error-prone DNA sequences. The software package can be downloaded at https://github.com/shravan-repos/Metaalign.
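The six-frame translation mentioned in the TBLASTX comparison can be sketched as follows. This is only an illustration of that baseline idea, not of Meta-Align's HMM: it uses the standard genetic code, ignores quality scores, and does not model frameshifts within a read. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; codons with non-ACGT bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3 ('*' marks a stop codon)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
```

A single inserted or deleted base moves the true protein from one of these frames to another partway through the read, which is the failure mode that a frameshift-aware aligner is meant to handle.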


Genetics ◽  
2020 ◽  
Vol 214 (4) ◽  
pp. 749-754
Author(s):  
C. Brandon Ogbunugafor

In 1970, John Maynard Smith published a letter, entitled “Natural Selection and the Concept of a Protein Space,” that proposed a simple analogy for the incremental process of adaptive evolution. His “Protein Space” analogy contains the substrate for many central ideas in evolutionary genetics, and has motivated important discoveries within several subdisciplines of evolutionary science. In this Perspectives article, I commemorate the 50th anniversary of this seminal work by discussing its unique legacy and by describing its intriguing historical context. I propose that the Protein Space analogy is not only important because of its scientific richness, but also because of what it can teach us about the art of constructing useful and subversive analogies.

