Evidence for the Emergence of β-Trefoils by ‘Peptide Budding’ from an IgG-like β-Sandwich

2021 ◽  
Author(s):  
Liam M. Longo ◽  
Rachel Kolodny ◽  
Shawn E. McGlynn

Abstract: As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both ‘β-trefoil bridging themes’ (evolutionarily-related sequence segments) and ‘β-trefoil-like motifs’ (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself – namely, that it is a derived fold formed by ‘budding’ from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the emergence of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature’s sewing table.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Huihui Li ◽  
Mingzhe Xie ◽  
Yan Wang ◽  
Ludong Yang ◽  
Zhi Xie ◽  
...  

Abstract: riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.


2014 ◽  
Vol 42 (W1) ◽  
pp. W377-W381 ◽  
Author(s):  
Gewen He ◽  
Albert Steppi ◽  
Jose Laborde ◽  
Anuj Srivastava ◽  
Peixiang Zhao ◽  
...  

2012 ◽  
Vol 2012 ◽  
pp. 1-5 ◽  
Author(s):  
Julien Allali ◽  
Cédric Saule ◽  
Cédric Chauve ◽  
Yves d’Aubenton-Carafa ◽  
Alain Denise ◽  
...  

The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.


2020 ◽  
Author(s):  
Xingjie Pan ◽  
Michael Thompson ◽  
Yang Zhang ◽  
Lin Liu ◽  
James S. Fraser ◽  
...  

Abstract: Naturally occurring proteins use a limited set of fold topologies, but vary the precise geometries of structural elements to create distinct shapes optimal for function. Here we present a computational design method termed LUCS that mimics nature’s ability to create families of proteins with the same overall fold but precisely tunable geometries. Through near-exhaustive sampling of loop-helix-loop elements, LUCS generates highly diverse geometries encompassing those found in nature but also surpassing known structure space. Biophysical characterization shows that 17 (38%) out of 45 tested LUCS designs were well folded, including 16 with designed non-native geometries. Four experimentally solved structures closely match the designs. LUCS greatly expands the designable structure space and provides a new paradigm for designing proteins with tunable geometries customizable for novel functions.

One Sentence Summary: A computational method to systematically sample loop-helix-loop geometries expands the structure space of designer proteins.


2019 ◽  
Author(s):  
Mostafa Karimi ◽  
Shaowen Zhu ◽  
Yue Cao ◽  
Yang Shen

Abstract

Motivation: Facing data quickly accumulating on protein sequence and structure, this study addresses the following question: to what extent could current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds?

Results: We have developed novel deep generative models, constructed a low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and found to generate more yields and cover 3.6 times more target folds compared to a competing data-driven method (cVAE). Assessed with a structure predictor over representative novel folds (including one not even part of the basis folds), gcWGAN designs are found to have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data. The ultra-fast data-driven model can be a powerful addition to principle-driven design methods through generating seed designs or tailoring sequence space.

Availability: Data and source codes will be available upon request.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Prabhakaran Munusamy ◽  
Yevgen Zolotarov ◽  
Louis-Valentin Meteignier ◽  
Peter Moffett ◽  
Martina V. Strömvik

2013 ◽  
Vol 41 (11) ◽  
pp. e114-e114 ◽  
Author(s):  
Jose Laborde ◽  
Daniel Robinson ◽  
Anuj Srivastava ◽  
Eric Klassen ◽  
Jinfeng Zhang

2017 ◽  
Vol 15 (06) ◽  
pp. 1740009 ◽  
Author(s):  
Abdullah N. Arslan ◽  
Jithendar Anandan ◽  
Eric Fry ◽  
Keith Monschke ◽  
Nitin Ganneboina ◽  
...  

The recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored in a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduce a new problem for comparing multiple RNA structures. This problem has a stricter similarity definition and objective, and we propose an algorithm that solves it efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in the compared RNAs. With the resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another for automatically drawing the entire RNA structure from a given structure sequence.
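The idea of substructure search by binary search on a suffix array can be illustrated with a minimal sketch. It does not reproduce the relative addressing-based format described in the abstract, which is not given here; as an assumption, structures are written as plain dot-bracket strings, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. The suffix array is built naively for clarity, not speed.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction for illustration only; real tools use faster
    construction algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every start position of pattern in text via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


# Hypothetical example: two identical hairpins written in dot-bracket
# notation (an assumed stand-in for the paper's structure format).
structure = "((..))((..))"
suffixes = build_suffix_array(structure)
hits = find_occurrences(structure, suffixes, "(..)")
print(hits)
```

Because the suffixes are sorted, all suffixes that start with the query form one contiguous block, so two binary searches are enough to locate every occurrence.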

