Evidence for the Emergence of β-Trefoils by ‘Peptide Budding’ from an IgG-like β-Sandwich

Mapping Intimacies ◽

10.1101/2021.10.04.462989 ◽

2021 ◽

Author(s):

Liam M. Longo ◽

Rachel Kolodny ◽

Shawn E. McGlynn

Keyword(s):

De Novo ◽

General Trend ◽

Sequence Structure ◽

Structure Comparison ◽

Related Sequence ◽

Structure Space ◽

Protein Universe ◽

Remote Islands ◽

Comparison Algorithms ◽

Hallmark Feature

Abstract: As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both ‘β-trefoil bridging themes’ (evolutionarily related sequence segments) and ‘β-trefoil-like motifs’ (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself – namely, that it is a derived fold formed by ‘budding’ from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the emergence of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature’s sewing table.

riboCIRC: a comprehensive database of translatable circRNAs

Genome Biology ◽

10.1186/s13059-021-02300-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Huihui Li ◽

Mingzhe Xie ◽

Yan Wang ◽

Ludong Yang ◽

Zhi Xie ◽

...

Keyword(s):

De Novo ◽

Species Conservation ◽

Structure And Function ◽

Research Community ◽

Genome Browser ◽

Valuable Resource ◽

Sequence Structure ◽

A Genome ◽

Context Specific ◽

And Function

Abstract: riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.

RASS: a web server for RNA alignment in the joint sequence-structure space

Nucleic Acids Research ◽

10.1093/nar/gku429 ◽

2014 ◽

Vol 42 (W1) ◽

pp. W377-W381 ◽

Cited By ~ 7

Author(s):

Gewen He ◽

Albert Steppi ◽

Jose Laborde ◽

Anuj Srivastava ◽

Peixiang Zhao ◽

...

Keyword(s):

Web Server ◽

Sequence Structure ◽

Structure Space

BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms

Advances in Bioinformatics ◽

10.1155/2012/893048 ◽

2012 ◽

Vol 2012 ◽

pp. 1-5 ◽

Cited By ~ 6

Author(s):

Julien Allali ◽

Cédric Saule ◽

Cédric Chauve ◽

Yves d’Aubenton-Carafa ◽

Alain Denise ◽

...

Keyword(s):

Noncoding RNA ◽

Fundamental Problem ◽

Pairwise Comparison ◽

Secondary Structures ◽

Software Tools ◽

Structure Comparison ◽

RNA Secondary Structures ◽

Ordered Trees ◽

Synthetic Datasets ◽

Comparison Algorithms

The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.

Expanding the space of protein geometries by computational design of de novo fold families

10.1101/2020.04.14.041772 ◽

2020 ◽

Author(s):

Xingjie Pan ◽

Michael Thompson ◽

Yang Zhang ◽

Lin Liu ◽

James S. Fraser ◽

...

Keyword(s):

De Novo ◽

Design Method ◽

Computational Design ◽

Computational Method ◽

Structural Elements ◽

New Paradigm ◽

Biophysical Characterization ◽

Structure Space ◽

Naturally Occurring ◽

Designer Proteins

Abstract: Naturally occurring proteins use a limited set of fold topologies, but vary the precise geometries of structural elements to create distinct shapes optimal for function. Here we present a computational design method termed LUCS that mimics nature’s ability to create families of proteins with the same overall fold but precisely tunable geometries. Through near-exhaustive sampling of loop-helix-loop elements, LUCS generates highly diverse geometries encompassing those found in nature but also surpassing known structure space. Biophysical characterization shows that 17 (38%) out of 45 tested LUCS designs were well folded, including 16 with designed non-native geometries. Four experimentally solved structures closely match the designs. LUCS greatly expands the designable structure space and provides a new paradigm for designing proteins with tunable geometries customizable for novel functions. One Sentence Summary: A computational method to systematically sample loop-helix-loop geometries expands the structure space of designer proteins.

De Novo Protein Design for Novel Folds using Guided Conditional Wasserstein Generative Adversarial Networks (gcWGAN)

10.1101/769919 ◽

2019 ◽

Cited By ~ 4

Author(s):

Mostafa Karimi ◽

Shaowen Zhu ◽

Yue Cao ◽

Yang Shen

Keyword(s):

Protein Design ◽

Sequence Space ◽

De Novo ◽

Sequence Data ◽

Generative Models ◽

Current Data ◽

Data Driven ◽

Supplementary Information ◽

Generative Adversarial Networks ◽

Sequence Structure

Abstract: Motivation: Facing data quickly accumulating on protein sequence and structure, this study is addressing the following question: to what extent could current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds? Results: We have developed novel deep generative models, constructed low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and found to generate more yields and cover 3.6 times more target folds compared to a competing data-driven method (cVAE). Assessed with structure predictor over representative novel folds (including one not even part of basis folds), gcWGAN designs are found to have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE. gcWGAN explores uncharted sequence space to design proteins by learning from current sequence-structure data. The ultra-fast data-driven model can be a powerful addition to principle-driven design methods through generating seed designs or tailoring sequence space. Availability: Data and source codes will be available upon request. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Protein Structure Comparison: Algorithms and Applications

Lecture Notes in Computer Science - Mathematical Methods for Protein Structure Analysis and Design ◽

10.1007/978-3-540-44827-3_1 ◽

2003 ◽

pp. 1-33 ◽

Cited By ~ 21

Author(s):

Giuseppe Lancia ◽

Sorin Istrail

Keyword(s):

Protein Structure ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Comparison Algorithms

De novo computational identification of stress-related sequence motifs and microRNA target sites in untranslated regions of a plant translatome

Scientific Reports ◽

10.1038/srep43861 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 3

Author(s):

Prabhakaran Munusamy ◽

Yevgen Zolotarov ◽

Louis-Valentin Meteignier ◽

Peter Moffett ◽

Martina V. Strömvik

Keyword(s):

De Novo ◽

Untranslated Regions ◽

Sequence Motifs ◽

MicroRNA Target ◽

Related Sequence ◽

Target Sites ◽

Computational Identification

RNA global alignment in the joint sequence–structure space using elastic shape analysis

Nucleic Acids Research ◽

10.1093/nar/gkt187 ◽

2013 ◽

Vol 41 (11) ◽

pp. e114-e114 ◽

Cited By ~ 12

Author(s):

Jose Laborde ◽

Daniel Robinson ◽

Anuj Srivastava ◽

Eric Klassen ◽

Jinfeng Zhang

Keyword(s):

Shape Analysis ◽

Global Alignment ◽

Sequence Structure ◽

Structure Space ◽

Elastic Shape Analysis

Efficient RNA structure comparison algorithms

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720017400091 ◽

2017 ◽

Vol 15 (06) ◽

pp. 1740009 ◽

Cited By ~ 3

Author(s):

Abdullah N. Arslan ◽

Jithendar Anandan ◽

Eric Fry ◽

Keith Monschke ◽

Nitin Ganneboina ◽

...

Keyword(s):

RNA Structure ◽

Search Algorithm ◽

Suffix Array ◽

RNA Structures ◽

Structure Comparison ◽

Structure Representation ◽

Substructure Search ◽

Common Substructure ◽

Search Tool ◽

Comparison Algorithms

The recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored in a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduced a new problem for comparing multiple RNA structures. This problem has a stricter similarity definition and objective, and we propose an algorithm that solves this problem efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in compared RNAs. With the new resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another one for automatically drawing the entire RNA structure from a given structure sequence.
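
The general idea of searching a string database through binary search on a suffix array can be illustrated with a minimal Python sketch. This is not the authors' implementation: the relative addressing-based format is not specified in this listing, so plain dot-bracket strings joined by a `$` separator stand in for it, and the function names `build_suffix_array` and `find_occurrences` are hypothetical. Exact substring matching on dot-bracket strings is only a stand-in for true substructure search.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text, sorted lexicographically.

    Naive construction for illustration only; real tools use faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted start positions of pattern in text via binary search on sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


# Hypothetical toy database: two structures joined by a separator character
# that never appears in a query.
structures = ["((..))", "(.)((..))"]
text = "$".join(structures)
sa = build_suffix_array(text)

print(find_occurrences(text, sa, "((..))"))  # [0, 10]
print(find_occurrences(text, sa, "(.)"))     # [7]
```

Because the suffixes are sorted, all suffixes that begin with the query occupy one contiguous block of the array, so two binary searches suffice to locate every occurrence.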

Evaluation of Novel Protein Structure Comparison Algorithms Based on Objective Function Rankings

2009 2nd International Conference on Biomedical Engineering and Informatics ◽

10.1109/bmei.2009.5304822 ◽

2009 ◽

Author(s):

Hitomi Hasegawa ◽

Liisa Holm

Keyword(s):

Protein Structure ◽

Objective Function ◽

Structure Comparison ◽

Protein Structure Comparison ◽

Comparison Algorithms ◽

Novel Protein
