Phylogenetic, sequence and structural analysis of Insulin superfamily proteins reveals an indelible link between evolution and structure-function relationship

Abstract: The insulin superfamily proteins (ISPs), in particular insulin, IGFs and relaxins, are key modulators of animal physiology. They are known to have evolved from the same ancestral gene and have diverged into proteins with varied sequences and distinct functions, but they maintain a similar structural architecture stabilized by highly conserved disulphide bridges. A recent surge in sequence data and structures for these proteins prompted the need for a comprehensive analysis that connects the evolution of these sequences with the available functional and structural information and with their interactions with cognate receptors. This study reveals (a) unusually high sequence conservation of IGFs (>90%), which has not been reported before; notably, the functional domains (excluding the signal peptide) of human, horse, pig and Ord’s kangaroo rat are 100% identical; (b) an updated definition of the signature motif of the relaxin family; and (c) a non-canonical C-peptide cleavage site in a few killifish insulin sequences, among other findings. We also provide a structure-based rationale for such conservation by introducing a concept called binding partners imposed evolutionary constraints. The high conservation of IGFs appears to represent a classic case of resistance to sequence diversity exerted by physiologically important interactions with multiple partners. Finally, we propose a probable mechanism for C-peptide cleavage in killifish insulin sequences.
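
The conservation figures quoted above (>90%, 100% identical) are percent-identity values between aligned sequences. As a minimal sketch of how such a figure is computed, assuming the sequences are already aligned to equal length with `-` for gaps: the function name `percent_identity` and the strings below are hypothetical toy inputs for illustration, not the sequences or the software used in the study.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns, skipping columns that are gaps in both."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy strings (assumption): they only demonstrate the arithmetic.
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKL"
toy_c = "ACDEFGHIKV"

print(percent_identity(toy_a, toy_b))  # identical strings
print(percent_identity(toy_a, toy_c))  # one mismatch in ten columns
```

Different tools normalise identity differently (by alignment length, by the shorter sequence, with or without gap columns), so a reported percentage depends on the convention chosen.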

The insulin A and B chains contain structural information for the formation of the native molecule. Studies with protein disulphide-isomerase

Biochemical Journal ◽

10.1042/bj2680429 ◽

1990 ◽

Vol 268 (2) ◽

pp. 429-435 ◽

Cited By ~ 31

Author(s):

J G Tang ◽

C L Tsou

Keyword(s):

Structural Information ◽

Sufficient Information ◽

Appreciable Effect ◽

Dilute Solution ◽

Reaction Products ◽

C Peptide ◽

Native Insulin ◽

Predominant Product ◽

Better Than

It has been shown previously [Tang, Wang & Tsou (1988) Biochem. J. 255, 451-455] that, under appropriate conditions, native insulin can be obtained from scrambled insulin or the S-sulphonates of the chains with a yield of 25-30%, together with reaction products containing the separated A and B chains. The native hormone is by far the predominant product among the isomers containing both chains. It is now shown that the presence of added C peptide has no appreciable effect on the yield of native insulin. At higher temperatures the content of the native hormone decreases whereas those of the separated chains increase, and in no case was scrambled insulin containing both chains the predominant product in the absence of denaturants. Both the scrambling and the unscrambling reactions give similar h.p.l.c. profiles for the products. Under similar conditions cross-linked insulin with native disulphide linkages can be obtained from the scrambled molecule or from the S-sulphonate derivative with yields of 50% and 75% respectively at 4 °C, and with a dilute solution of the hexa-S-sulphonate, yields better than 90% can be obtained. The regenerated product is shown to have the native disulphide bridges by treatment with CNBr to give insulin and by the identity of the h.p.l.c. profile of its peptic hydrolysate with that for cross-linked insulin. It appears that the insulin A and B chains contain sufficient information for the formation of the native molecule and that the role of the connecting C peptide is to bring and to keep the two chains together.

Genomic Characterization of Human DSPG3

Genome Research ◽

10.1101/gr.9.5.449 ◽

1999 ◽

Vol 9 (5) ◽

pp. 449-456

Author(s):

Michelle Deere ◽

Jose L. Dieguez ◽

Sung-Joo Kim Yoon ◽

David Hewett-Emmett ◽

Albert de la Chapelle ◽

...

Keyword(s):

Transcription Start Site ◽

Sequence Data ◽

Stop Codon ◽

Genomic Structure ◽

Start Codon ◽

Start Site ◽

Transcription Start ◽

Exon 2 ◽

Ancestral Gene ◽

Link Type

DSPG3, the human homolog to chick PG-Lb, is a member of the small leucine-rich repeat proteoglycan (SLRP) family, including decorin, biglycan, fibromodulin, and lumican. In contrast to the tissue distribution of the other SLRPs, DSPG3 is predominantly expressed in cartilage. In this study, we have determined that the human DSPG3 gene is composed of seven exons: exon 2 of DSPG3 includes the start codon, exons 4–7 code for the leucine-rich repeats, exons 3 and 7 contain the potential glycosaminoglycan attachment sites, and exon 7 contains the potential N-glycosylation sites and the stop codon. We have identified two polymorphic variations, an insertion/deletion composed of 19 nucleotides in intron 1 and a tetranucleotide (TATT)n repeat in intron 5. Analysis of 1.6 kb of upstream promoter sequence of DSPG3 reveals three TATA boxes, one of which is 20 nucleotides before the transcription start site. The transcription start site precedes the translation start site by 98 nucleotides. There are 14 potential binding sites for SOX9, a transcription factor present in cartilage, in the promoter, and in the first intron of DSPG3. We have examined the evolution of the SLRP gene family and found that gene products clustered together in the evolutionary tree are encoded by genes with similarities in genomic structure. Hence, it appears that the majority of the introns in the SLRP genes were inserted after the differentiation of the SLRP genes from an ancestral gene that was most likely composed of 2–3 exons. [The sequence data described in this paper have been submitted to GenBank under accession nos. AF031658 and U63814.]

Pseutarin C, a Prothrombin Activator from Pseudonaja textilis Venom: Its Structural and Functional Similarity to Mammalian Coagulation Factor Xa-Va Complex

Thrombosis and Haemostasis ◽

10.1055/s-0037-1613264 ◽

2002 ◽

Vol 88 (10) ◽

pp. 611-619 ◽

Cited By ~ 64

Author(s):

Veena Rao ◽

R. Kini

Keyword(s):

Structural Information ◽

Sequence Data ◽

Domain Architecture ◽

Factor Xa ◽

Coagulation Factor ◽

Peptide Bonds ◽

Amino Terminal ◽

Factor Va ◽

Group D ◽

Bovine Factor

Summary: Several snake venoms contain procoagulant proteins that can activate prothrombin. We have purified pseutarin C, a prothrombin activator from the venom of the Australian brown snake (Pseudonaja textilis). It converts prothrombin to thrombin by cleaving both the peptide bonds Arg274 – Thr275 and Arg323 – Ile324, similar to mammalian factor Xa. It is a protein complex (∼250 kDa) consisting of an enzymatic and a nonenzymatic subunit. These subunits were separated by reverse phase HPLC and their interactions with bovine factor Xa and factor Va were studied. The enzymatic subunit of pseutarin C has a ∼13-fold higher affinity for bovine factor Va (Kd of 11.4 nM for the pseutarin C enzymatic subunit – bovine factor Va interaction, as compared to a Kd of 147.4 nM for the bovine factor Xa-Va interaction). The non-enzymatic component, however, was unable to activate bovine factor Xa. N-terminal sequence analysis of the catalytic subunit of pseutarin C showed ∼60% homology to mammalian factor Xa and ∼78% homology to trocarin, a group D prothrombin activator from Tropidechis carinatus venom. Structural information for the non-enzymatic subunit of pseutarin C was obtained by amino terminal sequencing of several internal peptides. The sequence data obtained indicate that the non-enzymatic subunit of pseutarin C has a domain architecture similar to that of mammalian factor Va, and the overall homology is ∼55%. Thus pseutarin C is the first venom procoagulant protein that is structurally and functionally similar to the mammalian factor Xa-Va complex.
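
The ∼13-fold affinity difference follows directly from the two dissociation constants quoted in the summary, since a lower Kd means tighter binding. A small worked check, offered only as an illustration of the arithmetic (the variable and function names are hypothetical; the two Kd values are those stated above):

```python
# Dissociation constants quoted in the summary, in nM.
kd_pseutarin_va_nm = 11.4   # pseutarin C enzymatic subunit with bovine factor Va
kd_xa_va_nm = 147.4         # bovine factor Xa with bovine factor Va


def fold_higher_affinity(kd_reference_nm: float, kd_test_nm: float) -> float:
    """How many times tighter the test interaction binds than the reference.

    Affinity is inversely related to Kd, so the fold difference is the
    reference Kd divided by the test Kd.
    """
    return kd_reference_nm / kd_test_nm


fold = fold_higher_affinity(kd_xa_va_nm, kd_pseutarin_va_nm)
print(f"{fold:.1f}-fold")  # about 12.9, i.e. roughly 13-fold
```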

New Insights into Ligand-Receptor Pairing and Coevolution of Relaxin Family Peptides and Their Receptors in Teleosts

International Journal of Evolutionary Biology ◽

10.1155/2012/310278 ◽

2012 ◽

Vol 2012 ◽

pp. 1-14 ◽

Cited By ~ 16

Author(s):

Sara Good ◽

Sergey Yegorov ◽

Joran Martijn ◽

Jens Franck ◽

Jan Bogerd

Keyword(s):

Sequence Data ◽

Gene Families ◽

Theoretical Background ◽

Whole Genome Duplications ◽

Relaxin Family ◽

Receptor Interactions ◽

Genome Duplications ◽

Show Evidence ◽

Specific Receptors ◽

Respective Function

Relaxin-like peptides (RLN/INSL) play diverse roles in reproductive and neuroendocrine processes in placental mammals and are functionally associated with two distinct types of receptors (RXFP) for each respective function. The diversification of RLN/INSL and RXFP gene families in vertebrates was predominantly driven by whole genome duplications (2R and 3R). Teleosts preferentially retained duplicates of genes putatively involved in neuroendocrine regulation, harboring a total of 10-11 receptors and 6 ligand genes, while most mammals have equal numbers of ligands and receptors. To date, the ligand-receptor relationships of teleost Rln/Insl peptides and their receptors have largely remained unexplored. Here, we use selection analyses based on sequence data from 5 teleosts and qPCR expression data from zebrafish to explore possible ligand-receptor pairings in teleosts. We find support for the hypothesis that, with the exception of RLN, which has undergone strong positive selection in mammalian lineages, the ligand and receptor genes shared between mammals and teleosts appear to have similar pairings. On the other hand, the teleost-specific receptors show evidence of subfunctionalization. Overall, this study underscores the complexity of RLN/INSL and RXFP ligand-receptor interactions in teleosts and establishes a theoretical background for further experimental work in nonmammals.

A Novel Representation of Sequence Data Based on Structural Information for Effective Music Retrieval

Database Systems for Advanced Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24571-1_36 ◽

2004 ◽

pp. 393-404 ◽

Cited By ~ 1

Author(s):

Chia-Hsiung Lee ◽

Chung-Wen Cho ◽

Yi-Hung Wu ◽

Arbee L. P. Chen

Keyword(s):

Structural Information ◽

Sequence Data ◽

Music Retrieval

The dynamic nature of the conserved tegument protein UL37 of herpesviruses

Journal of Biological Chemistry ◽

10.1074/jbc.ra118.004481 ◽

2018 ◽

Vol 293 (41) ◽

pp. 15827-15839 ◽

Cited By ~ 9

Author(s):

Andrea L. Koenigsberg ◽

Ekaterina E. Heldwein

Keyword(s):

Viral Replication ◽

Pseudorabies Virus ◽

Structural Information ◽

Surface Region ◽

Helical Structure ◽

Tegument Protein ◽

Dynamic Nature ◽

Multiple Partners ◽

Capsid Shell ◽

Folded Core

In all herpesviruses, the space between the capsid shell and the lipid envelope is occupied by the unique tegument layer composed of proteins that, in addition to structural roles, play many other roles in viral replication. UL37 is a highly conserved tegument protein that has activities ranging from virion morphogenesis to directional capsid trafficking to manipulation of the host innate immune response and binds multiple partners. The N-terminal half of UL37 (UL37N) has a compact bean-shaped α-helical structure that contains a surface region essential for neuroinvasion. However, no biochemical or structural information is currently available for the C-terminal half of UL37 (UL37C), which mediates most of its interactions with multiple binding partners. Here, we show that the C-terminal half of UL37 from pseudorabies virus (UL37C) is a conformationally flexible monomer composed of an elongated folded core and an unstructured C-terminal tail. This elongated structure, along with that of its binding partner UL36, explains the nature of filamentous tegument structures bridging the capsid and the envelope. We propose that the dynamic nature of UL37 underlies its ability to perform diverse roles during viral replication.

Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting

Entropy ◽

10.3390/e21111127 ◽

2019 ◽

Vol 21 (11) ◽

pp. 1127 ◽

Cited By ~ 1

Author(s):

Malinverni ◽

Barducci

Keyword(s):

Sequence Variation ◽

Structural Information ◽

Sequence Data ◽

Response Regulator ◽

A Priori ◽

Structural Features ◽

Sequence Alignments ◽

Continuous Sequence ◽

The Family ◽

Variation Data

Extracting structural information from sequence co-variation has become a common computational biology practice in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to sub-classes and not shared by all members of the family using sequence-based approaches has remained an elusive problem. We here present a coevolutionary-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have been previously shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data is relatively scarce. Furthermore, sequence reweighting allows assessing if individual structural contacts pertain to specific subfamilies and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
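
For readers unfamiliar with reweighting in coevolutionary analysis, the sketch below shows the conventional similarity-threshold scheme, in which each sequence is down-weighted by the number of near-duplicates it has in the alignment. This is explicitly not the continuous SR method of the abstract, whose details are not given here; the function names, the 0.8 threshold and the toy alignment are all assumptions made for illustration.

```python
def fractional_identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def sequence_weights(alignment: list[str], threshold: float = 0.8) -> list[float]:
    """Weight each sequence by 1 / (number of sequences at or above the identity threshold).

    Every sequence counts itself as a neighbour, so weights lie in (0, 1].
    """
    weights = []
    for seq in alignment:
        neighbours = sum(
            1 for other in alignment
            if fractional_identity(seq, other) >= threshold
        )
        weights.append(1.0 / neighbours)
    return weights


# Hypothetical toy alignment (assumption): two duplicates, one close variant, one outlier.
toy_alignment = ["ACDE", "ACDE", "ACDF", "WYWY"]
weights = sequence_weights(toy_alignment)
effective_size = sum(weights)  # the "effective number of sequences"
print(weights, effective_size)
```

The design point is that redundant sequences share a single unit of weight, so heavily sampled clades do not dominate the covariation statistics.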

Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models

10.1101/028936 ◽

2015 ◽

Cited By ~ 2

Author(s):

Hugo Jacquin ◽

Amy Gilson ◽

Eugene Shakhnovich ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Protein Structure ◽

Structural Information ◽

Sequence Data ◽

Careful Analysis ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Pairwise Models ◽

Statistical Approaches ◽

And Function

Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics have so far remained untested. Here we use lattice protein (LP) models to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results, as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the power of inverse approaches to the modelling of proteins from sequence data, and their limitations; we show, in particular, that their success crucially depends on the accurate inference of the Potts pairwise couplings.
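
For orientation, the pairwise Potts model referred to above is usually written in the following generic form. This is the standard textbook expression, given as an assumption about notation rather than as the authors' exact formulation; the symbols $h_i$, $J_{ij}$, $L$ and $Z$ are not defined in the abstract, and sign conventions vary between papers.

```latex
% Sequence a = (a_1, ..., a_L), with a_i the amino acid at position i.
% h_i: single-site fields; J_{ij}: pairwise couplings; Z: normalisation constant.
H(a_1,\dots,a_L) = -\sum_{i=1}^{L} h_i(a_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j),
\qquad
P(a_1,\dots,a_L) = \frac{1}{Z}\, e^{-H(a_1,\dots,a_L)}
```

An independent-site model corresponds to setting all $J_{ij} = 0$, which is the baseline the abstract contrasts against.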

Complex introgression history of the erato-sara clade of Heliconius butterflies

10.1101/2021.02.10.430600 ◽

2021 ◽

Author(s):

Yuttapong Thawornwattana ◽

Fernando A. Seixas ◽

Ziheng Yang ◽

James Mallet

Keyword(s):

Gene Flow ◽

Genomic Sequence ◽

Sequence Data ◽

Whole Genome Sequence ◽

Hybrid Origin ◽

Ancestral Population ◽

Ancestral Gene ◽

Species Phylogeny ◽

Full Likelihood ◽

History Of

Abstract: Introgression plays a key role in adaptive evolution and species diversification in many groups of species including Heliconius butterflies. However, frequent hybridization and subsequent gene flow between species make estimation of the species phylogeny challenging. Here, we infer species phylogeny and introgression events from whole-genome sequence data of six members of the erato-sara clade of Heliconius using a multispecies coalescent model with introgression (MSci) and an isolation-with-migration (IM) model. These approaches probabilistically capture the genealogical heterogeneity across the genome due to introgression and incomplete lineage sorting in a full likelihood framework. We detect robust signals of introgression across the genome, and estimate the direction, timing and magnitude of each introgression event. The results clarify several processes of speciation and introgression in the erato-sara group. In particular, we confirm ancestral gene flow between the sara clade and an ancestral population of H. telesiphe, a hybrid origin of H. hecalesia, and gene flow between the sister species H. erato and H. himera. The ability to confidently infer the presence, timing and magnitude of introgression events using genomic sequence data is helpful for understanding speciation in the presence of gene flow and will be useful for understanding the adaptive consequences of introgressed regions of the genome. Our analysis serves to highlight the power of full likelihood methods under the MSci model to infer the history of species divergence and cross-species introgression from genome-scale data.

Molecular docking of Sulfobacillus acidophilus barbiturase with s-triazine compounds

10.7287/peerj.preprints.2070 ◽

2016 ◽

Author(s):

Zarrin Basharat ◽

Azra Yasmin

Keyword(s):

Molecular Docking ◽

Structure Prediction ◽

De Novo ◽

Structural Information ◽

Sequence Data ◽

Raw Material ◽

Biological Information ◽

Docking Analysis ◽

Herbicide Degradation ◽

Key Residues

Little structural information is available for barbiturases, which do not fit into the conventional groups of proteins. They are thought to play a role in the catabolism of s-triazine herbicide compounds. Structural and interaction data for barbiturase with s-triazine compounds are missing. Sequence data is a goldmine of biological information and acts as raw material for structure and docking analysis. De novo structure prediction of the Sulfobacillus acidophilus DSM 10332 barbiturase has been attempted in this data article. Molecular docking analysis was carried out with atrazine, simazine and hexazinone, which belong to the s-triazine class of herbicides. The analysis revealed key residues necessary for these interactions. The generated data could be used by environmental scientists working on enzyme-assisted herbicide degradation.
