Entropy Analysis of Protein Sequences Reveals a Hierarchical Organization

Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1647
Author(s):  
Anastasia A. Anashkina ◽  
Irina Yu. Petrushanko ◽  
Rustam H. Ziganshin ◽  
Yuriy L. Orlov ◽  
Alexei N. Nekrasov

Background: In earlier analyses of the local sequence content in proteins, we found that amino acid residue frequencies differ at various distances between amino acid positions in the sequence, which suggests the existence of structural units. Methods: We used the informational entropy of protein sequences to find that the structural unit of proteins is a block of adjacent amino acid residues, the “information unit”. The ANIS (ANalysis of Informational Structure) method uses these information units to reveal hierarchically organized Elements of the Information Structure (ELIS) in amino acid sequences. Results: The developed mathematical apparatus gives stable results for the structural unit description even with significant variation in the parameters. The optimal length of the information unit is five, and the number of allowed substitutions is one. Examples are given of the application of the method to the design of protein molecules, the analysis of intermolecular interactions, and the study of the mechanisms by which protein molecular machines function. Conclusions: The ANIS method makes it possible not only to analyze native proteins but also to design artificial polypeptide chains with a given spatial organization and, possibly, function.
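To make the idea of informational entropy over blocks of adjacent residues concrete, the following is a minimal sketch that computes the Shannon entropy of the overlapping length-five blocks in a sequence. It is not the ANIS method itself: the function name `block_entropy` is hypothetical, the "one allowed substitution" rule from the abstract is not modeled, and only the default block length of five is taken from the text.

```python
import math
from collections import Counter


def block_entropy(sequence: str, block_len: int = 5) -> float:
    """Shannon entropy (in bits) of the overlapping blocks of length block_len.

    Illustrative only: this treats every distinct block as its own symbol and
    does not merge blocks that differ by one substitution.
    """
    blocks = [sequence[i:i + block_len]
              for i in range(len(sequence) - block_len + 1)]
    if not blocks:
        # Sequence shorter than one block: no blocks, so no uncertainty.
        return 0.0
    counts = Counter(blocks)
    total = len(blocks)
    # Sum of p * log2(1 / p) over the observed block frequencies.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # A homopolymer has a single repeated block; a sequence without repeats
    # has as many distinct blocks as positions.
    print(block_entropy("AAAAAAAA"))
    print(block_entropy("ACDEFGHIK"))
```

A sequence whose blocks are all identical scores zero, while a sequence in which every block is distinct scores the logarithm of the number of blocks, which is the maximum for that length.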

2020 ◽  
Vol 17 (1) ◽  
pp. 59-77
Author(s):  
Anand Kumar Nelapati ◽  
JagadeeshBabu PonnanEttiyappan

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that lowers the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are found in many sources, including bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty uricase amino acid sequences from different sources were obtained from NCBI, and several analyses were performed: Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physicochemical properties including pI, EC, Ai and Ii. Results: Multiple sequence alignment of all the selected protein sequences showed distinct differences between bacterial, fungal, plant and animal sources based on the position-specific presence of conserved amino acid residues. The maximum homology of all the selected protein sequences is between 51-388. Within individual categories, homology is between 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences revealed clusters indicating that the uricases come from different sources. The physicochemical analysis showed that the uricases are 300-338 amino acid residues long, with molecular weights of 33-39 kDa and theoretical pI values ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared with the other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work may be informative and a stepping stone for other researchers seeking an overview of the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
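The average amino acid frequency reported above can be illustrated with a short sketch. It assumes that the average is taken as the mean of per-sequence percentages, which the abstract does not state; the function names and the toy sequences are hypothetical and are not uricase data.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_percent(sequence: str) -> dict:
    """Percentage of each standard amino acid in one sequence."""
    seq = sequence.upper()
    if not seq:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def mean_composition(sequences: list) -> dict:
    """Mean of the per-sequence percentages (an assumed averaging scheme)."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    per_seq = [composition_percent(s) for s in sequences]
    return {aa: sum(p[aa] for p in per_seq) / len(per_seq)
            for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Toy sequences, not real uricase data.
    avg = mean_composition(["VVAG", "VAAA"])
    print(max(avg, key=avg.get), avg["V"])
```

Averaging per-sequence percentages weights every sequence equally regardless of length; pooling all residues before counting would weight longer sequences more heavily.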


1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
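A minimal tree is one that requires the fewest substitutions to explain the data. As an illustration of how that count can be obtained for a fixed tree and a single nucleotide site, the sketch below uses Fitch's small-parsimony algorithm. This is a swapped-in standard technique, not the taxon-by-taxon construction described in the abstract; the nested-tuple tree encoding and the function name `fitch_score` are assumptions made for the example.

```python
def fitch_score(tree, states):
    """Minimum number of state changes on a rooted binary tree for one site.

    tree   -- nested 2-tuples whose leaves are taxon names (strings)
    states -- dict mapping each taxon name to its character state
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # Leaf: the only possible state is the observed one.
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            # Children agree on at least one state: no change needed here.
            return common
        # Children disagree: one change is required on this pair of branches.
        changes += 1
        return left | right

    visit(tree)
    return changes


if __name__ == "__main__":
    site = {"a": "A", "b": "G", "c": "A", "d": "G"}
    # Two alternative groupings of the same four taxa.
    print(fitch_score((("a", "b"), ("c", "d")), site))
    print(fitch_score((("a", "c"), ("b", "d")), site))
```

Summing this score over all sites gives the length of a tree, so comparing candidate trees by that sum is one way to see what "minimal" means in this setting.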


Life ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 8 ◽  
Author(s):  
Michael S. Wang ◽  
Kenric J. Hoegler ◽  
Michael H. Hecht

Life as we know it would not exist without the ability of protein sequences to bind metal ions. Transition metals, in particular, play essential roles in a wide range of structural and catalytic functions. The ubiquitous occurrence of metalloproteins in all organisms leads one to ask whether metal binding is an evolved trait that occurred only rarely in ancestral sequences, or alternatively, whether it is an innate property of amino acid sequences, occurring frequently in unevolved sequence space. To address this question, we studied 52 proteins from a combinatorial library of novel sequences designed to fold into 4-helix bundles. Although these sequences were neither designed nor evolved to bind metals, the majority of them have innate tendencies to bind the transition metals copper, cobalt, and zinc with high nanomolar to low-micromolar affinity.


2019 ◽  
Vol 20 (23) ◽  
pp. 5978 ◽  
Author(s):  
Minkiewicz ◽  
Iwaniak ◽  
Darewicz

The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially on those derived from foods and forming part of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about the existing ones (e.g., chemical codes and references to other databases) is in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.


2017 ◽  
Vol 61 (4) ◽  
pp. 421-426 ◽  
Author(s):  
Joanna Kołsut ◽  
Paulina Borówka ◽  
Błażej Marciniak ◽  
Ewelina Wójcik ◽  
Arkadiusz Wojtasik ◽  
...  

Abstract. Introduction: Colibacillosis, the most common disease of poultry, is caused mainly by avian pathogenic Escherichia coli (APEC). However, thus far, no pattern to the molecular basis of the pathogenicity of these bacteria has been established beyond dispute. In this study, genomes of APEC were investigated to ascribe importance to and explore the distribution of 16 genes recognised as their virulence factors. Material and Methods: A total of 14 E. coli strains pathogenic for poultry were isolated, and their DNA was sequenced, assembled de novo, and annotated. Amino acid sequences from these bacteria and an additional 16 freely available APEC amino acid sequences were analysed with the DIFFIND tool to define their virulence factors. Results: The DIFFIND tool enabled quick, reliable, and convenient assessment of the differences between compared amino acid sequences from bacterial genomes. The presence of 16 protein sequences indicated as pathogenicity factors in poultry was used to generate a heatmap which categorises genomes in terms of the existence and similarity of the analysed protein sequences. Conclusion: The proposed method of detecting virulence factors using the capabilities of the DIFFIND tool may be useful in the analysis of similarities of E. coli and other sequences deriving from bacteria. Phylogenetic analysis resulted in reliable segregation of 30 APEC strains into five main clusters containing various virulence-associated genes (VAGs).


2008 ◽  
Vol 191 (1) ◽  
pp. 65-73 ◽  
Author(s):  
Pavel S. Novichkov ◽  
Yuri I. Wolf ◽  
Inna Dubchak ◽  
Eugene V. Koonin

ABSTRACT In order to explore microevolutionary trends in bacteria and archaea, we constructed a data set of 41 alignable tight genome clusters (ATGCs). We show that the ratio of the medians of nonsynonymous to synonymous substitution rates (dN/dS) that is used as a measure of the purifying selection pressure on protein sequences is a stable characteristic of the ATGCs. In agreement with previous findings, parasitic bacteria, notwithstanding the sometimes dramatic genome shrinkage caused by gene loss, are typically subjected to relatively weak purifying selection, presumably owing to relatively small effective population sizes and frequent bottlenecks. However, no evidence of genome streamlining caused by strong selective pressure was found in any of the ATGCs. On the contrary, a significant positive correlation between the genome size, as well as gene size, and selective pressure was observed, although a variety of free-living prokaryotes with very close selective pressures span nearly the entire range of genome sizes. In addition, we examined the connections between the sequence evolution rate and other genomic features. Although gene order changes much faster than protein sequences during the evolution of prokaryotes, a strong positive correlation was observed between the “rearrangement distance” and the amino acid distance, suggesting that at least some of the events leading to genome rearrangement are subjected to the same type of selective constraints as the evolution of amino acid sequences.
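The selection measure described above is a ratio of medians rather than a median of ratios. A minimal sketch of that calculation for one cluster follows, assuming per-gene dN and dS estimates are already available as plain lists; the function name and the toy numbers are hypothetical and are not taken from the ATGC data set.

```python
from statistics import median


def ratio_of_medians(dn_values, ds_values):
    """Median nonsynonymous rate divided by median synonymous rate.

    Medians are taken separately over the genes of one cluster before
    dividing, which keeps a few genes with near-zero dS from dominating.
    """
    median_ds = median(ds_values)
    if median_ds == 0:
        raise ValueError("median dS is zero; the ratio is undefined")
    return median(dn_values) / median_ds


if __name__ == "__main__":
    # Toy per-gene estimates for a single hypothetical cluster.
    dn = [0.1, 0.2, 0.4]
    ds = [0.4, 0.8, 1.6]
    print(ratio_of_medians(dn, ds))
```

Values well below one are conventionally read as purifying selection, with smaller ratios indicating stronger constraint on the protein sequences.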


2011 ◽  
Vol 1 (2) ◽  
pp. 69
Author(s):  
Vanny Narita ◽  
Asma Omar ◽  
Agus Masduki

Abstract: Non-structural 1 protein is a conserved protein of dengue virus, but non-structural 1 proteins of dengue virus from different strains have different epitopes which can be recognized by B-cells. These epitopes may be constructed of similar amino acids in a different arrangement. This possibility must be considered in order to predict the sequential epitope of dengue virus. The objective of our study was to analyze the phylogenetic relation and the arrangement of confirmed specific epitopes of dengue strains from representatives of South East Asia’s NS1 dengue gene samples. The phylogenetic relation of non-structural 1 protein sequences from South East Asia was analyzed with Lasergene® software. The gene sequences were translated to amino acid sequences, and phylogenetic tree analysis was performed. The results showed that the relatedness values among full sequences of non-structural 1 protein were 72-98%. Furthermore, the serospecific epitopes were compared according to the Lasergene results. The serospecific epitope comparison showed that the dominant amino acids in these epitopes were histidine, tyrosine, glutamine and serine.


2019 ◽  
Author(s):  
Akshara Pande ◽  
Sumeet Patiyal ◽  
Anjali Lathwal ◽  
Chakit Arora ◽  
Dilraj Kaur ◽  
...  

Abstract. Motivation: Over the last three decades, a wide range of protein descriptors/features have been developed to annotate proteins with high precision. Many of these features have been integrated into software packages (e.g., PROFEAT, PyBioMed, iFeature, protr, Rcpi, propy) to predict the function of a protein. These features are not suitable for predicting the function of a protein at the residue level, such as the prediction of ligand-binding residues, DNA-interacting residues, or post-translational modifications. Results: To support the scientific community, we have developed a software package that computes more than 50,000 features important for predicting the function of a protein and its residues. It has five major modules, which compute composition-based features, binary profiles, evolutionary information, structure-based features and patterns. The composition-based module allows the user to compute: i) simple compositions such as amino acid, dipeptide and tripeptide; ii) property-based compositions; iii) repeats and distribution of amino acids; iv) Shannon entropy to measure low-complexity regions; v) miscellaneous compositions such as pseudo amino acid, autocorrelation, conjoint triad and quasi-sequence order. The binary profile of an amino acid sequence provides complete information, including the order and type of residues, and is specifically suited to predicting the function of a protein at the residue level. Pfeature allows one to compute evolutionary information-based features in the form of a PSSM profile generated using PSIBLAST. The structure-based module computes structure-based features and is specifically suited to annotating chemically modified peptides/proteins. Pfeature also allows generating overlapping patterns and features from the whole protein or its parts (e.g., N-terminal, C-terminal). In summary, Pfeature comprises almost all features used so far for predicting the function of a protein/peptide, including its residues. Availability: It is available as a web server named Pfeature (https://webs.iiitd.edu.in/raghava/pfeature/), as well as a Python library and standalone package (https://github.com/raghavagps/Pfeature) suitable for Windows, Ubuntu, Fedora, MacOS and CentOS-based operating systems.
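Two of the feature types named above, the binary profile and the dipeptide composition, are simple enough to sketch directly. The code below is an independent illustration written from the descriptions in the abstract; it does not call or reproduce the Pfeature library, and the function names `binary_profile` and `dipeptide_composition` are hypothetical.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(sequence: str) -> list:
    """One 20-element 0/1 vector per residue, preserving residue order."""
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS]
            for residue in sequence.upper()]


def dipeptide_composition(sequence: str) -> dict:
    """Percentage of each of the 400 possible adjacent residue pairs."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    composition = {a + b: 0.0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for pair in pairs:
        # Pairs containing non-standard letters are skipped.
        if pair in composition:
            composition[pair] += 100.0 / len(pairs)
    return composition


if __name__ == "__main__":
    profile = binary_profile("ACD")
    print(len(profile), len(profile[0]))
    composition = dipeptide_composition("ACAC")
    print(round(composition["AC"], 2), round(composition["CA"], 2))
```

The binary profile keeps one vector per position and so retains residue order, which is why it suits residue-level prediction, whereas the composition collapses the whole sequence into a fixed-length vector.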


2003 ◽  
Vol 30 (8) ◽  
pp. 843 ◽  
Author(s):  
Tursun Kerim ◽  
Nijat Imin ◽  
Jeremy J. Weinman ◽  
Barry G. Rolfe

Three isoallergens of Ory s 2, homologues of grass group II pollen allergens, were identified from rice and characterised by proteome and immunochemical analyses. The N-terminal amino acid sequence profiles of three proteins on a 2-dimensional electrophoresis (2-DE) gel of rice pollen proteins matched 100% to the protein sequences encoded by three rice expressed sequence tags (ESTs). The deduced protein sequences from these ESTs share sequence identities of 41–43% with the protein sequences of the group II pollen allergens of different grasses, and sequence identity of 39% with the C-terminal portion of rice group I pollen allergens. Signal peptide sequences, which are similar to the leader peptides of other major pollen allergens, are also present in the deduced amino acid sequences. Polyclonal antibodies, produced in rabbits using Ory s 2 proteins purified by 2-DE, were used to investigate the developmental-stage- and tissue-specific expression of Ory s 2 by immunochemical analysis. Results of immunochemical experiments show that Ory s 2 proteins are expressed only at the late stage of pollen development and they do not have cross-reactivity with group II pollen allergens from some other common grasses.


1976 ◽  
Vol 153 (3) ◽  
pp. 681-690 ◽  
Author(s):  
G M Polya ◽  
D R Phillips

1. A procedure is described for the detection and assessment of informational complementarity in an amino acid sequence; it is based on possible autocomplementarity in the mRNA, and involves codon-to-codon matching. 2. This procedure was applied to myelin basic protein, a variety of protamines, histone IV, silk fibroin, rat skin collagen α1 chain and a sheep keratin. A multiplicity of extensive low-probability informational symmetries, based on codon-to-codon matching, were detected. 3. These low-probability orderings, which are independent of the actual mRNA codons, are rationalized in terms of the evolutionary ordering of the amino acid sequences concerned, in such a way that constraints on the secondary structure of the coding polynucleotides were satisfied. This possible interpretation is supported by a number of significant common properties of the protein sequences analysed.

