Self-Analysis of Repeat Proteins Reveals Evolutionarily Conserved Patterns

2019
Author(s):
Matthew Merski
Krzysztof Młynarczyk
Jan Ludwiczak
Jakub Skrzeczkowski
Stanisław Dunin-Horkawicz
...  

Abstract Background: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences makes it difficult to determine whether similar repeats arose through convergent or divergent evolution. We noted that the patterns derived from traditional dot plot protein self-analysis tended to be conserved in sets of related repeat proteins, and that this conservation could be quantified using a standard Jaccard metric. Results: Use of these plots obviated the issues arising from sequence similarity in the analysis of these proteins. The dot plot patterns decay quickly in the absence of selective pressure, with an expected loss of 50% of Jaccard similarity for a loss of 8.2% sequence identity. Comparison of repeat and non-repeat proteins in the PDB suggested that the information content of dot plots could be used to identify repeat proteins from sequence alone. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered, and a significant proportion (82.9%) of the clusters containing between 5 and 200 members were of a single functional type. Conclusions: Dot plot analysis of repeat proteins obviates the issues that arise from sequence degeneracy. These results show that this kind of analysis can be applied efficiently to repeat proteins on a large scale.
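The dot plot and Jaccard comparison described above can be illustrated with a minimal Python sketch. It assumes a simple identity-based dot plot: a cell (i, j) is marked when the length-`window` segments starting at positions i and j share at least `min_matches` identical residues. The window size, threshold, and the function names `self_dot_plot` and `jaccard` are illustrative choices, not the scoring scheme used by the authors, and comparing cell coordinates directly assumes the two sequences have equal length (or are already aligned).

```python
# Minimal sketch (assumptions: identity matching, fixed window, equal-length
# sequences). Not the authors' published scoring scheme.


def self_dot_plot(seq: str, window: int = 3, min_matches: int = 3) -> set[tuple[int, int]]:
    """Return the upper-triangle cells (i, j), i < j, whose windows of length
    `window` starting at i and j share at least `min_matches` identical residues."""
    n = len(seq) - window + 1
    cells: set[tuple[int, int]] = set()
    for i in range(n):
        for j in range(i + 1, n):
            matches = sum(seq[i + k] == seq[j + k] for k in range(window))
            if matches >= min_matches:
                cells.add((i, j))
    return cells


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty plots count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical repeat protein: four copies of a short leucine-rich motif.
    repeat_protein = "LRRLDLSNN" * 4
    # Introduce a single substitution and see how much of the pattern survives.
    mutant = repeat_protein[:10] + "W" + repeat_protein[11:]
    score = jaccard(self_dot_plot(repeat_protein), self_dot_plot(mutant))
    print(f"Jaccard similarity after one substitution: {score:.3f}")
```

A substitution removes matches only in the windows that cover it, so a single change leaves most of the plot intact while accumulated substitutions erode it. The quantitative decay figure in the abstract comes from the authors' own analysis and is not reproduced by this toy example.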

2020
Author(s):
Matthew Merski
Krzysztof Młynarczyk
Jan Ludwiczak
Jakub Skrzeczkowski
Stanisław Dunin-Horkawicz
...  

Abstract Background: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences makes it difficult to identify whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional “dot plot” protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins, and that this conservation could be quantitated using a Jaccard metric. Results: Comparison of these dot plots obviated the issues arising from sequence similarity in the analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure, with an expected loss of 50% of Jaccard similarity for a loss of 8.2% sequence identity. To test the method, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content of dot plots could be used to identify repeat proteins from sequence alone, with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered, and a significant proportion (82.9%) of the clusters containing between 5 and 200 members were of a single functional type. Conclusions: Dot plot analysis of repeat proteins attempts to obviate issues that arise from the sequence degeneracy of repeat proteins. These results show that this kind of analysis can be applied efficiently to repeat proteins on a large scale.


2020
Author(s):
Qing Wei Cheang
Shuo Sheng
Linghui Xu
Zhao-Xun Liang

Abstract PilZ domain-containing proteins constitute a superfamily of widely distributed bacterial signalling proteins. Although studies have established the canonical PilZ domain as an adaptor protein domain that evolved to specifically bind the second messenger c-di-GMP, mounting evidence suggests that the PilZ domain has undergone extensive divergent evolution, generating a superfamily of proteins characterized by a wide range of c-di-GMP-binding affinities, binding partners and cellular functions. This divergent evolution has even generated families of non-canonical PilZ domains that completely lack c-di-GMP-binding ability. In this study, we performed a large-scale sequence analysis of more than 28,000 single- and di-domain PilZ proteins using the sequence similarity networking tool originally created to analyse functionally diverse enzyme superfamilies. The sequence similarity networks (SSNs) generated by the analysis feature a large number of putative isofunctional protein clusters and thus provide an unprecedented panoramic view of the sequence-function relationship and functional diversification in PilZ proteins. Some of the clusters in the networks are unexplored, containing only proteins of completely unknown biological function, whereas others contain one, two or a few functionally characterized proteins, enabling us to infer the cellular function of uncharacterized homologs or orthologs.
With the ultimate goal of elucidating the diverse roles played by PilZ proteins in bacterial signal transduction, the work described here will facilitate the annotation of the vast number of PilZ proteins encoded by bacterial genomes and help to prioritize functionally unknown PilZ proteins for future studies.
Importance: Although the PilZ domain is best known as the protein domain that evolved specifically to bind the second messenger c-di-GMP, divergent evolution has generated a superfamily of PilZ proteins with diverse ligand- or protein-binding properties and cellular functions. We analysed the sequences of more than 28,000 PilZ proteins using the sequence similarity networking (SSN) tool to obtain a global view of the sequence-function relationship and functional diversification in PilZ proteins. The results will facilitate the annotation of the vast number of PilZ proteins encoded by bacterial genomes and help us prioritize PilZ proteins for future studies.
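The general idea behind a sequence similarity network can be sketched in a few lines of Python. In this minimal sketch, proteins are nodes, an edge joins any pair whose pairwise similarity score meets a chosen threshold, and each connected component is treated as a candidate (putatively isofunctional) cluster. The protein identifiers, scores, threshold, and function names are hypothetical; production SSN tools typically derive edge scores from all-versus-all BLAST alignments, which this toy does not attempt to reproduce.

```python
# Minimal SSN sketch: threshold pairwise scores into a graph, then report
# connected components as candidate clusters. Scores here are hypothetical.


def build_ssn(scores: dict[tuple[str, str], float], threshold: float) -> dict[str, set[str]]:
    """Build an undirected adjacency map; every protein becomes a node,
    even if none of its scores pass the threshold (a singleton)."""
    graph: dict[str, set[str]] = {}
    for (a, b), score in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Return connected components via iterative depth-first search."""
    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Hypothetical pairwise similarity scores between five PilZ proteins.
    pair_scores = {
        ("PilZ_1", "PilZ_2"): 0.91,
        ("PilZ_2", "PilZ_3"): 0.84,
        ("PilZ_3", "PilZ_4"): 0.12,
        ("PilZ_4", "PilZ_5"): 0.95,
    }
    for cluster in connected_clusters(build_ssn(pair_scores, threshold=0.5)):
        print(sorted(cluster))
```

Raising the threshold splits clusters into smaller groups that are more likely to be isofunctional, while lowering it merges them, which is why SSN analyses often inspect several thresholds.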


Genome
2004
Vol 47 (1)
pp. 141-155
Author(s):
H H Yan
J Mudge
D-J Kim
R C Shoemaker
D R Cook
...  

To gain insight into genomic relationships between soybean (Glycine max) and Medicago truncatula, eight groups of bacterial artificial chromosome (BAC) contigs, together spanning 2.60 million base pairs (Mb) in G. max and 1.56 Mb in M. truncatula, were compared through high-resolution physical mapping combined with sequence and hybridization analysis of low-copy BAC ends. Cross-hybridization among G. max and M. truncatula contigs uncovered microsynteny in six of the contig groups and extensive microsynteny in three. Between G. max homoeologous (within genome duplicate) contigs, 85% of coding and 75% of noncoding sequences were conserved at the level of cross-hybridization. By contrast, only 29% of sequences were conserved between G. max and M. truncatula, and some kilobase-scale rearrangements were also observed. Detailed restriction maps were constructed for 11 contigs from the three highly microsyntenic groups, and these maps suggested that sequence order was highly conserved between G. max duplicates and generally conserved between G. max and M. truncatula. One instance of homoeologous BAC contigs in M. truncatula was also observed and examined in detail. A sequence similarity search against the Arabidopsis thaliana genome sequence identified up to three microsyntenic regions in A. thaliana for each of two of the legume BAC contig groups. Together, these results confirm previous predictions of one recent genome-wide duplication in G. max and suggest that M. truncatula also experienced ancient large-scale genome duplications.
Key words: Glycine max, Medicago truncatula, Arabidopsis thaliana, conserved microsynteny, genome duplication.
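The study judged order conservation from restriction maps and cross-hybridization. One simple way to put a number on "sequence order was highly conserved" for two contigs that share markers is the length of the longest common subsequence (LCS) of the shared markers, divided by the number of shared markers. The Python sketch below is an illustrative assumption, not the authors' procedure: the function name and marker labels are hypothetical, each marker is assumed to appear once per contig, and because the score only rewards same-orientation order, an inverted block scores low even though it is syntenic.

```python
# Illustrative order-conservation score via longest common subsequence (LCS).
# Assumes marker names are unique within each contig; not the study's method.


def conserved_order_fraction(order_a: list[str], order_b: list[str]) -> float:
    """Fraction of shared markers that lie in a common collinear order."""
    shared = set(order_a) & set(order_b)
    a = [m for m in order_a if m in shared]
    b = [m for m in order_b if m in shared]
    if not a:
        return 0.0  # no shared markers, so no order to compare
    # Classic dynamic-programming LCS, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(a)


if __name__ == "__main__":
    # Hypothetical marker orders on two microsyntenic contigs.
    soybean_contig = ["m1", "m2", "m3", "m4", "m5", "m6"]
    medicago_contig = ["m1", "m3", "m2", "m4", "m6", "x9"]
    # 4 of the 5 shared markers are collinear, so this prints 0.8
    print(conserved_order_fraction(soybean_contig, medicago_contig))
```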


Molecules
2021
Vol 26 (11)
pp. 3228
Author(s):
Xiaotong Li
Minghong Jian
Yanhong Sun
Qunyan Zhu
Zhenxin Wang

To enhance their utility in biological applications, inorganic nanoparticles (NPs) are usually functionalized with specific biomolecules. Peptides with short amino acid sequences have attracted great attention for NP functionalization because they are easy to synthesize on a large scale with automated synthesizers and can integrate various functionalities, including specific biorecognition and therapeutic function, into one sequence. Conjugation of peptides with NPs can generate novel theranostic/drug delivery nanosystems with active tumor-targeting ability, as well as efficient nanosensing platforms for the sensitive detection of various analytes, such as heavy metal ions and biomarkers. Numerous studies demonstrate that peptide–NP bioconjugates can help achieve precise diagnosis and therapy of diseases. In particular, peptide–NP bioconjugates show tremendous potential for the development of effective anti-tumor nanomedicines. This review provides an overview of how the properties of peptide-functionalized NPs affect the precise diagnosis and therapy of cancers by summarizing recent publications on the applications of peptide–NP bioconjugates for the detection of biomarkers (antigens and enzymes) and carcinogens (e.g., heavy metal ions), drug delivery, and imaging-guided therapy. The current challenges and future prospects of the field are also discussed.


2005
Vol 391 (2)
pp. 409-415
Author(s):
Anna Kärkönen
Alain Murigneux
Jean-Pierre Martinant
Elodie Pepey
Christophe Tatout
...  

UDPGDH (UDP-D-glucose dehydrogenase) oxidizes UDP-Glc (UDP-D-glucose) to UDP-GlcA (UDP-D-glucuronate), the precursor of UDP-D-xylose and UDP-L-arabinose, which are major cell wall polysaccharide precursors. Maize (Zea mays L.) has at least two putative UDPGDH genes (A and B), identified by sequence similarity to a soya bean UDPGDH gene. The predicted maize amino acid sequences have 95% similarity to that of soya bean. Maize mutants with a Mu-element insertion in UDPGDH-A or UDPGDH-B were isolated (udpgdh-A1 and udpgdh-B1, respectively) and studied for changes in wall polysaccharide biosynthesis. The udpgdh-A1 and udpgdh-B1 homozygotes showed no visible phenotype but exhibited 90% and 60–70% less UDPGDH activity, respectively, than wild-types in a radiochemical assay with 30 μM UDP-glucose. Ethanol dehydrogenase (ADH) activity varied independently of UDPGDH activity, supporting the hypothesis that ADH and UDPGDH activities are due to different enzymes in maize. When extracts from wild-types and udpgdh-A1 homozygotes were assayed with increasing concentrations of UDP-Glc, at least two isoforms of UDPGDH were detected, with Km values of approx. 380 and 950 μM for UDP-Glc. Leaf and stem non-cellulosic polysaccharides had lower Ara/Gal and Xyl/Gal ratios in udpgdh-A1 homozygotes than in wild-types, whereas udpgdh-B1 homozygotes exhibited more variability among individual plants, suggesting that UDPGDH-A activity has a more important role than UDPGDH-B in UDP-GlcA synthesis. The fact that mutation of a UDPGDH gene interferes with polysaccharide synthesis suggests a greater importance for the sugar nucleotide oxidation pathway than for the myo-inositol pathway in UDP-GlcA biosynthesis during post-germinative growth of maize.
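The detection of two UDPGDH isoforms from assays at increasing UDP-Glc concentrations can be pictured with a two-term Michaelis–Menten model, v = Vmax1·[S]/(Km1 + [S]) + Vmax2·[S]/(Km2 + [S]). The Python sketch below uses the approximate Km values from the abstract (380 and 950 μM); the equal Vmax values are hypothetical placeholders, and summing two independent Michaelis–Menten terms is an assumed model rather than the authors' stated fitting procedure. It shows that at the 30 μM UDP-Glc used in the radiochemical assay, both isoforms operate far below saturation.

```python
# Two-isoform Michaelis-Menten sketch. Km values are the approximate figures
# reported in the abstract; the Vmax values are hypothetical placeholders.

KM_HIGH_AFFINITY_UM = 380.0  # approx. Km for UDP-Glc, isoform 1 (μM)
KM_LOW_AFFINITY_UM = 950.0   # approx. Km for UDP-Glc, isoform 2 (μM)


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate for a single enzyme at substrate concentration s."""
    return vmax * s / (km + s)


def two_isoform_rate(s: float, vmax1: float, vmax2: float,
                     km1: float = KM_HIGH_AFFINITY_UM,
                     km2: float = KM_LOW_AFFINITY_UM) -> float:
    """Total rate if two independent isoforms act on the same substrate pool."""
    return michaelis_menten(s, vmax1, km1) + michaelis_menten(s, vmax2, km2)


if __name__ == "__main__":
    vmax = 1.0  # hypothetical, arbitrary units, same for both isoforms
    for s in (30.0, 380.0, 950.0, 5000.0):
        total = two_isoform_rate(s, vmax, vmax)
        share = michaelis_menten(s, vmax, KM_HIGH_AFFINITY_UM) / total
        print(f"[UDP-Glc] = {s:6.0f} uM: total v = {total:.3f}, "
              f"high-affinity share = {share:.2f}, "
              f"saturation = {total / (2 * vmax):.2f}")
```

At 30 μM each term is close to its linear regime (v ≈ Vmax·[S]/Km), so a single low-substrate assay reports a blend of both isoforms; separating two Km values requires substrate concentrations spanning both, consistent with assaying at increasing UDP-Glc concentrations.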


2001
Vol 21 (15)
pp. 5109-5121
Author(s):
Yann-Gaël Gangloff
Jean-Christophe Pointud
Sylvie Thuault
Lucie Carré
Christophe Romier
...  

ABSTRACT The RNA polymerase II transcription factor TFIID comprises the TATA binding protein (TBP) and a set of TBP-associated factors (TAFIIs). TFIID has been extensively characterized in yeast, Drosophila, and humans, demonstrating a high degree of conservation of both the amino acid sequences of the constituent TAFIIs and overall molecular organization. In recent years, it has been assumed that all the metazoan TAFIIs have been identified, yet no metazoan homologues of yeast TAFII47 (yTAFII47) and yTAFII65 are known. Both of these yTAFIIs contain a histone fold domain (HFD) that selectively heterodimerizes with that of yTAFII25. We have cloned a novel mouse protein, TAFII140, containing an HFD and a plant homeodomain (PHD) finger, which we demonstrated by immunoprecipitation to be a mammalian TFIID component. TAFII140 shows extensive sequence similarity to Drosophila BIP2 (dBIP2) (dTAFII155), which we also show to be a component of Drosophila TFIID. These proteins are metazoan homologues of yTAFII47, as their HFDs selectively heterodimerize with dTAFII24 and human TAFII30, metazoan homologues of yTAFII25. We further show that yTAFII65 shares two domains with the Drosophila Prodos protein, a recently described potential dTAFII. These conserved domains are critical for yTAFII65 function in vivo. Our results therefore identify metazoan homologues of yTAFII47 and yTAFII65.


2021
Vol 11
Author(s):
Xiaowen Xu
Meifeng Li
Zeyuan Deng
Jihuan Hu
Zeyin Jiang
...  

Accumulating evidence indicates that mammalian NIMA (never in mitosis, gene A)-related kinase 6 (NEK6) may play roles in tumorigenesis, but little is known about NEK6 in lower vertebrates. Herein, we report an ortholog of mammalian NEK6 in grass carp (Ctenopharyngodon idellus), designated CiNEK6. Multiple alignment of amino acid sequences and phylogenetic analysis showed that CiNEK6 shares a high level of sequence similarity with its counterparts in birds. CiNEK6 was ubiquitously expressed in all tested tissues, and its expression was increased by treatment with GCRV (a dsRNA virus) or poly I:C (a dsRNA analog). Q-PCR and dual-luciferase assays suggested that CiNEK6 overexpression suppressed IFN I activity in CIK cells treated with poly I:C. Knockdown of CiNEK6 resulted in a higher level of IFN I expression in CIK cells treated with poly I:C compared with those that received PBS. Interestingly, analysis of subcellular localization demonstrated that CiNEK6 protein, which is scattered throughout the cytoplasm, gradually congregates at the edges of the karyotheca upon stimulation with poly I:C. Co-IP and co-localization assays suggested that CiNEK6 interacts with CiIRF3 after poly I:C challenge. In poly I:C-treated cells, the phosphorylation of CiIRF3 was increased by CiNEK6 knockdown but suppressed by CiNEK6 overexpression, suggesting that CiNEK6 decreases IFN I expression by inhibiting CiIRF3 activity. Cell viability assays, crystal violet staining, and detection of Vp5 also showed that CiNEK6 plays an inhibitory role in IRF3-mediated antiviral responses.


Author(s):
Sona. S Dev
P. Poornima
Akhil Venu

Eggplant or brinjal (Solanum melongena L.) is highly susceptible to various soil-borne diseases. The extensive use of chemical fungicides to combat these diseases could be reduced by identifying resistance gene analogs (RGAs) in wild relatives of cultivated plants. In the present study, degenerate PCR primers targeting the conserved regions of nucleotide binding site-leucine rich repeat (NBS-LRR) genes were used to amplify RGAs from wild relatives of eggplant (black nightshade (Solanum nigrum), Indian nightshade (Solanum violaceum) and Solanum incanum) that had shown resistance to the bacterial wilt pathogen Ralstonia solanacearum in a preliminary investigation. Comparison of the amplicons' amino acid sequences with each other and with those of known RGAs deposited in GenBank revealed significant sequence similarity. Phylogenetic analysis indicated that they belong to the Toll/interleukin-1 receptor (TIR)-NBS-LRR type of R genes. Multiple sequence alignment with other known R genes showed significant homology with the P-loop, kinase-2 and GLPL domains of NBS-LRR class genes. There have been no previous reports of R genes from these wild eggplants; hence, diversity analysis of these novel RGAs may lead to the identification of other novel R genes within the germplasm of different brinjal plants as well as other Solanum species.
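Degenerate primers like those described above encode several possible bases at some positions using IUPAC nucleotide codes, so a single primer is really a pool of sequences. The Python sketch below expands a degenerate primer into that pool and reports its degeneracy. The study's actual primer sequences are not given here, so the primer in the example, like the function names, is hypothetical.

```python
# Expand an IUPAC-degenerate DNA primer into every concrete sequence it encodes.
from itertools import product
from math import prod

IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _choices(primer: str) -> list[str]:
    """Map each primer position to the bases it allows; reject unknown codes."""
    try:
        return [IUPAC_CODES[base] for base in primer.upper()]
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from None


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the primer pool."""
    return prod(len(options) for options in _choices(primer))


def expand_degenerate(primer: str) -> list[str]:
    """All concrete sequences encoded by the primer, in code-table order."""
    return ["".join(bases) for bases in product(*_choices(primer))]


if __name__ == "__main__":
    hypothetical_primer = "GGRTNGG"  # illustrative only, not a primer from the study
    print(degeneracy(hypothetical_primer))
    print(expand_degenerate(hypothetical_primer)[:4])
```

Degeneracy grows multiplicatively with each ambiguous position, which is why degenerate primers are usually designed against short, highly conserved motifs. This sketch does not handle deoxyinosine, which degenerate primers sometimes use at fully ambiguous positions.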

