Entropy and Fractal Dimension Study of the TDP-43 Protein Low Complexity Domain Sequence in ALS Disease Severity and SARS-CoV-2 Gene Sequences in Virulence Variability

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1038
Author(s):  
Sunil Dehipawala ◽  
Eric Cheung ◽  
George Tremberger ◽  
Tak Cheung

The low complexity domain (LCD) sequence has been defined in terms of entropy using a 12 amino acid sliding window along a protein sequence in the study of disease-related genes. The amyotrophic lateral sclerosis (ALS)-related TDP-43 protein sequence with intra-LCD structural information based on cryo-EM data was published recently. An application of entropy and Higuchi fractal dimension calculations was described using the Znf521 and HAR1 sequences. A computational analysis of the intra-LCD sequence entropy and Higuchi fractal dimension values at the amino acid level and at the ATCG nucleotide level was conducted without the sliding window requirement. The computational results were consistent in predicting the intermediate entropy/fractal dimension value produced when two subsequences at two different entropy/fractal dimension values were combined. The computational method without the application of a sliding window was extended to an analysis of the recently reported virulent genes (Orf6, Nsp6, and Orf7a) in SARS-CoV-2. The relationship between the virulence functionality and entropy values was found to have correlation coefficients between 0.84 and 0.99, using a 5% uncertainty on the cell viability data. The analysis found that the most virulent Orf6 gene sequence had the lowest nucleotide entropy and the highest protein fractal dimension, in line with extreme value theory. The Orf6 codon usage bias in relation to vaccine design was discussed.
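The whole-sequence entropy mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard Shannon entropy in bits per symbol, computed once over the full sequence instead of per window; the function name `shannon_entropy` is hypothetical, and the paper's exact procedure (and its Higuchi fractal dimension calculation) is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per symbol) of a whole sequence, no sliding window.

    Assumption: plain symbol-frequency entropy; works for nucleotide
    (A, T, C, G) or amino acid strings alike.
    """
    if not seq:
        return 0.0
    counts = Counter(seq)
    n = len(seq)
    # Sum -p * log2(p) over the observed symbols only.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # A uniform four-letter sequence reaches the 2-bit maximum for nucleotides.
    print(shannon_entropy("ATCG"))
    # A single repeated symbol carries no information.
    print(shannon_entropy("AAAA"))
```

Under this definition, low-entropy sequences are those dominated by a few symbols, which matches the sense in which the abstract calls a domain "low complexity".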


Author(s):  
Renganayaki G. ◽  
Achuthsankar S. Nair

Sequence alignment algorithms and database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto matrices are not optimal for accurately aligning proteins with a markedly different amino acid compositional bias. In this work, a new amino acid substitution matrix, Hubsm, is calculated for the disordered and low-complexity-rich regions of Hub proteins, based on residue characteristics. The amino acid background frequencies and the substitution scores obtained from Hubsm reveal residue substitution patterns that differ from those of commonly used scoring matrices. When comparing Hub protein sequences to detect homologs, the Hubsm matrix yields better results than the PAM and BLOSUM matrices. The Hubsm matrix can be optimal for database searches and for constructing more accurate sequence alignments of Hub proteins.
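To show why background composition matters for a substitution matrix, here is a minimal sketch of the generic log-odds score used by BLOSUM-style matrices (half-bit units). This is the textbook form only; the abstract does not state how Hubsm itself was derived, and the function name and input frequencies below are hypothetical.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 2.0) -> int:
    """Rounded log-odds substitution score.

    score = scale * log2(observed pair frequency / frequency expected by chance)
    With scale=2.0 the score is in half-bit units, as in BLOSUM matrices.
    All frequencies are illustrative inputs, not values from the paper.
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))


if __name__ == "__main__":
    # The same observed pair frequency scores differently once the
    # background frequencies change, e.g. in a compositionally biased region.
    print(log_odds_score(0.04, 0.1, 0.1))  # pair seen more often than chance
    print(log_odds_score(0.04, 0.2, 0.2))  # same pair, richer background
```

Because the expected term depends on the background frequencies, a matrix built from general proteins can mis-score substitutions in regions whose composition differs strongly from that background.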



2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understand the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: We report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequences, using 315 sequence features. The method was trained on a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/ Ngly.
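The N-X-[S/T] sequon rule described in this abstract can be sketched as a simple pattern scan. This only enumerates candidate sequons; it is not the Nglyc predictor, and as the abstract notes, not every candidate is actually glycosylated. The function name `find_sequons` and the example peptide are hypothetical.

```python
import re
from typing import List

# N, followed by any residue except proline, followed by S or T.
# The lookahead keeps the match one residue wide so overlapping sequons
# (e.g. "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Made-up peptide: N2 (NGT) matches, N6 (NPS) is excluded by proline,
    # N9 (NNS) and N10 (NST) overlap and both match.
    print(find_sequons("MNGTANPSNNST"))  # [2, 9, 10]
```

A predictor such as the one described would take these candidate positions as input and classify each as glycosylated or not.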



2000 ◽  
Vol 17 (6) ◽  
pp. 847-854 ◽  
Author(s):  
JAMES C. RYAN ◽  
SERGEY ZNOIKO ◽  
LIN XU ◽  
ROSALIE K. CROUCH ◽  
JIAN-XING MA

The mammalian retina is known to contain two distinct transducins that interact with their respective rod and cone pigments. However, there are no reports of a nonmammalian species having two distinct transducins. In the present study, we report the cloning and cellular localization of two transducin α subunits (Gαt) from the tiger salamander. Through degenerate polymerase chain reaction (PCR) and subsequent screening of a salamander retina cDNA library, we have identified two forms of Gαt. When compared to existing sequences in GenBank, the cloned subunits showed high similarity to rod and cone transducins. The salamander Gαt-1 has 91.2–93.7% amino acid sequence identity to mammalian rod Gαt subunits and 79.7–80.9% to mammalian cone Gαts. The salamander Gαt-2 has 86.2–87.9% sequence identity to mammalian cone Gαts and 78.9–80.9% to mammalian rod Gαts at the amino acid level. The Gαt-1 cDNA encodes 350 amino acids while the Gαt-2 cDNA encodes 354 residues, which is typical for rod and cone Gαts, respectively, and we thus identified the Gαt-1 as rod and Gαt-2 as cone Gαt. Sequences identified as effector binding sites and GTPase activity regions are highly conserved between the two subunits. Genomic Southern blot analysis showed that rod and cone Gαt subunits are both encoded by single-copy genes. Northern blot analysis identified retina-specific transcripts of 3.0 kb for rod Gαt and 2.6 kb for cone Gαt. Immunohistochemistry in the flat-mounted salamander retina demonstrated that rod Gαt is localized to rods, predominantly in the outer segments; similarly, cone Gαt is localized to cone outer segments. The results confirm that the two sequences encode rod and cone transducins and demonstrate that this lower vertebrate contains two distinct transducins that are localized specifically to rod and cone photoreceptors.



2007 ◽  
Vol 51 (8) ◽  
pp. 985-998 ◽  
Author(s):  
Gregory S. Ladics ◽  
Gary A. Bannon ◽  
Andre Silvanovich ◽  
Robert F. Cressman


2017 ◽  
Vol 8 (9) ◽  
pp. 5992-6004 ◽  
Author(s):  
Tiia Kittilä ◽  
Claudia Kittel ◽  
Julien Tailhades ◽  
Diane Butz ◽  
Melanie Schoppet ◽  
...  

Halogenase enzymes involved in glycopeptide antibiotic biosynthesis accept aminoacyl-carrier protein substrates.



2007 ◽  
Vol 2007 ◽  
pp. 1-23 ◽  
Author(s):  
G. R. Hemalatha ◽  
D. Satyanarayana Rao ◽  
L. Guruprasad

We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising fewer than 55 amino acid residues that occurs more than once in the protein sequence and is sometimes present in tandem. A “domain” corresponds to a conserved region with more than 55 amino acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure.



2004 ◽  
Vol 91 (01) ◽  
pp. 38-42 ◽  
Author(s):  
Christof Geisen ◽  
Erhard Seifried ◽  
Johannes Oldenburg ◽  
Matthias Watzka

Summary: Factor VIII acts as an essential component of the tenase complex of the coagulation system. Herein we report the cDNA of rat factor VIII. The rat cDNA comprises 6777 nucleotides and encodes a protein of 2258 amino acids, 61 amino acids fewer than mouse and 92 amino acids fewer than human factor VIII. The overall identity compared to human is 61% at the cDNA level and 51% at the amino acid level. In the cDNA, the highest levels of sequence identity are observed in the A and C domains (ranging between 68% and 73%), whereas the B domain and the small acidic regions are more divergent (34%-49%). Compared to mouse and human, most sites for posttranslational modifications such as sulfation and glycosylation, as well as thrombin and protein C cleavage sites, are conserved in rat. Alternative transcripts lacking exon 17 and/or comprising an additional 26 bp due to alternative splicing of exon 20 were found. Furthermore, 13 polymorphisms (seven in exon 14, one each in exons 20, 23, 24, and 25, and two in the 3’UTR), three of which lead to an amino acid exchange, were detected. Our findings might provide new insights into the structure-function analysis of the factor VIII protein and might prove useful for future animal models addressing the function of factor VIII.
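The two sizes quoted in this abstract can be cross-checked with simple codon arithmetic. The sketch below assumes that the 6777-nucleotide figure refers to the open reading frame including the stop codon, which the abstract does not state explicitly; the function name `orf_length_nt` is hypothetical.

```python
def orf_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Length in nucleotides of a coding sequence for n_residues amino acids.

    Each residue is encoded by one 3-nucleotide codon; the stop codon adds
    3 more nucleotides when include_stop is True.
    """
    return 3 * n_residues + (3 if include_stop else 0)


if __name__ == "__main__":
    # 2258 codons plus one stop codon.
    print(orf_length_nt(2258))  # 6777
```

Under that assumption, 2258 residues plus a stop codon account for exactly the 6777 nucleotides reported.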



1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
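What "shortest" means for a tree can be illustrated by counting the minimum number of changes a fixed tree needs at one sequence position. The sketch below uses Fitch's small-parsimony algorithm, a standard technique swapped in for illustration; it is not the taxon-by-taxon construction method of the paper, and the nested-tuple tree encoding is an assumption made for brevity.

```python
from typing import Set, Tuple, Union

# Assumed encoding: a leaf is a one-character state (e.g. a nucleotide),
# an internal node is a (left, right) tuple of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimum changes in the subtree)."""
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    common = left_states & right_states
    if common:
        # Children can agree on a state: no extra change at this node.
        return common, left_cost + right_cost
    # Children disagree: one change is required on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1


if __name__ == "__main__":
    # One site observed in four taxa, on the tree ((A, A), (C, G)).
    states, changes = fitch((("A", "A"), ("C", "G")))
    print(changes)  # 2
```

Summing this count over all sites gives the length of one candidate tree; a minimal tree is a topology for which that total is smallest.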



eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Pavan Vedula ◽  
Satoshi Kurosaka ◽  
Brittany MacTaggart ◽  
Qin Ni ◽  
Garegin Papoian ◽  
...  

β- and γ-cytoplasmic actins are ubiquitously expressed in every cell type and are nearly identical at the amino acid level but play vastly different roles in vivo. Their essential roles in embryogenesis and mesenchymal cell migration critically depend on the nucleotide sequences of their genes, rather than their amino acid sequences; however, it is unclear which gene elements underlie this effect. Here we address the specific role of the coding sequence in β- and γ-cytoplasmic actins' intracellular functions, using stable polyclonal populations of immortalized mouse embryonic fibroblasts with exogenously expressed actin isoforms and their 'codon-switched' variants. When targeted to the cell periphery using the β-actin 3′UTR, β-actin and γ-actin have differential effects on cell migration. These effects directly depend on the coding sequence. Single molecule measurements of actin isoform translation, combined with fluorescence recovery after photobleaching, demonstrate a pronounced difference in β- and γ-actins' translation elongation rates in cells, leading to changes in their dynamics at the focal adhesions, impairments in actin bundle formation, and reduced cell anchoring to the substrate during migration. Our results demonstrate that coding sequence-mediated differences in actin translation play a key role in cell migration.


