A model of k-mer surprisal to quantify local sequence information content surrounding splice regions

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e10063
Author(s):  
Sam Humphrey ◽  
Alastair Kerr ◽  
Magnus Rattray ◽  
Caroline Dive ◽  
Crispin J. Miller

Molecular sequences carry information. Analysis of sequence conservation between homologous loci is a proven approach with which to explore the information content of molecular sequences. This is often done using multiple sequence alignments to support comparisons between homologous loci. These methods therefore rely on sufficient underlying sequence similarity with which to construct a representative alignment. Here we describe a method using a formal metric of information, surprisal, to analyse biological sub-sequences without alignment constraints. We applied our model to the genomes of five different species to reveal similar patterns across a panel of eukaryotes. As the surprisal of a sub-sequence is inversely proportional to its occurrence within the genome, the optimal size of the sub-sequences was selected for each species under consideration. With the model optimized, we found a strong correlation between surprisal and CG dinucleotide usage. The utility of our model was tested by examining the sequences of genes known to undergo splicing. We demonstrate that our model can identify biological features of interest such as known donor and acceptor sites. Analysis across all annotated coding exon junctions in Homo sapiens reveals the information content of coding exons to be greater than the surrounding intron regions, a consequence of increased suppression of the CG dinucleotide in intronic space. Sequences within coding regions proximal to exon junctions exhibited novel patterns within DNA and coding mRNA that are not a function of the encoded amino acid sequence. Our findings are consistent with the presence of secondary information encoding features such as DNA and RNA binding sites, multiplexed through the coding sequence and independent of the information required to define the corresponding amino-acid sequence. 
We conclude that surprisal provides a complementary methodology with which to locate regions of interest in the genome, particularly in situations that lack an appropriate multiple sequence alignment.
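As an illustration of the idea in the abstract above, the following minimal sketch computes k-mer surprisal for a toy sequence. It assumes the standard Shannon definition, surprisal = -log2(p), with p estimated as the relative frequency of each k-mer in the sequence; the abstract does not give the authors' exact formula, choice of k, or handling of strands. The function names `kmer_surprisal` and `surprisal_profile` and the toy sequence are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def kmer_surprisal(sequence: str, k: int) -> dict[str, float]:
    """Return surprisal in bits, -log2(p), for every k-mer observed in `sequence`.

    p is the relative frequency of the k-mer among all overlapping k-mers
    (an assumption; the paper's estimator may differ).
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: -math.log2(n / total) for kmer, n in counts.items()}


def surprisal_profile(sequence: str, table: dict[str, float], k: int) -> list[float]:
    """Surprisal of the k-mer starting at each position of `sequence`.

    Raises KeyError if a k-mer is absent from `table`.
    """
    sequence = sequence.upper()
    return [table[sequence[i:i + k]] for i in range(len(sequence) - k + 1)]


# Toy "genome": AT-rich with a single CG dinucleotide (illustrative only).
toy_genome = "ATATATATCGATATATAT"
table = kmer_surprisal(toy_genome, k=2)
profile = surprisal_profile(toy_genome, table, k=2)
```

In this toy sequence the single CG dinucleotide receives a higher surprisal than the frequent AT dinucleotide, which mirrors, in a very simplified way, the link between rare k-mers and high information content described in the abstract.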




2002 ◽  
Vol 68 (6) ◽  
pp. 2731-2736 ◽  
Author(s):  
Hirokazu Nankai ◽  
Wataru Hashimoto ◽  
Kousaku Murata

ABSTRACT When cells of Bacillus sp. strain GL1 were grown in a medium containing xanthan as a carbon source, α-mannosidase exhibiting activity toward p-nitrophenyl-α-d-mannopyranoside (pNP-α-d-Man) was produced intracellularly. The 350-kDa α-mannosidase purified from a cell extract of the bacterium was a trimer comprising three identical subunits, each with a molecular mass of 110 kDa. The enzyme hydrolyzed pNP-α-d-Man (Km = 0.49 mM) and d-mannosyl-(α-1,3)-d-glucose most efficiently at pH 7.5 to 9.0, indicating that the enzyme catalyzes the last step of the xanthan depolymerization pathway of Bacillus sp. strain GL1. The gene for α-mannosidase, cloned by using N-terminal amino acid sequence information, contained an open reading frame (3,144 bp) capable of coding for a polypeptide with a molecular weight of 119,239. The deduced amino acid sequence showed homology with the amino acid sequences of α-mannosidases belonging to glycoside hydrolase family 38.



2001 ◽  
Vol 183 (2) ◽  
pp. 490-499 ◽  
Author(s):  
Chung-Dar Lu ◽  
Ahmed T. Abdelal

ABSTRACT The NAD+-dependent glutamate dehydrogenase (NAD-GDH) from Pseudomonas aeruginosa PAO1 was purified, and its amino-terminal amino acid sequence was determined. This sequence information was used in identifying and cloning the encoding gdhB gene and its flanking regions. The molecular mass predicted from the derived sequence for the encoded NAD-GDH was 182.6 kDa, in close agreement with that determined from sodium dodecyl sulfate-polyacrylamide gel electrophoresis of the purified enzyme (180 kDa). Cross-linking studies established that the native NAD-GDH is a tetramer of equal subunits. Comparison of the derived amino acid sequence of NAD-GDH from P. aeruginosa with the GenBank database showed the highest homology with hypothetical polypeptides from Pseudomonas putida, Mycobacterium tuberculosis, Rickettsia prowazekii, Legionella pneumophila, Vibrio cholerae, Shewanella putrefaciens, Sinorhizobium meliloti, and Caulobacter crescentus. A moderate degree of homology, primarily in the central domain, was observed with the smaller tetrameric NAD-GDH (protomeric mass of 110 kDa) from Saccharomyces cerevisiae or Neurospora crassa. Comparison with the yet smaller hexameric GDH (protomeric mass of 48 to 55 kDa) of other prokaryotes yielded a low degree of homology that was limited to residues important for binding of substrates and for catalytic function. NAD-GDH was induced 27-fold by exogenous arginine and only 3-fold by exogenous glutamate. Primer extension experiments established that transcription of gdhB is initiated from an arginine-inducible promoter and that this induction is dependent on the arginine regulatory protein, ArgR, a member of the AraC/XylS family of regulatory proteins. NAD-GDH was purified to homogeneity from a recombinant strain of P. aeruginosa and characterized. The glutamate saturation curve was sigmoid, indicating positive cooperativity in the binding of glutamate. 
NAD-GDH activity was subject to allosteric control by arginine and citrate, which function as positive and negative effectors, respectively. Both effectors act by influencing the affinity of the enzyme for glutamate. NAD-GDH from this organism differs from previously characterized enzymes with respect to structure, protomer mass, and allosteric properties, indicating that this enzyme represents a novel class of microbial glutamate dehydrogenases.



1987 ◽  
Vol 246 (1) ◽  
pp. 115-120 ◽  
Author(s):  
R P Ambler ◽  
T E Meyer ◽  
M A Cusanovich ◽  
M D Kamen

The amino acid sequence of the principal soluble cytochrome c from the phototrophic acidophilic bacterium Rhodopseudomonas (or Rhodopila) globiformis was determined. By the criteria of percentage sequence identity and fewness of internal insertions and deletions it is more similar in sequence to some mitochondrial cytochromes c than to any known bacterial cytochrome. The organism does not have any properties that commend it as being particularly similar to postulated prokaryotic precursors of the mitochondrion. We consider that the relatively high degree of sequence similarity is an instance of convergence, and is an example of the limitations that are imposed on attempts to deduce distant evolutionary relationships from sequence information. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50136 (12 pages) at the British Library Lending Division, Boston Spa, West Yorkshire LS23 7BQ, U.K., from whom copies are available on prepayment [see Biochem. J. (1987) 241, 5].



2004 ◽  
Vol 134 (4) ◽  
pp. 1366-1376 ◽  
Author(s):  
Tyng-Shyan Huang ◽  
Dominique Anzellotti ◽  
Fabienne Dedaldechamp ◽  
Ragai K. Ibrahim


Biomolecules ◽  
2020 ◽  
Vol 10 (6) ◽  
pp. 938
Author(s):  
Kriti Chopra ◽  
Bhawna Burdak ◽  
Kaushal Sharma ◽  
Ajit Kembhavi ◽  
Shekhar C. Mande ◽  
...  

Decrypting the interface residues of protein complexes provides insight into the functions of the proteins and, hence, the overall cellular machinery. Computational methods have been devised in the past to predict the interface residues using amino acid sequence information, but these methods have mostly been applied to prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence differ between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here, we report a new hybrid pipeline for predicting protein-protein interaction interfaces in a pairwise manner from the amino acid sequence information of the interacting proteins. The pipeline, named CoRNeA, is based on a framework of Co-evolution, machine learning (Random Forest), and Network Analysis, and is trained specifically on eukaryotic protein complexes. We use Co-evolution, physicochemical properties, and contact potential as the major groups of features to train the Random Forest classifier. We also incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions, keeping in mind that the amino acid sequence of a protein holds information for its own folding and not only for the interface propensities. Our predictions on example datasets show that CoRNeA not only enhances the prediction of true interface residues but also reduces false positive rates significantly.



1972 ◽  
Vol 50 (3) ◽  
pp. 312-329 ◽  
Author(s):  
R. S. Hodges ◽  
L. B. Smillie

Amino acid analyses of tropomyosin have previously shown four histidine and 13–14 methionine residues per mole (70 000 daltons) of tropomyosin. The isolation of two unique histidyl and five unique methionyl sequences is described. The number of unique methionyl peptides will undoubtedly be increased when more extensive sequence information becomes available although the value of 2 for the unique histidine sequences is considered to be a maximal one. These data support the conclusion that the two subunits of tropomyosin are similar in amino acid sequence. Both the acetylated NH2-terminal and COOH-terminal sequences of the protein have been determined in this study. The isolation and sequence analysis of two varieties of peptides arising from the COOH-terminus of the protein indicates either a degree of proteolysis during its isolation or a difference in the constituent polypeptide chains of tropomyosin in this region of their structures. The limited sequences reported indicate a repeat of hydrophobic residues as required by the inter-chain packing of a coiled-coil structure.



1994 ◽  
Vol 124 (6) ◽  
pp. 949-961 ◽  
Author(s):  
LA Jesaitis ◽  
DA Goodenough

ZO-1 is a 210-225-kD peripheral membrane protein associated with cytoplasmic surfaces of the zonula occludens or tight junction. A 160-kD polypeptide, designated ZO-2, was found to coimmunoprecipitate with ZO-1 from MDCK cell extracts prepared under conditions which preserve protein associations (Gumbiner, B., T. Lowenkopf, and D. Apatira. 1991. Proc. Natl. Acad. Sci. USA. 88: 3460-3464). We have isolated ZO-2 from MDCK cell monolayers by bulk coimmunoprecipitation with ZO-1 followed by electroelution from preparative SDS-PAGE gel slices. Amino acid sequence information obtained from a ZO-2 tryptic fragment was used to isolate a partial cDNA clone from an MDCK library. The deduced amino acid sequence revealed that canine ZO-2 contains a region that is very similar to sequences in human and mouse ZO-1. This region includes both a 90-amino acid repeat domain of unknown function and guanylate kinase-like domains which are shared among members of the family of proteins that includes ZO-1, erythrocyte p55, the product of the lethal(1)discs-large-1 (dlg) gene of Drosophila, and a synapse-associated protein from rat brain, PSD-95/SAP90. The dlg gene product has been shown to act as a tumor suppressor in the imaginal disc of the Drosophila larva, although the functions of other family members have not yet been defined. A polyclonal antiserum was raised against a unique region of ZO-2 and found to exclusively label the cytoplasmic surfaces of tight junctions in MDCK plasma membrane preparations, indicating that ZO-2 is a tight junction-associated protein. Immunohistochemical staining of frozen sections of whole tissue demonstrated that ZO-2 localized to the region of the tight junction in a number of epithelia, including liver, intestine, kidney, testis, and arterial endothelium, suggesting that this protein is a ubiquitous component of the tight junction. 
Double-label immunofluorescence microscopy performed on cryosections of heart, a nonepithelial tissue, revealed the presence of ZO-1 but no ZO-2 staining at the fascia adherens, a specialized junction of cardiac myocytes which has previously been shown to contain ZO-1 (Itoh, M., S. Yonemura, A. Nagafuchi, S. Tsukita, and Sh. Tsukita. 1991. J. Cell Biol. 115:1449-1462). Thus it appears that ZO-2 is not a component of the fascia adherens, and that unlike ZO-1, this protein is restricted to the epithelial tight junction.


