A sequence embedding method for enzyme optimal condition analysis

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Xiangjun Li ◽  
Zhixin Dou ◽  
Yuqing Sun ◽  
Lushan Wang ◽  
Bin Gong ◽  
...  

Abstract Background: Enzyme activity is influenced by the external environment, so it is important for an enzyme to retain high activity under a specific condition. The usual approach is first to determine the optimal condition of an enzyme, either by a gradient test or from its tertiary structure, and then to use protein engineering to mutate the wild-type enzyme for higher activity under the expected condition. Results: In this paper, we investigate the optimal condition of an enzyme by directly analyzing its sequence. We propose an embedding method that represents the amino acids and the structural information as vectors in a latent space. These vectors capture the correlations between amino acids and sites in the aligned amino acid sequences, as well as their correlation with the optimal condition. We crawled and processed the amino acid sequences of the glycoside hydrolase GH11 family and obtained 125 amino acid sequences with a known optimal pH. We used a probabilistic approximation method to implement the embedding learning on these samples. Based on the embedding vectors, we design a computational score that determines which of two given amino acid sequences has the better optimal condition; it achieves 80% accuracy on test proteins from the same family. We also give mutation suggestions for achieving higher activity in an expected environment, and these are consistent with previous wet experiments and analysis. Conclusion: A new computational method is proposed for sequence-based analysis of enzyme optimal conditions. Compared with the traditional process, which involves many wet experiments and requires multiple mutations, this method can efficiently give recommendations, useful as a reference, on the direction and location of amino acid substitutions for an expected condition.
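The pairwise evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that "better optimal condition" means a higher optimal pH and that each sequence has already been reduced to a single score; the function name `pairwise_accuracy` and the toy numbers are hypothetical and are not taken from the paper, whose actual score is not specified here.

```python
from itertools import combinations


def pairwise_accuracy(scores, optimal_ph):
    """Fraction of sequence pairs whose score order matches their optimal-pH order.

    Assumption (not from the paper): a higher score should mean a higher optimal pH.
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        if optimal_ph[i] == optimal_ph[j]:
            continue  # a tie in optimal pH carries no ordering information
        total += 1
        # The product is positive only when both differences have the same sign.
        if (scores[i] - scores[j]) * (optimal_ph[i] - optimal_ph[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


# Toy values, invented purely for illustration.
toy_scores = [0.2, 0.9, 0.5, 0.4]
toy_ph = [3.0, 6.5, 5.0, 5.5]
print(pairwise_accuracy(toy_scores, toy_ph))
```

With these toy values, five of the six pairs are ordered correctly. A pairwise measure of this kind only checks relative ordering, so it says nothing about how close a predicted optimum is to the measured one.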

2019 ◽  
Author(s):  
Xiangjun Li ◽  
Zhixin Dou ◽  
Yuqing Sun ◽  
Lushan Wang ◽  
Bin Gong

Abstract Background: Enzyme activity is influenced by the external environment, so it is important for an enzyme to retain high activity under a specific condition. The usual approach is first to determine the optimal condition of an enzyme, either by a gradient test or from its tertiary structure, and then to use protein engineering to mutate the wild-type enzyme for higher activity under the expected condition. Results: In this paper, we investigate the optimal condition of an enzyme by directly analyzing its sequence. We propose an embedding method that represents the amino acids and the construct information as vectors in a latent space. These vectors capture the correlations between amino acids and sites in the aligned amino acid sequences, as well as their correlations with the optimal conditions. We crawled and processed the amino acid sequences of the glycoside hydrolase GH11 family and obtained 125 amino acid sequences with a known optimal pH. We used a probabilistic approximation method to implement the embedding learning on these samples. Based on the embedding vectors, we design a computational score to determine the optimal condition of an enzyme, achieving 80% accuracy on test proteins from the same family. We also give mutation suggestions for achieving higher activity in the expected environment, and these are consistent with wet experiments and analysis. Conclusion: A new computational method is proposed for sequence-based analysis of enzyme optimal conditions. Compared with the traditional process, which involves many wet experiments and requires multiple mutations, this method can obtain the desired protein for an expected condition efficiently and effectively. Keywords: Protein sequence analysis; Embedding; Bioinformatics


2002 ◽  
Vol 184 (8) ◽  
pp. 2225-2234 ◽  
Author(s):  
Jason P. Folster ◽  
Terry D. Connell

ABSTRACT ChiA, an 88-kDa endochitinase encoded by the chiA gene of the gram-negative enteropathogen Vibrio cholerae, is secreted via the eps-encoded main terminal branch of the general secretory pathway (GSP), a mechanism which also transports cholera toxin. To localize the extracellular transport signal of ChiA that initiates transport of the protein through the GSP, a chimera comprised of ChiA fused at the N terminus with the maltose-binding protein (MalE) of Escherichia coli and fused at the C terminus with a 13-amino-acid epitope tag (E-tag) was expressed in strain 569B(chiA::Kanr), a chiA-deficient but secretion-competent mutant of V. cholerae. Fractionation studies revealed that blockage of the natural N terminus and C terminus of ChiA did not prevent secretion of the MalE-ChiA-E-tag chimera. To locate the amino acid sequences which encoded the transport signal, a series of truncations of ChiA were engineered. Secretion of the mutant polypeptides was curtailed only when ChiA was deleted from the N terminus beyond amino acid position 75 or from the C terminus beyond amino acid 555. A mutant ChiA comprised of only those amino acids was secreted by wild-type V. cholerae but not by an epsD mutant, establishing that amino acids 75 to 555 independently harbored sufficient structural information to promote secretion by the GSP of V. cholerae. Cys77 and Cys537, two cysteines located just within the termini of ChiA(75-555), were not required for secretion, indicating that those residues were not essential for maintaining the functional activity of the ChiA extracellular transport signal.


2017 ◽  
Vol 15 (03) ◽  
pp. 1750009 ◽  
Author(s):  
Bruno Grisci ◽  
Márcio Dorn

The development of computational methods to accurately model three-dimensional protein structures from sequences of amino acid residues is becoming increasingly important to the structural biology field. This paper addresses the challenge of predicting the tertiary structure of a given amino acid sequence, which has been reported to belong to the NP-Complete class of problems. We present a new method, namely NEAT–FLEX, based on NeuroEvolution of Augmenting Topologies (NEAT) to extract structural features from (ABS) proteins that are determined experimentally. The proposed method manipulates structural information from the Protein Data Bank (PDB) and predicts the conformational flexibility (FLEX) of residues of a target amino acid sequence. This information may be used in three-dimensional structure prediction approaches as a way to reduce the conformational search space. The proposed method was tested with 24 different amino acid sequences. Evolving neural networks were compared against a traditional error back-propagation algorithm; results show that the proposed method is a powerful way to extract and represent structural information from protein molecules that are determined experimentally.


Author(s):  
Rajneesh - ◽  
Soumila Mondal ◽  
Jainendra Pathak ◽  
Prashant R. Singh ◽  
Shailendra P. Singh ◽  
...  

Photolyases (Phrs) are enzymes that use the blue/ultraviolet (UV-A) region of light to repair UV-induced cyclobutane pyrimidine dimers (CPDs). We have studied Phr groups by bioinformatic analyses as well as active-site and structural modeling. The analysis of 238 amino acid sequences from 85 completely sequenced cyanobacterial genomes revealed five classes of Phrs, i.e., CPD Gr I, 6-4 Phrs/cryptochrome, Cry-DASH, Fe-S bacteria Phrs, and a group of shorter proteins (276-385 amino acids in length). The distribution of Phr groups in cyanobacteria belonging to the order Synechococcales was found to be influenced by the habitats of the organisms. Class V Phrs were exclusively present in cyanobacteria. Unique motifs and binding sites were found in Groups II and III. An Fe-S protein binding site was present only in Group V. Active-site residues and putative CPD/6-4PP binding residues are charged amino acids located on the surface of the proteins. The majority of hydrophilic amino acid residues were present on the surface of Phrs. Sequence analysis confirmed the diverse nature of Phrs, although sequence diversity does not affect their overall 3D structure. Protein-ligand interaction analysis identified novel CPD/6-4PP binding sites on Phrs. This structural information can be used for the preparation of efficient Phr-based formulations.


2021 ◽  
Vol 22 (3) ◽  
pp. 1018
Author(s):  
Hiroaki Yokota

Helicases are nucleic acid-unwinding enzymes that are involved in the maintenance of genome integrity. Several parts of the amino acid sequences of helicases are very similar, and these well-conserved amino acid sequences are termed “helicase motifs”. Previous studies by X-ray crystallography and single-molecule measurements have suggested a common underlying mechanism for their function. These studies indicate the role of the helicase motifs in unwinding nucleic acids. In contrast, the sequence and length of the C-terminal amino acids of helicases are highly variable. In this paper, I review past and recent studies that proposed helicase mechanisms and studies that investigated the roles of the C-terminal amino acids in helicase and dimerization activities, primarily in the non-hexameric Escherichia coli (E. coli) UvrD helicase. Then, I focus on my recent study of single-molecule direct visualization of a UvrD mutant lacking the C-terminal 40 amino acids (UvrDΔ40C), which was used in studies proposing the monomer helicase model. The study demonstrated that multiple UvrDΔ40C molecules jointly participated in DNA unwinding, presumably by forming an oligomer. Thus, the single-molecule observation addressed how the C-terminal amino acids affect the number of helicases bound to DNA, oligomerization, and unwinding activity, an approach that can be applied to other helicases.


1973 ◽  
Vol 131 (3) ◽  
pp. 485-498 ◽  
Author(s):  
R. P. Ambler ◽  
Margaret Wynn

The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5.
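The pairwise percentages quoted above can be read as the share of aligned positions at which two sequences differ. The sketch below shows that calculation under the assumption that the sequences are already aligned to equal length with no gaps; the function name `percent_difference` and the two short example strings are hypothetical and are not cytochrome c-551 sequences.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Made-up 10-residue fragments (one-letter codes) differing at two positions.
print(percent_difference("ACDEFGHIKL", "ACDQFGHIRL"))
```

If the reported values were computed this way over all 82 positions, 22% and 39% would correspond to roughly 18 and 32 differing residues respectively; that reading is an inference, since the abstract does not state how the percentages were derived.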


2001 ◽  
Vol 75 (17) ◽  
pp. 8127-8136 ◽  
Author(s):  
Daniel R. Perez ◽  
Ruben O. Donis

ABSTRACT Influenza A virus expresses three viral polymerase (P) subunits—PB1, PB2, and PA—all of which are essential for RNA and viral replication. The functions of P proteins in transcription and replication have been partially elucidated, yet some of these functions seem to be dependent on the formation of a heterotrimer for optimal viral RNA transcription and replication. Although it is conceivable that heterotrimer subunit interactions may allow a more efficient catalysis, direct evidence of their essentiality for viral replication is lacking. Biochemical studies addressing the molecular anatomy of the P complexes have revealed direct interactions between PB1 and PB2 as well as between PB1 and PA. Previous studies have shown that the N-terminal 48 amino acids of PB1, termed domain α, contain the residues required for binding PA. We report here the refined mapping of the amino acid sequences within this small region of PB1 that are indispensable for binding PA by deletion mutagenesis of PB1 in a two-hybrid assay. Subsequently, we used site-directed mutagenesis to identify the critical amino acid residues of PB1 for interaction with PA in vivo. The first 12 amino acids of PB1 were found to constitute the core of the interaction interface, thus narrowing the previous boundaries of domain α. The role of the minimal PB1 domain α in influenza virus gene expression and genome replication was subsequently analyzed by evaluating the activity of a set of PB1 mutants in a model reporter minigenome system. A strong correlation was observed between a functional PA binding site on PB1 and P activity. Influenza viruses bearing mutant PB1 genes were recovered using a plasmid-based influenza virus reverse genetics system. Interestingly, mutations that rendered PB1 unable to bind PA were either nonviable or severely growth impaired. These data are consistent with an essential role for the N terminus of PB1 in binding PA, P activity, and virus growth.


1986 ◽  
Vol 6 (5) ◽  
pp. 1711-1721
Author(s):  
E M McIntosh ◽  
R H Haynes

The dCMP deaminase gene (DCD1) of Saccharomyces cerevisiae has been isolated by screening a Sau3A clone bank for complementation of the dUMP auxotrophy exhibited by dcd1 dmp1 haploids. Plasmid pDC3, containing a 7-kilobase (kb) Sau3A insert, restores dCMP deaminase activity to dcd1 mutants and leads to an average 17.5-fold overproduction of the enzyme in wild-type cells. The complementing activity of the plasmid was localized to a 4.2-kb PvuII restriction fragment within the Sau3A insert. Subcloning experiments demonstrated that a single HindIII restriction site within this fragment lies within the DCD1 gene. Subsequent DNA sequence analysis revealed a 936-nucleotide open reading frame encompassing this HindIII site. Disruption of the open reading frame by integrative transformation led to a loss of enzyme activity and confirmed that this region constitutes the dCMP deaminase gene. Northern analysis indicated that the DCD1 mRNA is a 1.15-kb poly(A)+ transcript. The 5' end of the transcript was mapped by primer extension and appears to exhibit heterogeneous termini. Comparison of the amino acid sequence of the T2 bacteriophage dCMP deaminase with that deduced for the yeast enzyme revealed a limited degree of homology which extends over the entire length of the phage polypeptide (188 amino acids) but is confined to the carboxy-terminal half of the yeast protein (312 amino acids). A potential dTTP-binding site in the yeast and phage enzymes was identified by comparison of homologous regions with the amino acid sequences of a variety of other dTTP-binding enzymes. Despite the role of dCMP deaminase in dTTP biosynthesis, Northern analysis revealed that the DCD1 gene is not subject to the same cell cycle-dependent pattern of transcription recently found for the yeast thymidylate synthetase gene (TMP1).


1977 ◽  
Vol 162 (2) ◽  
pp. 411-421 ◽  
Author(s):  
S J Yeaman ◽  
P Cohen ◽  
D C Watson ◽  
G H Dixon

The known amino acid sequences at the two sites on phosphorylase kinase that are phosphorylated by cyclic AMP-dependent protein kinase were extended. The sequences of 42 amino acids around the phosphorylation site on the alpha-subunit and of 14 amino acids around the phosphorylation site on the beta-subunit were shown to be: alpha-subunit, Phe-Arg-Arg-Leu-Ser(P)-Ile-Ser-Thr-Glu-Ser-Glx-Pro-Asx-Gly-Gly-His-Ser-Leu-Gly-Ala-Asp-Leu-Met-Ser-Pro-Ser-Phe-Leu-Ser-Pro-Gly-Thr-Ser-Val-Phe-(Ser,Pro,Gly)-His-Thr-Ser-Lys; beta-subunit, Ala-Arg-Thr-Lys-Arg-Ser-Gly-Ser(P)-Val/Ile-Tyr-Glu-Pro-Leu-Lys. The sites on histone H2B which are phosphorylated by cyclic AMP-dependent protein kinase in vitro were identified as serine-36 and serine-32. The amino acid sequence in this region is: Lys-Lys-Arg-Lys-Arg-Ser32(P)-Arg-Lys-Glu-Ser36(P)-Tyr-Ser-Val-Tyr-Val- [Iwai, K., Ishikawa, K. & Hayashi, H. (1970) Nature (London) 226, 1056-1058]. Serine-36 was phosphorylated at 50% of the rate at which the beta-subunit of phosphorylase kinase was phosphorylated, and it was phosphorylated 6-7-fold more rapidly than was serine-32. The amino acid sequences when compared with those at the phosphorylation sites of other physiological substrates suggest that the presence of two adjacent basic amino acids on the N-terminal side of the susceptible serine residue may be critical for specific substrate recognition in vivo.
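The suggested recognition feature, two adjacent basic residues on the N-terminal side of the target serine, can be sketched as a simple scan over one-letter sequences. The function name `candidate_serines` and the `max_gap` window of 0 to 2 intervening residues are assumptions chosen to cover the sites quoted above; the abstract itself does not state a spacing rule.

```python
BASIC = frozenset("KR")  # lysine and arginine, one-letter codes


def candidate_serines(seq, max_gap=2):
    """Return 1-based positions of Ser residues preceded by two adjacent basic residues.

    max_gap is the largest number of residues allowed between the basic pair
    and the serine (an illustrative assumption, not a rule from the abstract).
    """
    hits = []
    for i, residue in enumerate(seq):
        if residue != "S":
            continue
        for gap in range(max_gap + 1):
            start = i - gap - 2  # index where the basic pair would begin
            if start >= 0 and seq[start] in BASIC and seq[start + 1] in BASIC:
                hits.append(i + 1)
                break
    return hits


# Histone H2B region from the abstract: Lys-Lys-Arg-Lys-Arg-Ser32-Arg-Lys-Glu-Ser36-Tyr-Ser-Val-Tyr-Val
h2b_fragment = "KKRKRSRKESYSVYV"
offset = 26  # fragment position 6 is Ser32, so fragment position 1 is residue 27
print([pos + offset for pos in candidate_serines(h2b_fragment)])

# First ten residues of the alpha-subunit sequence: Phe-Arg-Arg-Leu-Ser(P)-Ile-Ser-Thr-Glu-Ser
print(candidate_serines("FRRLSISTES"))
```

On these fragments the scan flags Ser32 and Ser36 in H2B and the phosphorylated serine at position 5 of the alpha-subunit fragment, while skipping the other serines. A pattern this loose is only a candidate filter and may flag serines that are not phosphorylated in practice.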


1963 ◽  
Vol 18 (12) ◽  
pp. 1032-1049 ◽  
Author(s):  
B. Wittmann-Liebold ◽  
H. G. Wittmann

The amino acid sequence of dahlemense, a naturally occurring strain of tobacco mosaic virus, has been determined and compared with that of the strain vulgare (Fig. 7). In this communication the experimental details are given for the elucidation of the amino acid sequences within two tryptic peptides with 65 amino acids.

