Data Mining of Protein Sequences with Amino Acid Position-Based Feature Encoding Technique

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that lowers the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are found in many sources, including bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty uricase amino acid sequences from different sources were obtained from NCBI, and analyses including Multiple Sequence Alignment (MSA), homology search, phylogenetic relationships, motif search, domain architecture and physicochemical properties (including pI, EC, Ai and Ii) were performed. Results: Multiple sequence alignment of the selected protein sequences showed distinct differences between bacterial, fungal, plant and animal sources based on the position-specific occurrence of conserved amino acid residues. The maximum homology across all selected protein sequences spans 51-388. Within individual categories, homology spans 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences revealed clusters corresponding to the different sources of uricase. The physicochemical analysis showed that the uricases are 300-338 amino acid residues long, with molecular weights of 33-39 kDa and theoretical pI values ranging from 4.95 to 8.88. The amino acid composition showed that valine has a high average frequency of 8.79% compared with other amino acids across all analyzed species. Conclusion: Within bioinformatics, this work may be informative and serve as a stepping stone for other researchers seeking to understand the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. The proposed in silico analysis can therefore be considered for protein engineering work as well as for gout therapy.
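
The amino acid composition mentioned in this abstract is a per-residue frequency over a sequence. A minimal Python sketch of that calculation follows; the function name `amino_acid_composition` and the toy sequence are illustrative assumptions, not the authors' pipeline or data.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the percentage of each residue in a protein sequence."""
    sequence = sequence.upper()
    length = len(sequence)
    if length == 0:
        return {}
    counts = Counter(sequence)
    return {residue: 100.0 * n / length for residue, n in counts.items()}


# Hypothetical toy sequence, not a real uricase.
toy_sequence = "MVVAHVKG"
composition = amino_acid_composition(toy_sequence)
print(round(composition["V"], 2))  # valine percentage in the toy sequence
```

Averaging the per-sequence percentages across a set of sequences would give a figure comparable in kind to the average valine frequency quoted in the abstract.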

iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins

Letters in Organic Chemistry ◽ 10.2174/1570178615666180816101653 ◽ 2019 ◽ Vol 16 (4) ◽ pp. 294-302 ◽ Cited By ~ 6

Author(s): Shahid Akbar ◽ Maqsood Hayat ◽ Muhammad Kabir ◽ Muhammad Iqbal

Keyword(s): Feature Extraction ◽ Amino Acid ◽ Antifreeze Proteins ◽ Protein Sequences ◽ Sampling Technique ◽ Lower Class ◽ Success Rates ◽ Throughput Model ◽ Extraction Scheme ◽ Living Organisms

Antifreeze proteins (AFPs) play distinctive roles in maintaining the homeostatic conditions of living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity of protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is therefore highly desirable to propose a computationally intelligent, high-throughput model that identifies AFPs quickly and accurately. In this work, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Protein sequences are expressed using three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition and Reduced Amino Acid Alphabet Composition. With an imbalanced dataset, the classification hypothesis is usually biased towards the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) is employed to increase the number of instances of the minority class and control this bias. A 10-fold cross-validation test is applied to assess the success rates of the “iAFP-gap-SMOTE” model. In the empirical investigation, the “iAFP-gap-SMOTE” model obtained 95.02% accuracy. The comparison suggested that the accuracy of the “iAFP-gap-SMOTE” model is higher than that of the existing techniques in the literature. The proposed “iAFP-gap-SMOTE” model may be helpful for the research community and academia.
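
Two of the ingredients named in this abstract, g-gap dipeptide composition and synthetic minority oversampling, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the normalisation by the number of valid pairs, the neighbour count `k`, and the fixed random seed are all choices made here for the example.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def gapped_dipeptide_composition(sequence, gap=1):
    """Frequencies of residue pairs that have `gap` residues between them."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    step = gap + 1  # distance between the two residues of a pair
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbours.

    Simplified SMOTE-style sketch; needs at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        # k nearest neighbours of `base` by squared Euclidean distance
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


features = gapped_dipeptide_composition("ACAC", gap=1)
new_points = smote_like_oversample([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=5)
```

With `gap=0` the first function reduces to the ordinary dipeptide composition. Each synthetic point lies on a line segment between two existing minority samples, which is how oversampling of this kind adds minority instances without duplicating them exactly.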

A Novel Amino Acid Sequence-based Computational Approach to Predicting Cell-penetrating Peptides

Current Computer-Aided Drug Design ◽ 10.2174/1573409914666180925100355 ◽ 2019 ◽ Vol 15 (3) ◽ pp. 206-211 ◽ Cited By ~ 2

Author(s): Jihui Tang ◽ Jie Ning ◽ Xiaoyan Liu ◽ Baoming Wu ◽ Rongfeng Hu

Keyword(s): Machine Learning ◽ Amino Acid ◽ Amino Acid Position ◽ Cell Penetrating Peptides ◽ Support Vector ◽ Cell Penetration ◽ Drug Candidates ◽ Machine Learning Model ◽ Cell Penetrating ◽ Novel Method

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). We used orthogonal encoding to encode the amino acids, treating each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) machine learning model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the greatest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy close to 95%. Conclusion: The results indicate that using amino acid position as a variable can be a prospective method for predicting the membrane-penetrating capability of CPPs.
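
Orthogonal encoding with one variable per amino acid position can be illustrated as follows. This is a minimal sketch, not the authors' procedure: the 20-letter alphabet ordering, the function name `orthogonal_encode`, and the zero-padding of short peptides to a fixed `max_len` are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_encode(peptide, max_len):
    """One-hot encode a peptide position by position into a flat vector.

    Each position contributes a block of 20 values with a single 1 at the
    index of its residue. Positions beyond the end of the peptide, and
    residues outside the 20-letter alphabet, are left as all-zero blocks.
    """
    vector = []
    for position in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if position < len(peptide):
            index = AMINO_ACIDS.find(peptide[position].upper())
            if index >= 0:
                block[index] = 1
        vector.extend(block)
    return vector


encoded = orthogonal_encode("AC", max_len=3)
```

The resulting fixed-length vectors could then be passed to a classifier such as an SVM; the fixed length is what allows each position to act as its own variable.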

Drosophila kinesin motor domain extending to amino acid position 392 is dimeric when expressed in Escherichia coli.

Journal of Biological Chemistry ◽ 10.1016/s0021-9258(18)31692-2 ◽ 1994 ◽ Vol 269 (51) ◽ pp. 32708

Author(s): T G Huang ◽ J Suhan ◽ D D Hackney

Keyword(s): Escherichia Coli ◽ Amino Acid ◽ Amino Acid Position ◽ Motor Domain ◽ Kinesin Motor

Fe(2)OG: an integrated HMM profile-based web server to predict and analyze putative non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenase function in protein sequences

BMC Research Notes ◽ 10.1186/s13104-021-05477-z ◽ 2021 ◽ Vol 14 (1)

Author(s): Siddhartha Kundu

Keyword(s): Amino Acid ◽ Water Molecule ◽ Active Site ◽ Ferrous Iron ◽ Web Server ◽ Protein Sequences ◽ Diverse Group ◽ And Function ◽ Functionally Diverse ◽ Haem Iron

Objective: Non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenases (i2OGdd) are a taxonomically and functionally diverse group of enzymes. The active site comprises ferrous iron in a hexa-coordinated distorted octahedron with the apoenzyme, 2-oxoglutarate and a displaceable water molecule. Current information on novel i2OGdd members is sparse and relies on computationally derived annotation schema. The dissimilar amino acid composition and variable active site geometry result in differing reaction chemistries amongst i2OGdd members. Researchers also need a curated list of sequences with putative i2OGdd function that can be probed further for empirical data. Results: This work reports the implementation of Fe(2)OG, a web server with dual functionality and an extension of previous work on i2OGdd enzymes (Fe(2)OG ≡ {H2OGpred, DB2OG}). Fe(2)OG in this form is completely revised and updated (URL, scripts, repository) and will strengthen the knowledge base of investigators on i2OGdd biochemistry and function. Fe(2)OG uses the superior predictive propensity of HMM profiles of laboratory-validated i2OGdd members to predict probable active site geometries in user-defined protein sequences. Fe(2)OG also provides researchers with a pre-compiled list of analyzed and searchable i2OGdd-like sequences, many of which may be clinically relevant. Fe(2)OG is freely available (http://204.152.217.16/Fe2OG.html) and supersedes all previous versions, i.e., H2OGpred and DB2OG.

Cloning, Expression, and Characterization of Mouse Tissue Factor Pathway Inhibitor (TFPI)

Thrombosis and Haemostasis ◽ 10.1055/s-0037-1614983 ◽ 1998 ◽ Vol 79 (02) ◽ pp. 306-309 ◽ Cited By ~ 5

Author(s): Dougald Monroe ◽ Julie Oliver ◽ Darla Liles ◽ Harold Roberts ◽ Jen-Yea Chang

Keyword(s): Amino Acid ◽ Tissue Factor ◽ Signal Peptide ◽ Tissue Factor Pathway Inhibitor ◽ Factor Xa ◽ Protein Sequences ◽ Cloning And Expression ◽ Mouse Tissue ◽ Amino Acid Residues ◽ Tissue Factor Pathway

Summary: Tissue factor pathway inhibitor (TFPI) acts to regulate the initiation of coagulation by first inhibiting factor Xa. The factor Xa/TFPI complex then inhibits the factor VIIa/tissue factor complex. The cDNA sequences of TFPI from several different species have been previously reported. A high level of similarity is present among TFPIs at the molecular level (DNA and protein sequences) as well as in biochemical function (inhibition of factor Xa and of VIIa/tissue factor). In this report, we used a PCR-based screening method to clone cDNA for full-length TFPI from a mouse macrophage cDNA library. Both the cDNA and the predicted protein sequences show significant homology to the other reported TFPI sequences, especially to that of rat. Mouse TFPI has a signal peptide of 28 amino acid residues followed by the mature protein (from which the signal peptide is removed), which has 278 amino acid residues. Mouse TFPI, like that of other species, consists of three tandem Kunitz-type domains. Recombinant mouse TFPI was expressed in the human kidney cell line 293 and purified for functional assays. When human clotting factors were used to investigate the inhibition spectrum of mouse TFPI, it was shown that, in addition to human factor Xa, mouse TFPI inhibits human factors VIIa, IXa and XIa. Cloning and expression of the mouse TFPI gene will offer useful information and material for coagulation studies performed in a mouse model system.

Genetic Relationships in the Toxin-Producing Fungal Endophyte, Alternaria oxytropis Using Polyketide Synthase and Non-Ribosomal Peptide Synthase Genes

Journal of Fungi ◽ 10.3390/jof7070538 ◽ 2021 ◽ Vol 7 (7) ◽ pp. 538

Author(s): Rebecca Creamer ◽ Deana Baucom Hille ◽ Marwa Neyaz ◽ Tesneem Nusayr ◽ Christopher L. Schardl ◽ ...

Keyword(s): Amino Acid ◽ Polyketide Synthase ◽ Genetic Relationships ◽ Protein Sequences ◽ Fungal Endophyte ◽ Melanin Synthesis ◽ Melanin Biosynthesis ◽ Protein Levels ◽ Oxytropis Sericea ◽ And Function

The legume Oxytropis sericea hosts a fungal endophyte, Alternaria oxytropis, which produces secondary metabolites (SM), including the toxin swainsonine. Polyketide synthase (PKS) and non-ribosomal peptide synthase (NRPS) enzymes are associated with the biosynthesis of fungal SM. To better understand the origins of the SM, an unannotated genome of A. oxytropis was assessed for protein sequences similar to known PKS and NRPS enzymes of fungi. Contigs exhibiting identity with known genes were analyzed at the nucleotide and protein levels using available databases. Software was used to identify PKS and NRPS domains and to predict identity and function. Selected gene sequences were confirmed using PCR. Thirteen PKS, 5 NRPS, and 4 PKS-NRPS hybrids were identified and characterized, with functions including swainsonine and melanin biosynthesis. Phylogenetic relationships among the closest amino acid matches with Alternaria spp. were identified for seven highly conserved PKS and NRPS, including those for melanin synthesis. Three PKS and NRPS were most closely related to other fungi within the Pleosporaceae family, while five PKS and PKS-NRPS were closely related to fungi in the Pleosporales order. However, seven PKS and PKS-NRPS showed no identity with fungi in the Pleosporales or the class Dothideomycetes, suggesting a different evolutionary origin for those genes.

Identification of separate domains in the adenovirus E1A gene for immortalization activity and the activation of virus early genes.

Molecular and Cellular Biology ◽ 10.1128/mcb.6.10.3470 ◽ 1986 ◽ Vol 6 (10) ◽ pp. 3470-3480 ◽ Cited By ~ 98

Author(s): E Moran ◽ B Zerler ◽ T M Harrison ◽ M B Mathews

Keyword(s): Amino Acids ◽ Amino Acid ◽ Point Mutations ◽ Apparent Molecular Weight ◽ Amino Acid Position ◽ Cysteine Residue ◽ Transformation Function ◽ Single Amino Acid ◽ Amino Acid Substitutions ◽ E1A Gene

The transformation and early adenovirus gene transactivation functions of the E1A region were analyzed with deletion and point mutations. Deletion of amino acids from position 86 through 120 had little effect on the lytic or transforming functions of the E1A products, while deletion of amino acids from position 121 through 150 significantly impaired both functions. The sensitivity of the transformation function to alterations in the region from amino acid position 121 to 150 was further indicated by the impairment of transforming activity resulting from single amino acid substitutions at positions 124 and 135. Interestingly, conversion of a cysteine residue at position 124 to glycine severely impaired the transformation function without affecting the early adenovirus gene activating functions. Single amino acid substitutions in a different region of the E1A gene had the converse effect. All the mutants produced polypeptides of sufficient stability to be detected by Western immunoblot analysis. The single amino acid substitutions at positions 124 and 135, although impairing the transformation functions, did not detectably alter the formation of the higher-apparent-molecular-weight forms of the E1A products.

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽ 10.1042/bj1870065 ◽ 1980 ◽ Vol 187 (1) ◽ pp. 65-74 ◽ Cited By ~ 12

Author(s): D Penny ◽ M D Hendy ◽ L R Foulds

Keyword(s): Amino Acid ◽ Phylogenetic Tree ◽ Protein Sequence ◽ Phylogenetic Trees ◽ Sequence Data ◽ Protein Sequences ◽ Nucleotide Sequences ◽ Amino Acid Sequences ◽ Minimal Tree ◽ Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
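
A minimal tree is one whose total number of character changes (its length) is as small as possible. The sketch below scores the length of a given, fixed binary tree using Fitch's small-parsimony counting. It is not the construction method of the paper, which builds the tree taxon by taxon; the function names, the nested-tuple tree representation, and the toy sequences are assumptions made for illustration.

```python
def _fitch_site(node, sequences, site):
    """Return (candidate state set, change count) for one site below `node`."""
    if isinstance(node, str):  # leaf: the observed character at this site
        return {sequences[node][site]}, 0
    left_set, left_cost = _fitch_site(node[0], sequences, site)
    right_set, right_cost = _fitch_site(node[1], sequences, site)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, sequences):
    """Minimum number of changes required on a fixed binary tree."""
    n_sites = len(next(iter(sequences.values())))
    return sum(_fitch_site(tree, sequences, s)[1] for s in range(n_sites))


# Hypothetical aligned toy sequences for four taxa.
toy = {"a": "AAG", "b": "AAG", "c": "ACG", "d": "ACT"}
length_ab_cd = tree_length((("a", "b"), ("c", "d")), toy)
length_ac_bd = tree_length((("a", "c"), ("b", "d")), toy)
```

Comparing the two lengths shows which grouping of the four toy taxa needs fewer changes; a search for a minimal tree repeats this kind of comparison over candidate topologies.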

T-Cell Response to Human Papillomavirus Type 58 L1, E6, and E7 Peptides in Women with Cleared Infection, Cervical Intraepithelial Neoplasia, or Invasive Cancer

Clinical and Vaccine Immunology ◽ 10.1128/cvi.00105-10 ◽ 2010 ◽ Vol 17 (9) ◽ pp. 1315-1321 ◽ Cited By ~ 7

Author(s): Paul K. S. Chan ◽ Shih-Jen Liu ◽ T. H. Cheung ◽ Winnie Yeo ◽ S. M. Ngai ◽ ...

Keyword(s): Human Papillomavirus ◽ Amino Acid ◽ T Cell ◽ Cell Response ◽ Positive Response ◽ Intraepithelial Neoplasia ◽ Amino Acid Position ◽ T Cell Response ◽ IFN-γ ◽ E6 And E7

Abstract: Human papillomavirus type 58 (HPV-58) exists in a relatively high prevalence in certain parts of the world, including East Asia. This study examined the T-cell response to HPV-58 L1, E6, and E7 peptides among women with cleared infection, cervical intraepithelial neoplasia grade 2 (CIN2) or CIN3, or invasive cervical cancer (ICC). Peptides found to be reactive in the in vitro peptide binding assay or mouse-stimulating study were tested with a gamma interferon (IFN-γ) enzyme-linked immunospot (ELISPOT) assay to detect peptide-specific responses from the peripheral blood mononuclear cells (PBMC) collected from 91 HPV-58-infected women (32 with cleared infection, 16 CIN2, 15 CIN3, and 28 ICC). Four HLA-A11-restricted HPV-58 L1 peptides, located at amino acid positions 296 to 304, 327 to 335, 101 to 109, and 469 to 477, showed positive IFN-γ ELISPOT results and were mainly from women with cleared infection. Two HLA-A11-restricted E6 peptides (amino acid positions 64 to 72 and 94 to 102) and three HLA-A11-restricted E7 peptides (amino acid positions 78 to 86, 74 to 82, and 88 to 96) showed a positive response. A response to E6 and E7 peptides was mainly observed from subjects with CIN2 or above. One HLA-A2-restricted E6 peptide, located at amino acid position 99 to 107, elicited a positive response in two CIN2 subjects. One HLA-A24-restricted L1 peptide, located at amino acid position 468 to 476, also elicited a positive response in two CIN2 subjects. In summary, this study has identified a few immunogenic epitopes for HPV-58 E6 and E7 proteins. It is worthwhile to further investigate whether responses to these epitopes have a role in clearing an established cervical lesion.
