Discriminating acidic and alkaline enzymes using a random forest model with secondary structure amino acid composition

2009 ◽  
Vol 44 (6) ◽  
pp. 654-660 ◽  
Author(s):  
Guangya Zhang ◽  
Hongchun Li ◽  
Baishan Fang

mBio ◽  
2018 ◽  
Vol 9 (5) ◽  
Author(s):  
Ursula Goodenough ◽  
Robyn Roth ◽  
Thamali Kariyawasam ◽  
Amelia He ◽  
Jae-Hyeok Lee

ABSTRACT Animals and amoebae assemble actin/spectrin-based plasma membrane skeletons, forming what is often called the cell cortex, whereas euglenids and alveolates (ciliates, dinoflagellates, and apicomplexans) have been shown to assemble a thin, viscoelastic, actin/spectrin-free membrane skeleton, here called the epiplast. Epiplasts include a class of proteins, here called the epiplastins, with a head/medial/tail domain organization, whose medial domains have been characterized in previous studies by their low-complexity amino acid composition. We have identified two additional features of the medial domains: a strong enrichment of acid/base amino acid dyads and a predicted β-strand/random coil secondary structure. These features have served to identify members in two additional unicellular eukaryotic radiations (the glaucophytes and cryptophytes) as well as additional members in the alveolates and euglenids. We have analyzed the amino acid composition and domain structure of 219 epiplastin sequences and have used quick-freeze deep-etch electron microscopy to visualize the epiplasts of glaucophytes and cryptophytes. We define epiplastins as proteins encoded in organisms that assemble epiplasts, but epiplastin-like proteins, of unknown function, are also encoded in Insecta, Basidiomycetes, and Caulobacter genomes. We discuss the diverse cellular traits that are supported by epiplasts and propose evolutionary scenarios that are consonant with their distribution in extant eukaryotes.

IMPORTANCE Membrane skeletons associate with the inner surface of the plasma membrane to provide support for the fragile lipid bilayer and an elastic framework for the cell itself. Several radiations, including animals, organize such skeletons using actin/spectrin proteins, but four major radiations of eukaryotic unicellular organisms, including disease-causing parasites such as Plasmodium, have been known to construct an alternative and essential skeleton (the epiplast) using a class of proteins that we term epiplastins. We have identified epiplastins in two additional radiations and present images of their epiplasts using electron microscopy. We analyze the sequences and secondary structure of 219 epiplastins and present an in-depth overview and analysis of their known and posited roles in cellular organization and parasite infection. An understanding of epiplast assembly may suggest therapeutic approaches to combat infectious agents such as Plasmodium as well as approaches to the engineering of useful viscoelastic biofilms.
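The "enrichment of acid/base amino acid dyads" mentioned in the abstract can be made concrete with a small sketch. The abstract does not give the exact dyad definition or any threshold, so the following assumes, purely for illustration, that a dyad is any adjacent pair of one acidic (D, E) and one basic (K, R) residue, in either order. The function name `acid_base_dyad_fraction` is hypothetical and is not taken from the paper.

```python
# Assumption: acidic = D/E, basic = K/R; histidine is not counted as basic.
ACIDIC = set("DE")
BASIC = set("KR")


def acid_base_dyad_fraction(seq: str) -> float:
    """Fraction of adjacent residue pairs that join one acidic and one basic residue."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    hits = sum(
        1
        for a, b in zip(seq, seq[1:])
        if (a in ACIDIC and b in BASIC) or (a in BASIC and b in ACIDIC)
    )
    # A sequence of length n has n - 1 adjacent pairs.
    return hits / (len(seq) - 1)


print(acid_base_dyad_fraction("EKEK"))  # every adjacent pair is a dyad: 1.0
```

Comparing this fraction for a medial domain against the same measure on a background set of sequences would be one simple way to express "enrichment"; the authors' actual procedure may differ.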


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhixia Teng ◽  
Zitong Zhang ◽  
Zhen Tian ◽  
Yanjuan Li ◽  
Guohua Wang

Abstract

Background: Amyloids are insoluble fibrillar aggregates that are highly associated with complex human diseases, such as Alzheimer’s disease, Parkinson’s disease, and type II diabetes. Many recent studies have reported that specific regions of amino acid sequences may be responsible for the amyloidosis of proteins. Identifying these amyloidogenic regions has therefore become important for elucidating the mechanism of amyloid formation. Accordingly, several computational methods have been put forward to discover amyloidogenic regions. Most of these methods predict amyloidogenic regions from the physicochemical properties of amino acids. However, the position, order, and correlation of amino acids may also influence the amyloidosis of proteins and should also be considered when detecting amyloidogenic regions.

Results: To address this problem, we proposed a novel machine-learning approach for predicting amyloidogenic regions, called ReRF-Pred. First, the pseudo amino acid composition (PseAAC) was used to characterize the physicochemical properties and correlation of amino acids. Second, the tripeptide composition (TPC) was employed to represent the order and position of amino acids. To improve the discriminative power of TPC, all possible tripeptides were analyzed by the binomial distribution method, and only those whose distributions differed significantly between positive and negative samples were retained. Finally, all samples were characterized by the PseAAC and TPC of their amino acid sequences, and a random forest-based predictor of amyloidogenic regions was trained on these samples. Validation experiments showed that the feature set consisting of PseAAC and TPC was the most discriminative for detecting amyloidosis, and that random forest outperformed the other classifiers considered on almost all metrics. To validate the effectiveness of our model, ReRF-Pred was compared with a series of gold-standard methods on two datasets: Pep-251 and Reg33. The results suggested that our method has the best overall performance and makes significant improvements in discovering amyloidogenic regions.

Conclusions: The advantages of our method are mainly attributable to the ability of PseAAC and TPC to capture the differences between amyloids and other proteins. The ReRF-Pred server can be accessed at http://106.12.83.135:8080/ReRF-Pred/.
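The tripeptide selection step described above can be sketched roughly as follows. The abstract does not spell out the binomial distribution method, so this assumes one common formulation: for each tripeptide, compute the binomial tail probability of seeing at least its observed count in a class, given that class's share of all tripeptide occurrences, and keep the tripeptide if that probability is small for either class. The function names and the `alpha=0.05` cutoff are illustrative assumptions, not values from ReRF-Pred.

```python
from collections import Counter
from math import comb


def tripeptide_counts(seqs):
    """Count overlapping tripeptides across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - 2):
            counts[s[i:i + 3]] += 1
    return counts


def binomial_tail(n, k, q):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, m) * q**m * (1 - q) ** (n - m) for m in range(k, n + 1))


def select_tripeptides(pos, neg, alpha=0.05):
    """Keep tripeptides over-represented in either class (assumed cutoff alpha)."""
    pc, nc = tripeptide_counts(pos), tripeptide_counts(neg)
    total = sum(pc.values()) + sum(nc.values())
    if total == 0:
        return []
    q_pos = sum(pc.values()) / total  # share of all occurrences in the positive class
    selected = []
    for tri in sorted(set(pc) | set(nc)):
        n = pc[tri] + nc[tri]
        p_pos = binomial_tail(n, pc[tri], q_pos)
        p_neg = binomial_tail(n, nc[tri], 1 - q_pos)
        if min(p_pos, p_neg) < alpha:
            selected.append(tri)
    return selected


def tpc_features(seq, selected):
    """Frequency of each selected tripeptide in one sequence."""
    counts = tripeptide_counts([seq])
    total = max(len(seq) - 2, 1)
    return [counts[t] / total for t in selected]
```

In a pipeline like the one the abstract outlines, vectors from `tpc_features` would be concatenated with PseAAC features and passed to a random forest classifier; that part is omitted here because the abstract gives no parameters for it.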

