A PRUNING APPROACH TO PATTERN DISCOVERY

2008 ◽  
Vol 07 (04) ◽  
pp. 721-736 ◽  
Author(s):  
HSIAO-FAN WANG ◽  
ZU-WEN CHAN

In this study, we proposed a general pruning procedure to reduce the dimension of a large database so that the properties of the extracted subset can be well defined. Since learning functions have been widely applied, we take this group of functions as an example to demonstrate the proposed procedure. Based on the concept of Support Vector Machine (SVM), three major stages of preliminary pruning, fitting function, and refining are proposed to discover a subset that possess the characteristics of some learning function from the given large data set. Three models were used to illustrate and evaluate the proposed pruning procedure and the results have shown to be promising in application.

2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzyme is a kind of highly evolved protein, which can destroy the cell structure and kill the bacteria. Compared with antibiotics, cell lytic enzyme will not cause serious problem of drug resistance of pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way for curing bacteria infectious. Compared with using antibiotics, the problem of drug resistance becomes more serious. Therefore, it is a good choice for curing bacterial infections by using cell lytic enzymes. Cell lytic enzyme includes endolysin and autolysin and the difference between them is the purpose of the break of cell wall. The identification of the type of cell lytic enzymes is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzyme is helpful for killing bacteria, so it is meaningful for study the type of cell lytic enzyme. However, it is time consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for the type of cell lytic enzyme prediction is proposed in our work. Method: We propose a computational method for the prediction of endolysin and autolysin. First, a data set containing 27 endolysins and 41 autolysins is built. Then the protein is represented by tripeptides composition. The features are selected with larger confidence degree. At last, the classifier is trained by the labeled vectors based on support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can attain 97.06%, when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9%). The performance of our proposed method is stable, when the selected feature number is from 40 to 70. The overall accuracy of tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. Conclusion: The paper proposed an efficient method for identifying endolysin and autolysin. In this paper, support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. Support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Taking advantage of finding non-intuitive regularities on high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods such as yrandomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is full of help for designing novel thrombin inhibitors.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruolan Zeng ◽  
Jiyong Deng ◽  
Limin Dang ◽  
Xinliang Yu

AbstractA three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set consisting of 274 compounds, by applying support vector machine (SVM) together with genetic algorithm. The optimal SVM model possesses the coefficient of determination R2 of 0.946 and root mean square (rms) error of 0.253 for the training set of 139 compounds; and a R2 of 0.872 and rms of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance in a model that deals with more samples in the test set. Therefore, applying a SVM algorithm to develop a nonlinear QSAR model for skin permeability was achieved.


2021 ◽  
pp. 102586
Author(s):  
Chuanjun Du ◽  
Ruoying He ◽  
Zhiyu Liu ◽  
Tao Huang ◽  
Lifang Wang ◽  
...  

2017 ◽  
Vol 128 (1) ◽  
pp. 243-250 ◽  
Author(s):  
Mark L. Scheuer ◽  
Anto Bagic ◽  
Scott B. Wilson

2014 ◽  
Author(s):  
Carlos Enrique Gutierrez ◽  
Prof. Mohamad Reza Alsharif ◽  
Mahdi Khosravy ◽  
Prof. Katsumi Yamashita ◽  
Prof. Hayao Miyagi ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document