Enrichment of High-Throughput Screening Data with Increasing Levels of Noise Using Support Vector Machines, Recursive Partitioning, and Laplacian-Modified Naive Bayesian Classifiers

2006, Vol 46 (1), pp. 193-200
Author(s): Meir Glick, Jeremy L. Jenkins, James H. Nettles, Hamilton Hitchings, John W. Davies

2009, Vol 49 (12), pp. 2718-2725
Author(s): Quan Liao, Jibo Wang, Yue Webster, Ian A. Watson

Author(s): Kaizhu Huang, Zenglin Xu, Irwin King, Michael R. Lyu, Zhangbing Zhou

Naive Bayesian network (NB) is a simple yet powerful Bayesian network. Despite a strong independence assumption among the features, it performs competitively against other state-of-the-art classifiers such as support vector machines (SVM). In this chapter, we propose a novel discriminative training approach, derived from SVM, for estimating the parameters of NB. The resulting model, called the discriminative naive Bayesian network (DNB), combines the merits of discriminative methods (e.g., SVM) and Bayesian networks. We provide theoretical justification, outline the algorithm, and perform a series of experiments on real-world benchmark datasets to demonstrate the model's advantages. DNB outperforms NB in classification tasks and outperforms SVM in tasks involving missing information.
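To make the independence assumption concrete, here is a minimal sketch of a standard Bernoulli naive Bayes classifier over binary features (for example, fingerprint bits) with add-one (Laplace) smoothing. It is not the DNB method described above, whose SVM-derived training procedure is not given here, nor the Laplacian-modified naive Bayesian classifier named in the first title. The function names `train_naive_bayes` and `predict`, the smoothing constant `alpha`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1 features; y: list of class labels.
    alpha: add-alpha (Laplace) smoothing constant, an assumption of this sketch.
    """
    n_features = len(X[0])
    ones = defaultdict(lambda: [0] * n_features)  # per-class count of feature == 1
    class_totals = defaultdict(int)
    for xi, yi in zip(X, y):
        class_totals[yi] += 1
        for j, v in enumerate(xi):
            ones[yi][j] += v
    model = {}
    for c, total in class_totals.items():
        log_prior = math.log(total / len(y))
        # Smoothed estimate of P(x_j = 1 | c); never exactly 0 or 1.
        probs = [(ones[c][j] + alpha) / (total + 2 * alpha) for j in range(n_features)]
        model[c] = (log_prior, probs)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior score."""
    best_class, best_score = None, -math.inf
    for c, (log_prior, probs) in model.items():
        # Independence assumption: the joint likelihood factorizes,
        # so the log-likelihood is a sum of per-feature terms.
        score = log_prior + sum(
            math.log(p) if v else math.log(1.0 - p) for v, p in zip(x, probs)
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical toy data: three binary features per compound.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
y = ["active", "active", "inactive", "inactive"]
model = train_naive_bayes(X, y)
print(predict(model, [1, 0, 0]))  # active
```

The factorized sum is what makes NB cheap to train and evaluate; it is also the assumption that discriminative approaches such as DNB aim to compensate for.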


2005, Vol 11 (2), pp. 138-144
Author(s): Jianwen Fang, Yinghua Dong, Gerald H. Lushington, Qi-Zhuang Ye, Gunda I. Georg

This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from an inhibition study of type I methionine aminopeptidases (MetAPs). A library of 43,736 small organic molecules was used in the study, and the 1355 compounds showing 40% or higher inhibition were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors ranked compounds in the test set by the decision values predicted by SVM models built on the training set. To measure model performance, they defined a novel score, PT50: the percentage of the test set that must be screened to recover 50% of the actives. With carefully selected parameters, SVM models increased hit rates significantly: 50% of the active compounds could be recovered by screening just 7% of the test set. The authors also found that the size of the training set played a significant role in model performance; a training set of 10,000 compounds is likely the minimum required to build a model with reasonable predictive power.
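The PT50 score can be sketched as follows, assuming each test compound has an SVM decision value (higher meaning more likely active) and a binary activity label. The function name `pt50`, the choice to round "50% of the actives" up to a whole compound, and the toy data are assumptions for illustration; tied decision values are simply kept in their original order rather than handled specially.

```python
import math

def pt50(decision_values, labels):
    """Percentage of the ranked test set screened to recover 50% of actives.

    decision_values: SVM decision values (higher = more likely active).
    labels: 1 for active, 0 for inactive, aligned with decision_values.
    """
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("test set contains no actives")
    # Assumption: half of the actives is rounded up to a whole compound.
    target = math.ceil(0.5 * n_actives)
    # Rank compounds from highest to lowest decision value (stable sort).
    ranked = sorted(zip(decision_values, labels), key=lambda t: t[0], reverse=True)
    found = 0
    for screened, (_, label) in enumerate(ranked, start=1):
        found += label
        if found >= target:
            return 100.0 * screened / len(ranked)

# Hypothetical test set of 10 compounds, 4 of them active.
scores = [2.1, 1.7, 0.9, 0.4, -0.2, -0.5, -1.0, -1.3, -1.8, -2.4]
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
print(pt50(scores, labels))  # 30.0
```

A random ordering needs to screen about 50% of the test set on average to recover half the actives, so a PT50 of 7%, as reported above, corresponds to roughly a sevenfold enrichment at that recovery level.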

