Molecular Similarity Searching Method Based on Adaptive IR Technique

2014 ◽  
Vol 4 (6) ◽  
pp. 787-797
Author(s):  
Mohammed Binwahlan
2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Jimin Wang ◽  
Yuelong Zhu ◽  
Shijin Li ◽  
Dingsheng Wan ◽  
Pengcheng Zhang

Multivariate time series (MTS) datasets are common in financial, multimedia, and hydrological fields. In this paper, a dimension-combination method is proposed to search for similar sequences in MTS. First, the similarity of each single-dimension series is calculated; then the overall similarity of the MTS is obtained by combining the single-dimension similarities using a weighted BORDA voting method. The dimension-combination method can use existing similarity searching methods. Several experiments, using classification accuracy as the measure, were performed on six datasets from the UCI KDD Archive to validate the method. The results show that, for MTS datasets, the approach has advantages in some respects over traditional similarity measures such as Euclidean distance (ED), dynamic time warping (DTW), point distribution (PD), PCA similarity factor (SPCA), and extended Frobenius norm (Eros). Our experiments also demonstrate that no measure fits all datasets, and that the proposed measure is one option for similarity searches.
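The dimension-combination idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact voting or weighting scheme, so everything here is an assumption for illustration: the function name `weighted_borda_search`, the use of Euclidean distance as the per-dimension measure, the classic Borda points (k for first place, k-1 for second, and so on), and the per-dimension weights supplied by the caller.

```python
from typing import Callable, List, Optional, Sequence

Series = Sequence[float]
MTS = Sequence[Series]  # one univariate series per dimension


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length univariate series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def weighted_borda_search(
    query: MTS,
    candidates: Sequence[MTS],
    weights: Sequence[float],
    distance: Callable[[Series, Series], float] = euclidean,
    top_k: Optional[int] = None,
) -> List[int]:
    """Rank candidates by a weighted Borda count over per-dimension rankings.

    Hypothetical illustration, not the paper's exact procedure:
    each dimension ranks the candidates by distance to the query,
    the k nearest receive k, k-1, ..., 1 points, and the points are
    scaled by that dimension's weight before being summed.
    """
    n = len(candidates)
    k = n if top_k is None else min(top_k, n)
    scores = [0.0] * n
    for dim, weight in enumerate(weights):
        dists = [distance(query[dim], cand[dim]) for cand in candidates]
        order = sorted(range(n), key=lambda i: dists[i])  # nearest first
        for rank, idx in enumerate(order[:k]):
            scores[idx] += weight * (k - rank)
    # Highest combined score first; ties keep the original candidate order.
    return sorted(range(n), key=lambda i: -scores[i])


query = [[0, 0, 0], [1, 1, 1]]
candidates = [
    [[0, 0, 0], [1, 1, 1]],  # identical to the query in both dimensions
    [[5, 5, 5], [1, 1, 2]],  # close only in dimension 1
    [[0, 0, 1], [9, 9, 9]],  # close only in dimension 0
]
equal_ranking = weighted_borda_search(query, candidates, weights=[1.0, 1.0])
dim0_heavy_ranking = weighted_borda_search(query, candidates, weights=[2.0, 1.0])
```

Because each dimension is ranked independently, any single-series measure (for example a DTW implementation) could be passed as `distance` without changing the voting step.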


2005 ◽  
Vol 10 (7) ◽  
pp. 658-666 ◽  
Author(s):  
Andreas Bender ◽  
Hamse Y. Mussa ◽  
Robert C. Glen

A fragment-based similarity searching method, MOLPRINT 2D, was employed for virtual screening of Escherichia coli dihydrofolate reductase inhibitors. Using the original training set of 50,000 compounds, only marginal enrichment factors (between 1 and 3) could be achieved on the test library. The active structures contained in the training and test libraries represented different types of “chemistry,” that is, different substructural features associated with activity. In a second step, the training and test sets were pooled and randomly split into training and test sets of equal size, with the objective of smoothing out the different chemical characteristics of the two libraries. In a 10-fold cross-validation study on the new training and test sets, typically 10-fold enrichment could be found in the first 96 positions, 4-fold enrichment in the first 384 positions, and 3-fold enrichment in the first 1536 positions, corresponding to 6, 10, and 28 hits, respectively (out of a total of 307; activity defined as average residual activity of less than 80%). The conclusions are 2-fold. On one hand, the exact fragment-matching similarity searching method employed here is not capable of finding completely novel hit structures. On the other hand, this study emphasizes the requirement for a comparable distribution of chemical features of the training and test sets. MOLPRINT 2D is freely downloadable from http://www.cheminformatics.org.
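The enrichment figures quoted in this abstract follow the usual definition of an enrichment factor: the hit rate in the top-ranked positions divided by the hit rate in the whole library. A minimal sketch of that arithmetic is given below. The function name `enrichment_factor` is illustrative, and the library size of 50,000 used in the example is an assumption, since the abstract does not state the size of the screened test set.

```python
def enrichment_factor(
    hits_in_top: int, n_top: int, total_actives: int, library_size: int
) -> float:
    """Hit rate among the top n_top ranked compounds relative to the
    hit rate expected from random selection over the whole library."""
    if n_top <= 0 or total_actives <= 0 or library_size <= 0:
        raise ValueError("n_top, total_actives and library_size must be positive")
    hit_rate_top = hits_in_top / n_top
    hit_rate_library = total_actives / library_size
    return hit_rate_top / hit_rate_library


# Hits and list lengths are taken from the abstract; the library size
# is a hypothetical value chosen only to make the example runnable.
ASSUMED_LIBRARY_SIZE = 50_000
TOTAL_ACTIVES = 307
factors = {
    n_top: enrichment_factor(hits, n_top, TOTAL_ACTIVES, ASSUMED_LIBRARY_SIZE)
    for n_top, hits in [(96, 6), (384, 10), (1536, 28)]
}
```

A value of 1 means the ranking does no better than random picking, which is why enrichment factors between 1 and 3 are described as marginal.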


Author(s):  
Andreas Bender ◽  
Andreas Klamt ◽  
Karin Wichmann ◽  
Michael Thormann ◽  
Robert C. Glen
