similarity searching
Recently Published Documents


TOTAL DOCUMENTS: 308 (FIVE YEARS: 33)
H-INDEX: 38 (FIVE YEARS: 2)

2021 · pp. 2100106
Author(s): Abeer Abdulhakeem Mansour Alhasbary, Nurul Hashimah Ahamed Hassain Malim

2021
Author(s): Sebastián Ayala-Ruano, Yovani Marrero-Ponce, Longendri Aguilera-Mendoza, Noel Pérez, Guillermin Agüero-Chapin, ...

Antimicrobial peptides (AMPs) are small bioactive chemicals that have emerged as promising compounds to treat a wide range of diseases. The effectiveness of AMPs resides in the wide range of mechanisms they can use both for killing microbes and for modulating immune responses. However, the AMPs’ chemical space (AMPCS) is huge: it is estimated that there exist more than 10^65 unique sequences of peptides with 50 residues or fewer, which represents a big challenge for the discovery of new promising sequences and for the identification of common features, motifs, or relevant biological functions shared by these peptides. Therefore, we present a new approach based on network science and similarity searching to discover new potential AMPs, specifically antiparasitic peptides (APPs). We have taken advantage of a network-based representation of the APPs’ chemical space (APPCS) to retrieve valuable information, using three types of networks: chemical space (CSN), half-space proximal (HSPN), and metadata (METN). Centrality measures were applied to identify the most important and non-redundant nodes, and these peptides were taken as queries (Qs) against the graph database starPepDB to discover new potential APPs with similarity searching by group fusion (MAX-SIM rule) models. We evaluated the performance of the multi-query similarity searching models (mQSSMs) on five benchmarking data sets of APPs/non-APPs. The predictions of the best mQSSMs present a strong-to-very strong predictive agreement, since their external Matthews correlation coefficient (MCC) values ranged from 0.834 to 0.965. Outstanding outcomes were attained by the mQSSM with 219 Qs from both the CSN and HSPN networks (219Q_0.5_HB-HC-Singletons_CSN-HSPN), using 0.5 as the similarity threshold, with MCC values greater than 0.85 on external datasets. We then compared the performance metrics of our mQSSMs with those of the APP prediction servers AMPDiscover and AMPFun.
The model proposed in this report outperformed these machine learning approaches with statistically significant differences, showing the enormous potential of this method. After applying our method and additional filters, we proposed 95 repurposed leads as potential APPs, which have not been associated with this activity until now. In addition, we explored sequence similarities and motifs shared by these peptides, which can serve as templates for searching for and designing new promising APPs. The analyses show that the similarity models proposed in this study could contribute to identifying APPs with high effectiveness and reliability. Our models and pipeline are freely available through the starPep toolbox software at http://mobiosd-hub.com/starpep.
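Group fusion with the MAX-SIM rule scores each database peptide by its highest similarity to any of the queries and keeps it as a hit when that fused score reaches the threshold (0.5 in the best model above). The sketch below illustrates only that fusion step. It assumes a simple stand-in similarity, the Tanimoto coefficient on sets of overlapping dipeptides; the similarity measure actually used by starPep may differ. The functions `kmers` and `max_sim_search` and the example sequences are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def kmers(sequence: str, k: int = 2) -> FrozenSet[str]:
    """Set of overlapping k-mers (dipeptides by default) in a peptide sequence."""
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_sim_search(
    queries: List[str], database: Dict[str, str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Group fusion with the MAX rule: each database peptide is scored by its
    highest similarity to any query; peptides at or above the threshold are hits."""
    query_sets = [kmers(q) for q in queries]
    hits: List[Tuple[str, float]] = []
    for name, sequence in database.items():
        target = kmers(sequence)
        fused = max(tanimoto(q, target) for q in query_sets)
        if fused >= threshold:
            hits.append((name, fused))
    # Rank hits by fused score, best first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Hypothetical query and database sequences, for illustration only.
queries = ["GLFDIVKK", "KWKLFKKI"]
database = {"pep_1": "GLFDIVKKV", "pep_2": "AAAAAA"}
print(max_sim_search(queries, database))  # [('pep_1', 0.875)]
```

Taking the maximum rather than the mean means a database peptide only needs to resemble one query closely, which suits a query set chosen to be non-redundant.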


2021 · Vol 12 (3)
Author(s): Camila R. Lopes, Lúcio F. D. Santos, Daniel L. Jasbick, Daniel De Oliveira, Marcos Bedo

A diversified similarity search retrieves elements that are simultaneously similar to a query object and akin to the different collections within the explored data. While several methods in information retrieval, data clustering, and similarity searching have tackled the problem of adding diversity to result sets, the experimental comparison of their performance is still an open issue, mainly because the quality metrics are “borrowed” from those different research areas, bringing their biases along. In this manuscript, we investigate a series of such metrics and experimentally discuss their trends and limitations. We conclude that diversity is better addressed by a set of measures than by a single quality index, and we introduce the concept of the Diversity Features Model (DFM), which combines the viewpoints of biased metrics into a multidimensional representation. Experimental evaluations indicate that (i) DFM enables comparing different result diversification algorithms by considering multiple criteria, and (ii) the most suitable search methods for a particular dataset can be identified by combining DFM with ranking aggregation and parallel coordinates maps.
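To make "simultaneously similar to the query and diverse" concrete, the sketch below implements Maximal Marginal Relevance (MMR), a well-known greedy result-diversification algorithm. It is not DFM, which is a way of evaluating such algorithms with multiple metrics; it is one example of the kind of method DFM would compare. The use of Euclidean distance, the trade-off parameter `lam`, and the helper names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mmr_score(
    i: int, query: Point, candidates: Sequence[Point], selected: List[int], lam: float
) -> float:
    """lam * (closeness to query) + (1 - lam) * (distance to nearest selected item)."""
    relevance = -math.dist(query, candidates[i])
    if not selected:
        return relevance
    novelty = min(math.dist(candidates[i], candidates[j]) for j in selected)
    return lam * relevance + (1 - lam) * novelty


def mmr_diversify(
    query: Point, candidates: Sequence[Point], k: int, lam: float = 0.5
) -> List[int]:
    """Greedy MMR; lam = 1.0 reduces to a plain k-nearest-neighbor query."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: _mmr_score(i, query, candidates, selected, lam))
        selected.append(best)
        remaining.remove(best)
    return selected


points = [(1.0, 0.0), (1.1, 0.0), (0.0, 1.2), (5.0, 5.0)]
print(mmr_diversify((0.0, 0.0), points, k=2))           # [0, 2]
print(mmr_diversify((0.0, 0.0), points, k=2, lam=1.0))  # [0, 1]
```

With `lam=1.0` the two nearest points are returned even though they are almost duplicates; lowering `lam` swaps the near-duplicate for a slightly farther but different point. Metrics borrowed from different areas can rank these two result sets differently, which is the disagreement the abstract describes.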


Molecules · 2021 · Vol 26 (17) · pp. 5208
Author(s): Rosa Purgatorio, Nicola Gambacorta, Modesto de Candia, Marco Catto, Mariagrazia Rullo, ...

Recently, the direct thrombin (thr) inhibitor dabigatran has proven beneficial in animal models of Alzheimer’s disease (AD). Aiming to discover novel multimodal agents addressing thr and AD-related targets, a selection of previously and newly synthesized potent thr and factor Xa (fXa) inhibitors was virtually screened with the Multi-fingerprint Similarity Searching aLgorithm (MuSSeL) web server. The N-phenyl-1-(pyridin-4-yl)piperidine-4-carboxamide derivative 1, which had already been shown experimentally to inhibit thr with a Ki value of 6 nM, was flagged by a new, upcoming release of MuSSeL as a binder of cholinesterase (ChE) isoforms (acetyl- and butyrylcholinesterase, AChE and BChE), as well as of thr, fXa, and other enzymes and receptors. Interestingly, MuSSeL predicted the inhibition potency of 1 to fall within the low-to-submicromolar range, and this was confirmed by the experimental Ki values of 0.058 and 6.95 μM for eeAChE and eqBChE, respectively. Thirty analogs of 1 were then assayed as inhibitors of thr, fXa, AChE, and BChE to extend our knowledge of their structure-activity relationships, while the molecular determinants responsible for the multiple activities towards the target enzymes were rationally investigated by molecular cross-docking screening.


2021 · Vol 24 · pp. 256-266
Author(s): Nihayatul Karimah, Gijs Schaftenaar

Purpose: Structurally similar molecules are likely to have similar biological activity. In this study, similarity searching based on molecular 2D fingerprints was performed to analyze the off-target effects of drugs, with the aim of determining the correlation between adverse effects and drug off-targets. Methods: A workflow was built in KNIME to prepare a dataset of twenty-nine targets from ChEMBL, generate molecular 2D fingerprints of the ligands, calculate the similarity between ligand sets, and compute statistical significance using the similarity ensemble approach (SEA). Tanimoto coefficients (Tc) were used as the measure of chemical similarity; Tc values between 0.2 and 0.4 are the most common for ligand pairs and are considered to indicate insignificant similarity. Result: Most ligand sets are unrelated, as evidenced by their intrinsic chemical differences and by the classification of statistical significance based on expectation values. The rank-ordered expectation values of inter-target similarity correlated with the off-target effects of known drugs. Conclusion: Similarity searching using molecular 2D fingerprints can be applied to predict off-targets and correlate them with the adverse effects of drugs. KNIME, as an open-source data analytics platform, can be used to build a workflow for mining the ChEMBL database and generating an SEA statistical model.
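In SEA, two targets are compared through their ligand sets: the Tc is computed for every cross-set ligand pair, and the raw score is the sum of the Tc values that reach a similarity threshold. SEA then compares that raw score with a background distribution from random ligand sets to obtain a z-score and an expectation value; that statistical step is not shown here. The sketch below covers only the raw score. It assumes binary fingerprints stored as sets of "on" bit indices; the threshold is a free parameter whose appropriate value depends on the fingerprint type, and the example fingerprints are made up.

```python
from itertools import product
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # indices of bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tc = |A & B| / (|A| + |B| - |A & B|) for binary fingerprints."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def sea_raw_score(
    set_a: Sequence[Fingerprint], set_b: Sequence[Fingerprint], threshold: float
) -> float:
    """Sum of Tc over all cross-set ligand pairs whose Tc reaches the threshold."""
    return sum(
        tc
        for tc in (tanimoto(fa, fb) for fa, fb in product(set_a, set_b))
        if tc >= threshold
    )


# Hypothetical ligand sets for two targets.
target_a = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 5})]
target_b = [frozenset({1, 2, 3}), frozenset({7, 8, 9})]
print(sea_raw_score(target_a, target_b, threshold=0.5))  # 1.25
```

Thresholding keeps the many weakly similar pairs (such as the 0.2 to 0.4 range mentioned above) from accumulating into a large raw score between unrelated targets.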


2021
Author(s): Wout Bittremieux, Kris Laukens, William Stafford Noble, Pieter C. Dorrestein

Rationale: Advanced algorithmic solutions are necessary to process the ever-increasing amounts of mass spectrometry data being generated. Here we describe the falcon spectrum clustering tool for efficient clustering of millions of MS/MS spectra. Methods: falcon efficiently clusters large amounts of mass spectral data using advanced techniques for fast spectrum similarity searching. First, high-resolution spectra are binned and converted to low-dimensional vectors using feature hashing. Next, the spectrum vectors are used to construct nearest neighbor indexes for fast similarity searching. The nearest neighbor indexes are used to efficiently compute a sparse pairwise distance matrix without having to exhaustively compare all spectra to each other. Finally, density-based clustering is performed to group similar spectra into clusters. Results: Using a large draft human proteome dataset consisting of 25 million spectra, falcon generates clusters of similar quality to those of MS-Cluster and spectra-cluster, two widely used clustering tools, while being considerably faster. Notably, at comparable cluster quality levels, falcon generates larger clusters than alternative tools, leading to a larger reduction in data volume without the loss of relevant information, for more efficient downstream processing. Conclusions: falcon is a highly efficient spectrum clustering tool. It is publicly available as open source under the permissive BSD license at https://github.com/bittremieux/falcon.
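The first two steps described in the Methods, binning peaks and folding the bins into a short vector with feature hashing, then searching for similar vectors, can be sketched as follows. This is not falcon's code: the bin width, vector length, the use of `zlib.crc32` as the hash function, and the exhaustive top-k search (standing in for a real nearest neighbor index) are all assumptions chosen to keep the example self-contained.

```python
import math
import zlib
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def hash_spectrum(peaks: Sequence[Peak], bin_width: float = 0.05, dim: int = 256) -> List[float]:
    """Bin peaks by m/z, fold the bin indices into a fixed-length vector with a
    hash function (feature hashing), and L2-normalize the result."""
    vector = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)
        slot = zlib.crc32(str(bin_index).encode()) % dim
        vector[slot] += intensity
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product; equals cosine similarity because the vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))


def nearest_neighbors(
    query: Sequence[float], library: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exhaustive top-k search by cosine similarity (stand-in for an ANN index)."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(library)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical spectra: s2 is a slightly shifted, re-weighted copy of s1.
s1 = [(100.02, 1.0), (200.01, 0.5), (300.01, 0.2)]
s2 = [(100.03, 0.9), (200.02, 0.6), (300.03, 0.2)]
s3 = [(150.02, 1.0), (250.02, 1.0)]
v1, v2, v3 = (hash_spectrum(s) for s in (s1, s2, s3))
print(nearest_neighbors(v1, [v2, v3], k=1))
```

Because peaks that land in the same bin always hash to the same slot, small m/z shifts within a bin leave the vector unchanged; the price of a short vector is that unrelated bins can occasionally collide in one slot.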


Author(s): Maged Nasser, Naomie Salim, Hentabli Hamza, Faisal Saeed, Idris Rabiu

Virtual screening (VS) is defined as the use of a set of computational procedures to grade, score, and/or sort chemical structures. The purpose of VS is to identify the molecules with the greatest prior probabilities of activity. Many conventional similarity methods assume that molecular features unrelated to biological activity carry the same weight as the important ones. For this reason, the authors investigated, through chemical structure diagrams, the premise that some features are more important than others and that the weight of each fragment should therefore be taken into consideration, with more weight given to the more important fragments. In this paper, a deep learning method known as Deep Belief Networks (DBN) was used to reweight the molecular features, and the reconstruction error of each feature was calculated under these new weights. Based on the reconstruction errors, Principal Component Analysis (PCA) was used for dimensionality reduction, and only a few hundred features with the lowest error rates were selected. The main aim of this research is to show that similarity searching performance improves when it is based on the selected low-error features. The results obtained with the DBN were compared with those of other similarity methods, such as the Tanimoto coefficient and quantum-based methods. This comparison showed that the DBN outperformed all the other techniques on the structurally heterogeneous data sets (DS1 and DS3).
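One way to see how "giving more weight to important fragments" changes a similarity score is a weighted form of the Tanimoto coefficient, in which every term of the usual continuous Tanimoto formula is multiplied by a per-feature weight. The abstract does not give the exact weighting formula used with the DBN, so this is only an illustration: the weights below are hypothetical placeholders for learned feature importances, and `weighted_tanimoto` is not the authors' function.

```python
from typing import Sequence


def weighted_tanimoto(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Continuous Tanimoto with per-feature weights:
    sum(w*a*b) / (sum(w*a*a) + sum(w*b*b) - sum(w*a*b))."""
    ab = sum(wi * ai * bi for ai, bi, wi in zip(a, b, w))
    aa = sum(wi * ai * ai for ai, wi in zip(a, w))
    bb = sum(wi * bi * bi for bi, wi in zip(b, w))
    denom = aa + bb - ab
    return ab / denom if denom > 0 else 0.0


reference = [1, 1, 0, 1]  # hypothetical fragment fingerprint of the reference molecule
candidate = [1, 0, 1, 1]  # hypothetical database molecule
print(weighted_tanimoto(reference, candidate, [1, 1, 1, 1]))  # 0.5 (plain Tanimoto)
print(weighted_tanimoto(reference, candidate, [3, 1, 1, 3]))  # 0.75
```

Up-weighting the two fragments the molecules share (positions 0 and 3) raises their similarity from 0.5 to 0.75, whereas with uniform weights every fragment counts equally, which is the assumption the abstract criticizes.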


Molecules · 2020 · Vol 26 (1) · pp. 128
Author(s): Maged Nasser, Naomie Salim, Hentabli Hamza, Faisal Saeed, Idris Rabiu

Virtual screening (VS) is a computational practice applied in drug discovery research. VS is widely applied in computer-based searches for new lead molecules based on molecular similarity searching. In chemical databases, similarity searching is used to identify molecules that are similar to a user-defined reference structure, as evaluated by quantitative measures of intermolecular structural similarity. Among existing approaches, 2D fingerprints are widely used, and the similarity between a reference structure and a database structure is measured by computing association coefficients. Most classical similarity approaches assume that all molecular features carry the same weight, whether or not they are related to biological activity. However, based on chemical structure, some distinguishable features have been found to be more important than others. Hence, this difference should be taken into consideration by placing more weight on each important fragment. The main aim of this research is to enhance the performance of similarity searching by using multiple descriptors. In this paper, a deep learning method known as deep belief networks (DBN) was used to reweight the molecular features. Several descriptors, each representing different important features, were used for the MDL Drug Data Report (MDDR) dataset. The proposed method was applied to each descriptor individually to select the important features, namely those with a lower error rate under the new weights, and all the selected features from all descriptors were then merged to produce a new descriptor for similarity searching. Extensive experiments show that the proposed method outperformed several existing benchmark similarity methods, including Bayesian inference networks (BIN), the Tanimoto similarity method (TAN), the adapted similarity measure of text processing (ASMTP), and the quantum-based similarity method (SQB).
The proposed multi-descriptor method based on a stack of deep belief networks (SDBN) achieved higher accuracy than existing methods on structurally heterogeneous datasets.
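The select-and-merge step described above can be sketched independently of the DBN: given a reconstruction error for every feature of every descriptor, keep the lowest-error features of each descriptor and concatenate them into one fused descriptor. The sketch assumes the per-feature errors are already available (training the DBN is not shown), and the descriptor names, error values, and `n_keep` are hypothetical. The same selected indices must be applied to every molecule so that fused vectors stay comparable.

```python
from typing import Dict, List, Sequence


def select_low_error(errors: Sequence[float], n_keep: int) -> List[int]:
    """Indices of the n_keep features with the lowest reconstruction error
    (ties broken by lower index), returned in their original order."""
    order = sorted(range(len(errors)), key=lambda i: (errors[i], i))
    return sorted(order[:n_keep])


def merge_descriptors(
    molecule: Dict[str, Sequence[float]],
    errors: Dict[str, Sequence[float]],
    n_keep: int,
) -> List[float]:
    """Concatenate the selected features of every descriptor into one vector,
    visiting descriptors in a fixed (sorted) order."""
    fused: List[float] = []
    for name in sorted(molecule):
        keep = select_low_error(errors[name], n_keep)
        fused.extend(molecule[name][i] for i in keep)
    return fused


# Hypothetical per-feature reconstruction errors for two descriptors.
errors = {"desc_a": [0.9, 0.1, 0.5, 0.2], "desc_b": [0.3, 0.8, 0.05]}
molecule = {"desc_a": [1, 0, 1, 1], "desc_b": [0, 1, 1]}
print(merge_descriptors(molecule, errors, n_keep=2))  # [0, 1, 0, 1]
```

The fused vectors can then be compared with any fingerprint similarity measure, such as the Tanimoto coefficient used as a baseline above.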

