Automated Structure–Activity Relationship Mining

Understanding the structure–activity relationships (SARs) of small molecules is important for developing probes and novel therapeutic agents in chemical biology and drug discovery. Increasingly, multiplexed small-molecule profiling assays allow simultaneous measurement of many biological response parameters for the same compound (e.g., expression levels for many genes or binding constants against many proteins). Although such methods promise to capture SARs with high granularity, few computational methods are available to support SAR analyses of high-dimensional compound activity profiles. Many of these methods are not generally applicable or reduce the activity space to scalar summary statistics before establishing SARs. In this article, we present a versatile computational method that automatically extracts interpretable SAR rules from high-dimensional profiling data. The rules connect chemical structural features of compounds to patterns in their biological activity profiles. We applied our method to data from novel cell-based gene-expression and imaging assays collected on more than 30,000 small molecules. Based on the rules identified for this data set, we prioritized groups of compounds for further study, including a novel set of putative histone deacetylase inhibitors.
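The idea of a rule that connects a structural feature to an activity pattern can be illustrated with a small association-rule sketch. This is not the authors' algorithm: it assumes activity profiles have already been discretized into labels, scores only single-feature rules by support and confidence, and uses made-up feature names, label names, and thresholds.

```python
from itertools import chain


def rule_stats(compounds, feature, activity):
    """Support and confidence of the rule 'feature -> activity'.

    compounds: list of (set_of_structural_features, set_of_activity_labels).
    """
    n = len(compounds)
    with_feature = [acts for feats, acts in compounds if feature in feats]
    both = sum(1 for acts in with_feature if activity in acts)
    support = both / n if n else 0.0
    confidence = both / len(with_feature) if with_feature else 0.0
    return support, confidence


def mine_rules(compounds, min_support=0.2, min_confidence=0.8):
    """List single-feature -> single-activity rules above both thresholds."""
    features = set(chain.from_iterable(f for f, _ in compounds))
    activities = set(chain.from_iterable(a for _, a in compounds))
    rules = []
    for feat in sorted(features):
        for act in sorted(activities):
            s, c = rule_stats(compounds, feat, act)
            if s >= min_support and c >= min_confidence:
                rules.append((feat, act, s, c))
    return rules


# Toy, made-up data: (structural features, discretized activity labels).
toy = [
    ({"group_X", "ring_Y"}, {"geneA_up"}),
    ({"group_X"}, {"geneA_up", "geneB_down"}),
    ({"ring_Y"}, {"geneB_down"}),
    ({"group_X", "chain_Z"}, {"geneA_up"}),
    ({"chain_Z"}, set()),
]
rules = mine_rules(toy)
```

On this toy set only the rule `group_X -> geneA_up` passes both thresholds. A real profiling data set would need multi-feature rules and a correction for multiple testing, which this sketch omits.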

The Pharmacophore Network: A Computational Method for Exploring Structure–Activity Relationships from a Large Chemical Data Set

Journal of Medicinal Chemistry ◽

10.1021/acs.jmedchem.7b01890 ◽

2018 ◽

Vol 61 (8) ◽

pp. 3551-3564 ◽

Cited By ~ 5

Author(s):

Jean-Philippe Métivier ◽

Bertrand Cuissart ◽

Ronan Bureau ◽

Alban Lepailleur

Keyword(s):

Computational Method ◽

Chemical Data ◽

Data Set ◽

Structure Activity Relationships ◽

Structure Activity

HAZARD: An Expert System for Risk Assessment of Environmental Chemicals

Methods of Information in Medicine ◽

10.1055/s-0038-1635482 ◽

1987 ◽

Vol 26 (01) ◽

pp. 13-23 ◽

Cited By ~ 2

Author(s):

H. W. Gottinger

Keyword(s):

Expert System ◽

Structural Information ◽

Chemical Carcinogenesis ◽

Structural Features ◽

Environmental Chemicals ◽

Chemical Carcinogens ◽

Carcinogenic Activity ◽

Current State ◽

Structure Activity ◽

Carcinogenic Potential

Abstract: The purpose of this paper is to report on an expert system, currently in design, that screens for potential hazards from environmental chemicals on the basis of structure-activity relationships in chemical carcinogenesis. The system analyzes the current state of known structural information about chemical carcinogens and predicts the possible carcinogenicity of untested chemicals. The structure-activity tree serves as an index of known chemical structure features associated with carcinogenic activity. The basic units of the tree are the principal recognized classes of chemical carcinogens, which are subdivided into subclasses, known as nodes, according to specific structural features that may reflect differences in carcinogenic potential among chemicals in the class. An analysis of a computerized database of known carcinogens (the knowledge base) is proposed, using the structure-activity tree to test the validity of the tree as a classification scheme (the inference engine).
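The tree of classes and subclass nodes described above can be sketched as a small data structure. This is only an illustration of the idea, not the HAZARD system itself: the `Node` type, the `classify` method, and all class and feature names below are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A class or subclass in a structure-activity tree (hypothetical layout)."""
    name: str
    required: frozenset  # structural features a chemical must have to match
    children: list = field(default_factory=list)

    def classify(self, features):
        """Return the path of matching node names, root first.

        Descends into the first child whose required features are all
        present; returns [] if this node itself does not match.
        """
        if not self.required <= features:
            return []
        for child in self.children:
            path = child.classify(features)
            if path:
                return [self.name] + path
        return [self.name]


# Placeholder tree: names and features are invented for illustration.
tree = Node("all chemicals", frozenset(), [
    Node("class A", frozenset({"f1"}), [
        Node("subclass A1", frozenset({"f1", "f2"})),
    ]),
    Node("class B", frozenset({"f3"})),
])

path = tree.classify({"f1", "f2"})
```

A chemical that matches no principal class stays at the root, which a screening system might treat as "no known structural alert" rather than as evidence of safety.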

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause a serious problem of drug resistance in pathogenic bacteria, whereas the resistance problem grows more serious with continued antibiotic use. Cell lytic enzymes are therefore a good choice for treating bacterial infections, and their study aims at finding an efficient treatment. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of a cell lytic enzyme. Because cell lytic enzymes help kill bacteria, it is meaningful to study their type; however, detecting the type by experimental methods is time consuming. We therefore propose an efficient computational method for predicting the type of a cell lytic enzyme. Method: We propose a computational method for distinguishing endolysins from autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition. Features with a larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled feature vectors and used to predict the type of a cell lytic enzyme. Results: With the proposed method, the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 - 92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70.
The overall accuracy of the tripeptide optimal feature set is 94.12%, and that of Chou's amphiphilic PseAAC method is 76.2%, so using the tripeptide optimal feature set improves the overall accuracy by nearly 18 percentage points. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, in which a support vector machine predicts the type of a cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of a cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
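The tripeptide composition representation mentioned in the Method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the usual definition (frequency of each of the 20^3 = 8000 overlapping tripeptides, normalized by the number of valid windows) and skips windows containing non-standard residues. The function name and the example sequence are invented; feature selection and the SVM step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies.

    Overlapping windows of length 3 are counted; windows containing a
    character outside the 20 standard residues are ignored (assumption).
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]


# Made-up sequence: 6 windows, of which "MKT" occurs twice.
vec = tripeptide_composition("MKTAYMKT")
```

With only 68 proteins and 8000 raw features, some form of feature selection (such as the 44 features reported above) is needed before training a classifier on vectors like these.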

Cytotoxic Agents in the Minor Alkaloid Groups of the Amaryllidaceae

Planta Medica ◽

10.1055/a-1380-1888 ◽

2021 ◽

Author(s):

Jerald J. Nair ◽

Johannes van Staden

Keyword(s):

Cancer Cells ◽

Cancer Cell Line ◽

Structural Features ◽

Biological Properties ◽

Cytotoxic Agents ◽

Plant Family ◽

Structure Activity ◽

Significant Interest ◽

Different Types ◽

Minor Alkaloid

Abstract: Over 600 alkaloids have to date been identified in the plant family Amaryllidaceae. These have been arranged into as many as 15 different groups based on their characteristic structural features. The vast majority of studies on the biological properties of Amaryllidaceae alkaloids have probed their anticancer potential. While most efforts have focused on the major alkaloid groups, the volume and diversity afforded by the minor alkaloid groups have promoted their usefulness as targets for cancer cell line screening purposes. This survey is an in-depth review of such activities described for around 90 representatives from 10 minor alkaloid groups of the Amaryllidaceae. These have been evaluated against over 60 cell lines categorized into 18 different types of cancer. The montanine and cripowellin groups were identified as the most potent, with some in the latter demonstrating low nanomolar level antiproliferative activities. Despite their challenging molecular architectures, the minor alkaloid groups have allowed for facile adjustments to be made to their structures, thereby altering the size, geometry, and electronics of the targets available for structure-activity relationship studies. Nevertheless, it was seen with a regular frequency that the parent alkaloids were better cytotoxic agents than the corresponding semisynthetic derivatives. There has also been significant interest in how the minor alkaloid groups manifest their effects in cancer cells. Among the various targets and pathways in which they were seen to mediate, their ability to induce apoptosis in cancer cells is most appealing.

4,7-Disubstituted 7H-Pyrrolo[2,3-d]pyrimidines and Their Analogs as Antiviral Agents against Zika Virus

Molecules ◽

10.3390/molecules26133779 ◽

2021 ◽

Vol 26 (13) ◽

pp. 3779

Author(s):

Ruben Soto-Acosta ◽

Eunkyung Jung ◽

Li Qiu ◽

Daniel J. Wilson ◽

Robert J. Geraghty ◽

...

Keyword(s):

Antiviral Activity ◽

Dengue Virus ◽

Small Molecules ◽

Zika Virus ◽

Antiviral Agents ◽

Structural Features ◽

Molecular Target ◽

Core Structure ◽

Human Pathogens ◽

Important Group

Discovery of compound 1 as a Zika virus (ZIKV) inhibitor has prompted us to investigate its 7H-pyrrolo[2,3-d]pyrimidine scaffold, revealing structural features that elicit antiviral activity. Furthermore, we have demonstrated that 9H-purine or 1H-pyrazolo[3,4-d]pyrimidine can serve as an alternative core structure. Overall, we have identified 4,7-disubstituted 7H-pyrrolo[2,3-d]pyrimidines and their analogs including compounds 1, 8 and 11 as promising antiviral agents against flaviviruses ZIKV and dengue virus (DENV). While the molecular target of these compounds is yet to be elucidated, 4,7-disubstituted 7H-pyrrolo[2,3-d]pyrimidines and their analogs are new chemotypes in the design of small molecules against flaviviruses, an important group of human pathogens.

Synthesis, Antiprotozoal Activity, and Cheminformatic Analysis of 2-Phenyl-2H-Indazole Derivatives

Molecules ◽

10.3390/molecules26082145 ◽

2021 ◽

Vol 26 (8) ◽

pp. 2145

Author(s):

Karen Rodríguez-Villar ◽

Lilián Yépez-Mulia ◽

Miguel Cortés-Gines ◽

Jacobo David Aguilera-Perdomo ◽

Edgar A. Quintana-Salazar ◽

...

Keyword(s):

Phenyl Ring ◽

Medicinal Chemistry ◽

Structural Features ◽

Pharmacological Properties ◽

Antiprotozoal Activity ◽

One Pot ◽

Biological Assays ◽

Structure Activity Relationships ◽

Structure Activity ◽

Synthetic Methodologies

Indazole is an important scaffold in medicinal chemistry. At present, progress in synthetic methodologies has allowed the preparation of several new indazole derivatives with interesting pharmacological properties. In particular, the antiprotozoal activity of indazole derivatives has been recently reported. Herein, a series of 22 indazole derivatives was synthesized and studied as antiprotozoals. The 2-phenyl-2H-indazole scaffold was accessed by a one-pot procedure, which includes a combination of ultrasound synthesis under neat conditions as well as Cadogan’s cyclization. Moreover, some compounds were derivatized to obtain an appropriate set for providing structure-activity relationship (SAR) information. Whereas the antiprotozoal activity of six of these compounds against E. histolytica, G. intestinalis, and T. vaginalis had been previously reported, the activity of the additional 16 compounds was evaluated against these same protozoa. The biological assays revealed structural features that favor antiprotozoal activity against the three protozoans tested, e.g., electron-withdrawing groups at the 2-phenyl ring. It is important to mention that the indazole derivatives possess strong antiprotozoal activity and are also characterized by a continuous SAR.

Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model

Technology and Health Care ◽

10.3233/thc-218026 ◽

2021 ◽

Vol 29 ◽

pp. 287-295

Author(s):

Zhiming Zhou ◽

Haihui Huang ◽

Yong Liang

Keyword(s):

Logistic Regression ◽

Regression Model ◽

Logistic Regression Model ◽

Gene Selection ◽

Simulated Data ◽

Biological Data ◽

Cancer Classification ◽

High Dimensional ◽

Data Set ◽

Biomarker Selection

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical explanation and yields classification probabilities for the class labels. However, on its own it cannot perform biomarker selection. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUCs of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
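A penalized logistic regression objective of this general kind can be sketched as follows. The abstract does not give the formula, so everything here is an assumption: the log-sum penalty is written in one common form, sum of log(1 + |b_j|/eps), and the network term as a sum of squared coefficient differences over gene-gene edges. The function names, `eps`, `lam1`, and `lam2` are hypothetical.

```python
import math


def logistic_nll(beta, X, y):
    """Average negative log-likelihood of logistic regression (no intercept)."""
    total = 0.0
    for row, label in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(z)) - label * z, written to stay stable for large |z|
        total += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - label * z
    return total / len(y)


def logsum_penalty(beta, eps=1e-2):
    """Assumed log-sum form: sum_j log(1 + |beta_j| / eps); zero at beta = 0."""
    return sum(math.log1p(abs(b) / eps) for b in beta)


def network_penalty(beta, edges):
    """Assumed network term: squared differences across linked genes."""
    return sum((beta[i] - beta[j]) ** 2 for i, j in edges)


def objective(beta, X, y, edges, lam1=0.1, lam2=0.1):
    """Loss plus both penalties; an optimizer would minimize this over beta."""
    return (logistic_nll(beta, X, y)
            + lam1 * logsum_penalty(beta)
            + lam2 * network_penalty(beta, edges))


# Made-up data: two samples, two genes, one edge linking gene 0 and gene 1.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [0, 1]
value_at_zero = objective([0.0, 0.0], X, y, edges=[(0, 1)])
```

At zero coefficients both penalties vanish and the objective equals log 2, the loss of an uninformative classifier. The log-sum term is non-convex, so a practical solver would need something like iterative reweighting, which is beyond this sketch.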

Detecting common breaks in the means of high dimensional cross-dependent panels

Econometrics Journal ◽

10.1093/ectj/utab028 ◽

2021 ◽

Author(s):

Lajos Horváth ◽

Zhenya Liu ◽

Gregory Rice ◽

Yuqian Zhao

Keyword(s):

Panel Data ◽

Common Factors ◽

Real Data ◽

Change Points ◽

High Dimensional ◽

Asymptotic Results ◽

Cross Sectional ◽

Data Set ◽

Monte Carlo Simulation Study ◽

Cross Sectional Dependence

Abstract: The problem of detecting change points in the mean of high dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM type statistic is proposed. We derive its asymptotic properties under three scenarios depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min {N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study showed that our test outperforms several other existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high dimensional FRED-MD macroeconomic data set.
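For orientation, the classical CUSUM idea for a single series can be sketched as below. This is not the panel statistic proposed in the paper: it is the textbook univariate version, with a plain full-sample variance estimate chosen for simplicity, and it handles neither cross-sectional dependence nor common factors. The function name and the data are made up.

```python
import math


def cusum_change_point(x):
    """Classical CUSUM for one change in the mean of a single series.

    Returns (k_hat, stat): k_hat is the number of observations before the
    estimated change, and stat = max_k |S_k| / (sigma_hat * sqrt(T)), where
    S_k is the partial sum of deviations from the overall mean.
    """
    T = len(x)
    mean = sum(x) / T
    var = sum((v - mean) ** 2 for v in x) / T  # simple full-sample estimate
    partial = 0.0
    best_k, best_abs = 0, 0.0
    for k, v in enumerate(x[:-1], start=1):  # S_T is always 0, so skip it
        partial += v - mean
        if abs(partial) > best_abs:
            best_k, best_abs = k, abs(partial)
    stat = best_abs / math.sqrt(var * T) if var > 0 else 0.0
    return best_k, stat


# Made-up series with a mean shift after the fourth observation.
k_hat, stat = cusum_change_point([0, 0, 0, 0, 5, 5, 5, 5])
```

Here the partial sums bottom out after four observations, so the estimated change point is 4. Under a real change the full-sample variance is inflated by the shift itself, which is one reason practical tests use more careful variance estimators or bootstrap calibration.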

Integrated geophysical modelling of terranes and other structural features along the western Canadian margin

Canadian Journal of Earth Sciences ◽

10.1139/e92-119 ◽

1992 ◽

Vol 29 (7) ◽

pp. 1492-1508 ◽

Cited By ~ 21

Author(s):

S. A. Dehler ◽

R. M. Clowes

Keyword(s):

Seismic Refraction ◽

Vancouver Island ◽

Structural Features ◽

Pacific Rim ◽

Data Set ◽

Reflection Seismic ◽

Barkley Sound ◽

The Pacific ◽

Wrangellia Terrane ◽

Lower Crustal

An integrated geophysical data set has been used to develop structural models across the continental margin west of Vancouver Island, Canada. A modern accretionary complex underlies the continental slope and shelf and rests against and below the allochthonous Crescent and Pacific Rim terranes. These terranes in turn abut against the pre-Tertiary Wrangellia terrane that constitutes most of the island. Gravity and magnetic anomaly data, constrained by seismic reflection, seismic refraction, and other data, were interpreted to determine the offshore positions of these terranes and related features. Iterative 2.5-dimensional forward models of anomaly profiles were stepped laterally along the margin to extend areal coverage over a 70 km wide swath oriented normal to the tectonic features. An average model was then developed to represent this part of the margin. The Pacific Rim terrane appears to be continuous and close to the coastline along the length of Vancouver Island, consistent with emplacement by strike-slip motion along the margin. The Westcoast fault, the boundary between the Pacific Rim and Wrangellia terranes, is interpreted to be 15 km farther seaward than in previous interpretations in the region of Barkley Sound. The Crescent terrane forms a thin landward-dipping slab along the southern half of the Vancouver Island margin, and cannot be confirmed along the northern part. Model results suggest the slab has buckled into an anticline beneath southern Vancouver Island and Juan de Fuca Strait, uplifting high-density lower crustal or upper mantle material close to the surface to produce the observed intense positive gravity anomaly. This geometry is consistent with emplacement of the Crescent terrane by oblique subduction.

SAR by kinetics for drug discovery in protein misfolding diseases

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1807884115 ◽

2018 ◽

Vol 115 (41) ◽

pp. 10245-10250 ◽

Cited By ~ 25

Author(s):

Sean Chia ◽

Johnny Habchi ◽

Thomas C. T. Michaels ◽

Samuel I. A. Cohen ◽

Sara Linse ◽

...

Keyword(s):

Drug Discovery ◽

Small Molecules ◽

Protein Misfolding ◽

Amyloid Beta Peptide ◽

Oligomer Formation ◽

Chemical Derivatives ◽

Structure Activity ◽

Beta Peptide ◽

Misfolding Diseases ◽

Aβ Oligomer

To develop effective therapeutic strategies for protein misfolding diseases, a promising route is to identify compounds that inhibit the formation of protein oligomers. To achieve this goal, we report a structure−activity relationship (SAR) approach based on chemical kinetics to estimate quantitatively how small molecules modify the reactive flux toward oligomers. We use this estimate to derive chemical rules in the case of the amyloid beta peptide (Aβ), which we then exploit to optimize starting compounds to curtail Aβ oligomer formation. We demonstrate this approach by converting an inactive rhodanine compound into an effective inhibitor of Aβ oligomer formation by generating chemical derivatives in a systematic manner. These results provide an initial demonstration of the potential of drug discovery strategies based on targeting directly the production of protein oligomers.
