scholarly journals Analysis of Biological Screening Compounds with Single- or Multi-Target Activity via Diagnostic Machine Learning

Biomolecules ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 1605
Author(s):  
Christian Feldmann ◽  
Dimitar Yonchev ◽  
Jürgen Bajorath

Predicting compounds with single- and multi-target activity and exploring origins of compound specificity and promiscuity is of high interest for chemical biology and drug discovery. We present a large-scale analysis of compound promiscuity including two major components. First, high-confidence datasets of compounds with multi- and corresponding single-target activity were extracted from biological screening data. Positive and negative assay results were taken into account and data completeness was ensured. Second, these datasets were investigated using diagnostic machine learning to systematically distinguish between compounds with multi- and single-target activity. Models built on the basis of chemical structure consistently produced meaningful predictions. These findings provided evidence for the presence of structural features differentiating promiscuous and non-promiscuous compounds. Machine learning under varying conditions using modified datasets revealed a strong influence of nearest neighbor relationship on the predictions. Many multi-target compounds were found to be more similar to other multi-target compounds than single-target compounds and vice versa, which resulted in consistently accurate predictions. The results of our study confirm the presence of structural relationships that differentiate promiscuous and non-promiscuous compounds.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christian Feldmann ◽  
Jürgen Bajorath

AbstractCompounds with defined multi-target activity (promiscuity) play an increasingly important role in drug discovery. However, the molecular basis of multi-target activity is currently only little understood. In particular, it remains unclear whether structural features exist that generally characterize promiscuous compounds and set them apart from compounds with single-target activity. We have devised a test system using machine learning to systematically examine structural features that might characterize compounds with multi-target activity. Using this system, more than 860,000 diagnostic predictions were carried out. The analysis provided compelling evidence for the presence of structural characteristics of promiscuous compounds that were dependent on given target combinations, but not generalizable. Feature weighting and mapping identified characteristic substructures in test compounds. Taken together, these findings are relevant for the design of compounds with desired multi-target activity.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 233
Author(s):  
Jonathan Z.L. Zhao ◽  
Eliseos J. Mucaki ◽  
Peter K. Rogan

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2,  PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2,  CD8A,  TALDO1,  PCNA,  EIF4G2,  LCN2,  CDKN1A,  PRKCH,  ENO1,  and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.


Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 233 ◽  
Author(s):  
Jonathan Z.L. Zhao ◽  
Eliseos J. Mucaki ◽  
Peter K. Rogan

Background: Gene signatures derived from transcriptomic data using machine learning methods have shown promise for biodosimetry testing. These signatures may not be sufficiently robust for large scale testing, as their performance has not been adequately validated on external, independent datasets. The present study develops human and murine signatures with biochemically-inspired machine learning that are strictly validated using k-fold and traditional approaches. Methods: Gene Expression Omnibus (GEO) datasets of exposed human and murine lymphocytes were preprocessed via nearest neighbor imputation and expression of genes implicated in the literature to be responsive to radiation exposure (n=998) were then ranked by Minimum Redundancy Maximum Relevance (mRMR). Optimal signatures were derived by backward, complete, and forward sequential feature selection using Support Vector Machines (SVM), and validated using k-fold or traditional validation on independent datasets. Results: The best human signatures we derived exhibit k-fold validation accuracies of up to 98% (DDB2,  PRKDC, TPP2, PTPRE, and GADD45A) when validated over 209 samples and traditional validation accuracies of up to 92% (DDB2,  CD8A,  TALDO1,  PCNA,  EIF4G2,  LCN2,  CDKN1A,  PRKCH,  ENO1,  and PPM1D) when validated over 85 samples. Some human signatures are specific enough to differentiate between chemotherapy and radiotherapy. Certain multi-class murine signatures have sufficient granularity in dose estimation to inform eligibility for cytokine therapy (assuming these signatures could be translated to humans). We compiled a list of the most frequently appearing genes in the top 20 human and mouse signatures. More frequently appearing genes among an ensemble of signatures may indicate greater impact of these genes on the performance of individual signatures. Several genes in the signatures we derived are present in previously proposed signatures. Conclusions: Gene signatures for ionizing radiation exposure derived by machine learning have low error rates in externally validated, independent datasets, and exhibit high specificity and granularity for dose estimation.


Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


2018 ◽  
Vol 39 (12) ◽  
pp. 1457-1462 ◽  
Author(s):  
Jan A. Roth ◽  
Manuel Battegay ◽  
Fabrice Juchler ◽  
Julia E. Vogt ◽  
Andreas F. Widmer

AbstractTo exploit the full potential of big routine data in healthcare and to efficiently communicate and collaborate with information technology specialists and data analysts, healthcare epidemiologists should have some knowledge of large-scale analysis techniques, particularly about machine learning. This review focuses on the broad area of machine learning and its first applications in the emerging field of digital healthcare epidemiology.


Author(s):  
Elangovan Ramanujam ◽  
L. Rasikannan ◽  
S. Viswa ◽  
B. Deepan Prashanth

Machine learning is not a simple technology but an amazing field having more and more to explore. It has a number of real-time applications such as weather forecast, price prediction, gaming, medicine, fraud detection, etc. Machine learning has an increased usage in today's technological world as data is growing in volumes and machine learning is capable of producing mathematical and statistical models that can analyze complex data and generate accurate results. To analyze the scalable performance of the learning algorithms, this chapter utilizes various medical datasets from the UCI Machine Learning repository ranges from smaller to large datasets. The performance of learning algorithms such as naïve Bayes, decision tree, k-nearest neighbor, and stacking ensemble learning method are compared in different evaluation models using metrics such as accuracy, sensitivity, specificity, precision, and f-measure.


2020 ◽  
Vol 6 (30) ◽  
pp. eaay2922
Author(s):  
Aleksandra Nivina ◽  
Maj Svea Grieb ◽  
Céline Loot ◽  
David Bikard ◽  
Jean Cury ◽  
...  

Recombination systems are widely used as bioengineering tools, but their sites have to be highly similar to a consensus sequence or to each other. To develop a recombination system free of these constraints, we turned toward attC sites from the bacterial integron system: single-stranded DNA hairpins specifically recombined by the integrase. Here, we present an algorithm that generates synthetic attC sites with conserved structural features and minimal sequence-level constraints. We demonstrate that all generated sites are functional, their recombination efficiency can reach 60%, and they can be embedded into protein coding sequences. To improve recombination of less efficient sites, we applied large-scale mutagenesis and library enrichment coupled to next-generation sequencing and machine learning. Our results validated the efficiency of this approach and allowed us to refine synthetic attC design principles. They can be embedded into virtually any sequence and constitute a unique example of a structure-specific DNA recombination system.


2018 ◽  
Vol 145 ◽  
pp. 243-254
Author(s):  
Alassane Samba ◽  
Yann Busnel ◽  
Alberto Blanc ◽  
Philippe Dooze ◽  
Gwendal Simon

Sign in / Sign up

Export Citation Format

Share Document