scholarly journals Activity Cliffs As Protein-Related Phenomenon: Investigation Using Machine Learning Against Numerous Protein Kinases

Author(s):  
Safa Daoud ◽  
Mutasem Taha

Abstract Activity cliffs (ACs) are analogous compounds of significant affinity discrepancies against certain biotarget. We propose that the ACs phenomenon is protein-related and that the propensity of certain target to have ACs can be predicted by some intrinsic protein properties. We pursued this assumption by collecting the crystallographic structures of 84 protein kinases, each of which has numerous reported inhibitors (hundreds). Following data augmentation using synthetic minority oversampling technique (SMOTE), we attempted to correlate the presence/absence of ACs within the ligand pools of collected protein kinases with their corresponding protein properties using genetic algorithm (GA) coupled with variety of machine learners (MLs). Very good GA-ML models were achieved with accuracies of around 75% against external testing set. The models were further validated by Y-scrambling. Shapely additive explanations highlighted the significance of protein rotatable bonds, hydrophobic and acidic residues in relation to the presence of ACs. These results support the hypothesis that ACs are protein-related.

2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


2018 ◽  
Author(s):  
Steen Lysgaard ◽  
Paul C. Jennings ◽  
Jens Strabo Hummelshøj ◽  
Thomas Bligaard ◽  
Tejs Vegge

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Using aerial cameras, satellite remote sensing or unmanned aerial vehicles (UAV) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods results from the numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were maximized to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy can improve of 10% in terms of overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified by another dataset of Hurricane Irma, and it is concluded that the paper method is feasible.


2021 ◽  
Vol 54 (3) ◽  
pp. 1-18
Author(s):  
Petr Spelda ◽  
Vit Stritecky

As our epistemic ambitions grow, the common and scientific endeavours are becoming increasingly dependent on Machine Learning (ML). The field rests on a single experimental paradigm, which consists of splitting the available data into a training and testing set and using the latter to measure how well the trained ML model generalises to unseen samples. If the model reaches acceptable accuracy, then an a posteriori contract comes into effect between humans and the model, supposedly allowing its deployment to target environments. Yet the latter part of the contract depends on human inductive predictions or generalisations, which infer a uniformity between the trained ML model and the targets. The article asks how we justify the contract between human and machine learning. It is argued that the justification becomes a pressing issue when we use ML to reach “elsewhere” in space and time or deploy ML models in non-benign environments. The article argues that the only viable version of the contract can be based on optimality (instead of on reliability, which cannot be justified without circularity) and aligns this position with Schurz's optimality justification. It is shown that when dealing with inaccessible/unstable ground-truths (“elsewhere” and non-benign targets), the optimality justification undergoes a slight change, which should reflect critically on our epistemic ambitions. Therefore, the study of ML robustness should involve not only heuristics that lead to acceptable accuracies on testing sets. The justification of human inductive predictions or generalisations about the uniformity between ML models and targets should be included as well. Without it, the assumptions about inductive risk minimisation in ML are not addressed in full.


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3726
Author(s):  
Ivan Vaccari ◽  
Vanessa Orani ◽  
Alessia Paglialonga ◽  
Enrico Cambiaso ◽  
Maurizio Mongelli

The application of machine learning and artificial intelligence techniques in the medical world is growing, with a range of purposes: from the identification and prediction of possible diseases to patient monitoring and clinical decision support systems. Furthermore, the widespread use of remote monitoring medical devices, under the umbrella of the “Internet of Medical Things” (IoMT), has simplified the retrieval of patient information as they allow continuous monitoring and direct access to data by healthcare providers. However, due to possible issues in real-world settings, such as loss of connectivity, irregular use, misuse, or poor adherence to a monitoring program, the data collected might not be sufficient to implement accurate algorithms. For this reason, data augmentation techniques can be used to create synthetic datasets sufficiently large to train machine learning models. In this work, we apply the concept of generative adversarial networks (GANs) to perform a data augmentation from patient data obtained through IoMT sensors for Chronic Obstructive Pulmonary Disease (COPD) monitoring. We also apply an explainable AI algorithm to demonstrate the accuracy of the synthetic data by comparing it to the real data recorded by the sensors. The results obtained demonstrate how synthetic datasets created through a well-structured GAN are comparable with a real dataset, as validated by a novel approach based on machine learning.


Sign in / Sign up

Export Citation Format

Share Document