Activity Cliffs As Protein-Related Phenomenon: Investigation Using Machine Learning Against Numerous Protein Kinases

Mapping Intimacies ◽

10.21203/rs.3.rs-1120840/v1 ◽

2022 ◽

Author(s):

Safa Daoud ◽

Mutasem Taha

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Protein Kinases ◽

Data Augmentation ◽

Related Phenomenon ◽

Intrinsic Protein ◽

Protein Properties ◽

Crystallographic Structures ◽

Activity Cliffs ◽

Testing Set

Abstract Activity cliffs (ACs) are structurally analogous compounds with significantly different affinities for a given biotarget. We propose that the AC phenomenon is protein-related and that the propensity of a given target to have ACs can be predicted from intrinsic protein properties. We pursued this assumption by collecting the crystallographic structures of 84 protein kinases, each of which has hundreds of reported inhibitors. Following data augmentation using the synthetic minority oversampling technique (SMOTE), we attempted to correlate the presence or absence of ACs within the ligand pools of the collected protein kinases with their corresponding protein properties using a genetic algorithm (GA) coupled with a variety of machine learners (MLs). The resulting GA-ML models achieved accuracies of around 75% against an external testing set. The models were further validated by Y-scrambling. Shapley additive explanations highlighted the significance of protein rotatable bonds and of hydrophobic and acidic residues in relation to the presence of ACs. These results support the hypothesis that ACs are protein-related.
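The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the descriptor vectors, the neighbour count `k`, and the function name `smote` are hypothetical, and the sketch only shows the core idea of interpolating between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional descriptor vectors for the under-represented class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the region spanned by the minority class, which is why the technique is usually applied to the training data only.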


Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


Machine Learning Accelerated Genetic Algorithms for Computational Materials Search

10.26434/chemrxiv.7411172 ◽

2018 ◽

Author(s):

Steen Lysgaard ◽

Paul C. Jennings ◽

Jens Strabo Hummelshøj ◽

Thomas Bligaard ◽

Tejs Vegge

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Genetic Algorithms ◽

Au Nanoparticles ◽

Learning Model ◽

Energy Calculations ◽

Atomic Distribution ◽

Machine Learning Model ◽

Fold Reduction ◽

Computational Materials

A machine learning model is used as a surrogate fitness evaluator in a genetic algorithm (GA) optimization of the atomic distribution of Pt-Au nanoparticles. The machine learning accelerated genetic algorithm (MLaGA) yields a 50-fold reduction of required energy calculations compared to a traditional GA.
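The idea of using a cheap model as a surrogate fitness evaluator can be sketched as follows. Everything here is an assumption made for illustration: the toy `expensive_energy` function stands in for a real energy calculation, and the nearest-neighbour surrogate is a deliberately simple replacement for whatever model the paper uses. The point is only that offspring are ranked by the surrogate and just a few of them receive the expensive evaluation.

```python
import random

rng = random.Random(0)
N_SITES = 12              # hypothetical number of atomic sites (0 = Pt, 1 = Au)
calls = {"expensive": 0}  # counts expensive evaluations


def expensive_energy(bits):
    """Toy stand-in for an energy calculation (lower is better):
    counts equal neighbours on a ring of sites."""
    calls["expensive"] += 1
    return sum(bits[i] == bits[(i + 1) % N_SITES] for i in range(N_SITES))


def surrogate_energy(bits, evaluated):
    """Cheap prediction: energy of the most similar already-evaluated
    candidate (1-nearest neighbour by Hamming distance)."""
    nearest = min(evaluated, key=lambda known: sum(a != b for a, b in zip(known, bits)))
    return evaluated[nearest]


def mutate(bits):
    """Flip the occupancy of one randomly chosen site."""
    i = rng.randrange(N_SITES)
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def crossover(a, b):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, N_SITES)
    return a[:cut] + b[cut:]


# Initial population, evaluated with the expensive function.
evaluated = {}
for _ in range(8):
    candidate = tuple(rng.randint(0, 1) for _ in range(N_SITES))
    evaluated[candidate] = expensive_energy(candidate)

generated = 0
for generation in range(20):
    parents = sorted(evaluated, key=evaluated.get)[:4]
    offspring = set()
    for _ in range(20):
        child = mutate(crossover(rng.choice(parents), rng.choice(parents)))
        if child not in evaluated:
            offspring.add(child)
    generated += len(offspring)
    # Surrogate screening: only the two most promising offspring are
    # passed to the expensive evaluator in each generation.
    ranked = sorted(offspring, key=lambda c: surrogate_energy(c, evaluated))
    for child in ranked[:2]:
        evaluated[child] = expensive_energy(child)

best = min(evaluated, key=evaluated.get)
```

Comparing `calls["expensive"]` with `generated` shows how many candidates were screened out by the surrogate in this toy run. The figure has no bearing on the 50-fold reduction reported in the abstract.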


Enhancement of Image Classification through Data Augmentation using Machine Learning

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i9.220224 ◽

2018 ◽

Vol 6 (9) ◽

pp. 220-224

Author(s):

Th. S. Kumar

Keyword(s):

Machine Learning ◽

Image Classification ◽

Data Augmentation


Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences ◽

10.3390/app9061128 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1128 ◽

Cited By ~ 12

Author(s):

Yundong Li ◽

Wei Hu ◽

Han Dong ◽

Xueyan Zhang

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hurricane Sandy ◽

Training Data ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Data Set ◽

Augmentation Strategies ◽

Post Disaster

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, with the following highlights. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy yields an improvement of 10% in overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the results are further verified on another dataset from Hurricane Irma, and it is concluded that the proposed method is feasible.
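Three of the augmentation strategies listed in point (3), mirroring, rotation, and Gaussian noise, can be sketched on a tiny grayscale image stored as nested lists. This is an illustrative sketch, not the paper's pipeline: the function names, the noise level `sigma`, and the sample image are assumptions, and Gaussian blur is omitted to keep the example short.

```python
import random


def mirror(img):
    """Horizontal flip: reverse every row."""
    return [row[::-1] for row in img]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to each pixel and clip to 0..255."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def augment(img, seed=0):
    """Return the original image plus three augmented variants."""
    rng = random.Random(seed)
    return [img, mirror(img), rotate90(img), add_gaussian_noise(img, 10.0, rng)]


# Hypothetical 2x3 grayscale image.
image = [[0, 50, 100],
         [150, 200, 250]]
variants = augment(image)
```

For object detection, as in the abstract, the bounding-box annotations would need the same geometric transform as the image. The sketch leaves that step out.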


Data Augmentation for Machine Learning-Based Hardware Trojan Detection at Gate-Level Netlists

2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design (IOLTS) ◽

10.1109/iolts52814.2021.9486713 ◽

2021 ◽

Author(s):

Kento Hasegawa ◽

Seira Hidano ◽

Kohei Nozawa ◽

Shinsaku Kiyomoto ◽

Nozomu Togawa

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hardware Trojan ◽

Hardware Trojan Detection ◽

Trojan Detection


Human Induction in Machine Learning

ACM Computing Surveys ◽

10.1145/3444691 ◽

2021 ◽

Vol 54 (3) ◽

pp. 1-18

Author(s):

Petr Spelda ◽

Vit Stritecky

Keyword(s):

Machine Learning ◽

A Posteriori ◽

Experimental Paradigm ◽

Risk Minimisation ◽

Inductive Risk ◽

Acceptable Accuracy ◽

Space And Time ◽

The Common ◽

Testing Set

As our epistemic ambitions grow, the common and scientific endeavours are becoming increasingly dependent on Machine Learning (ML). The field rests on a single experimental paradigm, which consists of splitting the available data into a training and testing set and using the latter to measure how well the trained ML model generalises to unseen samples. If the model reaches acceptable accuracy, then an a posteriori contract comes into effect between humans and the model, supposedly allowing its deployment to target environments. Yet the latter part of the contract depends on human inductive predictions or generalisations, which infer a uniformity between the trained ML model and the targets. The article asks how we justify the contract between human and machine learning. It is argued that the justification becomes a pressing issue when we use ML to reach “elsewhere” in space and time or deploy ML models in non-benign environments. The article argues that the only viable version of the contract can be based on optimality (instead of on reliability, which cannot be justified without circularity) and aligns this position with Schurz's optimality justification. It is shown that when dealing with inaccessible/unstable ground-truths (“elsewhere” and non-benign targets), the optimality justification undergoes a slight change, which should reflect critically on our epistemic ambitions. Therefore, the study of ML robustness should involve not only heuristics that lead to acceptable accuracies on testing sets. The justification of human inductive predictions or generalisations about the uniformity between ML models and targets should be included as well. Without it, the assumptions about inductive risk minimisation in ML are not addressed in full.
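The experimental paradigm described in the abstract, splitting the data into a training and a testing set and measuring accuracy on the latter, can be sketched in a few lines. The data, the threshold classifier, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction as the testing set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """'Train' a one-parameter model: the midpoint between the class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of samples whose predicted label (x > threshold) matches y."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)


# Hypothetical one-dimensional data: the label is 1 when x > 5.
samples = [(float(x), int(x > 5)) for x in range(12)]
train, test = train_test_split(samples)
threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)
```

The testing-set accuracy measures generalisation only to samples drawn from the same pool as the training data. The inference from that figure to a deployment target is the step the article examines.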


DataLoc+: A Data Augmentation Technique for Machine Learning in Room-Level Indoor Localization

2021 IEEE Wireless Communications and Networking Conference (WCNC) ◽

10.1109/wcnc49053.2021.9417246 ◽

2021 ◽

Author(s):

Amr Hilal ◽

Ismail Arai ◽

Samy El-Tawab

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Indoor Localization


A Generative Adversarial Network (GAN) Technique for Internet of Medical Things Data

Sensors ◽

10.3390/s21113726 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3726

Author(s):

Ivan Vaccari ◽

Vanessa Orani ◽

Alessia Paglialonga ◽

Enrico Cambiaso ◽

Maurizio Mongelli

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Monitoring Program ◽

Clinical Decision Support Systems ◽

Direct Access ◽

Generative Adversarial Networks ◽

Chronic Obstructive ◽

Generative Adversarial Network ◽

Internet Of Medical Things ◽

Synthetic Datasets

The application of machine learning and artificial intelligence techniques in the medical world is growing, with a range of purposes: from the identification and prediction of possible diseases to patient monitoring and clinical decision support systems. Furthermore, the widespread use of remote monitoring medical devices, under the umbrella of the “Internet of Medical Things” (IoMT), has simplified the retrieval of patient information, as these devices allow continuous monitoring and direct access to data by healthcare providers. However, due to possible issues in real-world settings, such as loss of connectivity, irregular use, misuse, or poor adherence to a monitoring program, the data collected might not be sufficient to implement accurate algorithms. For this reason, data augmentation techniques can be used to create synthetic datasets sufficiently large to train machine learning models. In this work, we apply generative adversarial networks (GANs) to perform data augmentation on patient data obtained through IoMT sensors for Chronic Obstructive Pulmonary Disease (COPD) monitoring. We also apply an explainable AI algorithm to demonstrate the accuracy of the synthetic data by comparing it to the real data recorded by the sensors. The results obtained demonstrate that synthetic datasets created through a well-structured GAN are comparable with a real dataset, as validated by a novel approach based on machine learning.
