Proposed Improvements for Automated Chemical Safety Evaluations Using In-Silico Techniques

Deep Learning Prediction of Adverse Drug Reactions in Drug Discovery Using Open TG–GATEs and FAERS Databases

Frontiers in Drug Discovery ◽

10.3389/fddsv.2021.768792 ◽

2021 ◽

Vol 1 ◽

Author(s):

Attayeb Mohsen ◽

Lokesh P. Tripathi ◽

Kenji Mizuguchi

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Adverse Drug Reactions ◽

Predictive Models ◽

Prediction Models ◽

Expression Profiles ◽

Fine Tuning ◽

Machine Learning Techniques ◽

Drug Reactions ◽

Wide Range

Machine learning techniques are increasingly used in the analysis of clinical and omics data. This increase is primarily due to advances in artificial intelligence (AI) and the build-up of health-related big data. In this paper, we aim to estimate the likelihood of adverse drug reactions or events (ADRs) in the course of drug discovery using various machine learning methods. We also describe a novel machine learning-based framework for predicting the likelihood of ADRs. Our framework combines two distinct datasets, drug-induced gene expression profiles from Open TG–GATEs (Toxicogenomics Project–Genomics Assisted Toxicity Evaluation Systems) and ADR occurrence information from the FAERS (FDA [Food and Drug Administration] Adverse Events Reporting System) database, and can be applied to many different ADRs. It incorporates data filtering and cleaning as well as feature selection and hyperparameter fine-tuning. Using this framework with Deep Neural Networks (DNN), we built a total of 14 predictive models with a mean validation accuracy of 89.4%, indicating that our approach successfully and consistently predicted ADRs for a wide range of drugs. As case studies, we investigated the performance of our prediction models in the context of Duodenal ulcer and Hepatitis fulminant, highlighting mechanistic insights into those ADRs. We have generated predictive models to help assess the likelihood of ADRs in testing novel pharmaceutical compounds. We believe that our findings offer a promising approach for ADR prediction and will be useful for researchers in drug discovery.

Handling Imbalanced Data in Intrusion Detection Systems using Generative Adversarial Networks

Research and Development on Information and Communication Technology ◽

10.32913/mic-ict-research.v2020.n1.894 ◽

2020 ◽

Vol 2020 (1) ◽

pp. 1-13

Author(s):

Ly Vu ◽

Quang Uy Nguyen

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Detection System ◽

Imbalanced Data ◽

Original Data ◽

Machine Learning Techniques ◽

Generative Adversarial Networks ◽

Detection Systems ◽

Adversarial Networks ◽

Attack Data

Machine learning-based intrusion detection has become more popular in the research community thanks to its capability in discovering unknown attacks. To develop a good detection model for an intrusion detection system (IDS) using machine learning, a great number of attack and normal data samples are required in the learning process. While normal data can be relatively easy to collect, attack data is much rarer and harder to gather. Consequently, IDS datasets are often dominated by normal data, and machine learning models trained on those imbalanced datasets are ineffective in detecting attacks. In this paper, we propose a novel solution to this problem by using generative adversarial networks to generate synthesized attack data for IDS. The synthesized attacks are merged with the original data to form the augmented dataset. Three popular machine learning techniques are trained on the augmented dataset. The experiments conducted on three common IDS datasets and one dataset of our own show that machine learning algorithms achieve better performance when trained on the augmented dataset of the generative adversarial networks than when trained on the original dataset or on datasets produced by other sampling techniques. A visualization technique was also used to analyze the properties of the data synthesized by the generative adversarial networks and by the other techniques.
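
The merge step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `augment_with_synthetic`, the `synthesize` callable (a stand-in for a trained GAN generator), and the choice to bring the attack class up to the size of the largest class are all hypothetical, since the abstract does not state the target class ratio.

```python
from collections import Counter


def augment_with_synthetic(samples, labels, synthesize, attack_label="attack"):
    """Merge synthesized attack samples with the original data.

    `synthesize` is a hypothetical zero-argument callable standing in for a
    trained generator; it returns one synthetic attack sample per call.
    Assumption: the attack class is topped up to the size of the largest class.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    # Number of synthetic attack samples needed to match the largest class.
    deficit = majority - counts[attack_label]
    synthetic = [synthesize() for _ in range(deficit)]
    # Return new lists so the original dataset is left untouched.
    return samples + synthetic, labels + [attack_label] * deficit


# Toy usage: three normal records, one attack record, constant "generator".
samples = [[0.1], [0.2], [0.3], [0.9]]
labels = ["normal", "normal", "normal", "attack"]
aug_samples, aug_labels = augment_with_synthetic(samples, labels, lambda: [1.0])
```

Any classifier can then be trained on `aug_samples` and `aug_labels` in place of the original data.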

Using GANs with adaptive training data to search for new molecules

Journal of Cheminformatics ◽

10.1186/s13321-021-00494-3 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Andrew E. Blanchard ◽

Christopher Stanley ◽

Debsindhu Bhowmik

Keyword(s):

Drug Discovery ◽

Chemical Space ◽

Traditional Approach ◽

Chemical Compounds ◽

Original Data ◽

Training Data ◽

Generative Adversarial Networks ◽

Small Subset ◽

Adversarial Networks ◽

Potential Applications

The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
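
A minimal sketch of the random-replacement idea, not the authors' implementation: the function name `replace_training_samples`, the `is_valid` predicate, and the rule that only previously unseen samples count as novel are assumptions made for illustration. Guided selection and recombination, which the abstract also mentions, are not shown.

```python
import random


def replace_training_samples(training, generated, is_valid, seen, rng):
    """One random-replacement update of the training data.

    Each generated sample that passes `is_valid` (a hypothetical validity
    check) and has not been seen before overwrites a randomly chosen
    training slot. Returns the updated training list and the number of
    novel samples found in this update.
    """
    updated = list(training)
    novel = 0
    for sample in generated:
        if not is_valid(sample) or sample in seen:
            continue
        seen.add(sample)
        novel += 1
        # Random selection of the training sample to be replaced.
        updated[rng.randrange(len(updated))] = sample
    return updated, novel


# Toy usage with strings standing in for molecules; a non-empty string
# counts as "valid" here purely for illustration.
training = ["CCO", "CCN", "CCC"]
seen = set(training)
generated = ["CCO", "C1CC1", "", "C1CC1", "CO"]
training, novel_count = replace_training_samples(
    training, generated, lambda s: len(s) > 0, seen, random.Random(0)
)
```

Summing `novel_count` across updates gives a running total of novel samples, which is the quantity the abstract describes tracking during training.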

Toward a Comparison of Classical and New Privacy Mechanism

Entropy ◽

10.3390/e23040467 ◽

2021 ◽

Vol 23 (4) ◽

pp. 467

Author(s):

Daniel Heredia-Ductram ◽

Miguel Nunez-del-Prado ◽

Hugo Alatrista-Salas

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Machine Learning Techniques ◽

Generative Adversarial Networks ◽

Current Effort ◽

Privacy Concerns ◽

Adversarial Networks ◽

Learning Techniques ◽

Statistical Disclosure ◽

Big Data Technologies

In the last decades, the development of interconnectivity, pervasive systems, citizen sensors, and Big Data technologies has allowed us to gather large amounts of data from different sources worldwide. This phenomenon has raised privacy concerns around the globe, compelling states to enforce data protection laws. In parallel, privacy-enhancing techniques have emerged to meet regulation requirements, allowing companies and researchers to exploit individual data in a privacy-aware way. Thus, data curators need to find the most suitable algorithms to meet a required trade-off between utility and privacy. This crucial task can take a lot of time since there is a lack of benchmarks on privacy techniques. To fill this gap, in the current effort we compare classical privacy techniques, such as Statistical Disclosure Control and Differential Privacy, with more recent techniques, such as Generative Adversarial Networks and Machine Learning Copies, using an entire commercial database. The obtained results allow us to show the evolution of privacy techniques and depict new uses of privacy-aware Machine Learning techniques.

Stegomalware: A Systematic Survey of Malware Hiding and Detection in Images, Machine Learning Models and Research Challenges

10.36227/techrxiv.16755457 ◽

2021 ◽

Author(s):

Raj Chaganti ◽

Vinayakumar R ◽

Mamoun Alazab ◽

Tuan Pham

Keyword(s):

Machine Learning ◽

Academic Research ◽

Image Steganography ◽

Machine Learning Techniques ◽

Current Status ◽

Generative Adversarial Networks ◽

Malware Analysis ◽

Source Of Infection ◽

Adversarial Networks ◽

File Formats

Malware distribution to the victim network is commonly performed through file attachments in phishing emails or through downloads of illegitimate files from the internet, when the victim interacts with the source of infection. To detect and prevent malware distribution on the victim machine, existing end-device security applications may leverage sophisticated techniques such as signature-based, anomaly-based, or machine learning techniques. The well-known file formats Portable Executable (PE) for Windows and Executable and Linkable Format (ELF) for Linux-based operating systems are used for malware analysis, and the malware detection capabilities for these files are well advanced for real-time detection. However, detecting malware payloads hidden in multimedia such as cover images using steganography has been a challenge for enterprises, as these payloads are rarely seen and usually act as a stager in sophisticated attacks. In this article, to our knowledge, we are the first to try to address the knowledge gap between the current progress in academic research on image steganography and steganalysis, which focuses on data hiding, and the review of the current status of stegomalware (malware payload hiding in images) targeting enterprises with cyberattacks. We present the history of stegomalware, its generation tools, and a description of the file format specifications. Based on our findings, we perform a detailed review of image steganography techniques, including the recent Generative Adversarial Network (GAN) based models, and of image steganalysis methods, and we present the Deep Learning opportunities and challenges in stegomalware generation and detection.

Time-Series prediction for the epidemic trends of COVID-19 using Conditional Generative adversarial Networks Regression on country-wise case studies

10.21203/rs.3.rs-1148944/v1 ◽

2021 ◽

Author(s):

Arnabi Bej ◽

Ujjwal Maulik ◽

Anasua Sarkar

Keyword(s):

Machine Learning ◽

Time Series Prediction ◽

Statistical Technique ◽

World Health ◽

Generative Adversarial Networks ◽

Adversarial Networks ◽

Wide Range ◽

The World ◽

Virulent Virus ◽

Health Organization

Probabilistic Regression is a statistical technique and a crucial problem in the machine learning domain which employs a set of machine learning methods to forecast a continuous target variable based on the value of one or multiple predictor variables. COVID-19 is a virulent virus that has brought the whole world to a standstill. The potential of the virus to cause inter-human transmission makes the world a dangerous place. This thesis predicts the upcoming circumstances of the Corona virus to subside its action. We have performed Conditional GAN regression to anticipate the subsequent Covid-19 cases of 5 countries. The GAN variant CGAN is used to design the model and predict the Covid-19 cases for three months ahead with least error for the dataset provided. Each country is examined individually, due to their variation in population size, tradition, medical management, and preventive measures. The analysis is based on confirmed data, as provided by the World Health Organization. This paper investigates how conditional Generative Adversarial Networks (GANs) can be used to accurately exhibit intricate conditional distributions. GANs have achieved spectacular results in producing convoluted high-dimensional data, but work done on their use for regression problems is minimal. This paper exhibits how conditional GANs can be employed in probabilistic regression. It is shown that conditional GANs can be used to evaluate a wide range of various distributions and be competitive with existing probabilistic regression models.

Wasserstein Generative Adversarial Networks Based Data Augmentation for Radar Data Analysis

Applied Sciences ◽

10.3390/app10041449 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1449

Author(s):

Hansoo Lee ◽

Jonggeun Kim ◽

Eun Kyeong Kim ◽

Sungshin Kim

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Weather Radar ◽

Image Synthesis ◽

Radar Data ◽

Generative Adversarial Networks ◽

Necessary Condition ◽

Radar Images ◽

Adversarial Networks ◽

Wide Range

Ground-based weather radars can observe a wide range with high spatial and temporal resolution. They benefit meteorological research and services by providing valuable information. Recent research on weather radar data has focused on applying machine learning and deep learning to solve complicated problems. It is well known that an adequate amount of data is a necessary condition in machine learning and deep learning. Generative adversarial networks (GANs) have received extensive attention for their remarkable data generation capacity and their competitive structure. Consequently, a massive number of variants have been proposed, and which model is adequate to solve a given problem is an inevitable concern. In this paper, we explore the problem of radar image synthesis and evaluate different GANs with authentic radar observation results. The experimental results showed that the improved Wasserstein GAN is more capable of generating similar radar images while achieving higher structural similarity results.

Spatial landslide susceptibility assessment using machine learning techniques assisted by additional data created with generative adversarial networks

Geoscience Frontiers ◽

10.1016/j.gsf.2020.09.002 ◽

2021 ◽

Vol 12 (2) ◽

pp. 625-637

Author(s):

Husam A.H. Al-Najjar ◽

Biswajeet Pradhan

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Additional Data ◽

Machine Learning Techniques ◽

Generative Adversarial Networks ◽

Susceptibility Assessment ◽

Landslide Susceptibility Assessment ◽

Adversarial Networks ◽

Learning Techniques

Application of network link prediction in drug discovery

BMC Bioinformatics ◽

10.1186/s12859-021-04082-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Khushnood Abbas ◽

Alireza Abbasi ◽

Shi Dong ◽

Ling Niu ◽

Laihang Yu ◽

...

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Link Prediction ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Biomedical Data ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Wide Range

Background: Technological and research advances have produced large volumes of biomedical data. When represented as a network (graph), these data become useful for modeling entities and interactions in biological and similar complex systems. In the field of network biology and network medicine, there is a particular interest in predicting results from drug–drug, drug–disease, and protein–protein interactions to advance the speed of drug discovery. Existing data and modern computational methods make it possible to identify potentially beneficial and harmful interactions, and therefore to narrow drug trials ahead of actual clinical trials. Such automated data-driven investigation relies on machine learning techniques. However, traditional machine learning approaches require extensive preprocessing of the data, which makes them impractical for large datasets. This study presents a wide range of machine learning methods for predicting outcomes from biomedical interactions and compares the performance of the traditional methods with more recent network-based approaches. Results: We applied a wide range of 32 different network-based machine learning models to five commonly available biomedical datasets and evaluated their performance based on three important evaluation metrics, namely AUROC, AUPR, and F1-score. We achieved this by converting the link prediction problem into a binary classification problem: we considered the existing links as positive examples and randomly sampled negative examples from the set of non-existent links. After experimental evaluation, we found that Prone, ACT and $$LRW_5$$ are the top 3 performers on all five datasets. Conclusions: This work presents a comparative evaluation of network-based machine learning algorithms for predicting network links, with applications in the prediction of drug-target and drug–drug interactions, and applies well-known network-based machine learning methods. Our work is helpful in guiding researchers in the appropriate selection of machine learning methods for pharmaceutical tasks.
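
The conversion of link prediction into binary classification described in the Results can be sketched as follows. This is an illustration, not the study's code: the function name `build_link_dataset`, the toy node names, and the choice to sample as many negatives as there are positives are assumptions, since the abstract does not state the sampling ratio.

```python
import random
from itertools import combinations


def build_link_dataset(nodes, edges, seed=0):
    """Turn an undirected graph into a labeled pair dataset.

    Existing links are labeled 1. Negatives (label 0) are sampled uniformly
    from node pairs that are not linked. Assumption: one negative is drawn
    per positive whenever enough non-links exist.
    """
    rng = random.Random(seed)
    # Normalise undirected edges so (a, b) and (b, a) are the same link.
    positives = sorted({tuple(sorted(edge)) for edge in edges})
    positive_set = set(positives)
    # Candidate negatives: every node pair without an existing link.
    non_links = [
        pair for pair in combinations(sorted(nodes), 2) if pair not in positive_set
    ]
    negatives = rng.sample(non_links, min(len(positives), len(non_links)))
    pairs = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return pairs, labels


# Toy drug-target graph (hypothetical node names).
nodes = ["d1", "d2", "d3", "t1", "t2"]
edges = [("d1", "t1"), ("t1", "d2"), ("d3", "t2")]
pairs, labels = build_link_dataset(nodes, edges, seed=1)
```

Each pair would then be mapped to features (for example, node embeddings) and passed to a binary classifier, whose scores can be evaluated with AUROC, AUPR, and F1-score.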

Stegomalware: A Systematic Survey of Malware Hiding and Detection in Images, Machine Learning Models and Research Challenges

10.36227/techrxiv.16755457.v1 ◽

2021 ◽

Author(s):

Raj Chaganti ◽

Vinayakumar R ◽

Mamoun Alazab ◽

Tuan Pham

Keyword(s):

Machine Learning ◽

Academic Research ◽

Image Steganography ◽

Machine Learning Techniques ◽

Current Status ◽

Generative Adversarial Networks ◽

Malware Analysis ◽

Source Of Infection ◽

Adversarial Networks ◽

File Formats

Malware distribution to the victim network is commonly performed through file attachments in phishing emails or through downloads of illegitimate files from the internet, when the victim interacts with the source of infection. To detect and prevent malware distribution on the victim machine, existing end-device security applications may leverage sophisticated techniques such as signature-based, anomaly-based, or machine learning techniques. The well-known file formats Portable Executable (PE) for Windows and Executable and Linkable Format (ELF) for Linux-based operating systems are used for malware analysis, and the malware detection capabilities for these files are well advanced for real-time detection. However, detecting malware payloads hidden in multimedia such as cover images using steganography has been a challenge for enterprises, as these payloads are rarely seen and usually act as a stager in sophisticated attacks. In this article, to our knowledge, we are the first to try to address the knowledge gap between the current progress in academic research on image steganography and steganalysis, which focuses on data hiding, and the review of the current status of stegomalware (malware payload hiding in images) targeting enterprises with cyberattacks. We present the history of stegomalware, its generation tools, and a description of the file format specifications. Based on our findings, we perform a detailed review of image steganography techniques, including the recent Generative Adversarial Network (GAN) based models, and of image steganalysis methods, and we present the Deep Learning opportunities and challenges in stegomalware generation and detection.
