AMPGAN v2: Machine Learning Guided Discovery of Anti-Microbial Peptides

AbstractAntibiotic resistance is a critical public health problem. Each year ~2.8 million resistant infections lead to more than 35,000 deaths in the U.S. alone. Anti-microbial peptides (AMPs) show promise in treating resistant infections. But, applications of known AMPs have encountered issues in development, production, and shelf-life. To drive the development of AMPbased treatments it is necessary to create design approaches with higher precision and selectivity towards resistant targets.In this paper we present AMPGAN v2, a generative adversarial network (GAN) based approach for rational AMP design. Like AMP-GAN,1 AMPGAN v2 combines data driven priors and controlled generation. These elements allow for the generation of AMP candidates tailored for specific applications. AMPGAN v2 is able to generate AMP candidates that are novel and diverse, making it an efficient AMP design tool.

Download Full-text

ORGANIC (1).pdf

10.26434/chemrxiv.5309668.v1 ◽

2017 ◽

Author(s):

Benjamin Sanchez-Lengeling ◽

Carlos Outeiral ◽

Gabriel L. Guimaraes ◽

Alan Aspuru-Guzik

Keyword(s):

Machine Learning ◽

Learning Community ◽

Chemical Species ◽

Material Design ◽

Organic Photovoltaic ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Networks ◽

Photovoltaic Material

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches with a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.

Download Full-text

Machine Learning for Dissimulating Reality

Proceedings ◽

10.3390/proceedings2021077017 ◽

2021 ◽

Vol 77 (1) ◽

pp. 17

Author(s):

Andrea Giussani

Keyword(s):

Machine Learning ◽

Language Processing ◽

Huge Amount ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Technological Advances ◽

Textual Data ◽

Musical Scores ◽

Mathematical Formulas

In the last decade, advances in statistical modeling and computer science have boosted the production of machine-produced contents in different fields: from language to image generation, the quality of the generated outputs is remarkably high, sometimes better than those produced by a human being. Modern technological advances such as OpenAI’s GPT-2 (and recently GPT-3) permit automated systems to dramatically alter reality with synthetic outputs so that humans are not able to distinguish the real copy from its counteracts. An example is given by an article entirely written by GPT-2, but many other examples exist. In the field of computer vision, Nvidia’s Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for the production of a huge amount of fake human face portraits; additionally, recent algorithms were developed to create both musical scores and mathematical formulas. This presentation aims to stimulate participants on the state-of-the-art results in this field: we will cover both GANs and language modeling with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBerta (Liu et al. 2019), to the detection of human-produced versus machine-produced text concerning fake news detection. RoBerta is a recent algorithm that is based on the well-known Bidirectional Encoder Representations from Transformers algorithm, known as BERT (Devlin et al. 2018); this is a bi-directional transformer used for natural language processing developed by Google and pre-trained over a huge amount of unlabeled textual data to learn embeddings. We will then use these representations as an input of our classifier to detect real vs. machine-produced text. The application is demonstrated in the presentation.

Download Full-text

GAINESIS: Generative Artificial Intelligence NEtlists SynthesIS

Electronics ◽

10.3390/electronics11020245 ◽

2022 ◽

Vol 11 (2) ◽

pp. 245

Author(s):

Konstantinos G. Liakos ◽

Georgios K. Georgakilas ◽

Fotis C. Plessas ◽

Paris Kitsos

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Power Analysis ◽

Public Libraries ◽

Data Sets ◽

Hardware Trojan ◽

Generative Adversarial Network ◽

Data Set ◽

Encrypted Data ◽

Adversarial Network

A significant problem in the field of hardware security consists of hardware trojan (HT) viruses. The insertion of HTs into a circuit can be applied for each phase of the circuit chain of production. HTs degrade the infected circuit, destroy it or leak encrypted data. Nowadays, efforts are being made to address HTs through machine learning (ML) techniques, mainly for the gate-level netlist (GLN) phase, but there are some restrictions. Specifically, the number and variety of normal and infected circuits that exist through the free public libraries, such as Trust-HUB, are based on the few samples of benchmarks that have been created from circuits large in size. Thus, it is difficult, based on these data, to develop robust ML-based models against HTs. In this paper, we propose a new deep learning (DL) tool named Generative Artificial Intelligence Netlists SynthesIS (GAINESIS). GAINESIS is based on the Wasserstein Conditional Generative Adversarial Network (WCGAN) algorithm and area–power analysis features from the GLN phase and synthesizes new normal and infected circuit samples for this phase. Based on our GAINESIS tool, we synthesized new data sets, different in size, and developed and compared seven ML classifiers. The results demonstrate that our new generated data sets significantly enhance the performance of ML classifiers compared with the initial data set of Trust-HUB.

Download Full-text

NeMoR: a New Method Based on Data-Driven for Neonatal Mortality Rate Forecasting

10.1101/2021.04.22.21255916 ◽

2021 ◽

Author(s):

Carlos Eduardo Beluzo ◽

Luciana Correia Alves ◽

Natália Martins Arruda ◽

Cátia Sepetauskas ◽

Everton Silva ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Neonatal Mortality ◽

Regression Models ◽

Mortality Rates ◽

Data Driven ◽

Health Policies ◽

Neonatal Mortality Rate ◽

Policy Makers ◽

Public Health Policies

ABSTRACTReduction in child mortality is one of the United Nations Sustainable Development Goals for 2030. In Brazil, despite recent reduction in child mortality in the last decades, the neonatal mortality is a persistent problem and it is associated with the quality of prenatal, childbirth care and social-environmental factors. In a proper health system, the effect of some of these factors could be minimized by the appropriate number of newborn intensive care units, number of health care units, number of neonatal incubators and even by the correct level of instruction of mothers, which can lead to a proper care along the prenatal period. With the intent of providing knowledge resources for planning public health policies focused on neonatal mortality reduction, we propose a new data-driven machine leaning method for Neonatal Mortality Rate forecasting called NeMoR, which predicts neonatal mortality rates for 4 months ahead, using NeoDeathForecast, a monthly base time series dataset composed by these factors and by neonatal mortality rates history (2006-2016), having 57,816 samples, for all 438 Brazilian administrative health regions. In order to build the model, Extra-Tree, XGBoost Regressor, Gradient Boosting Regressor and Lasso machine learning regression models were evaluated and a hyperparameters search was also performed as a fine tune step. The method has been validated using São Paulo city data, mainly because of data quality. On the better configuration the method predicted the neonatal mortality rates with a Mean Square Error lower than 0.18. Besides that, the forecast results may be useful as it provides a way for policy makers to anticipate trends on neonatal mortality rates curves, an important resource for planning public health policies.Graphical AbstractHighlightsProposition of a new data-driven approach for neonatal mortality rate forecast, which provides a way for policy-makers to anticipate trends on neonatal mortality rates curves, making a better planning of health policies focused on NMR reduction possible;a method for NMR forecasting with a MSE lower than 0.18;an extensive evaluation of different Machine Learning (ML) regression models, as well as hyperparameters search, which accounts for the last stage in NeMoR;a new time series database for NMR prediction problems;a new features projection space for NMR forecasting problems, which considerably reduces errors in NRM prediction.

Download Full-text

Self-Supervised Machine Learning Approach for Identifying Biochemical Influences on Protein-Ligand Binding Affinity

10.21203/rs.3.rs-1091733/v1 ◽

2021 ◽

Author(s):

Arjun Singh

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

3D Models ◽

Target Protein ◽

Supervised Machine Learning ◽

Significant Feature ◽

Computational Techniques ◽

Lasso Regression ◽

Generative Adversarial Network ◽

Adversarial Network

Abstract Drug discovery is incredibly time-consuming and expensive, averaging over 10 years and $985 million per drug. Calculating the binding affinity between a target protein and a ligand is critical for discovering viable drugs. Although supervised machine learning (ML) models can predict binding affinity accurately, they suffer from lack of interpretability and inaccurate feature selection caused by multicollinear data. This study used self-supervised ML to reveal underlying protein-ligand characteristics that strongly influence binding affinity. Protein-ligand 3D models were collected from the PDBBind database and vectorized into 2422 features per complex. LASSO Regression and hierarchical clustering were utilized to minimize multicollinearity between features. Correlation analyses and Autoencoder-based latent space representations were generated to identify features significantly influencing binding affinity. A Generative Adversarial Network was used to simulate ligands with certain counts of a significant feature, and thereby determine the effect of a feature on improving binding affinity with a given target protein. It was found that the CC and CCCN fragment counts in the ligand notably influence binding affinity. Re-pairing proteins with simulated ligands that had higher CC and CCCN fragment counts could increase binding affinity by 34.99-37.62% and 36.83%-36.94%, respectively. This discovery contributes to a more accurate representation of ligand chemistry that can increase the accuracy, explainability, and generalizability of ML models so that they can more reliably identify novel drug candidates. Directions for future work include integrating knowledge on ligand fragments into supervised ML models, examining the effect of CC and CCCN fragments on fragment-based drug design, and employing computational techniques to elucidate the chemical activity of these fragments.

Download Full-text

Machine Learning the Phenomenology of COVID-19 From Early Infection Dynamics

10.1101/2020.03.17.20037309 ◽

2020 ◽

Cited By ~ 1

Author(s):

Malik Magdon-Ismail

Keyword(s):

Public Health ◽

Machine Learning ◽

Early Infection ◽

Data Driven ◽

Infection Dynamics ◽

Serious Infection ◽

Specific Virus ◽

Different Populations ◽

Learning Analysis ◽

Over Time

AbstractWe present a robust data-driven machine learning analysis of the COVID-19 pandemic from its early infection dynamics, specifically infection counts over time. The goal is to extract actionable public health insights. These insights include the infectious force, the rate of a mild infection becoming serious, estimates for asymtomatic infections and predictions of new infections over time. We focus on USA data starting from the first confirmed infection on January 20 2020. Our methods reveal significant asymptomatic (hidden) infection, a lag of about 10 days, and we quantitatively confirm that the infectious force is strong with about a 0.14% transition from mild to serious infection. Our methods are efficient, robust and general, being agnostic to the specific virus and applicable to different populations or cohorts.

Download Full-text

Class Imbalanced Fault Diagnosis via Combining K-Means Clustering Algorithm with Generative Adversarial Networks

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2021.p0346 ◽

2021 ◽

Vol 25 (3) ◽

pp. 346-355

Author(s):

Huifang Li ◽

◽

Rui Fan ◽

Qisong Shi ◽

Zijian Du

Keyword(s):

Machine Learning ◽

Fault Diagnosis ◽

Diagnostic Accuracy ◽

Clustering Algorithm ◽

Generative Adversarial Networks ◽

Similar Distribution ◽

Generative Adversarial Network ◽

Minority Class ◽

Adversarial Network ◽

Original Dataset

Recent advancements in machine learning and communication technologies have enabled new approaches to automated fault diagnosis and detection in industrial systems. Given wide variation in occurrence frequencies of different classes of faults, the class distribution of real-world industrial fault data is usually imbalanced. However, most prior machine learning-based classification methods do not take this imbalance into consideration, and thus tend to be biased toward recognizing the majority classes and result in poor accuracy for minority ones. To solve such problems, we propose a k-means clustering generative adversarial network (KM-GAN)-based fault diagnosis approach able to reduce imbalance in fault data and improve diagnostic accuracy for minority classes. First, we design a new k-means clustering algorithm and GAN-based oversampling method to generate diverse minority-class samples obeying the similar distribution to the original minority data. The k-means clustering algorithm is adopted to divide minority-class samples into k clusters, while a GAN is applied to learn the data distribution of the resulting clusters and generate a given number of minority-class samples as a supplement to the original dataset. Then, we construct a deep neural network (DNN) and deep belief network (DBN)-based heterogeneous ensemble model as a fault classifier to improve generalization, in which DNN and DBN models are trained separately on the resulting dataset, and then the outputs from both are averaged as the final diagnostic result. A series of comparative experiments are conducted to verify the effectiveness of our proposed method, and the experimental results show that our method can improve diagnostic accuracy for minority-class samples.

Download Full-text

A hybrid quantum-classical conditional generative adversarial network algorithm for human-centered paradigm in cloud

10.21203/rs.3.rs-69957/v1 ◽

2020 ◽

Author(s):

Wenjie Liu ◽

Ying Zhang ◽

Zhiliang Deng ◽

Jiaojiao Zhao ◽

Lian Tong

Keyword(s):

Machine Learning ◽

Quantum Circuit ◽

Machine Learning Algorithms ◽

Generation Process ◽

Generative Adversarial Network ◽

Computing Platform ◽

Adversarial Network ◽

Quantum Machine Learning ◽

Huge Impact ◽

Conditional Information

Abstract As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in cloud, edge, fog has had a huge impact on the artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered to be one of the quantum machine learning algorithms with great application prospects, which also should be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random and the generated model does not conform to the human-centered concept, so it is not quite suitable for real scenarios. In order to solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed, which is a knowledge-driven human-computer interaction computing mode in cloud. The purpose of stabilizing the generation process and the interaction between human and computing process is achieved by inputting conditional information in the generator and discriminator. The generator uses the parameterized quantum circuit with an all-to-all connected topology, which facilitates the tuning of network parameters during the training process. The discriminator uses the classical neural network, which effectively avoids the ”input bottleneck” of quantum machine learning. Finally, the BAS training set is selected to conduct experiment on the quantum cloud computing platform. The result shows that the QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks.

Download Full-text

A Critical Review of the Social and Behavioral Contributions to the Overdose Epidemic

Annual Review of Public Health ◽

10.1146/annurev-publhealth-090419-102727 ◽

2020 ◽

Vol 42 (1) ◽

Author(s):

Magdalena Cerdá ◽

Noa Krawczyk ◽

Leah Hamilton ◽

Kara E. Rudolph ◽

Samuel R. Friedman ◽

...

Keyword(s):

Public Health ◽

Public Health Problem ◽

The United States ◽

Social Contexts ◽

Prescription Opioid ◽

Annual Review ◽

Publication Date ◽

Opioid Overdose ◽

The Social ◽

Critical Public Health

More than 750,000 people in the United States died from an overdose between 1999 and 2018; two-thirds of those deaths involved an opioid. In this review, we present trends in opioid overdose rates during this period and discuss how the proliferation of opioid prescribing to treat chronic pain, changes in the heroin and illegally manufactured opioid synthetics markets, and social factors, including deindustrialization and concentrated poverty, contributed to the rise of the overdose epidemic. We also examine how current policies implemented to address the overdose epidemic may have contributed to reducing prescription opioid overdoses but increased overdoses involving illegal opioids. Finally, we identify new directions for research to understand the causes and solutions to this critical public health problem, including research on heterogeneous policy effects across social groups, effective approaches to reduce overdoses of illegal opioids, and the role of social contexts in shaping policy implementation and impact. Expected final online publication date for the Annual Review of Public Health, Volume 42 is April 1, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Download Full-text