scholarly journals MB-GAN: Microbiome Simulation via Generative Adversarial Network

GigaScience ◽  
2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Ruichen Rong ◽  
Shuang Jiang ◽  
Lin Xu ◽  
Guanghua Xiao ◽  
Yang Xie ◽  
...  

Abstract Background Trillions of microbes inhabit the human body and have a profound effect on human health. The recent development of metagenome-wide association studies and other quantitative analysis methods accelerate the discovery of the associations between human microbiome and diseases. To assess the strengths and limitations of these analytical tools, simulating realistic microbiome datasets is critically important. However, simulating the real microbiome data is challenging because it is difficult to model their correlation structure using explicit statistical models. Results To address the challenge of simulating realistic microbiome data, we designed a novel simulation framework termed MB-GAN, by using a generative adversarial network (GAN) and utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from given microbial abundances and compute simulated abundances that are indistinguishable from them. In practice, MB-GAN showed the following advantages. First, MB-GAN avoids explicit statistical modeling assumptions, and it only requires real datasets as inputs. Second, unlike the traditional GANs, MB-GAN is easily applicable and can converge efficiently. Conclusions By applying MB-GAN to a case-control gut microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages are suitable for further microbiome methodology development where high-fidelity microbiome data are needed.

2019 ◽  
Author(s):  
Ruichen Rong ◽  
Shuang Jiang ◽  
Lin Xu ◽  
Guanghua Xiao ◽  
Yang Xie ◽  
...  

AbstractSimulation is a critical component of experimental design and evaluation of analysis methods in microbiome association studies. However, statistically modeling the microbiome data is challenging since that the complex structure in the real data is difficult to be fully represented by statistical models. To address this challenge, we designed a novel simulation framework for microbiome data using a generative adversarial network (GAN), called MB-GAN, by utilizing methodology advancements from the deep learning community. MB-GAN can automatically learn from a given dataset and compute simulated datasets that are indistinguishable from it. When MB-GAN was applied to a case-control microbiome study of 396 samples, we demonstrated that the simulated data and the original data had similar first-order and second-order properties, including sparsity, diversities, and taxa-taxa correlations. These advantages are suitable for further microbiome methodology development where high fidelity microbiome data are needed.


2017 ◽  
Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches with a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.


2021 ◽  
Vol 11 (5) ◽  
pp. 2166
Author(s):  
Van Bui ◽  
Tung Lam Pham ◽  
Huy Nguyen ◽  
Yeong Min Jang

In the last decade, predictive maintenance has attracted a lot of attention in industrial factories because of its wide use of the Internet of Things and artificial intelligence algorithms for data management. However, in the early phases where the abnormal and faulty machines rarely appeared in factories, there were limited sets of machine fault samples. With limited fault samples, it is difficult to perform a training process for fault classification due to the imbalance of input data. Therefore, data augmentation was required to increase the accuracy of the learning model. However, there were limited methods to generate and evaluate the data applied for data analysis. In this paper, we introduce a method of using the generative adversarial network as the fault signal augmentation method to enrich the dataset. The enhanced data set could increase the accuracy of the machine fault detection model in the training process. We also performed fault detection using a variety of preprocessing approaches and classified the models to evaluate the similarities between the generated data and authentic data. The generated fault data has high similarity with the original data and it significantly improves the accuracy of the model. The accuracy of fault machine detection reaches 99.41% with 20% original fault machine data set and 93.1% with 0% original fault machine data set (only use generate data only). Based on this, we concluded that the generated data could be used to mix with original data and improve the model performance.


Biostatistics ◽  
2019 ◽  
Author(s):  
Shuang Jiang ◽  
Guanghua Xiao ◽  
Andrew Y Koh ◽  
Jiwoong Kim ◽  
Qiwei Li ◽  
...  

Summary Microbiome omics approaches can reveal intriguing relationships between the human microbiome and certain disease states. Along with identification of specific bacteria taxa associated with diseases, recent scientific advancements provide mounting evidence that metabolism, genetics, and environmental factors can all modulate these microbial effects. However, the current methods for integrating microbiome data and other covariates are severely lacking. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Our model demonstrates good performance using simulated data. Furthermore, we successfully integrated microbiome taxonomies and metabolomics in two real microbiome datasets to provide biologically interpretable findings. In all, we proposed a novel integrative Bayesian regression model that features bacterial differential abundance analysis and microbiome-covariate effects quantifications, which makes it suitable for general microbiome studies.


2020 ◽  
Vol 6 (1) ◽  
Author(s):  
Huang Lin ◽  
Shyamal Das Peddada

AbstractIncreasingly, researchers are discovering associations between microbiome and a wide range of human diseases such as obesity, inflammatory bowel diseases, HIV, and so on. The first step towards microbiome wide association studies is the characterization of the composition of human microbiome under different conditions. Determination of differentially abundant microbes between two or more environments, known as differential abundance (DA) analysis, is a challenging and an important problem that has received considerable interest during the past decade. It is well documented in the literature that the observed microbiome data (OTU/SV table) are relative abundances with an excess of zeros. Since relative abundances sum to a constant, these data are necessarily compositional. In this article we review some recent methods for DA analysis and describe their strengths and weaknesses.


2020 ◽  
Author(s):  
Mi Jiaqi ◽  
Hao Xia ◽  
Yang Si ◽  
Gao Wanlin ◽  
Li Minzan ◽  
...  

Abstract Background: The artificial identification of rare plants is always a challenging problem in plant taxonomy. Although the convolutional neural network (CNN) in the deep learning method can better realize the automatic classification of plant samples through training, the model accuracy is difficult to reach the human eye discrimination due to the quantitative limit of training samples. Thus, effective data enhancement is vital to improve the generalization ability and robustness of deep learning models, especially for plant small-scale data classification task. Different from traditional methods, the Generative adversarial network (GAN) mimics original data distribution and produces new samples with similar features which can help classifiers equip with extraordinary generalization ability. It has not been studied that data enhancement for plant samples’ characteristics with GAN since sliced bread. Result: In this study, we present a novel GAN model named as Residual Wasserstein GAN (Res-WGAN) for data enhancement. To further adapt to plant small-scale datasets, residual blocks were introduced into the classic WGAN-GP as the basic network unit. These blocks enrich the presentation skills and sustained parameters unchanged simultaneously. Moreover, we enforce the idea from SRGAN to take content loss into a final function, which guarantes the similarity between generated samples and original samples in high-dimensional features. Benefiting from these improvements, the proposed Res-WGAN expanded original datasets efficiently. We test it on the ResNet and the experimental results show that new datasets combined transfer learning significantly promoted the accuracy of classification, especially at testing data. To illustrate the generalization of the model, more particular small datasets are applied for expansion and classification in this paper. Conclusions: Our works report competitive accuracy results than other data enhancement methods, and user study confirms it’s an ideal alternative strategy for small-scale plant datasets enhancement. Developing robust and effective small-scale plants classification method to replace expert testimony, is highly relevant for agricultural automation development.


Author(s):  
Zhanpeng Wang ◽  
Jiaping Wang ◽  
Michael Kourakos ◽  
Nhung Hoang ◽  
Hyong Hark Lee ◽  
...  

AbstractPopulation genetics relies heavily on simulated data for validation, inference, and intuition. In particular, since real data is always limited, simulated data is crucial for training machine learning methods. Simulation software can accurately model evolutionary processes, but requires many hand-selected input parameters. As a result, simulated data often fails to mirror the properties of real genetic data, which limits the scope of methods that rely on it. In this work, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project, and show that we can accurately recapitulate the features of real data.


2020 ◽  
Author(s):  
Shuang Jiang ◽  
Guanghua Xiao ◽  
Andrew Young Koh ◽  
Bo Yao ◽  
Qiwei Li ◽  
...  

AbstractThe human microbiome is a collection of microorganisms. They form complex communities and collectively affect host health. Recently, the advances in next-generation sequencing technology enable the high-throughput profiling of the human microbiome. This calls for a statistical model to construct microbial networks from the microbiome sequencing count data. As microbiome count data are high-dimensional and suffer from uneven sampling depth, over-dispersion, and zero-inflation, these characteristics can bias the network estimation and require specialized analytical tools. Here we propose a general framework, HARMONIES, a Hybrid Approach foR MicrobiOme Network Inferences via Exploiting Sparsity, to infer a sparse microbiome network. HARMONIES first utilizes a zero-inflated negative binomial (ZINB) distribution to model the skewness and excess zeros in the microbiome data, as well as incorporates a stochastic process prior for sample-wise normalization. This approach infers a sparse and stable network by imposing non-trivial regularizations based on the Gaussian graphical model. In comprehensive simulation studies, HARMONIES outperformed four other commonly used methods. When using published microbiome data from a colorectal cancer study, it discovered a novel community with disease-enriched bacteria. In summary, HARMONIES is a novel and useful statistical framework for microbiome network inference, and it is available at https://github.com/shuangj00/HARMONIES.


Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches with a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.


Author(s):  
Benjamin Sanchez-Lengeling ◽  
Carlos Outeiral ◽  
Gabriel L. Guimaraes ◽  
Alan Aspuru-Guzik

Molecular discovery seeks to generate chemical species tailored to very specific needs. In this paper, we present ORGANIC, a framework based on Objective-Reinforced Generative Adversarial Networks (ORGAN), capable of producing a distribution over molecular space that matches with a certain set of desirable metrics. This methodology combines two successful techniques from the machine learning community: a Generative Adversarial Network (GAN), to create non-repetitive sensible molecular species, and Reinforcement Learning (RL), to bias this generative distribution towards certain attributes. We explore several applications, from optimization of random physicochemical properties to candidates for drug discovery and organic photovoltaic material design.


Sign in / Sign up

Export Citation Format

Share Document