Latent Space Data Assimilation by using Deep Learning

Abstract We present a deep learning architecture for efficient reduced-order implementation of ensemble data assimilation. Specifically, deep learning is used to improve two important aspects of data assimilation workflows: (i) low-rank representation of complex reservoir property distributions for geologically consistent feature-based model updating, and (ii) efficient prediction of the statistical information that are required for model updating. The proposed method uses deep convolutional autoencoders to nonlinearly map the original complex and high-dimensional parameters onto a low-dimensional parameter latent space that compactly represents the original parameters. In addition, a low-dimensional data latent space is constructed to predict the observable response of each model parameter realization, which can be used to compute the statistical information needed for the data assimilation step. The two mappings are developed as a joint deep learning architecture with two autoencoders that are connected and trained together. The training uses an ensemble of model parameters and their corresponding production response predictions as needed in implementing the standard ensemble-based data assimilation frameworks. Simultaneous training of the two mappings leads to a joint data-parameter manifold that captures the most salient information in the two spaces for a more effective data assimilation, where only relevant data and parameter features are included. Moreover, the parameter-to-data mapping provides a fast forecast model that can be used to increase the ensemble size for a more accurate data assimilation, without a major computational overhead. We implement the developed approach to a series of numerical experiments, including a 3D example based on the Volve field in the North Sea. For data assimilation methods that involve iterative schemes, such as ensemble smoothers with multiple data assimilation or iterative forms of ensemble Kalman filter, the proposed approach offers a computationally competitive alternative. Our results show that a fully low-dimensional implementation of ensemble data assimilation using deep learning architectures offers several advantages compared to standard algorithms, including joint data-parameter reduction that respects the salient features in each space, geologically consistent feature-based updates, increased ensemble sizes to improve the accuracy and computational efficiency of the calculated statistics for the update step.

Download Full-text

Navigating the amino acid sequence space between functional proteins using a deep learning framework

PeerJ Computer Science ◽

10.7717/peerj-cs.684 ◽

2021 ◽

Vol 7 ◽

pp. e684

Author(s):

Tristan Bitard-Feildel

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Amino Acid Sequence ◽

Sequence Space ◽

Protein Sequence ◽

3D Structure ◽

Protein Sequences ◽

Generative Models ◽

Latent Space ◽

Space Data

Motivation Shedding light on the relationships between protein sequences and functions is a challenging task with many implications in protein evolution, diseases understanding, and protein design. The protein sequence space mapping to specific functions is however hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their abilities to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted area of molecular evolution. Results This study presents an Adversarial Auto-Encoder (AAE) approached, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions the sulfatase, the HUP and the TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes for the first time two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences sharing sequential properties of original sequences linked to known functional properties issued from different families and functions. Generated sequences by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionary uncharted area of the biological sequence space. Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point out to the ability of the latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces.

Download Full-text

De Novo Drug Design Using Artificial Intelligence Applied on SARS-CoV-2 Viral Proteins ASYNT-GAN

BioChem ◽

10.3390/biochem1010004 ◽

2021 ◽

Vol 1 (1) ◽

pp. 36-48

Author(s):

Ivan Jacobs ◽

Manolis Maragoudakis

Keyword(s):

Deep Learning ◽

Small Molecules ◽

Natural Product ◽

De Novo ◽

Viral Proteins ◽

Computer Assisted ◽

De Novo Drug Design ◽

Learning Techniques ◽

Latent Space ◽

Synthetic Molecule

Computer-assisted de novo design of natural product mimetics offers a viable strategy to reduce synthetic efforts and obtain natural-product-inspired bioactive small molecules, but suffers from several limitations. Deep learning techniques can help address these shortcomings. We propose the generation of synthetic molecule structures that optimizes the binding affinity to a target. To achieve this, we leverage important advancements in deep learning. Our approach generalizes to systems beyond the source system and achieves the generation of complete structures that optimize the binding to a target unseen during training. Translating the input sub-systems into the latent space permits the ability to search for similar structures, and the sampling from the latent space for generation.

Download Full-text

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibit relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.

Download Full-text

Latent-space inversion (LSI): a deep learning framework for inverse mapping of subsurface flow data

Computational Geosciences ◽

10.1007/s10596-021-10104-8 ◽

2021 ◽

Author(s):

Syamil Mohd Razak ◽

Anyue Jiang ◽

Behnam Jafarpour

Keyword(s):

Deep Learning ◽

Subsurface Flow ◽

Flow Data ◽

Inverse Mapping ◽

Learning Framework ◽

Latent Space ◽

Space Inversion

Download Full-text

Latent-Space Data Augmentation for Visually-Grounded Language Understanding

Advances in Intelligent Systems and Computing - Advances in Artificial Intelligence ◽

10.1007/978-3-030-39878-1_17 ◽

2020 ◽

pp. 179-187

Author(s):

Aly Magassouba ◽

Komei Sugiura ◽

Hisashi Kawai

Keyword(s):

Data Augmentation ◽

Language Understanding ◽

Latent Space ◽

Space Data

Download Full-text

Incorporating structural knowledge into unsupervised deep learning for two-photon imaging data

10.1101/2021.05.18.443587 ◽

2021 ◽

Author(s):

Florian Eichin ◽

Maren Hackenberg ◽

Caroline Broichhagen ◽

Antje Kilias ◽

Jan Schmoranzer ◽

...

Keyword(s):

Deep Learning ◽

Live Imaging ◽

Temporal Changes ◽

Generative Models ◽

Structural Knowledge ◽

Imaging Data ◽

Two Photon ◽

Latent Space ◽

Photon Imaging ◽

Two Photon Imaging

Live imaging techniques, such as two-photon imaging, promise novel insights into cellular activity patterns at a high spatial and temporal resolution. While current deep learning approaches typically focus on specific supervised tasks in the analysis of such data, e.g., learning a segmentation mask as a basis for subsequent signal extraction steps, we investigate how unsupervised generative deep learning can be adapted to obtain interpretable models directly at the level of the video frames. Specifically, we consider variational autoencoders for models that infer a compressed representation of the data in a low-dimensional latent space, allowing for insight into what has been learned. Based on this approach, we illustrate how structural knowledge can be incorporated into the model architecture to improve model fitting and interpretability. Besides standard convolutional neural network components, we propose an architecture for separately encoding the foreground and background of live imaging data. We exemplify the proposed approach with two-photon imaging data from hippocampal CA1 neurons in mice, where we can disentangle the neural activity of interest from the neuropil background signal. Subsequently, we illustrate how to impose smoothness constraints onto the latent space for leveraging knowledge about gradual temporal changes. As a starting point for adaptation to similar live imaging applications, we provide a Jupyter notebook with code for exploration. Taken together, our results illustrate how architecture choices for deep generative models, such as for spatial structure, foreground vs. background, and gradual temporal changes, facilitate a modeling approach that combines the flexibility of deep learning with the benefits of incorporating domain knowledge. Such a strategy is seen to enable interpretable, purely image-based models of activity signals from live imaging, such as for two-photon data.

Download Full-text

A deep learning approach to capture the essence of Candida albicans morphologies

10.1101/2021.06.10.445299 ◽

2021 ◽

Author(s):

Van Bettauer ◽

Anna CBP Costa ◽

Raha Parvizi Omran ◽

Samira Massahi ◽

Eftyhios Kirbizakis ◽

...

Keyword(s):

Deep Learning ◽

Learning Strategy ◽

Developmental Trajectories ◽

Generative Adversarial Networks ◽

Learning Approach ◽

Adversarial Networks ◽

Latent Space ◽

Contrast Microscopy ◽

Opportunistic Human Pathogen ◽

Community Meeting

We present deep learning-based approaches for exploring the complex array of morphologies exhibited by the opportunistic human pathogen C. albicans. Our system entitled Candescence automatically detects C. albicans cells from Differential Image Contrast microscopy, and labels each detected cell with one of nine vegetative, mating-competent or filamentous morphologies. The software is based upon a fully convolutional one-stage object detector and exploits a novel cumulative curriculum-based learning strategy that stratifies our images by difficulty from simple vegetative forms to more complex filamentous architectures. Candescence achieves very good performance on this difficult learning set which has substantial intermixing between the predicted classes. To capture the essence of each C. albicans morphology, we develop models using generative adversarial networks and identify subcomponents of the latent space which control technical variables, developmental trajectories or morphological switches. We envision Candescence as a community meeting point for quantitative explorations of C. albicans morphology.

Download Full-text

Data Assimilation in the Latent Space of a Convolutional Autoencoder

Computational Science – ICCS 2021 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-77977-1_30 ◽

2021 ◽

pp. 373-386

Author(s):

Maddalena Amendola ◽

Rossella Arcucci ◽

Laetitia Mottet ◽

César Quilodrán Casas ◽

Shiwei Fan ◽

...

Keyword(s):

Data Assimilation ◽

Latent Space ◽

Convolutional Autoencoder

Download Full-text

Extracting a biologically latent space of lung cancer epigenetics with variational autoencoders

BMC Bioinformatics ◽

10.1186/s12859-019-3130-9 ◽

2019 ◽

Vol 20 (S18) ◽

Cited By ~ 2

Author(s):

Zhenxing Wang ◽

Yadong Wang

Keyword(s):

Lung Cancer ◽

Dna Methylation ◽

Deep Learning ◽

Malignant Tumors ◽

Methylation Data ◽

Cancer Epigenetics ◽

Latent Space ◽

Unsupervised Deep Learning ◽

Latent Representations ◽

Model Training

Abstract Background Lung cancer is one of the most malignant tumors, causing over 1,000,000 deaths each year worldwide. Deep learning has brought success in many domains in recent years. DNA methylation, an epigenetic factor, is used for model training in many studies. There is an opportunity for deep learning methods to analyze the lung cancer epigenetic data to determine their subtypes for appropriate treatment. Results Here, we employ variational autoencoders (VAEs), an unsupervised deep learning framework, on 450K DNA methylation data of TCGA-LUAD and TCGA-LUSC to learn latent representations of the DNA methylation landscape. We extract a biologically relevant latent space of LUAD and LUSC samples. It is showed that the bivariate classifiers on the further compressed latent features could classify the subtypes accurately. Through clustering of methylation-based latent space features, we demonstrate that the VAEs can capture differential methylation patterns about subtypes of lung cancer. Conclusions VAEs can distinguish the original subtypes from manually mixed methylation data frame with the encoded features of latent space. Further applications about VAEs should focus on fine-grained subtypes identification for precision medicine.

Download Full-text