Interpretable modeling of genotype-phenotype landscapes with state-of-the-art predictive power

2021 ◽  
Author(s):  
Peter D Tonner ◽  
Abe Pressman ◽  
David Ross

Large-scale measurements linking genetic background to biological function have driven a need for models that can incorporate these data for reliable predictions and insight into the underlying biochemical system. Recent modeling efforts, however, prioritize predictive accuracy at the expense of model interpretability. Here, we present LANTERN (https://github.com/usnistgov/lantern), a hierarchical Bayesian model that distills genotype-phenotype landscape (GPL) measurements into a low-dimensional feature space that represents the fundamental biological mechanisms of the system while also enabling straightforward, explainable predictions. Across a benchmark of large-scale datasets, LANTERN equals or outperforms all alternative approaches, including deep neural networks. LANTERN furthermore extracts useful insights into the landscape, including its inherent dimensionality, a latent space of additive mutational effects, and novel metrics of landscape structure. LANTERN facilitates straightforward discovery of fundamental mechanisms in GPLs, while also reliably extrapolating to unexplored regions of genotypic space.
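
As a reading aid (not from the paper), the sketch below illustrates the core modeling idea the abstract describes: each mutation contributes an additive effect vector in a low-dimensional latent space, and the phenotype is a smooth function of the summed latent position. The effect matrix and toy surface here are placeholders for the quantities LANTERN actually infers with its hierarchical Bayesian model.

```python
# Minimal sketch (not the authors' implementation) of LANTERN's core idea:
# each mutation gets an additive effect vector in a low-dimensional latent
# space, and phenotype is a smooth function of the latent position.
import numpy as np

rng = np.random.default_rng(0)
n_mutations, latent_dim = 100, 2

# Hypothetical learned quantities; LANTERN infers these (and a GP-like
# nonlinear surface) from data, rather than using the toys below.
W = rng.normal(scale=0.1, size=(n_mutations, latent_dim))  # per-mutation effects

def latent_position(genotype):
    """Sum the effect vectors of the mutations present in a genotype."""
    return W[list(genotype)].sum(axis=0)

def phenotype(z):
    """Toy smooth landscape standing in for the learned surface."""
    return np.tanh(z[0]) - 0.5 * z[1] ** 2

genotype = {3, 17, 42}            # indices of the mutations carried
z = latent_position(genotype)
print(z, phenotype(z))
```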

Ruffin Darden ◽  
1998 ◽  
Vol 1 ◽  
pp. 109-122 ◽  
Author(s):  
Larue Tone Hosmer

Investigations of large scale industrial accidents generally take one of two alternative approaches to identifying the cause or causes of those destructive events. The first is legal analysis, which focuses on the mechanical failure or human error that immediately preceded the accident. The second is socio-technical reasoning, which centers on the complexities of the interlocking technological and organizational systems that brought about the accident. Both are retrospective, and provide little insight into the means of avoiding industrial accidents in the future. This article looks at six levels of managerial responsibility within a firm, and suggests specific changes at all levels that should logically help in the prevention or mitigation of these high impact/low probability events. The most basic need, however, is for imagination, empathy, and courage at the most senior level of the firm.


Semantic Web ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 735-750 ◽  
Author(s):  
Carlos Badenes-Olmedo ◽  
José Luis Redondo-García ◽  
Oscar Corcho

Searching for similar documents and exploring the major themes covered across groups of documents are common activities when browsing collections of scientific papers. This manual, knowledge-intensive task can become less tedious, and even lead to unexpected relevant findings, if unsupervised algorithms are applied to help researchers. Most text mining algorithms represent documents in a common feature space that abstracts them away from the specific sequence of words used in them. Probabilistic Topic Models reduce that feature space by annotating documents with thematic information. Over this low-dimensional latent space, some locality-sensitive hashing algorithms have been proposed to perform document similarity search. However, the thematic information gets hidden behind hash codes, preventing thematic exploration and limiting the explanatory capability of topics to justify content-based similarities. This paper presents a novel hashing algorithm, based on approximate nearest-neighbor techniques, that uses hierarchical sets of topics as hash codes. It not only performs efficient similarity searches, but also allows those queries to be extended with thematic restrictions that explain the similarity score in terms of the most relevant topics. Extensive evaluations on both scientific and industrial text datasets validate the proposed algorithm in terms of accuracy and efficiency.
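
A minimal sketch of the hashing idea described above, assuming documents are already represented as topic distributions (e.g., from a topic model). The `levels` scheme and both function names are illustrative stand-ins for the paper's hierarchical topic sets, not its actual algorithm:

```python
# Hash a document by nested sets of its strongest topics, so the hash code
# itself remains thematically interpretable (unlike opaque binary codes).
def topic_hash(topic_dist, levels=(1, 3, 5)):
    """Map a topic distribution to nested sets of its top-k topics."""
    ranked = sorted(range(len(topic_dist)), key=lambda t: -topic_dist[t])
    return tuple(frozenset(ranked[:k]) for k in levels)

def candidate_match(h1, h2):
    """Two documents are candidate neighbors if any hash level overlaps."""
    return any(a & b for a, b in zip(h1, h2))

d1 = [0.5, 0.3, 0.1, 0.05, 0.05]
d2 = [0.45, 0.05, 0.35, 0.1, 0.05]
print(candidate_match(topic_hash(d1), topic_hash(d2)))  # True: share topic 0
```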


Author(s):  
Andrew Brock ◽  
Theodore Lim ◽  
J. M. Ritchie ◽  
Nick Weston

Large scale scene generation is a computationally intensive operation, and added complexities arise when dynamic content generation is required. We propose a system capable of generating virtual content from non-expert input. The proposed system uses a 3-dimensional variational autoencoder to interactively generate new virtual objects by interpolating between extant objects in a learned low-dimensional space, as well as by randomly sampling in that space. We present an interface that allows a user to intuitively explore the latent manifold, taking advantage of the network’s ability to perform algebra in the latent space to help infer context and generalize to previously unseen inputs.
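
A minimal sketch of the latent-space operations such an interface would expose, assuming a trained 3D variational autoencoder with `encode`/`decode` functions (hypothetical names, not the authors' API):

```python
import numpy as np

def interpolate(encode, decode, obj_a, obj_b, steps=8):
    """Generate new objects along the line between two latent codes."""
    za, zb = encode(obj_a), encode(obj_b)
    return [decode((1 - t) * za + t * zb) for t in np.linspace(0.0, 1.0, steps)]

def analogy(encode, decode, a, b, c):
    """Latent-space 'algebra' of the kind the abstract mentions: b - a + c."""
    return decode(encode(b) - encode(a) + encode(c))
```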


2021 ◽  
pp. 1-12
Author(s):  
Haoyue Bai ◽  
Haofeng Zhang ◽  
Qiong Wang

Zero-Shot Learning (ZSL) aims to use information from seen classes to recognize unseen classes, which is achieved by transferring knowledge of the seen classes from semantic embeddings. Since the domains of the seen and unseen classes do not overlap, most ZSL algorithms suffer from the domain shift problem. In this paper, we propose a Dual Discriminative Auto-encoder Network (DDANet), in which visual features and semantic attributes are self-encoded using a high-dimensional latent space instead of the feature space or the low-dimensional semantic space. In the embedded latent space, the features are projected to both preserve their original semantic meanings and acquire discriminative characteristics, which is realized by applying a dual semantic auto-encoder and a discriminative feature embedding strategy. Moreover, cross-modal reconstruction is applied to obtain interactive information. Extensive experiments are conducted on four popular datasets, and the results demonstrate the superiority of this method.
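
The cross-modal reconstruction idea can be made concrete with a short PyTorch sketch. Everything below (dimensions, plain linear encoders/decoders, mean-squared-error losses) is illustrative rather than the paper's architecture; the point is that each latent code must reconstruct both its own modality and the other one:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vis_dim, attr_dim, latent_dim = 2048, 85, 1024  # latent space > semantic space

enc_v = nn.Linear(vis_dim, latent_dim)   # visual encoder
enc_a = nn.Linear(attr_dim, latent_dim)  # attribute encoder
dec_v = nn.Linear(latent_dim, vis_dim)   # visual decoder
dec_a = nn.Linear(latent_dim, attr_dim)  # attribute decoder

def losses(x_v, x_a):
    z_v, z_a = enc_v(x_v), enc_a(x_a)
    # self-reconstruction: each latent code explains its own modality
    recon = F.mse_loss(dec_v(z_v), x_v) + F.mse_loss(dec_a(z_a), x_a)
    # cross-modal reconstruction: each latent code explains the other modality
    cross = F.mse_loss(dec_a(z_v), x_a) + F.mse_loss(dec_v(z_a), x_v)
    return recon + cross
```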


2020 ◽  
Vol 34 (07) ◽  
pp. 11515-11522
Author(s):  
Kaiyi Lin ◽  
Xing Xu ◽  
Lianli Gao ◽  
Zheng Wang ◽  
Heng Tao Shen

Zero-Shot Cross-Modal Retrieval (ZS-CMR) is an emerging research hotspot that aims to retrieve data of new classes across different modalities. It is challenging not only because of the heterogeneous distributions across different modalities, but also because of the inconsistent semantics across seen and unseen classes. A handful of recently proposed methods typically borrow the idea from zero-shot learning, i.e., exploiting word embeddings of class labels (i.e., class-embeddings) as a common semantic space and using a generative adversarial network (GAN) to capture the underlying multimodal data structures, as well as to strengthen the relations between input data and the semantic space in order to generalize across seen and unseen classes. In this paper, we propose a novel method termed Learning Cross-Aligned Latent Embeddings (LCALE) as an alternative to these GAN-based methods for ZS-CMR. Unlike methods that use the class-embeddings as the semantic space, our method seeks a shared low-dimensional latent space of input multimodal features and class-embeddings via modality-specific variational autoencoders. Notably, we align the distributions learned from multimodal input features and from class-embeddings to construct latent embeddings that contain the essential cross-modal correlations associated with unseen classes. Effective cross-reconstruction and cross-alignment criteria are further developed to preserve class-discriminative information in the latent space, which benefits retrieval efficiency and enables knowledge transfer to unseen classes. We evaluate our model using four benchmark datasets on image-text retrieval tasks and one large-scale dataset on image-sketch retrieval tasks. The experimental results show that our method establishes new state-of-the-art performance on both tasks across all datasets.
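
To make the alignment step concrete, here is a hedged PyTorch sketch assuming each modality-specific VAE outputs a diagonal Gaussian posterior (mu, sigma). The closed-form 2-Wasserstein distance used below is one common choice for aligning such posteriors; the paper's exact criterion may differ:

```python
import torch

def gaussian_w2(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two diagonal Gaussians."""
    return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

def alignment_loss(posteriors):
    """Align image, text, and class-embedding posteriors pairwise."""
    loss = 0.0
    for i in range(len(posteriors)):
        for j in range(i + 1, len(posteriors)):
            (m1, s1), (m2, s2) = posteriors[i], posteriors[j]
            loss = loss + gaussian_w2(m1, s1, m2, s2).mean()
    return loss
```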


2017 ◽  
Vol 11 (02) ◽  
pp. 171-192 ◽  
Author(s):  
Kai Li ◽  
Guo-Jun Qi ◽  
Jun Ye ◽  
Tuoerhongjiang Yusuph ◽  
Kien A. Hua

Learning to hash is receiving increasing research attention due to its effectiveness in addressing the large-scale similarity search problem. Most existing hashing algorithms focus on learning hash functions in the form of numeric quantization of some projected feature space. In this work, we propose a novel hash learning method that encodes features’ relative ordering, instead of quantizing their numeric values, in a set of low-dimensional ranking subspaces. We formulate the ranking-based hash learning problem as the optimization of a continuous probabilistic error function using a softmax approximation and present an efficient learning algorithm to solve it. As a generalization of Winner-Take-All (WTA) hashing, the proposed algorithm naturally enjoys the numeric stability benefits of rank correlation measures while being optimized to achieve high precision with very compact codes. Additionally, the proposed method can easily be extended to nonlinear kernel spaces to discover ranking structures that cannot be revealed in linear subspaces. We demonstrate through extensive experiments that the proposed method achieves competitive performance compared to a number of state-of-the-art hashing methods.
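
For context, here is a minimal NumPy sketch of classic Winner-Take-All hashing, the scheme this work generalizes (the learned ranking subspaces of the paper replace the random permutations used here). Each hash symbol records which position wins an argmax contest over a random subset of dimensions, so codes depend only on the relative ordering of feature values, not on their magnitudes:

```python
import numpy as np

def wta_hash(x, perms, K=4):
    """Encode a vector as the winners of len(perms) random K-way contests."""
    return np.array([int(np.argmax(x[p[:K]])) for p in perms])

rng = np.random.default_rng(0)
dim, n_codes = 32, 16
perms = [rng.permutation(dim) for _ in range(n_codes)]

x = rng.normal(size=dim)
y = x + 0.05 * rng.normal(size=dim)  # small numeric perturbation
# rank-based codes are largely stable under small perturbations
print((wta_hash(x, perms) == wta_hash(y, perms)).mean())
```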


2020 ◽  
Vol 26 (33) ◽  
pp. 4195-4205
Author(s):  
Xiaoyu Ding ◽  
Chen Cui ◽  
Dingyan Wang ◽  
Jihui Zhao ◽  
Mingyue Zheng ◽  
...  

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are in strong demand, to prioritize more active compound derivatives and reduce the trial-and-error process. Methods: Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we also defined the applicability domain of the models using a distance-based threshold. Results: Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R2 = 0.842, MAE = 0.397, RMSE = 0.563). Conclusion: An accurate bioactivity prediction model was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
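
A minimal sketch of the consensus step under stated assumptions: the individual model objects, their `predict` interface, and the `in_domain` check (standing in for the paper's distance-based applicability domain) are all hypothetical names for illustration:

```python
import numpy as np

def consensus_predict(models, compound):
    """Average the bioactivity predictions of all applicable individual models."""
    preds = [m.predict(compound) for m in models if m.in_domain(compound)]
    return np.mean(preds) if preds else None  # outside every domain: no call
```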

