Representation learning of genomic sequence motifs with convolutional neural networks

2019 · Vol 15 (12) · pp. e1007560
Author(s): Peter K. Koo, Sean R. Eddy

Abstract: Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first-layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs, assembling partial features into whole features in deeper layers, tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation-learning principle, established on synthetic sequences, generalizes to in vivo sequences.
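The architectural principle described above can be made concrete with a short sketch. The PyTorch code below (filter counts, filter sizes, and pool sizes are illustrative assumptions, not the paper's exact configuration) contrasts a small first-layer max-pool, which leaves deeper layers room to assemble partial motifs, with an aggressive max-pool, which pushes whole-motif learning into the first-layer filters:

```python
# Minimal sketch contrasting two first-layer designs for one-hot DNA input
# of shape (batch, 4, length). Hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

def make_cnn(pool_size: int, num_classes: int = 12) -> nn.Sequential:
    """First-layer conv + max-pool; pool_size controls how much positional
    information deeper layers can use to assemble partial motifs."""
    return nn.Sequential(
        nn.Conv1d(4, 30, kernel_size=19, padding=9),  # first-layer motif filters
        nn.ReLU(),
        nn.MaxPool1d(pool_size),   # small pool -> distributed; large pool -> localist
        nn.Conv1d(30, 128, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.AdaptiveMaxPool1d(1),
        nn.Flatten(),
        nn.Linear(128, num_classes),
    )

# Small pooling preserves fine positional detail, so deeper layers can stitch
# partial motifs together; first-layer filters tend to learn partial motifs.
distributed_net = make_cnn(pool_size=2)

# Aggressive pooling destroys the positional detail needed for hierarchical
# assembly, pushing whole-motif representations into the first layer.
localist_net = make_cnn(pool_size=50)

x = torch.randn(8, 4, 200)       # stand-in for one-hot encoded sequences
print(distributed_net(x).shape)  # torch.Size([8, 12])
print(localist_net(x).shape)     # torch.Size([8, 12])
```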


2020 · Vol 10 (3) · pp. 955
Author(s): Taejun Kim, Han-joon Kim

Researchers frequently use visualizations such as scatter plots when trying to understand how random variables are related to each other, because a single image can represent numerous pieces of information. Dependency measures have been widely used to detect dependencies automatically, but they capture only a few properties of a relationship, such as the strength and direction of the dependency. Based on advances in the application of deep learning to vision, we believe that convolutional neural networks (CNNs) can learn to recognize dependencies by analyzing visualizations, as humans do. In this paper, we propose a method that uses CNNs to extract dependency representations from 2D histograms. We carried out three sets of experiments and found that CNNs can learn from such visual representations. First, using a synthetic dataset, we show that CNNs can perfectly classify eight types of dependency. Second, we show that CNNs can predict correlations from 2D histograms of real datasets, and we visualize the learned dependency representation space. Finally, we apply our method to feature generation and demonstrate that it outperforms the AutoLearn algorithm in average classification accuracy while generating half as many features.
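As a rough illustration of the pipeline this abstract describes, the sketch below renders a pair of variables as a 2D histogram and feeds it to a small CNN with eight outputs, one per dependency type. The bin count, network shape, and the rank-normalization step are all assumptions, not the authors' exact method:

```python
# Minimal sketch: variable pair -> 2D histogram image -> CNN classifier.
import numpy as np
import torch
import torch.nn as nn

def pair_to_histogram(x: np.ndarray, y: np.ndarray, bins: int = 32) -> torch.Tensor:
    """Rank-normalize both variables to [0, 1], then bin into a bins x bins image."""
    rx = np.argsort(np.argsort(x)) / (len(x) - 1)
    ry = np.argsort(np.argsort(y)) / (len(y) - 1)
    hist, _, _ = np.histogram2d(rx, ry, bins=bins, range=[[0, 1], [0, 1]])
    hist = hist / hist.max()                             # scale counts to [0, 1]
    return torch.from_numpy(hist).float().unsqueeze(0)   # (1, bins, bins)

# Small image classifier over 1-channel histograms; the 8 output classes mirror
# the eight dependency types in the abstract (exact classes are assumptions).
classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 8),
)

x = np.random.randn(1000)
y = x ** 2 + 0.1 * np.random.randn(1000)                 # a quadratic dependency
logits = classifier(pair_to_histogram(x, y).unsqueeze(0))  # batch of one
print(logits.shape)  # torch.Size([1, 8])
```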


2016 · Vol 127 · pp. 248-257
Author(s): John Arevalo, Fabio A. González, Raúl Ramos-Pollán, Jose L. Oliveira, Miguel Angel Guevara Lopez

2020 · Vol 30 (3) · pp. 145-160
Author(s): Wang Gao, Yuan Fang, Fan Zhang, Zhifeng Yang

Author(s): Peter K. Koo, Matt Ploenzke

Abstract: Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations play in model interpretability. We show that employing an exponential activation in first-layer filters consistently leads to interpretable and robust representations of motifs compared to other commonly used activations. Strikingly, we demonstrate that CNNs with better test performance do not necessarily yield more interpretable representations with attribution methods. We find that CNNs with exponential activations significantly improve the efficacy of recovering biologically meaningful representations with attribution methods. We demonstrate that these results generalize to real DNA sequences across several in vivo datasets. Together, this work demonstrates how a small modification to existing CNNs, i.e. using exponential activations in the first layer, can significantly improve the robustness and interpretability of learned representations, both directly in convolutional filters and indirectly with attribution methods.
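The modification this abstract highlights is small enough to show directly. In the minimal PyTorch sketch below, only the first-layer exponential activation reflects the paper's proposal; the surrounding architecture and hyperparameters are illustrative assumptions:

```python
# Minimal sketch: exponential activation on first-layer filters, ReLU elsewhere.
import torch
import torch.nn as nn

class ExpActivation(nn.Module):
    """Exponential first-layer activation; amplifies strong motif matches and
    suppresses weak partial matches relative to ReLU."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(x)

class MotifCNN(nn.Module):
    def __init__(self, num_classes: int = 1):
        super().__init__()
        self.first = nn.Sequential(
            nn.Conv1d(4, 32, kernel_size=19, padding=9),
            ExpActivation(),   # the swap: exp instead of ReLU in layer 1
            nn.MaxPool1d(25),
        )
        self.rest = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),         # deeper layers keep a standard activation
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.rest(self.first(x))

model = MotifCNN()
x = torch.randn(8, 4, 200)  # stand-in for one-hot encoded DNA
print(model(x).shape)       # torch.Size([8, 1])
```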

