Critiquing Protein Family Classification Models Using Sufficient Input Subsets

2019 ◽  
Author(s):  
Brandon Carter ◽  
Maxwell L. Bileschi ◽  
Jamie Smith ◽  
Theo Sanderson ◽  
Drew Bryant ◽  
...  

In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often lack good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than on peculiarities of the dataset. In response, we propose a set of methods for critiquing deep learning models and demonstrate their application to protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the sufficient input subsets (SIS) technique, which we use to identify subsets of features in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools reveal that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably with the choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach generalizes to a broad set of scientific contexts in which model interpretability is essential.
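The idea of a sufficient input subset can be illustrated with a minimal sketch. It follows a simplified backward-selection reading of the base SIS technique, not the extensions the abstract proposes. The scoring function `toy_confidence`, the motif in `KEY_RESIDUES`, the mask character, and the example sequence are all hypothetical stand-ins for a trained classifier and real data.

```python
MASK = "X"  # assumed placeholder for a masked residue

# Hypothetical motif: positions the toy "model" relies on.
KEY_RESIDUES = {2: "C", 5: "H", 7: "C"}


def toy_confidence(seq):
    """Stand-in for a classifier's confidence: fraction of motif residues present."""
    hits = sum(1 for pos, aa in KEY_RESIDUES.items()
               if pos < len(seq) and seq[pos] == aa)
    return hits / len(KEY_RESIDUES)


def mask_positions(seq, masked):
    """Return seq with every position in `masked` replaced by MASK."""
    return "".join(MASK if i in masked else aa for i, aa in enumerate(seq))


def back_select(seq, f):
    """Repeatedly mask the position whose removal hurts confidence least."""
    masked, order = set(), []
    remaining = list(range(len(seq)))
    while remaining:
        best = max(remaining,
                   key=lambda i: f(mask_positions(seq, masked | {i})))
        masked.add(best)
        order.append(best)
        remaining.remove(best)
    return order


def sufficient_input_subset(seq, f, threshold):
    """Add positions back, most important first, until f reaches threshold."""
    if f(seq) < threshold:
        return None  # the full input is not confidently classified
    order = back_select(seq, f)
    everything = set(range(len(seq)))
    kept = set()
    for pos in reversed(order):
        kept.add(pos)
        if f(mask_positions(seq, everything - kept)) >= threshold:
            return sorted(kept)
    return None


sis = sufficient_input_subset("MKCLAHVCGG", toy_confidence, 0.9)
print(sis)  # [2, 5, 7]
```

In this toy setting the subset recovers exactly the motif positions. With a real model, comparing such subsets across architectures and initializations is the kind of analysis the abstract describes.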

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Jiuwen Cao ◽  
Lianglin Xiong

Accurately classifying a protein sequence against a large database of biological protein sequences plays an important role in developing competitive pharmacological products. Conventional methods, which compare the unseen sequence with all identified protein sequences and return the category index of the protein with the highest similarity score, are usually time-consuming. An efficient protein sequence classification system is therefore needed. In this paper, we study the performance of protein sequence classification using single-hidden-layer feedforward networks (SLFNs). The recent, efficient extreme learning machine (ELM) and its variants are used as the training algorithms. The optimally pruned ELM (OP-ELM) is employed for protein sequence classification for the first time in this paper. To further enhance performance, an ensemble-based SLFN structure is constructed in which multiple SLFNs with the same number of hidden nodes and the same activation function serve as ensemble members. The same training algorithm is adopted for every member. The final category index is derived by majority voting. Two training approaches, the basic ELM and the OP-ELM, are adopted for the ensemble-based SLFNs. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. The experimental results show the superiority of the proposed algorithms.
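The ensemble scheme described above (several SLFNs with the same hidden-layer size and activation, each trained the same way, combined by majority vote) can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's implementation: `TinyELM`, the small ridge term used in place of a pseudo-inverse, the sigmoid activation, and the two-dimensional toy features are all hypothetical, and OP-ELM pruning is not shown.

```python
import math
import random
from collections import Counter


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


class TinyELM:
    """SLFN with fixed random hidden weights; only output weights are fitted."""

    def __init__(self, n_inputs, n_hidden, n_classes, rng, ridge=1e-3):
        self.W = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.n_classes = n_classes
        self.ridge = ridge  # assumption: small regularizer for a stable solve
        self.beta = None

    def _hidden(self, x):
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                for row, b in zip(self.W, self.b)]

    def fit(self, X, y):
        H = [self._hidden(x) for x in X]
        L = len(self.W)
        # Normal equations: (H^T H + ridge * I) beta = H^T T, with one-hot T.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        B = [[sum(h[i] for h, label in zip(H, y) if label == c)
              for c in range(self.n_classes)] for i in range(L)]
        self.beta = solve(A, B)
        return self

    def predict(self, x):
        h = self._hidden(x)
        scores = [sum(h[i] * self.beta[i][c] for i in range(len(h)))
                  for c in range(self.n_classes)]
        return max(range(self.n_classes), key=scores.__getitem__)


def majority_vote(labels):
    """Return the most frequent label among the ensemble members' votes."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(models, x):
    return majority_vote(m.predict(x) for m in models)


# Toy data: two well-separated classes of hypothetical 2-D sequence features.
rng = random.Random(0)
X = [[rng.gauss(3.0 * label, 0.5), rng.gauss(3.0 * label, 0.5)]
     for label in (0, 1) for _ in range(20)]
y = [label for label in (0, 1) for _ in range(20)]

# Same architecture and training rule; members differ only in random weights.
models = [TinyELM(2, 5, 2, random.Random(seed)).fit(X, y) for seed in range(5)]
preds = [ensemble_predict(models, x) for x in X]
```

Because each member draws different random hidden weights, individual errors tend to differ, which is what gives the vote a chance to correct them.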


2009 ◽  
Vol 16 (3) ◽  
pp. 457-474 ◽  
Author(s):  
Renqiang Min ◽  
Anthony Bonner ◽  
Jingjing Li ◽  
Zhaolei Zhang

2013 ◽  
Vol 14 (1) ◽  
pp. 96 ◽  
Author(s):  
Satish M Srinivasan ◽  
Suleyman Vural ◽  
Brian R King ◽  
Chittibabu Guda
