ABLE: Attention Based Learning for Enzyme Classification
Abstract
Classifying proteins into their respective enzyme classes is of interest to researchers for a variety of reasons. The open-source Protein Data Bank (PDB) contains more than 160,000 structures, with more being added every day. This paper proposes an attention-based bidirectional LSTM model (ABLE), trained on data oversampled with SMOTE, that classifies a protein into one of the six enzyme classes or a negative class. The only input is the primary structure of the protein, given as a string by its FASTA sequence. Our proposed model achieves the highest F1-score, 0.834, on a dataset of proteins from the PDB. We compare our model against seventeen baseline machine learning and deep learning models, including CNN, LSTM, BiLSTM, and GRU. We perform extensive experimentation and statistical testing to corroborate our results.
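To make the input representation and the oversampling idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an integer encoding of the 20 standard amino acids (with 0 reserved for padding and a single index for unknown residues) and shows only the interpolation step at the core of SMOTE, with the neighbour passed in rather than found by a nearest-neighbour search. The names `encode_sequence`, `smote_sample`, and `max_len` are hypothetical and chosen for illustration.

```python
import random

# Assumption: the 20 standard amino acids, indexed from 1; 0 is padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
UNKNOWN = len(AMINO_ACIDS) + 1  # shared index for any other character


def encode_sequence(seq, max_len):
    """Map a FASTA sequence string to a fixed-length list of integer ids."""
    # Truncate to max_len, then map each residue to its index.
    ids = [TOKEN.get(ch, UNKNOWN) for ch in seq.upper()[:max_len]]
    # Right-pad with zeros so every sequence has the same length.
    return ids + [0] * (max_len - len(ids))


def smote_sample(x, neighbor, rng):
    """Create one synthetic minority sample on the segment from x to neighbor.

    This is only the interpolation step of SMOTE; choosing `neighbor`
    among the k nearest minority-class samples is left out of the sketch.
    """
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


if __name__ == "__main__":
    print(encode_sequence("MKV", 6))
    rng = random.Random(0)
    print(smote_sample([0.0, 2.0], [1.0, 4.0], rng))
```

In practice the encoded sequences would feed an embedding layer ahead of the bidirectional LSTM, and a library implementation of SMOTE would typically replace the hand-written interpolation; both choices are assumptions here, since the abstract does not specify them.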