Neural Network and Random Forest Models in Protein Function Prediction

Abstract

Over the past decade, the demand for automated protein function prediction has increased due to the volume of newly sequenced proteins. In this paper, we address the function prediction task by developing an ensemble system that automatically assigns Gene Ontology (GO) terms to a given input protein sequence.

The ensemble combines the GO predictions made by random forest (RF) and neural network (NN) classifiers. Both RF and NN models rely on features derived from BLAST sequence alignments, taxonomy and protein signature analysis tools. In addition, we report on experiments with a NN model that directly analyzes the amino acid sequence as its sole input, using a convolutional layer. The Swiss-Prot database is used as the training and evaluation data.

In the CAFA3 evaluation, which relies on experimental verification of the functional predictions, our submitted ensemble model demonstrates competitive performance, ranking among the top 10 best-performing systems out of over 100 submitted systems. In this paper, we evaluate and further improve the CAFA3-submitted system. Our machine learning models, together with the data pre-processing and feature generation tools, are publicly available as open source software at https://github.com/TurkuNLP/CAFA3

Author summary

Understanding the role and function of proteins in biological processes is fundamental for new biological discoveries. Whereas modern sequencing methods have led to a rapid growth of protein databases, the function of these sequences is often unknown and expensive to determine experimentally. This has spurred a lot of interest in predictive modelling of protein functions.

We develop a machine learning system for annotating protein sequences with functional definitions selected from a vast set of predefined functions. The approach is based on a combination of neural network and random forest classifiers with features covering structural and taxonomic properties and sequence similarity. The system is thoroughly evaluated on a large set of manually curated functional annotations and shows competitive performance in comparison to other suggested approaches. We also analyze the predictions for different functional annotation and taxonomy categories and measure the importance of different features for the task. This analysis reveals that the system is particularly efficient for bacterial protein sequences.
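
The abstract does not say how the RF and NN predictions are merged, so the following is only a minimal sketch of one common option: a weighted average of per-term scores followed by a threshold. The function names, the `rf_weight` and `threshold` parameters, and the example scores are all hypothetical and are not taken from the CAFA3 system.

```python
from typing import Dict, List


def ensemble_scores(
    rf_scores: Dict[str, float],
    nn_scores: Dict[str, float],
    rf_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> Dict[str, float]:
    """Weighted average of two per-GO-term score maps.

    A term that one model did not score is treated as 0.0 for that model.
    """
    terms = set(rf_scores) | set(nn_scores)
    return {
        term: rf_weight * rf_scores.get(term, 0.0)
        + (1.0 - rf_weight) * nn_scores.get(term, 0.0)
        for term in terms
    }


def predict_terms(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the GO terms whose combined score reaches the threshold."""
    return sorted(term for term, score in scores.items() if score >= threshold)


# Illustrative scores for a single protein (made-up values).
rf = {"GO:0003677": 1.0, "GO:0005524": 0.25}
nn = {"GO:0003677": 0.5, "GO:0005524": 0.5, "GO:0016301": 0.75}

combined = ensemble_scores(rf, nn)
predicted = predict_terms(combined)
print(predicted)
```

In this toy case only the term that both models score highly survives the threshold; a term proposed by a single model is down-weighted, which is one reason averaging is a common baseline for combining classifiers.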

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network

Molecules ◽

10.3390/molecules22101732 ◽

2017 ◽

Vol 22 (10) ◽

pp. 1732 ◽

Cited By ~ 84

Author(s):

Renzhi Cao ◽

Colton Freitas ◽

Leong Chan ◽

Miao Sun ◽

Haiqing Jiang ◽

...

Keyword(s):

Neural Network ◽

Machine Translation ◽

Recurrent Neural Network ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Neural Machine Translation

Combined application of electronic nose analysis and back-propagation neural network and random forest models for assessing yogurt flavor acceptability

Journal of Food Measurement & Characterization ◽

10.1007/s11694-019-00335-w ◽

2019 ◽

Vol 14 (1) ◽

pp. 573-583 ◽

Cited By ~ 3

Author(s):

Huaixiang Tian ◽

Han Liu ◽

Yujie He ◽

Bin Chen ◽

Lizhong Xiao ◽

...

Keyword(s):

Neural Network ◽

Random Forest ◽

Electronic Nose ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Combined Application ◽

Forest Models ◽

Random Forest Models

A Deep Neural Network Based Hierarchical Multi-Label Classifier for Protein Function Prediction

2019 International Conference on Computer, Information and Telecommunication Systems (CITS) ◽

10.1109/cits.2019.8862034 ◽

2019 ◽

Author(s):

Xin Yuan ◽

Weite Li ◽

Kui Lin ◽

Jinglu Hu

Keyword(s):

Neural Network ◽

Protein Function ◽

Deep Neural Network ◽

Protein Function Prediction ◽

Function Prediction

Correction to: Combined application of electronic nose analysis and back‑propagation neural network and random forest models for assessing yogurt flavor acceptability

Journal of Food Measurement & Characterization ◽

10.1007/s11694-020-00467-4 ◽

2020 ◽

Vol 14 (4) ◽

pp. 2359-2359

Author(s):

Huaixiang Tian ◽

Han Liu ◽

Yujie He ◽

Bin Chen ◽

Lizhong Xiao ◽

...

Keyword(s):

Neural Network ◽

Random Forest ◽

Electronic Nose ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Combined Application ◽

Forest Models ◽

Random Forest Models

An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model (Preprint)

10.2196/preprints.25442 ◽

2020 ◽

Author(s):

Hoon Ko ◽

Heewon Chung ◽

Wu Seong Kang ◽

Chul Park ◽

Do Wan Kim ◽

...

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Random Forest ◽

Hospital Admission ◽

Deep Neural Network ◽

Care Providers ◽

Blood Samples ◽

Routine Blood ◽

Forest Models ◽

Random Forest Models

BACKGROUND: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. However, at present, it has limited treatments.

OBJECTIVE: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission.

METHODS: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions.

RESULTS: In the testing data sets, EDRnet provided high sensitivity (100%), specificity (91%), and accuracy (92%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results.

CONCLUSIONS: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients’ outcomes.

Multi-Label Hierarchical Classification using a Competitive Neural Network for protein function prediction

The 2012 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2012.6252736 ◽

2012 ◽

Cited By ~ 11

Author(s):

Helyane Bronoski Borges ◽

Julio Cesar Nievola

Keyword(s):

Neural Network ◽

Protein Function ◽

Protein Function Prediction ◽

Hierarchical Classification ◽

Function Prediction ◽

Competitive Neural Network

Optimization of EDTA enriched phytoaccumulation of zinc by Ophiopogon japonicus: Comparison of Response Surface, Artificial Neural Network and Random Forest models

Bioresource Technology Reports ◽

10.1016/j.biteb.2019.100265 ◽

2019 ◽

Vol 7 ◽

pp. 100265 ◽

Cited By ~ 10

Author(s):

Janani K. ◽

Sivarajasekar N. ◽

Muthusaravanan S. ◽

Ram K. ◽

Prakashmaran J. ◽

...

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Random Forest ◽

Response Surface ◽

Ophiopogon Japonicus ◽

Forest Models ◽

Random Forest Models ◽

Artificial Neural

MDPFP-FCNN: Multidomain Protein Function Prediction Using Fuzzy Convolutional Neural Network

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2021.1231.57 ◽

2021 ◽

Vol 14 (6) ◽

pp. 642-655

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Multidomain Protein

An Artificial Intelligence Model to Predict the Mortality of COVID-19 Patients at Hospital Admission Time Using Routine Blood Samples: Development and Validation of an Ensemble Model

Journal of Medical Internet Research ◽

10.2196/25442 ◽

2020 ◽

Vol 22 (12) ◽

pp. e25442

Author(s):

Hoon Ko ◽

Heewon Chung ◽

Wu Seong Kang ◽

Chul Park ◽

Do Wan Kim ◽

...

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Random Forest ◽

Hospital Admission ◽

Deep Neural Network ◽

Care Providers ◽

Blood Samples ◽

Routine Blood ◽

Forest Models ◽

Random Forest Models

Background: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. However, at present, it has limited treatments.

Objective: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission.

Methods: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions.

Results: In the testing data sets, EDRnet provided high sensitivity (100%), specificity (91%), and accuracy (92%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results.

Conclusions: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients’ outcomes.
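
For readers unfamiliar with the three reported metrics, the sketch below shows how sensitivity, specificity, and accuracy are derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy from confusion counts.

    tp/fn: positive cases predicted correctly / missed
    tn/fp: negative cases predicted correctly / flagged wrongly
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": tp / (tp + fn),  # share of positives that were caught
        "specificity": tn / (tn + fp),  # share of negatives that were cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Made-up counts: every positive case is caught, some negatives are flagged.
example = binary_metrics(tp=4, fn=0, tn=9, fp=3)
print(example)
```

A sensitivity of 100% means the test set contained no false negatives, while a specificity below 100% means some negative cases were flagged as positive.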
