Neural Network and Random Forest Models in Protein Function Prediction

Author(s):  
Kai Hakala ◽  
Suwisa Kaewphan ◽  
Jari Björne ◽  
Farrokh Mehryary ◽  
Hans Moen ◽  
...  
2019 ◽  

Abstract

Over the past decade, the demand for automated protein function prediction has increased due to the volume of newly sequenced proteins. In this paper, we address the function prediction task by developing an ensemble system that automatically assigns Gene Ontology (GO) terms to a given input protein sequence.

The ensemble combines the GO predictions made by random forest (RF) and neural network (NN) classifiers. Both the RF and NN models rely on features derived from BLAST sequence alignments, taxonomy and protein signature analysis tools. In addition, we report on experiments with an NN model that directly analyzes the amino acid sequence as its sole input, using a convolutional layer. The Swiss-Prot database is used as the training and evaluation data.

In the CAFA3 evaluation, which relies on experimental verification of the functional predictions, our submitted ensemble model demonstrates competitive performance, ranking among the top 10 best-performing systems out of over 100 submitted systems. In this paper, we evaluate and further improve the CAFA3-submitted system. Our machine learning models, together with the data pre-processing and feature generation tools, are publicly available as open source software at https://github.com/TurkuNLP/CAFA3

Author summary

Understanding the role and function of proteins in biological processes is fundamental for new biological discoveries. Whereas modern sequencing methods have led to a rapid growth of protein databases, the function of these sequences is often unknown and expensive to determine experimentally. This has spurred considerable interest in predictive modelling of protein functions.

We develop a machine learning system for annotating protein sequences with functional definitions selected from a vast set of predefined functions. The approach is based on a combination of neural network and random forest classifiers with features covering structural and taxonomic properties and sequence similarity.

The system is thoroughly evaluated on a large set of manually curated functional annotations and shows competitive performance in comparison to other suggested approaches. We also analyze the predictions for different functional annotation and taxonomy categories and measure the importance of different features for the task. This analysis reveals that the system is particularly effective for bacterial protein sequences.
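
The abstract does not say how the RF and NN predictions are merged, so the following is only a minimal sketch of one common option: a weighted average of per-term scores followed by a threshold. The function names, the 0.5 weight, the 0.5 threshold, and the scores are all assumptions made for illustration and are not taken from the paper or its repository.

```python
from typing import Dict, Set


def combine_scores(rf_scores: Dict[str, float],
                   nn_scores: Dict[str, float],
                   rf_weight: float = 0.5) -> Dict[str, float]:
    """Weighted average of per-term scores from two classifiers.

    A term that one model did not score is treated as 0.0 for that model
    (an assumption of this sketch, not a documented behaviour).
    """
    terms = set(rf_scores) | set(nn_scores)
    return {
        term: rf_weight * rf_scores.get(term, 0.0)
        + (1.0 - rf_weight) * nn_scores.get(term, 0.0)
        for term in terms
    }


def select_terms(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep the GO terms whose combined score reaches the threshold."""
    return {term for term, score in scores.items() if score >= threshold}


# Made-up scores for a single protein; the GO identifiers are used only as labels.
rf_scores = {"GO:0005524": 0.9, "GO:0003677": 0.2}
nn_scores = {"GO:0005524": 0.7, "GO:0003677": 0.6, "GO:0016020": 0.4}

combined = combine_scores(rf_scores, nn_scores)
predicted = select_terms(combined)
print(sorted(predicted))
```

Because GO annotation is a multi-label task, each term is thresholded independently rather than picking a single best class. In practice the weight and threshold would be tuned on held-out data.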


Molecules ◽  
2017 ◽  
Vol 22 (10) ◽  
pp. 1732 ◽  
Author(s):  
Renzhi Cao ◽  
Colton Freitas ◽  
Leong Chan ◽  
Miao Sun ◽  
Haiqing Jiang ◽  
...  

2020 ◽  
Author(s):  
Hoon Ko ◽  
Heewon Chung ◽  
Wu Seong Kang ◽  
Chul Park ◽  
Do Wan Kim ◽  
...  

BACKGROUND: COVID-19, which is accompanied by acute respiratory distress, multiple organ failure, and death, has spread worldwide much faster than previously thought. However, at present, it has limited treatments.

OBJECTIVE: To overcome this issue, we developed an artificial intelligence (AI) model of COVID-19, named EDRnet (ensemble learning model based on deep neural network and random forest models), to predict in-hospital mortality using a routine blood sample at the time of hospital admission.

METHODS: We selected 28 blood biomarkers and used the age and gender information of patients as model inputs. To improve the mortality prediction, we adopted an ensemble approach combining deep neural network and random forest models. We trained our model with a database of blood samples from 361 COVID-19 patients in Wuhan, China, and applied it to 106 COVID-19 patients in three Korean medical institutions.

RESULTS: In the testing data sets, EDRnet provided high sensitivity (100%), specificity (91%), and accuracy (92%). To extend the number of patient data points, we developed a web application (BeatCOVID19) where anyone can access the model to predict mortality and can register his or her own blood laboratory results.

CONCLUSIONS: Our new AI model, EDRnet, accurately predicts the mortality rate for COVID-19. It is publicly available and aims to help health care providers fight COVID-19 and improve patients’ outcomes.
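
For readers unfamiliar with the three reported metrics, the sketch below shows how sensitivity, specificity, and accuracy are conventionally derived from binary labels and predictions. The function name and the example labels are hypothetical and unrelated to the study's data; 1 stands for the positive class (here, in-hospital death) and 0 for the negative class.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (no positives or no negatives) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of actual positives found
        "specificity": ratio(tn, tn + fp),  # share of actual negatives found
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Hypothetical toy labels, not patient data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

When positives are rare, accuracy is dominated by the negative class, which is why sensitivity and specificity are usually reported alongside it.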


10.2196/25442 ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. e25442
Author(s):  
Hoon Ko ◽  
Heewon Chung ◽  
Wu Seong Kang ◽  
Chul Park ◽  
Do Wan Kim ◽  
...  


