scholarly journals Improving protein function prediction with synthetic feature samples created by generative adversarial networks

2019 ◽  
Author(s):  
Cen Wan ◽  
David T. Jones

AbstractProtein function prediction is a challenging but important task in bioinformatics. Many prediction methods have been developed, but are still limited by the bottleneck on training sample quantity. Therefore, it is valuable to develop a data augmentation method that can generate high-quality synthetic samples to further improve the accuracy of prediction methods. In this work, we propose a novel generative adversarial networks-based method, namely FFPred-GAN, to accurately learn the high-dimensional distributions of protein sequence-based biophysical features and also generate high-quality synthetic protein feature samples. The experimental results suggest that the synthetic protein feature samples are successful in improving the prediction accuracy for all three domains of the Gene Ontology through augmentation of the original training protein feature samples.

2014 ◽  
Vol 2014 ◽  
pp. 1-34 ◽  
Author(s):  
Giorgio Valentini

Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research.


2016 ◽  
Vol 17 (1) ◽  
Author(s):  
Yuxiang Jiang ◽  
Tal Ronnen Oron ◽  
Wyatt T. Clark ◽  
Asma R. Bankapur ◽  
Daniel D’Andrea ◽  
...  

2008 ◽  
Vol 9 (1) ◽  
Author(s):  
Aaron P Gabow ◽  
Sonia M Leach ◽  
William A Baumgartner ◽  
Lawrence E Hunter ◽  
Debra S Goldberg

PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0244430
Author(s):  
Seyyede Fatemeh Seyyedsalehi ◽  
Mahdieh Soleymani ◽  
Hamid R. Rabiee ◽  
Mohammad R. K. Mofrad

Understanding the functionality of proteins has emerged as a critical problem in recent years due to significant roles of these macro-molecules in biological mechanisms. However, in-laboratory techniques for protein function prediction are not as efficient as methods developed and processed for protein sequencing. While more than 70 million protein sequences are available today, only the functionality of around one percent of them are known. These facts have encouraged researchers to develop computational methods to infer protein functionalities from their sequences. Gene Ontology is the most well-known database for protein functions which has a hierarchical structure, where deeper terms are more determinative and specific. However, the lack of experimentally approved annotations for these specific terms limits the performance of computational methods applied on them. In this work, we propose a method to improve protein function prediction using their sequences by deeply extracting relationships between Gene Ontology terms. To this end, we construct a conditional generative adversarial network which helps to effectively discover and incorporate term correlations in the annotation process. In addition to the baseline algorithms, we compare our method with two recently proposed deep techniques that attempt to utilize Gene Ontology term correlations. Our results confirm the superiority of the proposed method compared to the previous works. Moreover, we demonstrate how our model can effectively help to assign more specific terms to sequences.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6850
Author(s):  
Yuanming Li ◽  
Bonhwa Ku ◽  
Shou Zhang ◽  
Jae-Kwang Ahn ◽  
Hanseok Ko

Realistic synthetic data can be useful for data augmentation when training deep learning models to improve seismological detection and classification performance. In recent years, various deep learning techniques have been successfully applied in modern seismology. Due to the performance of deep learning depends on a sufficient volume of data, the data augmentation technique as a data-space solution is widely utilized. In this paper, we propose a Generative Adversarial Networks (GANs) based model that uses conditional knowledge to generate high-quality seismic waveforms. Unlike the existing method of generating samples directly from noise, the proposed method generates synthetic samples based on the statistical characteristics of real seismic waveforms in embedding space. Moreover, a content loss is added to relate high-level features extracted by a pre-trained model to the objective function to enhance the quality of the synthetic data. The classification accuracy is increased from 96.84% to 97.92% after mixing a certain amount of synthetic seismic waveforms, and results of the quality of seismic characteristics derived from the representative experiment show that the proposed model provides an effective structure for generating high-quality synthetic seismic waveforms. Thus, the proposed model is experimentally validated as a promising approach to realistic high-quality seismic waveform data augmentation.


Sign in / Sign up

Export Citation Format

Share Document