Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach

ABSTRACT: In vivo transposon mutagenesis, coupled with deep sequencing, enables large-scale genome-wide mutant screens for genes essential in different growth conditions. We analyzed six large-scale studies performed on haploid strains of three yeast species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac). Using a machine-learning approach, we evaluated the ability of the data to predict gene essentiality. Important data features included sufficient numbers and distribution of independent insertion events. All transposons showed some bias in insertion site preference because of jackpot events, preferences for specific insertion sequences, and preferences for short-distance versus long-distance insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. The machine learning approach also robustly predicted gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid versus haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive. We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic haploid microbes such as yeasts, including species that have been less amenable to classical genetic studies.

DeepGOZero: Improving protein function prediction from sequence and zero-shot learning based on ontology axioms

10.1101/2022.01.14.476325 ◽ 2022

Author(s): Maxat Kulmanov ◽ Robert Hoehndorf

Keyword(s): Machine Learning ◽ Protein Function ◽ Protein Function Prediction ◽ Prediction Method ◽ Function Prediction ◽ Training Data ◽ Large Set ◽ Theoretic Approach ◽ Machine Learning Model ◽ Protein Functions

Motivation: Protein functions are often described using the Gene Ontology (GO), an ontology consisting of over 50,000 classes and a large set of formal axioms. Predicting the functions of proteins is one of the key challenges in computational biology, and a variety of machine learning methods have been developed for this purpose. However, these methods usually require a significant amount of training data and cannot make predictions for GO classes that have few or no experimental annotations. Results: We developed DeepGOZero, a machine learning model which improves predictions for functions with no or only a small number of annotations. To achieve this goal, we rely on a model-theoretic approach for learning ontology embeddings and combine it with neural networks for protein function prediction. DeepGOZero can exploit formal axioms in the GO to make zero-shot predictions, i.e., predict protein functions even if not a single protein in the training phase was associated with that function. Furthermore, the zero-shot prediction method employed by DeepGOZero is generic and can be applied whenever associations with ontology classes need to be predicted. Availability: http://github.com/bio-ontology-research-group/deepgozero

deepNF: Deep network fusion for protein function prediction

10.1101/223339 ◽ 2017 ◽ Cited By ~ 2

Author(s): Vladimir Gligorijević ◽ Meet Barot ◽ Richard Bonneau

Keyword(s): Protein Function ◽ Large Scale ◽ Protein Function Prediction ◽ Predictive Performance ◽ Substantial Improvement ◽ Function Prediction ◽ Interaction Networks ◽ Highly Nonlinear ◽ High Level ◽ String Networks

Abstract: The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that cannot capture complex and highly nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. Availability: deepNF is freely available at: https://github.com/VGligorijevic/deepNF
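The layer arrangement described in this abstract (one early layer per network type, all merged into a single bottleneck) can be illustrated with a small sketch. This is not the deepNF implementation: it covers only the encoder half, uses untrained random weights, and omits the decoder and training loop. The class name `MultimodalEncoder`, the helper functions, the sigmoid activation, and all dimensions are assumptions chosen for illustration.

```python
import math
import random


def dense(x, weights, biases):
    """Sigmoid dense layer; each row of `weights` holds one output unit's weights."""
    return [
        1.0 / (1.0 + math.exp(-(sum(xi * wi for xi, wi in zip(x, row)) + b)))
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultimodalEncoder:
    """Hypothetical encoder: one branch per network type, then a shared bottleneck."""

    def __init__(self, input_dims, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        # Early stage: a separate layer for each network type.
        self.branches = [init_layer(d, hidden_dim, rng) for d in input_dims]
        # Later stage: all branch outputs feed a single bottleneck layer.
        self.bottleneck = init_layer(hidden_dim * len(input_dims), bottleneck_dim, rng)

    def encode(self, views):
        """Map one protein's per-network feature vectors to bottleneck features."""
        merged = []
        for x, (weights, biases) in zip(views, self.branches):
            merged.extend(dense(x, weights, biases))
        weights, biases = self.bottleneck
        return dense(merged, weights, biases)


if __name__ == "__main__":
    # Two made-up network views of one protein, with 4 and 6 features.
    encoder = MultimodalEncoder(input_dims=[4, 6], hidden_dim=3, bottleneck_dim=2)
    print(encoder.encode([[1, 0, 0, 1], [0, 1, 1, 0, 0, 1]]))
```

In a trained autoencoder, the bottleneck vector returned by `encode` would be the low-dimensional representation passed to a downstream function classifier.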

A Meta-learning Approach for Protein Function Prediction

Advanced Computational Approaches to Biomedical Engineering ◽ 10.1007/978-3-642-41539-5_5 ◽ 2013 ◽ pp. 113-128

Author(s): Dariusz Plewczynski ◽ Subhadip Basu

Keyword(s): Protein Function ◽ Protein Function Prediction ◽ Function Prediction ◽ Learning Approach ◽ Meta Learning

Toward Robust Anxiety Biomarkers: A Machine Learning Approach in a Large-Scale Sample

Biological Psychiatry Cognitive Neuroscience and Neuroimaging ◽ 10.1016/j.bpsc.2019.05.018 ◽ 2020 ◽ Vol 5 (8) ◽ pp. 799-807 ◽ Cited By ~ 7

Author(s): Emily A. Boeke ◽ Avram J. Holmes ◽ Elizabeth A. Phelps

Keyword(s): Machine Learning ◽ Large Scale ◽ Learning Approach ◽ Machine Learning Approach

A Deep Learning Approach Based on Stacked Denoising Autoencoders for Protein Function Prediction

2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) ◽ 10.1109/compsac.2018.00074 ◽ 2018 ◽ Cited By ~ 1

Author(s): Lester James Miranda ◽ Jinglu Hu

Keyword(s): Deep Learning ◽ Protein Function ◽ Protein Function Prediction ◽ Function Prediction ◽ Learning Approach

A large-scale evaluation of computational protein function prediction

Nature Methods ◽ 10.1038/nmeth.2340 ◽ 2013 ◽ Vol 10 (3) ◽ pp. 221-227 ◽ Cited By ~ 521

Author(s): Predrag Radivojac ◽ Wyatt T Clark ◽ Tal Ronnen Oron ◽ Alexandra M Schnoes ◽ Tobias Wittkop ◽ ...

Keyword(s): Protein Function ◽ Large Scale ◽ Protein Function Prediction ◽ Function Prediction ◽ Scale Evaluation

A look back at the quality of Protein Function Prediction tools in CAFA

10.7287/peerj.preprints.27161 ◽ 2018

Author(s): Morteza Pourreza Shahri ◽ Madhusudan Srinivasan ◽ Diane Bimczok ◽ Upulee Kanewala ◽ Indika Kahanda

Keyword(s): Protein Function ◽ Large Scale ◽ Computational Models ◽ Protein Function Prediction ◽ Function Prediction ◽ Test Case ◽ Test Cases ◽ Metamorphic Testing ◽ Main Challenge ◽ Scale Experiment

The Critical Assessment of protein Function Annotation algorithms (CAFA) is a large-scale experiment for assessing computational models for automated function prediction (AFP). The models presented in CAFA have shown excellent promise in terms of prediction accuracy, but quality assurance has received comparatively little attention. The main challenge in systematically testing AFP software is the lack of a test oracle, which determines whether a test case passes or fails; the exact expected outcomes are not well defined for the AFP task. Thus, AFP tools face the oracle problem. Metamorphic testing (MT) is a technique used to test programs that face the oracle problem using metamorphic relations (MRs). An MR determines whether a test has passed or failed by specifying how the output should change according to a specific change made to the input. In this work, we use MT to test nine CAFA2 AFP tools by defining a set of MRs that apply input transformations at the protein level. According to our initial testing, we observe that several tools fail all the test cases and two tools pass all the test cases on different GO ontologies.
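The idea of a metamorphic relation can be made concrete with a short sketch. The two relations below are illustrative assumptions, not the MRs defined in the paper, and `toy_predictor` and `order_sensitive_predictor` are hypothetical stand-ins for an AFP tool; the GO-like labels are made up. Each check compares the outputs of two related runs instead of relying on a known correct answer.

```python
def toy_predictor(proteins):
    """Hypothetical AFP stand-in: maps {protein_id: sequence} to made-up labels."""
    predictions = {}
    for pid, seq in proteins.items():
        labels = set()
        if "C" in seq:
            labels.add("GO:toy_binding")
        if len(seq) > 5:
            labels.add("GO:toy_catalysis")
        predictions[pid] = labels
    return predictions


def order_sensitive_predictor(proteins):
    """Deliberately faulty stand-in: the label depends on input position."""
    return {pid: {f"GO:toy_{i}"} for i, pid in enumerate(proteins)}


def mr_order_invariance(predict, proteins):
    """MR: reversing the input order must not change any protein's labels."""
    source_output = predict(proteins)
    followup_output = predict(dict(reversed(list(proteins.items()))))
    return source_output == followup_output


def mr_duplicate_invariance(predict, proteins):
    """MR: adding a renamed copy of one protein must give the copy the same
    labels as the original and leave every other prediction unchanged."""
    pid, seq = next(iter(proteins.items()))
    copy_id = pid + "_copy"
    source_output = predict(proteins)
    followup_output = predict({**proteins, copy_id: seq})
    unchanged = all(followup_output[p] == source_output[p] for p in proteins)
    return unchanged and followup_output[copy_id] == followup_output[pid]


if __name__ == "__main__":
    sample = {"P1": "MKTCAYIAK", "P2": "MAGW", "P3": "MCC"}
    for tool in (toy_predictor, order_sensitive_predictor):
        print(tool.__name__,
              mr_order_invariance(tool, sample),
              mr_duplicate_invariance(tool, sample))
```

Neither check needs ground-truth annotations, which is why this style of testing suits tools that face the oracle problem.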

Hands-on on Protein Function Prediction with Machine Learning and Interactive Analytics

10.6019/tol.unip_machine-w.2018.00001.1 ◽ 2018

Keyword(s): Machine Learning ◽ Protein Function ◽ Protein Function Prediction ◽ Function Prediction ◽ Hands On

Human Protein Function Prediction Enhancement Using Decision Tree Based Machine Learning Approach

Comparing the utility of in vivo transposon mutagenesis approaches in yeast species to infer gene essentiality
