Supervised learning is an accurate method for network-based gene classification

Abstract Background Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine-learning technique across fields, supervised learning has been applied only in a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem. Results In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene’s full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation’s appeal for naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows. Availability and implementation The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available. Supplementary information Supplementary data are available at Bioinformatics online.
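The two approaches compared in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy network, the function names `label_propagation` and `fit_logistic`, the restart value of 0.85, and the L2 penalty are all assumptions chosen for illustration. Label propagation is approximated here by a random walk with restart from the known positive genes, and supervised learning by a logistic regression whose features are each gene's row of the adjacency matrix.

```python
import numpy as np

# Toy network (an assumption for illustration): two 4-gene cliques, 0-3 and 4-7,
# joined by a single bridge edge between genes 3 and 4.
n = 8
adj = np.zeros((n, n))
for clique in (range(0, 4), range(4, 8)):
    for i in clique:
        for j in clique:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

train_genes = np.array([0, 1, 4, 5])          # genes with known labels
train_labels = np.array([1.0, 1.0, 0.0, 0.0])
held_out_pos, held_out_neg = [2, 3], [6, 7]   # genes used only for evaluation


def label_propagation(adj, positives, restart=0.85):
    """Random walk with restart from the positive genes, in closed form."""
    n = adj.shape[0]
    walk = adj / adj.sum(axis=0, keepdims=True)  # column-normalised A D^-1
    y = np.zeros(n)
    y[positives] = 1.0
    return restart * np.linalg.solve(np.eye(n) - (1.0 - restart) * walk, y)


def fit_logistic(X, y, l2=0.1, lr=0.1, steps=2000):
    """L2-penalised logistic regression fitted by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


# Label propagation: diffuse the known positive labels over the network.
lp_scores = label_propagation(adj, positives=[0, 1])

# Supervised learning: each gene's feature vector is its adjacency-matrix row.
w, b = fit_logistic(adj[train_genes], train_labels)
sl_scores = adj @ w + b

for name, s in (("label propagation", lp_scores), ("supervised learning", sl_scores)):
    print(name, np.round(s[held_out_pos + held_out_neg], 3))
```

On this toy network both methods rank the held-out genes of the positive clique above those of the negative clique. The sketch shows only how the two methods consume the network and says nothing about their relative accuracy on real data.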

Supervised-learning is an accurate method for network-based gene classification

10.1101/721423 ◽ 2019 ◽ Cited By ~ 3

Author(s): Renming Liu ◽ Christopher A Mancuso ◽ Anna Yannakopoulos ◽ Kayla A Johnson ◽ Arjun Krishnan

Keyword(s): Supervised Learning ◽ Network Connectivity ◽ Accurate Method ◽ Label Propagation ◽ Local Network ◽ Grand Challenge ◽ Multiple Networks ◽ Disease Associated Genes ◽ Propagation Technique ◽ Full Network

Abstract Background Assigning every human gene to specific functions, diseases, and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods such as supervised-learning and label-propagation that can leverage molecular interaction networks to predict gene attributes. In spite of being a popular machine learning technique across fields, supervised-learning has been applied only in a few network-based studies for predicting pathway-, phenotype-, or disease-associated genes. It is unknown how supervised-learning broadly performs across different networks and diverse gene classification tasks, and how it compares to label-propagation, the widely-benchmarked canonical approach for this problem. Results In this study, we present a comprehensive benchmarking of supervised-learning for network-based gene classification, evaluating this approach and a state-of-the-art label-propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised-learning on a gene’s full network connectivity outperforms label-propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label-propagation’s appeal for naturally using network topology. We further show that supervised-learning on the full network is also superior to learning on node-embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. Conclusion These results show that supervised-learning is an accurate approach for prioritizing genes associated with diverse functions, diseases, and traits and should be considered a staple of network-based gene classification workflows. The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

Hyperspectral image classification using semi-supervised learning with label propagation

2020 IEEE India Geoscience and Remote Sensing Symposium (InGARSS) ◽ 10.1109/ingarss48198.2020.9358921 ◽ 2020

Author(s): Usha Patel ◽ Hardik Dave ◽ Vibha Patel

Keyword(s): Image Classification ◽ Supervised Learning ◽ Hyperspectral Image ◽ Label Propagation ◽ Hyperspectral Image Classification

Relation extraction using label propagation based semi-supervised learning

10.3115/1220175.1220192 ◽ 2006 ◽ Cited By ~ 17

Author(s): Jinxiu Chen ◽ Donghong Ji ◽ Chew Lim Tan ◽ Zhengyu Niu

Keyword(s): Supervised Learning ◽ Relation Extraction ◽ Label Propagation

Differences in Intrinsic Properties and Local Network Connectivity of Identified Layer 5 and Layer 6 Adult Mouse Auditory Corticothalamic Neurons Support a Dual Corticothalamic Projection Hypothesis

Cerebral Cortex ◽ 10.1093/cercor/bhp050 ◽ 2009 ◽ Vol 19 (12) ◽ pp. 2810-2826 ◽ Cited By ~ 40

Author(s): D. A. Llano ◽ S. M. Sherman

Keyword(s): Adult Mouse ◽ Network Connectivity ◽ Local Network ◽ Intrinsic Properties ◽ Corticothalamic Projection

Label propagation in complex video sequences using semi-supervised learning

Proceedings of the British Machine Vision Conference 2010 ◽ 10.5244/c.24.27 ◽ 2010 ◽ Cited By ~ 13

Author(s): Ignas Budvytis ◽ Vijay Badrinarayanan ◽ Roberto Cipolla

Keyword(s): Supervised Learning ◽ Label Propagation ◽ Video Sequences

Label Propagation with Augmented Anchors: A Simple Semi-supervised Learning Baseline for Unsupervised Domain Adaptation

Computer Vision – ECCV 2020 - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-58548-8_45 ◽ 2020 ◽ pp. 781-797

Author(s): Yabin Zhang ◽ Bin Deng ◽ Kui Jia ◽ Lei Zhang

Keyword(s): Supervised Learning ◽ Domain Adaptation ◽ Label Propagation ◽ Unsupervised Domain Adaptation

DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning

Bioinformatics ◽ 10.1093/bioinformatics/btz276 ◽ 2019 ◽ Vol 35 (22) ◽ pp. 4586-4595 ◽ Cited By ~ 37

Author(s): Peng Ni ◽ Neng Huang ◽ Zhi Zhang ◽ De-Peng Wang ◽ Fan Liang ◽ ...

Keyword(s): DNA Methylation ◽ Deep Learning ◽ Bisulfite Sequencing ◽ Homo Sapiens ◽ Accurate Method ◽ Supplementary Information ◽ Nanopore Sequencing ◽ Methylation State ◽ State Prediction ◽ Genome Level

Abstract Motivation Oxford Nanopore sequencing makes it possible to detect the methylation states of DNA bases directly from reads, without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. Results In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both the read level and the genome level in detecting 6mA and 5mC methylation states compared to previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs and both singleton and mixed DNA CpG. Moreover, DeepSignal requires much lower coverage than HMM and statistics based methods. DeepSignal can achieve above 90% accuracy for detecting 5mC and 6mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM based methods. In particular, DeepSignal can predict methylation states of 5% more DNA CpGs that previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. Availability and implementation DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. Supplementary information Supplementary data are available at Bioinformatics online.

Accurate and efficient gene function prediction using a multi-bacterial network

Bioinformatics ◽ 10.1093/bioinformatics/btaa885 ◽ 2020

Author(s): Jeffrey N Law ◽ Shiv D Kale ◽ T M Murali

Keyword(s): Gene Function ◽ Bacterial Species ◽ Heterogeneous Data ◽ Function Prediction ◽ Label Propagation ◽ Supplementary Information ◽ Gene Function Prediction ◽ Functional Annotations ◽ A Genome ◽ Multiple Species

Abstract Motivation Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. Results We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. Availability and implementation An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. Supplementary information Supplementary data are available at Bioinformatics online.

GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks

Bioinformatics ◽ 10.1093/bioinformatics/btaa459 ◽ 2020 ◽ Vol 36 (Supplement_1) ◽ pp. i464-i473

Author(s): Kapil Devkota ◽ James M Murphy ◽ Lenore J Cowen

Keyword(s): Network Structure ◽ Biological Networks ◽ Link Prediction ◽ Prediction Method ◽ Global Network ◽ Local Network ◽ Supplementary Information ◽ Network Data ◽ PPI Network ◽ Diffusion State

Abstract Motivation One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might allow alternative methods to outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross validation experiments. Results We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information Supplementary data are available at Bioinformatics online.

The effect of statistical normalisation on network propagation scores

Bioinformatics ◽ 10.1093/bioinformatics/btaa896 ◽ 2020

Author(s): Sergio Picart-Armada ◽ Wesley K Thompson ◽ Alfonso Buil ◽ Alexandre Perera-Lluna

Keyword(s): Protein Function ◽ Diffusion Processes ◽ Protein Function Prediction ◽ Interaction Network ◽ Mean Value ◽ Statistical Properties ◽ Label Propagation ◽ Supplementary Information ◽ Module Discovery ◽ Permutation Analysis

Abstract Motivation Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This raises the question of the statistical properties and the presence of bias of such diffusion processes in each of their applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. Results Diffusion scores starting from binary labels were affected by the label codification, and exhibited a problem-dependent topological bias that could be removed by the statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem- and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities. Availability The code is publicly available at https://github.com/b2slab/diffuBench. Supplementary information Supplementary data are available at Bioinformatics online.
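The permutation-based normalisation described in this abstract can be sketched as follows. This is an illustrative assumption, not the diffuBench implementation: the star-shaped toy network, the random-walk-with-restart kernel, the restart value of 0.85, and the names `diffusion_kernel` and `normalised_scores` are all hypothetical. The null model assumed here places the same number of positive labels uniformly at random over the nodes, and because the network is tiny the null is enumerated exactly instead of being sampled.

```python
from itertools import combinations

import numpy as np

# Toy network (an assumption for illustration): a star with hub 0 and leaves 1-4.
n = 5
adj = np.zeros((n, n))
adj[0, 1:] = adj[1:, 0] = 1.0


def diffusion_kernel(adj, restart=0.85):
    """Random-walk-with-restart kernel K, so that scores = K @ labels."""
    n = adj.shape[0]
    walk = adj / adj.sum(axis=0, keepdims=True)  # column-normalised A D^-1
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * walk)


def normalised_scores(kernel, positives):
    """Raw scores, exact permutation-null mean, and z-scores for binary labels."""
    n = kernel.shape[0]
    labels = np.zeros(n)
    labels[list(positives)] = 1.0
    raw = kernel @ labels
    # Exact null: every way of placing the same number of positive labels.
    null = np.array([kernel[:, list(c)].sum(axis=1)
                     for c in combinations(range(n), len(positives))])
    null_mean, null_sd = null.mean(axis=0), null.std(axis=0)
    return raw, null_mean, (raw - null_mean) / null_sd


kernel = diffusion_kernel(adj)
raw, null_mean, z = normalised_scores(kernel, positives=(1, 2))

# By linearity of expectation, the null mean is (n_pos / n) times the kernel row sums.
closed_form_mean = (2 / n) * kernel.sum(axis=1)
print(np.round(raw, 3), np.round(null_mean, 3), np.round(z, 3))
```

In this toy example the hub has the largest null mean, which is one simple way to see the topological bias in the mean value that the abstract mentions. The closed-form mean used above follows from linearity of expectation under the assumed null and is not taken from the paper's formulae.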
