Graph regularized, semi-supervised learning improves annotation of de novo transcriptomes

2016
Authors: Laraib I. Malik, Shravya Thatipally, Nikhil Junneti, Rob Patro

Abstract
We present a new method, GRASS, for improving an initial annotation of de novo transcriptomes. GRASS makes the shared-sequence relationships between assembled contigs explicit in the form of a graph, and applies an algorithm that performs label propagation to transfer annotations between related contigs while iteratively modifying the graph topology. We demonstrate that GRASS increases the completeness and accuracy of the initial annotation, allows for improved differential analysis, and is very efficient, typically taking tens of minutes.
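The basic idea of transferring annotations along shared-sequence links can be illustrated with a weighted majority-vote label propagation over a contig graph. This is a minimal sketch under stated assumptions, not GRASS itself: it omits GRASS's iterative modification of the graph topology, and the function name `propagate_labels`, the contig IDs, the annotations and the edge weights are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, max_iter=10):
    """Spread annotations from seed contigs to unannotated neighbours.

    edges: iterable of (contig_a, contig_b, weight) shared-sequence links (hypothetical)
    seeds: dict contig -> initial annotation; seed labels are never overwritten
    """
    # Build an undirected, weighted adjacency list.
    adj = defaultdict(list)
    for a, b, w in edges:
        adj[a].append((b, w))
        adj[b].append((a, w))

    labels = dict(seeds)
    for _ in range(max_iter):
        updates = {}
        for node in adj:
            if node in seeds:
                continue  # keep the initial annotation fixed
            # Sum edge weights per label over currently labelled neighbours.
            votes = defaultdict(float)
            for nbr, w in adj[node]:
                if nbr in labels:
                    votes[labels[nbr]] += w
            if votes:
                # Heaviest label wins; ties go to the alphabetically first label.
                updates[node] = max(sorted(votes), key=lambda lab: votes[lab])
        changed = any(labels.get(n) != lab for n, lab in updates.items())
        labels.update(updates)  # synchronous update of all non-seed contigs
        if not changed:
            break
    return labels


# Hypothetical chain of contigs with two annotated ends.
edges = [("c1", "c2", 3.0), ("c2", "c3", 1.0), ("c3", "c4", 2.0), ("c4", "c5", 5.0)]
seeds = {"c1": "geneA", "c5": "geneB"}
print(propagate_labels(edges, seeds))
```

Contig `c3` ends up with `geneB` because its link to `c4` (weight 2) outweighs its link to `c2` (weight 1); the updates are applied synchronously so that the result does not depend on the order in which contigs are visited.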

2021
Vol 22 (S10)
Authors: Zhenmiao Zhang, Lu Zhang

Abstract
Background: Due to the complexity of microbial communities, de novo assembly of next-generation sequencing data commonly fails to produce complete microbial genomes. Metagenome assembly binning is therefore an essential step: it groups the fragmented contigs into clusters representing microbial genomes based on the contigs' nucleotide compositions and read depths. These features work well for long contigs but are unstable for short ones. Contigs can also be linked by sequence overlap (the assembly graph) or by the paired-end reads aligned to them (the PE graph), and linked contigs have a high chance of being derived from the same cluster.
Results: We developed METAMVGL, a multi-view graph-based metagenomic contig binning algorithm that integrates both the assembly and PE graphs. It can strikingly rescue short contigs and correct binning errors arising from dead ends. METAMVGL learns the weights of the two graphs automatically and predicts contig labels in a unified multi-view label propagation framework. In experiments, we observed that METAMVGL made use of significantly more high-confidence edges from the combined graph and linked dead ends to the main graph. It also outperformed many state-of-the-art contig binning algorithms, including MaxBin2, MetaBAT2, MyCC, CONCOCT, SolidBin and GraphBin, on metagenomic sequencing data from simulation, two mock communities and Sharon infant fecal samples.
Conclusions: Our findings demonstrate that METAMVGL substantially improves short-contig binning and outperforms other existing contig binning tools on metagenomic sequencing data from simulation, mock communities and infant fecal samples.
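Multi-view label propagation can be sketched by combining the assembly and PE adjacency matrices into one weighted graph and spreading bin labels over it. This is a minimal sketch under loud assumptions: METAMVGL learns the graph weights automatically, whereas here they are fixed by hand; the update used is the standard label-spreading iteration of Zhou et al., F ← αSF + (1 − α)Y, not METAMVGL's own objective; and `multiview_label_spreading`, `alpha` and the toy graphs are hypothetical.

```python
import numpy as np


def normalise(W):
    """Symmetric normalisation S = D^{-1/2} W D^{-1/2}; isolated nodes stay zero."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def multiview_label_spreading(graphs, view_weights, Y, alpha=0.8, n_iter=100):
    """Label spreading on a fixed weighted sum of several contig graphs.

    graphs: list of (n x n) symmetric adjacency matrices (e.g. assembly, PE)
    view_weights: hand-chosen non-negative weight per graph (assumption;
        METAMVGL learns these instead)
    Y: (n x k) one-hot matrix of initial bin labels; all-zero rows = unbinned
    """
    W = sum(w * G for w, G in zip(view_weights, graphs))
    S = normalise(W)
    Y = Y.astype(float)
    F = Y.copy()
    for _ in range(n_iter):
        # Each contig takes its neighbours' scores plus a pull back toward its seed label.
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1), F


# Hypothetical example: contig 2 has no assembly-graph edges (a dead end),
# but the PE graph links it strongly to contig 1 and weakly to contig 3.
A_asm = np.zeros((5, 5))
A_asm[0, 1] = A_asm[1, 0] = 1.0
A_asm[3, 4] = A_asm[4, 3] = 1.0
A_pe = np.zeros((5, 5))
A_pe[1, 2] = A_pe[2, 1] = 1.0
A_pe[2, 3] = A_pe[3, 2] = 0.2

Y = np.zeros((5, 2))
Y[0, 0] = 1.0  # contig 0 seeded in bin 0
Y[4, 1] = 1.0  # contig 4 seeded in bin 1

bins, scores = multiview_label_spreading([A_asm, A_pe], [1.0, 1.0], Y)
print(bins)
```

Here the dead-end contig 2 is binned with contig 1 only because the PE view connects it to the rest of the graph, which mirrors the motivation for integrating the two graphs.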


2019
Vol 7 (1)
pp. 104-118
Authors: Weiwei Du, Dandan Yuan, Jianming Wang, Xiaojie Duan, Yanhe Ma, ...

A radiologist must read hundreds of slices to recognize a malignant or benign lung tumor in computed tomography (CT) volume data. To reduce this burden, several methods have been proposed for handling ground-glass opacity (GGO) nodules. However, these methods require the GGO nodules to be detected and labeled manually by a radiologist, and some slices containing a GGO nodule can be missed because there are many slices across several CT volumes. Although semi-supervised learning methods have been proposed to find the slices containing GGO nodules, there has been no discussion of the impact of their parameters. This article explains and analyzes label propagation, a semi-supervised learning method, for detecting slices containing GGO nodules, with a focus on its parameters. Experimental results show that the proposed approach detects slices containing GGO nodules effectively.
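To make the role of the parameters concrete, the following is a minimal sketch of a classic iterative label propagation scheme with hard clamping of labelled points, run on a dense RBF-kernel graph whose width `sigma` is the parameter being varied. The one-dimensional per-slice features, the label encoding (1 = slice with a GGO nodule, 0 = without, -1 = unlabelled) and the function name are hypothetical; the article's actual features and parameter settings are not reproduced here.

```python
import numpy as np


def label_propagation(X, y, sigma=1.0, max_iter=1000, tol=1e-6):
    """Iterative label propagation with clamped labelled points.

    X: (n, d) feature vectors, one per CT slice (hypothetical features)
    y: length-n ints, 1 = GGO slice, 0 = non-GGO slice, -1 = unlabelled
    sigma: RBF kernel width, controlling how far labels spread
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Dense RBF affinities without self-loops.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # guard: a very small sigma can zero out whole rows
    T = W / row_sums  # row-stochastic transition matrix

    idx = np.flatnonzero(y >= 0)
    n_classes = int(y[idx].max()) + 1
    Y0 = np.zeros((len(y), n_classes))
    Y0[idx, y[idx]] = 1.0
    F = Y0.copy()
    for _ in range(max_iter):
        F_new = T @ F
        F_new[idx] = Y0[idx]  # clamp the known labels after every step
        if np.abs(F_new - F).max() < tol:
            F = F_new
            break
        F = F_new
    return F.argmax(axis=1)


# Hypothetical 1-D slice scores: two well-separated groups, one labelled slice each.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, -1, -1, 1, -1, -1])
for sigma in (0.05, 0.2, 1.0):
    print(sigma, label_propagation(X, y, sigma=sigma))
```

Sweeping `sigma` like this is one simple way to study parameter sensitivity: a small width keeps labels local to near-identical slices, while a large width flattens the affinities so that the two groups become harder to separate.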


2020
Vol 36 (11)
pp. 3457-3465
Authors: Renming Liu, Christopher A Mancuso, Anna Yannakopoulos, Kayla A Johnson, Arjun Krishnan

Abstract
Background: Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. Despite being a popular machine-learning technique across fields, supervised learning has been applied in only a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning performs broadly across different networks and diverse gene classification tasks, and how it compares to label propagation, the widely benchmarked canonical approach for this problem.
Results: In this study, we present a comprehensive benchmark of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal of naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings (derived using node2vec), an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits, and that it should be considered a staple of network-based gene classification workflows.
Availability and implementation: The datasets and the code used to reproduce the results and to add new gene classification methods have been made freely available.
Supplementary information: Supplementary data are available at Bioinformatics online.
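Supervised learning on a gene's full network connectivity can be sketched by using each gene's row of the adjacency matrix as its feature vector and fitting a classifier on the annotated genes. This minimal sketch assumes an unweighted toy network and uses a hand-written L2-regularized logistic regression; the paper's exact classifier and the label propagation variant it benchmarks are not specified in this abstract, so the one-hop `neighbour_vote` score below is only a simple point of comparison. All names, genes and labels are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logits
        w -= lr * (X.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b


def neighbour_vote(A, train_idx, y_train, i):
    """Fraction of gene i's labelled neighbours that are positive (one-hop baseline)."""
    links = A[i, train_idx]
    return float(links @ y_train / links.sum()) if links.sum() > 0 else 0.0


# Toy network: genes 0-4 and 5-9 form two dense modules joined by the edge 4-5.
n = 10
A = np.zeros((n, n))
for group in (range(0, 5), range(5, 10)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0

train_idx = np.array([0, 1, 2, 5, 6, 7])
y_train = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # 1 = annotated to a hypothetical term
test_idx = np.array([3, 4, 8, 9])

# Each gene's full adjacency row is its feature vector.
w, b = train_logreg(A[train_idx], y_train)
probs = sigmoid(A[test_idx] @ w + b)
for gene, p in zip(test_idx, probs):
    print(gene, round(float(p), 3), neighbour_vote(A, train_idx, y_train, gene))
```

Because the classifier learns one weight per network node, it can pick up which neighbours are informative for a task, whereas the neighbour vote treats every labelled neighbour equally.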

