vRhyme enables binning of viral genomes from metagenomes

2021 ◽  
Author(s):  
Kristopher Kieft ◽  
Alyssa Adams ◽  
Rauf Salamzade ◽  
Lindsay Kalan ◽  
Karthik Anantharaman

Genome binning has been essential for characterization of bacteria, archaea, and even eukaryotes from metagenomes. Yet, no approach exists for viruses. We developed vRhyme, fast and precise software for construction of viral metagenome-assembled genomes (vMAGs). vRhyme utilizes single- or multi-sample coverage effect size comparisons between scaffolds and employs supervised machine learning to identify nucleotide feature similarities, which are compiled into iterations of weighted networks and refined bins. Using simulated viromes, we displayed superior performance of vRhyme compared to available binning tools in constructing more complete and uncontaminated vMAGs. When applied to 10,601 viral scaffolds from human skin, vRhyme advanced our understanding of resident viruses, highlighted by identification of a Herelleviridae vMAG composed of 22 scaffolds, and another vMAG encoding a nitrate reductase metabolic gene, representing near-complete genomes post-binning. vRhyme will enable a convention of binning uncultivated viral genomes and has the potential to transform metagenome-based viral ecology.

2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jinchao Liu ◽  
Di Zhang ◽  
Dianqiang Yu ◽  
Mengxin Ren ◽  
Jingjun Xu

Ellipsometry is a powerful method for determining both the optical constants and thickness of thin films. For decades, solutions to ill-posed inverse ellipsometric problems have required substantial human-expert intervention and have become essentially human-in-the-loop trial-and-error processes that are not only tedious and time-consuming but also limit the applicability of ellipsometry. Here, we demonstrate a machine-learning-based approach for solving ellipsometric problems in an unambiguous and fully automatic manner while showing superior performance. The proposed approach is experimentally validated by using a broad range of films covering categories of metals, semiconductors, and dielectrics. This method is compatible with existing ellipsometers and paves the way for realizing the automatic, rapid, high-throughput optical characterization of films.


2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Rafael Vega Vega ◽  
Héctor Quintián ◽  
Carlos Cambra ◽  
Nuño Basurto ◽  
Álvaro Herrero ◽  
...  

The present research proposes the application of unsupervised and supervised machine-learning techniques to characterize Android malware families. More precisely, a novel unsupervised neural-projection method for dimensionality reduction, namely Beta Hebbian Learning (BHL), is applied to visually analyze such malware. Additionally, well-known supervised Decision Trees (DTs) are applied for the first time in order to improve characterization of such families and compare the original features that are identified as the most important ones. The proposed techniques are validated on real-life Android malware data by means of the well-known and publicly available Malgenome dataset. The results obtained support the proposed approach, confirming the validity of BHL and DTs to gain deep knowledge of Android malware.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Gorka Muñoz-Gil ◽  
Giovanni Volpe ◽  
Miguel Angel Garcia-March ◽  
Erez Aghion ◽  
Aykut Argun ◽  
...  

Deviations from Brownian motion leading to anomalous diffusion are found in transport dynamics from quantum physics to life sciences. The characterization of anomalous diffusion from the measurement of an individual trajectory is a challenging task, which traditionally relies on calculating the trajectory mean squared displacement. However, this approach breaks down for cases of practical interest, e.g., short or noisy trajectories, heterogeneous behaviour, or non-ergodic processes. Recently, several new approaches have been proposed, mostly building on the ongoing machine-learning revolution. To perform an objective comparison of methods, we gathered the community and organized an open competition, the Anomalous Diffusion challenge (AnDi). Participating teams applied their algorithms to a commonly defined dataset including diverse conditions. Although no single method performed best across all scenarios, machine-learning-based approaches achieved superior performance for all tasks. The discussion of the challenge results provides practical advice for users and a benchmark for developers.
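The traditional mean-squared-displacement analysis mentioned in this abstract can be sketched as follows. This is only an illustrative sketch, not code from the AnDi challenge: it assumes a one-dimensional trajectory sampled at unit time steps, and the function names (`time_averaged_msd`, `fit_anomalous_exponent`, `simulate_brownian`) are hypothetical. The anomalous exponent alpha is estimated as the slope of log(MSD) against log(lag); alpha near 1 indicates normal (Brownian) diffusion, and alpha equal to 2 indicates ballistic motion.

```python
import math
import random


def time_averaged_msd(xs, max_lag):
    """Time-averaged mean squared displacement of a 1D trajectory.

    Returns a list whose entry k-1 is the MSD at lag k, for k = 1..max_lag.
    """
    n = len(xs)
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(xs[i + lag] - xs[i]) ** 2 for i in range(n - lag)]
        msd.append(sum(squared) / len(squared))
    return msd


def fit_anomalous_exponent(msd):
    """Least-squares slope of log(MSD) versus log(lag).

    Under the assumed scaling MSD ~ lag**alpha, the slope estimates alpha.
    """
    log_t = [math.log(lag) for lag in range(1, len(msd) + 1)]
    log_m = [math.log(m) for m in msd]
    mean_t = sum(log_t) / len(log_t)
    mean_m = sum(log_m) / len(log_m)
    cov = sum((t - mean_t) * (m - mean_m) for t, m in zip(log_t, log_m))
    var = sum((t - mean_t) ** 2 for t in log_t)
    return cov / var


def simulate_brownian(n_steps, seed=0):
    """Simple random walk with unit-variance Gaussian steps (illustrative data)."""
    rng = random.Random(seed)
    x = 0.0
    xs = [x]
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


# Long Brownian trajectory: the fitted exponent should be close to 1.
brownian_alpha = fit_anomalous_exponent(
    time_averaged_msd(simulate_brownian(5000), max_lag=10)
)

# Ballistic trajectory x(t) = t: MSD(lag) = lag**2, so the exponent is 2.
ballistic_alpha = fit_anomalous_exponent(
    time_averaged_msd([float(t) for t in range(100)], max_lag=10)
)
```

With only a few dozen points, or with added measurement noise, the per-lag averages fluctuate strongly and the fitted slope becomes unreliable, which is the failure mode for short or noisy trajectories that the abstract describes.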


2014 ◽  
Author(s):  
Adam Hughes ◽  
Zhaowen Liu ◽  
Maryam Raftari ◽  
Mark E Reeves

A persistent challenge in materials science is the characterization of a large ensemble of heterogeneous nanostructures in a set of images. This often leads to practices such as manual particle counting, and sampling bias of a favorable region of the “best” image. Herein, we present the open-source software, imaging criteria and workflow necessary to fully characterize an ensemble of SEM nanoparticle images. Such characterization is critical to nanoparticle biosensors, whose performance and characteristics are determined by the distribution of the underlying nanoparticle film. We utilize novel artificial SEM images to objectively compare commonly found image processing methods through each stage of the workflow: acquisition, preprocessing, segmentation, labeling and object classification. Using the semi-supervised machine learning application, Ilastik, we demonstrate the decomposition of a nanoparticle image into particle subtypes relevant to our application: singles, dimers, flat aggregates and piles. We outline a workflow for characterizing and classifying nanoscale features on low-magnification images with thousands of nanoparticles. This work is accompanied by a repository of supplementary materials, including videos, a bank of real and artificial SEM images, and ten IPython Notebook tutorials to reproduce and extend the presented results.


2020 ◽  
Vol 36 (10) ◽  
pp. 3185-3191 ◽  
Author(s):  
Edison Ong ◽  
Haihe Wang ◽  
Mei U Wong ◽  
Meenakshi Seetharaman ◽  
Ninotchka Valdez ◽  
...  

Motivation: Reverse vaccinology (RV) is a milestone in rational vaccine design, and machine learning (ML) has been applied to enhance the accuracy of RV prediction. However, ML-based RV still faces challenges in prediction accuracy and program accessibility.

Results: This study presents Vaxign-ML, a supervised ML classification method to predict bacterial protective antigens (BPAgs). To identify the best ML method with optimized conditions, five ML methods were tested with biological and physicochemical features extracted from well-defined training data. Nested 5-fold cross-validation and leave-one-pathogen-out validation were used to ensure unbiased performance assessment and the capability to predict vaccine candidates against a new emerging pathogen. The best-performing model (eXtreme Gradient Boosting) was compared to three publicly available programs (Vaxign, VaxiJen, and Antigenic), one SVM-based method, and one epitope-based method using a high-quality benchmark dataset. Vaxign-ML showed superior performance in predicting BPAgs. Vaxign-ML is hosted on a publicly accessible web server, and a standalone version is also available.

Availability and implementation: The Vaxign-ML website is at http://www.violinet.org/vaxign/vaxign-ml, the Docker standalone Vaxign-ML is available at https://hub.docker.com/r/e4ong1031/vaxign-ml, and the source code is available at https://github.com/VIOLINet/Vaxign-ML-docker.

Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Tanbin Rahman ◽  
Hsin-En Huang ◽  
An-Shun Tai ◽  
Wen-Ping Hsieh ◽  
George Tseng

Supervised machine learning methods have been increasingly used in biomedical research and in clinical practice. In transcriptomic applications, RNA-seq data have become dominant and have gradually replaced traditional microarray data owing to their reduced background noise and increased digital precision. Most existing machine learning methods are, however, designed for the continuous intensities of microarrays and are not suitable for RNA-seq count data. In this paper, we develop a negative binomial model within a generalized linear model framework, with double regularization for gene and covariate sparsity, to accommodate three key elements: adequate modeling of count data with overdispersion, gene selection, and adjustment for covariate effects. The proposed method is evaluated in simulations and in two real applications, using cervical tumor miRNA-seq data and schizophrenia post-mortem brain tissue RNA-seq data, to demonstrate its superior performance in prediction accuracy and feature selection.

