Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study

Mapping Intimacies

10.1101/2020.02.03.932350

2020

Cited By ~ 10

Author(s):

Gurjit S. Randhawa

Maximillian P.M. Soltysiak

Hadi El Roz

Camila P.E. de Souza

Kathleen A. Hill

...

Keyword(s):

Machine Learning

Death Rate

Genomic Sequence

Sequence Data

Rank Correlation

Taxonomic Classification

Supervised Machine Learning

Biological Knowledge

Alignment Free

Abstract: As of February 20, 2020, the 2019 novel coronavirus (renamed to COVID-19) spread to 30 countries with 2130 deaths and more than 75500 confirmed cases. COVID-19 is being compared to the infamous SARS coronavirus, which resulted, between November 2002 and July 2003, in 8098 confirmed cases worldwide with a 9.6% death rate and 774 deaths. Though COVID-19 has a death rate of 2.8% as of 20 February, the 75752 confirmed cases in a few weeks (December 8, 2019 to February 20, 2020) are alarming, with cases likely being under-reported given the comparatively longer incubation period. Such outbreaks demand elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 genomes. The proposed method combines supervised machine learning with digital signal processing for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman's rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp. Our results support a hypothesis of a bat origin and classify COVID-19 as Sarbecovirus, within Betacoronavirus. Our method achieves high levels of classification accuracy and discovers the most relevant relationships among over 5,000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
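
The abstract describes an alignment-free pipeline: encode each genome as a numeric signal, apply digital signal processing, and classify the result. The sketch below illustrates that idea under stated assumptions and is not the authors' code. Assumed here: a purine/pyrimidine encoding, a naive DFT magnitude spectrum, equal-length inputs (no length normalization), and a 1-nearest-neighbour rule that uses Spearman's rank correlation as the similarity. The abstract uses Spearman's coefficient only for result validation and trains supervised classifiers on the signals, so the nearest-neighbour step is a swapped-in simplification; the labels in the usage are hypothetical.

```python
import cmath
import math

# Assumed purine/pyrimidine (PP) mapping: purines A, G -> -1; pyrimidines C, T -> +1.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def numeric_signal(seq):
    """Encode a DNA string as a numeric signal, skipping ambiguous bases such as N."""
    return [PP_MAP[base] for base in seq.upper() if base in PP_MAP]


def magnitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n-1."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            result[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def classify(query, references):
    """Label the query by the reference whose magnitude spectrum has the highest
    Spearman correlation with the query's spectrum (1-nearest-neighbour).

    references is a list of (sequence, label) pairs; all encoded signals must
    have the same length, because this sketch does no length normalization.
    """
    q = magnitude_spectrum(numeric_signal(query))
    best_label, best_rho = None, -2.0
    for seq, label in references:
        spectrum = magnitude_spectrum(numeric_signal(seq))
        if len(spectrum) != len(q):
            raise ValueError("sequences must have equal length in this sketch")
        rho = spearman(q, spectrum)
        if rho > best_rho:
            best_label, best_rho = label, rho
    return best_label
```

For example, `classify("ACACACAC", [("ACACACAC", "group-1"), ("AAAACCCC", "group-2")])` returns `"group-1"`. For real genomes of tens of kilobases, an FFT and a length-normalization step would replace the O(n²) DFT and the equal-length check.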

Prediction of Compound-Protein Interactions with Machine Learning Methods

Chemoinformatics and Advanced Machine Learning Perspectives

10.4018/978-1-61520-911-8.ch016

2011

pp. 304-317

Author(s):

Yoshihiro Yamanishi

Hisashi Kashima

Keyword(s):

Machine Learning

Protein Interactions

Chemical Structure

Genomic Sequence

Sequence Data

Binary Classification

Biological Data

Supervised Machine Learning

Learning Methods

Machine Learning Methods

In silico prediction of compound-protein interactions from heterogeneous biological data is critical in the process of drug development. In this chapter, the authors review several supervised machine learning methods that predict unknown compound-protein interactions from chemical structure and genomic sequence information simultaneously. These kernel-based algorithms are reviewed from two different viewpoints: binary classification and dimension reduction. Finally, the authors demonstrate the usefulness of the methods for predicting drug-target interactions and ligand-protein interactions from chemical structure data and genomic sequence data.
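
One way to use chemical structure and genomic sequence information simultaneously in a kernel method is a pairwise (tensor-product) kernel on compound-protein pairs, K((c, p), (c', p')) = K_c(c, c') · K_p(p, p'). The sketch below assumes this construction; it is not taken from the chapter. The compound kernel (Tanimoto similarity on hypothetical fingerprint bit sets), the protein kernel (a normalized 2-mer spectrum kernel), and the kernel-weighted vote standing in for a trained classifier such as an SVM are all illustrative choices.

```python
import math
from collections import Counter


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def kmer_counts(seq, k):
    """Counts of overlapping k-mers in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Normalized k-mer spectrum kernel (cosine of the k-mer count vectors)."""
    a, b = kmer_counts(s, k), kmer_counts(t, k)
    dot = sum(a[m] * b[m] for m in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def pairwise_kernel(pair1, pair2):
    """Tensor-product kernel on (compound fingerprint, protein sequence) pairs."""
    (c1, p1), (c2, p2) = pair1, pair2
    return tanimoto(c1, c2) * spectrum_kernel(p1, p2)


def score(query_pair, training):
    """Kernel-weighted vote: sum of y_i * K(query, x_i) with labels y_i in {+1, -1}."""
    return sum(y * pairwise_kernel(query_pair, x) for x, y in training)
```

A positive score suggests an interaction and a negative score suggests none. Because the pair kernel is a product of two valid kernels, it is itself a valid kernel, so the same function could supply the Gram matrix for an SVM.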

Optimizing taxonomic classification of marker gene amplicon sequences

10.7287/peerj.preprints.3208v2

2018

Cited By ~ 4

Author(s):

Nicholas A Bokulich

Benjamin D Kaehler

Jai Ram Rideout

Matthew Dillon

Evan Bolyen

...

Keyword(s):

Machine Learning

Sequence Data

Marker Gene

Parameter Tuning

Operating Conditions

Evaluation Framework

Taxonomic Classification

Consensus Methods

Learning Classifier

Background: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. Results: We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed the accuracy of existing methods for marker-gene amplicon sequence classification. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, UCLUST) and several new methods (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods of VSEARCH, BLAST+, and SortMeRNA) for classification of marker-gene amplicon sequence data. Conclusions: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
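
The abstract mentions a scikit-learn naive Bayes classifier for amplicon sequences. The from-scratch sketch below shows the general technique, multinomial naive Bayes over overlapping k-mer counts with Laplace smoothing; it is not the q2-feature-classifier implementation, and the k-mer length, smoothing constant, taxon names, and toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace (add-alpha) smoothing."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, seqs, taxa):
        self.class_counts = Counter(taxa)
        self.kmer_counts = {taxon: Counter() for taxon in self.class_counts}
        for seq, taxon in zip(seqs, taxa):
            self.kmer_counts[taxon].update(kmers(seq, self.k))
        # Vocabulary = every k-mer seen in any training sequence.
        self.vocab = set()
        for counts in self.kmer_counts.values():
            self.vocab.update(counts)
        self.totals = {t: sum(c.values()) for t, c in self.kmer_counts.items()}
        return self

    def log_joint(self, seq):
        """Unnormalized log posterior, log P(taxon) + sum of log P(k-mer | taxon)."""
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for taxon, count in self.class_counts.items():
            s = math.log(count / n)
            denom = self.totals[taxon] + self.alpha * v
            for m in kmers(seq, self.k):
                s += math.log((self.kmer_counts[taxon][m] + self.alpha) / denom)
            scores[taxon] = s
        return scores

    def predict(self, seq):
        scores = self.log_joint(seq)
        return max(scores, key=scores.get)
```

Summing log-probabilities rather than multiplying probabilities avoids numerical underflow on long reads. The sketch returns only the top taxon and does not compute a confidence for the assignment.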

Prediction of Compound-protein Interactions with Machine Learning Methods

Machine Learning

10.4018/978-1-60960-818-7.ch315

2012

pp. 616-630

Author(s):

Yoshihiro Yamanishi

Hisashi Kashima

Keyword(s):

Machine Learning

Protein Interactions

Chemical Structure

Genomic Sequence

Sequence Data

Binary Classification

Biological Data

Supervised Machine Learning

Learning Methods

Machine Learning Methods

Optimizing taxonomic classification of marker gene amplicon sequences

10.7287/peerj.preprints.3208

2018

Author(s):

Nicholas A Bokulich

Benjamin D Kaehler

Jai Ram Rideout

Matthew Dillon

Evan Bolyen

...

Keyword(s):

Machine Learning

Sequence Data

Marker Gene

Parameter Tuning

Operating Conditions

Evaluation Framework

Taxonomic Classification

Consensus Methods

Learning Classifier

Machine Learning for Population Genetics: A New Paradigm

10.1101/206482

2017

Cited By ~ 4

Author(s):

Daniel R. Schrider

Andrew D. Kern

Keyword(s):

Machine Learning

Population Genetics

Population Genomics

Genomic Sequence

Sequence Data

Supervised Machine Learning

New Paradigm

Making Sense

Daunting Task

Population Genetic Inference

Abstract: As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning. We review the fundamentals of machine learning, discuss recent applications of supervised machine learning to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised machine learning is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.

Effectuating Supervised Machine Learning Techniques for Multiclass Classification of Problematic Internet and Mobile Usage

2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)

10.1109/icccis51004.2021.9397062

2021

Author(s):

Sneha Sarkar

Samanyu Bhandary

Arti Arya

Keyword(s):

Machine Learning

Multiclass Classification

Supervised Machine Learning

Machine Learning Techniques

Learning Techniques

What Computers Can Tell Us About Emotions – Classification of Affective Communication in Electronic Negotiations by Supervised Machine Learning

Lecture Notes in Business Information Processing - Group Decision and Negotiation. Theory, Empirical Evidence, and Application

10.1007/978-3-319-52624-9_9

2017

pp. 113-123

Cited By ~ 1

Author(s):

Michael Filzmoser

Sabine T. Koeszegi

Guenther Pfeffer

Keyword(s):

Machine Learning

Supervised Machine Learning

Electronic Negotiations

Affective Communication

Classification of Military Aircraft in Real-time Radar Systems based on Supervised Machine Learning with Labelled ADS-B Data

2018 Sensor Data Fusion: Trends, Solutions, Applications (SDF)

10.1109/sdf.2018.8547077

2018

Cited By ~ 3

Author(s):

Kaeye Dastner

Susie Brunessaux

Elke Schmid

Bastian von Hasler zu Roseneckh-Kohler

Felix Opitz

Keyword(s):

Machine Learning

Real Time

Supervised Machine Learning

Military Aircraft

Radar Systems

Seeing It All: Evaluating Supervised Machine Learning Methods for the Classification of Diverse Otariid Behaviours

PLoS ONE

10.1371/journal.pone.0166898

2016

Vol 11 (12)

pp. e0166898

Cited By ~ 15

Author(s):

Monique A. Ladds

Adam P. Thompson

David J. Slip

David P. Hocking

Robert G. Harcourt

Keyword(s):

Machine Learning

Supervised Machine Learning

Learning Methods

Machine Learning Methods

Supervised Machine Learning Approach for Subjectivity/Objectivity Classification of Social Data

Information Systems - Lecture Notes in Business Information Processing

10.1007/978-3-030-44322-1_15

2020

pp. 193-205

Author(s):

Rim Chiha

Mounir Ben Ayed

Keyword(s):

Machine Learning

Supervised Machine Learning

Learning Approach

Social Data

Machine Learning Approach