Casboundary: automated definition of integral Cas cassettes

Bioinformatics

10.1093/bioinformatics/btaa984

2020

Author(s):

Victor A Padilha

Omer S Alkhnbashi

Van Dinh Tran

Shiraz A Shah

André C P L F Carvalho

...

Keyword(s):

Markov Models

Hidden Markov

Predictive Performance

Research Field

Supplementary Information

Bacterial Genomes

Average Similarity

Cas Genes

Definition Of

Abstract

Motivation: CRISPR-Cas systems, found in most archaeal and many bacterial genomes, are important because they provide prokaryotes with adaptive immunity against mobile genetic elements. They are encoded by a set of consecutive cas genes, here termed a cassette. Identifying cassette boundaries is key to finding cassettes in CRISPR research. This is often done using Hidden Markov Models and manual annotation. In this article, we propose the first method able to define cassette boundaries automatically. In addition, we present a Cas-type predictive model, which the method uses to assign each gene located within a cassette's boundaries a Cas label from a set of predefined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules.

Results: We evaluate the predictive performance of the proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. We also achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for unknown ones. Finally, we conduct two additional case studies, investigating the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes.

Availability and implementation: https://github.com/BackofenLab/Casboundary.

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
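The Results report F-scores for cas gene classification, separately for genes of known types and for unknown (potentially new) genes. As a reminder of how a per-class F-score is computed from gene-level labels, here is a minimal Python sketch. The function `f_score` and the example labels are hypothetical and not taken from Casboundary; the paper may aggregate scores across classes differently (for example by macro-averaging over known types).

```python
def f_score(true_labels, predicted_labels, positive):
    """F1 score for one label, treating `positive` as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 (avoids division by zero)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gene labels within one cassette ("unknown" = potentially new cas gene).
truth = ["cas1", "cas2", "cas9", "unknown", "unknown"]
predicted = ["cas1", "cas2", "unknown", "unknown", "cas9"]
print(f_score(truth, predicted, "cas1"))     # 1.0
print(f_score(truth, predicted, "unknown"))  # 0.5
```

Here one unknown gene is recovered, one known gene is wrongly called unknown and one unknown gene is wrongly called cas9, so precision and recall for "unknown" are both 0.5.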

Classification of the Adenylation and Acyl-Transferase Activity of NRPS and PKS Systems Using Ensembles of Substrate Specific Hidden Markov Models

PLoS ONE

10.1371/journal.pone.0062136

2013

Vol 8 (4)

pp. e62136

Cited By ~ 57

Author(s):

Barzan I. Khayatt

Lex Overmars

Roland J. Siezen

Christof Francke

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Transferase Activity

Acyl Transferase

Acyl Transferase Activity

Comparison Between Hidden Markov Models and Artificial Neural Networks in the Classification of Bearing Defects

Applied Condition Monitoring - Rotating Machinery and Signal Processing

10.1007/978-3-319-96181-1_6

2018

pp. 68-78

Author(s):

Miloud Sedira

Ridha Ziani

Ahmed Felkaoui

Keyword(s):

Neural Networks

Artificial Neural Networks

Hidden Markov Models

Markov Models

Hidden Markov

Bearing Defects

Artificial Neural

Modelling Bacterial Genomes Using Hidden Markov Models

COMPSTAT

10.1007/978-3-662-01131-7_8

1998

pp. 89-100

Cited By ~ 1

Author(s):

Florence Muri

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Bacterial Genomes

Higher-order Markov models for metagenomic sequence classification

Bioinformatics

10.1093/bioinformatics/btaa562

2020

Vol 36 (14)

pp. 4130-4136

Author(s):

David J Burks

Rajeev K Azad

Keyword(s):

DNA Sequences

Markov Models

Fragment Size

Higher Order

Training Data

Supplementary Information

Local Alignment

Metagenomic Sequence

Higher Order Models

Abstract

Motivation: Alignment-free, stochastic models derived from k-mer distributions of reference genome sequences have a rich history in the classification of DNA sequences. In particular, variants of Markov models have been used extensively. Higher-order Markov models have been used with caution, perhaps sparingly, primarily because of a lack of sufficient training data and computational power. Advances in sequencing technology and computation now make it possible to exploit the predictive power of higher-order models. We therefore revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences.

Results: We compared higher-order models (HOMs, 9th order or higher) with the interpolated Markov model, the interpolated context model and lower-order models (8th order or lower) on metagenomic datasets constructed from sequenced prokaryotic genomes. Our results show that HOMs outperform the other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250 nt. HOMs were also significantly more accurate than local alignment, which is widely relied upon for the taxonomic classification of metagenomic sequences. A new software implementation written in C++ classifies faster than existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or alongside existing taxonomic classifiers for more robust classification of metagenomic sequences.

Availability and implementation: The software is available at https://github.com/djburks/SMM.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
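The Markovian classification scheme described above can be sketched as follows: train one fixed-order Markov model per reference genome from its k-mer transition counts, then assign a fragment to the genome whose model gives it the highest log-likelihood. This is a minimal sketch under stated assumptions, not the SMM implementation: the class `MarkovModel`, the add-one (Laplace) smoothing and the toy order-2 "genomes" are all hypothetical choices, whereas the HOMs in the paper are of order 9 or higher and trained on real genomes.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class MarkovModel:
    """Hypothetical fixed-order Markov model over DNA with add-one smoothing."""

    def __init__(self, order):
        self.order = order
        # counts[context][base]: how often `base` followed the k-mer `context`
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        k = self.order
        for i in range(k, len(sequence)):
            context, base = sequence[i - k:i], sequence[i]
            if base in ALPHABET:
                self.counts[context][base] += 1
        return self

    def log_likelihood(self, fragment):
        """Sum of log P(base | preceding k-mer) over the fragment."""
        k = self.order
        total = 0.0
        for i in range(k, len(fragment)):
            context, base = fragment[i - k:i], fragment[i]
            if base not in ALPHABET:
                continue  # skip ambiguous bases such as N
            seen = self.counts.get(context, {})
            # Add-one smoothing gives unseen transitions a small non-zero probability.
            total += math.log((seen.get(base, 0) + 1) / (sum(seen.values()) + len(ALPHABET)))
        return total


def classify(fragment, models):
    """Assign the fragment to the reference whose model scores it highest."""
    return max(models, key=lambda name: models[name].log_likelihood(fragment))


# Toy usage with two hypothetical "genomes" and order-2 models.
models = {
    "A": MarkovModel(order=2).train("ACGT" * 50),
    "B": MarkovModel(order=2).train("AATT" * 50),
}
print(classify("ACGTACGTACGT", models))  # A
print(classify("AATTAATT", models))      # B
```

At order k there are up to 4^k possible contexts per genome (262,144 at k = 9), each with four next-base counts, which illustrates why training-data size and computing resources limited the use of such models.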

Classification of multidimensional observation sequences described by Hidden Markov Models

2014 12th International Conference on Actual Problems of Electronics Instrument Engineering (APEIE)

10.1109/apeie.2014.7040746

2014

Author(s):

T.A. Gultyaeva

V.V. Kokoreva

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Automatic Classification of Disordered Voices with Hidden Markov Models

2018 International Conference on Signal, Image, Vision and their Applications (SIVA)

10.1109/siva.2018.8661038

2018

Author(s):

Redouane Benhammoud

Abdellah Kacha

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Automatic Classification

Hidden Markov Models Used for the Offline Classification of EEG Data - Hidden Markov-Modelle, verwendet zur Offline-Klassifikation von EEG-Daten

Biomedical Engineering / Biomedizinische Technik

10.1515/bmte.1999.44.6.158

1999

Vol 44 (6)

pp. 158-162

Cited By ~ 17

Author(s):

B. Obermaier

Ch. Guger

G. Pfurtscheller

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

EEG Data

CutProtFam-Pred: Detection and classification of putative structural cuticular proteins from sequence alone, based on profile Hidden Markov Models

Insect Biochemistry and Molecular Biology

10.1016/j.ibmb.2014.06.004

2014

Vol 52

pp. 51-59

Cited By ~ 67

Author(s):

Zoi S. Ioannidou

Margarita C. Theodoropoulou

Nikos C. Papandreou

Judith H. Willis

Stavros J. Hamodrakas

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Profile Hidden Markov Models

Cuticular Proteins

Sequence Classification Using Third-Order Moments

Neural Computation

10.1162/neco_a_01033

2018

Vol 30 (1)

pp. 216-236

Author(s):

Rasmus Troelsgaard

Lars Kai Hansen

Keyword(s):

Hidden Markov Models

Markov Models

Sequence Data

Hidden Markov

Score Function

Simulated Data

Discrete Observations

Third Order

Leibler Divergence

Model-based classification of sequence data using a set of hidden Markov models is a well-known technique. However, the score function involved, which is often based on the class-conditional likelihood, can be computationally demanding, especially for long sequences. Inspired by recent theoretical advances in spectral learning of hidden Markov models, we propose a score function based on third-order moments. In particular, we propose using the Kullback-Leibler divergence between theoretical and empirical third-order moments to classify sequence data with discrete observations. The proposed method has lower computational complexity at classification time than the usual likelihood-based methods. To demonstrate its properties, we classify both simulated data and empirical data from a human activity recognition study.
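The moment-based score can be sketched in a few lines. For an HMM with hidden-state distribution pi at the start of a triple, transition matrix A and emission matrix E, the theoretical third-order moment is the joint probability of three consecutive observations, M3[i][j][k] = sum over hidden states a, b, c of pi[a]·E[a][i]·A[a][b]·E[b][j]·A[b][c]·E[c][k]; the empirical moment is the normalized count of observed triples. This minimal Python sketch assumes the divergence direction KL(empirical ‖ theoretical) and a small floor `eps` to avoid division by zero; both choices, the function names and the two toy models are hypothetical, and the paper's spectral formulation may differ in detail.

```python
import math
from collections import Counter
from itertools import product


def theoretical_m3(pi, A, E):
    """Joint probability of three consecutive observations under an HMM.

    pi[a]   : probability of hidden state a at the first position of a triple
    A[a][b] : P(next hidden state = b | current hidden state = a)
    E[a][i] : P(observation = i | hidden state = a)
    """
    n_states, n_symbols = len(pi), len(E[0])
    m3 = {}
    for i, j, k in product(range(n_symbols), repeat=3):
        m3[(i, j, k)] = sum(
            pi[a] * E[a][i] * A[a][b] * E[b][j] * A[b][c] * E[c][k]
            for a, b, c in product(range(n_states), repeat=3)
        )
    return m3


def empirical_m3(seq):
    """Relative frequencies of overlapping triples (x_t, x_t+1, x_t+2)."""
    triples = Counter(zip(seq, seq[1:], seq[2:]))
    n = sum(triples.values())
    return {t: c / n for t, c in triples.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over the triples observed in p; q is floored at eps."""
    return sum(pv * math.log(pv / max(q.get(t, 0.0), eps)) for t, pv in p.items())


def classify(seq, moment_tables):
    """Pick the class whose theoretical moments are closest to the data."""
    emp = empirical_m3(seq)
    return min(moment_tables, key=lambda name: kl_divergence(emp, moment_tables[name]))


# Hypothetical two-class example with 2 hidden states and 2 symbols.
E = [[0.95, 0.05], [0.05, 0.95]]
pi = [0.5, 0.5]
tables = {
    "sticky": theoretical_m3(pi, [[0.9, 0.1], [0.1, 0.9]], E),
    "alternating": theoretical_m3(pi, [[0.1, 0.9], [0.9, 0.1]], E),
}
print(classify([0] * 10 + [1] * 10 + [0] * 10, tables))  # sticky
print(classify([0, 1] * 15, tables))                     # alternating
```

Because each class's moment table is computed once in advance, scoring a sequence of length T needs one O(T) pass to count triples plus a sum over at most S^3 table entries (S = number of symbols), rather than a likelihood recursion over the whole sequence for every class.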

Diversity and distribution of nuclease bacteriocins in bacterial genomes revealed using Hidden Markov Models

PLoS Computational Biology

10.1371/journal.pcbi.1005652

2017

Vol 13 (7)

pp. e1005652

Cited By ~ 21

Author(s):

Connor Sharp

James Bray

Nicholas G. Housden

Martin C. J. Maiden

Colin Kleanthous

Keyword(s):

Hidden Markov Models

Markov Models

Hidden Markov

Bacterial Genomes
