CpG Island Identification with Higher Order and Variable Order Markov Models

Abstract. Motivation: Alignment-free, stochastic models derived from k-mer distributions representing reference genome sequences have a rich history in the classification of DNA sequences. In particular, the variants of Markov models have previously been used extensively. Higher-order Markov models have been used with caution, perhaps sparingly, primarily because of the lack of enough training data and computational power. Advances in sequencing technology and computation have enabled exploitation of the predictive power of higher-order models. We, therefore, revisited higher-order Markov models and assessed their performance in classifying metagenomic sequences. Results: Comparative assessment of higher-order models (HOMs, 9th order or higher) with interpolated Markov model, interpolated context model and lower-order models (8th order or lower) was performed on metagenomic datasets constructed using sequenced prokaryotic genomes. Our results show that HOMs outperform other models in classifying metagenomic fragments as short as 100 nt at all taxonomic ranks, and at lower ranks when the fragment size was increased to 250 nt. HOMs were also found to be significantly more accurate than local alignment, which is widely relied upon for taxonomic classification of metagenomic sequences. A novel software implementation written in C++ performs classification faster than the existing Markovian metagenomic classifiers and can therefore be used as a standalone classifier or in conjunction with existing taxonomic classifiers for more robust classification of metagenomic sequences. Availability and implementation: The software has been made available at https://github.com/djburks/SMM. Supplementary information: Supplementary data are available at Bioinformatics online.
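As a rough illustration of the kind of model the abstract refers to, the sketch below trains one fixed-order Markov model per reference class and assigns a fragment to the class with the highest log-likelihood. It is a minimal sketch, not the SMM implementation: the function names (`train_markov`, `log_likelihood`, `classify`), the add-one pseudocount, and the toy training strings are all assumptions made for this example, and a realistic higher-order model (9th order or above) would need far more training data and a more compact count table.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet; other symbols are not handled specially


def train_markov(sequences, k, pseudocount=1.0):
    """Count how often each base follows each length-k context."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1.0
    return {"k": k, "counts": counts, "pseudocount": pseudocount}


def log_likelihood(model, seq):
    """Sum of log P(base | previous k bases) with additive smoothing."""
    k, counts, a = model["k"], model["counts"], model["pseudocount"]
    total = 0.0
    for i in range(k, len(seq)):
        ctx = counts.get(seq[i - k:i], {})  # unseen context -> empty counts
        num = ctx.get(seq[i], 0.0) + a
        den = sum(ctx.values()) + a * len(ALPHABET)
        total += math.log(num / den)
    return total


def classify(models, seq):
    """Return the name of the model under which seq is most likely."""
    return max(models, key=lambda name: log_likelihood(models[name], seq))


# Toy usage with hypothetical two-class training data (2nd-order models).
models = {
    "A": train_markov(["ACACACACACACACACACAC"], k=2),
    "B": train_markov(["GTGTGTGTGTGTGTGTGTGT"], k=2),
}
label = classify(models, "ACACACAC")
```

The number of possible contexts grows as 4^k, which is one reason the abstract ties the usefulness of higher orders to the availability of training data and computing power.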

Health assessment and prognostics based on higher‐order hidden semi‐Markov models

Naval Research Logistics (NRL) ◽

10.1002/nav.21947 ◽

2020 ◽

Author(s):

Ying Liao ◽

Yisha Xiang ◽

Min Wang

Keyword(s):

Markov Models ◽

Health Assessment ◽

Higher Order

A framework for space-efficient variable-order Markov models

Bioinformatics ◽

10.1093/bioinformatics/btz268 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4607-4616

Author(s):

Fabio Cunial ◽

Jarno Alanko ◽

Djamal Belazzougui

Keyword(s):

Language Processing ◽

Data Structures ◽

Markov Models ◽

Biological Properties ◽

Specific Model ◽

Suffix Array ◽

Training Data ◽

Supplementary Information ◽

Variable Order ◽

Scoring Functions

Abstract. Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible. Results: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov models, that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations, and that take up to four times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to ten times less space than previous trie-based representations, or more, while matching the size of related, state-of-the-art data structures from Natural Language Processing. We describe how to further compress our indexes to a quantity related to the redundancy of the training data, saving up to 90% of their space on very repetitive datasets, and making them become up to 60 times smaller than previous implementations based on the suffix array. Finally, we show how to exploit constraints on the length and frequency of contexts to further shrink our compressed indexes to half of their size or more, achieving data structures that are a hundred times smaller than previous implementations based on the suffix array, or more. This allows variable-order Markov models to be used with bigger datasets and with longer contexts on the same hardware, thus possibly enabling new applications. Availability and implementation: https://github.com/jnalanko/VOMM. Supplementary information: Supplementary data are available at Bioinformatics online.

Block variable order step size method for solving higher order orbital problems

10.1063/1.5012174 ◽

2017 ◽

Author(s):

Ahmad Fadly Nurullah Rasedee ◽

Hazizah Mohd Ijam ◽

Mohammad Hasan Abdul Sathar ◽

Norizarina Ishak ◽

Muhamad Azrin Nazri ◽

...

Keyword(s):

Higher Order ◽

Variable Order ◽

Step Size ◽

Orbital Problems

Free Vibration Response of Thin and Thick Nonhomogeneous Shells by Refined One-Dimensional Analysis

Journal of Vibration and Acoustics ◽

10.1115/1.4028127 ◽

2014 ◽

Vol 136 (6) ◽

Cited By ~ 7

Author(s):

Alberto Varello ◽

Erasmo Carrera

Keyword(s):

Finite Element ◽

Free Vibration ◽

Cross Section ◽

Vibration Analysis ◽

Plane Deformation ◽

Free Vibration Analysis ◽

Higher Order ◽

Variable Order ◽

Vibrational Modes ◽

One Dimensional

The free vibration analysis of thin- and thick-walled layered structures via a refined one-dimensional (1D) approach is addressed in this paper. Carrera unified formulation (CUF) is employed to introduce higher-order 1D models with a variable order of expansion for the displacement unknowns over the cross section. Classical Euler–Bernoulli (EBBM) and Timoshenko (TBM) beam theories are obtained as particular cases. Different kinds of vibrational modes with increasing half-wave numbers are investigated for short and relatively short cylindrical shells with different cross section geometries and laminations. Numerical results of natural frequencies and modal shapes are provided by using the finite element method (FEM), which permits various boundary conditions to be handled with ease. The analyses highlight that the refinement of the displacement field by means of higher-order terms is fundamental especially to capture vibrational modes that require warping and in-plane deformation to be detected. Classical beam models are not able to predict the realistic dynamic behavior of shells. Comparisons with three-dimensional elasticity solutions and solid finite element solutions prove that CUF provides accuracy in the free vibration analysis of even short, nonhomogeneous thin- and thick-walled shell structures, despite its 1D approach. The results clearly show that bending, radial, axial, and also shell lobe-type modes can be accurately evaluated by variable kinematic 1D CUF models with a remarkably lower computational effort compared to solid FE models.

Predictive Channel Access in Cognitive Radio Networks Based on Variable Order Markov Models

2011 IEEE Global Telecommunications Conference - GLOBECOM 2011 ◽

10.1109/glocom.2011.6133706 ◽

2011 ◽

Cited By ~ 6

Author(s):

C. Devanarayana ◽

A. S. Alfa

Keyword(s):

Cognitive Radio ◽

Cognitive Radio Networks ◽

Markov Models ◽

Radio Networks ◽

Variable Order ◽

Channel Access

Massively parallel structured direct solver for equations describing time-harmonic qP-polarized waves in TTI media

Geophysics ◽

10.1190/geo2011-0163.1 ◽

2012 ◽

Vol 77 (3) ◽

pp. T69-T82 ◽

Cited By ~ 15

Author(s):

Shen Wang ◽

Jianlin Xia ◽

Maarten V. de Hoop ◽

Xiaoye S. Li

Keyword(s):

Finite Difference ◽

Higher Order ◽

Difference Schemes ◽

Massively Parallel ◽

Finite Difference Schemes ◽

Variable Order ◽

Nested Dissection ◽

Partial Differential ◽

Polarized Waves ◽

Time Harmonic

We considered the discretization and approximate solutions of equations describing time-harmonic qP-polarized waves in 3D inhomogeneous anisotropic media. The anisotropy comprises general (tilted) transversely isotropic symmetries. We are concerned with solving these equations for a large number of different sources. We considered higher-order partial differential equations and variable-order finite-difference schemes to accommodate anisotropy on the one hand and allow higher-order accuracy — to control sampling rates for relatively high frequencies — on the other hand. We made use of a nested dissection based domain decomposition in a massively parallel multifrontal solver combined with hierarchically semiseparable matrix compression techniques. The higher-order partial differential operators and the variable-order finite-difference schemes require the introduction of separators with variable thickness in the nested dissection; the development of these and their integration with the multifrontal solver is the main focus of our study. The algorithm that we developed is a powerful tool for anisotropic full-waveform inversion.

A framework for space-efficient variable-order Markov models

10.1101/443101 ◽

2018 ◽

Author(s):

Fabio Cunial ◽

Jarno Alanko ◽

Djamal Belazzougui

Keyword(s):

Language Processing ◽

Data Structures ◽

Markov Models ◽

Biological Properties ◽

Specific Model ◽

Suffix Array ◽

Training Data ◽

Variable Order ◽

Scoring Functions ◽

Smoothing Methods

Abstract. Motivation: Markov models with contexts of variable length are widely used in bioinformatics for representing sets of sequences with similar biological properties. When models contain many long contexts, existing implementations are either unable to handle genome-scale training datasets within typical memory budgets, or they are optimized for specific model variants and are thus inflexible. Results: We provide practical, versatile representations of variable-order Markov models and of interpolated Markov models, that support a large number of context-selection criteria, scoring functions, probability smoothing methods, and interpolations, and that take up to 4 times less space than previous implementations based on the suffix array, regardless of the number and length of contexts, and up to 10 times less space than previous trie-based representations, or more, while matching the size of related, state-of-the-art data structures from Natural Language Processing. We describe how to further compress our indexes to a quantity related to the redundancy of the training data, saving up to 90% of their space on repetitive datasets, and making them become up to 60 times smaller than previous implementations based on the suffix array. Finally, we show how to exploit constraints on the length and frequency of contexts to further shrink our compressed indexes to half of their size or more, achieving data structures that are 100 times smaller than previous implementations based on the suffix array, or more. This allows variable-order Markov models to be trained on bigger datasets and with longer contexts on the same hardware, thus possibly enabling new applications. Availability and implementation: https://github.com/jnalanko/VOMM

On Prediction Using Variable Order Markov Models

Journal of Artificial Intelligence Research ◽

10.1613/jair.1491 ◽

2004 ◽

Vol 22 ◽

pp. 385-421 ◽

Cited By ~ 174

Author(s):

R. Begleiter ◽

R. El-Yaniv ◽

G. Yona

Keyword(s):

Markov Models ◽

Real Life ◽

Compression Algorithm ◽

Finite Alphabet ◽

Protein Classification ◽

Variable Order ◽

Suffix Trees ◽

Classification Problems ◽

Context Tree ◽

Probabilistic Suffix Trees

This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
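To make the average log-loss criterion concrete, the sketch below scores a test sequence with a deliberately simple variable-order predictor that backs off to the longest previously seen context. This is a simplified stand-in chosen for illustration, not CTW, PPM, PST, or any other algorithm evaluated in the paper. The helper names (`train_vomm`, `predict`, `average_log_loss`), the add-one smoothing, and the toy sequences are assumptions. The loss computed is the average of -log2 P(x_i | x_1 ... x_{i-1}) over the test sequence, in bits per symbol.

```python
import math
from collections import defaultdict


def train_vomm(seq, max_order):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for d in range(min(max_order, i) + 1):
            counts[seq[i - d:i]][seq[i]] += 1
    return counts


def predict(counts, history, symbol, alphabet, max_order):
    """P(symbol | history) using the longest suffix of history seen in training."""
    for d in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - d:]
        if ctx in counts:
            total = sum(counts[ctx].values())
            # Add-one smoothing keeps every symbol's probability nonzero.
            return (counts[ctx].get(symbol, 0) + 1) / (total + len(alphabet))
    return 1.0 / len(alphabet)  # no training data at all: uniform guess


def average_log_loss(counts, test, alphabet, max_order):
    """Mean of -log2 P(x_i | x_1..x_{i-1}) over the test sequence."""
    loss = 0.0
    for i, symbol in enumerate(test):
        p = predict(counts, test[:i], symbol, alphabet, max_order)
        loss -= math.log2(p)
    return loss / len(test)


# Toy usage: a model trained on an alternating sequence predicts further
# alternation well and a non-alternating sequence poorly.
alphabet = "ab"
counts = train_vomm("abababababab", max_order=2)
loss_similar = average_log_loss(counts, "abababab", alphabet, max_order=2)
loss_different = average_log_loss(counts, "aaaabbbb", alphabet, max_order=2)
```

A lower average log-loss means the predictor assigned higher probability to the symbols that actually occurred, which is also why the paper can treat any lossless compressor as a predictor: fewer bits per symbol corresponds to better prediction.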
