Short-range template switching in great ape genomes explored using a pair hidden Markov model

2020 ◽  
Author(s):  
Conor R. Walker ◽  
Aylwyn Scally ◽  
Nicola De Maio ◽  
Nick Goldman

Many complex genomic rearrangements arise through template switch errors, which occur in DNA replication when there is a transient polymerase switch to an alternate template nearby in three-dimensional space. While typically investigated at kilobase-to-megabase scales, the genomic and evolutionary consequences of this mutational process are not well characterised at smaller scales, where they are often interpreted as clusters of independent substitutions, insertions and deletions. Here we present an improved statistical approach using pair hidden Markov models, and use it to detect and describe short-range template switches underlying clusters of mutations in the multi-way alignment of hominid genomes. Using robust statistics derived from evolutionary genomic simulations, we show that template switch events have been widespread in the evolution of the great apes’ genomes and provide a parsimonious explanation for the presence of many complex mutation clusters in their phylogenetic context. Larger-scale mechanisms of genome rearrangement are typically associated with structural features around breakpoints, and accordingly we show that atypical patterns of secondary structure formation and DNA bending are present at the initial template switch loci. Our methods improve on previous non-probabilistic approaches for computational detection of template switch mutations, allowing the statistical significance of events to be assessed. By specifying realistic evolutionary parameters based on the genomes and taxa involved, our methods can be readily adapted to other intra- or inter-species comparisons.

PLoS Genetics ◽  
2021 ◽  
Vol 17 (3) ◽  
pp. e1009221
Author(s):  
Conor R. Walker ◽  
Aylwyn Scally ◽  
Nicola De Maio ◽  
Nick Goldman



2010 ◽  
Vol 19 (01) ◽  
pp. 121-131 ◽  
Author(s):  
YAN PAN ◽  
YONG TANG ◽  
YE-MIN LUO ◽  
LU-XIAN LIN ◽  
GUI-BIN WU

Recently, Question Answering has been a hot topic in information retrieval research. Question Classification plays a critical role in most Question Answering systems. In this paper, a new approach to classifying questions using Profile Hidden Markov Models (PHMMs) is proposed. The generalization strategies used to extract the pattern instances of questions by selective substitution are discussed. The classification method based on the structural features of the pattern instances is then investigated. Experimental results show that the PHMM-based question classifier reaches an accuracy of 92.2% and significantly outperforms most of the state-of-the-art systems.


2016 ◽  
Author(s):  
Ari Löytynoja ◽  
Nick Goldman

Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model of template switching during replication that extends existing models of genome rearrangement, and used this to study the role of template switch events in the origin of such mutation clusters. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor, and hundreds of events between two independently sequenced human genomes. While many of these are consistent with the template switch mechanism previously proposed for bacteria but not thought significant in higher organisms, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. The local template switch process can create numerous complex mutation patterns, including hairpin loop structures, and explains multi-nucleotide mutations and compensatory substitutions without invoking positive selection, complicated and speculative mechanisms, or implausible coincidence. Clustered sequence differences are challenging for mapping and variant calling methods, and we show that detection of mutation clusters with current resequencing methodologies is difficult and many erroneous variant annotations exist in human reference data. Template switch events such as those we have uncovered may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into reference-based analysis pipelines and comparisons of de novo-assembled genomes will lead to improved understanding of genome variation and evolution.


2016 ◽  
Vol 26 (06) ◽  
pp. 1650036 ◽  
Author(s):  
Kostas Michalopoulos ◽  
Michalis Zervakis ◽  
Marie-Pierre Deiber ◽  
Nikolaos Bourbakis

We present a novel methodology for the spatio-temporal analysis of single electroencephalogram (EEG) trials. The methodology combines the Local Global Graph (LG graph), which characterizes the structural features of the EEG topography as a global descriptor for robust comparison of dominant topographies (microstates), with Hidden Markov Models (HMMs), which model the topographic sequence. In particular, the LG graph descriptor defines similarity and distance measures that can be used to compare the extracted LG graphs in the presence of noise. In addition, hidden states represent periods of stationary distribution of topographies that constitute the equivalent of the microstates in the model. The transitions between the different microstates and the syntactic patterns they form can reveal differences in the processing of the input stimulus between different pathologies. We train the HMM to learn the transitions between the different microstates and to express the syntactic patterns that appear in the single trials in a compact and efficient way. We applied this methodology to single trials from normal subjects and patients with Progressive Mild Cognitive Impairment (PMCI) to discriminate these two groups. The classification results show that this approach can efficiently discriminate between control and PMCI single trials. Results indicate that HMMs provide physiologically meaningful results that can be used in the syntactic analysis of Event Related Potentials.


2021 ◽  
Author(s):  
Alfred M. Lentzsch ◽  
Jennifer L. Stamos ◽  
Jun Yao ◽  
Rick Russell ◽  
Alan M. Lambowitz

Reverse transcriptases (RTs) can template switch during cDNA synthesis, enabling them to join discontinuous nucleic acid sequences. Template switching plays crucial roles in retroviral replication and recombination, is used for adapter addition in RNA-seq, and may contribute to retroelement fitness by enabling continuous cDNA synthesis on damaged templates. Here, we determined an X-ray crystal structure of a template-switching complex of a group II intron RT bound simultaneously to an acceptor RNA and donor RNA template/DNA heteroduplex with a 1-nt 3'-DNA overhang. The latter mimics a completed cDNA after non-templated addition (NTA) of a nucleotide complementary to the 3' nucleotide of the acceptor as required for efficient template switching. The structure showed that the 3' end of the acceptor RNA binds in a pocket formed by an N-terminal extension (NTE) present in non-long-terminal-repeat (LTR)-retroelement RTs and the RT fingertips loop, with the 3' nucleotide of the acceptor base paired to the 1-nt 3'-DNA overhang and its penultimate nucleotide base paired to the incoming dNTP at the RT active site. Analysis of structure-guided mutations identified amino acids that contribute to acceptor RNA binding and a phenylalanine near the RT active site that mediates NTA. Mutation of the latter residue decreased multiple sequential template switches in RNA-seq. Our results provide new insights into the mechanisms of template switching and NTA by RTs, suggest how these reactions could be improved for RNA-seq, and reveal common structural features for template switching by non-LTR-retroelement RTs and viral RNA-dependent RNA polymerases.


2015 ◽  
Vol 135 (12) ◽  
pp. 1517-1523 ◽  
Author(s):  
Yicheng Jin ◽  
Takuto Sakuma ◽  
Shohei Kato ◽  
Tsutomu Kunitachi

Author(s):  
M. Vidyasagar

This book explores important aspects of Markov and hidden Markov processes and the applications of these ideas to various problems in computational biology. It starts from first principles, so that no previous knowledge of probability is necessary. However, the work is rigorous and mathematical, making it useful to engineers and mathematicians, even those not interested in biological applications. A range of exercises is provided, including drills to familiarize the reader with concepts and more advanced problems that require deep thinking about the theory. Biological applications are taken from post-genomic biology, especially genomics and proteomics. The topics examined include standard material such as the Perron–Frobenius theorem, transient and recurrent states, hitting probabilities and hitting times, maximum likelihood estimation, the Viterbi algorithm, and the Baum–Welch algorithm. The book contains discussions of extremely useful topics not usually seen at the basic level, such as ergodicity of Markov processes, Markov Chain Monte Carlo (MCMC), information theory, and large deviation theory for both i.i.d. and Markov processes. It also presents state-of-the-art realization theory for hidden Markov models. Among biological applications, it offers an in-depth look at the BLAST (Basic Local Alignment Search Tool) algorithm, including a comprehensive explanation of the underlying theory. Other applications such as profile hidden Markov models are also explored.
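
For readers unfamiliar with the Viterbi algorithm named among the topics above, the following is a minimal sketch of most-likely-path decoding for a discrete hidden Markov model. It is not taken from the book: the function name `viterbi`, its argument layout (plain dictionaries of probabilities), and the two-state toy model in the usage note are all assumptions made for illustration. It works in log space to avoid underflow and expects a non-empty observation sequence.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_prob) for a discrete HMM (illustrative sketch)."""

    def log(p):
        # Treat zero probabilities as log(0) = -inf instead of raising.
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any state path ending in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    backpointers = []  # one dict per time step after the first

    for obs in observations[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor state for reaching s at this step.
            best_prev = max(states, key=lambda r: prev_score[r] + log(trans_p[r][s]))
            score[s] = (prev_score[best_prev]
                        + log(trans_p[best_prev][s])
                        + log(emit_p[s][obs]))
            pointers[s] = best_prev
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

As a usage example under the same assumptions, take a hypothetical model with states `X` and `Y`, uniform start and transition probabilities of 0.5, and emissions where `X` emits `a` with probability 0.9 and `Y` emits `b` with probability 0.9. Calling `viterbi(list("aabb"), ["X", "Y"], start_p, trans_p, emit_p)` then decodes the path `X, X, Y, Y`, because with uniform transitions each position is decided by its emission alone.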

