A High-Throughput Pruning-based Pair-Hidden-Markov-Model Hardware Accelerator for Next-Generation DNA Sequencing

Copy number variation (CNV) is a prevalent kind of genetic structural variation that leads to an abnormal number of copies of large genomic regions, such as the gain or loss of DNA segments larger than 1 kb. CNV occurs not only in the human genome but also in plant genomes. Current research has shown that CNV is associated with many complex diseases. In this paper, guanine-cytosine (GC) bias, mappability, and their effect on read depth signals in sequencing data are discussed first. Subsequently, a new correction method for GC bias and an improved combinatorial detection algorithm for CNV using high-throughput sequencing reads based on a hidden Markov model (CNV-HMM) are proposed. The corrected read depth signals have lower correlation with GC content, mappability of reads, and the width of the analysis window. Then we create a hidden Markov model which maps the reads onto the reference genome and records the unmapped reads. The unmapped reads are counted and normalized. The CNV-HMM detects abnormal read count signals and obtains the candidate CNVs using the expectation maximization (EM) algorithm. Finally, we filter the candidate CNVs using split reads to improve the performance of our algorithm. The experimental results indicate that the CNV-HMM algorithm has higher accuracy and sensitivity for CNV detection than most current detection algorithms.
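The abstract does not spell out its new GC-bias correction, so the following is only a minimal sketch of a commonly used median-ratio correction, not the authors' method. The function name `gc_correct`, the `bin_width` parameter, and the grouping of windows by GC fraction are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def gc_correct(read_depth, gc_content, bin_width=0.05):
    """Rescale per-window read depth so windows with similar GC share a median.

    Hypothetical helper: each window's depth is multiplied by
    (global median depth) / (median depth of windows in the same GC bin).
    This is a generic median-ratio scheme, assumed here for illustration.
    """
    if len(read_depth) != len(gc_content):
        raise ValueError("read_depth and gc_content must have equal length")
    if not read_depth:
        return []

    global_median = median(read_depth)

    # Group window depths by GC bin (bin index = floor(gc / bin_width)).
    groups = defaultdict(list)
    for depth, gc in zip(read_depth, gc_content):
        groups[int(gc / bin_width)].append(depth)
    bin_median = {b: median(depths) for b, depths in groups.items()}

    corrected = []
    for depth, gc in zip(read_depth, gc_content):
        m = bin_median[int(gc / bin_width)]
        # A bin whose median depth is zero carries no usable signal.
        corrected.append(depth * global_median / m if m > 0 else 0.0)
    return corrected


if __name__ == "__main__":
    depths = [10, 10, 20, 20]
    gc = [0.32, 0.33, 0.52, 0.53]
    print(gc_correct(depths, gc))
```

In this toy input the two GC-rich windows have twice the depth of the GC-poor ones; after correction all four windows sit at the global median, which is the kind of reduced GC correlation the abstract describes.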

BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btw044 ◽

2016 ◽

Vol 32 (11) ◽

pp. 1749-1751 ◽

Cited By ~ 184

Author(s):

Vagheesh Narasimhan ◽

Petr Danecek ◽

Aylwyn Scally ◽

Yali Xue ◽

Chris Tyler-Smith ◽

...

Keyword(s):

Next Generation Sequencing ◽

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Model Approach

Validation for Clinical Use of, and Initial Clinical Experience with, a Novel Approach to Population-Based Carrier Screening using High-Throughput, Next-Generation DNA Sequencing

Journal of Molecular Diagnostics ◽

10.1016/j.jmoldx.2013.10.006 ◽

2014 ◽

Vol 16 (2) ◽

pp. 180-189 ◽

Cited By ~ 29

Author(s):

Stephanie Hallam ◽

Heather Nelson ◽

Valerie Greger ◽

Cynthia Perreault-Micale ◽

Jocelyn Davie ◽

...

Keyword(s):

Dna Sequencing ◽

Clinical Experience ◽

High Throughput ◽

Population Based ◽

Carrier Screening ◽

Clinical Use ◽

Next Generation ◽

Next Generation Dna Sequencing ◽

Novel Approach ◽

Initial Clinical Experience

A Hidden Markov Model Approach for Simultaneously Estimating Local Ancestry and Admixture Time Using Next Generation Sequence Data in Samples of Arbitrary Ploidy

PLoS Genetics ◽

10.1371/journal.pgen.1006529 ◽

2017 ◽

Vol 13 (1) ◽

pp. e1006529 ◽

Cited By ~ 48

Author(s):

Russell Corbett-Detig ◽

Rasmus Nielsen

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Data ◽

Hidden Markov ◽

Next Generation ◽

Local Ancestry ◽

Model Approach

A high-throughput analysis pipeline for large next generation DNA sequencing studies

2012 IEEE International Conference on Bioinformatics and Biomedicine ◽

10.1109/bibm.2012.6392638 ◽

2012 ◽

Author(s):

Zayed Albertyn ◽

Jorg Hakenberg ◽

Hongjin Bian ◽

Huifeng Niu ◽

James Cai

Keyword(s):

Dna Sequencing ◽

High Throughput ◽

Next Generation ◽

Analysis Pipeline ◽

High Throughput Analysis ◽

Throughput Analysis ◽

Next Generation Dna Sequencing ◽

Sequencing Studies

A Nonhomogeneous Hidden Markov Model for Gene Mapping Based on Next-Generation Sequencing Data

Journal of Computational Biology ◽

10.1089/cmb.2014.0258 ◽

2015 ◽

Vol 22 (2) ◽

pp. 178-188 ◽

Cited By ~ 7

Author(s):

Fatemeh Zamanzad Ghavidel ◽

Jürgen Claesen ◽

Tomasz Burzykowski

Keyword(s):

Next Generation Sequencing ◽

Markov Model ◽

Gene Mapping ◽

Hidden Markov Model ◽

Hidden Markov ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

A hidden Markov model approach for simultaneously estimating local ancestry and admixture time using next generation sequence data in samples of arbitrary ploidy

10.1101/064238 ◽

2016 ◽

Cited By ~ 1

Author(s):

Russell Corbett-Detig ◽

Rasmus Nielsen

Keyword(s):

Drosophila Melanogaster ◽

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Next Generation ◽

Ongoing Research ◽

Local Ancestry ◽

Outlier Loci ◽

Ancestry Inference ◽

Local Ancestry Inference

Abstract: Admixture, the mixing of genomes from divergent populations, is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have a number of shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require genotypes, which are not feasible to obtain for many next-generation sequencing projects. Third, many methods assume samples are diploid; however, a wide variety of sequencing applications will fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy, i.e. 100 or more chromosomes. As this method is very general, we expect it will be useful for local ancestry inference in a wider variety of populations than has previously been possible. We then applied our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that local recombination rates are negatively correlated with the proportion of African ancestry, suggesting that selection against foreign ancestry is the least efficient in low recombination regions. Finally, we show that clinal outlier loci are enriched for genes associated with gene regulatory functions, consistent with a role of regulatory evolution in ecological adaptation of admixed D. melanogaster populations. Our results illustrate the potential of local ancestry inference for elucidating fundamental evolutionary processes.

Author Summary: When divergent populations hybridize, their offspring obtain portions of their genomes from each parent population. Although the average ancestry proportion in each descendant is equal to the proportion of ancestors from each of the ancestral populations, the contribution of each ancestry type is variable across the genome. Estimating local ancestry within admixed individuals is a fundamental goal for evolutionary genetics, and here we develop a method for doing this that circumvents many of the problems associated with existing methods. Briefly, our method can use short read data rather than genotypes, and can be applied to samples with any number of chromosomes. Furthermore, our method simultaneously estimates local ancestry and the number of generations since admixture, that is, the time at which the two ancestral populations first encountered each other. Finally, in applying our method to data from an admixture zone between ancestral populations of Drosophila melanogaster, we find many lines of evidence consistent with natural selection operating against the introduction of foreign ancestry into populations of one predominant ancestry type. Because of the generality of this method, we expect that it will be useful for a wide variety of existing and ongoing research projects.
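To illustrate how the time since admixture can enter a local ancestry hidden Markov model, here is a minimal sketch of a two-ancestry, single-chromosome transition matrix. It is a textbook-style simplification and not the authors' model, which also handles read pileups and arbitrary ploidy. The function name `ancestry_transition` and its parameters are hypothetical, and the sketch assumes a single pulse of admixture with recombination events arriving at rate (generations × genetic distance).

```python
import math


def ancestry_transition(distance_morgans, generations, prop_a):
    """Transition matrix between ancestry A (index 0) and B (index 1).

    Assumed model: between two adjacent sites, at least one recombination
    event since admixture occurs with probability 1 - exp(-t * d). After
    such an event the ancestry is redrawn, A with probability prop_a and
    B with probability 1 - prop_a.
    """
    p_event = 1.0 - math.exp(-distance_morgans * generations)
    to_a = p_event * prop_a          # probability of landing in A after an event
    to_b = p_event * (1.0 - prop_a)  # probability of landing in B after an event
    return [
        [1.0 - to_b, to_b],
        [to_a, 1.0 - to_a],
    ]


if __name__ == "__main__":
    # Older admixture (larger t) means more switching between ancestry tracts.
    for t in (10, 100, 1000):
        print(t, ancestry_transition(1e-4, t, prop_a=0.3))
```

Under these assumptions, a larger number of generations shortens the expected ancestry tracts, which is why tract structure in the data carries information about admixture time.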

A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2014-0007 ◽

2015 ◽

Vol 14 (1) ◽

Cited By ~ 4

Author(s):

Jürgen Claesen ◽

Tomasz Burzykowski

Keyword(s):

Next Generation Sequencing ◽

Markov Model ◽

Hidden Markov Model ◽

Genetic Markers ◽

High Throughput Sequencing ◽

Transition Probabilities ◽

Hidden Markov ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Generation Sequencing

Abstract: The analysis of polygenic, phenotypic characteristics such as quantitative traits or inheritable diseases requires reliable scoring of many genetic markers covering the entire genome. The advent of high-throughput sequencing technologies provides a new way to evaluate large numbers of single nucleotide polymorphisms as genetic markers. Combining the technologies with pooling of segregants, as performed in bulk segregant analysis, should, in principle, allow the simultaneous mapping of multiple genetic loci present throughout the genome. We propose a hidden Markov-model to analyze the marker data obtained by the bulk segregant next generation sequencing. The model includes several states, each associated with a different probability of observing the same/different nucleotide in an offspring as compared to the parent. The transitions between the molecular markers imply transitions between the states of the model. After estimating the transition probabilities and state-related probabilities of nucleotide (dis)similarity, the most probable state for each SNP is selected. The most probable states can then be used to indicate which genomic regions may be likely to contain trait-related genes. The application of the model is illustrated on the data from a study of ethanol tolerance in yeast. Software is written in R. R-functions, R-scripts and documentation are available on
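The state-assignment step described in this abstract can be sketched with Viterbi decoding. This is an assumption-laden illustration in Python rather than the authors' R software: the abstract says the most probable state is selected for each SNP, which may refer to per-SNP posterior probabilities rather than the single best path computed here. The function name `viterbi` and the parameter `emit_same` (probability that a state yields the same nucleotide as the parent) are hypothetical, and all probabilities are assumed to be already estimated and strictly between 0 and 1.

```python
import math


def viterbi(obs, start_p, trans_p, emit_same):
    """Most likely state path for binary observations.

    obs[i] is 1 if the offspring nucleotide at SNP i matches the parent,
    0 otherwise. emit_same[s] is the probability that state s emits 1.
    All probabilities must be strictly positive (logs are taken).
    """
    if not obs:
        return []
    n_states = len(start_p)

    def log_emit(state, o):
        p = emit_same[state] if o == 1 else 1.0 - emit_same[state]
        return math.log(p)

    # Log-probability of the best path ending in each state at the first SNP.
    score = [math.log(start_p[s]) + log_emit(s, obs[0]) for s in range(n_states)]
    back = []  # back[i][s] = best predecessor of state s at SNP i + 1

    for o in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            cand = [score[r] + math.log(trans_p[r][s]) for r in range(n_states)]
            best_prev = max(range(n_states), key=lambda r: cand[r])
            pointers.append(best_prev)
            new_score.append(cand[best_prev] + log_emit(s, o))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    obs = [1, 1, 1, 0, 0, 0]
    path = viterbi(obs, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.9, 0.2])
    print(path)
```

Runs of SNPs assigned to the same state would then mark candidate genomic regions, in the spirit of the region-level interpretation the abstract describes.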

17.3 GCUPS Pruning-Based Pair-Hidden-Markov-Model Accelerator for Next-Generation DNA Sequencing

Novel Next-Generation DNA Sequencing Techniques for Ultra High-Throughput Applications in Bio-Medicine

Combinatorial Detection Algorithm for Copy Number Variations Using High-throughput Sequencing Reads
