Fast haplotype matching in very large cohorts using the Li and Stephens model

Mapping Intimacies ◽

10.1101/048280 ◽

2016 ◽

Cited By ~ 6

Author(s):

Gerton Lunter

Keyword(s):

Data Structure ◽

Exact Algorithm ◽

Highly Efficient ◽

Pattern Of Variation ◽

Burrows Wheeler Transform ◽

Reference Cohort

Abstract:

The Li and Stephens model, which approximates the coalescent describing the pattern of variation in a population, underpins a range of key tools and results in genetics. Although highly efficient compared to the coalescent, standard implementations of this model still cannot deal with the very large reference cohorts that are starting to become available, and practical implementations use heuristics to achieve reasonable runtimes. Here I describe a new, exact algorithm (“fastLS”) that implements the Li and Stephens model and achieves runtimes independent of the size of the reference cohort. Key to achieving this runtime is the use of the Burrows-Wheeler transform, which allows the algorithm to efficiently identify partial haplotype matches across a cohort. I show that the proposed data structure is very similar to, and generalizes, Durbin’s positional Burrows-Wheeler transform.
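
For orientation, a standard (non-fastLS) computation under the Li and Stephens model can be sketched as the textbook forward algorithm of a haploid copying model. This is a minimal sketch, not fastLS and not necessarily the quantity fastLS computes: it assumes a constant switch probability `rho` between adjacent sites and a constant mismatch probability `mu` (practical implementations typically use site-specific rates derived from a genetic map), and the function names are hypothetical. Each site costs O(K) for K reference haplotypes, so total work grows linearly with cohort size, which is the dependence the abstract sets out to remove. A brute-force sum over all copying paths is included as a check for tiny inputs.

```python
import math
from itertools import product
from typing import List, Sequence

Haplotype = Sequence[int]  # alleles coded as 0/1


def li_stephens_loglik(query: Haplotype, panel: List[Haplotype],
                       rho: float, mu: float) -> float:
    """Log-likelihood of `query` under a haploid Li and Stephens copying model.

    Illustrative assumptions: constant switch probability `rho`, constant
    mismatch probability `mu`, uniform initial copying state.
    """
    K, M = len(panel), len(query)

    def emit(j: int, m: int) -> float:
        # Probability of observing query[m] while copying haplotype j.
        return 1.0 - mu if panel[j][m] == query[m] else mu

    # Start by copying any of the K haplotypes with probability 1/K.
    alpha = [emit(j, 0) / K for j in range(K)]
    total = sum(alpha)
    loglik = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow

    for m in range(1, M):
        # Transition P(j | i) = (1 - rho) * [i == j] + rho / K. Since alpha is
        # normalised to sum to 1, the switch term is identical for every j,
        # so each site costs O(K) instead of O(K^2).
        switch = rho / K
        alpha = [emit(j, m) * ((1.0 - rho) * alpha[j] + switch)
                 for j in range(K)]
        total = sum(alpha)
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik


def brute_force_loglik(query: Haplotype, panel: List[Haplotype],
                       rho: float, mu: float) -> float:
    """Sum over all K**M copying paths; only feasible for tiny inputs."""
    K, M = len(panel), len(query)
    total = 0.0
    for path in product(range(K), repeat=M):
        p = 1.0 / K
        for m, j in enumerate(path):
            if m > 0:
                p *= (1.0 - rho) * (path[m - 1] == j) + rho / K
            p *= 1.0 - mu if panel[j][m] == query[m] else mu
        total += p
    return math.log(total)


panel = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
query = [0, 1, 0, 1]
print(li_stephens_loglik(query, panel, rho=0.1, mu=0.01))
print(brute_force_loglik(query, panel, rho=0.1, mu=0.01))  # should agree
```

The normalisation step is what keeps the per-site cost linear in K; without it, long haplotypes would underflow double precision.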

A highly efficient exact algorithm for the uncapacitated multiple allocation p-hub center problem

Decision Science Letters ◽

10.5267/j.dsl.2019.12.001 ◽

2020 ◽

pp. 181-192 ◽

Cited By ~ 1

Author(s):

Nader Ghaffarinasab

Keyword(s):

Exact Algorithm ◽

Center Problem ◽

Highly Efficient ◽

Multiple Allocation

d-PBWT: dynamic positional Burrows-Wheeler transform

10.1101/2020.01.14.906487 ◽

2020 ◽

Author(s):

Ahsan Sanaullah ◽

Degui Zhi ◽

Shaojie Zhang

Keyword(s):

Data Structure ◽

Time Complexity ◽

Linear Time ◽

Genotype Imputation ◽

Worst Case ◽

Average Case ◽

Insertion And Deletion ◽

Static Data ◽

Efficient Retrieval ◽

Burrows Wheeler Transform

Abstract:

Durbin’s PBWT, a scalable data structure for haplotype matching, has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all long segments shared among individuals in the panel (long matches) and efficient queries of an external haplotype against the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel. Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, in which the reverse prefix sorting at each position is represented by linked lists. We developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT supports all algorithms of PBWT. In doing so, we systematically investigated variations of set-maximal match and long-match query algorithms: while they all have average-case time complexity independent of database size, they differ in worst-case complexity, in whether their running time is linear in the size of the genome, and in their dependency on additional data structures.
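
The static structure being generalized can be illustrated by a sketch of PBWT prefix and divergence array construction, following the single-pass scheme described by Durbin. This is a minimal sketch assuming biallelic 0/1 haplotypes held in Python lists and 0-based site indices; `build_pbwt` is a hypothetical name. A[k] lists haplotype indices sorted by reversed prefix (site k-1 first, then k-2, and so on), and D[k][i] is the first site of the longest match ending at site k-1 between A[k][i] and A[k][i-1], so k - D[k][i] is that match's length.

```python
from typing import List, Tuple

Panel = List[List[int]]  # panel[h][k] is the allele (0/1) of haplotype h at site k


def build_pbwt(panel: Panel) -> Tuple[List[List[int]], List[List[int]]]:
    """Return prefix arrays A[k] and divergence arrays D[k] for k = 0..N."""
    M = len(panel)
    N = len(panel[0]) if M else 0
    a = list(range(M))  # a_0: empty prefixes, keep input order
    d = [0] * M         # d_0: every (empty) match starts at site 0
    A, D = [a], [d]
    for k in range(N):
        zeros, ones = [], []      # haplotypes with allele 0 / 1 at site k
        d_zeros, d_ones = [], []
        p = q = k + 1             # k + 1 encodes an empty match ending at site k
        for h, div in zip(a, d):
            # The match with the previous member of the same partition starts
            # at the largest divergence seen since that member was emitted.
            p = max(p, div)
            q = max(q, div)
            if panel[h][k] == 0:
                zeros.append(h)
                d_zeros.append(p)
                p = 0
            else:
                ones.append(h)
                d_ones.append(q)
                q = 0
        # Stable partition: order is now by (x[k], x[k-1], ..., x[0]).
        a = zeros + ones
        d = d_zeros + d_ones
        A.append(a)
        D.append(d)
    return A, D


panel = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 1], [0, 1, 0, 1]]
A, D = build_pbwt(panel)
k = 4
for i in range(1, len(panel)):
    # Neighbours in A[k] and the length of their match ending at site k-1.
    print(A[k][i - 1], A[k][i], k - D[k][i])
```

In this example haplotypes 0 and 3 are identical, so they become neighbours in A[4] and their match spans all four sites. Scanning neighbours in this way is what lets long matches be found without comparing every pair of haplotypes.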

Parallel Adaptive Mesh Refinement Combined with Additive Multigrid for the Efficient Solution of the Poisson Equation

ISRN Applied Mathematics ◽

10.5402/2012/246491 ◽

2012 ◽

Vol 2012 ◽

pp. 1-24 ◽

Cited By ~ 1

Author(s):

Hua Ji ◽

Fue-Sang Lien ◽

Eugene Yee

Keyword(s):

Data Structure ◽

Poisson Equation ◽

Adaptive Mesh Refinement ◽

Multigrid Method ◽

Mesh Refinement ◽

Adaptive Mesh ◽

Space Filling Curve ◽

Grid Partitioning ◽

Highly Efficient ◽

The Poisson Equation

Three different speed-up methods (viz., additive multigrid method, adaptive mesh refinement (AMR), and parallelization) have been combined in order to provide a highly efficient parallel solver for the Poisson equation. Rather than using an ordinary tree data structure to organize the information on the adaptive Cartesian mesh, a modified form of the fully threaded tree (FTT) data structure is used. The Hilbert space-filling curve (SFC) approach has been adopted for dynamic grid partitioning (resulting in a partitioning that is near optimal with respect to load balancing on a parallel computational platform). Finally, an additive multigrid method (BPX preconditioner), which itself is parallelizable to a certain extent, has been used to solve the linear equation system arising from the discretization. Our numerical experiments show that the proposed parallel AMR algorithm based on the FTT data structure, Hilbert SFC for grid partitioning, and additive multigrid method is highly efficient.
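
To make the partitioning step concrete, here is a minimal sketch of Hilbert-curve ordering used for load balancing. It is a simplification under stated assumptions: the paper partitions an adaptive mesh stored in a fully threaded tree, whereas this sketch uses a uniform n x n grid (n a power of two), treats every cell as equal work, and uses a hypothetical `partition_cells` helper that cuts the curve into contiguous chunks. The index computation follows the widely used iterative rotate-and-reflect formulation of the Hilbert curve.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid.

    n must be a power of two. At each scale s the quadrant contributes
    s*s*quadrant_rank; the coordinates are then mapped into that quadrant's
    local frame so the next, finer scale can be processed the same way.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:          # quadrants with ry == 0 are transposed...
            if rx == 1:      # ...and the (rx=1, ry=0) one is also reflected
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def partition_cells(n: int, parts: int) -> List[List[Cell]]:
    """Cut the Hilbert ordering of an n x n grid into `parts` contiguous
    chunks whose sizes differ by at most one cell."""
    cells = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda c: hilbert_index(n, c[0], c[1]))
    size, extra = divmod(len(cells), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(cells[start:end])
        start = end
    return chunks


for rank, chunk in enumerate(partition_cells(4, 3)):
    print(rank, chunk)
```

Because consecutive Hilbert indices are always face-adjacent cells, each contiguous chunk tends to be a compact region, which keeps the boundary shared between processors (and hence communication) small while balancing cell counts.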

MD-tree: a balanced hierarchical data structure for multidimensional data with highly efficient dynamic characteristics

[1988 Proceedings] 9th International Conference on Pattern Recognition ◽

10.1109/icpr.1988.28247 ◽

2003 ◽

Cited By ~ 1

Author(s):

Y. Nakamura ◽

S. Abe ◽

Y. Ohsawa ◽

M. Sakauchi

Keyword(s):

Data Structure ◽

Dynamic Characteristics ◽

Multidimensional Data ◽

Hierarchical Data ◽

Hierarchical Data Structure ◽

Highly Efficient

A balanced hierarchical data structure for multidimensional data with highly efficient dynamic characteristics

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/69.234779 ◽

1993 ◽

Vol 5 (4) ◽

pp. 682-694 ◽

Cited By ~ 22

Author(s):

Y. Nakamura ◽

S. Abe ◽

Y. Ohsawa ◽

M. Sakauchi

Keyword(s):

Data Structure ◽

Dynamic Characteristics ◽

Multidimensional Data ◽

Hierarchical Data ◽

Hierarchical Data Structure ◽

Highly Efficient

Highly Efficient Computer Oriented Octree Data Structure and Neighbours Search in 3D GIS

Advances in 3D Geoinformation - Lecture Notes in Geoinformation and Cartography ◽

10.1007/978-3-319-25691-7_16 ◽

2016 ◽

pp. 285-303

Author(s):

Noraidah Keling ◽

Izham Mohamad Yusoff ◽

Habibah Lateh ◽

Uznir Ujang

Keyword(s):

Data Structure ◽

3D Gis ◽

Highly Efficient

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

10.1101/472423 ◽

2018 ◽

Author(s):

Alan Kuhnle ◽

Taher Mun ◽

Christina Boucher ◽

Travis Gagie ◽

Ben Langmead ◽

...

Keyword(s):

Data Structure ◽

State Of The Art ◽

Suffix Array ◽

Genomic Databases ◽

Run Length ◽

Slowing Down ◽

Human Genomes ◽

Efficient Construction ◽

Main Components ◽

Burrows Wheeler Transform

Abstract:

While short read aligners, which predominantly use the FM-index, are able to easily index one or a few human genomes, they do not scale well to indexing databases containing thousands of genomes. To understand why, it helps to examine the two main components of the FM-index in more detail: first, a rank data structure over the Burrows-Wheeler Transform (BWT) of the string, which allows us to find the interval in the string’s suffix array (SA) containing pointers to the starting positions of occurrences of a given pattern; second, a sample of the SA that, when used with the rank data structure, allows us to access the SA. The rank data structure can be kept small even for large genomic databases by run-length compressing the BWT, but until recently no method was known to keep the SA sample small without greatly slowing down access to the SA. Now that Gagie et al. (SODA 2018) have defined an SA sample that takes about the same space as the run-length compressed BWT, we have the design for efficient FM-indexes of genomic databases but are faced with the problem of building them. In 2018 we showed how to build the BWT of large genomic databases efficiently (WABI 2018), but the problem of building Gagie et al.’s SA sample efficiently was left open. We compare our approach to state-of-the-art methods for constructing the SA sample and demonstrate that it is the fastest and most space-efficient method on highly repetitive genomic databases. Lastly, we apply our method to indexing partial and whole human genomes and show that it improves over Bowtie with respect to both memory and time.

Availability: The implementation of our methods can be found at https://github.com/alshai/r-index.
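
The two FM-index components described above can be sketched as a toy backward search. This sketch assumes a text terminated by a unique, lexicographically smallest sentinel `$`, builds the suffix array by naive sorting, answers rank queries from a full occurrence table, and keeps the full SA rather than any compressed sample (Gagie et al.’s SA sample is not implemented here). It also counts the BWT runs r, the quantity that run-length compression exploits on repetitive collections. All names are illustrative.

```python
from typing import Dict, List, Tuple


def suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes; fine for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text: str, sa: List[int]) -> str:
    # BWT[i] is the character preceding suffix sa[i]; for sa[i] == 0,
    # text[-1] is the sentinel.
    return "".join(text[i - 1] for i in sa)


class FMIndex:
    def __init__(self, text: str) -> None:
        assert text.endswith("$") and text.count("$") == 1
        self.sa = suffix_array(text)
        self.bwt = bwt_from_sa(text, self.sa)
        # C[c] = number of characters in the text smaller than c.
        self.C: Dict[str, int] = {}
        total = 0
        for c in sorted(set(text)):
            self.C[c] = total
            total += text.count(c)
        # occ[c][i] = occurrences of c in bwt[:i] (the rank structure).
        self.occ: Dict[str, List[int]] = {c: [0] for c in self.C}
        for ch in self.bwt:
            for c, counts in self.occ.items():
                counts.append(counts[-1] + (ch == c))

    def sa_interval(self, pattern: str) -> Tuple[int, int]:
        """Half-open SA interval [lo, hi) of suffixes that start with pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):  # backward search: last character first
            if c not in self.C:
                return 0, 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return lo, lo
        return lo, hi

    def locate(self, pattern: str) -> List[int]:
        # Resolve the SA interval to text positions using the (full) SA.
        lo, hi = self.sa_interval(pattern)
        return sorted(self.sa[lo:hi])


def count_runs(s: str) -> int:
    # r = number of maximal runs of equal characters.
    return sum(1 for i, ch in enumerate(s) if i == 0 or ch != s[i - 1])


fm = FMIndex("abracadabra$")
print(fm.bwt, count_runs(fm.bwt))
print(fm.locate("abra"))
```

Counting occurrences needs only the rank structure (hi - lo); reporting positions is the step that needs SA access, which is why the size of the SA sample matters for large databases.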

d-PBWT: dynamic positional Burrows-Wheeler transform

Bioinformatics ◽

10.1093/bioinformatics/btab117 ◽

2021 ◽

Author(s):

Ahsan Sanaullah ◽

Degui Zhi ◽

Shaojie Zhang

Keyword(s):

Data Structure ◽

Genotype Imputation ◽

Supplementary Information ◽

Worst Case ◽

Average Case ◽

Insertion And Deletion ◽

Static Data ◽

Efficient Retrieval ◽

Dynamic Data Structure ◽

Burrows Wheeler Transform

Abstract:

Motivation: Durbin’s positional Burrows-Wheeler transform (PBWT) is a scalable data structure for haplotype matching. It has been successfully applied to identical-by-descent (IBD) segment identification and genotype imputation. Once the PBWT of a haplotype panel is constructed, it supports efficient retrieval of all long segments shared among individuals in the panel (long matches) and efficient queries of an external haplotype against the panel. However, the standard PBWT is an array-based static data structure and does not support dynamic updates of the panel.

Results: Here, we generalize the static PBWT to a dynamic data structure, d-PBWT, in which the reverse prefix sorting at each position is stored in linked lists. We also developed efficient algorithms for insertion and deletion of individual haplotypes. In addition, we verified that d-PBWT supports all algorithms of PBWT. In doing so, we systematically investigated variations of set-maximal match and long-match query algorithms: while they all have average-case time complexity independent of database size, they have different worst-case complexities and dependencies on additional data structures.

Availability: The benchmarking code is available at genome.ucf.edu/d-PBWT.

Supplementary information: Supplementary materials are available at Bioinformatics online.
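
The idea behind inserting a haplotype can be illustrated with a simplified sketch. It is not the d-PBWT algorithm: d-PBWT stores each column’s ordering in linked lists with pointers between columns, whereas the hypothetical `insert_haplotype` below uses plain Python lists, so each insertion costs O(M) per site. What it does show is that a new haplotype’s position in column k+1 follows from its position in column k by the same stable-partition rule that builds the static PBWT, so existing orderings need only one element spliced in per column. It assumes a non-empty panel of 0/1 haplotypes of equal length.

```python
from typing import List

Panel = List[List[int]]  # panel[h][k]: allele (0/1) of haplotype h at site k


def prefix_arrays(panel: Panel) -> List[List[int]]:
    """Static construction: a_{k+1} is a stable partition of a_k by allele k."""
    a = list(range(len(panel)))
    arrays = [a]
    for k in range(len(panel[0]) if panel else 0):
        a = [h for h in a if panel[h][k] == 0] + [h for h in a if panel[h][k] == 1]
        arrays.append(a)
    return arrays


def insert_haplotype(panel: Panel, arrays: List[List[int]], z: List[int]) -> None:
    """Append z to the panel and splice its index into every prefix array."""
    new_id = len(panel)
    panel.append(z)
    pos = len(arrays[0])  # empty prefix: place the new haplotype last
    arrays[0].insert(pos, new_id)
    for k, allele in enumerate(z):
        col = arrays[k]  # already contains new_id at index pos
        zeros_before = sum(1 for h in col[:pos] if panel[h][k] == 0)
        if allele == 0:
            nxt = zeros_before  # among the zeros, after those preceding it
        else:
            # All zeros come first; z has allele 1 so it is not counted here.
            zeros_total = sum(1 for h in col if panel[h][k] == 0)
            nxt = zeros_total + (pos - zeros_before)
        arrays[k + 1].insert(nxt, new_id)
        pos = nxt


panel = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 1]]
arrays = prefix_arrays(panel)
insert_haplotype(panel, arrays, [0, 1, 0, 1])
# The spliced arrays match a rebuild of the enlarged panel from scratch.
assert arrays == prefix_arrays(panel)
```

The costly parts here are the list splice and the counting scans; replacing them with linked-list nodes and precomputed cross-column pointers is, roughly, what makes dynamic updates efficient in a structure like d-PBWT.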

Highly efficient and selective photocatalytic CO2 to CO conversion in aqueous solution

Chemical Communications ◽

10.1039/d0cc00879f ◽

2020 ◽

Vol 56 (27) ◽

pp. 3851-3854 ◽

Cited By ~ 6

Author(s):

Xiaomin Chai ◽

Hai-Hua Huang ◽

Huiping Liu ◽

Zhuofeng Ke ◽

Wen-Wen Yong ◽

...

Keyword(s):

Aqueous Solution ◽

Photocatalytic Performance ◽

Aqueous Media ◽

Highly Efficient ◽

Co Conversion

A Co-based complex displayed the highest photocatalytic performance for CO2 to CO conversion in aqueous media.

Tuning the pore structures and photocatalytic properties of a 2D covalent organic framework with multi-branched photoactive moieties

Nanoscale ◽

10.1039/d0nr02994g ◽

2020 ◽

Vol 12 (30) ◽

pp. 16136-16142

Author(s):

Xuan Wang ◽

Ming-Jie Dong ◽

Chuan-De Wu

Keyword(s):

Photocatalytic Properties ◽

Effective Strategy ◽

Covalent Organic Framework ◽

Highly Efficient ◽

Pore Structures ◽

Local Connection ◽

Organic Framework

An effective strategy for highly efficient photocatalysis was developed, which incorporates accessible metalloporphyrin photoactive sites into 2D COFs by establishing a 3D local connection.