Disease Gene Mapping: Latest Research Papers

Abstract: Relationship estimation and segment detection between individuals are important aspects of disease gene mapping. Existing methods are either tailored for computational efficiency or require phasing to improve accuracy. We developed TRUFFLE, a method that integrates computational techniques and statistical principles for the identification and visualization of identity-by-descent (IBD) segments using unphased data. By skipping the haplotype phasing step and relying instead on a simpler region-based approach, our method is computationally efficient while maintaining inferential accuracy. In addition, an error model corrects for segment break-ups that occur as a consequence of genotyping errors. TRUFFLE can estimate relatedness for 3.1 million pairs from the 1000 Genomes Project data in a few minutes on a typical laptop computer. Consistent with expectation, we identified only three pairs related as second cousins or closer across different populations, while commonly used methods identified a large number of such pairs. Similarly, within populations, we identified far fewer related pairs. Compared to methods relying on phased data, TRUFFLE has comparable accuracy but is drastically faster and produces fewer broken segments. We also identified specific local genomic regions that are commonly shared within populations, suggesting selection. When applied to pedigree data, we observed 99.6% accuracy in detecting 1st to 5th degree relationships. As genomic datasets become much larger, TRUFFLE can enable disease gene mapping through implicit shared haplotypes by accurate IBD segment detection.
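To make the idea of working on unphased data concrete, here is a minimal Python sketch of one well-known phase-free check: two individuals cannot share an allele IBD at a marker where they are opposite homozygotes. The abstract does not describe TRUFFLE's actual procedure or error model, so this is not TRUFFLE's algorithm. The function name `runs_without_opposing_homozygotes`, the 0/1/2 genotype coding (count of the alternate allele), and the `min_markers` threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def runs_without_opposing_homozygotes(
    g1: Sequence[int], g2: Sequence[int], min_markers: int = 3
) -> List[Tuple[int, int]]:
    """Return half-open index intervals [start, end) of consecutive markers
    at which the two unphased genotype vectors are never opposite
    homozygotes (0 versus 2). Such runs are only *candidates* for sharing
    at least one allele IBD; no error tolerance is applied here.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")

    runs = []
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if {a, b} == {0, 2}:  # opposite homozygotes break the run
            if i - start >= min_markers:
                runs.append((start, i))
            start = i + 1
    if len(g1) - start >= min_markers:  # close the final run
        runs.append((start, len(g1)))
    return runs


# Hypothetical toy genotypes for two individuals at eight markers.
person_a = [0, 1, 2, 2, 0, 1, 1, 2]
person_b = [0, 0, 1, 0, 0, 2, 1, 1]
print(runs_without_opposing_homozygotes(person_a, person_b))
# → [(0, 3), (4, 8)]
```

In this sketch a single genotyping error can split a long run in two, which is the kind of segment break-up the abstract says TRUFFLE's error model is meant to correct.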


A practical parameterised algorithm for the individual haplotyping problem MLF

Mathematical Structures in Computer Science, Vol 20 (5), pp. 851-863 (2010). DOI: 10.1017/s096012951000023x. Cited by ~4.

Author(s): Minzhu Xie, Jianxin Wang, Jianer Chen

Keyword(s): DNA Sequence, Complex Disease, Disease Gene, Exact Algorithm, Biological Applications, Nucleotide Polymorphisms, Single Nucleotide, Disease Gene Mapping, A Genome, The Individual

Haplotypes are more useful in complex disease gene mapping than single-nucleotide polymorphisms (SNPs). However, haplotypes are difficult to obtain directly using biological experiments, which has prompted research into efficient computational methods for determining haplotypes. The individual haplotyping problem called Minimum Letter Flip (MLF) is a computational problem that, given a set of aligned DNA sequence fragment data of an individual, induces the corresponding haplotypes by flipping a minimum number of SNPs. There has been no practical exact algorithm for solving the problem. Due to technical limits in DNA sequencing experiments, the maximum length of a fragment sequenced directly is about 1 kb. In consequence, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number $k_1$ of SNP sites that a fragment covers is usually small. Moreover, in order to save time and money, the maximum number $k_2$ of fragments that cover an SNP site is usually no more than 19. Building on these fragment data properties, the current paper introduces a new parameterised algorithm with running time $O(n k_2 2^{k_2} + m \log m + m k_1)$, where $m$ is the number of fragments and $n$ is the number of SNP sites. In practical biological applications, the algorithm solves the MLF problem efficiently even if $m$ and $n$ are large.
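The MLF objective can be illustrated with a small exhaustive solver. The Python sketch below is not the parameterised algorithm from the paper, whose details the abstract does not give. It is a brute-force baseline that tries every split of the fragments into two groups and is exponential in the number of fragments, so it is only usable on toy inputs. The fragment encoding (strings over `0`, `1`, and `-` for uncovered sites) and the names `group_cost` and `mlf_brute_force` are assumptions made for illustration.

```python
from typing import List, Tuple


def group_cost(fragments: List[str], n_sites: int) -> Tuple[int, str]:
    """Minimum flips needed to make one group of fragments agree, plus the
    consensus haplotype. At each site the minority allele is flipped;
    sites covered by no fragment in the group are reported as '-'."""
    cost = 0
    hap = []
    for j in range(n_sites):
        zeros = sum(f[j] == "0" for f in fragments)
        ones = sum(f[j] == "1" for f in fragments)
        cost += min(zeros, ones)
        if zeros == 0 and ones == 0:
            hap.append("-")
        else:
            hap.append("1" if ones > zeros else "0")
    return cost, "".join(hap)


def mlf_brute_force(fragments: List[str]) -> Tuple[int, str, str]:
    """Exhaustively search all two-way partitions of the fragments and
    return (minimum flips, haplotype A, haplotype B)."""
    if not fragments:
        raise ValueError("need at least one fragment")
    n_sites = len(fragments[0])
    if any(len(f) != n_sites for f in fragments):
        raise ValueError("fragments must be aligned to the same length")

    m = len(fragments)
    best = None
    # Fragment 0 is pinned to group A, which removes mirrored partitions.
    for mask in range(1 << (m - 1)):
        group_a, group_b = [fragments[0]], []
        for i in range(1, m):
            (group_b if (mask >> (i - 1)) & 1 else group_a).append(fragments[i])
        cost_a, hap_a = group_cost(group_a, n_sites)
        cost_b, hap_b = group_cost(group_b, n_sites)
        if best is None or cost_a + cost_b < best[0]:
            best = (cost_a + cost_b, hap_a, hap_b)
    return best


# Hypothetical toy instance: five fragments over four SNP sites. The last
# fragment conflicts with both groups, so at least one flip is required.
toy = ["01--", "011-", "10--", "-001", "-101"]
flips, hap_a, hap_b = mlf_brute_force(toy)
print(flips, hap_a, hap_b)
```

Several partitions can reach the same minimum, so the haplotypes returned depend on enumeration order. Only the flip count is uniquely determined.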
