ADAPTIVE CONTROL OF HYBRIDIZATION NOISE IN DNA SEQUENCING-BY-HYBRIDIZATION

2005 ◽  
Vol 03 (01) ◽  
pp. 79-98
Author(s):  
HON-WAI LEONG ◽  
FRANCO P. PREPARATA ◽  
WING-KIN SUNG ◽  
HUGO WILLY

We consider the problem of sequence reconstruction in sequencing-by-hybridization in the presence of spectrum errors. As intuition suggests, and as reported in the literature, false negatives (i.e., missing spectrum probes) are by far the leading cause of reconstruction failures. In a recent paper we described an algorithm, called "threshold-θ", designed to recover from false negatives. This algorithm is based on overcompensating for missing extensions by allowing larger reconstruction subtrees. We demonstrated, both analytically and with simulations, that the approach becomes increasingly effective as the parameter θ grows. We also pointed out that, for larger error rates, the size of the extension trees translates into an unacceptable computational burden. To overcome this shortcoming, in this paper we propose an adaptive approach that is both effective and efficient. It is effective because, for a fixed value of θ, it performs as well as its single-threshold counterpart. It is efficient because it achieves substantial speed-ups over it. The idea is that, for moderate error rates, only a small fraction of the target sequence is involved in error recovery. The remainder of the sequence can therefore be expected to be reconstructible by the standard noiseless algorithm, with the provision to switch to increasingly higher thresholds after a failure is detected. This policy generates complex interplays between fooling probes and false negatives. These phenomena are carefully analyzed for random sequences, and the results are found to be in excellent agreement with the simulations. In addition, the experimental algorithmic speed-ups of the multithreshold approach are explained in terms of the interaction among the different threshold regimes.
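The adaptive policy described above can be illustrated with a toy model. This sketch is not the authors' threshold-θ algorithm, and its acceptance rule is an assumption made for illustration. In mode θ, a base c is accepted if some lookahead path of θ + 1 bases starting with c contains at most θ probes absent from the spectrum; θ = 0 reduces to the standard noiseless rule. The helper names `spectrum`, `count_missing`, `candidates`, and `reconstruct` are hypothetical, and the target length n is assumed to be known. Reconstruction starts in mode 0, escalates only after a dead end, and returns to mode 0 after each successful step.

```python
from itertools import product

ALPHABET = "ACGT"


def spectrum(seq: str, k: int) -> set[str]:
    """Error-free spectrum: the set of all k-mers (probes) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_missing(path: str, start: int, spec: set[str], k: int) -> int:
    """Number of probes ending at positions >= start of path that are absent from spec."""
    return sum(path[i - k + 1:i + 1] not in spec for i in range(start, len(path)))


def candidates(seq: str, spec: set[str], k: int, theta: int) -> list[str]:
    """Bases c for which some lookahead path of theta + 1 bases beginning with c
    has at most theta missing probes (assumed rule; theta = 0 is the noiseless rule)."""
    found = []
    for c in ALPHABET:
        for tail in product(ALPHABET, repeat=theta):
            path = seq + c + "".join(tail)
            if count_missing(path, len(seq), spec, k) <= theta:
                found.append(c)
                break  # one qualifying path is enough for base c
    return found


def reconstruct(start: str, spec: set[str], k: int, n: int, theta_max: int) -> str:
    """Extend start (length >= k - 1) toward the assumed-known target length n.
    Each step tries theta = 0 first and escalates only after a failure;
    after a successful step the next one starts again from theta = 0."""
    seq = start
    while len(seq) < n:
        cands: list[str] = []
        for theta in range(theta_max + 1):
            cands = candidates(seq, spec, k, theta)
            if cands:
                break
        if len(cands) != 1:  # dead end at every threshold, or an ambiguous branch
            break
        seq += cands[0]
    return seq


if __name__ == "__main__":
    target = "ATGCGTACCTGA"
    k = 4
    noisy = spectrum(target, k) - {"TACC"}  # simulate a single false negative
    print(reconstruct(target[:k - 1], noisy, k, len(target), theta_max=0))  # ATGCGTAC
    print(reconstruct(target[:k - 1], noisy, k, len(target), theta_max=1))  # ATGCGTACCTGA
```

With θ_max = 0, the single missing probe TACC halts reconstruction at ATGCGTAC. Allowing θ = 1 bridges the gap. The sketch stops on ambiguous branches instead of exploring them. Its lookahead examines 4^θ paths per base, a small-scale echo of the computational burden that motivates the adaptive policy.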

2011 ◽  
Vol 59 (1) ◽  
pp. 111-115 ◽  
Author(s):  
K. Kwarciak ◽  
P. Formanowicz

A greedy algorithm for the DNA sequencing by hybridization with positive and negative errors and information about repetitions

In this paper a greedy algorithm for some variants of the sequencing by hybridization method is presented. In the standard version of the method, information about repetitions is not available; here it is assumed that partial information of this type is part of the problem instance. Two simple but realistic models of this information are considered. The first assumes it is known whether a given element of a spectrum appears in the target sequence once or more than once. The second uses the knowledge of whether a given element of a spectrum occurs in the analyzed sequence once, twice, or at least three times. The proposed greedy algorithm solves the variant of the problem with positive and negative errors. Results of a computational experiment are reported which, among other things, confirm that the additional information improves the obtained solutions. They also show that the more precise model of information increases the quality of the reconstructed sequences.
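The two repetition-information models described in this abstract can be expressed as capped occurrence counts. The sketch below is an illustration under our own assumptions and is not the authors' greedy algorithm. The function names are hypothetical, and the sketch derives the information from a known sequence rather than from an experiment. The helper `may_use_again` shows one plausible way a greedy reconstruction could exploit the information: categories below the cap are exact counts, while the top category only says "at least cap".

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Number of occurrences of every k-mer (spectrum element) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repetition_info(seq: str, k: int, cap: int) -> dict[str, int]:
    """Capped multiplicities: cap=2 gives the 'once / more than once' model,
    cap=3 gives the 'once / twice / at least three times' model."""
    return {probe: min(count, cap) for probe, count in kmer_counts(seq, k).items()}


def may_use_again(info: dict[str, int], cap: int, probe: str, times_used: int) -> bool:
    """Hypothetical greedy check: categories below cap are exact counts, so they bound
    how often a probe may be used; the top category ('cap or more') sets no bound."""
    category = info.get(probe, 0)  # 0 for probes not in the information map
    return category == cap or times_used < category


if __name__ == "__main__":
    seq = "ACGACGACGT"
    print(repetition_info(seq, 3, 2))  # {'ACG': 2, 'CGA': 2, 'GAC': 2, 'CGT': 1}
    print(repetition_info(seq, 3, 3))  # {'ACG': 3, 'CGA': 2, 'GAC': 2, 'CGT': 1}
```

Under the coarser model, CGA and ACG fall into the same "more than once" category. The three-level model separates them, which is consistent with the abstract's observation that more precise information helps.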


2000 ◽  
Vol 125 (2) ◽  
pp. 257-265 ◽  
Author(s):  
J Błażewicz ◽  
P Formanowicz ◽  
M Kasprzak ◽  
W.T Markiewicz ◽  
J Wȩglarz

1996 ◽  
Vol 07 (01) ◽  
pp. 87-93 ◽  
Author(s):  
ART M. DUVAL ◽  
W.F. SMYTH

A nonempty circular string $C(x)$ of length $n$ is said to be covered by a set $U_k$ of strings, each of fixed length $k \le n$, iff every position in $C(x)$ lies within an occurrence of some string $u \in U_k$. In this paper we consider the problem of determining the minimum cardinality of a set $U_k$ which guarantees that every circular string $C(x)$ of length $n \ge k$ can be covered. In particular, we show how, for any positive integer $m$, to choose the elements of $U_k$ so that, for sufficiently large $k$, $u_k \approx \sigma^{k-m}$, where $u_k = |U_k|$ and $\sigma$ is the size of the alphabet on which the strings are defined. The problem has application to DNA sequencing by hybridization using oligonucleotide probes.
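The covering definition in this abstract can be checked directly by brute force. The sketch below uses hypothetical function names that do not come from the paper. It tests whether one circular string is covered by a set U_k and, for small n, whether every circular string of length n over an alphabet is covered. It does not reproduce the paper's construction achieving u_k ≈ σ^(k−m). As a small example, take the binary alphabet with k = 2. The three-element set {00, 01, 11} covers every circular string: each 0 starts an occurrence of 00 or 01, and each 1 either starts an occurrence of 11 or ends an occurrence of 01 or 11.

```python
from itertools import product


def is_covered(x: str, U: set[str], k: int) -> bool:
    """True if every position of the circular string x lies within an occurrence
    of some u in U (all strings in U have length k, with k <= len(x))."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(x)")
    doubled = x + x  # unrolls the circle so wrap-around occurrences are visible
    covered = [False] * n
    for i in range(n):
        if doubled[i:i + k] in U:
            for j in range(i, i + k):
                covered[j % n] = True
    return all(covered)


def covers_all(U: set[str], k: int, n: int, alphabet: str) -> bool:
    """Brute force: is every circular string of length n over alphabet covered?"""
    return all(is_covered("".join(t), U, k) for t in product(alphabet, repeat=n))


if __name__ == "__main__":
    print(is_covered("0110", {"01"}, 2))  # False
    print(all(covers_all({"00", "01", "11"}, 2, n, "01") for n in range(2, 9)))  # True
    print(covers_all({"00", "01", "10"}, 2, 2, "01"))  # False: "11" is uncovered
```

The exhaustive check grows as σ^n, so it serves only to sanity-check small candidate sets. It is not a way to search for minimum-cardinality ones.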

