Shortest Common Superstrings

Consecutive Block

Given a finite set of strings S = {s_1, ..., s_m}, the shortest common superstring of S is the shortest string s such that each s_i appears as a substring (a consecutive block) of s. . . .

Example. Suppose we want to find the shortest common superstring of all words in the sentence “alf ate half lethal alpha alfalfa.” Our set of strings is S = {alf, ate, half, lethal, alpha, alfalfa}. A trivial superstring of S is “alfatehalflethalalphaalfalfa”, of length 28. A shortest common superstring is “lethalphalfalfate”, of length 17, saving 11 characters.

The above example shows an application of the shortest common superstring problem in data compression. In many programming languages, a character string may be represented by a pointer to that string. The compiler's problem is to arrange the strings so that they overlap as much as possible in order to save space. For more issues related to data compression, see the next chapter.

Beyond compressing a sentence about Alf, the shortest common superstring problem has a more important application in DNA sequencing. A DNA sequence may be considered as a long character string over the alphabet of nucleotides {A, C, G, T}. Such a string ranges from a few thousand symbols for a simple virus to 2 × 10^8 symbols for a fly and 3 × 10^9 symbols for a human being. Determining this string for different molecules, or sequencing the molecules, is a crucial step towards understanding their biological functions. In fact, today no problem in biochemistry can be studied in isolation from its genetic background. However, with current laboratory methods such as Sanger’s procedure, it is impossible to sequence a long molecule directly as a whole. Each run can sequence only a randomly chosen fragment of fewer than 500 base pairs.
In general, biochemists use different restriction enzymes to “cut” millions of such (identical) molecules into pieces, each typically containing about 200 to 500 nucleotides (characters). A biochemist “samples” the fragments, and Sanger’s procedure is applied to sequence each sampled fragment. . . .
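To make the definition and the Alf example concrete, here is a minimal Python sketch. It is an illustration, not the chapter's own method, and all function names are hypothetical. `exact_superstring` tries every ordering of the strings and merges neighbours with their maximal overlap. This relies on the standard fact that, once strings contained in other strings are dropped, some such ordering yields a shortest superstring. The search is exponential, so it is only usable on tiny inputs. `greedy_superstring` is the classic heuristic that repeatedly merges the pair with the largest overlap. It is not guaranteed to be optimal, but merging short sequenced fragments this way is the usual first approximation to assembling them.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def substring_free(strings):
    """Drop duplicates and any string contained in another one (it is covered for free)."""
    uniq = list(dict.fromkeys(strings))
    return [t for t in uniq if not any(t != u and t in u for u in uniq)]


def exact_superstring(strings):
    """Exhaustive search over orderings, merging neighbours with maximal overlap."""
    core = substring_free(strings)
    if not core:
        return ""
    best = None
    for order in permutations(core):
        s = order[0]
        for a, b in zip(order, order[1:]):
            s += b[overlap(a, b):]
        if best is None or len(s) < len(best):
            best = s
    return best


def greedy_superstring(strings):
    """Greedy heuristic: repeatedly merge the pair of strings with the largest overlap."""
    frags = substring_free(strings)
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        rest = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        # The merged string may now contain some remaining fragment; drop it.
        frags = [f for f in rest if f not in merged] + [merged]
    return frags[0] if frags else ""


S = ["alf", "ate", "half", "lethal", "alpha", "alfalfa"]
exact = exact_superstring(S)
greedy = greedy_superstring(S)
print(exact, len(exact))  # a shortest superstring, of length 17
print(greedy, len(greedy))
```

On this set the exhaustive search may return a different 17-character string than “lethalphalfalfate” (for example “lethalfalfalphate”), since shortest superstrings need not be unique.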

The shortest common superstring problem: average case analysis for both exact and approximate matching

IEEE Transactions on Information Theory

10.1109/18.782108

1999

Vol 45 (6)

pp. 1867-1886

Cited By ~ 9

Author(s):

En-hui Yang

Zhen Zhang

Keyword(s):

Case Analysis

Average Case Analysis

Average Case

Approximate Matching

Advances in Artificial Intelligence – IBERAMIA 2004 - Lecture Notes in Computer Science

A Genetic Algorithm for the Shortest Common Superstring Problem

10.1007/978-3-540-30498-2_85

2004

pp. 851-860

Author(s):

Luis C. González-Gurrola

Carlos A. Brizuela

Everardo Gutiérrez

Keyword(s):

Genetic Algorithm

The structure of first-order causality

Mathematical Structures in Computer Science

10.1017/s0960129510000459

2011

Vol 21 (1)

pp. 65-110

Cited By ~ 3

Author(s):

Samuel Mimram

Keyword(s):

Programming Languages

Propositional Logic

Monoidal Category

Denotational Semantics

Generators And Relations

First Order

Game Semantics

Finite Set

Categorical Algebra

Interactive Behaviour

Game semantics describes the interactive behaviour of proofs by interpreting formulas as games on which proofs induce strategies. Such a semantics is introduced here to capture dependencies induced by quantifications in first-order propositional logic. One of the main difficulties that has to be faced during the elaboration of this kind of semantics is to characterise definable strategies, that is, strategies that actually behave like a proof. This is usually done by restricting the model to strategies satisfying subtle combinatorial conditions, whose preservation under composition is often difficult to show. In this paper we present an original methodology to achieve this task, which requires a combination of advanced tools from game semantics, rewriting theory and categorical algebra. We introduce a diagrammatic presentation of the monoidal category of definable strategies of our model using generators and relations: these strategies can be generated from a finite set of atomic strategies, and the equality between strategies admits a finite axiomatisation whose equational structure corresponds to a polarised variation of the bialgebra notion. The work described in this paper thus forms a bridge between algebra and denotational semantics in order to reveal the structure of dependencies induced by first-order quantifiers, and lays the foundations for a mechanised analysis of causality in programming languages.

On multiple occurrences shortest common superstring problem

Applied Mathematical Sciences

10.12988/ams.2013.13056

2013

Vol 7

pp. 641-644

Author(s):

A. Gorbenko

V. Popov

Keyword(s):

Why Greed Works for Shortest Common Superstring Problem

Combinatorial Pattern Matching - Lecture Notes in Computer Science

10.1007/978-3-540-69068-9_23

2008

pp. 244-254

Cited By ~ 8

Author(s):

Bin Ma

Keyword(s):

Deductive Semantics of RTPA

Software Applications

10.4018/978-1-60566-060-8.ch171

2009

pp. 2915-2942

Author(s):

Yingxu Wang

Keyword(s):

Programming Languages

Large Scale

Formal Semantics

Semantic Theory

Software Systems

Software Comprehension

Support Tool

Semantic Function

Semantics Of Programming Languages

Finite Set

Deductive semantics is a novel software semantic theory that deduces the semantics of a program in a given programming language from a unique abstract semantic function to the concrete semantics embodied by the changes of status of a finite set of variables constituting the semantic environment of the program. Conventional semantics lacks a generic semantic function, together with a unified mathematical model for it, that could explain a comprehensive set of programming statements and computing behaviors. This article presents a complete paradigm of formal semantics that explains how deductive semantics is applied to specify the semantics of real-time process algebra (RTPA) and how RTPA challenges conventional formal semantic theories. Deductive semantics can be applied to define abstract and concrete semantics of programming languages, formal notation systems, and large-scale software systems, to facilitate software comprehension and recognition, to support tool development, to enable semantics-based software testing and verification, and to explore the semantic complexity of software systems. Deductive semantics may greatly simplify the description and analysis of the semantics of complicated software systems specified in formal notations and implemented in programming languages.

Barriers to nuclease Bal31 digestion across specific sites in simian virus 40 chromatin

Molecular and Cellular Biology

10.1128/mcb.4.4.604-610.1984

1984

Vol 4 (4)

pp. 604-610

Author(s):

W A Scott

C F Walter

B L Cryer

Keyword(s):

Short Distance

Simian Virus 40

Restriction Enzymes

Simian Virus

Origin Of Replication

Base Pairs

Open Region

Sensitive Structure

Virus Sv40

Two Populations

A portion of the nucleoprotein containing viral DNA extracted from cells infected by simian virus 40 (SV40) is preferentially cleaved by endonucleases in a region of the genome encompassing the origin of replication and early and late promoters. To explore this nuclease-sensitive structure, we cleaved SV40 chromatin molecules with restriction enzymes and digested the exposed termini with nuclease Bal31. Digestion proceeded only a short distance in the late direction from the MspI site, but some molecules were degraded 400 to 500 base pairs in the early direction. By comparison, BglI-cleaved chromatin was digested for only a short distance in the early direction, but some molecules were degraded 400 to 450 base pairs in the late direction. These barriers to Bal31 digestion (bracketing the BglI and the MspI sites) define the borders of the same open region in SV40 chromatin that is preferentially digested by DNase I and other endonucleases. In a portion of the SV40 chromatin, Bal31 could not digest through the nuclease-sensitive region and reached barriers after digesting only 50 to 100 base pairs from one end or the other. Chromatin molecules that contain barriers in the BglI to MspI region are physically distinct from molecules that are open in this region, as evidenced by partial separation of the two populations on sucrose density gradients.

Linear mitochondrial deoxyribonucleic acid from the yeast Hansenula mrakii.

Molecular and Cellular Biology

10.1128/mcb.1.5.387

1981

Vol 1 (5)

pp. 387-393

Cited By ~ 38

Author(s):

M Wesolowski

H Fukuhara

Keyword(s):

Deoxyribonucleic Acid

Yeast Species

Restriction Analysis

Restriction Enzymes

Specific Gene

Restriction Map

Base Pairs

Linear Restriction

Gene Map

Petite Negative Yeast

The mitochondrial deoxyribonucleic acid (mtDNA) from a petite-negative yeast, Hansenula mrakii, was studied. A linear restriction map was constructed with 11 restriction enzymes. The linearity of the genome was confirmed by direct end labeling of the molecule, followed by restriction analysis. The size of the DNA was found to be 55,000 base pairs. This is the first linear mtDNA found in any yeast species. Using specific gene probes obtained from Saccharomyces cerevisiae mtDNA, we have constructed a gene map of H. mrakii mtDNA. The arrangement of genes in this linear genome was very different from the circular mtDNA of other known yeasts.

A tissue P system and a DNA microfluidic device for solving the shortest common superstring problem

Soft Computing

10.1007/s00500-004-0423-2

2004

Vol 9 (9)

pp. 691-691

Cited By ~ 1

Author(s):

Lucas Ledesma

Daniel Manrique

Alfonso Rodríguez-Patón

Keyword(s):

Microfluidic Device

P System

Tissue P System

Brain computation by assemblies of neurons

10.1101/869156

2019

Author(s):

Christos H. Papadimitriou

Santosh S. Vempala

Daniel Mitropolsky

Michael Collins

Wolfgang Maass

Keyword(s):

Programming Languages

Brain Area

Synaptic Connectivity

Hebbian Plasticity

Large Populations

Finite Set

Cognitive Information

Coherent Sequence

High Level

Assembly Operations

Assemblies are large populations of neurons believed to imprint memories, concepts, words and other cognitive information. We identify a repertoire of operations on assemblies. These operations correspond to properties of assemblies observed in experiments, and can be shown, analytically and through simulations, to be realizable by generic, randomly connected populations of neurons with Hebbian plasticity and inhibition. Operations on assemblies include: projection (duplicating an assembly by creating a new assembly in a downstream brain area); reciprocal projection (a variant of projection also entailing synaptic connectivity from the newly created assembly to the original one); association (increasing the overlap of two assemblies in the same brain area to reflect co-occurrence or similarity of the corresponding concepts); merge (creating a new assembly with ample synaptic connectivity to and from two existing ones); and pattern-completion (firing of an assembly, with some probability, in response to the firing of some but not all of its neurons). Our analytical results establishing the plausibility of these operations are proved in a simplified mathematical model of cortex: a finite set of brain areas each containing n excitatory neurons, with random connectivity that is both recurrent (within an area) and afferent (between areas). Within one area and at any time, only k of the n neurons fire (an assumption that models inhibition and serves to define both assemblies and areas), while synaptic weights are modified by Hebbian plasticity, as well as homeostasis. Importantly, all neural apparatus needed for the functionality of the assembly operations is created on the fly through the randomness of the synaptic network, the selection of the k neurons with the highest synaptic input, and Hebbian plasticity, without any special neural circuits assumed to be in place.
Assemblies and their operations constitute a computational model of the brain which we call the Assembly Calculus, occupying a level of detail intermediate between the level of spiking neurons and synapses, and that of the whole brain. As with high-level programming languages, a computation in the Assembly Calculus (that is, a coherent sequence of assembly operations accomplishing a task) can ultimately be reduced — “compiled down” — to computation by neurons and synapses; however, it would be far more cumbersome and opaque to represent the same computation that way. The resulting computational system can be shown, under assumptions, to be in principle capable of carrying out arbitrary computations. We hypothesize that something like it may underlie higher human cognitive functions such as reasoning, planning, and language. In particular, we propose a plausible brain architecture based on assemblies for implementing the syntactic processing of language in cortex, which is consistent with recent experimental results.
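The abstract rests on two mechanisms: firing only the k neurons with the highest synaptic input, and Hebbian plasticity. As a rough illustration, here is a toy Python sketch of the projection operation. It is a sketch under stated assumptions, not the authors' code or exact model. The function names, the parameters `p`, `beta` and `rounds`, the initial weight of 1.0, and the multiplicative (1 + beta) update rule are all assumptions chosen for illustration, and homeostasis is omitted.

```python
import random


def k_cap(inputs, k):
    """Indices of the k neurons with the highest synaptic input (ties broken by lower index)."""
    order = sorted(range(len(inputs)), key=lambda i: (-inputs[i], i))
    return set(order[:k])


def project(m, n, k, p, beta, rounds, seed=0):
    """Toy projection of a firing stimulus of m neurons into an area of n neurons.

    Assumed model: each synapse exists independently with probability p and starts at
    weight 1.0; Hebbian plasticity multiplies a used synapse by (1 + beta).
    Homeostasis is not modelled.
    """
    rng = random.Random(seed)
    aff = [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(m)]
    rec = [[1.0 if rng.random() < p else 0.0 for _ in range(n)] for _ in range(n)]
    winners = set()
    for _ in range(rounds):
        # Input to each area neuron: all m stimulus neurons fire, plus last round's winners.
        inputs = [0.0] * n
        for s in range(m):
            for j in range(n):
                inputs[j] += aff[s][j]
        for i in winners:
            for j in range(n):
                inputs[j] += rec[i][j]
        new = k_cap(inputs, k)  # inhibition: only the top k fire
        # Hebbian update on synapses from firing neurons onto the new winners
        # (absent synapses have weight 0.0 and stay absent).
        for s in range(m):
            for j in new:
                aff[s][j] *= 1 + beta
        for i in winners:
            for j in new:
                rec[i][j] *= 1 + beta
        winners = new
    return winners


assembly = project(m=20, n=200, k=10, p=0.1, beta=0.1, rounds=5)
print(sorted(assembly))
```

Selecting winners by sorting keeps the k-cap step transparent. With larger areas, a partial selection such as `heapq.nlargest` would avoid sorting all n inputs.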