Shortest Common Superstrings

Author(s): M. Li, T. Jiang

Given a finite set of strings S = {s₁, …, sₘ}, the shortest common superstring of S is the shortest string s such that each sᵢ appears as a substring (a consecutive block) of s.

Example. Assume we want to find the shortest common superstring of all words in the sentence “alf ate half lethal alpha alfalfa.” Our set of strings is S = {alf, ate, half, lethal, alpha, alfalfa}. A trivial superstring of S is “alfatehalflethalalphaalfalfa”, of length 28. A shortest common superstring is “lethalphalfalfate”, of length 17, saving 11 characters.

The above example shows an application of the shortest common superstring problem in data compression. In many programming languages, a character string may be represented by a pointer to that string. The problem for the compiler is to arrange strings so that they overlap as much as possible in order to save space. For more issues related to data compression, see the next chapter.

Beyond compressing a sentence about Alf, the shortest common superstring problem has more important applications in DNA sequencing. A DNA sequence may be considered as a long character string over the alphabet of nucleotides {A, C, G, T}. Such strings range in length from a few thousand symbols for a simple virus to 2 × 10⁸ symbols for a fly and 3 × 10⁹ symbols for a human being. Determining this string for different molecules, or sequencing the molecules, is a crucial step towards understanding their biological functions. In fact, today, no problem in biochemistry can be studied in isolation from its genetic background. However, with current laboratory methods, such as Sanger’s procedure, it is impossible to sequence a long molecule directly as a whole. Each time, only a randomly chosen fragment of fewer than 500 base pairs can be sequenced.
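The worked example can be checked with a small Python sketch. This is our own illustration, not the authors' method. It relies on a standard observation: once strings contained in other strings are discarded, some shortest superstring is obtained by concatenating the remaining strings in some order, merging each adjacent pair at its maximum overlap. The sketch tries every order, so it is exponential and only usable on tiny inputs such as the example set (the problem is NP-hard in general). The function names `overlap` and `shortest_common_superstring` are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_common_superstring(words: list[str]) -> str:
    """Exact shortest common superstring by trying every order (tiny inputs only)."""
    unique = sorted(set(words))
    # A word contained in another word is covered automatically, so drop it.
    core = [w for w in unique if not any(w != v and w in v for v in unique)]
    if not core:
        return ""
    best = None
    for order in permutations(core):
        s = order[0]
        # Append each next word minus its maximal overlap with the previous word.
        for prev, nxt in zip(order, order[1:]):
            s += nxt[overlap(prev, nxt):]
        if best is None or len(s) < len(best):
            best = s
    return best


words = ["alf", "ate", "half", "lethal", "alpha", "alfalfa"]
best = shortest_common_superstring(words)
print(best, len(best))
```

On the example set this finds a superstring of length 17, matching the text. Several orders tie at that length (for instance “lethalfalfalphate” is also 17 characters), so the printed string may differ from “lethalphalfalfate”.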
In general, biochemists “cut”, using different restriction enzymes, millions of such (identical) molecules into pieces, each typically containing about 200-500 nucleotides (characters). A biochemist “samples” the fragments, and Sanger’s procedure is applied to sequence each sampled fragment. . . .
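Exhaustive search is hopeless for large numbers of fragments. A common practical heuristic, sketched below as our own illustration (not necessarily the method this chapter goes on to analyse), is greedy merging: repeatedly merge the two fragments with the largest overlap until a single string remains. The result is always a common superstring but is not guaranteed to be a shortest one. The reads in the usage line are made-up toy data, the function names are hypothetical, and the sketch ignores sequencing errors.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments: list[str]) -> str:
    """Greedy merging heuristic: always a common superstring, not always shortest."""
    unique = sorted(set(fragments))
    # Fragments contained in other fragments add nothing; drop them up front.
    pool = [f for f in unique if not any(f != g and f in g for g in unique)]
    if not pool:
        return ""
    while len(pool) > 1:
        # Find the ordered pair (a, b) with the largest overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(pool):
            for j, b in enumerate(pool):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = pool[best_i] + pool[best_j][best_k:]
        # Replace the pair by the merged string; drop anything it now contains.
        pool = [f for idx, f in enumerate(pool)
                if idx not in (best_i, best_j) and f not in merged]
        pool.append(merged)
    return pool[0]


reads = ["ATTAGAC", "GACCTAG", "CTAGGTA"]  # hypothetical toy reads
print(greedy_superstring(reads))  # ATTAGACCTAGGTA
```

Here the pair GACCTAG, CTAGGTA (overlap 4) is merged first, then ATTAGAC is attached with overlap 3. Each merge keeps every original fragment inside some string in the pool, which is why the final string is always a superstring.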

2011
Vol 21 (1)
pp. 65-110
Author(s): Samuel Mimram

Game semantics describes the interactive behaviour of proofs by interpreting formulas as games on which proofs induce strategies. Such a semantics is introduced here for capturing dependencies induced by quantifications in first-order propositional logic. One of the main difficulties in elaborating this kind of semantics is characterising definable strategies, that is, strategies that actually behave like a proof. This is usually done by restricting the model to strategies satisfying subtle combinatorial conditions, whose preservation under composition is often difficult to show. In this paper we present an original methodology to achieve this task, which requires a combination of advanced tools from game semantics, rewriting theory and categorical algebra. We introduce a diagrammatic presentation of the monoidal category of definable strategies of our model using generators and relations: these strategies can be generated from a finite set of atomic strategies, equality between strategies admits a finite axiomatisation, and this equational structure corresponds to a polarised variation of the bialgebra notion. The work described in this paper thus forms a bridge between algebra and denotational semantics in order to reveal the structure of dependencies induced by first-order quantifiers, and lays the foundations for a mechanised analysis of causality in programming languages.


2009
pp. 2915-2942
Author(s): Yingxu Wang

Deductive semantics is a novel software semantic theory that deduces the semantics of a program in a given programming language, from a unique abstract semantic function down to the concrete semantics embodied by the changes of status of a finite set of variables constituting the semantic environment of the program. Conventional semantics lacks a generic semantic function, with a unified mathematical model, that can explain a comprehensive set of programming statements and computing behaviors. This article presents a complete paradigm of formal semantics that explains how deductive semantics is applied to specify the semantics of real-time process algebra (RTPA) and how RTPA challenges conventional formal semantic theories. Deductive semantics can be applied to define the abstract and concrete semantics of programming languages, formal notation systems, and large-scale software systems; to facilitate software comprehension and recognition; to support tool development; to enable semantics-based software testing and verification; and to explore the semantic complexity of software systems. Deductive semantics may greatly simplify the description and analysis of the semantics of complicated software systems specified in formal notations and implemented in programming languages.


1984
Vol 4 (4)
pp. 604-610
Author(s): W A Scott, C F Walter, B L Cryer

A portion of the nucleoprotein containing viral DNA extracted from cells infected by simian virus 40 (SV40) is preferentially cleaved by endonucleases in a region of the genome encompassing the origin of replication and the early and late promoters. To explore this nuclease-sensitive structure, we cleaved SV40 chromatin molecules with restriction enzymes and digested the exposed termini with nuclease Bal31. Digestion proceeded only a short distance in the late direction from the MspI site, but some molecules were degraded 400 to 500 base pairs in the early direction. By comparison, BglI-cleaved chromatin was digested for only a short distance in the early direction, but some molecules were degraded 400 to 450 base pairs in the late direction. These barriers to Bal31 digestion (bracketing the BglI and the MspI sites) define the borders of the same open region in SV40 chromatin that is preferentially digested by DNase I and other endonucleases. In a portion of the SV40 chromatin, Bal31 could not digest through the nuclease-sensitive region and reached barriers after digesting only 50 to 100 base pairs from one end or the other. Chromatin molecules that contain barriers in the BglI to MspI region are physically distinct from molecules that are open in this region, as evidenced by partial separation of the two populations on sucrose density gradients.


1981
Vol 1 (5)
pp. 387-393
Author(s): M Wesolowski, H Fukuhara

The mitochondrial deoxyribonucleic acid (mtDNA) from a petite-negative yeast, Hansenula mrakii, was studied. A linear restriction map was constructed with 11 restriction enzymes. The linearity of the genome was confirmed by direct end labeling of the molecule, followed by restriction analysis. The size of the DNA was found to be 55,000 base pairs. This is the first linear mtDNA found in a yeast species. Using specific gene probes obtained from Saccharomyces cerevisiae mtDNA, we have constructed a gene map of H. mrakii mtDNA. The arrangement of genes in this linear genome was very different from that in the circular mtDNAs of other known yeasts.


2019
Author(s): Christos H. Papadimitriou, Santosh S. Vempala, Daniel Mitropolsky, Michael Collins, Wolfgang Maass

Assemblies are large populations of neurons believed to imprint memories, concepts, words and other cognitive information. We identify a repertoire of operations on assemblies. These operations correspond to properties of assemblies observed in experiments, and can be shown, analytically and through simulations, to be realizable by generic, randomly connected populations of neurons with Hebbian plasticity and inhibition. Operations on assemblies include: projection (duplicating an assembly by creating a new assembly in a downstream brain area); reciprocal projection (a variant of projection also entailing synaptic connectivity from the newly created assembly to the original one); association (increasing the overlap of two assemblies in the same brain area to reflect co-occurrence or similarity of the corresponding concepts); merge (creating a new assembly with ample synaptic connectivity to and from two existing ones); and pattern-completion (firing of an assembly, with some probability, in response to the firing of some but not all of its neurons). Our analytical results establishing the plausibility of these operations are proved in a simplified mathematical model of cortex: a finite set of brain areas each containing n excitatory neurons, with random connectivity that is both recurrent (within an area) and afferent (between areas). Within one area and at any time, only k of the n neurons fire (an assumption that models inhibition and serves to define both assemblies and areas), while synaptic weights are modified by Hebbian plasticity, as well as homeostasis. Importantly, all neural apparatus needed for the functionality of the assembly operations is created on the fly through the randomness of the synaptic network, the selection of the k neurons with the highest synaptic input, and Hebbian plasticity, without any special neural circuits assumed to be in place.
Assemblies and their operations constitute a computational model of the brain which we call the Assembly Calculus, occupying a level of detail intermediate between the level of spiking neurons and synapses, and that of the whole brain. As with high-level programming languages, a computation in the Assembly Calculus (that is, a coherent sequence of assembly operations accomplishing a task) can ultimately be reduced — “compiled down” — to computation by neurons and synapses; however, it would be far more cumbersome and opaque to represent the same computation that way. The resulting computational system can be shown, under assumptions, to be in principle capable of carrying out arbitrary computations. We hypothesize that something like it may underlie higher human cognitive functions such as reasoning, planning, and language. In particular, we propose a plausible brain architecture based on assemblies for implementing the syntactic processing of language in cortex, which is consistent with recent experimental results.
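The mechanism behind projection (random connectivity, a cap letting only the k neurons with the highest synaptic input fire, and Hebbian plasticity) can be illustrated with a toy simulation. This is our own simplified sketch, not the authors' model: the function name `project`, the parameter values, the unit initial weights, the multiplicative update w ← (1 + β)w, and the omission of homeostasis are all assumptions. A fixed set of k stimulus neurons fires every round and projects into one area, whose winners from the previous round also feed back through recurrent synapses.

```python
import random


def project(n=200, k=10, p=0.1, beta=0.1, rounds=10, seed=0):
    """Toy projection of a fixed k-neuron stimulus into one area of n neurons.

    Assumed (hypothetical) model: each synapse exists with probability p and
    starts at weight 1.0; Hebbian updates multiply weights by (1 + beta).
    Returns the set of firing neurons (winners) after each round.
    """
    rng = random.Random(seed)
    # afferent[s][j]: weight from stimulus neuron s to area neuron j.
    afferent = [{j: 1.0 for j in range(n) if rng.random() < p} for _ in range(k)]
    # recurrent[i][j]: weight from area neuron i to area neuron j (no self-loops).
    recurrent = [{j: 1.0 for j in range(n) if j != i and rng.random() < p}
                 for i in range(n)]
    winners, history = set(), []
    for _ in range(rounds):
        total = [0.0] * n
        for syn in afferent:          # every stimulus neuron fires each round
            for j, w in syn.items():
                total[j] += w
        for i in winners:             # last round's winners fire recurrently
            for j, w in recurrent[i].items():
                total[j] += w
        # k-cap: only the k neurons with the highest input fire (ties by index).
        new = set(sorted(range(n), key=lambda j: (-total[j], j))[:k])
        # Hebbian plasticity: strengthen synapses from neurons that fired
        # onto neurons that fire now.
        for syn in afferent:
            for j in new:
                if j in syn:
                    syn[j] *= 1 + beta
        for i in winners:
            for j in new:
                if j in recurrent[i]:
                    recurrent[i][j] *= 1 + beta
        winners = new
        history.append(winners)
    return history


hist = project()
print([len(hist[t] & hist[t - 1]) for t in range(1, len(hist))])
```

The printed list is the overlap between successive winner sets. If a stable assembly forms, these overlaps should approach k over the rounds, although this toy version does not guarantee convergence for every parameter choice.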

