Information Modeling: The EXPRESS Way
Published By Oxford University Press

9780195087147, 9780197560532

Author(s):  
Douglas Schenck ◽  
Peter Wilson

Now we turn to the question: ‘Once I have created an abstract declaration in EXPRESS, what would an instance of that thing look like?’ EXPRESS-I allows you to create instances of EXPRESS things that have values in place of references to datatypes. The main reason for doing this is to study some realistic examples of things that otherwise might be difficult to understand. After all, it is one thing to describe a tree and quite another to actually see one.
Some of the design goals of EXPRESS-I are based on these requirements:
• Major information modeling projects are large and complex. Managing them without appropriate tools based on formal languages and methods is a risky proposition. Informal specification techniques eliminate the possibility of employing computer automation in checking for inconsistencies in presentation or specification.
• The language should focus on the display of the realization of the properties of entities, which are the things of interest. The definition of entities is in terms of data and behavior. Data represents the properties by which an entity is realized and behavior is represented by constraints.
• The language should seek to avoid, as far as possible, specific implementation views. That is, EXPRESS-I models do not suggest the structure of databases, object bases, or of information bases in general.
• The language should provide a means for displaying small populated models of EXPRESS schemas as examples for design reviews.
• The language should provide a means for supporting the specification of test suites for information model processors.
EXPRESS-I represents entity instances in terms of the values of their attributes (attributes are the traits or characteristics considered important for use and understanding). These values have a representation which might be considered simple (an integer value) or something more complex (an entity value). A geometric point might be defined in terms of three real numbers named x, y and z, and the actual values associated with those attributes might be 1.0, 2.5 and 7.9. The EXPRESS-I instance language provides a means of displaying instantiations of EXPRESS data elements. The language is designed principally for human readability and for ease of generating EXPRESS-I element instances from definitions in an EXPRESS schema.
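The difference between a declaration and an instance can be sketched roughly as follows. This is a Python analogy, not EXPRESS-I notation: the names `Point`, `p`, `declaration` and `instance` are hypothetical, and only the attribute names x, y, z and the values 1.0, 2.5, 7.9 come from the text.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Point:
    """Stands in for an abstract declaration: attribute names and datatypes."""
    x: float
    y: float
    z: float


# Stands in for an instance: each attribute carries a value, not a datatype.
p = Point(x=1.0, y=2.5, z=7.9)

# The declaration maps attribute names to datatypes ...
declaration = {f.name: f.type for f in fields(Point)}
# ... while the instance maps the same names to actual values.
instance = {f.name: getattr(p, f.name) for f in fields(Point)}
```

Placing the two mappings side by side is the kind of view a small populated model is meant to give a reviewer.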


Here is the entire syntax except for constant tokens (i.e., reserved words and such), character sets, standard constants, functions and procedures, and simple equates to tokens that create or reference identifiers. The following conventions will help to interpret the syntax presentation:
• Identifiers written in upper case letters are keywords of the language. For example, when you see SCHEMA this means that the word ‘schema’ must be written at this place (using mixed case if you wish). The names of these syntax productions are identical to the keywords of the language.
• Elements that follow the pattern xxxDef represent an identifier declaration. For example, VarDef shows where a variable declaration takes place. This also implies that the name created at this place is subject to the scoping rules that apply to the object in question.
• Elements that follow the pattern xxxRef represent a reference to some explicit definition. For example, a VarRef requires a VarDef.


Expressions are combinations of operators and operands which are evaluated to produce a value of a specific type. Infix operators require two operands with an operator written between them. A prefix operator requires one operand with an operator written before it. (The expression syntax starts on page 208.) Evaluation proceeds from left to right, governed by the precedence of the operators. The operator with the lowest-numbered precedence in Table 14.1 is evaluated first. Operators in the same row have the same precedence. Expressions enclosed by parentheses are evaluated before being treated as a single operand. An operand between two operators of different precedence is bound to the operator with the higher one; e.g., −10*20 means (−10)*20. An operand between two operators of the same precedence is bound to the one on the left; e.g., 10/20 * 30 means (10/20) * 30.
Exercise 14.1 Work out the intermediate steps for this expression: … −2/(4+4)*5+6…
When a null value is encountered in an expression where a non-null is expected, evaluation is short circuited and a null answer is produced. Otherwise, all expressions are fully evaluated even when the outcome is known after partial evaluation.
Exercise 14.2 Can you think of an expression that does not require complete evaluation to get the correct answer?
The operands of an operator must be compatible with the operator and with each other. Operands can be compatible without having identical types and are compatible when any of these conditions are satisfied:
• The types are the same.
• One type is a subtype of the other (e.g., one is a number and the other is an integer).
• Both types are strings.
• Both types are binaries.
• Both types are arrays which have compatible base types and identical bounds.
• Both types are bags which have compatible base types.
• Both types are lists which have compatible base types.
• Both types are sets which have compatible base types.
Operations are organized by the kind of result they produce, namely: numeric, boolean or logical, string or binary, or aggregate.
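The binding and null rules described above can be sketched in Python. This is an illustrative model under stated assumptions, not EXPRESS itself: `None` stands in for a null value, and the helper `apply` and the variable names are hypothetical.

```python
import operator
from typing import Callable, Optional

Number = Optional[float]  # None stands in for an EXPRESS null value


def apply(op: Callable[[float, float], float], left: Number, right: Number) -> Number:
    """Apply a binary operator; a null operand short-circuits to a null answer."""
    if left is None or right is None:
        return None
    return op(left, right)


# -10*20 means (-10)*20: the operand 10 is bound to the prefix minus first.
negated_product = apply(operator.mul, -10.0, 20.0)

# 10/20 * 30 means (10/20) * 30: equal precedence binds to the operator on the left.
left_bound = apply(operator.mul, apply(operator.truediv, 10.0, 20.0), 30.0)

# For contrast only: binding 20 to the operator on its right gives another value.
right_bound = apply(operator.truediv, 10.0, apply(operator.mul, 20.0, 30.0))

# A null operand anywhere makes the whole expression null.
with_null = apply(operator.mul, apply(operator.truediv, 10.0, None), 30.0)
```

Here `left_bound` is 15.0 while `right_bound` is not, which is why the left-binding rule matters for operators of equal precedence.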


Executable statements define the actions of functions, procedures and rules. They define the logic and actions needed to support the definition of constraints by acting on parameters, local variables and constants. The shortest possible ‘executable’ statement is just a semicolon. It is called a null statement because it does nothing. Such a statement is not useless, however, as you can use a null statement to stake out territory for future use, or perhaps to make the absence of a statement stand out more clearly, as in the following example.
    …
    IF a = 13 THEN
      ; -- do nothing
    ELSE
      b := 5 ; -- otherwise give b a value
    END_IF ;
    ...
The Alias statement gives a short name (alias) to an identifier that might be long or clumsy to write. The alias exists only in the scope of the alias statement, and a reference to the alias is the same as writing the identifier out in full. The assignment statement is used to give a value to a local variable or parameter. The type of the expression assigned to the variable must be compatible with the variable or parameter. Some assignments are shown below. The target variable and the expression being assigned to it are assignment compatible if any of the following hold true:
• The types are the same.
• The expression results in a type which is a subtype of the type declared for the variable being assigned to.
• The type of the variable being assigned to is a select type and the expression results in a type which is a member of that select type.
The Case statement executes one (or perhaps zero) statement based on the value of an expression. The statement executed is chosen depending on the value of the Selector. The case statement consists of an expression, which is the case selector, and a list of alternative actions, each one preceded by a case label. Agreement between the type of the case label and the case selector is required. The first occurring statement having a case label that evaluates to the same value as the case selector is executed.
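The selection rule for the Case statement can be modeled with a short Python sketch. It is not EXPRESS syntax, and everything in it is an assumption made for illustration: the function `run_case`, the `CaseAction` alias, the sample labels, and the use of an exact Python type match to stand in for label and selector agreement.

```python
from typing import Any, Callable, List, Sequence, Tuple

# One alternative: the case labels that select it and the statement to run.
CaseAction = Tuple[Sequence[Any], Callable[[], None]]


def run_case(selector: Any, actions: Sequence[CaseAction]) -> bool:
    """Run the first statement whose label equals the selector.

    Returns True if a statement ran and False if no label matched.
    """
    # Every label must agree in type with the selector (checked up front here).
    for labels, _ in actions:
        for label in labels:
            if type(label) is not type(selector):
                raise TypeError(f"label {label!r} does not agree with selector {selector!r}")
    for labels, statement in actions:
        if selector in labels:
            statement()
            return True  # only the first matching alternative is executed
    return False


log: List[str] = []
actions: List[CaseAction] = [
    ([1], lambda: log.append("one")),
    ([2, 3], lambda: log.append("two or three")),
    ([3], lambda: log.append("three again")),  # never reached for selector 3
]

ran_three = run_case(3, actions)  # runs the second alternative only
ran_nine = run_case(9, actions)   # no label matches, so zero statements run
```

The two calls cover both outcomes named in the text: exactly one statement executed, or none at all.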


This chapter explains the EXPRESS pseudotypes and datatypes. You will also want to read about defined types and entity types, both of which are covered in Chapter 11. Datatypes represent domains of values. A domain is the set of possible values associated with an attribute, local variable or formal parameter. Datatype values can be operated upon as explained in Chapter 14. EXPRESS is fussy about the way datatypes are used. The datatypes are grouped this way:
• Pseudo (Generic and Aggregate; see 10.1)
• Simple (Integer, String, etc.; see 10.2)
• Collection (Array, List, etc.; see 10.3)
• Enumeration and Select (see 10.4 and 10.5)
• Named (entities and defined types; see Chapter 11)
Then, the context in which a reference to a datatype is made will be
• as the type of an attribute,
• as the type of a local variable,
• as the type of a formal parameter, or
• as the underlying type of a defined type.
Finally, a summary of the datatypes that can be used in the different contexts is given in Table 10.1. Notice that pseudotypes can only be used as formal parameter types and that the enumeration and select types can only be used as the underlying types of defined types. Pseudotypes are used only as the types of the formal parameters of functions and procedures. They can be regarded as templates into which various specific types can be placed. See 11.5.1 for more about formal parameters. The domain of a generic pseudotype is every conceivable value. When a procedure or function that has a generic type parameter is invoked it will accept any kind of actual parameter. No questions asked! Functions or procedures that use formal parameters typed as generic must be prepared to deal with whatever actual stuff is tossed their way, and any operations performed on those parameters will depend on the specific type of the actual parameter. Generic parameters should never be used when a more specific type can be used instead. In any event, the mechanics involved in writing an algorithm that is capable of handling every possible input value are tricky. The message is: Don’t use generic parameters unless you simply have to.


The major topic of this Chapter is modeling in the large. By this we mean looking at methods to integrate several different models into a single cohesive whole, and also examining the inverse problem of extracting a small specialized model from a larger one. Before we can sensibly discuss these, though, we need to look more closely at two aspects of EXPRESS, namely subtyping and schema interfacing. EXPRESS has a concept of Supertype and Subtype which taken together enable a type lattice to be constructed. A Subtype-Supertype relationship is typically called an ‘Isa’ relationship in data modeling terms. That is, a Subtype is a kind of its Supertype(s). For example, if we define an entity pet and also define two subtypes of this called cat and goldfish then an instance of cat is also an instance of pet, and similarly for a goldfish. The relationship, though, is asymmetrical, as a pet may be something other than a cat or goldfish; for instance a pet may also be a dog. A Subtype is a more specialized kind of thing than its Supertype and, conversely, a Supertype is a generalization of its Subtypes. A Subtype inherits all the attributes and constraints of its Supertype(s). In EXPRESS an entity is a Subtype of another entity if it declares its Supertype entity within its Subtype declaration. A Supertype does not declare its Subtypes. In general, an instance of a Subtype requires an instantiation of each of its Supertypes, while an instance of a Supertype does not require instantiation of its Subtypes. This latter behavior may be modified by declaring the Supertype to be an Abstract Supertype, in which case an instance of the Supertype does require instantiation of at least one of its Subtypes. In most data modeling and Object Oriented languages that support similar notions to Subtyping, if an instance of a Supertype requires instantiation of a Subtype, then one and only one Subtype can be instantiated. EXPRESS does not have this restriction. 
Unless otherwise constrained, an instance of a Supertype may be accompanied by one instance of each of its Subtypes.
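The instantiation rules above can be modeled with a small Python sketch. It is an analogy under stated assumptions rather than EXPRESS: an instance is represented as the set of entity names it instantiates, the table `SUPERTYPES` records only direct supertypes for the pet example, and the function `is_valid_instance` is hypothetical.

```python
from typing import AbstractSet, Dict, FrozenSet

# Direct supertypes of each entity, taken from the pet example in the text.
SUPERTYPES: Dict[str, FrozenSet[str]] = {
    "pet": frozenset(),
    "cat": frozenset({"pet"}),
    "goldfish": frozenset({"pet"}),
}


def is_valid_instance(
    types: AbstractSet[str],
    abstract: AbstractSet[str] = frozenset(),
) -> bool:
    """Check an instance, given as the set of entity types it instantiates."""
    # A subtype instance requires an instantiation of each of its supertypes.
    for entity in types:
        if not SUPERTYPES[entity] <= types:
            return False
    # An abstract supertype requires at least one of its subtypes as well.
    for entity in types & abstract:
        if not any(entity in SUPERTYPES[other] for other in types):
            return False
    return True


plain_pet = is_valid_instance({"pet"})                       # supertype alone
lone_cat = is_valid_instance({"cat"})                        # missing its supertype
cat_and_fish = is_valid_instance({"pet", "cat", "goldfish"})  # several subtypes at once
abstract_pet = is_valid_instance({"pet"}, abstract={"pet"})   # abstract, no subtype
```

The third call reflects the point that, unless otherwise constrained, more than one Subtype may accompany a single Supertype instance.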


EXPRESS-G is a graphical notation for the display of information models. Using the EXPRESS language, an information model is represented by sentences in the language. In EXPRESS-G, an information model is represented by graphic symbols forming a diagram. Although EXPRESS-G has been specifically developed for the graphical rendition of information models defined in the EXPRESS language, it may be used as a modeling technology in its own right. EXPRESS-G supports the notions of entity, type, relationship and cardinality. It also separately supports the notion of schema. The notation only supports a subset of the EXPRESS language as it does not provide any support for the complex constraints which can be represented in the EXPRESS lexical language. The design goals for the notation are:
• The diagrams should be intuitively understandable.
• The diagrams should support levels of model abstraction.
• A diagram must be able to span more than one sheet of paper.
• The pictures should be definable using minimal computer graphics capabilities. Further, it should be possible to print the diagrams using only non-graphic symbols, for example on a line printer.
• It should be possible to develop a processor that automatically converts from EXPRESS source to the graphical description.
EXPRESS-G requires almost minimal graphical capabilities, namely the ability to draw straight lines of three kinds, to draw rectangular and rounded boxes, to draw small circles, and to put text onto a drawing. Two kinds of boxes are used as symbols:
• Definition: These symbols denote the things (i.e., concepts, ideas, etc.) which form the basis of the information model. Rectangular boxes are used for these symbols.
• Composition: These symbols enable a model diagram to be displayed on more than one sheet of paper. Boxes with rounded corners are used for these.
Three styles of lines are used by EXPRESS-G — a thin solid line, a thick solid line, and a dashed line — each of which should be readily distinguishable. For computer displays that support graphics there should be no problems in choosing suitable line styles. For displays that only support a single line width, thick lines can be drawn as two closely spaced parallel lines. For line printer type displays, the lines have to be drawn using characters rather than graphics.


This Chapter explains the objects that you create to represent information of interest to you. The purpose of EXPRESS, and the information modeling process in general, centers on these declarations. The main object you declare is a schema. Within a schema you might declare constants, types, entities, functions, procedures and rules. Within those things are sub-objects such as attributes, local variables and parameters. The structure of EXPRESS source defines the scope of declarations. The statements Schema...End_Schema mark the boundaries for a group of declarations. In a like manner an entity or function declaration bounds the declarations made within it, and so forth. This process of declaring things inside other things indicates the way one declaration ‘owns’ other things. In the case of an entity declaration, the attributes and local rules represent the properties that define it. So, an attribute is owned by a particular entity and also is a property of it. The rule is an exception to the way the structure of the source shows ownership and property definition. Rules are structurally subordinate to a schema, but they are logically subordinate to the entities to which they apply. Unlike an attribute that is owned by exactly one entity, a rule can be owned by more than one of them and conversely, a rule can be a property of several entities. This exception to an otherwise orderly correlation between the source structure and the way ownership and property definition is shown is a potential cause of confusion. Keep this special situation in mind as you build information models with EXPRESS. Exercise 11.1 Why do you think the authors of EXPRESS treated the rule differently? A schema declaration surrounds every other thing you declare. Therefore, SCHEMA YourSchemaName; END_SCHEMA; is the smallest possible legal EXPRESS source. Normally, however, a schema declaration contains declarations of constants, entities, functions, procedures, rules and types. 
A schema defines a collection of objects that have a related meaning and purpose. For example, geometry might be the name of a schema that collects declarations of points, curves, surfaces and other related objects. The order in which objects are declared is never relevant to the meaning of a schema as a whole or to the individual things declared in it.


We start with the basic stuff used to build EXPRESS: the characters you use to build ‘words’ and the different kinds of words that may appear in the source. Computer languages call these words tokens. You compose characters according to syntax rules to form different kinds of tokens. In turn, tokens are composed into statements and statements are composed into blocks (we will get to statements and blocks directly). The syntax alone does not define the language, however; we also need to be concerned with semantics. Semantics deals with the meaning of well formed syntax. As an example, EXPRESS has tokens for identifiers and reserved words. Although they both look like ‘words’ they are different; an identifier cannot be the same as a keyword. EXPRESS expects to see identifiers and reserved words in specific places within statements. It is an error to put an identifier where a reserved word is expected and vice versa. EXPRESS is correctly written only when both the syntactic and semantic rules are observed. EXPRESS source is composed as a stream of characters. The source stream is decoded into tokens, statements and blocks according to the syntax rules. The source is typically broken into a number of physical lines, each of which is any number (including zero) of characters ended by a ‘newline’. The source will be more attractive and easier to read when statements are broken into lines and whitespace is used to set off different constructs. The following declarations are the same to an EXPRESS parser, but give different impressions to human readers. The EXPRESS character set is easy to explain: just look at your computer keyboard. Most of the characters you see there are used to write EXPRESS. However, there is a more complicated explanation which follows: the EXPRESS character set is defined as cells 00-7F of plane 00 of group 00 of ISO 10646. Of those characters, cells 20-7E as shown in Table 9.1 are actually used to write EXPRESS. 
Any other character (cells 00-1F, 7F) is called a ‘rogue’ character, and if used is treated as a space unless it appears within a string literal or a remark.
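The cell ranges quoted above can be sketched as a simple Python classification. This is a simplified illustration, not a parser: the function names are hypothetical, characters beyond cell 7F are left untouched because the text does not say how they are handled, and the exception for string literals and remarks is deliberately ignored.

```python
def is_writing_char(ch: str) -> bool:
    """True for cells 20-7E, the characters actually used to write EXPRESS."""
    return 0x20 <= ord(ch) <= 0x7E


def is_rogue(ch: str) -> bool:
    """True for cells 00-1F and 7F, the 'rogue' characters."""
    code = ord(ch)
    return code <= 0x1F or code == 0x7F


def replace_rogues(source: str) -> str:
    """Treat every rogue character as a space.

    Simplification: a real reader would leave string literals and remarks alone.
    """
    return "".join(" " if is_rogue(ch) else ch for ch in source)


cleaned = replace_rogues("ENTITY\tpoint;\x7f")
```

A tab (cell 09) and the delete character (cell 7F) both fall in the rogue ranges, so each becomes a single space in `cleaned`.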


EXPRESS-G has three basic kinds of symbol: definition, relation, and composition. Definition and relation symbols are used to define the contents and structure of an information model. Composition symbols enable the diagrams to be spread across many physical pages. A definition symbol is a rectangle enclosing the name of the thing being defined. The type of the definition is denoted by the style of the box. Symbols are provided for EXPRESS simple types, defined types, entity types and schemas. The EXPRESS language offers a number of predefined simple types, namely Binary, Boolean, Integer, Logical, Number, Real and String. These are the terminal types of the language. The symbol for them is a solid rectangle with a double vertical line at its right end. The name of the type is enclosed within the box, as shown in Figure 18.1. The EXPRESS Generic pseudotype is not represented in EXPRESS-G as it is only used as a formal parameter to a function or procedure, and EXPRESS-G does not have these. The symbols for the select, enumeration and defined data type are dashed boxes as shown in Figure 18.2.
• The symbol for a defined data type is a dashed box enclosing the name of the type.
• The symbol for a select type is a dashed box with a double vertical line at the left end, enclosing the name of the select.
• The symbol for an enumeration type is a dashed box with a double vertical line at the right end, enclosing the name of the enumeration. Although an enumeration is not a terminal of the EXPRESS language (because its definition includes the enumerated things), it is a terminal of the EXPRESS-G language.
Figure 18.3 shows the symbol for an entity, which is a solid rectangle enclosing the name of the entity. The symbol for a schema is shown in Figure 18.3. It is a solid rectangle divided in half by a horizontal line. The name of the schema is written in the upper half of the rectangle. The lower half of the symbol is empty. 
EXPRESS-G does not support any notation for either function or procedure definitions.

