Copyright ©2007 T. H. Merrett
Domain Algebra
Wikipedia categories: programming languages, databases
Motivation
The domain algebra complements the relational algebra in the
Aldat
database programming language. It provides operations on relational attributes.
For intellectual simplification, it does so in a way as isolated from the
relations as possible, so that the programmer can think about attribute
operations independently from relational operations.
When the attributes the domain algebra operates on are scalars, such as numbers,
strings or boolean, the corresponding relations are in "first normal form".
If the relations are "nested" (attributes may themselves be relations), the
domain algebra subsumes the relational algebra so that relational operators can
be applied to these attributes. (This requires that binary relational operators
have explicit names, such as natjoin, natcomp and div.) The
domain algebra so extended accomplishes any desired processing of nested
relations. Recursive nesting (relations contain themselves) is supported by
recursive domain algebra.
Overview
Domain algebra expressions create virtual attributes: to maintain
independence from relations, these attributes do not belong to any relation
until some particular relation is created by "actualizing" virtual attributes
using the relational algebra. Actualization is the only point of contact
between the domain and relational algebras. A virtual attribute may be
(mentally) associated with any relation or relational expression containing
the attributes on which, directly or indirectly, the virtual attribute is
defined by the domain algebra.
Domain algebra operators are either scalar or aggregate.
Scalar expressions are self-evident, such as A + B (numerical
attributes A and B), X cat Y (string attributes X and Y) or
P and Q (boolean attributes P and Q). Any expression may
be built up from them. This includes unary operators and functions such as
-A or sin(A). A conditional expression,
if...then...else..., is also a scalar operation. For nested
relations, relational operators may be
included in scalar domain algebra expressions, such as R natjoin S
(relational attributes R and S). Scalar expressions must be able to
compute values separately for each tuple of any relation they are actualized
from.
Aggregate expressions apply to single attributes (or attribute expressions) and
perform aggregates across all tuples of a relation. Thus red + of
A sums all values of numeric attribute A in the relation it is
actualized from; equiv + of A by G sums all values of
A within groups of tuples having a single value for attribute G (of any
type); fun + of A order Q gives the cumulative sum
of A in an ordering of the tuples induced by the values of attribute Q
(of any type); and par + of A order Q by G
gives the corresponding group-by cumulative sum. These four families cover the
whole of aggregate domain algebra. They may be used in any way that makes sense:
operators other than + (such as *, and or natjoin) may be used;
there may be lists of attributes after by and order (such as
by G1,G2,G3 or order Q1,Q2); and any attribute anywhere may
be replaced by an arbitrary attribute expression including further aggregate
expressions. The only, and important, restriction is that operators following
red and equiv must be both associative and commutative, or else the
result is ambiguously defined, because relational tuples are not inherently
ordered.
Virtual attributes are defined from attribute expressions by statements of the
domain algebra e.g.,
let V be if A<B then A else B;
Definitions and Examples
- A relation is a set of tuples. (Because it is a set, tuple order is irrelevant, and duplicate tuples are not allowed.)
- A tuple is a mapping from a set of attributes to a set of values.
- An attribute is an identifier.
The set of values that an attribute can map to is a subset of a domain.
- A domain is a set of values. (Frequently it is characterized simply as a type, say the set of integers.)
- (More specifically) a relation is a subset of the Cartesian product of its domains (i.e., of the domains its attribute are drawn from).
- A database is a tuple.
First-normal-form ("flat") relation
- relation R; attributes A, B, X, Y, P, Q; 3 tuples
R( | A | B | X | Y | P | Q)
|
| 1 | 1 | "hi" | "there" | T | T
|
| 2 | 2 | "bye" | "now" | T | F
|
| 3 | 2 | "hi" | "there" | T | F
|
Domain algebra statements related to R:
let AB be A + B;
let XY be X cat Y;
let PQ be P and not Q;
let AA be red + of A;
let BB be red * of B;
let BA be equiv + of A by B;
let PP be red and of P;
let QQ be red or of Q;
let XQ be red or of Q by X;
let Bcum be fun + of B order A;
R showing the virtual attributes (outside the parentheses: they are not
actual attributes of R):
R( | A | B | X | Y | P | Q) | AB | XY | PQ | AA | BB | BA | PP | QQ | XQ | Bcum
|
| 1 | 1 | "hi" | "there" | T | T | 2 | "hithere" | F | 6 | 4 | 1 | T | T | T | 1
|
| 2 | 2 | "bye" | "now" | T | F | 4 | "byenow" | T | 6 | 4 | 5 | T | T | F | 3
|
| 3 | 2 | "hi" | "there" | T | F | 5 | "hithere" | T | 6 | 4 | 5 | T | T | T | 5
|
Non-First-normal-form ("nested") relation
- relation nestedPartOf; attributes A, SQ; SQ has attributes S, Q; nestedPartOf has 4 tuples
| nestedPartOf
|
| ( A | SQ | )
|
| | (S | Q)
|
| wallplug | cover | 1
|
| | fixture | 1
|
| cover | screw | 2
|
| | plate | 1
|
| fixture | screw | 2
|
| | plug | 2
|
| plug | connector | 2
|
| | mould | 1
|
Since for nested relations the domain algebra must include relational algebra
operators, we'll suppose three, the natural join, natjoin, the
set union, and the projection, [..] in... (Readers familiar with
Aldat, please note that the operator that implements, and generalizes,
union is ujoin, but we do not need the generality here.)
First, we "unnest" nestedPartOf to the flat relation PartOf:
- use the only special syntax for nested relations, the relation() operator of the domain algebra, to create a singleton relation with the attribute A;
- use natjoin to get the cartesian product of this relation with SQ;
- use reduction to get the union of the resulting relation over all four tuples of nestedPartOf, and do it "anonymously" (i.e., without giving the new attribute a name) to raise the nesting level;
- project the resulting flat relation from nestedPartOf and call it PartOf.
let relA be relation(A);
let ASQ be relA natjoin SQ;
PartOf <- [red union of ASQ] in nestedPartOf;
| PartOf( | A | S | Q | )
|
| | wallplug | cover | 1
|
| | wallplug | fixture | 1
|
| | cover | screw | 2
|
| | cover | plate | 1
|
| | fixture | screw | 2
|
| | fixture | plug | 2
|
| | plug | connector | 2
|
| | plug | mould | 1
|
Here are the intermediate steps.
| nestedPartOf
|
| ( A | SQ | | ) | relA | | | | ASQ
|
| | (S | Q) | | ( | A | ) | | (A | S | Q | )
|
| wallplug | cover | 1 | | | wallplug | | | wallplug | cover | 1
|
| | fixture | 1 | | | | | | wallplug | fixture | 1
|
| cover | screw | 2 | | | cover | | | cover | screw | 2
|
| | plate | 1 | | | | | | cover | plate | 1
|
| fixture | screw | 2 | | | fixture | | | fixture | screw | 2
|
| | plug | 2 | | | | | | fixture | plug | 2
|
| plug | connector | 2 | | | plug | | | plug | connector | 2
|
| | mould | 1 | | | | | | plug | mould | 1
|
Second, we "nest" the flat relation PartOf back to nestedPartOf:
- use relation(S,Q) to create a singleton relation on S and Q;
- use equivalence reduction to take unions of these singletons for each value of A;
- project A and the unions from PartOf. (The "anonymous" projection used
is explained a little later in the discussion of "level raising".)
let relSQ be relation(S,Q);
let SQ be equiv union of relSQ by A;
nestedPartOf <- [A,SQ] in PartOf;
Here are the intermediate steps.
PartOf( | A | S | Q | ) | relSQ | | | SQ
|
| | | | | (S | Q | ) | (S | Q | )
|
| wallplug | cover | 1 | | cover | 1 | | cover | 1
|
| | | | | | | | fixture | 1
|
| wallplug | fixture | 1 | | fixture | 1 | | cover | 1
|
| | | | | | | | fixture | 1
|
| cover | screw | 2 | | screw | 2 | | screw | 2
|
| | | | | | | | plate | 1
|
| cover | plate | 1 | | plate | 1 | | screw | 2
|
| | | | | | | | plate | 1
|
| fixture | screw | 2 | | screw | 2 | | screw | 2
|
| | | | | | | | plug | 2
|
| fixture | plug | 2 | | plug | 2 | | screw | 2
|
| | | | | | | | plug | 2
|
| plug | connector | 2 | | connector | 2 | | connector | 2
|
| | | | | | | | mould | 1
|
| plug | mould | 1 | | mould | 1 | | connector | 2
|
| | | | | | | | mould | 1
|
Suppose we want to know how many subassemblies (sum of Q in SQ) make up
each assembly (A) in nestedPartOf. To add up all the values of Q for
each SQ, we just use red + of Q. But
let count be red + of Q
gives a virtual attribute at the same level as Q
nestedPartOf
|
( A | SQ | | | )
|
| (S | Q) | count
|
wallplug | cover | 1 | 2
|
| fixture | 1 | 2
|
cover | screw | 2 | 3
|
| plate | 1 | 3
|
fixture | screw | 2 | 4
|
| plug | 2 | 4
|
plug | connector | 2 | 3
|
| mould | 1 | 3
|
We could now make these counts available at the next level up, as an attribute
of nestedPartOf, by a projection in the domain algebra,
let C be [count] in SQ;
but the result is still a nested relation.
nestedPartOf
|
( A | SQ | | ) | C
|
| (S | Q) | | (count)
|
wallplug | cover | 1 | | 2
|
| fixture | 1
|
cover | screw | 2 | | 3
|
| plate | 1
|
fixture | screw | 2 | | 4
|
| plug | 2
|
plug | connector | 2 | | 3
|
| mould | 1
|
We can see this clearly if we project it out from nestedPartOf
countPartsOf <- [C] in nestedPartOf;
giving the nested relation countPartsOf (C (count )).
It would be more appropriate to see count at the same level as SQ, i.e.,
as an attribute of nestedPartOf, one level up.
Such a "level raising" can be done when the attribute has only one value (which
red guarantees): we make the middle attribute disappear by simply not
naming it. Here is count "anonymously".
let count be[red + of Q] in SQ;
nestedPartOf
|
( A | SQ | | ) | count
|
| (S | Q)
|
wallplug | cover | 1 | | 2
|
| fixture | 1
|
cover | screw | 2 | | 3
|
| plate | 1
|
fixture | screw | 2 | | 4
|
| plug | 2
|
plug | connector | 2 | | 3
|
| mould | 1
|
countPartsOf <- [count] in nestedPartOf;
now gives a flat relation, countPartsOf(count).
Third, we show recursive domain algebra. For this we use a recursively nested
version of the above data.
assembly
|
(component | subassembly | | | | | | | )
|
| (qty | component | subassembly | | | | | )
|
| | | (qty | component | subassembly | | | )
|
| | | | | (qty | component | subassembly | )
|
wallplug | 1 | cover | 1 | plate
|
| | | 2 | screw
|
| 1 | fixture | 2 | plug | 1 | mould
|
| | | | | 2 | connector
|
| | | 2 | screw
|
To get a list of all components, we define cmpnt recursively in the domain
algebra.
let cmpnt be component union [red union of
cmpnt] in subassembly ;
Then
AllComponents <- [cmpnt] in assembly;
gives
AllComponents
|
(cmpnt | )
|
wallplug
|
cover
|
fixture
|
plate
|
screw
|
plug
|
mould
|
connector
|
Note that recursion is allowed in the domain algebra only if there is a level
change in the recursive statement.
Summary
The domain algebra processes attributes independently of their participation in
relations, with the connection between the two being made only when the virtual
attributes defined by the domain algebra are actualized in a relation newly
created by a statement of the relational algebra.
The domain algebra includes all operators of the relational algebra, so that
attributes may be relations in nested relations. Recursion in the domain
algebra is used to bridge levels of nesting in such relations, and supports
recursively nested relations.
Further Reading
- T. H. Merrett "Experience with the Domain Algebra" Proc. 3rd Internat. Conf. on Data and Knowledge Bases 1988 San Mateo CA, Morgan Kaufmann Publishers Inc. pp.335-346
- T. H. Merrett "Aldat: a Retrospective on a Work in Progress" Information Systems 32(4),2007.
- T. H. Merrett "Database programming".
- T. H. Merrett "Advanced database programming".
Links
Aldat,
Aldat Relational Algebra