Copyright ©2001 T. H. Merrett

Aldat: Database Perspective


(as of Nov 8 2001)

Relations

A relation is an abstraction over files, and Aldat is designed to support programming at this level of abstraction. In particular, neither records (tuples) nor their order are imposed on the programmer's concerns. Aldat is also designed to provide the bulk operations that are well supported by secondary storage, where files reside, and which has totally different memory charateristics from RAM.

Relational Algebra

Aldat extends Codd's relational algebra in minor, but conceptually useful, ways. The natural join is seen as the extension, to relations, of set intersection, and the other set-valued operations (union, difference, symmetric difference) are similarly extended, making a family of mu-joins. These are seven join operations, if we include difference both ways, and the trivial set operators, that return the left- and right-hand operands, extended nontrivially to relations.

Relational division and natural composition are similarly related to the boolean-valued set operations of superset and overlap, respectively, and are incorporated by Aldat into a family of twelve sigma-joins, along with extensions of other set comparisons (proper superset, set equality, subset and proper subset, and the six complementary comparisons).

The unary operators of the original relational algebra, projection and restriction, are extended to projection and selection, and unified into the T-selector, which selects any tuple whose attributes satisfy a given, arbitrary, boolean expression, and then projects on another specified set of attributes. The T-selector is further generalized to the QT-selector, which adds quantifiers based on counting and proportion-of operators. QT-selectors appear to have straightforward translations into natural language queries, except when they become too precise for ready expression in everyday speech.

A second family of unary operators are the editors, some general, others specialized to treat particular relations as constructs such as spreadsheets, pictures, or prolog fact bases. These are Janus-like operators which from the viewpoint of the Aldat programmer are unary operators such as projection, but from the perspective of the end-user are interactive programs with command languages, graphical user interfaces, and any other handy facility for working with the individual components of the data.

The relational algebra provides an abstraction over looping, and has no notion of tuples to tempt the programmer to lower levels of thought. (There is a non-deterministic pick operator which guarantees a singleton (or empty) relation as its result, for the programmer who must escape from the higher-level constructs.)

Domain Algebra

The first novel, and fundamental, addition to database algebra by Aldat is a means to operate on attributes in the same high-level way as the relational algebra operates on relations. The horizontal or scalar operations of the domain algebra combine attribute values in each tuple using any scalar operators, such as arithmetic.

The vertical operations allow aggregations, using commutative and associative operators. Counting and summing are simple consequences, and from these, means, standard deviations, etc. So are maximum- and minimum-finding, and aggregative and (for all), or (for some), product, etc. These are called reductions, after the APL meta-operator of the same name and similar function. There are also classes of equivalence reduction, which apply reduction to subrelations that are grouped by the values of some (set of) attributes, of functional mapping, which induce an order on a relation according to the values of a (set of) attributes and permit aggregative operators which are not associative or commutative, and of partial functional mapping, which support both order and grouping.

All domain algebra operations are independent of any particular relation, and so produce virtual attributes. These may be actualized for specific relations through the relational algebra. The domain algebra also abstracts over looping.

Statements, Views and Relational Recursion

Aldat has assignment operators with relation names on the left and relational algebra expressions on the right. These form the basic statements of the language, and allow changes of state in the database. (The relational and domain algebras are purely functional and do not change any state.) There are also update statements, which allow smaller-scale changes, at the level of subrelations (but not tuples).

Aldat has a deferred assignment which produces a view: a relation of no extent which is evaluated by executing the view each time the left-hand operand is required. A view can be seen as a parameterless function of its right-hand operands.

Views may be recursive, with the left operand defined in terms of itself on the right, or mutually recursive, with the left operand defined in terms of other views, which in turn are ultimately defined in terms of that same left operand. Such recursion can be used to specify transitive closures and rule-based inference engines.

The recursion is implemented in the most naive way because this makes it general and because mutual recursion, even naively implemented, can express the most sophisticated implementations of simple recursion such as transitive closure.

The domain algebra, working independently of the relational algebra, can be combined with recursion to support numerical and other calculations on the attributes of the recursively defined relations. An example is the bill-of-materials problem, or a sophisticated inference engine which allows arithmetic operations. General flow problems may be expressed, on cyclic or acyclic graphs.

Computations

The programming notion of a function, like the mathematical concept, is a special case of a relation, and functions can be expressed relationally and invoked by the relational algebra. Such functions can now be evaluated in more than one direction: for instance, the relation a + b = c can be used to compute the value of any of the three variables, given values for the other two. Aldat regards them as intrinsic relations, whose extents could potentially have an infinite number of tuples. They are called computations, or comp in the syntax, which can also stand for ``compressed relation''.

The invocation mechanism is a T-selector on the computation, or a join of the computation with an (extrinsic) relation. The code in the computation is executed for each tuple of the relation. Joins of computations with each other can also be interpreted usefully.

The motivation for using computations, rather than functions or procedures, in Aldat is to make available the same syntax for using intrinsic relations (computations) as extrinsic relations (relations). A practical result is that a model for making predictions has the same syntactic form as a database storing the historical data: the same query can be used for the future as for the past.

Computations need not be functional, but may have local states, so that subsequent invocations depend on a memory of preceeding ones. They may also be called recursively.

Nested Relations, Instantiation, and Inheritance

Aldat sees nesting as the incorporation of relations into attributes of other relations. Thus all that is needed to express any desired operation on nested relations is to subsume the relational algebra as part of the domain algebra. Underneath this simple adaptation of existing syntax, implementation is by joins.

In particular, this supports data abstraction and allows relational attributes to be defined on complex types.

States are the fundamental aspect of ``object orientation''. The need to create instances, automatically, of any class with state follows from this, and is the primary accomplishment of object orientation. In a database, everything is state because databases are persistent. Aldat computations support state in the object-oriented sense of hidden state. The ordinary relations of the database provide the setting for instantiation of these states, and Aldat allows the programmer to instantiate by joining the computation-with-state to an appropriate relation. This removes the need for a new operator, which would be cumbersome in the database context of possibly millions of instantiations.

From instantiation follow classes, and from these, subclasses and inheritance. Aldat sees inheritance as syntactic sugar for a join between the inheriting relation and the bequeathing relation. Of course, once we have the syntactic sugar, we can also have a special implementation (which the programmer does not see) replacing the join implementation by pointer dereferencing and capturing the speed advantage of object-oriented databases in the very special cases to which they apply.

Computations may inherit from each other in the same way, by joins defined on computations, giving the benefits of method inheritance with the same syntax and mechanism as data inheritance.

(There is thus nothing new in either nesting or inheritance: everything they can do can be done in principle by the relational and domain algebras of Aldat. But the special implementations underlying the specialized syntax of object orientation can make these special operation much faster than the general algebras.)

Event Programming and Active Databases

Aldat sees an event as a system-generated procedure call, and an event handler as a procedure (computation) equipped to be called by the system rather than by a programmer. Database operations such as update can be considered events in this sense, and code can be run automatically when they occur.

This is the basis for active databases, which respond actively to events such as updates, rather than just passively accepting them.

Synchronization and Nondeterminism

Aldat offers a variant of the T-selector in which an empty result blocks until the operand relation(s) are modified so as to produce a non-empty result. This synchronization is an adaptation of the process synchronization of Linda, which uses tuples, to the relational world, which does not.

Non-determinism is kept independent of synchronization in this approach (unlike Linda), and is implemented by the pick operator.

Multidatabases

Aldat adapts scopes from the Algol-like programming languages to the interactive world of databases. In this, Aldat is preceeded by operating systems, such as UNIX, whose scopes are their file directories. Multidatabases follow, and syntax mimicking the change directory operations of UNIX (say) allow connections among the different databases. These databases could in principle reside at different sites on different computers.

Scalar Relations

It follows from the sigma-joins that a nullary relation (one with no attributes) must be a boolean. Aldat considers any unary relation also to be a scalar, but possibly multi-valued, of the same type as its one attribute, and supports scalar operations on these scalars. These are defined in terms of the relational and domain algebras, but provide syntactic sugar for the scalar viewpoint. The T-selector on arbitrary relations particularly has syntactic sugar that makes it seem an array lookup, and a corresponding special implementation which guarantees sublinear time cost for such lookups.

Text

Spatial and temporal data, rule bases, and many other applications are adequately handled by Aldat relations and operators with no specialized extensions. Text is more subtle, because it can be viewed in many different ways simultaneously: e.g., as a sequence of characters (for encryption), as a sequence of words (for indexing and word analysis), as a sequence of sentences, etc., as various hierarchies of such sequences, or as arbitrary further structures that might be derived by syntactic analysis. Aldat adopts the approach to text in which text simultaneously has all the attributes needed to give it these representations, as well as being able to support queries by regular expression matching.

Implementation

Much of the foregoing has been implemented in a series of incarnations of a system called relix (relational database on UNIX) by a sequence of M.Sc. students.

Technical overview and applications ( PostScript 243K)

Relational database programming in Aldat (PostScript 147K)

Course 308-612

Course 308-617

Attribute Metadata for Relational OLAP and Data Mining (PostScript 235K)
(Metadata of type attribute allows datacubes to be built with the relational and domain algebras, and both classification mining (which uses datacubes) and association mining to be programmed easily.)
Eighth Biennial Workshop on Data Bases and Programming Languages (DBPL'01), Monteporzio Catone - Roma, Italy, Sept. 8--10, 2001.