Up: Merrett's home page Next: About this document

Are Databases a Special Case of Programming Languages?


An overview of the Aldat Project at McGill 97/10

T. H. Merrett
McGill University
Abstract
Databases deal primarily with data on secondary storage, which differs from the data types of conventional programming languages by being persistent, shared, and often too large for RAM. Database languages have mainly been query languages. Some programming languages have been made persistent, usually giving rise to object-oriented databases. The relational model has been used as a basis for logic programming. How far can the unification of databases and programming languages be taken, especially in view of the need for data units which may exceed RAM capacity?

We present some results of exploring this question. It leads to further questions which we can answer.

Related topics, which we shall not have time to address, include relational recursion, event programming for active databases, process synchronization in an associative data language, and scoping and multidatabases.

  1. Chronology
  2. Start with Databases ---
    abstracting tuples, loops, grouping, & ordering
  3. On to Programming Language ---
    functions and ``computations'';
    data abstraction and nesting;
    object orientation and instantiation

Chronology I Secondary Storage
... fixed length
1978 Static Multipaging
S = N ... 2N < T > = 1 ... 2
1982 Dynamic Multipaging
S = N ... 2N < T > = 1 ... 2
1983 Z-Order
S= N
1984 Tidy functions
S = N ... 2N < T > = 1 ... 2
... variable length
1994 Tries for text
S ~ 3N < T > = |result|
... fixed length
1994 Tries for maps
S ~ 0.1N < T > = |result|
1996 Tries for orthogonal range queries
S ~ 0.1N < T > = |result|

Chronology II
Programming Languages

P.L. D.B.
1975 iterator relation
1984 recursion fixed point
1992 scope multidatabases
1992 process syncr. selector
1992 reflection metadata
1993 inheritance join
1994 aggregation join
1994 function relation
1995 A.D.T. nesting
1996 events active
1997 objects instantiation
199? type constraint

Applications Considered or Coded in this Talk

Start with Databases

Main ideas:

5--13

Start with Databases

(We call the result ``Relix''.)

(Relix - SQL) =
no tuples, only relations

Difference I
iteration: high level ops
Difference II
grouping: high level ops
Difference III
ordering: high level ops


Difference I --- no iterators

a) Relational Algebra
StuCour( Stu Cour Asst Exam)
Joe Algol 45 42
Sue Algol 47 38
Joe Pascal 43 42
Sue Pascal 45 42

JoeCour( Cour Asst Exam)
Algol 45 42
Pascal 43 42

Difference I --- no iterators

b) Domain Algebra

JoeCour( Cour Asst Exam)
Algol 45 42
Pascal 43 42
JoeMark < -[ Avg] in JoeCour

JoeMark( Avg)
86

Difference II --- grouping

a) Domain Algebra

StuCour( Stu Cour Asst Exam)
Joe Algol 45 42
Sue Algol 47 38
Joe Pascal 43 42
Sue Pascal 45 42
TotMark < -[ Cour,Total] in StuCour

TotMark( Cour Total)
Algol 172
Pascal 172

Difference II --- grouping

b) Relational Algebra NewFacts is Facts ujoin
[ Concl] in ( NewFacts[ Concl Ante] Horn)

Difference III --- ordering

a) Domain Algebra let be par succ of X order Seq by Group;
let be par succ of Y order Seq by Group;
let Area be equiv + of
by Group;

What kinds of programming can be done in a language whose data structures support possibly very large sets of related items, and whose operations abstract over loops?


On to Programming Language I

Main idea:

Unify functions and relations

Topics

14--25

Computations

comp Sum(A,B,C) is


;
C < - A + B
altA < - C - B
altB < - C - A
[C] where A=2 & B=2 in Sum

T(C) R(A B) S(A B C)
4 5 2 5 2 7
6 3 6 3 9
Sum ijoin R

Computations (cont.)


New Horizons from Computations

Computations encourage us to think in terms of sets of related equations.

This introduces some amusements which people who like abstractions may already have formalized.


Combining Computations

Try uniform acceleration:

d = d0 + v0*t + a*t^2/2
v = v0 + a*t


Combining Computations (cont.)

(Missing table: see PostScript version )

The matrix shows when this can be implemented as two computations joined, in some order, with actual parameters provided by some input data.


New Horizons from Computations (cont.)

(Missing equations: see PostScript version )

Here is another pair of equations, rotation in two Euclidean dimensions.

where c is cosine and s is sine. So, of course,

But computations motivate us also to find

and

This is the improper Lorentz transformation that gives time reflection. !


From physics to data mining:

computation to climb a concept hierarchy. (ref. Jiawei Han)

comp generalize( Src, Rslt, Attr, Thld)
{ let distinct be equiv + of 1 by Attr;
while [] where distinct > Thld in Src do
{ local tmp < - Src[ Attr icomp Low] Hier;
if [] in tmp then
{ let Attr be High;
Rslt < - [ Attr, ~{ Attr}] in
[ High, ~{ Attr}] in tmp;
} else
Rslt < - Src;
}
}

( Hier( Low, High) holds the concept hierarchy)


Mining (cont.)

For example, with a data relation

Student( Name Major Born GPA)
Anderson hist Vanc. 3.5
Bach biol Calg. 3.7
: : : :
and a concept hierarchy, Hier, which subsumes biol, etc. under sci; hist, etc. under hum; Vanc., etc. under BC; BC, etc. under Can; 3.7, etc. under excel; and so on, we could deduce

by the invocations
generalize( in Student, out MajorGen, in Major, 3);
generalize( in MajorGen, out BornGen, in Born, 3);
generalize( in BornGen, out StudentAbil, in GPA, 3);


Predictive Modelling for D.S.S.


Predictive Modelling (cont.)

Here is the code for the last two, using least squares for the model:

let yavg be ( red + of year) / ( red + of 1.0);
let qavg be ( red + of qty) / ( red + of 1.0);
let slope be
( red + of (( year - yavg)* ( qty - qavg)))
/ ( red + of ( (year - yavg)**2));
let base be qavg - slope* yavg;
comp LinInterp( year, qty, slope, base) is
qty slope* year + base;
FutSales is LinInterp icomp PastSales;


Active Databases

We have built event handlers as parameterless computations for the usual database operations,
e.g., comp add:local for additions to relation local.

We could also have programmer-defined events which might automatically trigger more than one event-handler computation.


On to Programming Language II

Main idea:

Relational operators all apply to attributes

Topics

26--33

Complex Objects

The domain algebra subsumes
the relational algebra:

any attribute type can be relational,
so relational operations are available for
attributes.

Example (Jaeschke & Schek, 1982):

Books( Authors Title Price Descript)
A1,A2 T1 P1 D1, D2
A2 T2 P2 D1, D2
A1 T3 P1 D1,D2,D3
Which books by A2 are described by both D1 and D2?

[ Title] where Authors sup { (A2)} and
Descript sup { (D1), (D2)} in Books


Example (Deshpande & Larson, 1987):

(Missing table: see PostScript version )

Find employees without children:
[ ENo] where not ([] in Children)
in Employee
(Note the projection of the attribute, Children.)

Find employees who took course 314:
[ ENo] where ([] where CNo= 314 in Training)
in Employee
(Note the selection of the attribute, Training.)


Double nesting (cont.)

Find children of employees who took course 314:
[ Children] where ([] where CNo= 314 in Training)
in Employee
(Note the result is still a nested relation.)

Find names of those children:
let C be [ Name] in Children;
or
let C be Children.Name
[ C] where ([] where CNo= 314 in Training)
in Employee


Deeper Nesting

(Missing table: see PostScript version )

Find DEPTs with two A grades:
let A1 be red + of 1;
let S2 be red ujoin of [ Name,
[ A1] in where Grd= "A" in Courses]
in Students;
let A2 be red + of A1;
Ans [ Dept] where Prof.S2.A2= 2 in
EFaculty;


Recursive Nesting

(Missing table: see PostScript version )

Find total costs
(A11 6; A12 5; A1 14; A2 2; A 20):

let SumCost be red + of CalCost;
let CalCost be if Assembly= then Cost
else Cost + Assembly.SumCost;
Ans [ Id, CalCost] in Assembly;


The NEST and UNNEST operators are too unruly and unimportant to be implemented except in terms of other primitives which we need anyway.

Operators on ABSTRACT DATA TYPES are supported by computations whose arguments are at once attributes and relations.

On to Programming Language III

Main idea:

Instantiate with domain and relational algebras

Topics

34--40

Instantiation


Instantiate by domain algebra actualization.

e.g., bank accounts
comp ba( deposit, balance) is
{ state bal init 0;

deposit < - comp( dep) is
{ bal < - bal + dep
alt
dep < - bal - old bal;
} /* deposit */

balance < - comp(b) is
{ b < - bal;
} /* balance */

} /* ba */


Given

accts( ac# client)
1 joe
2 sue
instantiate ba as an attribute

let ba1 be [ deposit, balance] in ba;
accounts [ ac#, client, ba1] in accts;


Binary Operators

Gas Molecules: hitWall is unary; bang is binary.

comp molecules( initial, hitWall, getvel, putvel, ..) { ...

hitWall < - comp( which) is
{ case which:
{ "x": xvel < - - xvel;
"y": yvel < - - yvel;
"z": zvel < - - zvel;
} } /* hitWall */

} /* molecules */


The binary operations are defined outside the class, with parameters.
For instance, bang collides two molecules (of equal mass), swapping their velocities.

bang comp( mol1, mol2, mola, molb) is
{ local xv1, yv1, zv1, xv2, yv2, zv2;
mol1.getvel( xv1, yv1, zv1);
mol2.getvel( xv2, yv2, zv2);
mol2.putvel( xv1, yv1, zv1);
mol1.putvel( xv2, yv2, zv2);
} /* bang */

This is all invoked on a relation of molecules.

gas for mol# from 1 to 10000 par mol#;


Binary Operators on Objects (cont.)

let mol1 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
let mol2 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
binaryMols ([ mol#1, mol1] in gas) ijoin
[ mol#2, mol2] in gas

update binaryMols change
bang( in mol1, in mol2, out mol1, out mol2)
using where close in ( binaryMols ijoin near);

( near is another binary computation which inputs mol1 and mol2 and outputs the boolean, close.)


Conclusions






Next: About this document



Prof. T.H. MERRETT
Thu Oct 16 13:20:17 EDT 1997