Up: Merrett's home page
Next:
About this document
Are Databases a Special Case of Programming Languages?
An overview of the Aldat Project at McGill 97/10
T. H. Merrett
McGill University
Abstract
Databases deal primarily with data on secondary storage, which differs from
the data types of conventional programming languages by being persistent,
shared, and often too large for RAM. Database languages have mainly been
query languages. Some programming languages have been made persistent, usually
giving rise to objectoriented databases. The relational model has been used
as a basis for logic programming. How far can the unification of databases
and programming languages be taken, especially in view of the need for data
units which may exceed RAM capacity?
We present some results of exploring this question. It leads to further
questions which we can answer.

What kinds of programming can be done in a language whose data
structures support possibly very large sets of related items, and
whose operations abstract over loops?

What happens to functions in a programming language when seen from
the perspective of relations, since relations generalize functions
mathematically?

What happens to database relations when asked to support data abstraction
and how can this be done with no new syntax in the database formalism?

Object orientation was motivated by a wish to instantiate possibly
very large numbers of similar objects (such as cells or gas molecules);
relations can represent very large numbers of similar objects: how can
instantiation be incorporated into the relational algebra?
Related topics, which we shall not have time to address, include relational
recursion, event programming for active databases, process synchronization
in an associative data language, and scoping and multidatabases.
 Chronology
 Start with Databases

abstracting tuples, loops, grouping, & ordering
 On to Programming Language 
functions and
``computations'';
data abstraction and nesting;
object orientation and instantiation
Chronology I Secondary Storage
 ... fixed length
 1978 Static Multipaging
S = N ... 2N < T > = 1 ... 2
 1982 Dynamic Multipaging
S = N ... 2N < T > = 1 ... 2
 1983 ZOrder
S= N
 1984 Tidy functions
S = N ... 2N < T > = 1 ... 2
 ... variable length
 1994 Tries for text
S ~ 3N < T > = result
 ... fixed length
 1994 Tries for maps
S ~ 0.1N < T > = result
 1996 Tries for orthogonal range queries
S ~ 0.1N < T > = result
Chronology II
Programming Languages
 P.L.  D.B. 
1975  iterator  relation 
1984  recursion  fixed point 
1992  scope  multidatabases 
1992  process syncr.  selector 
1992  reflection  metadata 
1993  inheritance  join 
1994  aggregation  join 
1994  function  relation 
1995  A.D.T.  nesting 
1996  events  active 
1997  objects  instantiation 
199?  type  constraint 
Applications Considered or Coded in this Talk
Start with Databases
Main ideas:
 Abstract over looping
 Treat attributes independently of relations
513
Start with Databases
 Relational algebra:
extend slightly
 Natural join corresponds to set intersection.
So extend set union, difference, etc. similarly.
 Division corresponds to superset comparison.
So extend subset, empty intersection, etc. similarly.
 Projection & selection combine & quantify to give QTselector
 Domain algebra:
introduce operations on attributes
(independently from relations)
 ``horizontal'' or intratuple
 ``vertical'' or aggregation
(We call the result ``Relix''.)
(Relix  SQL) =
no tuples, only relations
 Difference I
iteration: high level ops
 Difference II
grouping: high level ops
 Difference III
ordering: high level ops
Difference I  no iterators
a) Relational Algebra
StuCour( Stu  Cour  Asst  Exam) 
Joe  Algol  45  42 
Sue  Algol  47  38 
Joe  Pascal  43  42 
Sue  Pascal  45  42 
 TSelector is a loop:
JoeCour < [ Cour,Asst,Exam]
where Stu= "Joe" in StuCour
JoeCour( Cour  Asst  Exam) 
Algol  45  42 
Pascal  43  42 
Difference I  no iterators
b) Domain Algebra
JoeCour( Cour  Asst  Exam) 
Algol  45  42 
Pascal  43  42 
 Horizontal operations are a loop:
let Final be Asst + Exam
 Vertical operations are a loop:
let Avg be ( red + of Final) /
( red + of 1)
JoeMark < [ Avg] in JoeCour
Difference II  grouping
a) Domain Algebra
StuCour( Stu  Cour  Asst  Exam) 
Joe  Algol  45  42 
Sue  Algol  47  38 
Joe  Pascal  43  42 
Sue  Pascal  45  42 
 Vertical operations support grouping:
let Total be equiv + of Final
by Cour
TotMark < [ Cour,Total] in StuCour
TotMark( Cour  Total) 
Algol  172 
Pascal  172 
Difference II  grouping
b) Relational Algebra
 Generalizing division, plus recursion, gives a oneline
inference engine. ( Horn is rulebase; Facts are known facts.)
NewFacts is Facts ujoin
[ Concl] in ( NewFacts[ Concl
Ante] Horn)
 Expands to a 50line inference engine in a 200line expert system shell,
Relixpert (1987; 1 manmonth; subsumes commercial shells AUGMENT, VPExpert).
Difference III  ordering
a) Domain Algebra
 Here is a planimeter for geographical areas, using Stokes' Theorem
let be par succ of X order Seq
by Group;
let be par succ of Y order Seq
by Group;
let Area be equiv + of
by Group;
 Similar calculations can find centroid, slope of an area in 3d, etc.
What kinds of programming can be done in a language whose data
structures support possibly very large sets of related items, and
whose operations abstract over loops?
 structured data processing
 geometry and maps  spatial data
 temporal data
 expert system shells
 text data
 mining
 active databases
 images?
 sound?
 warehousing?
 mediation?
On to Programming Language I
Main idea:
Unify functions and relations
Topics
1425
Computations
 Unify base concepts of DB and PL
 In mathematics, functions relations
 Let's generalize PL ``function''
comp Sum(A,B,C) is
;
 C <  A + B 
alt  A <  C  B 
alt  B <  C  A 
 And let's invoke it by relational algebra
[C] where A=2 & B=2 in Sum
T(C)  R(A  B)  S(A  B  C) 
4  5  2  5  2  7 
 6  3  6  3  9 
Sum ijoin R
Computations (cont.)
 a limited form of constraint programming
 think of `` comp'' as ``compressed relation'':
the relation whose extent is all values allowed by the constraint
 so we can make sense of invocations by other joins,
e.g., ujoin, djoin, sjoin, , etc.
 we will also need a toplevel invocation, a.k.a. procedure
 Sum( in A, in B, out C) or
Sum( out A, in B, in C)
New Horizons from Computations
Computations encourage us to think in terms of sets of related equations.
This introduces some amusements which people who like abstractions may already
have formalized.
 Consider the join of two computations
 (the and of two constraints).
 An implementation might hold this join as a view
 to be applied by invoking one then the other.
Combining Computations
Try uniform acceleration:
d = d0 + v0*t + a*t^2/2 
v = v0 + a*t 
 These generate 4 *5 pairs
 but the nine pairs on v0, a and t give only three different results
 (which require simultaneous algebraic solutions)
 and only the pairs giving (d,v) and (d0,v) can be solved in any
order.
 There are choose(6,2) solutions to
two equations in six variables
 but (d,d0) cannot be determined.
Combining Computations (cont.)
(Missing table: see PostScript version )
The matrix shows when this can be implemented as two computations joined, in
some order, with actual parameters provided by some input data.
  means that the equations are overdetermined.
 alg means algebraic rearrangement is needed.
 means solve first for the variable above, then for the variable
to the left.
 means solve first for the variable to the left, then for the
variable above.
 any means equations can be solved in any order.
New Horizons from Computations (cont.)
(Missing equations: see PostScript version )
Here is another pair of equations, rotation in two Euclidean dimensions.
where c is cosine and s is sine. So, of course,
But computations motivate us also to find
and
This is the improper Lorentz transformation that gives time reflection. !
From physics to data mining:
computation to climb a concept hierarchy. (ref. Jiawei Han)
comp generalize( Src, Rslt, Attr, Thld)
{ let distinct be equiv + of 1 by Attr;
while [] where distinct > Thld in Src
do
{ local tmp <  Src[ Attr icomp
Low] Hier;
if [] in tmp then
{ let Attr be High;
Rslt <  [ Attr, ~{ Attr}] in
[ High, ~{ Attr}] in tmp;
} else
Rslt <  Src;
}
}
( Hier( Low, High) holds the concept hierarchy)
Mining (cont.)
For example, with a data relation
Student( Name  Major  Born  GPA) 
Anderson  hist  Vanc.  3.5 
Bach  biol  Calg.  3.7 
:  :  :  : 
and a concept hierarchy, Hier, which subsumes biol, etc. under
sci;
hist, etc. under hum; Vanc., etc. under BC; BC,
etc. under Can; 3.7, etc. under excel; and so on, we could
deduce
by the invocations
generalize( in Student, out MajorGen, in
Major, 3);
generalize( in MajorGen, out BornGen, in
Born, 3);
generalize( in BornGen, out StudentAbil, in
GPA, 3);
Predictive Modelling for D.S.S.
 Given a relation PastSales( Year, Qty)
 we can find out last year's sales with
[ Qty] where Year = 1996 in PastSales
 Suppose we want to predict sales for next year:
[ Qty] where Year = 1998 in FutSales
 In fact, it would be nice to have
Sales is PastSales ujoin FutSales
 so that we can query Sales for any year, past or future.
Predictive Modelling (cont.)
Here is the code for the last two, using least squares for the model:
let yavg be ( red + of year) /
( red + of 1.0);
let qavg be ( red + of qty) /
( red + of 1.0);
let slope be
( red + of (( year  yavg)*
( qty  qavg)))
/ ( red + of ( (year  yavg)**2));
let base be qavg  slope* yavg;
comp LinInterp( year, qty, slope, base) is
qty slope* year + base;
FutSales is LinInterp icomp PastSales;
Active Databases
 Events are systemgenerated procedure calls
 and event handlers are procedures (computations).
We have built event handlers as parameterless computations for the usual
database operations,
e.g., comp add:local for additions to relation local.
We could also have programmerdefined events which might automatically
trigger more than one eventhandler computation.
On to Programming Language II
Main idea:
Relational operators all apply to attributes
Topics
2633
Complex Objects
 A basic precept of PL is ``no 2ndclass types''.
 So values in relations should not be restricted.
 So relations can contain relations: ``complex objects''.
 How do we support this in the language?
The domain algebra subsumes
the relational algebra:
any attribute type can be relational,
so relational operations are available for
attributes.
Example (Jaeschke & Schek, 1982):
Books( Authors  Title  Price  Descript) 
A1,A2  T1  P1  D1, D2 
A2  T2  P2  D1, D2 
A1  T3  P1  D1,D2,D3 
Which books by A2 are described by both D1 and D2?
[ Title] where Authors sup { (A2)} and
Descript sup { (D1), (D2)} in Books
 The two sup operators look like set comparisons, but they are
joins (division) from the relational algebra applied to attributes
in the select clause.
Example (Deshpande & Larson, 1987):
(Missing table: see PostScript version )
Find employees without children:
[ ENo] where not ([] in Children)
in Employee
(Note the projection of the attribute, Children.)
Find employees who took course 314:
[ ENo] where ([] where
CNo= 314 in Training)
in Employee
(Note the selection of the attribute, Training.)
Double nesting (cont.)
Find children of employees who took course 314:
[ Children] where ([] where
CNo= 314 in Training)
in Employee
(Note the result is still a nested relation.)
Find names of those children:
let C be [ Name] in Children;
or
let C be Children.Name
[ C] where ([] where CNo= 314 in
Training)
in Employee
Deeper Nesting
(Missing table: see PostScript version )
Find DEPTs with two A grades:
let A1 be red + of 1;
let S2 be red ujoin of [ Name,
[ A1] in where Grd= "A" in Courses]
in Students;
let A2 be red + of A1;
Ans [ Dept] where Prof.S2.A2= 2 in
EFaculty;
Recursive Nesting
(Missing table: see PostScript version )
Find total costs
(A11 6; A12 5; A1 14; A2 2; A 20):
let SumCost be red + of CalCost;
let CalCost
be if Assembly= then Cost
else Cost + Assembly.SumCost;
Ans [ Id, CalCost] in Assembly;
The NEST and UNNEST operators are too unruly and unimportant to be implemented
except in terms of other primitives which we need anyway.
 syntax in the domain algebra to name a set of attributes;
 a way of suppressing unneeded names of nested relations;
 red ujoin and equiv ujoin in the domain algebra,
which we would have anyway as part of the subsumption of the relational
algebra.
Operators on ABSTRACT DATA TYPES are supported by computations whose arguments
are at once attributes and relations.
On to Programming Language III
Main idea:
Instantiate with domain and relational algebras
Topics
3440
Instantiation
 When a computation has a state, as for a counter or a flipflop,
different uses of the computation may require different instantiations
of the state.
 The ability to make many instances, as for all molecules of a gas, was
the original motivation of the objectoriented approach.
 Relations are geared to support large numbers of instances, but deal with
sets of instances, not individual ``objects''.
 So we need an instantiation mechanism which does not need to say new
for each individual instance.
 We also need to be able to hide the state of each instance.
Instantiate by domain algebra actualization.
e.g., bank accounts
comp ba( deposit, balance) is
{ state bal init 0;
deposit <  comp( dep) is
{ bal <  bal + dep
alt
dep <  bal  old bal;
} /* deposit */
balance <  comp(b) is
{ b <  bal;
} /* balance */
} /* ba */
Given
accts( ac#  client) 
1  joe 
2  sue 
instantiate ba as an attribute
let ba1 be [ deposit, balance] in ba;
accounts [ ac#, client, ba1] in accts;
 [ bal oldbal] are hidden. This state is instantiated
for each account in accts.
 deposit, balance, the computations, are constant attributes
Binary Operators
 What if we need interactions between two (or more) objects of the same
type?
 This is the nbody problem in objectorientation.
 For example, adding two numbers should not be asymmetrical, as in
sending one in a message to the other.
Gas Molecules: hitWall is unary; bang is binary.
comp molecules( initial, hitWall, getvel, putvel, ..)
{ ...
hitWall <  comp( which) is
{ case which:
{ "x": xvel <   xvel;
"y": yvel <   yvel;
"z": zvel <   zvel;
} } /* hitWall */
} /* molecules */
The binary operations are defined outside the class, with parameters.
For instance, bang collides two molecules (of equal mass), swapping
their velocities.
bang comp( mol1, mol2, mola, molb) is
{ local xv1, yv1, zv1, xv2, yv2, zv2;
mol1.getvel( xv1, yv1, zv1);
mol2.getvel( xv2, yv2, zv2);
mol2.putvel( xv1, yv1, zv1);
mol1.putvel( xv2, yv2, zv2);
} /* bang */
This is all invoked on a relation of molecules.
gas for mol# from 1 to 10000
par mol#;
Binary Operators on Objects (cont.)
let mol1 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
let mol2 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
binaryMols ([ mol#1, mol1] in gas) ijoin
[ mol#2, mol2] in gas
update binaryMols change
bang( in mol1, in mol2,
out mol1, out mol2)
using where close in ( binaryMols ijoin near);
( near is another binary computation which inputs mol1 and mol2
and outputs the boolean, close.)
Conclusions

 All the above was done with
 Relational algebra
 joins (intersection, union, etc.)
 joins (subset, superset, etc.)
 Tselectors (select, project)
 Domain algebra
 horizontal operations
 vertical operations ( red, equiv, fun, par)
 Views & Recursion
 Computations & Events

 Abstract over looping
 Better programmer productivity: 2 o.m.?
 Exhaust the relational algebra
 Separate the domain algebra
 Driven by
 Applicability (usefulness)
 Elegance (generality)
Next: About this document
Prof. T.H. MERRETT
Thu Oct 16 13:20:17 EDT 1997