Up: Merrett's home page
Next:
About this document
Are Databases a Special Case of Programming Languages?
An overview of the Aldat Project at McGill 97/10
T. H. Merrett
McGill University
Abstract
Databases deal primarily with data on secondary storage, which differs from
the data types of conventional programming languages by being persistent,
shared, and often too large for RAM. Database languages have mainly been
query languages. Some programming languages have been made persistent, usually
giving rise to object-oriented databases. The relational model has been used
as a basis for logic programming. How far can the unification of databases
and programming languages be taken, especially in view of the need for data
units which may exceed RAM capacity?
We present some results of exploring this question. It leads to further
questions which we can answer.
-
What kinds of programming can be done in a language whose data
structures support possibly very large sets of related items, and
whose operations abstract over loops?
-
What happens to functions in a programming language when seen from
the perspective of relations, since relations generalize functions
mathematically?
-
What happens to database relations when asked to support data abstraction
and how can this be done with no new syntax in the database formalism?
-
Object orientation was motivated by a wish to instantiate possibly
very large numbers of similar objects (such as cells or gas molecules);
relations can represent very large numbers of similar objects: how can
instantiation be incorporated into the relational algebra?
Related topics, which we shall not have time to address, include relational
recursion, event programming for active databases, process synchronization
in an associative data language, and scoping and multidatabases.
- Chronology
- Start with Databases
---
abstracting tuples, loops, grouping, & ordering
- On to Programming Language ---
functions and
``computations'';
data abstraction and nesting;
object orientation and instantiation
Chronology I Secondary Storage
- ... fixed length
- 1978 Static Multipaging
S = N ... 2N < T > = 1 ... 2
- 1982 Dynamic Multipaging
S = N ... 2N < T > = 1 ... 2
- 1983 Z-Order
S= N
- 1984 Tidy functions
S = N ... 2N < T > = 1 ... 2
- ... variable length
- 1994 Tries for text
S ~ 3N < T > = |result|
- ... fixed length
- 1994 Tries for maps
S ~ 0.1N < T > = |result|
- 1996 Tries for orthogonal range queries
S ~ 0.1N < T > = |result|
Chronology II
Programming Languages
| P.L. | D.B. |
1975 | iterator | relation |
1984 | recursion | fixed point |
1992 | scope | multidatabases |
1992 | process syncr. | selector |
1992 | reflection | metadata |
1993 | inheritance | join |
1994 | aggregation | join |
1994 | function | relation |
1995 | A.D.T. | nesting |
1996 | events | active |
1997 | objects | instantiation |
199? | type | constraint |
Applications Considered or Coded in this Talk
Start with Databases
Main ideas:
- Abstract over looping
- Treat attributes independently of relations
5--13
Start with Databases
- Relational algebra:
extend slightly
- Natural join corresponds to set intersection.
So extend set union, difference, etc. similarly.
- Division corresponds to superset comparison.
So extend subset, empty intersection, etc. similarly.
- Projection & selection combine & quantify to give QT-selector
- Domain algebra:
introduce operations on attributes
(independently from relations)
- ``horizontal'' or intra-tuple
- ``vertical'' or aggregation
(We call the result ``Relix''.)
(Relix - SQL) =
no tuples, only relations
- Difference I
iteration: high level ops
- Difference II
grouping: high level ops
- Difference III
ordering: high level ops
Difference I --- no iterators
a) Relational Algebra
StuCour( Stu | Cour | Asst | Exam) |
Joe | Algol | 45 | 42 |
Sue | Algol | 47 | 38 |
Joe | Pascal | 43 | 42 |
Sue | Pascal | 45 | 42 |
- T-Selector is a loop:
JoeCour < -[ Cour,Asst,Exam]
where Stu= "Joe" in StuCour
JoeCour( Cour | Asst | Exam) |
Algol | 45 | 42 |
Pascal | 43 | 42 |
Difference I --- no iterators
b) Domain Algebra
JoeCour( Cour | Asst | Exam) |
Algol | 45 | 42 |
Pascal | 43 | 42 |
- Horizontal operations are a loop:
let Final be Asst + Exam
- Vertical operations are a loop:
let Avg be ( red + of Final) /
( red + of 1)
JoeMark < -[ Avg] in JoeCour
Difference II --- grouping
a) Domain Algebra
StuCour( Stu | Cour | Asst | Exam) |
Joe | Algol | 45 | 42 |
Sue | Algol | 47 | 38 |
Joe | Pascal | 43 | 42 |
Sue | Pascal | 45 | 42 |
- Vertical operations support grouping:
let Total be equiv + of Final
by Cour
TotMark < -[ Cour,Total] in StuCour
TotMark( Cour | Total) |
Algol | 172 |
Pascal | 172 |
Difference II --- grouping
b) Relational Algebra
- Generalizing division, plus recursion, gives a one-line
inference engine. ( Horn is rulebase; Facts are known facts.)
NewFacts is Facts ujoin
[ Concl] in ( NewFacts[ Concl
Ante] Horn)
- Expands to a 50-line inference engine in a 200-line expert system shell,
Relixpert (1987; 1 man-month; subsumes commercial shells AUGMENT, VP-Expert).
Difference III --- ordering
a) Domain Algebra
- Here is a planimeter for geographical areas, using Stokes' Theorem
let be par succ of X order Seq
by Group;
let be par succ of Y order Seq
by Group;
let Area be equiv + of
by Group;
- Similar calculations can find centroid, slope of an area in 3d, etc.
What kinds of programming can be done in a language whose data
structures support possibly very large sets of related items, and
whose operations abstract over loops?
- structured data processing
- geometry and maps --- spatial data
- temporal data
- expert system shells
- text data
- mining
- active databases
- images?
- sound?
- warehousing?
- mediation?
On to Programming Language I
Main idea:
Unify functions and relations
Topics
14--25
Computations
- Unify base concepts of DB and PL
- In mathematics, functions relations
- Let's generalize PL ``function''
comp Sum(A,B,C) is
;
| C < - A + B |
alt | A < - C - B |
alt | B < - C - A |
- And let's invoke it by relational algebra
[C] where A=2 & B=2 in Sum
T(C) | R(A | B) | S(A | B | C) |
4 | 5 | 2 | 5 | 2 | 7 |
| 6 | 3 | 6 | 3 | 9 |
Sum ijoin R
Computations (cont.)
- a limited form of constraint programming
- think of `` comp'' as ``compressed relation'':
the relation whose extent is all values allowed by the constraint
- so we can make sense of invocations by other joins,
e.g., ujoin, djoin, sjoin, , etc.
- we will also need a top-level invocation, a.k.a. procedure
- Sum( in A, in B, out C) or
Sum( out A, in B, in C)
New Horizons from Computations
Computations encourage us to think in terms of sets of related equations.
This introduces some amusements which people who like abstractions may already
have formalized.
- Consider the join of two computations
- (the and of two constraints).
- An implementation might hold this join as a view
- to be applied by invoking one then the other.
Combining Computations
Try uniform acceleration:
d = d0 + v0*t + a*t^2/2 |
v = v0 + a*t |
- These generate 4 *5 pairs
- but the nine pairs on v0, a and t give only three different results
- (which require simultaneous algebraic solutions)
- and only the pairs giving (d,v) and (d0,v) can be solved in any
order.
- There are choose(6,2) solutions to
two equations in six variables
- but (d,d0) cannot be determined.
Combining Computations (cont.)
(Missing table: see PostScript version )
The matrix shows when this can be implemented as two computations joined, in
some order, with actual parameters provided by some input data.
- - means that the equations are overdetermined.
- alg means algebraic rearrangement is needed.
- means solve first for the variable above, then for the variable
to the left.
- means solve first for the variable to the left, then for the
variable above.
- any means equations can be solved in any order.
New Horizons from Computations (cont.)
(Missing equations: see PostScript version )
Here is another pair of equations, rotation in two Euclidean dimensions.
where c is cosine and s is sine. So, of course,
But computations motivate us also to find
and
This is the improper Lorentz transformation that gives time reflection. !
From physics to data mining:
computation to climb a concept hierarchy. (ref. Jiawei Han)
comp generalize( Src, Rslt, Attr, Thld)
{ let distinct be equiv + of 1 by Attr;
while [] where distinct > Thld in Src
do
{ local tmp < - Src[ Attr icomp
Low] Hier;
if [] in tmp then
{ let Attr be High;
Rslt < - [ Attr, ~{ Attr}] in
[ High, ~{ Attr}] in tmp;
} else
Rslt < - Src;
}
}
( Hier( Low, High) holds the concept hierarchy)
Mining (cont.)
For example, with a data relation
Student( Name | Major | Born | GPA) |
Anderson | hist | Vanc. | 3.5 |
Bach | biol | Calg. | 3.7 |
: | : | : | : |
and a concept hierarchy, Hier, which subsumes biol, etc. under
sci;
hist, etc. under hum; Vanc., etc. under BC; BC,
etc. under Can; 3.7, etc. under excel; and so on, we could
deduce
by the invocations
generalize( in Student, out MajorGen, in
Major, 3);
generalize( in MajorGen, out BornGen, in
Born, 3);
generalize( in BornGen, out StudentAbil, in
GPA, 3);
Predictive Modelling for D.S.S.
- Given a relation PastSales( Year, Qty)
- we can find out last year's sales with
[ Qty] where Year = 1996 in PastSales
- Suppose we want to predict sales for next year:
[ Qty] where Year = 1998 in FutSales
- In fact, it would be nice to have
Sales is PastSales ujoin FutSales
- so that we can query Sales for any year, past or future.
Predictive Modelling (cont.)
Here is the code for the last two, using least squares for the model:
let yavg be ( red + of year) /
( red + of 1.0);
let qavg be ( red + of qty) /
( red + of 1.0);
let slope be
( red + of (( year - yavg)*
( qty - qavg)))
/ ( red + of ( (year - yavg)**2));
let base be qavg - slope* yavg;
comp LinInterp( year, qty, slope, base) is
qty slope* year + base;
FutSales is LinInterp icomp PastSales;
Active Databases
- Events are system-generated procedure calls
- and event handlers are procedures (computations).
We have built event handlers as parameterless computations for the usual
database operations,
e.g., comp add:local for additions to relation local.
We could also have programmer-defined events which might automatically
trigger more than one event-handler computation.
On to Programming Language II
Main idea:
Relational operators all apply to attributes
Topics
26--33
Complex Objects
- A basic precept of PL is ``no 2nd-class types''.
- So values in relations should not be restricted.
- So relations can contain relations: ``complex objects''.
- How do we support this in the language?
The domain algebra subsumes
the relational algebra:
any attribute type can be relational,
so relational operations are available for
attributes.
Example (Jaeschke & Schek, 1982):
Books( Authors | Title | Price | Descript) |
A1,A2 | T1 | P1 | D1, D2 |
A2 | T2 | P2 | D1, D2 |
A1 | T3 | P1 | D1,D2,D3 |
Which books by A2 are described by both D1 and D2?
[ Title] where Authors sup { (A2)} and
Descript sup { (D1), (D2)} in Books
- The two sup operators look like set comparisons, but they are
-joins (division) from the relational algebra applied to attributes
in the select clause.
Example (Deshpande & Larson, 1987):
(Missing table: see PostScript version )
Find employees without children:
[ ENo] where not ([] in Children)
in Employee
(Note the projection of the attribute, Children.)
Find employees who took course 314:
[ ENo] where ([] where
CNo= 314 in Training)
in Employee
(Note the selection of the attribute, Training.)
Double nesting (cont.)
Find children of employees who took course 314:
[ Children] where ([] where
CNo= 314 in Training)
in Employee
(Note the result is still a nested relation.)
Find names of those children:
let C be [ Name] in Children;
or
let C be Children.Name
[ C] where ([] where CNo= 314 in
Training)
in Employee
Deeper Nesting
(Missing table: see PostScript version )
Find DEPTs with two A grades:
let A1 be red + of 1;
let S2 be red ujoin of [ Name,
[ A1] in where Grd= "A" in Courses]
in Students;
let A2 be red + of A1;
Ans [ Dept] where Prof.S2.A2= 2 in
EFaculty;
Recursive Nesting
(Missing table: see PostScript version )
Find total costs
(A11 6; A12 5; A1 14; A2 2; A 20):
let SumCost be red + of CalCost;
let CalCost
be if Assembly= then Cost
else Cost + Assembly.SumCost;
Ans [ Id, CalCost] in Assembly;
The NEST and UNNEST operators are too unruly and unimportant to be implemented
except in terms of other primitives which we need anyway.
- syntax in the domain algebra to name a set of attributes;
- a way of suppressing unneeded names of nested relations;
- red ujoin and equiv ujoin in the domain algebra,
which we would have anyway as part of the subsumption of the relational
algebra.
Operators on ABSTRACT DATA TYPES are supported by computations whose arguments
are at once attributes and relations.
On to Programming Language III
Main idea:
Instantiate with domain and relational algebras
Topics
34--40
Instantiation
- When a computation has a state, as for a counter or a flipflop,
different uses of the computation may require different instantiations
of the state.
- The ability to make many instances, as for all molecules of a gas, was
the original motivation of the object-oriented approach.
- Relations are geared to support large numbers of instances, but deal with
sets of instances, not individual ``objects''.
- So we need an instantiation mechanism which does not need to say new
for each individual instance.
- We also need to be able to hide the state of each instance.
Instantiate by domain algebra actualization.
e.g., bank accounts
comp ba( deposit, balance) is
{ state bal init 0;
deposit < - comp( dep) is
{ bal < - bal + dep
alt
dep < - bal - old bal;
} /* deposit */
balance < - comp(b) is
{ b < - bal;
} /* balance */
} /* ba */
Given
accts( ac# | client) |
1 | joe |
2 | sue |
instantiate ba as an attribute
let ba1 be [ deposit, balance] in ba;
accounts [ ac#, client, ba1] in accts;
- [ bal oldbal] are hidden. This state is instantiated
for each account in accts.
- deposit, balance, the computations, are constant attributes
Binary Operators
- What if we need interactions between two (or more) objects of the same
type?
- This is the n-body problem in object-orientation.
- For example, adding two numbers should not be asymmetrical, as in
sending one in a message to the other.
Gas Molecules: hitWall is unary; bang is binary.
comp molecules( initial, hitWall, getvel, putvel, ..)
{ ...
hitWall < - comp( which) is
{ case which:
{ "x": xvel < - - xvel;
"y": yvel < - - yvel;
"z": zvel < - - zvel;
} } /* hitWall */
} /* molecules */
The binary operations are defined outside the class, with parameters.
For instance, bang collides two molecules (of equal mass), swapping
their velocities.
bang comp( mol1, mol2, mola, molb) is
{ local xv1, yv1, zv1, xv2, yv2, zv2;
mol1.getvel( xv1, yv1, zv1);
mol2.getvel( xv2, yv2, zv2);
mol2.putvel( xv1, yv1, zv1);
mol1.putvel( xv2, yv2, zv2);
} /* bang */
This is all invoked on a relation of molecules.
gas for mol# from 1 to 10000
par mol#;
Binary Operators on Objects (cont.)
let mol1 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
let mol2 be [ initial, hitWall, getvel, putvel, ..]
in molecules;
binaryMols ([ mol#1, mol1] in gas) ijoin
[ mol#2, mol2] in gas
update binaryMols change
bang( in mol1, in mol2,
out mol1, out mol2)
using where close in ( binaryMols ijoin near);
( near is another binary computation which inputs mol1 and mol2
and outputs the boolean, close.)
Conclusions
-
- All the above was done with
- Relational algebra
- joins (intersection, union, etc.)
- joins (subset, superset, etc.)
- T-selectors (select, project)
- Domain algebra
- horizontal operations
- vertical operations ( red, equiv, fun, par)
- Views & Recursion
- Computations & Events
-
- Abstract over looping
- Better programmer productivity: 2 o.m.?
- Exhaust the relational algebra
- Separate the domain algebra
- Driven by
- Applicability (usefulness)
- Elegance (generality)
Next: About this document
Prof. T.H. MERRETT
Thu Oct 16 13:20:17 EDT 1997