308-420A, Assignment 1

Copyright ©2008  T. H. Merrett

COMP 617	Information Systems		Winter 2008		Week 10

	     Polymorphic Relations, Display2D and Transactions

1. The notes on  display2D  from last term say that the current implementation
cannot do certain updates, because they require new attributes to be added
to the relation or because they require the relation to be restructured, say
from flat to nested.

For example, in  Triangle(x1,y1,x2,y2,x3,y3,lc,fc,fp,label)  we had two
triangles, one with a blue boundary and yellow fill in a vertical pattern,
the other with a yellow boundary and blue fill in a vertical pattern.

		Triangle
		( x1   y1   x2   y2   x3   y3  lc fc fp label)
		 5000 4000 2000 4000 5000 3000  1 6  50 "Tri1"
		 3000 1000 5000 1000 5000 2500  6 1  50 "Tri2"

One update  display2D  disallows is changing the thickness of one of the
boundaries, say for Tri2. Since the default line thickness, 1, is used
for both boundaries, the relation to be updated has no attribute (from the
.vocabulary) meaning line thickness.

To allow this update requires _polymorphic_ relations: relations that can
have a varying set of attributes. Then in particular a new attribute can
be added for some tuples only. We can suppose that  display2D  adds the
attribute given in the default .vocabulary with meaning line_thickness: lt.
After the change by  display2D , the value of the  display2D  expression
becomes

		
		( x1   y1   x2   y2   x3   y3  lc fc fp lt label)
		 5000 4000 2000 4000 5000 3000  1 6  50    "Tri1"
		 3000 1000 5000 1000 5000 2500  6 1  50  2 "Tri2"

2. Let's explore polymorphism in this sense. The blank value for  lt  in Tri1
just means that attribute  lt  is not defined for that tuple. This is the same
as saying it is the DC null value. And that is the same as saying that the new
relation is the  ujoin  of two components

	 = T1 ujoin T2

where
		T1( x1   y1   x2   y2   x3   y3  lc fc fp label)
		   5000 4000 2000 4000 5000 3000  1 6  50 "Tri1"
and
		T2( x1   y1   x2   y2   x3   y3  lc fc fp lt label)
		   3000 1000 5000 1000 5000 2500  6 1  50  2 "Tri2"

Note that the DC null value is the same as the attribute being not there at all.
The heart of polymorphism for relations is the converse: every relation has all
possible attributes, but the attributes we don't normally write is are taken to
have DC null values for all tuples.

But we did not create  by a ujoin but rather by an update.
So  update  needs to behave consistently.

We can define	update R add S	to mean exactly	R <- R ujoin S  (and R <+ S
likewise).

This was in fact the original definition, but we find that the present
implementation of jRelix is more limited. Here's what we should have.

	R(X Y)	S(X Z)  T(U)	R ujoin S	R <+ S		update R add S
	  a 1     a q     k	(X Y Z)		(X Y Z)		(X Y Z)
	  b 1     c q		 a 1 q		 a 1 q		 a 1 q
				 b 1		 b 1		 b 1
				 c   q		 c   q		 c    q

Furthermore, if the attributes are completely disjoint, we'll have

				R ujoin T	R <+ T		update R add T
				(X Y U)		(X Y U)		(X Y U)
				 a 1		 a 1		 a 1
				 b 1		 b 1		 b 1
				     k		     k		     k

This means re-implementing  ujoin , which currently gives the Cartesian product
in an example such as  R ujoin T.

It also means cleaning up the implementation of  update .. add  and <+.

3. Since polymorphic relations can add or lose attributes at any time, we should
have updates to do this.

	update  insert 
	update  remove 
	update  insert  remove 
	update  remove  insert 

Inserted attributes must be virtual, already defined by domain algebra.
In the combined alternatives, we can say that the execution order reads from
left to right.

So we can mimic the effect of our last update by  display2D .
	T1 <- where label="Tri1" in Triangle;
	T2 <- where label="Tri2" in Triangle;
	let lt be 2;
	update T2 insert lt;
	 <- T1 ujoin T2;

(This does not add anything new for top-level relations. We could equally
write
	T1 <- where label="Tri1" in Triangle;
	let lt be 2;
	T2 <- [x1,y1,x2,y2,x3,y3,lc,fc,fp.lt,label]
		where label="Tri2" in Triangle;
	 <- T1 ujoin T2;
But we'll see that it is convenient for updating nested relations.)
	
4. Now let's suppose  display2D  tries to add an extra vertex to Tri1. The new
data cannot be accommodated in Triangle(x1,y1,x2,y2,x3,y3,lc,fc,fp,lt,label)
at all, because one figure now has an extra pair of coordinates. We cannot
just add, say x4 and y4, because we made a convention that this would connect
every pair of coordinates with every other, making a flattened tetrahedron,
not the quadrilateral we built wth the xfig interface to  display2D . [See
Note 6 for a retraction of this. The update discussed in Note 4 might now be
done by just adding attributes (x4, y4) as in Note 1.]

So we must convert the affected triangle to a sequenced polygon. This means
converting the flat relation, Triangle, to a nested relation, say

		Triangle
		(Triangle1                       )
		 (label Triangle11   lc fc fp lt)
		        (sq x1   y1 )
		  "Tri1" 1 5000 4000  1  6 50
		         2 2000 4000
			 3 3102 3493
			 4 5000 3000
		  "Tri2" 1 3000 1000  6  1 50  2
			 2 5000 1000
			 3 5000 2500

If  display2D  is intended to be able to do this, we should be able to show the
way in Aldat.

	let sq be 1;
	let Triangle1a be relation(sq,x1,y1);
	let x1 be x2;
	let y1 be y2;
	let sq be 2;
	let Triangle1b be Triangle1a ujoin [sq,x1,y1] in relation(sq,x2,y2);
	let x1 be x3;
	let y1 be y3;
	let sq be 3;
	let Triangle11 be Triangle1b ujoin [sq,x1,y1] in relation(sq,x3,y3);
	let Triangle1 be red ujoin of relation(label,Triangle11,lc,fc,fp,lt);
	Triangle <- [Triangle1] in Triangle;

We didn't need  update..insert  or  update..remove  for this. (We could have
used them to change Triangle, but the top-level assignment was easier.)

We can suppose  display2D  does this for the whole relation as soon as it
detects the need to reconfigure. The update is now easy.

	relation newVert(sq,x1,y1) <- {(2.5,3102,3493)};
	update Triangle/Triangle1/Triangle11 add newVert
		using where Triangle/Triangle1/label = "Tri1";

(and we must follow up by redefining sq as fun + of 1 order sq).

5. Now suppose we have used  display2D  to change the colour of _one_ of the
edges in Tri1 from blue to yellow. The relation must be reconfigured again,
this time as a set of edges for each triangle, so that the colour applies
unambiguously to each edge.

		Triangle
		(Triangle1                                 )
		 (label Triangle12               fc fp lt)
		        ( x1   y1   x2   y2  lc)
		  "Tri1" 5000 4000 2000 4000  6   6 50
		         2000 4000 3102 3493  1
			 3102 3493 5000 3000  1
			 5000 3000 5000 4000  1
		  "Tri2" 3000 1000 5000 1000  6   1 50  2
			 5000 1000 5000 2500  6
			 5000 2500 3000 1000  6

	let x2 be fun succ of x1 order sq;
	let y2 be fun succ of y1 order sq;
	let Triangle12 be [x1,y1,x2,y2,lc] in Triangle11;
	update Triangle/Triangle1 insert Triangle12 remove Triangle11, lc;

Note that insert must be done first, or Triangle11 won't be there to construct
Triangle12 from.

Now we mimic the update by  display2D.

	relation Tri1x1y1(label,x1,y1) <- {("Tri1",5000,4000)};
	update Triangle/Triangle1/Triangle12 change lc <- 6 using Tri1x1y1;

6. I have changed my mind about the convention stated at the beginning of
Note 4. Display2D and display3D should both allow any number of coordinate-set
attributes, with the convention that edges connect the vertices in the order
given. Thus a quadrilateral could be represented as
		Quad(x1 y1 x2 y2 x3 y3 x4 y4)
		      0  0  2  0  2  2  1  2
or as
		Quad(sq x y)
		      0 0 0
		      1 2 0
		      2 2 2
		      3 1 2
the latter being more general, of course.

If we want to connect every vertex to every other, we must supply a set (not a
sequence) of coordinates in a nested relation.
		Simplex(x y z)
			0 0 0
			0 1 1
			1 0 1
			1 1 0
will display as the four vertices of a tetrahedron, but
		Simplex			or	Simplex
		(label simp   )			(lc simp   )
		 "s1"  (x y z)			  1 (x y z)
			0 0 0			     0 0 0
			0 1 1			     0 1 1
			1 0 1			     1 0 1
			1 1 0			     1 1 0
will display the six edges, too: in the first case the label will be associated
with the tetrahedron; in the second the edges will be blue.

Any polyline (sequence of vertices connected by edges), whether specified by
sets of coordinate attributes or by a sequence of tuples, is displayed open by
default. If data pertaining to faces (such as fc fill_colour or fp fill_pattern,
or, suitably placed, a strg label) is present, the polyline is closed by the
display. (Of course, if the polyline specification ends with the starting
vertex, then it will appear closed even in default.) Thus
		Triangle(x1,y1,x2,y2,x3,y3,lc)
			  0  0  1  0  1  1  1
will show two blue edges connecting the three vertices, but
		Triangle(x1,y1,x2,y2,x3,y3,lc,fc,fp,label)
			  0  0  1  0  1  1  1  6 50 "Tri"
will show the triangle filled with yellow vertical stripes and all three (blue)
edges of its boundary. (We need only one of  fc,  fp  or  label  to display all
three edges.) (This last is exactly the present behaviour of display2D.)

Similarly,
		Triangle(sq x y)
			  0 0 0
			  1 1 0
			  2 1 1
will show the two edges of an open triangle, while
		Triangle
		(fc Tri     )
		    (sq x y)
		  1   0 0 0
		      1 1 0
		      2 1 1
will show a blue closed triangle with all three edges (in the default colour of
black).

Of course,
		Triangle(x1,y1,x2,y2,x3,y3,x4,y4)
			  0  0  1  0  1  1  0  0
or
		Triangle(sq x y)
			  0 0 0
			  1 1 0
			  2 1 1
			  4 0 0
will both show closed triangles, because the final vertex is the same as the
first.

7. In summary, this requires the following changes to the relix implementation.
   1) Change  ujoin  so, given operands with disjoint attributes, it gives a
      union instead of a Cartesian product.
   2) Change <+ and  update..add  to be exactly the same as  L <- L ujoin R
      in L <+ R, etc.
      This requires a polymorphic representation of relations, best as a virtual
      union of the separate pieces.
   3) Implement  update..insert  and  update..remove  This must use the above
      polymorphic representation of relations.
   4) Augment  display2D  to support all  xfig  updates in this framework.  

   Related to this are the changes to update required by the  undo  and  redo
   operators (week6p1).
   5) Add the builtin  symdiff()  computation for event handlers. (But leave
      .trigger  alone.)
   6) Implement  undo  and  redo  as deferred event handlers: they are defined
      automatically by the update, and invoked explicitly by the programmer.
      (This is the opposite of regular event handlers, which are defined by
      the programmer and invoked automatically by the update.) Their names are
      constructed by the rules
		undo|redo : add|delete : ()
		undo|redo : change : []()
      just as for regular event handlers.
   7) Implement the boolean computations  undone  and  redone  similarly (Note 4
      of week6p1.
   8) Note that undo, redo, undone and redone need state variables such as tape,
      undid, redid and parent. These were created by the containing computation,
      woops, in Note 2 of week6p1, but the system can automatically produce the
      result of calling woops().

   Undo, redo, etc. are useful in their own right, but they arose while
   investigating transactions (week3p1, week6p1). We cannot implement
   transactions in jRelix because it is a main-memory implementation of
   Aldat, but we can add the syntax for transactions. Here it is.
   1) Anywhere a statement is allowed, we should now accept that it can begin
      by the keyword  transaction  or synonyms such as  transactiad  or  iad
      (standing for isolated-atomic-durable).
      Most frequently this will be used with bracketed statements, and the
      syntax will look like
		iad{ ... };	or transactiad{ ... }
   2) At the end of any transaction we can accept a compensating transaction,
      specified as  compensated by  or synonyms such as  sateby .
		iad{ ... }sateby{ ... };
      Of course, it is up to the programmer to ensure that the compensating
      transaction completely undoes the transaction. Hey can often call undo.
      This supports sagas (pp.120--1 of [BernNewc:TP], Note 6 of week3p1): the
      implementation would invoke the compensating transactions when it finds
      that the saga must be aborted.
   3) The system must know if a transaction is part of a saga, in order to be
      able to run the compensating transactions in case of abort, and it must
      know which saga it is in. So we allow a parameterized variant of  iad
      and its synonyms, passing in the name of the saga as a string.
		transaction("sag1"){ ... }sateby{ ... };
      Note that compensating transactions are obligatory for any transaction
      which is part of a saga.
   4) It seems to me that syntax such as  commit  and   abort  is completely
      replaced by the  undo  and  redo  operations and their tests (undone, ..),
      but many more examples must be found and worked.
   5) I have not decided if the syntax should explicitly describe mechanisms
      such as enqueuing and dequeuing multi-host transactions against
      communications crashes (pp.118--9 of [BernNewc:TP]), "conversations"
      or logging (both on p.122 of [BernNewc:TP]). I suspect these are best
      left to the implementation as part of the mechanism to guarantee i.a.d.
      (or a.d. if i. is imposible, as in sagas).

   [BernNewc:TP] P. A. Bernstein and E. Newcomer, "Principles of Transaction
   Processing", San Francisco, Morgan Kaufmann Publishers, Inc., 1997