Copyright ©2008 T. H. Merrett COMP 617 Information Systems Winter 2008 Week 7 Series Queries 1. Two motivating queries from [DiaoImmGyll:SASE+] Query 2 captures a one-hour period in which the price of a particular stock increased from 10 to 20 and the trading volume of the stock stayed relatively stable. Query 3 captures a more complex trend for each stock: in the last hour, the volume started high, but after a period when the price increased or remained relatively stable, the volume plummeted. To approach these queries, we need some preliminaries. 2. Running automata. Here's the autmaton from [week1p1]. aababb Str(sq ch) DFA( sS ch eS ) term( eS) 1 a 1 a 125 136 (a|b)*(abb|aa*b) 2 a 1 b 1 14 3 b 125 a 125 4 a 125 b 136 5 b 136 a 125 6 b 136 b 14 14 a 125 14 b 1 Here's a programmer-written funop [week2p1]. comp makeAutom(DFA, term, start, runauto) is { state automSt intg; comp runauto(value,accum) is funop { automSt <- DFA[automSt,value]; accum <- [] in term[automSt] } automSt <- start } (Note automSt <- DFA[automSt,value]; is syntactic shorthand for automSt <- [pick eS] in DFA[automSt,value]; The type system can probably convert from singleton unary to integer. NB [pick eS] is also shorthand for the less direct [red max of eS].) Let's run this on Str. makeAutom(in DFA, in term, in 1, out runauto); let res be fun runauto of ch order sq; let match be red max of res; << NB false < true OR red or >> let matchlast be red max of (sq = red max of sq) and res; Str(sq ch) res match matchlast 1 a f t t 2 a f t t 3 b t t t 4 a f t t 5 b t t t 6 b t t t 3. Automata on predicates. Let's look at a STOCK relation and derive some predicates related to queries 2 and 3 above. let firstime be red min of time; let firstvol be red max of if time = firstime then volume else -1; let volstable be 0.9*firstvol <= volume <= 1.1*firstvol; let prevprice be if time = firstime then price - 1 else fun pred of price order time; let priceincr be price >= prevprice; STOCK (symbol time price volume) volstable priceincr allVSorPI GOOG 1000 10 1100 t t t GOOG 1006 12 1400 f t t GOOG 1012 10 1200 t f t If we want to know whether the volume was stable or the price increased in the period covered by this relation, we can find let allVSorPI be red and of (volstable or priceincr) which is true. But if we want to detect a pattern such as tft or ttf, we need automata. autTFT termTFT autTTF termTTF (sS pr eS) (S) (sS pr eS) (S) 0 t 1 3 0 t 1 3 0 f 0 0 f 0 1 t 1 1 t 2 1 f 2 1 f 0 2 t 3 2 t 2 2 f 2 2 f 3 3 t 3 3 t 3 3 f 3 3 f 3 makeAutom(in autTFT, in termTFT, in 0, out runauto); let vsTFT be fun runauto of volstable order time; let allvsTFT be red max of vsTFT; makeAutom(in autTTF, in termTTF, in 0, out runauto); let piTTF be fun runauto of priceincr order time; let allpiTTF be red max of piTTF; And if we want to detect the combined pattern, we could combine the automata: the two predicates combine to form the integers 0,..,3. let automIN be 2*(if volstable then 1 else 0) + (if priceincr then 1 else 0); autCombined termCombined (sS pr eS) (sS pr eS) (S) 0 0 0 2 0 0 3 0 1 0 2 1 0 0 2 0 2 2 3 0 3 1 2 3 1 1 0 0 3 0 3 1 1 2 3 1 3 1 2 0 3 2 3 1 3 1 3 3 3 makeAutom(in autCombined, in termCombined, in 0, out runauto); let combined be fun runauto of automIN order time; let allCombined be red max of (time = red max of time) and combined; Of course, this is redundant for this example, because allCombined should be identical to allvsTFT and allpiTTF . But if we had more than three tuples in STOCK, then allvsTFT would be true if tft occurred _anywhere_ under volstable, and allpiTTF would be true if ttf occurred _anywhere_ under priceincr , and we would need to write a combined automaton for the synchronized occurrence of the patterns. We'll need an automaton for Query 3. 4. Windowing. Let's do some preparation for queries 2 and 3, first considering the whole of relation STOCKS. (The stocks are GOOGle and, later, BALLard.) let firstime be red min of time; let firstvol be red max of if time = firstime then volume else -1; let volstable be 0.9*firstvol <= volume <= 1.1*firstvol; let firstprice be red max of if time = firstime then price else -1; let pristable be 0.9*firstprice <= price <= 1.1*firstprice; let prevprice be if time = firstime then price - 1 else fun pred of price order time; let priceincr be price >= prevprice; let volplunge be volume <= 0.8*(fun pred of volume order time); Here's the effect on STOCK. (For the moment, look only at the first column under each of the four virtual attributes shown.) STOCK (symbol time price volume) volstable pristable priceincr volplunge vpp GOOG 1000 10 1100 t t t f f GOOG 1006 12 1400 f t f t t t f f f GOOG 1012 10 1200 t f t t f t f f t f f f t GOOG 1018 13 1182 t f t f t f t t t f f f f GOOG 1024 15 1200 t f t f f f t t t f f f f GOOG 1030 16 1300 f t t f f f t t t f f f f GOOG 1036 11 1100 t f t t f t f f f f f f f GOOG 1042 18 1090 t f t f f f t t t f f f f GOOG 1048 19 1080 t f t f f f t t t f f f f GOOG 1054 20 1070 t f t f f f t t t f f f f GOOG 1100 21 1000 t f f f f f t t t f f f f GOOG 1106 21 800 f f f f t t t t f GOOG 1112 20 870 f f f f f GOOG 1118 21 1000 t f t f f GOOG 1124 23 1070 t f t f f GOOG 1130 24 1070 t f t f f GOOG 1036 20 1070 t f t f f GOOG 1142 24 870 f f f f f GOOG 1148 26 800 f f f f f We can find out if the volume is stable and the price is stable or increasing over the whole relation: red and of (volstable and (pristable or priceincr)) This is false (as is red and of volstable by itself, and red and of (pristable or priceincr) by itself). But we don't want to know if these hold over the whole relation, but only over hour-long windows. We need a windowing reduction. We replace red above by a new operator, redwin(), which can have a parameter saying how big the window is. For the moment, let's use an integer count, redwin(10). let vpp be redwin(10) and of (volstable and (pristable or priceincr)) order time; Note that redwin() must be controlled by an order clause (and it may also have a group by clause). Two windows are shown above in the second and third columns under volstable, pristable, priceincr and volplunge. The resulting attribute, vpp, is shown on the right, with the value for each window shown at the starting entry for the window. Note that redwin() wraps around cyclically, unless we write code to stop it. The idea behind redwin() is that each window is considered to be a separate relation, and any virtual attribute, especially aggregates from red, equiv, fun and par, mentioned in a redwin() statement is to be actualized in that separate relation. Redwin() returns a single value for every window, which is why it must be a red . 5. Here's redwin() in a group-by context. let allPI be redwin(3) and of priceincr order time by symbol; STOCK (symbol time price volume) priceincr allPI GOOG 1000 10 1100 t f GOOG 1006 12 1400 t f GOOG 1012 10 1100 f t GOOG 1018 13 1000 t t GOOG 1024 15 1200 t f << remember: cyclic >> GOOG 1030 16 1300 t f BALL 1000 2 700 t f BALL 1006 3 800 t f BALL 1012 1 600 f t BALL 1018 1 600 t f BALL 1024 2 800 t f << remember: cyclic >> BALL 1030 1 600 f t 6. We could have a variable window size, determined by an attribute of the relation. let ct be redwin(wsize) + of 1 order seq; let tot be redwin(wsize) + of seq order seq; A(seq wsize) ct tot 1 3 3 6 2 3 3 9 3 4 4 13 4 2 2 9 5 2 2 6 7. In addition to the counting parameter for redwin() we can have a logical parameter which tells us when to stop. let firstime be red min of time; let stop be time >= firstime + 18; let allPI be redwin(stop) and of priceincr order time by symbol; Both count and stopping predicate may be passed to redmin() (distinguished by type), and the logical or is taken to determine the window size. The stopping predicate can be written so as to stop the calculations before the specified window size is reached. let vpp be redwin(10,not(volstable and (pristable or priceincr))) and of (volstable and (pristable or priceincr)) order time; stops the window, and the red and of calculation, as soon as a false value of the predicate is reached. An implementation might even be able to use the stopping predicate to skip the window past the point of failure, since intermediate windows will give false results. Post-lecture note (080412). Let's generalize this to _any_ predicate limiting the window range given by the size parameter. Thus in a window of size 5 with sequence number s running within the window from 1 to 5, the predicate s>1 & s<5 or 2<=s & s<=4 would limit the window to tuples with those sequence numbers. See queryHum.pdf, Note 9, for an example. For reasons motivated by that example, we can allow the window size parameter to be negative as well as positive, and just use the absolute value. As a pragma to help the implementation know when to start or stop, in the case of a window predicate-limited to single contiguous subset of the range given by the size parameter, we could provide side-effect predicates which toggle when their argument becomes true: start and stop . Thus in the above example in the predicate start(2=s) & stop(s=5), start() would initially be false and stop() true, and as s progresses from 1 to 5 the predicate would be F,T,T,T,F. 8. Query 2 captures a one-hour period in which the price of a particular stock increased from 10 to 20 and the trading volume of the stock stayed relatively stable. let firstime be red min of time; let firstvol be red max of if time = firstime then volume else -1; let volstable be 0.9*firstvol <= volume <= 1.1*firstvol; let prevprice be if time = firstime then price - 1 else fun pred of price order time; let priceincr be price >= prevprice; let price10 be red max of price = 10; let price20 be red max of price = 20; let query2 be redwin(time >= firstime + 60 or time < firstime) and of (time10 and time20 and volstable and priceincr) order time by symbol; STOCK (symbol time price volume) volstable priceincr query2 GOOG 1000 10 1100 t t f GOOG 1006 12 1400 f t GOOG 1012 10 1200 t f GOOG 1018 13 1182 t t GOOG 1024 15 1200 t t GOOG 1030 16 1300 f t GOOG 1036 11 1100 t t f t f GOOG 1042 18 1090 t t t t GOOG 1048 19 1080 t t t t GOOG 1054 20 1070 t t t t GOOG 1100 21 1000 t t GOOG 1106 21 800 f t GOOG 1112 20 870 f t f t f GOOG 1118 21 1000 t f t t GOOG 1124 23 1070 t f t t GOOG 1130 24 1070 t f t t GOOG 1036 20 1070 f t GOOG 1142 24 870 t f GOOG 1148 26 800 t t f t f Note that redwin( .. or time < firstime prevents cycling back. 9. Query 3 captures a more complex trend for each stock: in the last hour, the volume started high, but after a period when the price increased or remained relatively stable, the volume plummeted. We need an automaton to ensure that the volume plummets _after_ the stable increase. let firstime be red min of time; let firstvol be red max of if time = firstime then volume else -1; let hivol be red max of firstvol > 1000; let firstprice be red max of if time = firstime then price else -1; let pristable be 0.9*firstprice <= price <= 1.1*firstprice; let prevprice be if time = firstime then price - 1 else fun pred of price order time; let priceincr be price >= prevprice; let priceOK be pristable or priceincr; let volplunge be volume <= 0.8*(fun pred of volume order time); let automIn be 2*(if priceOK then 1 else 0) +if volplunge then 1 else 0; relation automQ3(sS,in,eS) <- {(0,0,0),(0,1,0),(0,2,1),(0,3,0), (1,0,0),(1,1,0),(1,2,1),(1,3,2),(2,0,2),(2,1,2),(2,2,2),(2,3,2)}; relation termQ3(eS) <- {(2)}; makeAutom(in automQ3, in termQ3, in 0, out runauto); let automCheck be fun runauto of automIn order time; let query3 be redwin(time >= firstime + 60 or time < firstime) and of (hivol and red max of automCheck) order time by symbol; STOCK (symbol time price volume) priceOK volplunge query3 GOOG 1000 10 1100 t f f GOOG 1006 12 1400 t t f f f GOOG 1012 10 1200 t f t f f f t GOOG 1018 13 1182 t t t f f f t GOOG 1024 15 1200 t t t f f f t GOOG 1030 16 1300 t t t f f f t GOOG 1036 11 1100 t f t f f f t GOOG 1042 18 1090 t t t f f f t GOOG 1048 19 1080 t t t f f f t GOOG 1054 20 1070 t t t f f f t GOOG 1100 21 1000 t t f f t GOOG 1106 21 800 t t f GOOG 1112 20 870 f GOOG 1118 21 1000 f GOOG 1124 23 1070 f GOOG 1130 24 1070 f GOOG 1036 20 1070 f GOOG 1142 24 870 f GOOG 1148 26 800 f 10. I have added only one new construct, redwin(), in all the above. (Running automata follows directly from the programmer-definable funop of [week2p1], and this is an expected capability, given that Aldat supports funops at all.) Note that redwin() is possible only because of virtual attributes: it is conceptually as easy to think of an attribute being actualized on a window as on a whole relation. Without virtual attributes, all sorts of constructs would be needed as we see in [DiaoImmGyll:SASE+]. But redwin() was not the first construct I came up with. On my way there I first produced a much more elaborate and less powerful set of operators. I'd like to discuss these even though they are replaced by redwin(), in order to show some of the discovery process. Also, the obsolete facilities might offer a way to implement redwin(). let w be winsize(win) 3 order seq; << shorthand below: 3,4,0 is {(3),(4),(0)} >> let base be fun + of 1 order seq; let linear_w be where win