Copyright ©2008 T. H. Merrett COMP 617 Information Systems Winter 2008 Week 2 Time Series, etc. 1. An example from the AQuery language [LernShash:AQuery]. For a given stock and a given date, find the best profit one could obtain by buying the stock and then selling it later that day. let runningMin be par min of price order ts by ID, tradeDate; let runningProfit be price - runningMin; let maxProfit be equiv max of runningProfit by ID, tradeDate; let buyTS be equiv min of ts by runningMin, ID, tradeDate; Trades(ID tradeDate price ts ) runningMin runningProfit maxProfit buyTS ACME 030511 1.25 030511.0 1.25 0.00 0.09 030511.0 ACME 030511 1.33 030511.1 1.25 0.08 0.09 030511.0 ACME 030511 1.22 030511.2 1.22 0.00 0.09 030511.2 ACME 030511 1.15 030511.3 1.15 0.00 0.09 030511.3 ACME 030511 1.17 030511.4 1.15 0.02 0.09 030511.3 ACME 030511 1.17 030511.5 1.15 0.02 0.09 030511.3 ACME 030511 1.18 030511.6 1.15 0.03 0.09 030511.3 ACME 030511 1.21 030511.7 1.15 0.06 0.09 030511.3 ACME 030511 1.24 030511.8 1.15 0.09 0.09 030511.3 ACME 030511 1.23 030511.9 1.15 0.08 0.09 030511.3 let priceSell be price; let tsSell be ts; Sell <- [ID,tradeDate,priceSell,tsSell,runningMin] where runningProfit=maxProfit in Trade; Sell(ID tradeDate priceSell tsSell runningMin) ACME 030511 1.24 030511.8 1.15 let priceBuy be price; let tsBuy be ts; Buy <- [ID,tradeDate,priceBuy,tsBuy] in (Sell ijoin [ID,tradeDate,priceBuy,tsBuy,runningMin] where ts=buyTS in Trades); Buy(ID tradeDate priceBuy tsBuy ) ACME 030511 1.15 030511.3 BuySell <- Buy ujoin [ID,tradeDate,priceSell,tsSell] in Sell; BuySell(ID tradeDate priceBuy tsBuy priceSell tsSell ) ACME 030511 1.15 030511.3 1.24 030511.8 2. A second example from the AQuery language [LernShash:AQuery]. Find the count of packets and their average length within each flow. (A "flow" from a source s to a destination d ends if there is a 2-minute gap between consecutive packets from s to d.) let tsPred be par pred of ts order ts by src, dest; let flowStart be if ts-tsPred>=0.0002 then 1 else if ts> Only this must be defined in a context which allows accum to be a state variable and intializes it. The context must also supply the parameter n. comp windows(gp,qty,ts,n,data,runsum) is { comp parop runsum(value,accum) is { accum <- [red + of if seq<=n then qty else 0] where Gp=gp in [Gp,qty,ts] in data << initialize accum for runsum >> } alt { accum <- accum + par succ(n-1) of value order ts by gp - par pred of value order ts by gp }; let seq be par + of 1 order ts by gp; let Gp be gp; } We will assume that the partial functional mapping invocation of runsum matches the first alt-block (accum is output) at the beginning of each grouping, and matches the second (accum is input) everywhere else. (Funops are similar but easier: there is no gp attribute, and par..by gp becomes fun.. and where Gp=gp (and Gp) is not needed.) NB maybe we should just call them all funops. Here is the invocation of all this: windows(in ID, in price, in ts, in 5, in Trades, out runsum); let 5sums be par runsum of price order ts by ID; let 5avgs be 5sums/5; 4. The second example from [WhitShash:stockSeries] gets us into serious time series concepts, so try Google or the QA280.-- section of the library stacks. Here's one to try: the convolution [en.wikipedia.org/wiki/Convolution, mathworld.wolfram.com/Convolution.html, cnx.org/content/m11541/latest/,..] Convolution of two time series x1,..,xn CONV y1,..,yn = 0 x0y0 1 y0y1 + x1y0 2 x0y2 + x1y1 + x2y0 : j Sum_{k=0}^{n-1} x(k)*y(j-k) : Let's try it for two rectangular series x = 11111, y = 11111 (where the first terms are x0, y0, resp. and we assume 0s before, after) 0 1 = x0y0 1 2 = y0y1 + x1y0 2 3 = x0y2 + x1y1 + x2y0 3 4 = x0y3 + x1y2 + x2y1 + x3y0 4 5 = x0y4 + x1y3 + x2y2 + x3y1 + x4y0 5 4 = x1y4 + x2y3 + x3y2 + x4y1 6 3 = x2y4 + x3y3 + x4y2 7 2 = x3y4 + x4y3 8 1 = x4y4 In an alternative definition, one of the series is periodic (not 0s before and after). Let's try rectangular x = 11 and periodic y = ..11001100.., i.e., x0 = 1, x1 = 1; .., y0 = 1, y1 = 1, y2 = 0, y3 = 0, .. Showing only nonzero entries: : -2 1 = x1y_{-3} -1 0 = 0 1 = x0y0 1 2 = y0y1 + x1y0 2 1 = x1y1 3 0 = 4 1 = x0y4 : To motivate finding relational implementations of convolution, here are some uses (from wikipedia): # In statistics, a weighted moving average is a convolution. # In probability theory, the probability distribution of the sum of two independent random variables is the convolution of their individual distributions. # In optics, many kinds of "blur" are described by convolutions. A shadow (e.g. the shadow on the table when you hold your hand between the table and a light source) is the convolution of the shape of the light source that is casting the shadow and the object whose shadow is being cast. An out-of-focus photograph is the convolution of the sharp image with the shape of the iris diaphragm. The photographic term for this is bokeh. # Similarly, in digital image processing, convolutional filtering plays an important role in many important algorithms in edge detection and related processes. # In linear acoustics, an echo is the convolution of the original sound with a function representing the various objects that are reflecting it. The convolution theorem says a convolution of two functions is the inverse Fourier transform of the product of the Fourier transforms of the two functions, so having FFT will also give us convolutions. 5. On the subject of Fourier transforms, [ColeShashZhao:uncoopTS] point out that Fourier transforms, wavelets and the singular value decomposition do _not_ do very well at reducing the dimensionality of certain "uncooperative" time series (time series in which the power of the time series is spread over all Fourier coefficients). They improve on "sketches" to reduce the dimensionality of time series while approximately preserving the distance between any two (for similarity queries). Sketches are random projections: "taking [the] inner product of each time series window, considered as a vector, with a set of random vectors". If there are d of these random vectors, then the d results can be thought of as a vector in a d-dimensional space. Repeating this 2b more times and using medians is the approximation. [ColeShashZhao:uncoopTS] use a variety of other techniques, including convolution, to speed up the process of making sketches. [LernShash:AQuery] Alberto Lerner, Dennis Shasha "AQuery, Query Language for Ordered Data Optimization Techniques, and Experiments" VLDB 29 2003 [WhitShash:stockSeries] Arthur Whitney, Dennis Shasha "Lots o' Ticks: real-time high performance time series queries on billions of trades and quotes" SIGMOD 2001 [ColeShashZhao:uncoopTS] Richard Cole, Dennis Shasha, Xiaojian Zhao "Fast Window Correlations Over Uncooperative Time Series" KDD 2005 [See cs.nyu.edu/shasha/papers/papers.html for these papers in pdf.]