Copyright ©2008  T. H. Merrett

COMP 617	Information Systems		Winter 2008		Week 2

				Time Series, etc.

1. An example from the AQuery language [LernShash:AQuery].
   For a given stock and a given date, find the best profit one could
   obtain by buying the stock and then selling it later that day.

	let runningMin be par min of price order ts by ID, tradeDate;
	let runningProfit be price - runningMin;
	let maxProfit be equiv max of runningProfit by ID, tradeDate;
	let buyTS be equiv min of ts by runningMin, ID, tradeDate;

Trades(ID tradeDate price ts      ) runningMin runningProfit maxProfit buyTS
       ACME  030511  1.25 030511.0	1.25	   0.00		0.09   030511.0
       ACME  030511  1.33 030511.1	1.25	   0.08		0.09   030511.0
       ACME  030511  1.22 030511.2	1.22	   0.00		0.09   030511.2
       ACME  030511  1.15 030511.3	1.15	   0.00		0.09   030511.3
       ACME  030511  1.17 030511.4	1.15	   0.02		0.09   030511.3
       ACME  030511  1.17 030511.5	1.15	   0.02		0.09   030511.3
       ACME  030511  1.18 030511.6	1.15	   0.03		0.09   030511.3
       ACME  030511  1.21 030511.7	1.15	   0.06		0.09   030511.3
       ACME  030511  1.24 030511.8	1.15	   0.09		0.09   030511.3
       ACME  030511  1.23 030511.9	1.15	   0.08		0.09   030511.3

	let priceSell be price;
	let tsSell be ts;
	Sell <- [ID,tradeDate,priceSell,tsSell,runningMin]
		where runningProfit=maxProfit in Trade;

		Sell(ID tradeDate priceSell tsSell  runningMin) 
		     ACME  030511    1.24   030511.8   1.15

	let priceBuy be price;
	let tsBuy be ts;
	Buy <- [ID,tradeDate,priceBuy,tsBuy] in (Sell ijoin
	     [ID,tradeDate,priceBuy,tsBuy,runningMin] where ts=buyTS in Trades);

		Buy(ID tradeDate priceBuy tsBuy   ) 
		    ACME  030511   1.15   030511.3

	BuySell <- Buy ujoin [ID,tradeDate,priceSell,tsSell] in Sell;

		BuySell(ID tradeDate priceBuy tsBuy   priceSell tsSell  )
			ACME  030511   1.15   030511.3   1.24   030511.8

2. A second example from the AQuery language [LernShash:AQuery].
   Find the count of packets and their average length within each flow.
   (A "flow" from a source s to a destination d ends if there is a
    2-minute gap between consecutive packets from s to d.)

	let tsPred be par pred of ts order ts by src, dest;
	let flowStart be 
		if ts-tsPred>=0.0002 then 1 else
		if ts>

	Only this must be defined in a context which allows accum to be a state
	variable and intializes it. The context must also supply the parameter
	n.
		comp windows(gp,qty,ts,n,data,runsum) is
		{ comp parop runsum(value,accum) is
		  { accum <- [red + of if seq<=n then qty else 0]
			     where Gp=gp in [Gp,qty,ts] in data
			     << initialize  accum  for  runsum >>
		  } alt
		  { accum <- accum + par succ(n-1) of value order ts by gp
				   - par pred of value order ts by gp };
		  let seq be par + of 1 order ts by gp;
		  let Gp be gp;
		}
	We will assume that the partial functional mapping invocation of  runsum
	matches the first alt-block (accum is output) at the beginning of each
	grouping, and matches the second (accum is input) everywhere else.
	(Funops are similar but easier: there is no gp attribute, and par..by gp
	becomes fun.. and  where Gp=gp (and Gp) is not needed.)
	NB maybe we should just call them all funops.

	Here is the invocation of all this:
		windows(in ID, in price, in ts, in 5, in Trades, out runsum);
		let 5sums be par runsum of price order ts by ID;
		let 5avgs be 5sums/5;

4. The second example from [WhitShash:stockSeries] gets us into serious time
   series concepts, so try Google or the QA280.-- section of the library stacks.
   Here's one to try: the convolution [en.wikipedia.org/wiki/Convolution,
   mathworld.wolfram.com/Convolution.html, cnx.org/content/m11541/latest/,..]

	Convolution of two time series x1,..,xn CONV y1,..,yn =
		0 x0y0
		1 y0y1 + x1y0
		2 x0y2 + x1y1 + x2y0
		  :
		j Sum_{k=0}^{n-1} x(k)*y(j-k)
		  :

	Let's try it for two rectangular series x = 11111, y = 11111
	(where the first terms are x0, y0, resp. and we assume 0s before, after)
		0	1 = x0y0
		1	2 = y0y1 + x1y0
		2	3 = x0y2 + x1y1 + x2y0
		3	4 = x0y3 + x1y2 + x2y1 + x3y0
		4	5 = x0y4 + x1y3 + x2y2 + x3y1 + x4y0
		5	4 = x1y4 + x2y3 + x3y2 + x4y1
		6	3 = x2y4 + x3y3 + x4y2
		7	2 = x3y4 + x4y3
		8	1 = x4y4

	In an alternative definition, one of the series is periodic (not 0s
	before and after).
	Let's try rectangular x = 11 and periodic y = ..11001100..,
	i.e., x0 = 1, x1 = 1; .., y0 = 1, y1 = 1, y2 = 0, y3 = 0, ..
	Showing only nonzero entries:
			:
		-2	1 = x1y_{-3}
		-1	0 =
		 0	1 = x0y0
		 1	2 = y0y1 + x1y0
		 2	1 = x1y1
		 3	0 = 
		 4	1 = x0y4
			:

	To motivate finding relational implementations of convolution, here are
	some uses (from wikipedia):
	# In statistics, a weighted moving average is a convolution.
	# In probability theory, the probability distribution of the sum of two
	  independent random variables is the convolution of their individual
	  distributions.
	# In optics, many kinds of "blur" are described by convolutions. A
	  shadow (e.g. the shadow on the table when you hold your hand between
	  the table and a light source) is the convolution of the shape of the
	  light source that is casting the shadow and the object whose shadow is
	  being cast. An out-of-focus photograph is the convolution of the
	  sharp image with the shape of the iris diaphragm. The photographic
	  term for this is bokeh.
	# Similarly, in digital image processing, convolutional filtering plays
	  an important role in many important algorithms in edge detection and
	  related processes.
	# In linear acoustics, an echo is the convolution of the original sound
	  with a function representing the various objects that are reflecting
	  it.

	The convolution theorem says a convolution of two functions is the
	inverse Fourier transform of the product of the Fourier transforms
	of the two functions, so having FFT will also give us convolutions.

5. On the subject of Fourier transforms, [ColeShashZhao:uncoopTS] point out that
   Fourier transforms, wavelets and the singular value decomposition do _not_ do
   very well at reducing the dimensionality of certain "uncooperative" time
   series (time series in which the power of the time series is spread over
   all Fourier coefficients). They improve on "sketches" to reduce the
   dimensionality of time series while approximately preserving the distance
   between any two (for similarity queries).

   Sketches are random projections: "taking [the] inner product of each time
   series window, considered as a vector, with a set of random vectors". If
   there are d of these random vectors, then the d results can be thought of
   as a vector in a d-dimensional space. Repeating this 2b more times and
   using medians is the approximation.

   [ColeShashZhao:uncoopTS] use a variety of other techniques, including
   convolution, to speed up the process of making sketches.

[LernShash:AQuery] Alberto Lerner, Dennis Shasha "AQuery, Query Language for
Ordered Data Optimization Techniques, and Experiments" VLDB 29 2003

[WhitShash:stockSeries] Arthur Whitney, Dennis Shasha "Lots o' Ticks: real-time
high performance time series queries on billions of trades and quotes" SIGMOD
2001

[ColeShashZhao:uncoopTS] Richard Cole, Dennis Shasha, Xiaojian Zhao "Fast Window
Correlations Over Uncooperative Time Series" KDD 2005

[See cs.nyu.edu/shasha/papers/papers.html for these papers in pdf.]