
Kernel Density Estimation: Parzen Windows

In the previous section we decomposed the CDF into regions or windows $ \mathcal{R}$ and estimated the PDF for each window separately. There are several ways to choose the placement (i.e. the center) of the windows; for example, they can be disjoint and cover the entire domain, as is the case with Frequency Histogram Estimation (Figure 2); the benefit of such a method is that the structure of the windows is independent of the number of random samples (although the estimation procedure is not). Alternatively, the approach we will consider henceforth associates a window with each random sample $ x_i$, so that we have a sequence of windows $ \mathcal{R}_1, \mathcal{R}_2, \mathcal{R}_3, \cdots$ centered at $ x_1, x_2, x_3, \cdots$. Since the number of windows (and their volume) now depends on the sample size, we rewrite the result (5) as

$\displaystyle \hat{f}_n(x) = \frac{1}{V_n} \times \frac{k_n(x)}{n}$ (6)

where $ V_n$ is the volume of the region $ \mathcal{R}_n$ and $ k_n(x)$ is the number of samples that fall inside it; as more samples are generated the regions can either be shrunk (as is the case with the Parzen method) or, reversing this logic, gradually enlarged starting from a vanishingly small $ \mathcal{R}_1$ (as is the case with the nearest-neighbor estimation method).
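As a minimal numeric sketch of (6) in one dimension, take an interval window of half-width $ h$ (so $ V = 2h$) and count the samples that fall inside it; the sample values and bandwidth below are invented for illustration:

```python
# Sketch of equation (6) in one dimension: an interval window of
# half-width h centered at x has volume V = 2*h, and k counts the
# samples that fall inside [x-h, x+h].
def fhat(x, samples, h):
    """Estimate f(x) as k / (n * V) for the window [x-h, x+h]."""
    n = len(samples)
    V = 2.0 * h
    k = sum(1 for xi in samples if abs(xi - x) <= h)
    return k / (n * V)

samples = [0.1, 0.2, 0.25, 0.7, 0.9]
print(fhat(0.2, samples, 0.1))  # 3 of 5 samples fall in [0.1, 0.3]
```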

Let us further generalize [DHS01] the definition of the region $ \mathcal{R}_n$ so that we can estimate multivariate densities; assume the windows $ \mathcal{R}_n$ are d-dimensional hypercubes with edges of length $ h_n$; the volume is then simply $ V_n = (h_n)^d$.

Figure: Let the hypercube window $ \mathcal{R}_n$ have dimension $ d=2$; then $ k_n(x)$ is simply a count of the random samples that fall in the square with sides of length $ h_n$ and centered at $ x$; in this case $ k_n(x)=3$.
\includegraphics[scale=1.25]{image_hypercube_window_function.eps}
Now that the window has changed we need to revise our definition of $ k_n(x)$, which was previously defined using a simple indicator function $ \mathbb{I}_{x-h \leq x_i \leq x+h}$; generalizing this from counting random samples within an interval to counting within a hypercube can be done by first introducing a window function $ \omega$ of the form:

$\displaystyle \omega(s) = \left\{\begin{array}{cl}
1, & \vert s_j\vert \leq 0.5 \; \; \forall j = 1, \cdots , d\\
0, & otherwise
\end{array}\right.
$

which defines the boundary for a unit-hypercube centered at the origin and then defining $ k_n$ as follows:

$\displaystyle k_n(x) = \sum_{i=1}^n \omega \left( \frac{x-x_i}{h_n} \right)$
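The window function $ \omega$ and the count $ k_n(x)$ can be sketched directly in code; the sample points and bandwidth below are invented for illustration:

```python
# Unit-hypercube window function: returns 1 if every coordinate of s
# lies within [-1/2, 1/2], i.e. s falls inside the unit hypercube
# centered at the origin, and 0 otherwise.
def omega(s):
    return 1 if all(abs(sj) <= 0.5 for sj in s) else 0

# k_n(x): count the samples x_i that fall in the hypercube with
# edge length h_n centered at x.
def k_n(x, samples, h_n):
    return sum(omega([(xj - xij) / h_n for xj, xij in zip(x, xi)])
               for xi in samples)

# d = 2 example matching the figure: a square of side h_n = 1 at x.
samples = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.4), (2.0, 2.0)]
print(k_n((0.0, 0.0), samples, 1.0))  # first three fall inside -> 3
```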

So $ \omega \left( \frac{x-x_i}{h_n} \right) = 1$ if and only if $ \vert x_j - x_{ij} \vert \leq \frac{1}{2} h_n$ for every component $ j = 1, \cdots , d$; in other words, if and only if $ x_i$ is within half the length of an edge of the hypercube from $ x$ in every dimension. So $ \omega \left( \frac{x-x_i}{h_n} \right)$ is the indicator of a d-dimensional hypercube of volume $ (h_n)^{d}$ centered at $ x$, and summing it over the samples counts how many fall within the hypercube window $ \mathcal{R}$. Finally, substituting this result into (6) gives:
$\displaystyle \hat{f}_n(x) = \frac{1}{V_n} \times \frac{k_n(x)}{n} = \frac{1}{n} \frac{1}{(h_n)^d} \sum_{i=1}^n \; \omega \left( \frac{x-x_i}{h_n} \right)$ (7)

The resulting estimated density is `jagged' since the window (basis) functions in the above linear combination are hypercube indicators (Figure 3) with abrupt edges; reducing the width $ h_n$ as more samples are generated smooths out the estimate. In fact, if $ \lim_{n \to \infty} h_n = 0$ then the estimated density function can be shown [Fuk72] to be asymptotically unbiased: $ \lim_{n \to \infty} E(\hat{f}_n) = f$.
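A minimal sketch of the full estimator (7), with invented one-dimensional data, illustrates how a smaller width $ h_n$ trades smoothness for resolution:

```python
# Parzen-window estimate, equation (7): the hypercube window omega is
# summed over all samples and normalized by n * (h_n)^d.
# The data and bandwidths below are made up for illustration.
def omega(s):
    return 1.0 if all(abs(sj) <= 0.5 for sj in s) else 0.0

def parzen_estimate(x, samples, h_n):
    d = len(x)
    n = len(samples)
    total = sum(omega([(xj - xij) / h_n for xj, xij in zip(x, xi)])
                for xi in samples)
    return total / (n * h_n ** d)

samples = [(0.0,), (0.1,), (0.2,), (0.8,), (0.9,)]
# A wide window averages broadly; a narrow one follows the samples
# closely, producing a more jagged estimate.
for h in (1.0, 0.25):
    print([round(parzen_estimate((x / 4,), samples, h), 2)
           for x in range(5)])
```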



Rohan Shiloh SHAH 2006-12-12