
KERNEL DENSITY ESTIMATION TUTORIAL



ROHAN SHILOH SHAH


CS644A PATTERN RECOGNITION





In classification and regression, the primary goal is the estimation of a prediction function. The likelihood or conditional density is one such function: for regression $ p(y\vert x) = p(y, x)/ \int p(y, x) \, dy$ and similarly for classification $ p(c\vert x) = p(c, x)/ \sum_c p(c, x)$, where $ c$ is a class label from the set of labels $ \mathfrak{C}$. These are supervised learning tasks, since each training example is paired with a corresponding label or annotation: $ y \in \mathbb{R}^n$ for regression and $ c \in \mathfrak{C}$ for classification.
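As a concrete numerical illustration of the classification case, the following Python sketch forms the joint $ p(c, x) = p(c)\,p(x\vert c)$ and normalizes over the class labels to obtain the posterior $ p(c\vert x)$; the two Gaussian class-conditional densities and the class priors are assumptions chosen purely for illustration.

import numpy as np
from scipy.stats import norm

# Assumed class priors p(c) and class-conditional densities p(x | c);
# these particular choices are illustrative only.
priors = {"c0": 0.4, "c1": 0.6}
conditionals = {"c0": norm(loc=0.0, scale=1.0), "c1": norm(loc=2.0, scale=1.5)}

def posterior(x):
    # p(c, x) = p(c) * p(x | c); then p(c | x) = p(c, x) / sum_c p(c, x)
    joint = {c: priors[c] * conditionals[c].pdf(x) for c in priors}
    z = sum(joint.values())
    return {c: joint[c] / z for c in joint}

print(posterior(1.0))  # posterior class probabilities, summing to one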

Given an unannotated training data set, we seek to build a model, specifically an unconditional probability density function, that delineates the essential information contained in the observation space $ \mathcal{X}$. This is unsupervised learning, since it is performed in the absence of annotations (and hence without any cost or loss function) through direct interaction with `new experiences'.

One common approach is to assume the density has a fixed parametric form and then to estimate the associated parameters, typically by maximum likelihood; for example, using a mixture model we can decompose the unknown density as follows:

$\displaystyle \hat{f}_{(\mu, \Sigma)}(x) = \sum_{i=1}^m \zeta_i \; P_{(\mu_i, \Sigma_i)}(x), \qquad \zeta_i \geq 0, \quad \sum_{i=1}^m \zeta_i = 1$ (1)

where the mixing coefficients $ \zeta_i$ quantify the contribution of the $ i^{th}$ component of the mixture to the estimate $ \hat{f}_{(\mu, \Sigma)}$. However, when the true density does not lie in the assumed parametric family, the mixture approximation will in general fail to converge in probability to the true density, no matter how much data is available. The following sections therefore present a regularized, non-parametric estimate that is a mixture of convolved kernel functions and is asymptotically both unbiased and consistent.
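Before turning to the non-parametric case, here is a brief sketch of the parametric approach of Eq. (1): a two-component Gaussian mixture fit by maximum likelihood (via expectation-maximization, as implemented in scikit-learn). The synthetic data and the choice of $ m = 2$ components are assumptions made only for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic one-dimensional sample; in practice the generating density is unknown.
x = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(1.5, 1.0, 700)]).reshape(-1, 1)

# Maximum-likelihood fit of the mixture parameters (weights, means, covariances).
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

# The fitted mixing coefficients zeta_i are non-negative and sum to one.
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())

# Evaluate the estimate \hat{f}(x) at a point via the model's log-density.
print(np.exp(gmm.score_samples(np.array([[0.0]]))))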




Rohan Shiloh SHAH 2006-12-12