
KERNEL DENSITY ESTIMATION TUTORIAL



ROHAN SHILOH SHAH


CS644A PATTERN RECOGNITION





In classification and regression, the primary goal is the estimation of a prediction function. The likelihood or conditional density is one such function: for regression $ p(y\vert x) = p(y, x)/ \int p(y, x) \, dy$ and similarly for classification $ p(c\vert x) = p(c, x)/ \sum_c p(c, x)$, where $ c$ is a class label from the set of labels $ \mathfrak{C}$. These are supervised learning tasks, since each training example is paired with a corresponding label or annotation: $ y \in \mathbb{R}^n$ for regression and $ c \in \mathfrak{C}$ for classification.
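As a concrete numerical illustration of the classification case, the following Python sketch forms the joint $ p(c, x) = p(c)\,p(x\vert c)$ and normalizes over the class labels to obtain the posterior $ p(c\vert x)$; the two Gaussian class-conditional densities and the class priors are assumptions chosen purely for illustration.

import numpy as np
from scipy.stats import norm

# Assumed class priors p(c) and class-conditional densities p(x | c);
# these particular choices are illustrative only.
priors = {"c0": 0.4, "c1": 0.6}
conditionals = {"c0": norm(loc=0.0, scale=1.0), "c1": norm(loc=2.0, scale=1.5)}

def posterior(x):
    # p(c, x) = p(c) * p(x | c); then p(c | x) = p(c, x) / sum_c p(c, x)
    joint = {c: priors[c] * conditionals[c].pdf(x) for c in priors}
    z = sum(joint.values())
    return {c: joint[c] / z for c in joint}

print(posterior(1.0))  # posterior class probabilities, summing to one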

Given an unannotated training data set, we seek to build a model, specifically an unconditional probability density function, that delineates the essential information contained in the observation space $ \mathcal{X}$. This is unsupervised learning, since it is performed in the absence of annotations (and hence without any cost or loss function) through direct interaction with `new experiences'.

One common approach is to assume the density has a fixed parametric form and then to estimate the associated parameters, typically by maximum likelihood; for example, using a mixture model we can decompose the unknown density as follows:

$\displaystyle \hat{f}_{(\mu, \Sigma)}(x) = \sum_{i=1}^m \zeta_i \; P_{(\mu_i, \Sigma_i)}(x), \qquad \zeta_i \geq 0, \quad \sum_{i=1}^m \zeta_i = 1$ (1)

where the mixing coefficients $ \zeta_i$ quantify the contribution of the $ i^{th}$ component of the mixture to the estimate $ \hat{f}_{(\mu, \Sigma)}$. However, when the true density does not lie in the assumed parametric family, the mixture approximation will in general fail to converge in probability to the true density, no matter how much data is available. The following sections therefore present a regularized, non-parametric estimate that is a mixture of convolved kernel functions and is asymptotically both unbiased and consistent.
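Before turning to the non-parametric case, here is a brief sketch of the parametric approach of Eq. (1): a two-component Gaussian mixture fit by maximum likelihood (via expectation-maximization, as implemented in scikit-learn). The synthetic data and the choice of $ m = 2$ components are assumptions made only for illustration.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic one-dimensional sample; in practice the generating density is unknown.
x = np.concatenate([rng.normal(-2.0, 0.5, 300),
                    rng.normal(1.5, 1.0, 700)]).reshape(-1, 1)

# Maximum-likelihood fit of the mixture parameters (weights, means, covariances).
gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

# The fitted mixing coefficients zeta_i are non-negative and sum to one.
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())

# Evaluate the estimate \hat{f}(x) at a point via the model's log-density.
print(np.exp(gmm.score_samples(np.array([[0.0]]))))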




Rohan Shiloh SHAH 2006-12-12