In classification and regression, the primary goal is the estimation of a prediction function. The likelihood or conditional density $p(y \mid x)$ is one such function; $y \in \mathbb{R}$ for regression and similarly for classification, where $y$ is a class label from the set of labels $\mathcal{Y} = \{1, \ldots, K\}$. These are supervised learning tasks since each training example $x_i$ is paired with a corresponding label or annotation; $y_i \in \mathbb{R}$ for regression and $y_i \in \mathcal{Y}$ for classification.
Given an unannotated training data set, we seek to build a model, specifically an unconditional probability density function $p(x)$, that delineates the essential information contained in the observation space $\mathcal{X}$. This is unsupervised learning since it is performed in the absence of annotations (and hence without any cost or loss function) through direct interaction with `new experiences'.
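As a concrete illustration of fitting an unconditional density to unlabeled observations, the sketch below assumes a single one-dimensional Gaussian form, for which the maximum-likelihood estimates are the sample mean and the (biased, $1/n$) sample variance. The function names are ours, not from any particular library:

```python
import math

def fit_gaussian_mle(xs):
    """Maximum-likelihood fit of a 1-D Gaussian to unlabeled samples.

    The MLE of the mean is the sample mean; the MLE of the variance
    divides by n (not n - 1, which would give the unbiased estimator).
    """
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_pdf(x, mu, var):
    """Evaluate the fitted density p(x) at a point x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Unannotated observations: points from the observation space, no labels.
samples = [1.2, 0.8, 1.0, 1.4, 0.6]
mu, var = fit_gaussian_mle(samples)
```

A single Gaussian is of course a very restrictive parametric form; richer families such as the mixture models discussed next relax this assumption while keeping the maximum-likelihood estimation principle.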
One common approach is to assume the density has a fixed parametric form and then to estimate the parameters associated with this form by maximum likelihood; for example, using a mixture model we can decompose the unknown density as follows: