Decision Rules for the Normal Distribution

Definitions


The multivariate normal density is an appropriate model for many pattern recognition problems in which the feature vectors x for a given class wi are continuous-valued, mildly corrupted versions of a single mean vector ui. In this case, the class-conditional densities p(x|wi) are normally distributed; the a priori probabilities P(wi) are ordinary probabilities, one per class, and are not themselves normal. (For background information on the normal density, see Normal Distribution.) As a reminder, the density function for the univariate normal with mean u and variance $\sigma^2$ is given by

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-u}{\sigma}\right)^2\right]$$

The two parameters, the mean and the variance, completely specify the normal distribution. Samples from this type of distribution tend to cluster about the mean, and the extent to which they spread out depends on the variance.
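
As a small illustration of how the mean and the variance shape the curve, the univariate density can be evaluated directly. This is a minimal sketch; the function name normal_pdf and the sample values are chosen here for illustration and do not come from the original text.

```python
import math

def normal_pdf(x, mean, variance):
    """Density of the univariate normal with the given mean and variance."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(variance)
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# The density peaks at the mean and is symmetric about it.
peak = normal_pdf(0.0, mean=0.0, variance=1.0)
left = normal_pdf(-1.0, mean=0.0, variance=1.0)
right = normal_pdf(1.0, mean=0.0, variance=1.0)

# Quadrupling the variance doubles the standard deviation,
# which halves the height of the peak and widens the curve.
wide_peak = normal_pdf(0.0, mean=0.0, variance=4.0)
```

Comparing peak with wide_peak shows the effect of the variance: the same total probability is spread over a wider range, so the curve is lower at the mean.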


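The general multivariate normal density in d dimensions is specified by a d-dimensional mean vector u and a d-by-d covariance matrix $\Sigma$:

$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-u)^T \Sigma^{-1} (x-u)\right]$$

$$u = \begin{pmatrix} u_1 \\ \vdots \\ u_d \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_{dd} \end{pmatrix}$$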


The mean vector is just a collection of single means ui where the ith mean represents the mean for the ith feature that we are measuring. For example, if we decided to measure the color and weight of a random fruit, then u1 = the mean of all the colors and u2 = the mean of all the weights.

The covariance matrix is similar to the variance in the univariate case. The diagonal elements represent the variances for the different features we measure. For example, the ith diagonal element represents the variance for the ith feature we measure. The off-diagonal elements represent the covariance between two different features. In other words, the element $\sigma_{ij}$ of the covariance matrix represents the covariance between feature i and feature j. This is important because the features that we measure are not necessarily independent. Suppose that the color of some fruit depended on the weight of the fruit. The exact value of the covariance for color and weight would depend on exactly how they vary together. For more information on these values, see Covariance.
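
To make the fruit example concrete, the mean vector and covariance matrix can be estimated from a handful of measurements. The sketch below uses made-up (color score, weight) pairs and hypothetical helper names; it also assumes the sample covariance that divides by n - 1, which the text does not specify.

```python
def mean_vector(samples):
    """Per-feature means: entry i is the mean of feature i."""
    n = len(samples)
    d = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(d)]

def covariance_matrix(samples):
    """Sample covariance matrix (assumption: divide by n - 1)."""
    n = len(samples)
    d = len(samples[0])
    u = mean_vector(samples)
    cov = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # Entry (i, j) measures how features i and j vary together.
            cov[i][j] = sum((s[i] - u[i]) * (s[j] - u[j]) for s in samples) / (n - 1)
    return cov

# Hypothetical fruit measurements: (color score, weight in grams).
fruit = [(1.0, 100.0), (2.0, 120.0), (3.0, 140.0), (4.0, 130.0), (5.0, 160.0)]
u = mean_vector(fruit)          # [mean color, mean weight]
cov = covariance_matrix(fruit)  # diagonal: variances, off-diagonal: covariance
```

For this data the off-diagonal entry is positive, reflecting that heavier fruit in the sample also tends to have a higher color score.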

As with the univariate density, samples from a normal population tend to fall in a single cluster centered about the mean vector, and the shape of the cluster depends on the covariance matrix:

The contour lines in the above diagram show the regions for which the function has constant density. From the equation for the normal density, it is apparent that points which have the same density must have the same value of the quadratic form in the exponent:

$$r^2 = (x-u)^T \Sigma^{-1} (x-u)$$

where $\Sigma$ is the covariance matrix. This quantity is often called the squared Mahalanobis distance from x to u. It depends on the contents of the covariance matrix, which explains why the shape of the contour lines (lines of constant Mahalanobis distance) is determined by this matrix. Since this distance is a quadratic function of x, the contours of constant density are hyperellipsoids of constant Mahalanobis distance to u.
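
The link between Mahalanobis distance and density can be checked numerically. The following sketch is restricted to two features so that the matrix inverse can be written out by hand; the helper names and the example covariance matrix are assumptions made for illustration.

```python
import math

def det_2x2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = det_2x2(m)
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def squared_mahalanobis(x, u, cov):
    """(x - u)^T cov^{-1} (x - u) for two-dimensional x and u."""
    inv = inverse_2x2(cov)
    dx = [x[0] - u[0], x[1] - u[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

def normal_pdf_2d(x, u, cov):
    """Bivariate normal density; (2*pi)^(d/2) equals 2*pi when d = 2."""
    r2 = squared_mahalanobis(x, u, cov)
    return math.exp(-0.5 * r2) / (2.0 * math.pi * math.sqrt(det_2x2(cov)))

u = [0.0, 0.0]
cov = [[4.0, 0.0], [0.0, 1.0]]  # x has four times the variance of y

# Two units out along x and one unit out along y are equally "far" from u.
r2_x = squared_mahalanobis([2.0, 0.0], u, cov)
r2_y = squared_mahalanobis([0.0, 1.0], u, cov)
p_x = normal_pdf_2d([2.0, 0.0], u, cov)
p_y = normal_pdf_2d([0.0, 1.0], u, cov)
```

Both points have squared Mahalanobis distance 1 and therefore the same density, even though their Euclidean distances from the mean differ.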

In simple cases, there is some intuition behind the shape of the contours, depending on the contents of the covariance matrix:

Description of the contour lines on the xy plane for three covariance matrices (the accompanying diagrams are not reproduced here):
The covariance matrix for 2 features x and y is diagonal (which implies that the 2 features don't covary), but feature x varies more than feature y. The contour lines are stretched out in the x direction to reflect the fact that the Mahalanobis distance grows at a lower rate in the x direction than it does in the y direction. The distance grows more slowly in the x direction because the variance for x is greater, and thus a point that is far away in the x direction is not quite as distant from the mean as a point that is equally far away in the y direction.
The covariance matrix for 2 features x and y is diagonal, and x and y have exactly the same variance. The Mahalanobis distance is then proportional to the Euclidean distance, so the contour lines are circles centered on the mean.
The covariance matrix is not diagonal. Instead, x and y have the same variance, but x varies with y in the sense that x and y tend to increase together. So the covariance matrix has identical diagonal elements, and the off-diagonal element is a strictly positive number representing the covariance of x and y. The contour lines are ellipses whose long axis lies along a line of slope 1 through the mean.
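
The three cases above can be checked by computing the axes of the contour ellipse directly from a 2-by-2 covariance matrix. This is a sketch under the assumption that the matrix is symmetric; the function name contour_axes and the three example matrices are hypothetical. It uses the closed-form eigenvalues of a symmetric 2-by-2 matrix, whose square roots give the semi-axis lengths of the contour at squared Mahalanobis distance 1.

```python
import math

def contour_axes(cov):
    """Return (major, minor, angle_deg) for the contour r^2 = 1.

    major and minor are the semi-axis lengths; angle_deg is the angle
    of the major axis measured from the x axis.
    """
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    mid = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    lam_major, lam_minor = mid + half_gap, mid - half_gap  # eigenvalues
    angle = 0.5 * math.degrees(math.atan2(2.0 * b, a - c))
    return math.sqrt(lam_major), math.sqrt(lam_minor), angle

stretched = contour_axes([[4.0, 0.0], [0.0, 1.0]])  # diagonal, unequal variances
circular = contour_axes([[1.0, 0.0], [0.0, 1.0]])   # diagonal, equal variances
tilted = contour_axes([[1.0, 0.5], [0.5, 1.0]])     # equal variances, positive covariance
```

The first matrix gives semi-axes 2 and 1 with the long axis along x, the second gives equal semi-axes (a circle), and the third gives an ellipse whose long axis is at 45 degrees.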


Discriminant Functions for the Normal Density

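One of the discriminant functions listed in the previous section on decision rules was

$$g_i(x) = \ln p(x|w_i) + \ln P(w_i)$$

When the densities p(x|wi) are each normally distributed, the discriminant function becomes

$$g_i(x) = -\frac{1}{2}(x-u_i)^T \Sigma_i^{-1} (x-u_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(w_i) \qquad (0)$$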

where ui is the mean vector for the distribution of class i, and $\Sigma_i$ is the covariance matrix for the distribution of class i.
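
A minimal sketch of how such a discriminant could be used to classify a point is given below. It assumes the log form g_i(x) = ln p(x|w_i) + ln P(w_i) with a normal class-conditional density, is restricted to two features, and uses hypothetical class parameters and function names that do not appear in the original text.

```python
import math

def discriminant(x, u, cov, prior):
    """g_i(x) for a two-dimensional normal class model.

    Terms: -1/2 * squared Mahalanobis distance, -d/2 * ln(2*pi),
    -1/2 * ln(det(cov)), + ln(prior).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - u[0], x[1] - u[1]]
    r2 = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    dim = 2
    return (-0.5 * r2 - 0.5 * dim * math.log(2.0 * math.pi)
            - 0.5 * math.log(det) + math.log(prior))

def classify(x, classes):
    """Pick the class whose discriminant is largest at x."""
    return max(classes, key=lambda name: discriminant(x, *classes[name]))

# Hypothetical classes: name -> (mean vector, covariance matrix, prior).
classes = {
    "w1": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "w2": ([3.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
}
label_near_origin = classify([0.5, 0.2], classes)
label_near_w2 = classify([2.8, 3.1], classes)
g_at_mean = discriminant([0.0, 0.0], *classes["w1"])
```

With equal priors and identical covariance matrices, as in this example, the decision reduces to choosing the class whose mean is nearest to x.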

In order to determine the nature of the decision regions for these discriminant functions, it is helpful to look first at 2 special cases that greatly simplify gi(x), and then at the general case.

CASE 1: $\Sigma_i = \sigma^2 I$
CASE 2: $\Sigma_i = \Sigma$
CASE 3: $\Sigma_i$ arbitrary