Decision Rules for the Normal Distribution
The multivariate normal density is often an appropriate model
for pattern recognition problems in which the feature vectors x
for a given class wi are continuous-valued, mildly corrupted
versions of a single mean vector ui.
In this case, the class-conditional densities p(x|wi) are modeled as
normal, while the a priori probabilities P(wi) remain ordinary class probabilities.
(For background information on the normal density, see
Normal Distribution.) As a reminder, the density function for the
univariate normal is given by

    p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - u}{\sigma} \right)^2 \right]

The two parameters, the mean u and the variance σ²,
completely specify the normal distribution.
Samples from this type of distribution tend to cluster about the mean,
and the extent to which they spread out depends on the variance.
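
As a small illustration of how the mean and variance alone determine the univariate density, here is a minimal sketch in Python. The function name `univariate_normal_pdf` and the example values are assumptions made for illustration; they do not come from the original text.

```python
import math


def univariate_normal_pdf(x, mean, variance):
    """Evaluate the univariate normal density at x for a given mean and variance."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))


# Illustrative values only: a standard normal (mean 0, variance 1).
peak = univariate_normal_pdf(0.0, mean=0.0, variance=1.0)
one_sigma = univariate_normal_pdf(1.0, mean=0.0, variance=1.0)
print(peak, one_sigma)
```

The density is largest at the mean and falls off symmetrically on either side, which is why samples tend to cluster there; a larger variance lowers the peak and widens the spread.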

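The general multivariate normal density is specified by a d-dimensional
mean vector u and a d-by-d covariance matrix Σ:

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x - u)^T \Sigma^{-1} (x - u) \right]
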
The mean vector is just a collection of single means ui where the ith mean represents the mean for the ith feature that we
are measuring. For example, if we decided to measure the color and weight
of a random fruit, then u1 = the mean of all the colors
and u2 = the mean of all the weights.
The covariance matrix is similar to the variance in the univariate case.
The diagonal elements represent the variances for the different features
we measure. For example, the ith diagonal element represents the variance
for the ith feature we measure. The off-diagonal elements represent
the covariance between two different features. In other words, the
element σij of the covariance matrix represents the
covariance between feature i and feature j. This is important because
the features that we measure are not necessarily independent. Suppose
that the color of some fruit depended on the weight of the fruit. Then the
covariance between color and weight would generally be nonzero, and its
exact value would depend on how the two features vary together.
For more
information on these values, see Covariance.
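
To make the roles of the mean vector and covariance matrix concrete, the following sketch estimates both from samples with NumPy. The two features (a color score and a weight in grams), the parameter values, and the variable names are hypothetical, chosen only to mirror the fruit example; the data is synthetic.

```python
import numpy as np

# Hypothetical "true" parameters for two features: color score and weight (g).
rng = np.random.default_rng(0)
true_mean = np.array([0.6, 150.0])
true_cov = np.array([[0.01, 0.5],
                     [0.5, 100.0]])

# Draw synthetic samples; each row is one fruit, each column one feature.
samples = rng.multivariate_normal(true_mean, true_cov, size=5000)

# Mean vector: one mean per feature (column).
mean_vector = samples.mean(axis=0)

# Covariance matrix: rowvar=False tells NumPy that columns are the features.
# np.cov uses the unbiased (N - 1) normalization by default.
cov_matrix = np.cov(samples, rowvar=False)

print("mean vector:", mean_vector)
print("variances (diagonal):", np.diag(cov_matrix))
print("color/weight covariance (off-diagonal):", cov_matrix[0, 1])
```

The diagonal entries of `cov_matrix` estimate the per-feature variances, and the single off-diagonal value (repeated symmetrically) estimates how color and weight vary together.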
As with the univariate density, samples from a normal population tend to fall
in a single cluster centered about the mean vector, and the shape of the
cluster depends on the covariance matrix:
The contour lines in the above diagram show the points at which
the function has constant density. From the equation for
the normal density, it is apparent that points which have the same density
must have the same value of the quadratic term in the exponent:

    r^2 = (x - u)^T \Sigma^{-1} (x - u)

This quantity is often called the squared Mahalanobis
distance from x to u. This term depends
on the contents of the covariance matrix, which explains why the shape
of the contour lines (lines of constant Mahalanobis distance) is
determined by this matrix.
Since this squared distance is a quadratic
function of x, the contours of constant density are hyperellipsoids of
constant Mahalanobis distance to u.
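
The squared Mahalanobis distance, the quadratic form of (x - u) with the inverse covariance matrix, can be sketched in a few lines of NumPy. The function name `squared_mahalanobis` and the example covariance matrix are assumptions for illustration. The sketch solves a linear system rather than forming the matrix inverse explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np


def squared_mahalanobis(x, mean, cov):
    """Return (x - mean)^T cov^{-1} (x - mean) for a single point x."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ y = diff instead of computing the inverse of cov.
    return float(diff @ np.linalg.solve(cov, diff))


# Illustrative covariance: variance 4 along the first feature, 1 along the second.
mean = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],
                [0.0, 1.0]])

r2_a = squared_mahalanobis([2.0, 0.0], mean, cov)
r2_b = squared_mahalanobis([0.0, 1.0], mean, cov)
print(r2_a, r2_b)
```

The two points lie at different Euclidean distances from the mean but at the same Mahalanobis distance, so they sit on the same contour: an ellipse stretched along the first feature, where the variance is larger.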
In simple cases, there is some intuition behind the shape of the
contours, depending on the contents of the covariance matrix:
Discriminant Functions for the Normal Density
One of the discriminant functions listed in the previous
section on decision rules was

    g_i(x) = \ln p(x|w_i) + \ln P(w_i)

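When the densities p(x|wi) are each normally
distributed, the discriminant function becomes

    g_i(x) = -\frac{1}{2} (x - u_i)^T \Sigma_i^{-1} (x - u_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(w_i)     (0)

where ui is the mean vector for the distribution
of class i, and Σi is the covariance matrix
for the distribution of class i.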
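
As a hedged sketch of how such a discriminant might be evaluated, the code below computes the log of the multivariate normal density plus the log of the prior for each class and picks the class with the largest value. The function names `gaussian_discriminant` and `classify`, and the two example classes, are assumptions for illustration rather than part of the original material.

```python
import numpy as np


def gaussian_discriminant(x, mean, cov, prior):
    """Log normal density of x for one class plus the log of its prior."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    d = mean.shape[0]
    diff = x - mean
    # slogdet returns (sign, log|det|); it is more stable than log(det(cov)).
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return float(-0.5 * maha - 0.5 * d * np.log(2.0 * np.pi)
                 - 0.5 * logdet + np.log(prior))


def classify(x, means, covs, priors):
    """Return the index of the class with the largest discriminant value."""
    scores = [gaussian_discriminant(x, m, c, p)
              for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))


# Two hypothetical classes with identity covariance and equal priors.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]

label_a = classify([0.2, -0.1], means, covs, priors)
label_b = classify([2.9, 3.2], means, covs, priors)
print(label_a, label_b)
```

With equal priors and identical covariance matrices, as in this example, only the Mahalanobis term differs between classes, so each point is assigned to the class whose mean is nearest.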
To determine the nature of the decision regions for these
discriminant functions, it is necessary to look first at two special
cases that greatly simplify gi(x).