| Decision Rules |
| Notation: |
For example, we may have the probability distribution for the colour of apples, as well as that for oranges. To introduce some notation, let $\omega_{app}$ represent the state of nature where the fruit is an apple, let $\omega_{org}$ represent the state where the fruit is an orange, and let $x$ be the continuous random variable that represents the colour of a fruit. Then the expression $p(x|\omega_{app})$ represents the density function for $x$ given that the state of nature is an apple.
In a typical problem, we would know (or be able to calculate) the
class-conditional densities $p(x|\omega_j)$ for $j \in \{app, org\}$.
We would also typically know the prior probabilities $P(\omega_{app})$
and $P(\omega_{org})$, which simply represent the proportion of apples
versus oranges on the conveyor belt. What we are looking for is a
formula that gives the probability of a fruit being an apple or an
orange given that we observe a certain colour $x$. With such a
probability, we would classify a fruit of the observed colour by
comparing the probability that it is an apple with the probability
that it is an orange. If it is more probable that the fruit is an
apple, it is classified as an apple. Fortunately, we can use Bayes'
formula, which states that:

$$P(\omega_j|x) = \frac{p(x|\omega_j)\,P(\omega_j)}{p(x)}, \qquad p(x) = \sum_{k \in \{app,\, org\}} p(x|\omega_k)\,P(\omega_k)$$
The following graph shows the a posteriori probabilities for the two-class decision problem. At every $x$, the posteriors must sum to 1. The red region on the x-axis depicts the values of $x$ for which the decision rule would decide 'apple'; the orange region represents the values of $x$ for which it would decide 'orange'.
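To make the two-class rule concrete, here is a minimal sketch that applies Bayes' formula pointwise and picks the class with the larger posterior. It assumes Gaussian class-conditional densities over a made-up 1-D colour scale; the means, spreads and priors are hypothetical values chosen only for illustration, and the names `posteriors` and `decide` are likewise hypothetical.

```python
import math

# Hypothetical class-conditional densities p(x | w_j): Gaussians over a
# made-up 1-D "colour" scale. Means, spreads and priors are assumptions.
def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

LIKELIHOODS = {
    "apple": lambda x: gaussian_pdf(x, mean=2.0, std=1.0),
    "orange": lambda x: gaussian_pdf(x, mean=5.0, std=1.5),
}
PRIORS = {"apple": 0.6, "orange": 0.4}  # assumed P(w_app), P(w_org)

def posteriors(x: float) -> dict:
    """Bayes' formula: P(w_j | x) = p(x | w_j) P(w_j) / p(x)."""
    joint = {c: LIKELIHOODS[c](x) * PRIORS[c] for c in PRIORS}
    evidence = sum(joint.values())  # p(x) = sum_k p(x | w_k) P(w_k)
    return {c: v / evidence for c, v in joint.items()}

def decide(x: float) -> str:
    """Bayes' decision rule: choose the class with the largest posterior."""
    post = posteriors(x)
    return max(post, key=post.get)

if __name__ == "__main__":
    for x in [1.0, 3.0, 3.5, 4.0, 6.0]:
        p = posteriors(x)
        print(f"x={x:.1f}  P(apple|x)={p['apple']:.3f}  "
              f"P(orange|x)={p['orange']:.3f}  -> {decide(x)}")
```

Sweeping `decide` over a grid of $x$ values reproduces the kind of red/orange interval labelling described for the graph; with these assumed parameters the apple-to-orange switch happens between $x = 3.5$ and $x = 4$.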
| Allowing more than 1 feature and more than 1 class: |
| The role of Neural Networks |
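A simple neural network
contains a set of $c$ discriminant functions $g_i(\mathbf{x})$, for
$i = 1, \ldots, c$. The network assigns the feature vector $\mathbf{x}$ to class
$\omega_j$ if

$$g_j(\mathbf{x}) > g_i(\mathbf{x}) \quad \text{for all } i \neq j.$$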
The choice of discriminant functions is not unique. You can always
multiply every function by the same positive constant and still have the
same decision rule. You can also replace each $g_i(\mathbf{x})$ with
$f(g_i(\mathbf{x}))$ for some monotonically increasing function $f$, which
gives a new set of discriminant functions for the same decision rule.
Because such changes can simplify computation, there are 4 commonly
used discriminant functions for Bayes' decision rule:
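The invariance of the decision rule can be checked numerically. Below is a minimal sketch, assuming a hypothetical two-class, one-feature problem with Gaussian class-conditional densities (all parameter values are made up). It uses three standard textbook forms of Bayes' discriminant function, $P(\omega_i|x)$, $p(x|\omega_i)P(\omega_i)$ and $\ln p(x|\omega_i) + \ln P(\omega_i)$, plus a positively scaled copy, and confirms that the argmax classifier makes the same decision with each.

```python
import math

# Hypothetical two-class, one-feature setup (assumed values, for illustration).
PRIORS = [0.6, 0.4]
MEANS = [2.0, 5.0]
STDS = [1.0, 1.5]

def likelihood(x: float, i: int) -> float:
    """Gaussian class-conditional density p(x | w_i) (an assumption)."""
    z = (x - MEANS[i]) / STDS[i]
    return math.exp(-0.5 * z * z) / (STDS[i] * math.sqrt(2.0 * math.pi))

def g_posterior(x, i):
    # g_i(x) = P(w_i | x), via Bayes' formula
    evidence = sum(likelihood(x, k) * PRIORS[k] for k in range(len(PRIORS)))
    return likelihood(x, i) * PRIORS[i] / evidence

def g_joint(x, i):
    # g_i(x) = p(x | w_i) P(w_i): drops p(x), which is the same for every class
    return likelihood(x, i) * PRIORS[i]

def g_log(x, i):
    # g_i(x) = ln p(x | w_i) + ln P(w_i): monotonic transform of g_joint
    return math.log(likelihood(x, i)) + math.log(PRIORS[i])

def g_scaled(x, i):
    # the same positive constant applied to every class: decision unchanged
    return 10.0 * g_joint(x, i)

def classify(x, g):
    """Assign x to the class index j whose discriminant g_j(x) is largest."""
    return max(range(len(PRIORS)), key=lambda i: g(x, i))

if __name__ == "__main__":
    xs = [0.5 * k for k in range(-4, 20)]  # test points from -2.0 to 9.5
    for g in (g_joint, g_log, g_scaled):
        assert all(classify(x, g) == classify(x, g_posterior) for x in xs)
    print("all discriminant forms agree on", len(xs), "test points")
```

The log form is often preferred in practice because it turns products of densities into sums and avoids underflow when densities are very small.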
| Decision Regions |
When any decision rule is applied to the $d$-dimensional feature
space $\mathbb{R}^d$, the space is split into $c$
decision regions $\mathcal{R}_1, \ldots, \mathcal{R}_c$. In the above
graph for the two-category case, the decision regions were marked in
red and orange at the bottom of the graph. In general, if $\mathbf{x}$ lies in decision region
$\mathcal{R}_i$, then the pattern classifier found
$g_i(\mathbf{x})$ to be the largest of all the
discriminant functions. The decision regions can be any subsets of
$\mathbb{R}^d$; they need not be connected. For example, if the feature vector is 2-dimensional,
then the discriminant functions $g_i(\mathbf{x})$ are functions of
2 variables and can be plotted as surfaces in 3-D. The decision regions in this
case are
subsets of the x-y plane. Here are 2 simple examples:


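The partition of the plane into decision regions can also be computed directly. Here is a minimal sketch, assuming three hypothetical classes in a 2-D feature space, each with an isotropic Gaussian class-conditional density; the means, spreads and priors are made-up values. Each grid point is labelled with the class whose log-form discriminant is largest, and the labels are printed as an ASCII map of the x-y plane.

```python
import math

# Three hypothetical classes in a 2-D feature space, each with an isotropic
# Gaussian class-conditional density. All values below are assumptions.
CLASSES = [
    # (label, mean (x, y), std, prior)
    ("A", (-2.0, 0.0), 1.0, 0.5),
    ("B", (2.0, 1.0), 1.0, 0.3),
    ("C", (0.0, -2.5), 1.5, 0.2),
]

def g(point, cls):
    """Log-form discriminant g_i(x) = ln p(x | w_i) + ln P(w_i)."""
    _, (mx, my), std, prior = cls
    sq_dist = (point[0] - mx) ** 2 + (point[1] - my) ** 2
    # ln of the 2-D isotropic Gaussian density 1/(2 pi s^2) exp(-d^2 / (2 s^2))
    log_density = -sq_dist / (2 * std ** 2) - math.log(2 * math.pi * std ** 2)
    return log_density + math.log(prior)

def region(point):
    """Return the label of the decision region R_i that contains the point."""
    return max(CLASSES, key=lambda c: g(point, c))[0]

def render(xmin=-5.0, xmax=5.0, ymin=-5.0, ymax=5.0, cols=41, rows=21):
    """Draw the decision regions as an ASCII map (top row is y = ymax)."""
    lines = []
    for r in range(rows):
        y = ymax - r * (ymax - ymin) / (rows - 1)
        row = "".join(
            region((xmin + c * (xmax - xmin) / (cols - 1), y)) for c in range(cols)
        )
        lines.append(row)
    return "\n".join(lines)

if __name__ == "__main__":
    print(render())
```

Because class C is assumed to have a larger spread than A and B, its region is not a simple half-plane; changing the variances or priors visibly moves and bends the boundaries.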
Obviously, the shape of the decision boundary depends on the functions $P(\omega_i|\mathbf{x})$. The next section takes a closer look at discriminant functions and their corresponding decision regions for the normal density in particular.