In section 3, we explored an algorithm for
determining the closest pair of a set of points in the plane. We used a
divide-and-conquer approach which we generalized from one-dimension in
order to solve the problem. Here we will discuss another
O(*n*log*n*) that attacks the problem in a different way.

Once again, we are given a set *S* of *n* points in the plane,
but this time we shall attempt to use the plane sweep technique. We sweep
a vertical line across the set from left to right keeping track of the
closest pair seen so far. We shall describe an O(*n*log*n*)
algorithm, and then examine why it works.

[Top]
[Bottom]
[Home]

As we mentionned above, the algorithmic technique we shall use is the plane
sweep method. This means that we will be sweeping a vertical line across
the set of points, keeping track of certain data, and performing certain
actions every time a point is encountered during the plane sweep.

As we sweep the line, we will maintain the following data:
- The closest pair among the points encountered
- The distance
*d* between the points in the above pair
- All points within a strip of distance
*d* to the left of the
sweep line (see
figure 4.1). These
points will be stored in an ordered set *D* (sorted by their
*y*-coordinates), which we will implement as a (2,4)-tree (see
section 6 for a description of this data
structure).

**Figure 4.1:** Plane sweep technique for
closest pair. Sweep line shown in red.

Now, every time the sweep line encounters a point *p*, we will perform
the following actions:
- Remove the points further than
*d* to the left of *p* from
the ordered set *D* that is storing the points in the strip.
- Determine the point on the left of
*p* that is closest to it
- If the distance between this point and
*p* is less than *d*
(the current minimum distance), then update the closest current pair and
*d*.

Steps 1 and 3 need no further elaboration, but step two is a little unclear.
Doesn't it imply a O(*n*^{2}) algorithm? After all we will need
to check 1+2+3+...+*n* points after sweeping the line all the way to
the end of the set if we perform this search in the most naive way! However,
those of you who have read section 3 will
notice that we had a similar problem then.

In fact, we don't need to search *all* the points to the left of
*p*. Obviously, those that are further than *d* from *p*
(i.e. those in the strip) don't need to be checked because they couldn't
possibly lead to a new closest pair.

Although this seems to restricts our search somewhat, it could still be
possible for all of the points to the left of *p* to lie within our
strip! So once again we must make the observation that points in the strip
that are more than *d* away from *p* in the *y* direction
also don't need to be considered.

Hence our strip is now constrained to the bounding box of size *d* *
2*d* shown in
figure 4.2, and since all
of the points inside the strip must be *at least d* apart there can be
at most 6 of them inside this box (click here for the proof).
Therefore we only need to do a constant number of comparisons to accomplish
step 2!

**Figure 4.2:** Restriction of search to
small bounding box shown in light blue.

Therefore, the the summary and analysis of the plane sweep algorithm goes
something like this:
- sort the set according to
*x*-coordinates (this is necessary for
a plane sweep approach to determine the coordinates of the next point),
which takes O(*n*log*n*) time
- inserting and removing each point once from the ordered set
*D*
(each insertion takes O(log*n*) time as shown in the section on
(2,4)-trees), which takes a total of O(*n*log*n*) time
- comparing each point to a constant number of its neighbors (which takes
O(
*n*) time in total

Hence the whole algorithm takes O(*n*log*n*) time. However, one
thing that is not completely obvious about this algorithm is exactly why it
works. Hence, to avoid any confusion, I've included a proof of correctness
in the following section.

[Top]
[Bottom]
[Home]

Let {*p*_{1},...,*p*_{n}} be the set of
input points sorted by their *x*-coordinates. When the sweep line
hits *p*_{2}, then the pair
(*p*_{1},*p*_{2}) will be the current closest
pair with distance *d*=dist(*p*_{1},*p*_{2}).
Furthermore, we know that if *p*_{1} is one of the points that
makes up the closest pair for the whole set, then the other point
*must* be *p*_{2}, since no other points in the set are
closer to *p*_{1}.

Let *D* be the strip of width *d* just to the left of the sweep
line (where *d* is the current minimum distance). We will now prove
that when the sweep completes processing any intermediate point
*p*_{i}, then the current *d* is the minimum distance
between all of the
points *p*_{1},...,*p*_{i}. We will
prove this by assuming its opposite.

Suppose there are two points *p* and *q* to the left of
*p*_{i} such that dist(*p*,*q*) is less than
*d* that the algorithm missed, and assume WLOG that *q* is
further to the right (has a greater *x*-coordinate) than *p* (see
figure 4.3), then
consider what occured when the sweep line was at *q*. At this time
the strip *D* was of width at least *d* (since it takes a
smaller distance to change it).There are two cases for the position of
*p*:
**[CASE 1]: ***p* is in the strip
- in this case,
*p* would have been within the bounding box checked
by the algorithm (since it is within less than *d* from *q* as
per our assumption), and hence would not have been missed.
**[CASE 2]: ***p* is not in the strip
- in this case
*p* must be to the left of the strip (since we
assumed that it is to the left of *q*, but this would violate our
assumption that *p* is less than *d* away from *q*.

Hence our assumption must be incorrect, there is no pair of points to the
left of the sweep line less than *d* apart from eachother.
**Figure 4.3**

Now, if we include the current point *p*_{i}, the algorithm
will check and see if there is a point to the left of the sweep line within
*d* of it and update accordingly. Hence when the sweep completes
processing at a given point *p*_{i}, then *d* is the
distance between the closest pair among the points
*p*_{1},...,*p*_{i}. Applying this result
to the last point in the list shows that the algorithm is correct.

[Top]
[Bottom]
[Home]

Sam Bakhtiar SANJABI
Last modified: Wed Apr 12 01:09:29 EDT 2000