4 Closest Pair: A Plane Sweep Algorithm



4.1 Introduction

In section 3, we explored an algorithm for determining the closest pair of a set of points in the plane. We used a divide-and-conquer approach which we generalized from one-dimension in order to solve the problem. Here we will discuss another O(nlogn) that attacks the problem in a different way.
Once again, we are given a set S of n points in the plane, but this time we shall attempt to use the plane sweep technique. We sweep a vertical line across the set from left to right keeping track of the closest pair seen so far. We shall describe an O(nlogn) algorithm, and then examine why it works.

[Top] [Bottom] [Home]

4.2 Algorithm and Performance

As we mentionned above, the algorithmic technique we shall use is the plane sweep method. This means that we will be sweeping a vertical line across the set of points, keeping track of certain data, and performing certain actions every time a point is encountered during the plane sweep.
As we sweep the line, we will maintain the following data:
Figure 4.1: Plane sweep technique for closest pair. Sweep line shown in red.

Now, every time the sweep line encounters a point p, we will perform the following actions:
  1. Remove the points further than d to the left of p from the ordered set D that is storing the points in the strip.
  2. Determine the point on the left of p that is closest to it
  3. If the distance between this point and p is less than d (the current minimum distance), then update the closest current pair and d.
Steps 1 and 3 need no further elaboration, but step two is a little unclear. Doesn't it imply a O(n2) algorithm? After all we will need to check 1+2+3+...+n points after sweeping the line all the way to the end of the set if we perform this search in the most naive way! However, those of you who have read section 3 will notice that we had a similar problem then.
In fact, we don't need to search all the points to the left of p. Obviously, those that are further than d from p (i.e. those in the strip) don't need to be checked because they couldn't possibly lead to a new closest pair.
Although this seems to restricts our search somewhat, it could still be possible for all of the points to the left of p to lie within our strip! So once again we must make the observation that points in the strip that are more than d away from p in the y direction also don't need to be considered.
Hence our strip is now constrained to the bounding box of size d * 2d shown in figure 4.2, and since all of the points inside the strip must be at least d apart there can be at most 6 of them inside this box (click here for the proof). Therefore we only need to do a constant number of comparisons to accomplish step 2!
Figure 4.2: Restriction of search to small bounding box shown in light blue.

Therefore, the the summary and analysis of the plane sweep algorithm goes something like this: Hence the whole algorithm takes O(nlogn) time. However, one thing that is not completely obvious about this algorithm is exactly why it works. Hence, to avoid any confusion, I've included a proof of correctness in the following section.

[Top] [Bottom] [Home]

4.2 Proof of Correctness

Let {p1,...,pn} be the set of input points sorted by their x-coordinates. When the sweep line hits p2, then the pair (p1,p2) will be the current closest pair with distance d=dist(p1,p2). Furthermore, we know that if p1 is one of the points that makes up the closest pair for the whole set, then the other point must be p2, since no other points in the set are closer to p1.
Let D be the strip of width d just to the left of the sweep line (where d is the current minimum distance). We will now prove that when the sweep completes processing any intermediate point pi, then the current d is the minimum distance between all of the points p1,...,pi. We will prove this by assuming its opposite.

Suppose there are two points p and q to the left of pi such that dist(p,q) is less than d that the algorithm missed, and assume WLOG that q is further to the right (has a greater x-coordinate) than p (see figure 4.3), then consider what occured when the sweep line was at q. At this time the strip D was of width at least d (since it takes a smaller distance to change it).There are two cases for the position of p:
[CASE 1]: p is in the strip
in this case, p would have been within the bounding box checked by the algorithm (since it is within less than d from q as per our assumption), and hence would not have been missed.
[CASE 2]: p is not in the strip
in this case p must be to the left of the strip (since we assumed that it is to the left of q, but this would violate our assumption that p is less than d away from q.
Hence our assumption must be incorrect, there is no pair of points to the left of the sweep line less than d apart from eachother.
Figure 4.3

Now, if we include the current point pi, the algorithm will check and see if there is a point to the left of the sweep line within d of it and update accordingly. Hence when the sweep completes processing at a given point pi, then d is the distance between the closest pair among the points p1,...,pi. Applying this result to the last point in the list shows that the algorithm is correct.

[Top] [Bottom] [Home]
Sam Bakhtiar SANJABI
Last modified: Wed Apr 12 01:09:29 EDT 2000