3 Closest Pair: A Divide-and-Conquer Approach



3.1 Introduction

The brute-force approach to the closest pair problem (i.e., checking every possible pair of points) takes quadratic time. We would now like to introduce a faster divide-and-conquer algorithm for solving the closest pair problem.
Given a set of points S in the plane, our approach will be to split the set into two roughly equal halves (S1 and S2), solve each half recursively, and then merge the two sub-solutions in linear time, which would yield an O(n log n) algorithm.
However, the merge step is far from obvious. It is possible that the desired pair has one point in S1 and one in S2; does this not force us once again to check all possible pairs of points? The divide-and-conquer approach presented here generalizes directly from the one-dimensional algorithm we presented in the previous section.
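For reference, here is a minimal sketch of the quadratic brute-force baseline in Python, assuming points are represented as (x, y) tuples; the name `brute_force_closest` is hypothetical. It checks all n(n-1)/2 pairs, which is exactly the work the divide-and-conquer algorithm aims to avoid.

```python
import math
from itertools import combinations

# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def brute_force_closest(points: list[Point]) -> float:
    """Return the smallest pairwise distance by checking every pair: O(n^2) time."""
    best = math.inf  # fewer than two points means there is no pair at all
    for p, q in combinations(points, 2):
        best = min(best, math.dist(p, q))
    return best
```

For example, `brute_force_closest([(0, 0), (3, 4), (1, 1)])` returns about 1.414 (the distance from (0, 0) to (1, 1)).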


3.2 Closest Pair in the Plane

Alright, we'll generalize our 1-D algorithm as directly as possible (see figure 3.2). Given a set of points S in the plane, we partition it into two subsets S1 and S2 by a vertical line l such that the points in S1 are to the left of l and those in S2 are to the right of l.
We now recursively solve the problem on these two sets, obtaining minimum distances d1 (for S1) and d2 (for S2). We let d be the minimum of these.
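A minimal sketch of the division step, assuming the points are (x, y) tuples already sorted by x-coordinate; the name `split_by_vertical_line` is hypothetical. The line l is placed at the x-coordinate of the median point, and the halves are simply the two ends of the sorted list.

```python
# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def split_by_vertical_line(points_by_x: list[Point]) -> tuple[list[Point], list[Point], float]:
    """Split x-sorted points (at least one) into S1, S2 and return the x-coordinate of l."""
    mid = len(points_by_x) // 2
    s1 = points_by_x[:mid]   # points on the left of l
    s2 = points_by_x[mid:]   # points on the right of l
    # l passes through the first point of S2; points sharing that x-coordinate
    # may fall on either side, which the strip test tolerates.
    x_l = points_by_x[mid][0]
    return s1, s2, x_l
```

After recursing on `s1` and `s2` to obtain d1 and d2, the merge step starts from `d = min(d1, d2)`.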

Now, identical to the 1-D case, if the closest pair of the whole set consists of one point from each subset, then these two points must be within d of l. This area is represented as the two strips P1 and P2 on either side of l (as shown in the figure).
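Selecting the strips P1 and P2 amounts to a single filter on the x-coordinate. A minimal sketch, assuming (x, y) tuples and that l is the vertical line x = `x_l` (the name `strip_points` is hypothetical):

```python
# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def strip_points(points: list[Point], x_l: float, d: float) -> list[Point]:
    """Keep only points strictly closer than d to the line x = x_l (the union of P1 and P2)."""
    return [p for p in points if abs(p[0] - x_l) < d]
```

The filter preserves the input order, so if the points arrive sorted by y, the strip comes out sorted by y as well.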
Figure 3.2: Divide-and-conquer in two dimensions.

Up to now, we are completely in step with the 1-D case. At this point, however, the extra dimension causes some problems. We wish to determine whether some point in, say, P1 is less than d away from some point in P2. However, in the plane, we don't have the luxury that we had on the line, where we observed that only one point in each set can be within d of the median. In fact, in two dimensions, all of the points could be in the strip! This is disastrous, because we would have to compare on the order of n² pairs of points to merge the sets, and hence our divide-and-conquer algorithm wouldn't save us anything in terms of efficiency.
Thankfully, we can make another life-saving observation at this point. For any particular point p in one strip, only points in the other strip that meet the following constraints need to be checked:
  • those points within d of p in the direction of the other strip
  • those within d of p in the positive and negative y directions
This is simply because points outside this d × 2d box cannot be less than d units from p (see figure 3.3). It just so happens that, because the points of the other strip all come from the same half and are therefore at least d apart from one another, at most six of them can lie within the box (I won't let myself get away with that scot-free; a proof is given separately). Well, this is simply fantastic news, because now we don't need to check all n² pairs. All we have to do is sort the points in the strip by their y-coordinates and scan them in order, checking each point against at most six neighbors from the opposite strip. This means at most 6n comparisons are required to check all candidate pairs.
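Here is a sketch of the strip scan in Python, assuming (x, y) tuples and a strip already sorted by y; the name `scan_strip` is hypothetical. Rather than counting exactly six neighbors from the opposite strip, this common variant compares each point with the following points until the vertical gap reaches the current best distance. The packing argument above is what keeps that inner loop to a constant number of iterations. Points from the same strip get compared too, which is harmless: they are already at least d apart, so they can never lower the result.

```python
import math

# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def scan_strip(strip_by_y: list[Point], d: float) -> float:
    """Return min(d, closest distance among strip points); strip must be sorted by y."""
    best = d
    n = len(strip_by_y)
    for i in range(n):
        p = strip_by_y[i]
        j = i + 1
        # Only points less than `best` above p can beat the current best;
        # by the packing argument there are only a constant number of them.
        while j < n and strip_by_y[j][1] - p[1] < best:
            best = min(best, math.dist(p, strip_by_y[j]))
            j += 1
    return best
```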
However, because we sort the points in the strip by their y-coordinates, the merge step is not linear; it takes O(n log n) time.
Hence our full algorithm is not yet O(n log n), but it is still an improvement on the quadratic performance of the brute-force approach (as we shall see in the next section). In section 3.4, we will show how to make this algorithm even more efficient by strengthening our recursive sub-solution.
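To see why an O(n log n) merge keeps the algorithm above O(n log n), one can unroll the recurrence. This is a sketch assuming n is a power of two and that the merge at a subproblem of size m costs at most c·m log m for some constant c:

```latex
\begin{aligned}
T(n) &\le 2\,T(n/2) + c\,n\log_2 n, \qquad T(1) = O(1) \\
T(n) &\le \sum_{i=0}^{\log_2 n - 1} 2^i \cdot c\,\frac{n}{2^i}\log_2\frac{n}{2^i} \;+\; n\,T(1) \\
     &\le \sum_{i=0}^{\log_2 n - 1} c\,n\log_2 n \;+\; O(n)
      \;=\; c\,n\log_2^2 n + O(n) \;=\; O(n\log^2 n).
\end{aligned}
```

Each of the log₂ n levels of recursion contributes at most c·n log₂ n, which is where the extra logarithmic factor comes from.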


3.3 Summary and Analysis of the 2-D Algorithm

We present here a step-by-step summary of the algorithm presented in the previous section, followed by a performance analysis. The algorithm is simply written in list form because I find pseudo-code burdensome and unnecessary when trying to understand an algorithm. Note that we pre-sort the points according to their x-coordinates, which in itself takes O(n log n) time.

ClosestPair of a set of points:
  1. Divide the set into two equal-sized parts by the line l, and recursively compute the minimal distance in each part.
  2. Let d be the smaller of the two minimal distances.
  3. Eliminate points that lie farther than d from l.
  4. Sort the remaining points according to their y-coordinates.
  5. Scan the remaining points in y order and compute the distance from each point to the points of the opposite strip within d of it vertically (at most six of them).
  6. If any of these distances is less than d, then update d.
Steps 2-6 define the merging process. Its cost is dominated by the sorting in Step 4, so merging takes O(n log n) time. Because this is a divide-and-conquer algorithm, the merging must be repeated once for each of the log n levels of recursion (the subproblems at each level contain n points in total), hence the whole of algorithm ClosestPair takes O(log n · n log n) = O(n log² n) time.
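The steps above can be sketched in Python as follows, assuming points are (x, y) tuples. The names `closest_pair` and `_closest` are hypothetical, and the base case (at most three points, solved by brute force) is our assumption, since the step list leaves it implicit. Steps 5-6 use the scan-until-the-vertical-gap-reaches-d variant rather than a fixed count of neighbors.

```python
import math
from itertools import combinations

# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def closest_pair(points: list[Point]) -> float:
    """Closest-pair distance following Steps 1-6; O(n log^2 n) overall."""
    # Pre-sort by x-coordinate once: O(n log n).
    return _closest(sorted(points))

def _closest(px: list[Point]) -> float:
    n = len(px)
    if n <= 3:
        # Assumed base case: brute force (infinity if there is no pair).
        return min((math.dist(p, q) for p, q in combinations(px, 2)), default=math.inf)
    # Step 1: divide by the vertical line l and recurse on each half.
    mid = n // 2
    x_l = px[mid][0]
    d1 = _closest(px[:mid])
    d2 = _closest(px[mid:])
    # Step 2: d is the smaller of the two minimal distances.
    d = min(d1, d2)
    # Step 3: drop points farther than d from l.
    strip = [p for p in px if abs(p[0] - x_l) < d]
    # Step 4: sort the strip by y-coordinate; this costs O(n log n) at every level.
    strip.sort(key=lambda p: p[1])
    # Steps 5-6: compare each point with the following points less than d above it.
    for i in range(len(strip)):
        j = i + 1
        while j < len(strip) and strip[j][1] - strip[i][1] < d:
            d = min(d, math.dist(strip[i], strip[j]))
            j += 1
    return d
```

For example, `closest_pair([(0, 0), (5, 5), (1, 1), (9, 9), (5, 6)])` returns `1.0`, the distance between (5, 5) and (5, 6).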


3.4 Improving the Algorithm

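We can improve on this algorithm by a logarithmic factor by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by asking that the recursive solution computed in Step 1 also return its points sorted by their y-coordinates. This yields two sorted lists of points which need only be merged (a linear-time operation) in Step 4 to produce a complete sorted list. Hence the revised algorithm involves making the following changes:
  • Step 1: each recursive call returns, along with its minimal distance, its points sorted by y-coordinate.
  • Step 4: merge the two y-sorted lists from Step 1 instead of sorting from scratch (eliminating the points farther than d from l preserves this y order).
Hence the merging process is now dominated by linear-time steps, giving the recurrence T(n) = 2T(n/2) + O(n) and thereby an O(n log n) algorithm for finding the closest pair of a set of points in the plane (the initial sort by x-coordinate also takes O(n log n) time, but is done only once).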
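A sketch of the revised algorithm in Python, assuming (x, y) tuples; `closest_pair_fast` and `_closest_sorted` are hypothetical names. Each recursive call returns both its minimal distance and its points sorted by y, and the standard-library `heapq.merge` (with its `key` argument) combines the two y-sorted halves in linear time, so the per-level work is O(n) and the recurrence T(n) = 2T(n/2) + O(n) gives O(n log n).

```python
import math
from heapq import merge

# Hypothetical representation: a point is an (x, y) tuple.
Point = tuple[float, float]

def closest_pair_fast(points: list[Point]) -> float:
    """Closest-pair distance in O(n log n): the recursion also returns points sorted by y."""
    d, _ = _closest_sorted(sorted(points))  # pre-sort by x-coordinate once
    return d

def _closest_sorted(px: list[Point]) -> tuple[float, list[Point]]:
    """Given x-sorted points, return (closest distance, the same points sorted by y)."""
    n = len(px)
    if n <= 1:
        return math.inf, list(px)
    # Revised Step 1: recurse, getting each half back sorted by y.
    mid = n // 2
    x_l = px[mid][0]
    d1, left_y = _closest_sorted(px[:mid])
    d2, right_y = _closest_sorted(px[mid:])
    d = min(d1, d2)
    # Revised Step 4: linear-time merge of the two y-sorted lists instead of a sort.
    py = list(merge(left_y, right_y, key=lambda p: p[1]))
    # Step 3 applied to the merged list: filtering keeps the y order.
    strip = [p for p in py if abs(p[0] - x_l) < d]
    # Steps 5-6: compare each point with the following points less than d above it.
    for i in range(len(strip)):
        j = i + 1
        while j < len(strip) and strip[j][1] - strip[i][1] < d:
            d = min(d, math.dist(strip[i], strip[j]))
            j += 1
    return d, py
```

Returning the full merged list `py` (not just the strip) is what lets the caller one level up merge instead of sort.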

Sam Bakhtiar SANJABI
Last modified: Wed Apr 12 00:22:19 EDT 2000