1. Balanced Search Trees


(a) I'm not going to write out every step, and I apologize for my unevocative ASCII art, but you should end up with a tree like this:

      ___10 22____
     /     |      \
1 2 5   12 16 18   30 45 50

(b) One ordering which produces a different tree is to insert the numbers in increasing order, (1, 2, 5, ..., 50), producing the following tree:

    _____5 16 30_____
   /      |  \       \
1 2   10 12   18 22   45 50

(c) By the depth property of a (2,4)-tree, all external nodes lie at the same depth, namely the height h of the tree. The number of external nodes in T is minimized when every internal node has 2 children (the minimum number of children per node by the size property), and maximized when every internal node has 4 children (the maximum number of children per node by the size property).
The minimum and maximum numbers of external nodes in a (2,4)-tree of height h are hence $2^h$ and $4^h$ respectively.
Because T stores n items, it has exactly n+1 external nodes. This yields the following inequalities:
$2^h \leq n+1 \leq 4^h$
Taking base-2 logarithms:
$h \leq \log(n+1) \leq 2h$
The first inequality gives us $h \leq \log(n+1)$, and the second yields $h \geq \frac{1}{2} \log(n+1)$, hence $\frac{1}{2} \log(n+1) \leq h \leq \log(n+1)$

Therefore, h is $\Theta(\log n)$.
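The bounds above are easy to check numerically. Here is a small sketch (the function name is mine, not part of the problem): a height-h tree has between $2^h$ and $4^h$ external nodes, and every count in that range puts h inside the derived bounds.

```python
import math

def height_bounds(n):
    """Bounds on the height h of a (2,4)-tree storing n items:
    (1/2) log2(n+1) <= h <= log2(n+1)."""
    return 0.5 * math.log2(n + 1), math.log2(n + 1)

# A height-h tree has between 2^h and 4^h external nodes (= n+1),
# so at both extremes the height must fall inside the bounds:
for h in range(1, 11):
    for ext in (2 ** h, 4 ** h):   # fewest / most external nodes at height h
        lo, hi = height_bounds(ext - 1)
        assert lo <= h <= hi
```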


2. Element Uniqueness


Designing the algorithm is easy. Find $s_{max}$, the largest number in S. This takes a single pass through the elements: O(n). We can then return $s_{max} + 1$, a number larger than every element of S, which therefore cannot itself be in S.
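As a one-line sketch (the function name is mine):

```python
def number_not_in(S):
    """Return a number guaranteed not to be in the non-empty list S.

    One pass to find the maximum: O(n).  max(S) + 1 exceeds every
    element of S, so it cannot itself be an element of S.
    """
    return max(S) + 1
```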

The proof is also pretty easy -- if you understand what you're supposed to do. Proof by contradiction. Suppose there exists a correct algorithm which runs in $o(n)$ time, i.e. faster than $\Omega(n)$, and produces a number guaranteed not to be in S. In particular, this algorithm must perform fewer than $\frac{n}{2}$ comparisons in the worst case, since performing that many comparisons already takes $\Omega(n)$ time. But each comparison involves at most two elements of S. So in performing fewer than $\frac{n}{2}$ comparisons, the algorithm examines fewer than n of the elements. Thus, there must be at least one element, $s_i$, which it never compares against anything at all. But then the algorithm cannot guarantee that the number it produces is not equal to $s_i$: an adversary could set $s_i$ equal to the algorithm's output without changing the outcome of any comparison the algorithm performed. Thus no correct algorithm can run faster than $\Omega(n)$ in the worst case.


3. Non-collinearity of Points


This was a tough one.

The idea is to first calculate the maximum (finite) slope, $m_{max}$, of any line through two points in S. Once we have $m_{max}$, we can construct a point X such that, for any point p in S, the line $\overline{pX}$ has slope $> m_{max}$. This guarantees that X is not collinear with any pair of points in S.

Calculating $m_{max}$. This is really the only tricky part. Naively, we would consider every pair of points in S, and find the pair which defines the line with the greatest slope. But this of course would take $\Omega(n^2)$ time: too slow. To speed this up, we use a clever observation: if we sort the points by x-coordinate, $m_{max}$ must be defined by a pair of adjacent points. Why? Suppose $m_{max}$ is defined by $p_i$ and $p_j$, and some other point $p_{i+1}$ has an x-coordinate somewhere between $x_i$ and $x_j$. Now, suppose $p_{i+1}$ lies on the line $\overline{p_i p_j}$; then $p_i$ and $p_{i+1}$ define $m_{max}$, as required. On the other hand, suppose $p_{i+1}$ doesn't lie on $\overline{p_i p_j}$. In this case it's not hard to see that either $\overline{p_i p_{i+1}}$ or $\overline{p_{i+1} p_j}$ must have greater slope than $\overline{p_i p_j}$ -- contradicting the assumption that $p_i$ and $p_j$ defined $m_{max}$.

It follows that $m_{max}$ must be defined by a pair of points with adjacent x-coordinates. So, we can sort the points by x-coordinate ($O(n \log n)$), and do a linear scan to find the adjacent pair of points which defines $m_{max}$, in total time $O(n \log n)$.
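A sketch of the $m_{max}$ computation (the function name is mine; it assumes all x-coordinates are distinct, so every slope is finite):

```python
def max_slope(points):
    """Maximum slope of any line through two points, assuming distinct
    x-coordinates.  After sorting by x-coordinate, the maximum slope is
    attained by some pair of x-adjacent points, so a single linear scan
    suffices: O(n log n) total.
    """
    pts = sorted(points)                       # sort by x-coordinate
    return max((y2 - y1) / (x2 - x1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

Note that, e.g., the pair (0,0), (2,6) below has slope 3, but the adjacent pair (0,0), (1,5) beats it, exactly as the observation predicts.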

Constructing X. This is easier. Do linear scans through the points in S to find $x_{min}$, $x_{max}$, $y_{min}$, and $y_{max}$. This gives us the bounding box of the n points. Call the upper-left corner of this bounding box $c = (x_{min}, y_{max})$. Draw a line L through c with slope $m_{max}$. Now, let u be the point on L with x-coordinate $x_{max} + 1$. Finally, let X be the point $(x_u, y_u + 1)$.

Proving X is non-collinear. For any point p in S, consider the slope $m_{p,X}$ of the line $\overline{pX}$. Since c is at the upper left of all the points, and X lies above and to the right of all the points, $m_{p,X} \geq m_{c,X}$. Also, since X lies directly above u, $m_{c,X} > m_{c,u}$. But we constructed u so that $m_{c,u} = m_{max}$. Therefore, $m_{p,X} > m_{max}$, as desired. And, since no line through a pair of points in S has slope $> m_{max}$, it follows that no pair of points in S is collinear with X.
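Putting the two steps together, here is a self-contained sketch of the whole construction (names are mine; again it assumes distinct x-coordinates so all slopes are finite):

```python
def non_collinear_point(points):
    """Return a point X collinear with no pair of points in `points`.

    m_max is found from x-adjacent pairs after sorting; X is placed one
    unit above the line L of slope m_max through the bounding box's
    upper-left corner c, just past the rightmost x-coordinate.
    """
    pts = sorted(points)
    m_max = max((y2 - y1) / (x2 - x1)
                for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    x_min, x_max = pts[0][0], pts[-1][0]
    y_max = max(y for _, y in points)
    c = (x_min, y_max)                 # upper-left corner of the bounding box
    xu = x_max + 1                     # u is the point on L at x = x_max + 1
    yu = c[1] + m_max * (xu - c[0])
    return (xu, yu + 1)                # X lies strictly above L
```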


4. Maximal Points of a Set


This took some creativity, but really wasn't hard once you understood what a ``maximal'' point was. Our algorithm must find all points p such that no other point q in S dominates p (i.e., satisfies $x_q > x_p$ and $y_q > y_p$). (Note that q only dominates p if both its x- and y-coordinates are greater than p's, not if only one coordinate is greater! In other words, p is maximal iff the quadrant above it and to its right is empty. Many students were confused about this.) The complexity constraint is again $O(n \log n)$, so presorting is an option.

The idea is to sort the points by x-coordinate and go through them from right to left, keeping track as we go of $y_{max}$, the greatest y-coordinate seen so far. When considering a point p, there are two cases: either $y_{max} \leq y_p$, or $y_{max} > y_p$. If $y_{max} \leq y_p$, no point to the right of p also lies above it. Thus no point dominates p, and therefore p is a maximal point. On the other hand, if $y_{max} > y_p$, there exists some point already seen (i.e., to the right of p) with greater y-coordinate. So this point dominates p, and p is not a maximal point. (There is a small subtlety here in the case where multiple points have the same x-coordinate -- such a point never dominates p, since domination requires a strictly greater x-coordinate -- but it's not hard to deal with by presorting the points appropriately, e.g. breaking x-ties by decreasing y.)

In short, by presorting and then traversing the points once from right to left, we can conclude whether each point p is maximal as we traverse it. The running time for the whole algorithm is $O(n \log n)$ for the presorting + $O(n)$ for the linear traversal = $O(n \log n)$. (Of course, by symmetry of x and y, you can accomplish the same thing by sorting by y-coordinate instead and traversing while keeping track of $x_{max}$.)


Sylvain Bouix
2000-10-24