1. Balanced Search Trees


(a) I'm not going to write out every step, and I apologize for my unevocative ASCII art, but you should end up with a tree like this:

      ___10 22____
     /     |      \
1 2 5   12 16 18   30 45 50

(b) One ordering which produces a different tree is to insert the numbers in increasing order, (1, 2, 5, ..., 50), producing the following tree:

    _____5 16 30_____
   /      |  \       \
1 2   10 12   18 22   45 50

(c) By the depth property of a (2,4)-tree, all external nodes lie at the same depth, namely the height h of the tree. The number of external nodes in T is minimized when every internal node has 2 children (the minimum number of children per node by the size property), and maximized when every internal node has 4 children (the maximum number of children per node by the size property).
The minimum and maximum numbers of external nodes in a (2,4)-tree of height h are hence $2^h$ and $4^h$ respectively.
Because T stores n items, it has exactly n+1 external nodes. This yields the following inequalities:
$2^h \leq n+1 \leq 4^h$
Taking base-2 logarithms:
$h \leq \log(n+1) \leq 2h$
The first inequality gives us $h \leq \log(n+1)$, and the second yields $h \geq \frac{1}{2} \log(n+1)$, hence $\frac{1}{2} \log(n+1) \leq h \leq \log(n+1)$

Therefore, h is $\Theta(\log n)$.
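The bounds above are easy to check numerically. Here is a small sketch (the function name is mine, not part of the problem): a height-h tree has between $2^h$ and $4^h$ external nodes, and every count in that range puts h inside the derived bounds.

```python
import math

def height_bounds(n):
    """Bounds on the height h of a (2,4)-tree storing n items:
    (1/2) log2(n+1) <= h <= log2(n+1)."""
    return 0.5 * math.log2(n + 1), math.log2(n + 1)

# A height-h tree has between 2^h and 4^h external nodes (= n+1),
# so at both extremes the height must fall inside the bounds:
for h in range(1, 11):
    for ext in (2 ** h, 4 ** h):   # fewest / most external nodes at height h
        lo, hi = height_bounds(ext - 1)
        assert lo <= h <= hi
```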


2. Element Uniqueness


Designing the algorithm is easy. Find $s_{max}$, the largest number in S. This takes a single pass through the elements: O(n). We can then return $s_{max} + 1$, a number larger than every element of S, which therefore cannot itself be in S.
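As a one-line sketch (the function name is mine):

```python
def number_not_in(S):
    """Return a number guaranteed not to be in the non-empty list S.

    One pass to find the maximum: O(n).  max(S) + 1 exceeds every
    element of S, so it cannot itself be an element of S.
    """
    return max(S) + 1
```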

The proof is also pretty easy -- if you understand what you're supposed to do. Proof by contradiction. Suppose there exists a correct algorithm which runs in $o(n)$ time, i.e. faster than $\Omega(n)$, and produces a number guaranteed not to be in S. In particular, this algorithm must perform fewer than $\frac{n}{2}$ comparisons in the worst case, since performing that many comparisons already takes $\Omega(n)$ time. But each comparison involves at most two elements of S. So in performing fewer than $\frac{n}{2}$ comparisons, the algorithm examines fewer than n of the elements. Thus, there must be at least one element, $s_i$, which it never compares against anything at all. But then the algorithm cannot guarantee that the number it produces is not equal to $s_i$: an adversary could set $s_i$ equal to the algorithm's output without changing the outcome of any comparison the algorithm performed. Thus no correct algorithm can run faster than $\Omega(n)$ in the worst case.


3. Non-collinearity of Points


This was a tough one.

The idea is to first calculate the maximum (finite) slope, $m_{max}$, of any line through two points in S. Once we have $m_{max}$, we can construct a point X such that, for any point p in S, the line $\overline{pX}$ has slope $> m_{max}$. This guarantees that X is not collinear with any pair of points in S.

Calculating $m_{max}$. This is really the only tricky part. Naively, we would consider every pair of points in S, and find the pair which defines the line with the greatest slope. But this of course would take $\Omega(n^2)$ time: too slow. To speed this up, we use a clever observation: if we sort the points by x-coordinate, $m_{max}$ must be defined by a pair of adjacent points. Why? Suppose $m_{max}$ is defined by $p_i$ and $p_j$, and some other point $p_{i+1}$ has an x-coordinate somewhere between $x_i$ and $x_j$. Now, suppose $p_{i+1}$ lies on the line $\overline{p_i p_j}$; then $p_i$ and $p_{i+1}$ define $m_{max}$, as required. On the other hand, suppose $p_{i+1}$ doesn't lie on $\overline{p_i p_j}$. In this case it's not hard to see that either $\overline{p_i p_{i+1}}$ or $\overline{p_{i+1} p_j}$ must have greater slope than $\overline{p_i p_j}$ -- contradicting the assumption that $p_i$ and $p_j$ defined $m_{max}$.

It follows that $m_{max}$ must be defined by a pair of points with adjacent x-coordinates. So, we can sort the points by x-coordinate ($O(n \log n)$), and do a linear scan to find the adjacent pair of points which defines $m_{max}$, in total time $O(n \log n)$.
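A sketch of the $m_{max}$ computation (the function name is mine; it assumes all x-coordinates are distinct, so every slope is finite):

```python
def max_slope(points):
    """Maximum slope of any line through two points, assuming distinct
    x-coordinates.  After sorting by x-coordinate, the maximum slope is
    attained by some pair of x-adjacent points, so a single linear scan
    suffices: O(n log n) total.
    """
    pts = sorted(points)                       # sort by x-coordinate
    return max((y2 - y1) / (x2 - x1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

Note that, e.g., the pair (0,0), (2,6) below has slope 3, but the adjacent pair (0,0), (1,5) beats it, exactly as the observation predicts.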

Constructing X. This is easier. Do linear scans through the points in S to find $x_{min}$, $x_{max}$, $y_{min}$, and $y_{max}$. This gives us the bounding box of the n points. Call the upper-left corner of this bounding box $c = (x_{min}, y_{max})$. Draw a line L through c with slope $m_{max}$. Now, let u be the point on L with x-coordinate $x_{max} + 1$. Finally, let X be the point $(x_u, y_u + 1)$.

Proving X is non-collinear. For any point p in S, consider the slope $m_{p,X}$ of the line $\overline{pX}$. Since c is at the upper left of all the points, and X lies above and to the right of all the points, $m_{p,X} \geq m_{c,X}$. Also, since X lies directly above u, $m_{c,X} > m_{c,u}$. But we constructed u so that $m_{c,u} = m_{max}$. Therefore, $m_{p,X} > m_{max}$, as desired. And, since no line through a pair of points in S has slope $> m_{max}$, it follows that no pair of points in S is collinear with X.
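Putting the two steps together, here is a self-contained sketch of the whole construction (names are mine; again it assumes distinct x-coordinates so all slopes are finite):

```python
def non_collinear_point(points):
    """Return a point X collinear with no pair of points in `points`.

    m_max is found from x-adjacent pairs after sorting; X is placed one
    unit above the line L of slope m_max through the bounding box's
    upper-left corner c, just past the rightmost x-coordinate.
    """
    pts = sorted(points)
    m_max = max((y2 - y1) / (x2 - x1)
                for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    x_min, x_max = pts[0][0], pts[-1][0]
    y_max = max(y for _, y in points)
    c = (x_min, y_max)                 # upper-left corner of the bounding box
    xu = x_max + 1                     # u is the point on L at x = x_max + 1
    yu = c[1] + m_max * (xu - c[0])
    return (xu, yu + 1)                # X lies strictly above L
```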


4. Maximal Points of a Set


This took some creativity, but really wasn't hard once you understood what a ``maximal'' point was. Our algorithm must find all points p such that no other point q in S dominates p (i.e., satisfies $x_q > x_p$ and $y_q > y_p$). (Note that q only dominates p if both its x- and y-coordinates are greater than p's, not if only one coordinate is greater! In other words, p is maximal iff the quadrant above it and to its right is empty. Many students were confused about this.) The complexity constraint is again $O(n \log n)$, so presorting is an option.

The idea is to sort the points by x-coordinate and go through them from right to left, keeping track as we go of $y_{max}$, the greatest y-coordinate seen so far. When considering a point p, there are two cases: either $y_{max} \leq y_p$, or $y_{max} > y_p$. If $y_{max} \leq y_p$, no point to the right of p also lies above it. Thus no point dominates p, and therefore p is a maximal point. On the other hand, if $y_{max} > y_p$, there exists some point already seen (i.e., to the right of p) with greater y-coordinate. So this point dominates p, and p is not a maximal point. (There is a small subtlety here in the case where multiple points have the same x-coordinate -- such a point never dominates p, since domination requires a strictly greater x-coordinate -- but it's not hard to deal with by presorting the points appropriately, e.g. breaking x-ties by decreasing y.)

In short, by presorting and then traversing the points once from right to left, we can conclude whether each point p is maximal as we traverse it. The running time for the whole algorithm is $O(n \log n)$ for the presorting + $O(n)$ for the linear traversal = $O(n \log n)$. (Of course, by symmetry of x and y, you can accomplish the same thing by sorting by y-coordinate instead and traversing while keeping track of $x_{max}$.)


Sylvain Bouix
2000-10-24