McGill University:
School of Computer Science
Winter 1999 Projects for 308-251B

DATA STRUCTURES AND ALGORITHMS

Project #16: FREE TREES, Enumeration and Representation of

Last update: February 25, 1999
Minor note: December 16, 2007


0.0 Contents

0.0 Contents
0.1 Overview
0.2 Initial Problem Outline
0.2.1 Summary of Topics
1.0 Introduction to Graphs and Trees
1.1 Graphs
1.2 Plain Trees
1.3 Binary Trees
1.4 Free Trees
2.0 Enumeration of Trees
2.1 Definition
2.2 Enumeration of Free Trees
3.0 Representation of Trees
3.1 Definition
3.2 Representation of Free Trees
4.0 Links to Related Materials
5.0 References
6.0 Author


0.1 Overview

This report addresses the enumeration and representation of free trees. As a prelude to its core there is a short introduction to graphs and plain trees, as to establish a common ground for the reader and author.
note:
References to particular books are made using the following format: [5.0.x]. This refers to section 5.0 of the report. x is the reference number of the particular entry in section 5.0.


0.2 Initial Problem Outline

As given by Professor Luc Devroye:
We saw [in class] how many binary trees there are with n nodes. Investigate how many free trees there are with labels 1, 2, ... , n, and consider the number of bits needed to describe a tree. For example, if I describe the tree by giving n-1 pairs (i,j) for the edges, then I will need about (n-1) x 2 log2n bits, as each i and j uses about log2n bits. One can do better. Describe how you could describe each tree with about half as many bits. Name-dropping hints: Prufer code, Cayley.


0.2.1 Summary of Topics


1.0 Introduction to Graphs and Trees

1.1 Graphs

A graph is generally defined to be a set of points called vertices, v, joined by edges, e. There is at most one edge between any pair of vertices. Two edges are adjacent if there is an edge between them. If v and v' are vertices and if n >= 0, we say that (v0, v1, ... , vn) is a path of length n from v to v' if v = v0, vk is adjacent to vk+1 for 0 <= k < n, and vn = v'. A path is simple if no edge on the path from v to v' is used more than once. A graph is connected if there is a path between any two vertices of the graph. A cycle is a simple path of length three or more from a vertex to itself. The degree of a vertex in a graph is the number of edges radiating from it.


1.2 Plain Trees

Trees are among the most important data structures in computer science. By definition of graph theory, trees are finite, labeled, rooted, and ordered. In general, a tree has a branching structure that defines a relationship among its nodes, via edges. Formally, a tree is recursively defined as a finite set T of one or more nodes such that one of the nodes is desginated as, and called the root of the tree, root(T); and the remaining nodes, excluding the root, are partitioned into m >= 0 disjoint sets T1, ..., Tm, called subtrees, each of which is in turn a tree.

Every node is the root of its subtree. The number of subtrees, or nodes below a particular node, is called the degree of the node. A node of degree zero is called a terminal node, or leaf. A nonterminal node is often called a branch node.

A set of one or more nodes can exist at a particular level or depth of T. The level of a node is defined as the number of edges between the root and that node. Thus the root has level 0, since there is no edge between the root and itself. each level following level 0 is one higher then the one before it.

If the relative order of the subtrees T1, ..., Tm is important, then we say the tree is an ordered tree; however if the order is unimportant, then we say the tree is an oriented tree.

Contrary to the similarities between nature's trees and digital trees, digital trees grow from top down. Thus the root node is at the apex of the tree, with all other nodes positioned at shallow and deep levels.

At a local perspective, each node is the parent of the roots of its subtrees, which are inturn called the children. The children that are born of the same parent are called siblings. Other self explanitory terms, relavent to a particular node are grand parent and grand children. The ancestors of a node are all those nodes in the path from that node to and including the root node. Similarly, the descendants of a node are all those nodes that are part of its subtree. The root of T has no parent, and likewise, a leaf node has no children. In general, most terminology of family trees can be used to discuss digital trees.

The fanout of a tree is an upper bounding number on then number of children any one parent may have.

Fig. 1
In this diagram, node A is the root of the tree. Nodes {E, J, C, G, H, I} are the leafs. Node A is the parent of Nodes B, C and D; likewise B is the parent of E and F, thus making E and F the children of B. Node E and F's grand parent is A. Nodes G, H, and I are siblings. The ancestors of Node H are A, D, and H. The descendants of Node B are E, F, and J.


1.3 Binary Trees

A special kind of tree is a binary tree which has a fanout of 2. This means that a node can have at least 0 children, and at most 2. The children of any node of a binary tree are distinguished as left and right children. It should be mentioned that unlike a plain tree, a binary tree can be empty, meaning that it has no nodes.


1.4 Free Trees

A free tree is an unrooted tree, defined to be a connected graph having no cycles. There is no fanout restriction on free trees. If any edge of a free tree is deleted, the free tree ceases to exist, as it is no longer connected. There is exactly one path between any two nodes, thus a free tree with n nodes has n - 1 edges.

A free subtree can be derived from a graph by deleting edges in an iterative fashion such that the resulting graph of each deletion is still connected. A free subtree has been derived when the deletion of any other remaining edge would result in an unconnected graph.


2.0 Enumeration Of Trees

2.1 Definition

The subject of graph enumeration is concerned with the problem of finding out how many non-isomorphic graphs there are which posses a given property. Enumeration of trees is concerned with counting how many different trees there are of various kinds on n vertices, where n is a natural number {1, 2, 3, ... , n}. The problem on the number of labelled trees with a given number of vertices was proposed by Cayley.

[5.0.4] In the context of this problem, a labelled graph on n vertices is essentially a graph in which the vertices are 'labelled' with the integers from 1 to n. We define labelling of a graph G on n vertices to be a one-to-one mapping from the vertex-set of G onto the set {1, 2, 3, ... , n}; a labelled graph is then a pair (G, X), where G is a graph and X is a labelling of G. The integers 1, 2, 3, ... , n are referred to as the lables of G. The two labelled graphs (G1, X1) and (G2, X2) are isomorphic if there exists an isomorphism between G1 and G2 which preserves the labelling of the vertices.

The following is a quick reminder of what isomorphism is. Fig. 2a, 2b, and 2c are various ways of labelling a tree with four vertices. The tree of Fig. 2b is simply the reverse of the first one, and it follows that these two labelled trees must be isomorphic; however, neither of them is isomorphic to the third labelled tree, as can be seen by looking at the degree of the vertex 3. Similarly, the total number of ways of labelling the tree shown in Fig. 2d must be four, since the central vertex may be labelled in four different ways, and each one determines the labelling.

Fig. 2a
Fig. 2b
Fig. 2c
Fig. 2d


2.2 Enumeration of Free Trees

We now show Cayley's theorem which generalizes the result from section 2.1 to labelled free trees with n vertices.

THEOREM 1 (Cayley [1889]). There are nn-2 distinst labelled free trees on n vertices.

Example 1. Before displaying the proof, we should first see an example of a listing (Table 1) of small trees for n <= 4.

n list of trees
1
2
3
4



Table 1

The first set of nontrivial free trees is that with three vertices. There is only one isomorphism class with three vertices, but the adjacency matrix is determined by which vertex is the center. With four vertices, there are two isomorphism classes, composed of the stars and the paths. //end example 1.

There are many proofs of Caley's Theorem, including

  • Prufer [5.0.{2, 3, 4, 5, 6, 7, 8, 9, 10}],
  • Kirchhoff [5.0.8],
  • Polya [5.0.{6, 8}],
  • Renyi [5.0.8],
  • Clarke [5.0.{4, 8, 10, 11}],
  • Dziobek [5.0.8],
  • Katz [5.0.8],
  • Gobel [5.0.8],
  • Moon [5.0.8].
  • Many other results are in the standard reference for labelled trees, John W. Moon's book [5.0.12], whose bibliography lists over 160 titles. The most elegant of all these proofs is Prufer's proof which is found in almost every graph theory text. Prufer verified Caley's Theorem by establishing a one-to-one correspondence between labelled free trees of order n and all sequences of n-2 positive integers from 1 to n.

    PROOF 1 (Prufer [1918])
    There are nn-2 sequences (called Prufer sequeces, or Prufer codes) of length n-2 with entries being from natural numbers; we establish a bijection between the set of trees and this set of sequences.

    To compute the Prufer sequence treeToSeq(T) for a labelled tree T, iteratively delete the leaf with smallest label and append the label of its neighbor to the sequence. After n-2 iterations a single edge remains and we have produced a sequence treeToSeq(T) of length n-2.

       treeToSeq( T ) {
          
          if one edge remains
             return( Prufer sequence );
          else {
             find vertex v of degree 1,
                that has the smallest label in T;
    
             record v's neighbor;
    
             delete v from T;
          }
          
       }
    

    That the algorithm defines a one-to-one correspondence is proved by an induction argument on n; however we will provide the inverse algorithm, which finds the unique labelled free tree T of order n for a given Prufer sequence of n-2 elements.

    (December 16, 2007) Note: It has been brought to my attention (by pwebster) that the following seqToTree algorithm may not produce the expected result. Please consult the following URLs for what pwebster suggests are correct algorithms.

       seqToTree( SEQ ) {
          
          determine n:  count the number of elements
                        in the Prufer sequence and add 2;
    
          make forest:  begin with a forest of n isolated vertices { 1, ..., n };
    
          end-vertex :  make end-vertex-list of all integers 1 to n,
                        that are missing from the Prufer sequence;
    
          record-edge:  p = the first integer in the Prufer sequence;
                        e = smallest label in end-vertex-list;
                        record edge of (p, e).
    
          delete     :  delete e from end-vertex-list;
    
          if  (two vertices remain)
               Record this last edge;
    
          else
               delete p from Prufer sequence;
    
          if  (p is repeated elsewhere in Prufer sequence)
               goto step: record-edge;
    
          else {
               add p to end-vertex-list
               then goto step: record-edge;
          }
          
       }
    

    Example 2. We shall now see an example of the use of treeToSeq( T ). We will start with a labelled free tree and derive the Prufer sequence by use of the mentioned algorithm. The free tree from which we will derive the Prufer code is:

    Fig. 3
    In the following table (Table 2), the leaf of least degree will be circled, while its neighbor will be colored pink.

    labelled free tree Prufer code
    { } 2 is the smallest label.
    7 is its neighbor.
    record 7.
    { 7 } 3 is the smallest label.
    4 is its neighbor.
    record 4.
    { 7, 4 } 5 is the smallest label.
    4 is its neighbor.
    record 4.
    { 7, 4, 4 } 4 is the smallest label.
    1 is its neighbor.
    record 1.
    { 7, 4, 4, 1 } 6 is the smallest label.
    7 is its neighbor.
    record 7.
    { 7, 4, 4, 1, 7 } 7 is the smallest label.
    1 is its neighbor.
    record 1.
    { 7, 4, 4, 1, 7, 1 } 1 edge remains
    (2 vertices).
    the Prufer code
    is complete.
    Table 2

    Example 3. We must also present an example of seqToTree( SEQ ). Naturally, the sequence will be { 7, 4, 4, 1, 7, 1 } as this will also provide an example of the one-to-one correspondence between free trees and Prufer sequences (codes). There is a new list called end-vertex-list which contains those verticies that were deleted to obtain the Prufer sequence.

    Prufer code end-vertex-list labelled free tree
    { 7, 4, 4, 1, 7, 1 } { 2, 3, 5, 6, 8 } The original lists and
    forest on n isolated
    vertices.
    { 7, 4, 4, 1, 7, 1 } { 2, 3, 5, 6, 8 } p=7, e=2; connect (7, 2).
    delete e. delete p.
    { 4, 4, 1, 7, 1 } { 3, 5, 6, 8 } p=4, e=3; connect (4, 3).
    delete e. delete p.
    { 4, 1, 7, 1 } { 5, 6, 8 } p=4, e=5; connect (4, 5).
    delete e. delete p.
    add p to end of
    end-vertex-list.
    { 1, 7, 1 } { 6, 8, 4 } p=1, e=4; connect (1, 4).
    delete e. delete p.
    { 7, 1 } { 6, 8 } p=7, e=8; connect (7, 6).
    delete e. delete p.
    add p to end of
    end-vertex-list.
    { 1 } { 8, 7 } p=1, e=7; connect (1, 7).
    delete e. delete p.
    add p to end of
    end-vertex-list.
    { } { 8, 1 } join remainding vertices.
    Table 3

    It can be clearly seen that with a slight reorientation of the edges and vertices, the constructed free tree of Table 3 is identical to the one we started with in Example 2.


    3.0 Representation of Trees

    3.1 Definition

    Analysis of the representation of trees is concerned with how many bits are required to describe a tree on n nodes.


    3.2 Representation of Free Trees

    As mentioned in the initial probelm outline above, given a tree on n nodes, whose edges are described by n-1 pairs (i, j), then (n-1) x 2 log2n bits will be required. (n-1) is the number of edges. The 2 infront of log2n accounts for both i and j. The log2n accounts for the number of bits needed to represent (in this case) a natural number.

    Since a labelled free tree on n nodes can be described by a sequence of (n-2) natural numbers (Prufer sequence) then a free tree can be described using n-2 x log2n. In this case the n from log2n arrives from the fact that there are n nodes, so the representation must account for the highest respective label in a given sequence.

    The latter method of representing a labelled free tree uses about half as many bits as the former.


    4.0 Links to related material

    There were no web sites devoted to the enumeration of labelled free trees. Over ten search engines were attempted using numerous key words. It turns out that there is far more literature on the topic in form of books. This seems like the first site on the web devoted to free trees.

    5.0 References

    note: The last entry of each reference includes a call number which corresponds to the
    Physical Sciences and Engineering Library, PSE at McGill University. It is possible to search all McGill libraries.
    1. Donald E. Knuth, The Art of Computer Programming: Fundamental Algorithms (vol 1, 2nd edition)
      Addison-Wesley Publishing Company. (1973, 1968)
      pp. 362-363, 385-396, 397 Q22, 406
      PSE QA76.5 K57

    2. Douglas B. West, Introduction to Graph Theory
      Prentice-Hall, Inc. (1996)
      pp. 63-65.
      PSE QA166 W43

    3. Edgar M. Palmer, Graphical Evolution, An Introduction to the Theory of Random Graphs
      John Wiley & Sons, Inc. (1985)
      pp. 4, 98-99
      PSE QA166.17 P35

    4. Robin J. Wilson, Introduction to Graph Theory (3rd edition)
      Longman Group Limited. (1985)
      pp. 48-52
      PSE QA166 W55

    5. Shimon Even, Graph Algorithms
      Computer Science Press Inc. (1979)
      pp. 27-29
      PSE QA166 E93

    6. Frank Harary, Edgar M. Palmer, Graphical Enumeration
      Academic Press, Inc. (1973)
      pp. 20-22
      PSE QA166 H38 1973

    7. Nora Hartsfield, Gerhard Ringel, Pearls in Graph Theory, A Comprehensive Introduction
      Academic Press, Inc. (1990)
      pp. 94-99
      PSE QA166 H39

    8. University College, London., Dept. of Mathematics, A Seminar on Graph Theory
      Holt, Rinehart and Winston, Inc. (1967)
      pp. 70-78 (John, W. Moon. Various Proofs of Caley's Formula for Counting Trees)
      PSE QA166 L65

    9. Claude Francois Picard, Graphs and Questionnaires
      North-Holland Publishing Company. (1980)
      pp. 57-58
      PSE QA166 P5213

    10. Robin J. Wilson, Introduction to Graph Theory
      Longman Group Ltd. (1996)
      pp. 47-51, 155
      PSE QA166 W5

    11. Robin J. Wilson, Introduction to Graph Theory
      Oliver and Boyd. (1972)
      pp. 49-52
      PSE QA166 W55

    12. John W. Moon, Counting Labelled Trees
      Canadian Mathematics Congress. (1970)
      PSE QA166.2 M66

    13. Arthur Cayley, A Theorem on Trees
      Quart. J. Math. vol. 23. (1889)
      pp. 376-378


    6.0 Author


    Last update: Sat Aug 24 23:31:13 EDT 2002