Enumeration and Representation of Free Trees

McGill University:
School of Computer Science
Winter 1999 Projects for 308-251B

DATA STRUCTURES AND ALGORITHMS

Project #16: FREE TREES, Enumeration and Representation of

Last update: February 25, 1999
Minor note: December 16, 2007

0.0 Contents

0.0 Contents
0.1 Overview
0.2 Initial Problem Outline
0.2.1 Summary of Topics
1.0 Introduction to Graphs and Trees
1.1 Graphs
1.2 Plain Trees
1.3 Binary Trees
1.4 Free Trees
2.0 Enumeration of Trees
2.1 Definition
2.2 Enumeration of Free Trees
3.0 Representation of Trees
3.1 Definition
3.2 Representation of Free Trees
4.0 Links to Related Materials
5.0 References
6.0 Author

0.1 Overview

This report addresses the enumeration and representation of free trees. As a prelude to its core there is a short introduction to graphs and plain trees, as to establish a common ground for the reader and author.
note: References to particular books are made using the following format: [5.0.x]. This refers to section 5.0 of the report. x is the reference number of the particular entry in section 5.0.

0.2 Initial Problem Outline

As given by Professor Luc Devroye:
We saw [in class] how many binary trees there are with n nodes. Investigate how many free trees there are with labels 1, 2, ... , n, and consider the number of bits needed to describe a tree. For example, if I describe the tree by giving n-1 pairs (i,j) for the edges, then I will need about (n-1) x 2 log₂n bits, as each i and j uses about log₂n bits. One can do better. Describe how you could describe each tree with about half as many bits. Name-dropping hints: Prufer code, Cayley.

0.2.1 Summary of Topics

How many labeled free trees are there with n nodes?
Exactly n^n-2; for if X is a particular vertex, the free trees are in one-to-one correspondence with oriented trees having root X. (Knuth, Vol 1: P. 586, A. 22)
How many bits are needed to describe a free tree?
ceiling(log₂n^n-2) = O(nlogn)

1.0 Introduction to Graphs and Trees

1.1 Graphs

A graph is generally defined to be a set of points called vertices, v, joined by edges, e. There is at most one edge between any pair of vertices. Two edges are adjacent if there is an edge between them. If v and v' are vertices and if n >= 0, we say that (v₀, v₁, ... , v_n) is a path of length n from v to v' if v = v₀, v_k is adjacent to v_k+1 for 0 <= k < n, and v_n = v'. A path is simple if no edge on the path from v to v' is used more than once. A graph is connected if there is a path between any two vertices of the graph. A cycle is a simple path of length three or more from a vertex to itself. The degree of a vertex in a graph is the number of edges radiating from it.

1.2 Plain Trees

Trees are among the most important data structures in computer science. By definition of graph theory, trees are finite, labeled, rooted, and ordered. In general, a tree has a branching structure that defines a relationship among its nodes, via edges. Formally, a tree is recursively defined as a finite set T of one or more nodes such that one of the nodes is desginated as, and called the root of the tree, root(T); and the remaining nodes, excluding the root, are partitioned into m >= 0 disjoint sets T₁, ..., T_m, called subtrees, each of which is in turn a tree.

Every node is the root of its subtree. The number of subtrees, or nodes below a particular node, is called the degree of the node. A node of degree zero is called a terminal node, or leaf. A nonterminal node is often called a branch node.

A set of one or more nodes can exist at a particular level or depth of T. The level of a node is defined as the number of edges between the root and that node. Thus the root has level 0, since there is no edge between the root and itself. each level following level 0 is one higher then the one before it.

If the relative order of the subtrees T₁, ..., T_m is important, then we say the tree is an ordered tree; however if the order is unimportant, then we say the tree is an oriented tree.

Contrary to the similarities between nature's trees and digital trees, digital trees grow from top down. Thus the root node is at the apex of the tree, with all other nodes positioned at shallow and deep levels.

At a local perspective, each node is the parent of the roots of its subtrees, which are inturn called the children. The children that are born of the same parent are called siblings. Other self explanitory terms, relavent to a particular node are grand parent and grand children. The ancestors of a node are all those nodes in the path from that node to and including the root node. Similarly, the descendants of a node are all those nodes that are part of its subtree. The root of T has no parent, and likewise, a leaf node has no children. In general, most terminology of family trees can be used to discuss digital trees.

The fanout of a tree is an upper bounding number on then number of children any one parent may have.

Fig. 1 In this diagram, node A is the root of the tree. Nodes {E, J, C, G, H, I} are the leafs. Node A is the parent of Nodes B, C and D; likewise B is the parent of E and F, thus making E and F the children of B. Node E and F's grand parent is A. Nodes G, H, and I are siblings. The ancestors of Node H are A, D, and H. The descendants of Node B are E, F, and J.

1.3 Binary Trees

A special kind of tree is a binary tree which has a fanout of 2. This means that a node can have at least 0 children, and at most 2. The children of any node of a binary tree are distinguished as left and right children. It should be mentioned that unlike a plain tree, a binary tree can be empty, meaning that it has no nodes.

1.4 Free Trees

A free tree is an unrooted tree, defined to be a connected graph having no cycles. There is no fanout restriction on free trees. If any edge of a free tree is deleted, the free tree ceases to exist, as it is no longer connected. There is exactly one path between any two nodes, thus a free tree with n nodes has n - 1 edges.

A free subtree can be derived from a graph by deleting edges in an iterative fashion such that the resulting graph of each deletion is still connected. A free subtree has been derived when the deletion of any other remaining edge would result in an unconnected graph.

2.0 Enumeration Of Trees

2.1 Definition

The subject of graph enumeration is concerned with the problem of finding out how many non-isomorphic graphs there are which posses a given property. Enumeration of trees is concerned with counting how many different trees there are of various kinds on n vertices, where n is a natural number {1, 2, 3, ... , n}. The problem on the number of labelled trees with a given number of vertices was proposed by Cayley.

[5.0.4] In the context of this problem, a labelled graph on n vertices is essentially a graph in which the vertices are 'labelled' with the integers from 1 to n. We define labelling of a graph G on n vertices to be a one-to-one mapping from the vertex-set of G onto the set {1, 2, 3, ... , n}; a labelled graph is then a pair (G, X), where G is a graph and X is a labelling of G. The integers 1, 2, 3, ... , n are referred to as the lables of G. The two labelled graphs (G₁, X₁) and (G₂, X₂) are isomorphic if there exists an isomorphism between G₁ and G₂ which preserves the labelling of the vertices.

The following is a quick reminder of what isomorphism is. Fig. 2a, 2b, and 2c are various ways of labelling a tree with four vertices. The tree of Fig. 2b is simply the reverse of the first one, and it follows that these two labelled trees must be isomorphic; however, neither of them is isomorphic to the third labelled tree, as can be seen by looking at the degree of the vertex 3. Similarly, the total number of ways of labelling the tree shown in Fig. 2d must be four, since the central vertex may be labelled in four different ways, and each one determines the labelling.

Fig. 2a

Fig. 2b

Fig. 2c

Fig. 2d

2.2 Enumeration of Free Trees

We now show Cayley's theorem which generalizes the result from section 2.1 to labelled free trees with n vertices.

THEOREM 1 (Cayley [1889]). There are n^n-2 distinst labelled free trees on n vertices.

Example 1. Before displaying the proof, we should first see an example of a listing (Table 1) of small trees for n <= 4.

**Table 1**
`n`	list of trees
1
2
3
4

The first set of nontrivial free trees is that with three vertices. There is only one isomorphism class with three vertices, but the adjacency matrix is determined by which vertex is the center. With four vertices, there are two isomorphism classes, composed of the stars and the paths. //end example 1.

There are many proofs of Caley's Theorem, including

Prufer [5.0.{2, 3, 4, 5, 6, 7, 8, 9, 10}],

Kirchhoff [5.0.8],

Polya [5.0.{6, 8}],

Renyi [5.0.8],

Clarke [5.0.{4, 8, 10, 11}],

Dziobek [5.0.8],

Katz [5.0.8],

Gobel [5.0.8],

Moon [5.0.8].

Many other results are in the standard reference for labelled trees, John W. Moon's book [5.0.12], whose bibliography lists over 160 titles. The most elegant of all these proofs is Prufer's proof which is found in almost every graph theory text. Prufer verified Caley's Theorem by establishing a one-to-one correspondence between labelled free trees of order n and all sequences of n-2 positive integers from 1 to n.

PROOF 1 (Prufer [1918])
There are n^n-2 sequences (called Prufer sequeces, or Prufer codes) of length n-2 with entries being from natural numbers; we establish a bijection between the set of trees and this set of sequences.

To compute the Prufer sequence treeToSeq(T) for a labelled tree T, iteratively delete the leaf with smallest label and append the label of its neighbor to the sequence. After n-2 iterations a single edge remains and we have produced a sequence treeToSeq(T) of length n-2.

   treeToSeq( T ) {
      
      if one edge remains
         return( Prufer sequence );
      else {
         find vertex v of degree 1,
            that has the smallest label in T;

         record v's neighbor;

         delete v from T;
      }
      
   }

That the algorithm defines a one-to-one correspondence is proved by an induction argument on n; however we will provide the inverse algorithm, which finds the unique labelled free tree T of order n for a given Prufer sequence of n-2 elements.

(December 16, 2007) Note: It has been brought to my attention (by pwebster) that the following seqToTree algorithm may not produce the expected result. Please consult the following URLs for what pwebster suggests are correct algorithms.

   seqToTree( SEQ ) {
      
      determine n:  count the number of elements
                    in the Prufer sequence and add 2;

      make forest:  begin with a forest of n isolated vertices { 1, ..., n };

      end-vertex :  make end-vertex-list of all integers 1 to n,
                    that are missing from the Prufer sequence;

      record-edge:  p = the first integer in the Prufer sequence;
                    e = smallest label in end-vertex-list;
                    record edge of (p, e).

      delete     :  delete e from end-vertex-list;

      if  (two vertices remain)
           Record this last edge;

      else
           delete p from Prufer sequence;

      if  (p is repeated elsewhere in Prufer sequence)
           goto step: record-edge;

      else {
           add p to end-vertex-list
           then goto step: record-edge;
      }
      
   }

Example 2. We shall now see an example of the use of treeToSeq( T ). We will start with a labelled free tree and derive the Prufer sequence by use of the mentioned algorithm. The free tree from which we will derive the Prufer code is:

Fig. 3 In the following table (Table 2), the leaf of least degree will be circled, while its neighbor will be colored pink.

**Table 2**
labelled free tree	Prufer code
	{ }	2 is the smallest label. 7 is its neighbor. record 7.
	{ 7 }	3 is the smallest label. 4 is its neighbor. record 4.
	{ 7, 4 }	5 is the smallest label. 4 is its neighbor. record 4.
	{ 7, 4, 4 }	4 is the smallest label. 1 is its neighbor. record 1.
	{ 7, 4, 4, 1 }	6 is the smallest label. 7 is its neighbor. record 7.
	{ 7, 4, 4, 1, 7 }	7 is the smallest label. 1 is its neighbor. record 1.
	{ 7, 4, 4, 1, 7, 1 }	1 edge remains (2 vertices). the Prufer code is complete.

Example 3. We must also present an example of seqToTree( SEQ ). Naturally, the sequence will be { 7, 4, 4, 1, 7, 1 } as this will also provide an example of the one-to-one correspondence between free trees and Prufer sequences (codes). There is a new list called end-vertex-list which contains those verticies that were deleted to obtain the Prufer sequence.

**Table 3**
Prufer code	end-vertex-list	labelled free tree
{ 7, 4, 4, 1, 7, 1 }	{ 2, 3, 5, 6, 8 }		The original lists and forest on `n` isolated vertices.
{ 7, 4, 4, 1, 7, 1 }	{ 2, 3, 5, 6, 8 }		p=7, e=2; connect (7, 2). delete e. delete p.
{ 4, 4, 1, 7, 1 }	{ 3, 5, 6, 8 }		p=4, e=3; connect (4, 3). delete e. delete p.
{ 4, 1, 7, 1 }	{ 5, 6, 8 }		p=4, e=5; connect (4, 5). delete e. delete p. add p to end of end-vertex-list.
{ 1, 7, 1 }	{ 6, 8, 4 }		p=1, e=4; connect (1, 4). delete e. delete p.
{ 7, 1 }	{ 6, 8 }		p=7, e=8; connect (7, 6). delete e. delete p. add p to end of end-vertex-list.
{ 1 }	{ 8, 7 }		p=1, e=7; connect (1, 7). delete e. delete p. add p to end of end-vertex-list.
{ }	{ 8, 1 }		join remainding vertices.

It can be clearly seen that with a slight reorientation of the edges and vertices, the constructed free tree of Table 3 is identical to the one we started with in Example 2.

3.0 Representation of Trees

3.1 Definition

Analysis of the representation of trees is concerned with how many bits are required to describe a tree on n nodes.

3.2 Representation of Free Trees

As mentioned in the initial probelm outline above, given a tree on n nodes, whose edges are described by n-1 pairs (i, j), then (n-1) x 2 log₂n bits will be required. (n-1) is the number of edges. The 2 infront of log₂n accounts for both i and j. The log₂n accounts for the number of bits needed to represent (in this case) a natural number.

Since a labelled free tree on n nodes can be described by a sequence of (n-2) natural numbers (Prufer sequence) then a free tree can be described using n-2 x log₂n. In this case the n from log₂n arrives from the fact that there are n nodes, so the representation must account for the highest respective label in a given sequence.

The latter method of representing a labelled free tree uses about half as many bits as the former.

4.0 Links to related material

Intro to trees:
251 topic #8 1997
Binary trees:
251 topic #9 1997

There were no web sites devoted to the enumeration of labelled free trees. Over ten search engines were attempted using numerous key words. It turns out that there is far more literature on the topic in form of books. This seems like the first site on the web devoted to free trees.

5.0 References

note: The last entry of each reference includes a call number which corresponds to the Physical Sciences and Engineering Library, PSE at McGill University. It is possible to search all McGill libraries.

Donald E. Knuth, The Art of Computer Programming: Fundamental Algorithms (vol 1, 2nd edition)
Addison-Wesley Publishing Company. (1973, 1968)
pp. 362-363, 385-396, 397 Q22, 406
PSE QA76.5 K57
Douglas B. West, Introduction to Graph Theory
Prentice-Hall, Inc. (1996)
pp. 63-65.
PSE QA166 W43
Edgar M. Palmer, Graphical Evolution, An Introduction to the Theory of Random Graphs
John Wiley & Sons, Inc. (1985)
pp. 4, 98-99
PSE QA166.17 P35
Robin J. Wilson, Introduction to Graph Theory (3rd edition)
Longman Group Limited. (1985)
pp. 48-52
PSE QA166 W55
Shimon Even, Graph Algorithms
Computer Science Press Inc. (1979)
pp. 27-29
PSE QA166 E93
Frank Harary, Edgar M. Palmer, Graphical Enumeration
Academic Press, Inc. (1973)
pp. 20-22
PSE QA166 H38 1973
Nora Hartsfield, Gerhard Ringel, Pearls in Graph Theory, A Comprehensive Introduction
Academic Press, Inc. (1990)
pp. 94-99
PSE QA166 H39
University College, London., Dept. of Mathematics, A Seminar on Graph Theory
Holt, Rinehart and Winston, Inc. (1967)
pp. 70-78 (John, W. Moon. Various Proofs of Caley's Formula for Counting Trees)
PSE QA166 L65
Claude Francois Picard, Graphs and Questionnaires
North-Holland Publishing Company. (1980)
pp. 57-58
PSE QA166 P5213
Robin J. Wilson, Introduction to Graph Theory
Longman Group Ltd. (1996)
pp. 47-51, 155
PSE QA166 W5
Robin J. Wilson, Introduction to Graph Theory
Oliver and Boyd. (1972)
pp. 49-52
PSE QA166 W55
John W. Moon, Counting Labelled Trees
Canadian Mathematics Congress. (1970)
PSE QA166.2 M66
Arthur Cayley, A Theorem on Trees
Quart. J. Math. vol. 23. (1889)
pp. 376-378

6.0 Author

Alex BATKO

Note:

Last update: Sat Aug 24 23:31:13 EDT 2002

McGill University: School of Computer Science Winter 1999 Projects for 308-251B DATA STRUCTURES AND ALGORITHMS Project #16: FREE TREES, Enumeration and Representation of