# EECS 311: Trees

## Terminology

There's a lot of terminology associated with trees. You should be familiar with the following basic terms:

• tree
• node
• root or root node
• leaf or leaf node
• internal node
• height vs. depth
• level (level 0 is the root)
• child nodes of a parent node
• sibling nodes
• edge or branch
• ancestors and descendants
• subtree (a very important concept!), e.g., left and right subtrees
• path from leaf to root
• traversals: preorder, postorder, and inorder
• balanced trees

There are also many kinds of trees, including:

• n-ary trees (at most n branches from a node, at most n children)
• binary trees (at most 2 branches or children)
• heaps
• binary search trees or ordered trees
• parse trees (expression trees)

## Implementing Binary Trees

### Linear or Array Representation

This method is easy to understand and implement. It's very useful for certain kinds of tree applications, such as heaps, and fairly useless for others. It's typically used on dense binary trees.

The idea is simple:

• Take a complete binary tree and number its nodes from top to bottom, left to right.
• The root is 0, its left child 1, its right child 2, the left child of the left child 3, and so on.
• Put the data for node I of this tree in the Ith element of an array.
• If you have a partial (incomplete) binary tree and node I is absent, put some value that represents "no data" in the Ith position of the array.

Three simple formulae allow you to go from the index of the parent to the index of its children and vice versa:

• if index(parent) = N, index(left child) = 2*N+1
• if index(parent) = N, index(right child) = 2*N+2
• if index(child) = N, index(parent) = (N-1)/2 (integer division with truncation)
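
These formulae translate directly into code. Here's a minimal sketch (the function names are mine, not from the course):

```cpp
#include <cassert>

// Index arithmetic for the array representation, using the 0-based
// numbering described above. Integer division truncates, which is
// exactly what the parent formula needs.
int leftChild(int n)  { return 2 * n + 1; }
int rightChild(int n) { return 2 * n + 2; }
int parent(int n)     { return (n - 1) / 2; }
```

For example, node 3 (the left child of the left child of the root) has `parent(3) == 1` and `leftChild(3) == 7`.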

The advantage of the linear representation is this easy traversal up and down, and efficient use of space if the tree is complete. The disadvantage is inefficient use of space if the tree is sparse.

### Linked Representation

Again, the idea is simple. A node in the tree has

• a data field
• a left child field with a pointer to another tree node
• a right child field with a pointer to another tree node
• optionally, a parent field with a pointer to the parent node

The most important thing to remember about the linked representation is this:

A tree is represented by the pointer to the root node, not a node.

The empty tree is simply the NULL pointer, not an empty node.
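
A node along these lines could be declared as follows (the field names are illustrative, and the data field is an `int` only for simplicity):

```cpp
#include <cstddef>  // for NULL

// One possible node layout for the linked representation.
struct BtNode {
    int info;             // the data field
    BtNode* leftChild;    // pointer to the left subtree, or NULL
    BtNode* rightChild;   // pointer to the right subtree, or NULL
};

// A tree is a pointer to its root node; the empty tree is NULL.
BtNode* emptyTree = NULL;
```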

## Traversal and Recursion

Traversing a binary tree in any of the three orders, even in a linked representation without parent links, is trivial with recursion.

For example, here's the pseudo-code for inorder traversal:

```
inorderTraversal(bt, fn):

    if (bt != NULL)
        inorderTraversal(bt->leftChild, fn)
        fn(bt->info)
        inorderTraversal(bt->rightChild, fn)
```

Here,

• `bt` is a binary tree, i.e., a pointer to a binary tree node
• `fn` is a function that operates on the kind of data stored in the tree
• `bt->leftChild`, `bt->rightChild`, and `bt->info` access the left child pointer, right child pointer, and data field, respectively

It should be obvious how to code the other two traversals.
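
In real C++, the inorder pseudo-code above might look like this sketch, where the traversal collects values into a vector instead of taking an arbitrary function (names are illustrative):

```cpp
#include <cstddef>
#include <vector>

struct BtNode {           // same node layout as before, repeated so
    int info;             // this sketch stands on its own
    BtNode* leftChild;
    BtNode* rightChild;
};

// Inorder traversal: left subtree, then this node, then right subtree.
void inorderTraversal(const BtNode* bt, std::vector<int>& out) {
    if (bt != NULL) {
        inorderTraversal(bt->leftChild, out);
        out.push_back(bt->info);
        inorderTraversal(bt->rightChild, out);
    }
}
```

On a binary search tree, this visits the data in sorted order.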

## Binary Search Trees

Binary search trees have the property that

• all data in the left subtree of every node are less than the data in the node
• all data in the right subtree of every node are greater than the data in the node
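
This property is what makes searching fast: each comparison discards an entire subtree. A minimal search sketch (field and function names are mine, not the course's):

```cpp
#include <cstddef>

struct BtNode {
    int info;
    BtNode* left;
    BtNode* right;
};

// Return true if x occurs somewhere in the tree rooted at bt.
// Each comparison rules out one whole subtree.
bool contains(const BtNode* bt, int x) {
    if (bt == NULL) return false;          // fell off the tree: not there
    if (x < bt->info) return contains(bt->left, x);
    if (x > bt->info) return contains(bt->right, x);
    return true;                           // x == bt->info
}
```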

### What's the Point?

Consider the computational complexity of the data structures we've seen so far. We give best- and worst-case Big O values:

| Data structure | Adding data | Retrieving data |
| --- | --- | --- |
| Array (unordered) | constant (just add to the end) | O(N) (linear search) |
| Array (ordered) | O(log2N) compares, O(N) data transfers (shift data) | O(log2N) (binary search) |
| Linked list (unordered) | constant (assuming an end pointer) | O(N) (linear search) |
| Linked list (ordered) | O(log2N) compares, constant data transfers (change a few pointers), but O(N) nodes visited sequentially | O(log2N) compares (binary search), but O(N) nodes visited |
| Binary search tree (best case) | O(log2N) compares, constant data transfers, O(log2N) nodes visited | O(log2N) |
| Binary search tree (worst case) | O(N) compares, constant data transfers, O(N) nodes visited | O(N) |

On the average, then, a binary search tree will be

• as fast as a sorted array with less work to add data,
• faster than a linked list, because it will pass through far fewer nodes

Binary search tree algorithms for adding and retrieving are also very simple.

In the worst case, however, a binary search tree will be as bad as a linked list. Many of the variations of binary search trees that we'll see will be attempts to get the best of both worlds: fast access and fast storage, albeit using more complex algorithms.

It's easy to add new data and "grow" the tree:

```
addData(&bt, x):

    if (bt == NULL)
        bt = new BtNode
        bt->info = x
        bt->left = bt->right = NULL
    else if (x < bt->info)
        addData(bt->left, x)
    else if (x > bt->info)
        addData(bt->right, x)
```
Adding a new node takes O(log2n) steps, where n is the number of nodes in the tree. This is best- and average-case behavior. If the tree is very unbalanced, e.g., we passed it a sorted list of numbers, then adding a new node will take O(n) steps.

### Deleting Data

Deleting data is complicated when the data being removed is not in a leaf node. We can't just delete the node, because then our tree would "fall apart." We have to promote one of the children to become the new parent. The child has to be

• bigger than all the other children in the left tree, and
• smaller than all the other children in the right tree

There are at most two possible candidates:

• the rightmost child of the left subtree
• the leftmost child of the right subtree

It doesn't matter which one we pick. If neither subtree exists, we have a leaf node, which can simply be deleted and its pointer removed from the parent node.

The following algorithm deletes a node in a binary search tree:

1. if there's a left subtree, use the rightmost child of the left subtree
2. otherwise, if there's a right subtree, use the leftmost child of the right subtree
3. otherwise, this is a leaf node, just delete it from its parent

Note that

• no data has to be moved, only links -- the tricky part is doing this in the right order so as not to lose a link you need for a later step
• for choice 1, if the rightmost child has a left subtree, that subtree has to be moved (relinked) to replace the child that was promoted
• for choice 2, if the leftmost child has a right subtree, that subtree has to be moved (relinked) to replace the child that was promoted
• for choice 2, there's less pointer updating because there's no left subtree to relink

Consider removing TALBOT from the tree in Figure 7.18. If we pick SELIGER, then we have to move SEFTON to where SELIGER was.
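
The link-only deletion described above can be sketched in C++. This version uses choice 1 (the rightmost child of the left subtree) when a left subtree exists; when there's only a right subtree, the whole right subtree simply moves up, an equivalent shortcut for choice 2. Names are illustrative, and nodes are assumed to be allocated with `new`:

```cpp
#include <cstddef>

struct BtNode {
    int info;
    BtNode* left;
    BtNode* right;
};

// Delete x from the tree rooted at bt. The pointer is passed by
// reference so the parent's link can be rewritten; only links move,
// never data.
void removeData(BtNode*& bt, int x) {
    if (bt == NULL) return;                            // x isn't in the tree
    if (x < bt->info) { removeData(bt->left, x); return; }
    if (x > bt->info) { removeData(bt->right, x); return; }

    BtNode* doomed = bt;
    if (doomed->left != NULL) {
        // Choice 1: promote the rightmost child of the left subtree.
        BtNode** link = &doomed->left;
        while ((*link)->right != NULL) link = &(*link)->right;
        BtNode* promoted = *link;
        *link = promoted->left;      // relink the promoted node's left subtree
        promoted->left = doomed->left;
        promoted->right = doomed->right;
        bt = promoted;
    } else if (doomed->right != NULL) {
        bt = doomed->right;          // no left subtree: right subtree moves up
    } else {
        bt = NULL;                   // leaf: just clear the parent's link
    }
    delete doomed;
}
```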

## Heaps

A heap is a binary tree (not a binary search tree) with the following properties:

• for every node, all nodes in all subtrees have data smaller than the data in that node
• the tree is full and dense

Full and dense means that only the bottom level of the tree has gaps, and the gaps are all on the right.

Being dense means the linear array representation is the most appropriate. A linked representation would waste space.

### What's the Point?

A heap is a great data structure for implementing a priority queue, because

• The next item to do (the item with the highest priority) is right at the top of the heap, i.e., at the root
• Adding a new item takes at most log2N swaps and compares (as shown below), which is better than the O(N) data transfers a sorted array would require.
• Removing an item and updating the tree takes at most log2N steps (as shown below).

It also turns out we can use the routines for adding and removing items to create an N log2N sort algorithm called heap sort.

The algorithm for adding data to a heap is simple, with one unintuitive aspect: we add the item at the bottom first, then move it upwards if necessary:

• add a new data item to the next empty gap in the bottom of the tree
• "walk" the data item up the tree, i.e.,

while the data is larger than its parent, swap the item with its parent, and repeat, checking the data item with its new parent

Walking up the tree will take at most log2N steps (compares and data transfers), because the tree is always balanced, so the height of the tree is log2N.
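
With the array representation, the add-and-walk-up step is only a few lines. A sketch assuming a max-heap stored in a `std::vector` with the root at index 0 (`heapAdd` is my name, not the course's):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Add x to the heap: append it at the next empty gap in the bottom
// level, then walk it up while it's larger than its parent.
void heapAdd(std::vector<int>& heap, int x) {
    heap.push_back(x);
    std::size_t i = heap.size() - 1;
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (heap[i] <= heap[parent]) break;   // heap property restored
        std::swap(heap[i], heap[parent]);     // one step up the tree
        i = parent;
    }
}
```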

### Deleting Data

Both uses of heaps (priority queues and heap sort) only need to delete the root, so that's the only case we'll consider here. Deleting is somewhat similar to adding in that we'll replace the deleted root with another item from the tree and bubble it down. The algorithm is made only slightly more complex because we have two children to worry about instead of just one parent.

1. Remove the root.
2. Replace it with the rightmost item in the bottom level of the tree.

The new item is almost certainly in the wrong place because it's one of the smaller data elements, but this keeps the heap full and dense.

3. If the new root has no children bigger than it is, we're done.
4. If it has just a left child that is bigger, swap it with that child and repeat from step 3 on the left subtree.
5. If it has two children that are bigger, swap it with the larger child and repeat from step 3 on the affected subtree.

Note that

• In step 5, the child promoted is known to be bigger than the other child, so it must be the largest item in the two subtrees, and the heap property holds at its new position.

Walking down the tree will take at most log2N steps (compares and data transfers). We may have to do three comparisons (parent with each child and child against child) but 3 log2N is still O(log2N).
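
The matching removal, again on an array-backed max-heap (a sketch with illustrative names; it assumes the heap is non-empty):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Remove and return the root: replace it with the rightmost item in
// the bottom level, then walk the new root down, always swapping with
// the larger child. Precondition: heap is non-empty.
int heapRemoveRoot(std::vector<int>& heap) {
    int top = heap[0];
    heap[0] = heap.back();
    heap.pop_back();
    std::size_t i = 0, n = heap.size();
    while (true) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < n && heap[left] > heap[largest]) largest = left;
        if (right < n && heap[right] > heap[largest]) largest = right;
        if (largest == i) break;              // no child is bigger: done
        std::swap(heap[i], heap[largest]);    // one step down the tree
        i = largest;
    }
    return top;
}
```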

### Implementing Heaps

As mentioned above, heaps are best implemented using a linear array. The move up and down the tree are simple jumps around the array and the array is mostly filled. The only downside is that data has to be moved around, unless, of course, what we really have is an array of pointers to data, which is the best way to go with non-numeric data.

### Heap Sort

There are two versions of heap sort that I know about. Both

• are "in-place" algorithms, that is, they sort an array directly without needing a second array
• have two phases, a heap building phase, and a heap to sorted array phase
• use walkDown to do the second phase

The versions differ on how the first phase is done.

#### Version 1 of Heap Sort

##### Phase 1
• Let N be the number of data elements.
• Let M be the index of the rightmost node with children.
• For I from M down to 1:
• Walk the node at I down to its proper position.
##### Phase 2
• For I from N to 2:
• Swap element 1 (the root) with element I.
• Remove element I from future consideration as a part of the heap.
• Walk the new root down to its proper position.

The first phase takes no more than O(N log2N) swaps and compares, because each walk down takes at most log2N steps and walkDown is called roughly N/2 times. In fact, it's actually better than O(N log2N): it's O(N), although proving this requires some mathematics.

Similarly, the second phase takes O(N log2N) swaps and compares for the same reason. This is both best- and worst-case behavior. In the best case, you have a heap when you start and no swaps are needed in the first phase, but they become necessary in the second phase.
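
The two phases of Version 1 can be sketched as follows (0-based indices, so the root is index 0 and the rightmost node with children is index N/2 - 1; function names are mine):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Walk the item at index i down within a[0..n), swapping with the
// larger child until the heap property holds beneath it.
void walkDown(std::vector<int>& a, std::size_t i, std::size_t n) {
    while (true) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) return;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

// Version 1 of heap sort: build a max-heap bottom-up, then repeatedly
// swap the root to the end and shrink the heap.
void heapSort(std::vector<int>& a) {
    std::size_t n = a.size();
    if (n < 2) return;
    for (std::size_t i = n / 2; i-- > 0; )    // phase 1: heapify
        walkDown(a, i, n);
    for (std::size_t last = n - 1; last > 0; --last) {
        std::swap(a[0], a[last]);             // phase 2: max goes to the end
        walkDown(a, 0, last);                 // restore the heap on the rest
    }
}
```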

#### Version 2 of Heap Sort

##### Phase 1
• Let N be the number of data elements.
• For I from 2 to N:
• Walk the node at I up to its proper position.

The first phase takes O(N log2N) swaps and compares, because each walkUp takes log2N steps and it's called N times.

Version 2 is simpler, especially if you've already implemented walking both up and down. Version 1, however, needs only walking down, and its Phase 1 is faster (O(N) instead of O(N log2N)).