EECS 311: Balanced Binary Search Trees

This is a brief review of a couple of the many kinds of balanced search trees that have been developed.

AVL (Height-balanced) Trees

An AVL tree is a binary search tree. The key ideas are

keep a very tight bound on how unbalanced the tree can get when nodes are added or removed. (Removal is not discussed here or in the book.)
use rotations to fix the tree when it starts to get unbalanced

We define a balance factor for each node:

height( left subtree) - height(right subtree)

A tree is an AVL tree if and only if:

For every node in the tree, the balance factor is 1, 0, or -1.

If this bound is maintained, then it can be shown that the tree will give the desired N log₂N retrieval property that we desire.

The trick is maintaining this bound with no more than log₂N work when inserting new nodes. This is done using rotations. The rules for insertion are basically:

find the leaf where a new data item will go
find the first node on the path from leaf to root with a non-zero balance factor -- this is the pivot
if, after insertion, the pivot's balance factor changes to 2 or -2, rotate the subtrees to restore the balance.

There are 4 possible cases of rotation that might be required, and 2 of the cases have subcases. There's a symmetry between left and right rotations.

You should understand how each case is defined and which rotation is appropriate. Sometimes, there may be several ways to fix an unbalanced AVL tree, but you need to know which of those rotations the algorithm does. See the textbook for more information.

There are quite a few online interactive animations for AVL search trees. I like this one, and this one. But I find none of them are as helpful for learning as simply building trees yourself by hand. Or, if you prefer, watching someone else do it by hand.

Red-black trees

How red-black trees work is described in detail here and in the textbook.

Red-black trees are like AVL trees in that they

are binary search trees
use left and right rotations
maintain an invariant property that guarantees that the tree is never more than a constant factor worse than the best possible binary search tree.

They differ in what that invariant is and what bookkeeping information is kept. Instead of two numbers on each node representing the heights of the subtrees, in red-black trees just one bit is kept, marking how a child node links to a parent node. By convention, a marked link is called a red link, an unmarked link is called black.

The invariant property is that

the tree is perfectly balanced if you only look at the black nodes (links)
there are never two red nodes (links) in a row on any path to a leaf

From this it follows that the tree can't be more than 2 log N deep, and normally it's much better than that.

Because there's so much less bookkeeping data, red-black trees in general perform better than other self-balancing binary search tree algorithms and are the algorithm of choice for things like the STL's set and map data structures.

2-3-4 Trees

2-3-4 trees are search trees, but not binary search trees. While 2-3-4 trees themselves are rarely used in practice, they have two interesting properties that appear in other commonly used data structures:

They are n-way search trees, i.e., each key has N-1 keys and N subtrees.
They use node splitting to stay balanced.

N-way search trees and node splitting is a general idea that is central in B-trees and similar data structures that are used in implementing large database management systems. Such trees are particularly useful for managing large amounts of data stored on external media, such as disks.

An interesting side point is that even though there are no rotations in 2-3-4 trees, it turns out that operations in building a 2-3-4 tree can be mapped one-to-one to operations in building a red-black tree. That is, a red-black tree with rotations and coloring can be viewed as a way to implement 2-3-4 trees using binary nodes.

The key ideas in 2-3-4 trees are:

keep a very tight bound on how unbalanced the tree can get when nodes are added or removed.
use splitting to fix the tree when it starts to get unbalanced

Each node in a 2-3-4 tree has 1, 2 or 3 keys. The keys are ordered in the node, and the subtrees follow these rules:

A 2-way node with 1 key is just like a normal binary node -- the left subtree has data smaller than the key and the right subtree has data greater than the key.
A 3-way node with 2 keys has 3 subtrees: the left subtree has all data smaller than the first key, the right subtree has all data greater than the key, and the middle tree has all data between the two keys.
A 4-way node with 3 keys has 4 subtrees, following the same pattern as a node with 2 keys.

Searching a 2-3-4 tree is pretty much like searching a binary search tree. The only difference is that we compare the input data item against the keys in the node first to determine which subtree to follow (left, left of middle, right of middle, or right). The logic for this is straightforward.

The real differences arise when we add data to a 2-3-4 tree. The key operation is node splitting. When we adding data, as we go down the tree,

if we find a node that is "full," i.e., has 4 subtrees already (3 keys),
- We split the 4-way node into two 2-way nodes, with the leftmost key in the left node and the rightmost key in the right node.
- We send the middle key to the parent of the split node.
- Note: The parent can't be full because we split all full nodes on the way down.

Question for the reader: Why does a 2-3-4 tree stay balanced?

Mapping 2-3-4 trees to red-black trees

Red-black trees can be seen as a way of representing 2-3-4 trees as binary trees. The basic idea is simple:

Split 3-way nodes and 4-way nodes into individual binary nodes
use red links to connect parent nodes to children that were together in the 3-nodes and 4-nodes
use black branches to link nodes that were separate in the 2-3-4 tree

Intuitively, the red links capture the same information that we see when a 2-3-4 node gets full.

The three kinds of nodes in a 2-3-4 tree therefore become the following kinds of nodes in a red-black tree:

a 2-way node becomes 1 red-black node, with black branches to the same subtrees as the original 2-way node, -- it's nice and empty
a 3-way node becomes 2 red-black nodes -- and there are two ways this can happen:
- make the leftmost key a red child of the rightmost key
- make the rightmost key a red child of the leftmost key
a 4-way node becomes 3 red-black nodes:
- the leftmost key becomes a red child of the middle key
- the rightmost key becomes a red child of the middle key

When the 2-3-4 algorithm

inserts a key into a 2-way or 3-way nodes, that becomes a single rotation in the equivalent red-black tree.
splits a node, this is the case in the red-black algorithm when two red links in a row lead to rotation and re-coloring.

For example, let's compare adding the numbers 1, 2, 3, ... to a 2-3-4 and a red-black tree.

Insert	2-3-4 Operations	Red-black Operations
1	Put in leftmost key of root node	Put in root node
2	Put in rightmost key of root node	Make 2 a red right child of 1
3	Shift 2 left, put 3 in rightmost key of root node	Make 3 a red right child of 2. Because there are two red links in a row, rotate left. Note: the red links join the keys in the 2-3-4 node.
4	Split the full node. Make a root node with 2, a left child with 1, and a right 3-way child with 3 and 4	Recolor the two red children black. Make 4 a red right child of 3. Note: the red link joins the keys in the corresponding 3-way node.

This should be enough for you to see how to keep going with more numbers. Notice how the counter-intuitive "make 2 red siblings black" makes perfect sense when viewed as the result of a node split.

Comments? Send mail to c-riesbeck@northwestern.edu.