Binary Search Trees

Tree Arity

The arity of a tree is the maximum number of children any node can have. This is a maximum. A node can always have less children, but never more.

The arity of a tree can always be described as M-ary where \(M\) is the maximum number of children. A few common arities have special names.

  • Unary Tree has at most 1 child. This is a Linked List. A 1-ary tree.

  • Binary Tree has at most 2 children. A 2-ary tree.

  • Ternary Tree has at most 3 children. A 3-ary tree.

The example in the previous section was a Ternary or 3-ary tree.

BST Structure

A Binary Search Tree (BST) is a Binary Tree with special properties that make it efficient to search.

A BST is designed to allow efficient addition/removal of values and still be able to binary search the values. It is a ordered tree. The nodes are kept in a strict order to ensure easy searching.

This makes a BST efficient to search, but also efficient to change.

The ordering of a BST is strictly defined.

  • The left child and all it’s proper descendants must be less than the parent.

  • The right child and all it’s proper descendants must be greater than the parent.

Note

What happens to equal values? If you only ever need to search, then there is no reason to keep duplicates. If the duplicates matter, pick a side and be consistent.

Searching for Items

The two most basic features are inserting values and searching. If we can’t insert any values, there is no point in searching. We need to build the structure before we can search it.

Insertion

To make a BST we start with an empty tree. Each node is inserted such that the two properties are meet.

The algorithm for inserting a target value can be summarized as:

  1. If the location is empty then create a new node and store target value.

  2. If the location is not empty then compare node value to the target value.

    1. If values are the same then make no changes.

    2. If node value is larger then insert recursively on left.

    3. If node value is smaller then insert recursively on right.

The following images insert the values \(6, 4, 1, 10, 8, 5, 12\). When building a tree, the order we insert the values matters.

Insert Value into Tree (1 of 7)

Insert Value into Tree (2 of 7)

Insert Value into Tree (3 of 7)

Insert Value into Tree (4 of 7)

Insert Value into Tree (5 of 7)

Insert Value into Tree (6 of 7)

Insert Value into Tree (7 of 7)

Implementation

To implement our BST, we need a node structure. The node has a value and pointers for the left and right values.

//Node is a Data Structure to hold a binary tree node.
struct Node {
    //Store the Node's Value
    int value
    //Pointer to Left subtree
    Node* left
    //Pointer to Right subtree
    Node* right
}

A tree is just a bunch of nodes starting at the root. We only need to know where the root is. We can find everything else by following pointers. The tree has it’s own structure.

//BST is a Data Structure to hold a Binary Search Tree
struct BST {
    //We need a pointer to the Root node
    Node* root
}

We can’t create anything if we can’t make an empty tree. The following function just allocates memory for an empty tree.

//MakeTree creates a new empty binary search tree
Function MakeTree()
    //Allocate a new BST and get pointer
    T = allocateBST()
    //Set the root to be empty
    T's root is null
    //Return pointer to tree
    Return T
End Function

At this point, we have enough code structure to start building algorithms. All tree algorithms will be recursive. We will make two functions for each feature. One that programmers user, these always start at the root. We will have a second function that can be executed at any node. These will be used internally for implementation.

A user of our data structure wants to find a value in the tree.

//Find searches for target value in tree
Function Find(int value, BST* Tree)
    //Start recursive search at the root
    return findInTree(value, Tree's root)
End Function

We just execute the recursive implementation starting at the root. The recursive implementation compares to values and moves left or right. It will either find the target or run out of options.

//findInTree searches nodes for a value
Function findInTree(int value, Node* currentNode)
    //If the node is empty then nowhere to look.
    If currentNode is null then
        Return false
    End If
    //If the value is the current node, we found target
    If currentNode's value == value then
        Return true
    End If
    //Otherwise, compare and search one side.
    If currentNode.value > value then
        //Node is to big, search left
        Return findInTree(value, currentNode's left)
    End If
    //Node is to small, search right
    Return findInTree(value, currentNode's right)
End Function

We need to do a similar thing to insert. The user of our data structure just wants to insert a value.

//Insert inserts a new value into the tree (if not already in it).
Function Insert(int value, BST* Tree)
    //This might change the root node
    Tree's root = insertValue(value, Tree's root)
End Function

We need to deal with updating pointers. We need to search the tree for a place to put the value. Then we need to update all the pointers to account for the new node.

Note

We won’t know where in the recursion the new pointers appear. Every call needs to return pointers. Some will be unchanged, but we won’t know that until the return.

//insertValue recursively searches for a location to insert the value.
//It returns a pointer to the updated node.
Function insertValue(int value, Node* currentNode)
    //If we are an empty position, create a new node
    If currentNode is null then
        Let newNode = allocateNewNode()
        newNode's value = value
        newNode's left = nil
        newNode's right = nil
        Return newNode
    End If
    //If the value is found, ignore it
    If currentNode's value == value then
        Return currentNode //No Changes
    End If
    //Recursively update either left or right side
    If currentNode's value > value then
        //Insert and update on left
        currentNode's left = insertValue(value, currentNode's left)
        //Return updated node
        Return currentNode
    End If
    //Insert and update on right
    currentNode's right = insertValue(value, currentNode's right)
    //Return updated node
    Return currentNode
End Function

Tree Information

Normally, we want to do more than just search. We want to ask questions about the tree. We may even need to display the tree.

Get the Root

It is sometimes helpful to get direct access to the pointers. This function returns the root node pointer.

//GetRoot returns the root node of the tree
Function GetRoot(BST* T)
    Return T's root
End Function

Height

Finding out the height of the tree is very important. It allows us to determine if the tree is empty or not. It also allows us to determine if it has become unbalanced. A tree with no values has a height of \(-1\).

We need a public function that starts at the root.

//Height tells you the height of the tree
Function Height(BST* T *BST)
    //The tree height is the node height of the root
    Return nodeHeight(T's root)
End Function

We can implement a recursive function to find the height of a node. Since the height of the root is the height of the tree, this is used as a helper to find the tree height.

//nodeHeight determines the height of a node
Function nodeHeight(Node* currentNode)
    If currentNode is null then
        Return -1
    End If
    Let leftHeight = nodeHeight(currentNode's left)
    Let rightHeight = nodeHeight(currentNode's right)
    If leftHeight > rightHeight then
        Return leftHeight + 1
    End If
    Return rightHeight + 1
End Function

Displaying the Tree

It is great to have nice graphics of the tree. In practice, this will normally not be possible. In a text-based program, we still need a way to visualize the tree. This is crucial for debugging.

There are three common methods for printing out a tree.

  1. Preorder: Print Node, Print Left Children, Print Right Children

  2. Postorder: Print Left Children, Print Right Children, Print Node

  3. Inorder: Print Left Children, Print Node, Print Right Children

Examine the following tree as an example.

A Binary Search Tree to Print

For preorder, we start with the node’s value, then recursively deal with the left and right. When we get to empty positions, we print N for null. The following example shows how the recursion builds the text to display. Parenthesis are added to show grouping in intermediate steps.

  • Start Preorder at Root

  • Print Node, Print Left Children, Print Right Children

  • 6 (Node LC RC) (Node LC RC)

  • 6 (4 (Node LC RC) (Node LC, RC)) (10 (Node LC RC) (Node LC RC))

  • 6 (4 (1 N N) (5 N N)) (10 (8 N N) (12 N N))

  • 6 4 1 N N 5 N N 10 8 N N 12 N N

In postorder, we put the root value at the end instead of the beginning.

  • Start Postorder at root

  • Print Left Children, Print Right Children, Print Node

  • (Node LC RC) (Node LC RC) 6

  • ((Node LC RC) (Node LC RC) 4) ((Node LC RC) (Node LC RC) 10) 6

  • ((1 N N) (5 N N) 4) ((8 N N) (12 N N) 10) 6

  • 1 N N 5 N N 4 8 N N 12 N N 10 6

With inorder, we put the root value in between the left and right side. This gives us a sorted list of values! It doesn’t tell us as much about the tree structure as the other two.

  • Start Inorder at root

  • Print Left Children, Print Node, Print Right Children

  • (LC Node RC) 6 (LC Node RC)

  • ((LC Node RC) 4 (LC Node RC)) 6 ((LC Node RC) 10 (LC Node RC))

  • ((N 1 N) 4 (N 5 N)) 6 ((N 8 N) 10 (N 12 N))

  • N 1 N 4 N 5 N 6 N 8 N 10 N 12 N

Using all three of these prints together, it is easy to rebuild the structure of the tree on paper.

Deleting Values

Deleting a value from the tree is the most complicated operation. We need to keep all the properties of the BST, but we also want to minimize our changes. This requires a more complex algorithm.

Removing Values

The easiest node to remove is a leaf. It has no children. We just need to update the parent to know the child has been removed.

Delete a Leaf

It is also easy to delete a node with only one child. We just update the parent to point directly to the child. The node in the middle is gone.

Delete Node With One Child

The challenge comes when we want to delete a node with two children. We need to keep the tree properties. We want to delete 6 from the following tree. Is has many children. This is the worst case situation.

Preparing to Delete Root

We already know leaves are trivial to delete. Could we swap 6 into a leaf? That would make it easier to delete. If we swap it with 2, the tree doesn’t get any easier to work with. The same thing happens with 8. We can’t swap with an internal node. It won’t make the problem any easier.

Root cannot be swapped with internal nodes.

We could swap with values 1 or 9. They would make it easy to delete 6. They introduce a different problem. They break other nodes in the tree. These swaps will cause even more work for us. We will need to fix the tree.

Swapping with some nodes breaks the tree.

There are two values that can be swapped with the root. The maximum of the left side and the minimum of the right side.

Swapping with the value 3 keeps everything else in the tree correct. It also makes 6 a leaf. We can easily delete 6 after this swap.

Swapping with Max of Left

The same thing happens if we swap with 7. The tree structure holds.

Swapping with Min of Right

We can use either of these, but we need to be consistent. When we delete an node with two children, we swap it to a easier position. We will use the minimum of the right side.

Deleting the Root

The minimum of the right side must be an easy to delete value and also a root replacement. Since it is the minimum it has either 0 or 1 child. These are both easy to delete. It cannot have two children and also be the minimum. It’s left child would be smaller.

The minimum of the right side is larger than everything on the left side and also smaller than everything on the right side except the root. Since the root is being deleted, this node will fit in the root position without breaking the tree properties.

Finding the Min

The minimum value is a binary search tree is reasonably easy to find. If there is a smaller value, it is always to the left. We move left until we run out of left children. The node we end on must be the minimum. There is nothing smaller than it.

The minimum of the entire tree starts searching at the root.

//Min determines the minimum value in the tree
Function Min(BST* Tree)
    Return findMin(Tree's root)
End Function

We can find the minimum of any node by searching recursively starting at that node. We just go left until we run out of left children.

//findMin finds the min starting at any node
Function findMin(Node* currentNode)
    //Return -1 for error
    If currentNode is null then
        Return -1
    End If
    //If no left child then at min
    If currentNode's left == nil then
        Return currentNode's value
    End If
    //Otherwise keep searching
    Return findMin(currentNode's left)
End Function

Delete Implementation

At the top level, we delete a value from the root of the tree.

//Delete removes a value from the tree
Function Delete(int value int, BST* Tree)
    Tree's root = deleteValue(value, Tree's root)
End Function

We need to search the three for the value we want to delete. The function deleteValue searches the tree. When it finds the target, it calls a helper to remove the node. It updates pointers as it searches.

//deleteValue removes a value starting at node given and returns updated tree
Function deleteValue(int value int, Node* currentNode)
    //Case 1: Empty Node
    If currentNode is null then
        return nil
    End If
    //Case 2: This is the node we want
    If currentNode.value == value then
        Let x = deleteNode(currentNode)
        Return x
    End If
    //Case 3: Search left or right for target
    If currentNode.value > value then
        //Search left
        currentNode's left = deleteValue(value, currentNode's left)
        Return currentNode
    End If
    //Search right
    currentNode.right = deleteValue(value, currentNode's right)
    Return currentNode
End Function

The deleteNode function is our workhorse. It actually removes the node. We implement the three possible cases of delete. If we swap with the minimum of the right side, we have to call deleteValue recursively. We can be sure this value will be trivial to delete.

//deleteNode removes the node with pointer given. Returns new pointer
Function deleteNode(Node* currentNode)
    //Case 1: If no left child then replace with right child
    If currentNode's left is null then
        Return currentNode's right
    End If
    //Case 2: If no right child then replace with left child
    If currentNode's right is null then
        Return currentNode's left
    End If
    //Case 3: Swap with min value on right side
    Let minVal = findMin(currentNode's right)
    //Swap min here
    currentNode's value = minVal
    //Delete from right side
    currentNode's right = deleteValue(minVal, currentNode's right)
    //Return updated pointer
    Return currentNode
End Function

Analysis

Almost every function in the tree is dependent on the height. The only exceptions are printing which requires looking at every node and returning trivial values. When we move through the three, we only take one path from the root to a leaf. The longest path from the root to a leaf is the height.

Let the height of the tree be \(h\) and the number of nodes be \(n\).

Operation

Runtime

MakeTree

\(O(1)\)

GetRoot

\(O(1)\)

Height and nodeHeight

\(O(n)\)

Find and findInTree

\(O(h)\)

Insert and insertValue

\(O(h)\)

Min and findMin

\(O(h)\)

Delete, deleteValue, and deleteNode

\(O(h)\)

Preorder, Postorder, Inorder

\(O(n)\)

To determine the running times of these functions, we need to determine the height of the tree.

The worst case tree height is when all nodes are placed on one side.

A worst case BST.

In this case, we have build a linked list. The height is the number of nodes minus one. The worst case height is \(h = n-1\).

A Balanced Tree

The tree we saw earlier was a full balanced tree. Ever position on every level was filled.

  1. First level as 1 node

  2. Second level has 2 nodes

  3. Third level has 4 nodes

  4. Fourth level has 8 nodes

  5. \(k\)-th level has \(2^k\) nodes

The total nodes in the tree is

\( \begin{align} 1+2+4+8+16+\cdots+2^{k} \end{align} \)

This formula should remind you of binary numbers. We are adding powers of two! A tree with \(k\) levels can store as many numbers as a \(k\)-bit binary number. Our tree has \(3\) levels, so it can store \(1+2+4=7\) numbers.

The best case height of a tree is to fill every space. A height must be a whole number, so we need to round down. That means for a tree with \(n\) nodes, we have

\( \begin{align} h = \left\lfloor \log_2(n) \right\rfloor \end{align} \)

The height will fall in a range between these two extreme cases.

\( \begin{align} \left\lfloor \log_2(n) \right\rfloor \le h \le n-1 \end{align} \)

Where will the tree fall in this range? That will depend on the values and the order they are entered. There are a number of modifications to the BST that will force it to always stay balanced. Some examples are AVL Trees, 3-4 Trees, and Red-Black Trees. Each of these data structures tracks the height of the tree and moves values if it becomes to unbalanced. For uniform random values, a standard BST will likely be close to balanced.