Fall 2005
Monday 5:00 - 6:00 Rm 643E, Wednesday 3:00 - 5:00 Rm 621E
Binary search trees, like linked lists, combine pointers, structures, and dynamic memory allocation to store data when it is unknown by both the programmer and user how much data there will be ahead of time. A binary search tree is a binary tree with a specific ordering requirement. Each node in a binary tree has two links, one to a left child and one to a right child. Either of these links might be a NULL pointer. The binary tree has a single root, and somewhere the program there must be a a pointer to the root; this pointer will likely be the only thing stored in the static area of memory or an activation record (depending on whether it is a global or a local); the rest of the memory will come from the heap (allocated with "malloc" or some other provided function) as is the case with a linked list. A binary search is a binary tree in which every node has a key, all descendants in the left sub-tree of a node have keys with values less than or equal to the node, and all descendants in the right sub-tree of a node have keys with values greater than or equal to the node. A binary search tree can be used instead of a linked list whenever it is OK to order nodes based on their value (e.g., if you are sorting of words or numeric data). They can not be used when there is a specific sequential order of nodes that can not be changed (e.g., if you are storing the digits of a number as in the multiplication example of the last topic).
A linked list generally takes linear time to insert a node or search for a node, and it takes linear time to walk through all the nodes in a linked list. A binary tree takes, on average, only logarithmic time (with a base of 2) to insert a node or search for a node; it still takes linear time to traverse all the nodes in a binary tree in order. This is a major improvement; for example, a sort implemented by inserting nodes into a linked list in the order they arrive takes O(N^2) average (and worst-case) time (the same as bubble sort, insertion sort, or selection sort). A sort implemented by inserting nodes into a binary search tree in the order they arrive takes only O(N * log N) average time (the same as merge sort or quicksort). This is an AVERAGE case analysis; it is likely if the data comes in an arbitrary order. The worst case is when the data arrives in an already sorted (or reverse sorted) order, in which case a binary tree behaves the same as a linked list (and takes extra memory, although not a significant amount). There are improvements to binary search trees that guarantee logarithmic time for insertions and searches, and if you go on to take a Data Structures and Algorithms course you will probably learn about this.
The typical way to order nodes in a binary search tree is as follows: The first node becomes the root no matter what. Whenever a new node is created, you start at the root, and if the value is less than the value at the root you move to the left child of the root, and if the data is greater than or equal to the root, you move to the right child. You repeat this process until you hit a NULL, and then, in the position of this NULL, you insert a new leaf (a leaf is a node without any children). It is also possible to store counts with each node so that repeated values do not lead to new nodes; this is the case with the word counting program, and in class we examine an example of this program with both linked lists and binary search trees.
Once a binary search tree exists, it is surprisingly simple to traverse all of the nodes of the tree in sorted order in linear time. Since you know that all nodes in the left subtree of a node have values less than the current node, and all nodes in the right subtree of a node have values greater than or equal to the current node, to traverse all nodes in order, starting at a given node, you simply traverse the nodes of the left subtree in order, then visit the current node, then traverse the right subtree in order. This leads to a simple, recursive inorder traversal routine that can be implemented with a few lines of code.