How heaps (priority queues) work
Priority queues use a very simple data structure called a "heap". Heaps are a special type of binary tree with exactly two properties:
- Every parent node is smaller than both of its children
- Heaps are "almost complete", meaning that their nodes can be represented as an array in reading order without NULL values.
Rule two is a bit confusing, so here's an example: The array [1, 2, 3, 4, 5] can be placed into a tree with all of its nodes in reading order like this:
If you just read this tree left to right and top to bottom, you get [1, 2, 3, 4, 5]. Not all trees have this property, though. This tree, for example, cannot be represented as an array of values.
This tree would turn into [1, 2, 3, NULL, 5]. With heaps, you're not allowed to have NULL values in the middle of the array like that.
Here are some examples of valid heaps:
The first rule passes. The only parent node (1) is smaller than all of its children (2 and 3). The second rule also passes, because this tree has all of its nodes in reading order. If I read out this tree to you, it would go [1, 2, 3].
Here's another example:
This heap is also valid. Every parent node is smaller than its children, and it can be read in reading order as [2, 5, 3, 12917322, 6, 4, NULL]. There are, of course, NULL elements in this array, but they're all at the end so it's fine.
The following heap, on the other hand, is invalid.
If I put this tree into reading order, it would go [2, 5, 3, 12917322, 6, NULL, 4]. I hit a NULL before I finish reading all of the real nodes, so it breaks rule two.
This heap is also invalid.
The heap is perfectly balanced, but there is a node that is smaller than its parent (1 is less than 2), so it breaks rule one.
Heaps have a few interesting properties, the biggest one being that the root node of a heap is always the smallest node. If it wasn't, then it wouldn't be a valid heap. This gives an opportunity: If we can figure out how to efficiently add and remove from a heap, we can create an efficient priority queue.
Let's try adding the number 0 to this heap.
We don't want to break rule number 2, so we can add our new node to the next spot in reading order.
Now we're breaking rule number 1, because 0 is less than 5. Luckily, we can swap those two nodes without a problem. This operation is guaranteed to be safe. If you make a parent node smaller, you can guarantee that both children will still be bigger than the parent.
Great! We've just made a small section of this heap a little bit more correct. Of course, it's still wrong, 0 is less than 3. We can fix this by swapping the 0 node with its parent again.
Now our heap is entirely correct, with an extra node. This operation is called "bubble-up", because you take a node and you repeatedly swap it upwards until it reaches a safe spot. Let's try it again by adding an 8.
We begin by adding it to the next spot in reading order
Then we realize that this node's parent is too big, so we swap it upwards
And we're done (another swap would have broken the heap). An implementation of this algorithm in pseudocode might look like this:
function bubble_up(node):
// Return if we're safe
if (node == root)
return;
if (node.value > node.parent.value)
return;
// Otherwise, swap up
swap(node, node.parent);
bubble_up(node.parent);
function add_value(value):
new_node = add_node_to_tree_in_next_available_spot(new node(value));
bubble_up(new_node);
We've got adding, how about removing? Let's begin by removing the root node.
Of course, this isn't a valid binary tree anymore, so let's swap the root node with the last node in the tree.
Because we've just taken a node from the bottom and pushed it to the top, this node is definitely going to be too big. We can fix this by swapping it with the smaller of its two children. This operation is also guaranteed to be safe for a similar reason to bubble-up.
It looks like our node is still too big, so we can swap it again.
And we're done. This operation is called "bubble-down". Pseudo-code for this algorithm might look like this:
function bubble_down(node):
// Return if we're good
if (node.child1 == NULL)
return;
if (node.child2 == NULL && node.child1.value < node.value)
return;
if (node.value < node.child1.value && node.value < node.child2.value)
return;
smaller = min(node.child1, node.child2);
swap(node, smaller);
function remove():
ret = root.value;
swap(root, last_node());
bubble_down(root);
return ret;
There is still one more thing about heaps that I haven't talked about. Because heaps can be represented as an array, we usually just store all of the nodes in an array and don't bother with storing any references. When you run
System.out.println(some_priorityqueue);
What you get is that array, where for any node i, its children are i*2 and 2*i+1.