Binary heap: Build heap proof

The binary heap data structure supports a build heap operation that runs in O(n) time. Intuitively it might seem that it should run in O(n \log n) time, since it performs an O(\log n) heapify operation on roughly n/2 nodes. This article provides a proof of the linear run time.

The basic reason the time is linear is that the cost of heapify depends on where the node sits within the heap: it takes O(1) time when the node is a leaf (and leaves make up at least half of the nodes) and O(\log n) time when it’s at the root.
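To make this concrete, here is a minimal sketch of build heap over a 0-indexed array max-heap. The function names `sift_down` and `build_heap` are my own choices for illustration, not from any particular library:

```python
def sift_down(heap, i, n):
    # Repeatedly swap heap[i] with its larger child until the
    # max-heap property holds; cost is O(height of node i).
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest

def build_heap(heap):
    n = len(heap)
    # Indices n//2 .. n-1 are leaves and need no work, which is why
    # only n/2 sift_down calls are made, starting at the last
    # internal node and working up to the root.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(heap, i, n)
```

Note that the loop skips the leaves entirely; the expensive O(\log n) calls only happen near the root, where there are very few nodes.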

The O(n) time can be proven by solving the following:

\sum_{h=0}^{\lfloor\log n\rfloor} \lceil\dfrac{n}{2^{h + 1}} \rceil O(h)

This can be broken down into three parts:

• O(h) is the worst-case time heapify takes for a node at height h
• \lceil\dfrac{n}{2^{h + 1}} \rceil is the maximum number of nodes at any given height h
• \sum_{h=0}^{\lfloor\log n\rfloor} is a summation that sums the values for each height of the tree from 0 to \lfloor \log n \rfloor.
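The per-height node count can be checked empirically. The sketch below (with illustrative helper names) computes each node’s height in a complete binary tree of n nodes stored in the usual array layout, then compares the counts against the \lceil n/2^{h+1}\rceil bound:

```python
import math

def node_heights(n):
    # Height of each node in a complete binary tree of n nodes,
    # array layout: children of node i live at 2i+1 and 2i+2.
    heights = [0] * n
    for i in range(n - 1, -1, -1):
        kids = [heights[c] for c in (2 * i + 1, 2 * i + 2) if c < n]
        heights[i] = 1 + max(kids) if kids else 0
    return heights

def nodes_per_height(n):
    heights = node_heights(n)
    return {h: heights.count(h) for h in range(max(heights) + 1)}

# For n = 100: 50 leaves at height 0, 25 nodes at height 1, and so on;
# each count is at most ceil(n / 2^(h+1)).
for h, count in nodes_per_height(100).items():
    assert count <= math.ceil(100 / 2 ** (h + 1))
```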

Since n doesn’t depend on h, it can be factored out of the summation, and the O(h) term can be moved into the fraction.

= O(n \cdot \sum_{h=0}^{\lfloor\log n\rfloor} \dfrac{h}{2^{h + 1}})

\dfrac{h}{2^{h + 1}} can be rewritten as \dfrac{1}{2} \cdot \dfrac{h}{2^h}, and the factor of \dfrac{1}{2} can be dropped since constants are discarded within big-O notation.

= O(n \cdot \sum_{h=0}^{\lfloor\log n\rfloor} \dfrac{h}{2^h})

Extending the summation to infinity can only make it larger, so it still gives an upper bound, and we’re interested in how the summation behaves as it approaches infinity.

= O(n \cdot \sum_{h=0}^{\infty} \dfrac{h}{2^h})
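The value of this infinite series follows from the standard geometric series manipulation. Starting from the geometric series for |x| < 1:

\sum_{h=0}^{\infty} x^h = \dfrac{1}{1-x}

Differentiating both sides with respect to x and multiplying by x:

\sum_{h=0}^{\infty} h x^h = \dfrac{x}{(1-x)^2}

Substituting x = \dfrac{1}{2}:

\sum_{h=0}^{\infty} \dfrac{h}{2^h} = \dfrac{1/2}{(1/2)^2} = 2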

The summation converges to 2, so it can be replaced by that constant, which is then absorbed into the big-O, leaving O(n).

= O(n \cdot 2) = O(n)
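The convergence to 2 can also be sanity-checked numerically; `partial_sum` below is an illustrative helper, computed exactly with rationals to avoid floating-point noise:

```python
from fractions import Fraction

def partial_sum(H):
    # sum_{h=0}^{H} h / 2^h, computed exactly
    return sum(Fraction(h, 2 ** h) for h in range(H + 1))

print(float(partial_sum(10)))  # 1.98828125
print(float(partial_sum(50)))  # very close to 2
```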

Textbooks

Here are two CS textbooks I personally recommend: The Algorithm Design Manual (Steven S. Skiena) is a fantastic introduction to data structures and algorithms without getting too deep into the math side of things, and Introduction to Algorithms (CLRS) provides a much deeper, math-heavy treatment.