Growing with the Web

Bucket sort

Published , updated
Tags:

Bucket sort, also known as bin sort, is a distribution sort that works by arranging elements into several ‘buckets’ which are then sorted using another sort, typically insertion sort, and merged into a sorted list.

Bucket sort can be exceptionally fast because of the way elements are assigned to buckets, typically using an array where the index is the value. This means that more auxiliary memory is required for the buckets at the cost of running time than more comparison sorts. It runs in O(n+k) time in the average case where n is the number of elements to be sorted and k is the number of buckets.

Bucket sort moves elements to buckets, then sorts the buckets

While bucket sort is a distribution sort, it typically uses a comparison sort to sort the buckets after they have been allocated.

Complexity

Time Space
Worst case Best case Average case Worst case
O(n^2) O(n + k) O(n + k) O(n + k) auxiliary

When it’s fast

Bucket sort’s best case occurs when the data being sorted can be distributed between the buckets perfectly. If the values are sparsely allocated over the possible value range, a larger bucket size is better since the buckets will likely be more evenly distributed. An example of this is [2303, 33, 1044], if buckets can only contain 5 different values then for this example 461 buckets would need to be initialised. A bucket size of 200-1000 would be much more reasonable.

The inverse of this is also true; a densely allocated array like [103, 99, 119, 112, 111] performs best when buckets are as small as possible.

Bucket sort is an ideal algorithm choice when:

  • The additional O(n + k) memory usage is not an issue
  • Elements are expected to be fairly evenly distributed

When it’s slow

Bucket sort performs at its worst, O(n^2), when all elements at allocated to the same bucket. Since individual buckets are sorted using another algorithm, if only a single bucket needs to be sorted, bucket sort will take on the complexity of the inner sorting algorithm.

This depends on the individual implementation though and can be mitigated. For example a bucket sort algorithm could be made to work with large bucket sizes by using insertion sort on small buckets (due to its low overhead), and merge sort or quicksort on larger buckets.

Bucket sort vs counting sort vs radix sort

There are two main variants of bucket sort; one where there is a bucket for each value, and where buckets hold several values. Bucket sort is often seen as a generalisation of counting sort because bucket sort with a bucket size of 1 is essentially counting sort, just with a more complex implementation. This can be spun the other way as well by saying that bucket sort is counting sort that uses more sophisticated buckets.

Another similar sort, radix sort, also distributes items into buckets. However, they are distributed based on the individual digits within the values themselves.

In summary:

  • Counting sort: buckets hold only a single value
  • Bucket sort: buckets hold a range of values
  • Radix sort: buckets hold values based on digits within their values

Code

The most common implementation of bucket sort works by splitting the array of size n into k buckets, each of which house a value range of n/k. The buckets are then sorted using a simple sorting algorithm that works well on the expected data, such as insertion sort. Buckets are typically implemented using either linked lists or dynamic arrays.

public class BucketSort : IIntegerSortingAlgorithm {
    private const int DefaultBucketSize = 5;

    public void Sort(int[] array) {
        Sort(array, DefaultBucketSize);
    }

    public void Sort(int[] array, int bucketSize) {
        if (array.Length == 0) {
            return;
        }

        // Determine minimum and maximum values
        int minValue = array[0];
        int maxValue = array[0];
        for (int i = 1; i < array.Length; i++) {
            if (array[i] < minValue) {
                minValue = array[i];
            } else if (array[i] > maxValue) {
                maxValue = array[i];
            }
        }

        // Initialise buckets
        int bucketCount = (maxValue - minValue) / bucketSize + 1;
        IList<List<int>> buckets = new List<List<int>>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.Add(new List<int>());
        }

        // Distribute input array values into buckets
        for (int i = 0; i < array.Length; i++) {
            buckets[(array[i] - minValue) / bucketSize].Add(array[i]);
        }

        // Sort buckets and place back into input array
        int currentIndex = 0;
        var insertionSort = new InsertionSort<int>();
        for (int i = 0; i < buckets.Count; i++) {
            int[] bucketArray = buckets[i].ToArray();
            insertionSort.Sort(bucketArray);
            for (int j = 0; j < bucketArray.Length; j++) {
                array[currentIndex++] = bucketArray[j];
            }
        }
    }
}
public class BucketSort {
    private static final int DEFAULT_BUCKET_SIZE = 5;

    public static void sort(Integer[] array) {
        sort(array, DEFAULT_BUCKET_SIZE);
    }

    public static void sort(Integer[] array, int bucketSize) {
        if (array.length == 0) {
            return;
        }

        // Determine minimum and maximum values
        Integer minValue = array[0];
        Integer maxValue = array[0];
        for (int i = 1; i < array.length; i++) {
            if (array[i] < minValue) {
                minValue = array[i];
            } else if (array[i] > maxValue) {
                maxValue = array[i];
            }
        }

        // Initialise buckets
        int bucketCount = (maxValue - minValue) / bucketSize + 1;
        List<List<Integer>> buckets = new ArrayList<List<Integer>>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<Integer>());
        }

        // Distribute input array values into buckets
        for (int i = 0; i < array.length; i++) {
            buckets.get((array[i] - minValue) / bucketSize).add(array[i]);
        }

        // Sort buckets and place back into input array
        int currentIndex = 0;
        for (int i = 0; i < buckets.size(); i++) {
            Integer[] bucketArray = new Integer[buckets.get(i).size()];
            bucketArray = buckets.get(i).toArray(bucketArray);
            InsertionSort.sort(bucketArray);
            for (int j = 0; j < bucketArray.length; j++) {
                array[currentIndex++] = bucketArray[j];
            }
        }
    }
}
/**
 * Sorts an array using bucket sort.
 * @param {number[]} array The array to sort.
 * @param {number} [bucketSize=5] The number of values a bucket can hold.
 * @returns The sorted array.
 */
function sort(array, bucketSize) {
  if (array.length === 0) {
    return array;
  }

  // Determine minimum and maximum values
  var i;
  var minValue = array[0];
  var maxValue = array[0];
  for (i = 1; i < array.length; i++) {
    if (array[i] < minValue) {
      minValue = array[i];
    } else if (array[i] > maxValue) {
      maxValue = array[i];
    }
  }

  // Initialise buckets
  var DEFAULT_BUCKET_SIZE = 5;
  bucketSize = bucketSize || DEFAULT_BUCKET_SIZE;
  var bucketCount = Math.floor((maxValue - minValue) / bucketSize) + 1;
  var buckets = new Array(bucketCount);
  for (i = 0; i < buckets.length; i++) {
    buckets[i] = [];
  }

  // Distribute input array values into buckets
  for (i = 0; i < array.length; i++) {
    buckets[Math.floor((array[i] - minValue) / bucketSize)].push(array[i]);
  }

  // Sort buckets and place back into input array
  array.length = 0;
  for (i = 0; i < buckets.length; i++) {
    insertionSort(buckets[i]);
    for (var j = 0; j < buckets[i].length; j++) {
      array.push(buckets[i][j]);
    }
  }

  return array;
}
def sort(array, bucketSize=DEFAULT_BUCKET_SIZE):
  if len(array) == 0:
    return array

  # Determine minimum and maximum values
  minValue = array[0]
  maxValue = array[0]
  for i in range(1, len(array)):
    if array[i] < minValue:
      minValue = array[i]
    elif array[i] > maxValue:
      maxValue = array[i]

  # Initialize buckets
  bucketCount = math.floor((maxValue - minValue) / bucketSize) + 1
  buckets = []
  for i in range(0, bucketCount):
    buckets.append([])

  # Distribute input array values into buckets
  for i in range(0, len(array)):
    buckets[math.floor((array[i] - minValue) / bucketSize)].append(array[i])

  # Sort buckets and place back into input array
  array = []
  for i in range(0, len(buckets)):
    insertion_sort.sort(buckets[i])
    for j in range(0, len(buckets[i])):
      array.append(buckets[i][j])

  return array
module BucketSort
  include InsertionSort

  # Sorts an array using bucket sort.
  def self.sort(array, bucket_size = 5)
    if array.empty?
      return
    end

    # Determine minimum and maximum values
    min_value = array.min
    max_value = array.max

    # Initialise buckets
    bucket_count = ((max_value - min_value) / bucket_size).floor + 1
    buckets = Array.new(bucket_count)
    (0..buckets.length - 1).each do |i|
      buckets[i] = []
    end

    # Distribute input array values into buckets
    (0..array.length - 1).each do |i|
      buckets[((array[i] - min_value) / bucket_size).floor].push(array[i])
    end

    # Sort buckets and place back into input array
    array.clear
    (0..buckets.length - 1).each do |i|
      InsertionSort.sort buckets[i]
      buckets[i].each do |value|
        array.push(value)
      end
    end
  end
end

Textbooks

Here are two CS textbooks I personally recommend; the Algorithm Design Manual (Steven S. Skiena) is a fantastic introduction to data structures and algorithms without getting to deep into the maths side of things, and Introduction to Algorithms (CLRS) which provides a much deeper, math heavy look.

 

Comments

comments powered by Disqus
Like this article?
Subscribe for more!