Growing with the Web

Find the median of two sorted arrays

Published , updated
Tags:

This article looks at the interview question - Find the median of two sorted arrays. For example an input of [1, 7, 8] and [2, 5, 6, 9] should result in the output 6.

Analysis

The trivial solution

Let’s start with the trivial case, this problem could easily be solved by merging the two arrays into a single array. Merging can use the merge portion of merge sort which runs in O(n) time, and then the median could be retrieved by grabbing the middle element(s) in O(1) time. But is it possible to get the median using a sub-linear solution?

Same length arrays

Notice that the example arrays feature varying lengths, this is an important point to take note of as it puts an additional constraint on the solution. Let’s work against this for now and come up with a solution that works with same length arrays, then validate it against varying length arrays.

One way to approach the problem is to evaluate which values can be trimmed from the arrays without changing the median. Trimming both the smallest and largest elements from a list of numbers will not change the median of the list unless the list has a size of either 1 or 2.

[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7]

Using the above, the local median of each array can be determined in constant time and then used to trim either side. If the local medians of both arrays are equal, then that is the median for both arrays combined. If not, the smaller local median can have elements less than the median discarded, while the other array has elements greater than the median discarded.

Consider the following example:

A = [1, 2, 3, 4, 5]
B = [2, 3, 4, 5, 6]

median(A) = 3
median(B) = 4

Because A’s local median is smaller than B’s, discard the first 2 elements from A ([1, 2]) and the last 2 from B ([5, 6]).

A = [1, 2, 3, 4, 5] = [3, 4, 5]
B = [2, 3, 4, 5, 6] = [2, 3, 4]

median(A) = 4
median(B) = 3

Because B’s local median is smaller than A’s, discard the first element from A ([5]) and the last element from B ([4])

A = [3, 4, 5] = [3, 4]
B = [2, 3, 4] = [3, 4]

median(A) = median(B) = 3.5

Since the median of both A and B are equal, this is the median of both arrays.

This algorithm also works going down to single elements in the arrays too. Here’s another example:

A = [-1, 0, 1, 2, 5]
B = [2, 3, 4, 5, 6]

A = [-1, 0, 1, 2, 5]
B = [2, 3, 4, 5, 6]

A = [1, 2, 5]
B = [2, 3, 4]

A = [2, 5]
B = [2, 3]

median = average(A[0], B[0]) = 2.5

Varying length arrays

The above solution works fine with arrays of the same length, what about those with different lengths?

Consider the following example:

A = [1, 2, 4, 7, 9]
B = [3, 5, 6, 8]

A = [1, 2, 4, 7, 9]
B = [3, 5, 6, 8]

A = [2, 4, 7, 9]
B = [3, 5, 6]

A = [2, 4, 7]
B = [5, 6]

A = [4, 7]
B = [5]

Now what? The median of the sequence [1, 2, ..., 9] is 5, but it seems like it’s about to get cut off of the array. We need to create a base case to stop this from occurring; when one of the lists is of length 1.

There are a few ways in which implementing this base case could be approached. An easy approach is to compare the number from the single list (5) against the first and last elements in the list and remove the highest and lowest numbers. Sure this works, but this same situation could happen with a much larger A which would take 2/n operations, meaning this would push the time complexity of the entire solution to O(n).

Another solution could be when a single element list occurs, merge the element into the larger list using a binary search insert. Unfortunately since we’re dealing with a raw array, insertion will still take O(n) time since the array would need to be reallocated to expand.

What we should do is look at the minimal problem set in order to come up with a solution that takes the smallest amount of time. For the above example, where A is an array with 2 elements and B is an array with a single element, there are 3 possible results:

If B[0] < A[0]: median = A[0]
If B[0] > A[1]: median = A[1]
Otherwise:      median = B[0]

This works for the general case since any values outside of the middle 2 can be discarded at this point. An odd number of values in A can also be broken down like this:

If B[0] < A[0]: median = (A[1] + A[0]) / 2
If B[0] > A[2]: median = (A[1] + A[2]) / 2
Otherwise:      median = (A[1] + B[0]) / 2

There need not be only 2 or 3 elements in A for this to work as we’re only interested in the middle elements of A.

Yet another case

There is one more case that needs to be covered that was found by Tuxdude in the comments. The case is when there are two values in one array and an even number (2 or more) in the other array, but the first array’s values are the median.

A test case demonstrating the issue is:

A = [-50, -47, -36, -35, 0, 13, 14, 16]
B = [-31, 1, 9, 23, 30, 39]

Following the unchanged algorithm would see the arrays being trimmed to the point where A = [0] and B = [9], resulting in a median of 4.5. However, the median is actually 5 so an additional base case needs to be introduced to handle this.

The base case I came up with to solve this issue is if one array’s length is 2 and the other’s is greater than or equal to 2 in addition to being even. After this is checked, if the first array belongs in sorted order in the middle of the other array, then the median is the first array.

Code

public class Solution {
    /**
     * Gets the median of two sorted arrays.
     *
     * @param A The first sorted array.
     * @param B The second sorted array.
     * @return The median of A and B.
     */
    public static double findMedian(int[] A, int[] B) {
        // Both arrays must have length >= 1.
        if (A.length == 0 && B.length == 0) {
            throw new IllegalArgumentException(
                    "Both arrays cannot be zero-length");
        }

        return findMedianInternal(A, 0, A.length - 1, B, 0, B.length - 1);
    }

    /**
     * Gets the median of two sorted arrays.
     *
     * @param A The first sorted array.
     * @param aStart The start index of A to use.
     * @param aEnd The end index of A to use.
     * @param B The second sorted array.
     * @param bStart The start index of B to use.
     * @param bEnd The end index of B to use.
     * @return The median of A and B.
     */
    private static double findMedianInternal(
            int[] A, int aStart, int aEnd, int[] B, int bStart, int bEnd) {
        if (A.length == 0) {
            return medianOfArray(B, bStart, bEnd);
        }
        if (B.length == 0) {
            return medianOfArray(A, aStart, aEnd);
        }

        int aLength = (aEnd - aStart) + 1;
        int bLength = (bEnd - bStart) + 1;
        if (aLength == 1 && bLength == 1) {
            return (A[aStart] + B[bStart]) / 2d;
        }
        if (aLength == 1 && bLength > 1) {
            return findMedianOfArrayAndValue(B, bStart, bEnd, A[aStart]);
        }
        if (bLength == 1 && aLength > 1) {
            return findMedianOfArrayAndValue(A, aStart, aEnd, B[bStart]);
        }
        
        if (aLength == 2 && bLength >= 2 && bLength % 2 == 0) {
            if (areValuesInMiddleOfEvenArray(B, bStart, bEnd, A[aStart], A[aEnd])) {
                return (A[aStart] + A[aEnd]) / 2d;
            }
        }
        if (bLength == 2 && aLength >= 2 && aLength % 2 == 0) {
            if (areValuesInMiddleOfEvenArray(A, aStart, aEnd, B[bStart], B[bEnd])) {
                return (B[bStart] + B[bEnd]) / 2d;
            }
        }

        double medianA = medianOfArray(A, aStart, aEnd);
        double medianB = medianOfArray(B, bStart, bEnd);
        if (medianA == medianB) {
            return medianA;
        }
        int maxDiscardable = Math.min((aEnd - aStart) / 2 - 1,
                                      (bEnd - bStart) / 2 - 1);
        if (maxDiscardable < 1) {
            maxDiscardable = 1;
        }
        if (medianA < medianB) {
            return findMedianInternal(A, aStart + maxDiscardable, aEnd,
                                      B, bStart, bEnd - maxDiscardable);
        }
        return findMedianInternal(A, aStart, aEnd - maxDiscardable,
                                  B, bStart + maxDiscardable, bEnd);
    }

    /**
     * Gets whether two values belong in sorted order in the middle of an array.
     * 
     * @param array The sorted array.
     * @param start The start of the array to use.
     * @param end The end of the array to use.
     * @param small The smaller number.
     * @param large The larger number.
     * @return Whether the two values belong in the middle.
     */
    private static boolean areValuesInMiddleOfEvenArray(
            int[] array, int start, int end, int small, int large) {
        int midIndex = (end + start) / 2;
        return small > array[midIndex] && large < array[midIndex + 1];
    }

    /**
     * Gets the median of a sorted array and an additional value.
     *
     * @param array The sorted array.
     * @param start The start of the array to use.
     * @param end The end of the array to use.
     * @param value The additional value to include in median calculation.
     * @return The median of the array including the additional value.
     */
    private static double findMedianOfArrayAndValue(
            int[] array, int start, int end, int value) {
        double arrayMedian = medianOfArray(array, start, end);
        if (arrayMedian == value) {
            return arrayMedian;
        }
        // If array[start..end].length is even
        if ((end - start) % 2 == 1) {
            return findMedianOfArrayAndValueEvenCase(array, start, end, arrayMedian, value);
        }
        return findMedianOfArrayAndValueOddCase(array, start, end, arrayMedian, value);
    }

    /**
     * Gets the median of a sorted array with an odd number of elements plus an
     * additional value.
     *
     * @param array The sorted array.
     * @param start The start of the array to use.
     * @param end The end of the array to use.
     * @param arrayMedian The median of the array excluding the additional
     * value.
     * @param value The additional value to include in median calculation.
     * @return The median of the array including the additional value.
     */
    private static double findMedianOfArrayAndValueOddCase(
            int[] array, int start, int end, double arrayMedian, int value) {
        int midIndex = (end + start) / 2;
        int left = array[midIndex - 1];
        int mid = array[midIndex];
        int right = array[midIndex + 1];
        if (value < left) {
            return (mid + left) / 2d;
        }
        if (value > right) {
            return (mid + right) / 2d;
        }
        return (mid + value) / 2d;
    }

    /**
     * Gets the median of a sorted array with an even number of elements plus an
     * additional value.
     *
     * @param array The sorted array.
     * @param start The start of the array to use.
     * @param end The end of the array to use.
     * @param arrayMedian The median of the array excluding the additional
     * value.
     * @param value The additional value to include in median calculation.
     * @return The median of the array including the additional value.
     */
    private static double findMedianOfArrayAndValueEvenCase(
            int[] array, int start, int end, double arrayMedian, int value) {
        if (arrayMedian > value) {
            int left = array[(end - start) / 2];
            return Math.max(left, value);
        } else {
            int right = array[(end - start) / 2 + 1];
            return Math.min(right, value);
        }
    }

    /**
     * Gets the median of a single sorted array.
     *
     * @param array The sorted array to get the median of.
     * @param start The start of the array to use.
     * @param end The end of the array to use.
     * @return The median of the array.
     */
    private static double medianOfArray(int[] array, int start, int end) {
        if (start == end) {
            return array[start];
        }
        int mid = (start + end) / 2;
        // If array[start..end].length is even
        if ((end - start) % 2 == 1) {
            return (array[mid] + array[mid + 1]) / 2d;
        }
        return array[mid];
    }
}
/**
 * Gets the median of a single sorted array.
 *
 * @param {number[]} array The sorted array to get the median of.
 * @return {number} The median of the array.
 */
function medianOfArray(array) {
  var mid = Math.floor(array.length / 2);
  if (array.length % 2 === 0) {
    return (array[mid] + array[mid - 1]) / 2;
  }
  return array[mid];
}

/**
 * Gets the median of a sorted array with an even number of elements plus an
 * additional value.
 *
 * @param {number[]} array The sorted array.
 * @param {number} arrayMedian The median of the array excluding the additional
 * value.
 * @param {number} value The additional value to include in median calculation.
 * @return {number} The median of the array including the additional value.
 */
function findMedianOfArrayAndValueEvenCase(array, arrayMedian, value) {
  if (arrayMedian > value) {
    var left = array[array.length / 2 - 1];
    return Math.max(left, value);
  } else {
    var right = array[array.length / 2];
    return Math.min(right, value);
  }
}

/**
 * Gets the median of a sorted array with an odd number of elements plus an
 * additional value.
 *
 * @param {number[]} array The sorted array.
 * @param {number} arrayMedian The median of the array excluding the additional
 * value.
 * @param {number} value The additional value to include in median calculation.
 * @return {number} The median of the array including the additional value.
 */
function findMedianOfArrayAndValueOddCase(array, arrayMedian, value) {
  var midIndex = (array.length - 1) / 2;
  var left = array[midIndex - 1];
  var mid = array[midIndex];
  var right = array[midIndex + 1];
  if (value < left) {
    return (mid + left) / 2;
  }
  if (value > right) {
    return (mid + right) / 2;
  }
  return (mid + value) / 2;
}

/**
 * Gets the median of a sorted array and an additional value.
 *
 * @param {number[]} array The sorted array.
 * @param {number} value The additional value to include in median calculation.
 * @return {number} The median of the array including the additional value.
 */
function findMedianOfArrayAndValue(array, value) {
  var arrayMedian = medianOfArray(array);
  if (arrayMedian === value) {
    return arrayMedian;
  }
  if (array.length % 2 === 0) {
    return findMedianOfArrayAndValueEvenCase(array, arrayMedian, value);
  }
  return findMedianOfArrayAndValueOddCase(array, arrayMedian, value);
}

/**
 * Gets the median of two sorted arrays.
 *
 * @param {number[]} A The first sorted array.
 * @param {number[]} B The second sorted array.
 * @returns {number} The median of A and B.
 */
function findMedian(A, B) {
  if (A.length === 0 && B.length === 0) {
    return undefined;
  }
  if (A.length === 1 && B.length === 1) {
    return (A[0] + B[0]) / 2;
  }
  if (A.length === 0) {
    return medianOfArray(B);
  }
  if (B.length === 0) {
    return medianOfArray(A);
  }
  if (A.length === 1) {
    return findMedianOfArrayAndValue(B, A[0]);
  }
  if (B.length === 1) {
    return findMedianOfArrayAndValue(A, B[0]);
  }
  if (A.length === 2 && B.length >= 2 && B.length % 2 === 0) {
    if (areValuesInMiddleOfEvenArray(B, A[0], A[1])) {
      return (A[0] + A[1]) / 2;
    }
  }
  if (B.length === 2 && A.length >= 2 && A.length % 2 === 0) {
    if (areValuesInMiddleOfEvenArray(A, B[0], B[1])) {
      return (B[0] + B[1]) / 2;
    }
  }

  var medianA = medianOfArray(A);
  var medianB = medianOfArray(B);
  if (medianA === medianB) {
    return medianA;
  }
  var maxDiscardable = Math.floor(Math.min(A.length / 2 - 1, B.length / 2 - 1));
  if (maxDiscardable < 1) {
    maxDiscardable = 1;
  }
  if (medianA < medianB) {
    A.splice(0, maxDiscardable);
    B.splice(B.length - maxDiscardable);
  } else {
    A.splice(A.length - maxDiscardable);
    B.splice(0, maxDiscardable);
  }
  return findMedian(A, B);
}

Comments

comments powered by Disqus
Like this article?
Subscribe for more!