Sunday, September 13, 2009

Minimum Number of Comparisons to Find the Median

Last week in my Advanced Algorithms class, we discussed how to create a deterministic algorithm to find the kth smallest element in a list of size n. An algorithm that solves this problem is called a Selection algorithm.

The first step in the deterministic algorithm is to divide the original list into ceil(n/5) lists (each of size 5). The next step is to find the median of each 5 element list. Because 5 is a constant (with respect to n), it is acceptable to first sort each 5 element list and then extract the media (without adversely effecting the runtime).

Now, the Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition) by Kunth says that sorting 5 elements can be done in 7 comparisons. That is great, but we actually do not need to sort these 5 elements. We only need to find the median. Our class found some lecture notes from another class where the professor mentioned in passing that the median of 5 elements can be found with only 6 comparisons, but he never expained how to do this.

I was not aware of nor have I found any results of the form: Finding the median of n elements requires c comparisons. However, I was able to discover how to find the median of 5 elements by using only 6 comparisons. It was not in a paper or even some professional-looking site. It was in a forum which I am only able to view using Google's cached copy. About halfway down the page is the following figure:

This is a remarkably thorough answer. Other than two meaningless typos at the bottom (where he should have said "(C>D gives full ordering)" on the left side and should have said "(C>E gives a full ordering)" on the right side), this solution clearly shows that 6 comparisons is sufficient.

It seems unlikely that there could be a way to find the median of 5 elements with only 5 comparisons, but I have not seen any proof of such a fact. If you know a proof that 6 is indeed the minimum number of comparisons necessary to find the median of 5 elements or if you know any results for lists of larger sizes, please leave a comment.

8 comments:

  1. I found one, but there's not enough space to write it down.

    ReplyDelete
  2. Can you publish it on another site and link to it from here?

    ReplyDelete
  3. In number theory, Fermat's Last Theorem states that no three positive integers a, b, and c can satisfy the equation an + bn = cn for any integer value of n greater than two. This theorem was first conjectured by Pierre de Fermat in 1637, famously in the margin of a copy of Arithmetica where ****he claimed he had a proof that was too large to fit in the margin.****
    Pretty sure you were trolled

    ReplyDelete
  4. I know Fermat's statement and that anon1 was trolling. I was just playing along for some fun ;)

    ReplyDelete
  5. I don't mean to prey but I would prefer to see an algorithm instead of this nonconstructive proof. Moreover, it's absolutely unclear. It is not stated anywhere what these "steps" are. I don't understand it at all.

    ReplyDelete
  6. These steps are constructive. For a different reference, see this recent TCS.SE question:
    http://cstheory.stackexchange.com/questions/12257/exact-number-of-comparisons-to-compute-the-median

    ReplyDelete
  7. Looking at it from an information theory view, there's no theoretical problem with an ability to solve for the median of a list of 5 in 5 steps. There are 120 ways to organize a list of 5 unique items. If C is the median element, you need to know that A>C, B>C, C>D, and C>E. It doesn't matter whether A>B or B>A, and likewise it doesn't matter if D>E or E>D. So 4 permutations out of the 120 satisfy the condition that C is the median. Here's a visualization:

    A B
    \ /
    C
    / \
    D E

    So you need to select 1/30 of the permutations in 5 binary operations. Since 5 binary operations gives you a selection power of 2^5, or 32, this is naively possible. However, it involves using 4.91 of your 5 bits of optimization power, so if it is possible you would expect it to be very, very hard. I have not managed it, and I think (p ≈ 0.99) that it's not possible.

    Actually, I have a proof that it's not.

    There are 120 permutations.

    Compare A and B.
    A > B, there are 60 permutations (B>A is an identical case)

    Compare A and C.
    A > B and A > C, there are 20 permutations
    A > B and C > A, there are 40 permutations
    You have 4 working permutations out of 40, and only 3 binary decisions (a factor of 8) in which to narrow them down. Therefore, if you compare A and C (or, equivalently, A and B), you can't always solve the problem.

    So don't compare A and C. Compare C and D instead.
    A > B and C > D, there are 30 permutations. All other cases are identical.

    Then compare A and E.
    A > B and C > D and A > E: There are 20 permuations. You have 2 decisions left to narrow those 20 to 4, so you can't compare A and E.

    So don't compare A and E. Compare A and D instead.
    A > B and C > D and A > D: There are 25 permuations. You have 2 decisions left to narrow those 25 to 4, so you can't compare A and E.

    So don't compare A and D. Compare A and C instead.
    A > B and C > D and A > C: there are 15 permutations (the other cases are identical). You have 2 decisions to narrow that 15 to 4, which may be possible.

    Compare D and E.
    A > B and C > D and A > C and D > E: There are 4 permutations.
    A > B and C > D and A > C and E > D: There are 11 permutations. You have 1 decision to narrow that to 4, so you lose.

    So don't compare D and E. Compare C and E:
    A > B and C > D and A > C and C > E: There are 8 permutations. You have 1 decision to narrow that to 4, which may be possible.
    A > B and C > D and A > C and E > C: There are 7 permutations. You have 1 decision to narrow that to 4, which may be possible.

    We'll come back to that. Compare B and E:
    A > B and C > D and A > C and B > E: There are 6 permutations. You have 1 decision to narrow that to 4, which may be possible.
    A > B and C > D and A > C and E > B: There are 9 permutations. You have 1 decision to narrow that to 4, which is not possible.

    Compare A and E.
    A > B and C > D and A > C and A > E: There are 12 permutations. You have 1 decision to narrow that to 4, which is not possible.

    So back to comparing C and E:
    A > B and C > D and A > C and C > E:
    Compare D and E:
    A > B and C > D and A > C and C > E and D > E: 4 permuatations, but not the right four. Whoops.
    Compare B and E:
    A > B and C > D and A > C and C > E and B > E: 5 permuatations. Whoops.
    Compare A and E:
    A > B and C > D and A > C and C > E and A > E: 8 permuatations. Whoops.

    So you can't do it. You can almost do it with the A-B, C-D, A-C, C-E, D-E/A-E method, but not quite.

    ReplyDelete
  8. This TCS.SE question [1] also quotes AoCP III but says the minimum number to find the median of 5 elements is 6. I must have been looking in the wrong place or at an older version.

    [1] http://cstheory.stackexchange.com/questions/12257/exact-number-of-comparisons-to-compute-the-median

    ReplyDelete