A few days ago, Amy asked me an interesting question. Say you have a deck of N cards, n1 of one type, n2 of the second type, and so on, and you shuffle them all together. Then you predict the order in which you will see all of the cards, and flip them over one-by-one. How many will you guess right, on average?
This problem turned out to be tricker than I thought it would be, and since it doesn’t show up on random internet searches, I figured it would be good to work through the problem here. It’s related to the problem of counting derangements (permutations of N elements with no fixed point), which was solved by Bernoulli in the early 1700’s; but the version with groups of identical objects turns out to be slightly messier. In the same way that multinomials generalize binomials, these “multinomial derangements” generalize ordinary derangements.
The problem as initially stated wants us to compare two shuffles of the set of elements — the shuffle which the guesser predicts, and the actual shuffle of the deck of cards — to see how many elements match. We can simplify this off the bat by applying the reverse of the guesser’s permutation to both sides; we should get the same probability of a match if the guesser always picks the identity permutation, with the cards arranged with all the type 1 cards first, then all the type 2 cards, etc. So now we simply need to ask: if we arbitrarily pick a permutation of these objects, what is the probability that M of them will be in their “correct” positions?
First, of course, we know the number of permutations of objects to simply be the multinomial coefficient:
The probability of getting M correct is the number of permutations which get exactly M correct divided by this. To count the number of permutations with exactly M elements correct, we sum up over all the ways to fix M elements correctly, of type 1, etc.:
where the sum goes over all such that and , and is the number of derangements of elements of the given types: i.e., the number of permutations of so many elements which leave none of the elements in their original places. (This count is thus the number of “partial derangements.”)
The structure of this sum will actually be useful to us repeatedly in this calculation; for any function which counts ways to arrange a collection of elements of identical types, we can define
which is the number of ways to arrange the collection when exactly M elements (of any types, thus the sum) are held fixed, and the remaining elements are distributed according to .
Now we simply need to count the number of derangements. We do this by generalizing Bernoulli’s original proof. Let be the set of all permutations of our N elements which keep the i’th element fixed. (For ) Then is the set of all permutations which leave at least one element fixed; and
We can calculate the size of this set using the inclusion-exclusion principle:
where the inner sum runs over . Now, the innermost summand doesn’t actually depend on the exact values of each ; it only depends on the number of indices of each type. In fact, if for any particular term in the sum the indices break down into of type 1, etc., then is simply the number of permutations remaining after elements of type 1 are fixed, elements of type 2, etc., which is simply . Since this summand only depends on the counts , we can collapse the sum over every possible into a sum over all the values of , where , and . This is exactly the same sum structure we saw above, and so
Putting it all together
We can therefore compute the probability of guessing M out of N elements correctly:
Because of the fairly lengthy sums, this is impractical to evaluate by hand for more than small collections. Tools like Mathematica make it considerably easier; since WordPress won’t let me upload a .nb file, I saved the code (with comments) as a Word file instead for those who want it: Partial Derangements.* We can test it out by asking it a simple case which we can also verify by hand, . It gives
We can easily verify this by enumerating the six possible permutations: AABB, ABAB, BAAB, ABBA, BABA, and BBAA. The first of these is the “canonical” order; it (obviously) matches all four elements in the right place. The next four match two elements; the last one, BBAA, is a total derangement of all the elements and matches zero. (Or in terms of the original question: if you guessed AABB and got each of these shuffles, you would get 4, 2, 2, 2, 2, or 0 correct, respectively)
The question I was originally asked was harder, fifteen cards, three of each of five types. Using the Mathematica code, we get:
12 3.21107 * 10^-6
13 5.35179 * 10^-7
15 5.94643 * 10^-9
Note the probability of exactly zero for getting 14 right; this makes good sense, as if you arrange fourteen of the fifteen cards correctly, it’s easy to see that the fifteenth card has no incorrect place left to go. It’s also more likely to get at least a few right by chance, which given the number of identical cards is intuitively natural as well.
* NB that I’m writing Mathematica like a computer scientist; if this looks little like any Mathematica code you’ve read, it’s because the language is actually a functional derivative of Scheme, with some very ugly syntactic sugar meant to make it look procedural. But said sugar is clunky and slow, more like syntactic strychnine, and so one ends up writing the native functional stuff. Unfortunately, functional Mathematica is somewhat of a write-only language, which I’ve tried to ameliorate with comments — but seriously, this is hard to parse, I know. This is what’s known in CS as a “dysfunctional programming language.”