Multinomial derangements

A few days ago, Amy asked me an interesting question. Say you have a deck of N cards, n1 of one type, n2 of the second type, and so on, and you shuffle them all together. Then you predict the order in which you will see all of the cards, and flip them over one-by-one. How many will you guess right, on average?

This problem turned out to be tricker than I thought it would be, and since it doesn’t show up on random internet searches, I figured it would be good to work through the problem here. It’s related to the problem of counting derangements (permutations of N elements with no fixed point), which was solved by Bernoulli in the early 1700’s; but the version with groups of identical objects turns out to be slightly messier. In the same way that multinomials generalize binomials, these “multinomial derangements” generalize ordinary derangements.

The problem as initially stated wants us to compare two shuffles of the set of elements — the shuffle which the guesser predicts, and the actual shuffle of the deck of cards — to see how many elements match. We can simplify this off the bat by applying the reverse of the guesser’s permutation to both sides; we should get the same probability of a match if the guesser always picks the identity permutation, with the cards arranged with all the type 1 cards first, then all the type 2 cards, etc. So now we simply need to ask: if we arbitrarily pick a permutation of these objects, what is the probability that M of them will be in their “correct” positions?

Counting permutations

First, of course, we know the number of permutations of $(n_1, \ldots n_k)$ objects to simply be the multinomial coefficient:

$N_p(n_1, \ldots n_k) = \binom{n_1 + \cdots + n_k}{n_1\ n_2\ \cdots n_k} = \frac{(n_1+\cdots+n_k)!}{n_1! n_2! \cdots n_k!}$

The probability of getting M correct is the number of permutations which get exactly M correct divided by this. To count the number of permutations with exactly M elements correct, we sum up over all the ways to fix M elements correctly, $m_1$ of type 1, etc.:

$N(M; n_1, \ldots n_k) = \sum_{m_i} \binom{n_1}{m_1}\cdots\binom{n_k}{m_k} d(n_1-m_1, \ldots n_k-m_k)$

where the sum goes over all $m_i$ such that $0\le m_i < n_i$ and $\sum m_i = M$ , and $d(n_1, \ldots n_k)$ is the number of derangements of elements of the given types: i.e., the number of permutations of so many elements which leave none of the elements in their original places. (This count is thus the number of “partial derangements.”)

The structure of this sum will actually be useful to us repeatedly in this calculation; for any function $f(n_1, \ldots n_k)$ which counts ways to arrange a collection of elements of identical types, we can define

$PSum(f; M; n_1, \ldots n_k) \equiv \sum_{m_i} \binom{n_1}{m_1} \cdots \binom{n_k}{m_k} f(n_1-m_1, \ldots n_k-m_k)$

which is the number of ways to arrange the collection when exactly M elements (of any types, thus the sum) are held fixed, and the remaining elements are distributed according to $f$ .

Counting derangements

Now we simply need to count the number of derangements. We do this by generalizing Bernoulli’s original proof. Let $A_i$ be the set of all permutations of our N elements which keep the i’th element fixed. (For $0\le i < N$ ) Then $A_1\cup\cdots\cup A_N$ is the set of all permutations which leave at least one element fixed; and

$d(n_1, \ldots n_k) = N_p(n_1, \ldots n_k) - \left|A_1\cup\cdots\cup A_N\right|\ .$

We can calculate the size of this set using the inclusion-exclusion principle:

$\left|A_1 \cup \cdots \cup A_N \right|=\sum_{p=1}^{N} (-1)^{p+1} \sum_{i_1\cdots i_p} \left|A_{i_1} \cap \cdots \cap A_{i_p}\right|$ ,

where the inner sum runs over $0 \le i_1 < i_2 \cdots < i_p < N$ . Now, the innermost summand doesn’t actually depend on the exact values of each $i_\alpha$ ; it only depends on the number of indices of each type. In fact, if for any particular term in the sum the indices break down into $p_1$ of type 1, etc., then $\left|A_{i_1} \cap \cdots \cap A_{i_p}\right|$ is simply the number of permutations remaining after $p_1$ elements of type 1 are fixed, $p_2$ elements of type 2, etc., which is simply $N_p(n_1-p_1, n_2-p_2, \ldots n_k-p_k)$ . Since this summand only depends on the counts $p_x$ , we can collapse the sum over every possible $i_\alpha$ into a sum over all the values of $p_1\ldots p_k$ , where $0\le p_x < n_x$ , and $\sum p_x = p$ . This is exactly the same sum structure we saw above, and so

$d(n_1, \ldots n_k) = N_p(n_1, \ldots n_k) + \sum_{p=1}^{N} (-1)^p PSum(N_p; p; n_1, \ldots n_k)\ .$

Putting it all together

We can therefore compute the probability of guessing M out of N elements correctly:

$P(M; n_1, \ldots n_k) = \frac{1}{N_p(n_1, \ldots n_k)} PSum(d; M; n_1, \ldots n_k)$

Because of the fairly lengthy sums, this is impractical to evaluate by hand for more than small collections. Tools like Mathematica make it considerably easier; since WordPress won’t let me upload a .nb file, I saved the code (with comments) as a Word file instead for those who want it: Partial Derangements.* We can test it out by asking it a simple case which we can also verify by hand, $n_1 = n_2 = 2$ . It gives

0       0.166666

1       0

2       0.666666

3       0

4       0.166666

We can easily verify this by enumerating the six possible permutations: AABB, ABAB, BAAB, ABBA, BABA, and BBAA. The first of these is the “canonical” order; it (obviously) matches all four elements in the right place. The next four match two elements; the last one, BBAA, is a total derangement of all the elements and matches zero. (Or in terms of the original question: if you guessed AABB and got each of these shuffles, you would get 4, 2, 2, 2, 2, or 0 correct, respectively)

The question I was originally asked was harder, fifteen cards, three of each of five types. Using the Mathematica code, we get:

0       0.0398401

1       0.137505

2       0.228097

3       0.24121

4       0.181806

5       0.103463

6       0.0459281

7       0.0162296

8       0.00461753

9       0.00107095

10      0.000198444

11      0.0000318432

12      3.21107 * 10^-6

13      5.35179 * 10^-7

14      0

15      5.94643 * 10^-9

Note the probability of exactly zero for getting 14 right; this makes good sense, as if you arrange fourteen of the fifteen cards correctly, it’s easy to see that the fifteenth card has no incorrect place left to go. It’s also more likely to get at least a few right by chance, which given the number of identical cards is intuitively natural as well.

* NB that I’m writing Mathematica like a computer scientist; if this looks little like any Mathematica code you’ve read, it’s because the language is actually a functional derivative of Scheme, with some very ugly syntactic sugar meant to make it look procedural. But said sugar is clunky and slow, more like syntactic strychnine, and so one ends up writing the native functional stuff. Unfortunately, functional Mathematica is somewhat of a write-only language, which I’ve tried to ameliorate with comments — but seriously, this is hard to parse, I know. This is what’s known in CS as a “dysfunctional programming language.”

Published in:

Uncategorized

on January 17, 2011 at 12:00 Comments Off
Tags: math, science

Yonatan Zunger’s Blog

From Machine Learning to the Middle East