## Card Shuffling I

Just about anyone interested in mathematics has studied a little probability and probably done some easy analysis of basic card games and dice games. Completely off topic for a second, am I the only one who has noticed that basic probability homework exercises are the only situation, aside from funerals, that anyone will ever use the word “urn?” For whatever reason, probabilists love that word. Anyway, in any real card game, the computations tend to get complicated rather quickly, and most people get turned off from the discussion. With some ingenuity, however, one can answer some pretty cool (but initially difficult seeming) questions without having to go through a lot of tedious computations.

Take as an example card shuffling. In the face of expert card-counters, the natural question for the dealer is how many times he or she has to shuffle the deck before it’s well-mixed. In the case when the dealer is also playing the game — and is a card-counter at the level of a member of the MIT black jack team, say — he or she could drastically improve their odds by using a shuffling method which seems to shuffle the deck well, but actually is very poor at doing so. Anyway, at this point the question is slightly ill-posed, as we have no obvious way to interpret the word mixed, let alone well. In fact, coming up with a mathematical model of what shuffling means is already fairly difficult. What I’m hoping to do is give a framework which makes the problem more tractable.

One can take the following abstract point of view. The fundamental object we are studying is the symmetric group $S_n$ where, more often than not, $n=52$. Each element of $S_n$ corresponds to a particular way to shuffle the deck. Alternatively, one can think of every element of $S_n$ as a particular ordering of the deck, starting from some prescribed order (i.e. however the deck was ordered when we took it out of the box). The identity element corresponds to the “no-shuffle” shuffle (alternatively, the original order). Transpositions $\left(\begin{array}{cc} i & j \\ \end{array}\right)$ correspond to interchanging the cards at position i and j, respectively, and so on. To model a collection of shuffles, one defines a probability measure on $S_n.$ For example, one could put the cards in the deck side-by-side and shuffle as follows: with your left hand pick a card uniformly at random, and with your right hand pick a card uniformly at random, then interchange the two cards. The probability measure defined by this rule is as follows:

$\displaystyle{\begin{array}{l} P(g) = 1/n\textrm{ if }g=\textrm{ identity} \\ P(g)=2/n^2\textrm{ if }g=\left(\begin{array}{cc} i & j \\ \end{array}\right) \\P(g)=0\textrm{ otherwise.}\\\end{array}}$

Once one has a probability measure $P$, one can define the transition matrix $M=(p(s,t))$ so that $p(s,t)=P(st^{-1}).$ Heuristically, the $s,t^{\textrm{th}}$ entry of $M$ corresponds to the probability of starting at ordering $t$ and ending at ordering $s$. As $M$ has all non-negative entries and the rows sum to 1, this matrix corresponds to a Markov Chain (which one might call a random walk on $S_n$). It is worth noting here that the matrix $M$ has $(n!)^2$ entries, which is something to the tune of $10^{135}$ when $n=52.$ Not much help from a computer to be found with numbers like that kicking around. The $k^{\textrm{th}}$ power of $M$, then correspond to the transition probabilities of going from state $t$ to $s$ (this, for example, proves the Chapman-Kolmogorov equations for finite state spaces). Working backwards, one can use $M^k$ to produce a probability distribution $P_k$ by $P_k(s)=(M^k)_{s,\epsilon}.$ This $P_k$ really corresponds to the $k$-fold convolution of $P$ with itself, where convolution means the usual thing, i.e.

$\displaystyle{(P*Q)(s)=\sum_{t\in S_n}P(t)Q(st^{-1})}$

Heuristically, this new measure represents the probability of starting from a standard order getting to any other order by way of $k$ shuffles.

Okay, with this framework in mind, one can now define the difference $||P-Q||$ between two probability distributions, $P$ and $Q$:

$\displaystyle{||P-Q||=\frac{1}{2}\sum_{g\in S_n}|P(g)-Q(g)|}$

This is a pretty intuitive idea for distance; aside from the factor of $1/2$, the formula is basically the $L^1$ norm. We will mostly be interested in this quantity when $P=P_k$ and $Q=U$ where $U$ is the uniform distribution, i.e. $U(g)=1/n!$. One key feature of this difference, is that it is (in a sense) submultiplicative with respect to convolution:

$\displaystyle{||(P-U)*(Q-U)||\le 2||P-U||||Q-U||}$

This is not very difficult to check. This property is important since:

$\displaystyle{\begin{array}{ll} 2||P_k-U||||P_j-U|| & \ge||(P_k-U)*(P_j-U)|| \\ &=||P_k * P_j - P_k * U - U * P_j+U||\\&=||P_{k+j}-U|| \end{array}}$

where we have made use of the fact that the uniform distribution convolved with any other distribution is the uniform distribution (this fact characterizes the uniform distribution, actually) as well as the fact that $P_k*P_j=P_{k+j}$. In particular, this means that

$\displaystyle{||P_{\alpha k}-U||\le (2||P_k-U||)^{\alpha}}$

Hence as soon as $||P_k-U||$ gets smaller than $1/2$, we have rapid (exponential) decay to the uniform distribution. There is a theorem due to Koss which holds in a more general setting where one asks a similar type of question on any compact group:

Theorem (Koss, 1959). Let $G$ be a compact group. Let $P$ be a probability on $G$ such that for some $k_0$ and $c$ with $0 < c < 1,$ and for all $k > k_0$,

$\displaystyle{P_k (A) > cU(A)}$ for all open sets $\displaystyle{A}$

Then, for all $k$,

$\displaystyle{||P_k-U||\le (1-c)^{\lfloor k/k_0 \rfloor}}.$

The additional hypothesis of the theorem says that the shuffling eventually doesn’t avoid a particular subgroup. So, in general, the plot looks something like

Taken from Persi Diaconis's book, Group Representations in Probability and Statistics, 1988

For a long time, this theorem was the end of the road. In our setting of $G=S_n$, it is extremely relieving to know that any reasonable shuffling method will eventually converge very rapidly to the uniform distribution — no reasonable shuffling method would leave particular subgroups of $S_n$ out. However, in no way is the theorem useful from a practical point of view: we still have no idea how many times we need to shuffle the deck!

The goal, then, is to compute the best $k_0$ in the statement of Koss’s theorem, so that we know precisely how many times we have to shuffle until we converge exponentially to the uniform distribution. This turns out to be a difficult problem in general, for which, unsurprisingly, no general principle seems to work. That is to say that every type of shuffling technique seems to require its own special treatment. In my next post (which should be in a couple of days), I’ll describe how to model the standard shuffle, known as the riffle shuffle, and derive some estimates on how $k_0$ depends on $n$.

### 3 Responses to “Card Shuffling I”

1. How many dovetail shuffles suffice? « Rod Carvalho's web notebook Says:

[…] Card Shuffling I […]

2. Debra Says:

Hurrah! Finally I got a blog from where I be able to really get
helpful data regarding my study and knowledge.

3. muscle car Says:

Hi, this weekend is good designed for me, because this moment i am
reading this wonderful informative piece of writing here at my house.