Just about anyone interested in mathematics has studied a little probability and probably done some easy analysis of basic card games and dice games. Completely off topic for a second, am I the only one who has noticed that basic probability homework exercises are the only situation, aside from funerals, that anyone will ever use the word “urn?” For whatever reason, probabilists love that word. Anyway, in any real card game, the computations tend to get complicated rather quickly, and most people get turned off from the discussion. With some ingenuity, however, one can answer some pretty cool (but initially difficult seeming) questions without having to go through a lot of tedious computations.
Take as an example card shuffling. In the face of expert card-counters, the natural question for the dealer is how many times he or she has to shuffle the deck before it’s well-mixed. In the case when the dealer is also playing the game — and is a card-counter at the level of a member of the MIT black jack team, say — he or she could drastically improve their odds by using a shuffling method which seems to shuffle the deck well, but actually is very poor at doing so. Anyway, at this point the question is slightly ill-posed, as we have no obvious way to interpret the word mixed, let alone well. In fact, coming up with a mathematical model of what shuffling means is already fairly difficult. What I’m hoping to do is give a framework which makes the problem more tractable.
One can take the following abstract point of view. The fundamental object we are studying is the symmetric group where, more often than not, . Each element of corresponds to a particular way to shuffle the deck. Alternatively, one can think of every element of as a particular ordering of the deck, starting from some prescribed order (i.e. however the deck was ordered when we took it out of the box). The identity element corresponds to the “no-shuffle” shuffle (alternatively, the original order). Transpositions correspond to interchanging the cards at position i and j, respectively, and so on. To model a collection of shuffles, one defines a probability measure on For example, one could put the cards in the deck side-by-side and shuffle as follows: with your left hand pick a card uniformly at random, and with your right hand pick a card uniformly at random, then interchange the two cards. The probability measure defined by this rule is as follows:
Once one has a probability measure , one can define the transition matrix so that Heuristically, the entry of corresponds to the probability of starting at ordering and ending at ordering . As has all non-negative entries and the rows sum to 1, this matrix corresponds to a Markov Chain (which one might call a random walk on ). It is worth noting here that the matrix has entries, which is something to the tune of when Not much help from a computer to be found with numbers like that kicking around. The power of , then correspond to the transition probabilities of going from state to (this, for example, proves the Chapman-Kolmogorov equations for finite state spaces). Working backwards, one can use to produce a probability distribution by This really corresponds to the -fold convolution of with itself, where convolution means the usual thing, i.e.
Heuristically, this new measure represents the probability of starting from a standard order getting to any other order by way of shuffles.
Okay, with this framework in mind, one can now define the difference between two probability distributions, and :
This is a pretty intuitive idea for distance; aside from the factor of , the formula is basically the norm. We will mostly be interested in this quantity when and where is the uniform distribution, i.e. . One key feature of this difference, is that it is (in a sense) submultiplicative with respect to convolution:
This is not very difficult to check. This property is important since:
where we have made use of the fact that the uniform distribution convolved with any other distribution is the uniform distribution (this fact characterizes the uniform distribution, actually) as well as the fact that . In particular, this means that
Hence as soon as gets smaller than , we have rapid (exponential) decay to the uniform distribution. There is a theorem due to Koss which holds in a more general setting where one asks a similar type of question on any compact group:
Theorem (Koss, 1959). Let be a compact group. Let be a probability on such that for some and with and for all ,
for all open sets
Then, for all ,
The additional hypothesis of the theorem says that the shuffling eventually doesn’t avoid a particular subgroup. So, in general, the plot looks something like
For a long time, this theorem was the end of the road. In our setting of , it is extremely relieving to know that any reasonable shuffling method will eventually converge very rapidly to the uniform distribution — no reasonable shuffling method would leave particular subgroups of out. However, in no way is the theorem useful from a practical point of view: we still have no idea how many times we need to shuffle the deck!
The goal, then, is to compute the best in the statement of Koss’s theorem, so that we know precisely how many times we have to shuffle until we converge exponentially to the uniform distribution. This turns out to be a difficult problem in general, for which, unsurprisingly, no general principle seems to work. That is to say that every type of shuffling technique seems to require its own special treatment. In my next post (which should be in a couple of days), I’ll describe how to model the standard shuffle, known as the riffle shuffle, and derive some estimates on how depends on .