What are the odds of flipping 13 heads in a row?

If you have a million coin flips, it’s almost certain that somewhere in those coin flips there will be 20 heads in a row. My blog post took the Times to task for printing what seemed to be an obviously bogus statement. Since a run of 20 heads is roughly a one-in-a-million occurence, a basic feel for probability should tell you that trying to do this a million times is not going to be a certainty - fairly far from it.

Naturally, I thought to back up my argument with some hard facts, and I came up with a calculation that showed that the chances of this happening were actually around 60%. Unfortunately, I made a pretty basic mistake in calculating the probability, as was demonstrated to me by correspondent Andy Langowitz. My intuition about the likelihood of success was on the money, but my estimate was a bit high.

The story of how I corrected my error in order to get the correct number, around 37.9%, is a good lesson in careful examination of probability rules, with a small detour into the land of the bignum.

Independent Probabilities

When trying to develop a formula like this one, it is often easier to work from the inverse. Instead of calculating how likely it is for 20 heads to occur at any point in the sequence of a million tosses, I thought to instead calculate the probability that it wouldn’t happen. We can then take that number and subtract it from one to get the desired result.

The chance of n heads in a row occurring is 1/2n, so the inverse probability is [2n-1]/2n. If we multiply that probability once for all 999,981 possible occurences of a streak of 20 heads, it seemed to me that I would be in business. Doing this is a simple enough calculation, and the result was the 60% figure. That figure felt like it was in the ballpark to me, and I left it that.

Mr. Langowitz, however, was smart enough to actually test the theory on a smaller set of numbers. Let’s apply this theory to find out how likely we are to throw two heads in a row in four tries.

The chance of two heads in a row is 1/4, so my formula would give a result of 1 - [3/4]3, or a result of 37/64 - a little better than 50% chance of it occuring. But probability is nothing but the art of enumerating and counting, and we can do just that to check the result. The 16 equally possible outcome of four tosses are:

HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT

Look through those outcomes and see how many have two consecutive heads. It turns out to be exactly 8, meaning that my calculation of 37/64 is just flat wrong.

Locating the Mistake

My mistake is a pretty common one among probability neophytes. I assumed that probability at each step in the sequence was identical, because the sequences were completely independent. It’s easy enough to think that if you don’t examine problem carefully.

What actually happens is that when we examine the possibility of an unsuccessful run of heads at toss i, we slightly bias the outcome at toss i+1. A demonstration will show exactly how this happens.

For the sample problem of a sequence of two heads out of four tosses, we can first examine the chance of a negative outcome starting at toss 1. There are just four possible outcomes that have two tosses starting at position 1:

HH HT TH TT

And only one of these tosses yielded two heads in a row, so the probability of not seeing two heads after two tosses is 3/4.

But now when we look at the sequence of tosses starting at position two, we have to throw out the outcomes where we had two heads at toss one - we’ve already seen two heads, so we can’t continue flipping coins in those outcomes. So our universe of possible outcomes is now a bit different:

HTH HTT
THH THT
TTH TTT

Instead of eight outcomes, we have six. And if we look at the first toss seen in position two, instead of having an even distribution of heads and tails, you can see that sample is biased: only two have a head in position two, while four have tails. So the chances of not seeing two heads starting at position two increases to 5/6. Note that this change in probability occurs because we have selected only those outcomes without a streak of two heads at position one.

Likewise, when we look at the possible outcomes for streaks starting at position three, we get a different probability again. Because we have to throw out one sequence in the previous test, the universe of possible outcomes is now limited to:

HTHH HTHT
HTTH HTTT
THTH THTT
TTHH TTHT
TTTH TTTT

So now we have just ten possible outcomes, and two of those will produce the desired outcome, meaning the probability has changed to 4/5.

So what is the probability of all three possible positions not containing a streak? That would be [3/4][5/6][4/5] which reduces nicely to 1/2, the correct answer.

Finding the General Solution

So let’s generalize the question at hand: what is the probability of seeing k consecutive heads when a fair coin is tossed n times? The previous section showed that we can work it out by hand for small numbers of tosses, but it should be clear that if are going to toss a coin a million times, the universe of possible outcomes is going to get unwieldy. We need a general description of the problem in order to solve it for any values of k and n.

For many probability problems, finding a solution is simply a way of figuring out how to count things, and coin tosses indeed appear to be just such a problem. Let’s try to see if we can count the number of times a sequence of k heads will appear at a given toss.

To start to work out the solution to the problem, I will set k to a value of three - in other words, we will be trying to see what is the probability of seeing three consecutive heads is at toss i, given that there have not been three heads at an earlier toss. To calculate the probability, we need to know two things. First, we need to know all the possible outcomes in our universe of samples at toss i. In the previous section, with a value of k=2, we saw the the number of outcomes for tosses 1, 2, 3, and 4, was 2, 4, 6, and 10.

After determining the number of outcomes, we then need to determine how many of those outcomes were successes. If we defined success as being the number of outcomes in which a k heads in a row appear at position i, the values from the previous section would have been 0, 1, 1, 2.

Counting the Successes

I’ll start with the more difficult problem: counting the number of times k heads appear at toss i. To start with, we have the degenerate cases where i is less than k. In all of those tosses, we know that the number of successful outcomes is going to be zero, because there have not been enough tosses to achieve success yet.

If we work our way forwards with the example of k=3, our first four tosses end up giving us three sets of outcomes:

H T

HH HT 
TH TT

HHH HHT
HTH HTT
THH THT
TTH TTT

HHTH HHTT 
HTHH HTHT
HTTH HTTT
THHH THHT
THTH THTT
TTHH TTHT
TTTH TTTT

Note that when we get to toss 3, there is just one successful outcome. Likewise, in toss 4, there is just one successful outcome.

It may not be immediately obvious, but we can in fact always tell how many successes we will achieve at toss i+k after we have enumerated all the possible outcomes at toss i. The number is defined as the number of outcomes at toss i that end in a tail.

The logic behind this is straightforward: in order to have a success at position i+k, we need to generate a sequence of k heads, starting at toss i+1. If we have an outcome that currently ends in a tail, it will generate 2^k outcomes in the next k tosses, and precisely one and only one of these will have k consecutive heads. None of these outcomes will result in an sequence of k heads before toss i+k, because they currently terminate in a tail, so all of the outcomes generated from that outcome at position i will be included in the outcomes seen at toss i+k.

Likewise, none of the outcomes at position i that currently end in a head are going to be able to contribute to a success at toss i+k. Any streak of k heads that follows a terminating head at toss i will result in a run of k heads before we reach toss i+k.

Looking at our outcomes for k=3, we can see that at toss 1 we have one outcome ending in a tail, so at toss 4 we will have one success. At toss 2 we have two outcomes ending in a tail, so at toss five we will have two successes. And we have the special case of toss 0 - we have one sequence starting at toss 0 that generates a sequence of k heads at toss k. Although there were no tails tossed at position 0, any sequence that starts there doesn’t have any preceding heads tosses either, so it is as if there was a single outcome at toss 0 with a value of tails.

So each toss that ends in a tail acts as the root of a successful outcome at a future position. This is good information, but in order to turn this in to a formula we need to be able to compute the number of outcomes ending in tails at toss i - we don’t want to have to enumerate all the outcomes in order to get there. I’ll refer to these special outcomes as anchors, as they form the anchor of a future outcome.

Counting the Anchors

The number of anchor outcomes at each position starts out as a nice number while i is less than or equal to k: 2^i. But after toss k, successful outcomes start being removed from the sample set and the formula no longer holds. For k=3, the anchor count starting at toss1 is: 1, 2, 4, 7, 13, 24.

It turns out that the anchor at position i does more than just generate a success at toss i+k. It is also responsible for generating new anchor outcomes at tosses i+1, i+2, …, i+k-1.

Looking at an example for k=3 should clarify this. Our lone anchor outcome at toss 1 is the sequence T. We know that this anchor will create a new successful outcome at toss 4: THHH. But it also creates new anchors at all intermediate tosses: TT,

HH HT TH TT
0, and
HH HT TH TT
1.

This observation holds true for the generation of all new anchors, and with a little work we can turn this into a usable recurrence. If each anchor at toss i is going to create a new anchor at tosses i+1 through i+k-1, we can calculate the number of anchors at toss i using this formula:

Chủ Đề