
What’s So Hard about the Monty Hall Problem?


Every student of probability has heard of the Monty Hall problem. The problem is notorious for its counter-intuitive solution: when it was first posed in a letter to the editor of The American Statistician (Selvin 1975: 67), and later in Parade magazine (vos Savant 1990), the solution was rejected, often vehemently, by a majority of respondents, many of whom should have known better. Even Paul Erdős, one of the greatest mathematicians of the twentieth century and a towering figure in probability theory, was skeptical of it. He changed his mind only after being shown the correct answer through an empirical demonstration (Vazsonyi 2002: 5). Mathematicians are not fond of existence proofs. Although introduced as a curious puzzle or brainteaser, the problem has been more like the Riddle of the Sphinx, exposing knowledge as much as hubris in those who seek to answer it. (For good examples of hubris, in the form of severe mansplaining, see “The Time Everyone ‘Corrected’ the World’s Smartest Woman” (Crockett 2015) and “Game Show Problem” (vos Savant 1991).)

For those not familiar with the problem, it goes like this. Imagine you are a contestant on Let’s Make a Deal, a famous American game show hosted by Monty Hall. The host presents you with three closed doors — let’s call them A, B, and C — and informs you that behind one of the doors is a brand new car, while behind the other two are goats. You really want the car, and if you can guess which door the car is behind, you get to keep it. The host invites you to guess, and you guess door A. At this point, instead of letting you know whether you guessed correctly, the host, who knows where the car is, opens a door that you did not guess, door B, which reveals a goat. The host then gives you the opportunity to change your guess to the remaining door, door C. The question is: should you stay with door A or switch to door C?

Of course, there are many variant tellings of the problem — Selvin’s version involves three boxes and a set of keys, for example — but they are all essentially the same, although some are more confusing than others. One point of confusion that is hard to eliminate, pointed out by The Angry Statistician, is the ambiguity regarding the host’s modus operandi — does he decide whether to reveal a door and offer a switch based on your choice? For example, does he open a door and invite you to switch only when you have guessed correctly the first time? Here, as with the great majority of responses, we shall assume that he does not do anything of the sort, and that his act of revealing a goat happens either each time the game is played or at random. This is a big assumption, but it is not unreasonable. If we do not make it, the problem becomes a trick question, as there is no way to know whether the host behaves this way or not — although, as we shall see, we can easily model this possibility.

Now, the surprising solution to the question of whether you should keep door A or switch to door C is that you should switch. If you stick with door A, the probability of winning the car is 1/3. If you switch to door C, it is 2/3. This conclusion can be demonstrated both experimentally and logically. Experimentally, it is easy to reproduce the conditions of the game show in real life and tally the results after a sufficient number of trials. (You can find a bunch of computer simulations to do this for you, in a variety of languages, here.) Logically, the solution follows from generating the sample space of the game — the list of all possible outcomes where the contestant decides to switch — and then adding up the results associated with each possibility, as in the following table (based on Selvin’s original letter):

Car   First guess   Host opens   Result if switching
A     A             B or C       Lose
A     B             C            Win
A     C             B            Win
B     A             C            Win
B     B             A or C       Lose
B     C             A            Win
C     A             B            Win
C     B             A            Win
C     C             A or B       Lose

Table based on Colvin 2014.

In this table, each row corresponds to a possible outcome of the game, each of which has the same probability, 1/9. If you add up the winning rows, you get 6/9, or 2/3; the losing rows add up to 3/9, or 1/3. (Note that the first, middle, and last rows actually define two outcomes each, since the host can open either of two doors, but the probability of each of these is half of 1/9, or 1/18, so the sums are the same. But to point this out is to get ahead of ourselves.)
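If you would rather run the experiment than take the table’s word for it, here is a minimal simulation sketch in Python (my own, not one of the simulations linked above); it plays the game repeatedly and tallies the wins for each strategy:

import random

def play(switch, trials=100_000):
    """Play the game `trials` times; return the fraction of wins."""
    wins = 0
    for _ in range(trials):
        doors = ['A', 'B', 'C']
        car = random.choice(doors)
        guess = random.choice(doors)
        # The host opens a door that is neither the guess nor the car,
        # choosing at random when two such doors are available.
        opened = random.choice([d for d in doors if d not in (guess, car)])
        if switch:
            # Switch to the one door that is still closed.
            guess = next(d for d in doors if d not in (guess, opened))
        wins += (guess == car)
    return wins / trials

print(play(switch=False))  # approximately 1/3
print(play(switch=True))   # approximately 2/3

After enough trials, the two frequencies settle near 1/3 and 2/3, in agreement with the table.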

These results, plain as they are, come as a surprise because they conflict with the obvious answer one arrives at by applying basic principles of probability theory. In this approach, it does not matter if you switch, since both remaining doors have the same probability of hiding the car. The reasoning is as follows. In the first guess, the contestant has three doors to choose from, and so, given no other information about the situation, each door has a 1/3 chance of hiding the car. Once the host opens the second door, though, there are only two doors left, and so, given no other information about the situation, each door has a 1/2 chance of hiding the car. The probability of each door is simply a matter of evenly dividing the total probability (which must sum to 1) over the number of available doors. Yes, the probability of the remaining door is increased, but so is that of the door first guessed, and so the contestant gains nothing by switching. The problem with this answer is contained in the subtle but, as we will see, revealing assumption that after the host opens a door there is no information to tip the balance between the two remaining doors. Actually, there is new information available, but it is hard to see. Why we don’t see it is what makes the problem hard.

In fairness, one can understand the resistance to accept the correct explanation as given. After all, the logical and empirical solutions are both merely inductive, not deductive, as the approach just described is. They arrive at the solution through brute force, by generating a set of results and counting wins and losses, and so are unsatisfying. Such approaches also invite the nagging suspicion, unfounded as it may be, that the generative mechanisms behind them are flawed in some way, and may be producing the wrong results. Consider, for example, what happens if we represent the sample space of the table above by breaking out the rows we collapsed — the first, middle, and last rows, where the host has a choice of doors to open. If we tally up the results with the assumption that each row in the new table has the same probability — an easy mistake to make — we get an equal tally for wins and losses.

Car   First guess   Host opens   Result if switching
A     A             B            Lose
A     A             C            Lose
A     B             C            Win
A     C             B            Win
B     A             C            Win
B     B             A            Lose
B     B             C            Lose
B     C             A            Win
C     A             B            Win
C     B             A            Win
C     C             A            Lose
C     C             B            Lose

Table with rows for unique possibilities.

Of course, to correct this error, we just need to assign probabilities to each event in the sample space, but this moves us beyond the original simplicity of the inductive approach and the clarity of its tabular demonstration. The table below begs for a more formal approach that would eliminate its redundancy and explain the values provided.

Car   First guess   Host opens   Probability   Result if switching
A     A             B            1/18          Lose
A     A             C            1/18          Lose
A     B             C            1/9           Win
A     C             B            1/9           Win
B     A             C            1/9           Win
B     B             A            1/18          Lose
B     B             C            1/18          Lose
B     C             A            1/9           Win
C     A             B            1/9           Win
C     B             A            1/9           Win
C     C             A            1/18          Lose
C     C             B            1/18          Lose

Table with rows for unique possibilities and their probabilities.
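Rather than weigh the rows by hand, we can also let the machine enumerate the expanded sample space and carry the probabilities for us. A sketch, again assuming the host picks at random when two goat doors are available:

from fractions import Fraction
from itertools import product

wins = losses = Fraction(0)
for car, guess in product('ABC', repeat=2):   # 9 pairs, 1/9 each
    goat_doors = [d for d in 'ABC' if d not in (guess, car)]
    for opened in goat_doors:
        # Split the row's 1/9 evenly over the host's options, giving
        # 1/9 when he is forced and 1/18 when he has a choice.
        p = Fraction(1, 9) / len(goat_doors)
        final = next(d for d in 'ABC' if d not in (guess, opened))
        if final == car:
            wins += p
        else:
            losses += p
print(wins, losses)  # 2/3 1/3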

Most efforts to arrive at the correct solution through a formal, deductive approach apply Bayes’ theorem to the problem. In its canonical form, which is easily derived from the axioms of conditional probability, the theorem looks like this:

P(H|E,C) = P(H|C)P(E|H,C) / P(E|C)

H: The Hypothesis
E: The Evidence
C: The Context

The formula is used when we want to know how probable an hypothesis is for explaining, or accounting for, the known existence of some evidence, and when we know, or can provide good estimates of, the quantities on the right side of the formula. A common example of an hypothesis and evidence is a disease, such as smallpox, and a symptom, such as red spots on the skin. A context is specified since all such probabilities take place within the frame of a subset of knowledge; no one is omniscient. In the case of our example, the context might be the nationality of the patient. However, because the context is often known and shared by all of the data employed by the formula, it is usually dropped, and so we get this simplified form, here given with the names traditionally ascribed to each element:

P(H|E) = P(H)P(E|H) / P(E)

P(H|E): The posterior probability of H (given E).
P(H):   The prior probability of H.
P(E|H): The likelihood of E (given H).
P(E):   The prior probability of E.
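To make the mechanics of the formula concrete, here is the simplified form as a small Python function, applied to the disease-and-symptom example above. The numbers are invented purely for illustration; they are not real medical values:

def posterior(prior_h, likelihood, prior_e):
    """Bayes' theorem: P(H|E) = P(H) * P(E|H) / P(E)."""
    return prior_h * likelihood / prior_e

# Hypothetical values: P(smallpox) = 0.001, P(spots|smallpox) = 0.9,
# and P(spots) = 0.01, giving P(smallpox|spots) = 0.09.
print(posterior(0.001, 0.9, 0.01))  # approximately 0.09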

Here it is not necessary to go into depth about Bayes’ theorem, other than to show how it can be applied to the Monty Hall problem. For a lucid and concise introduction to the subject, see James Stone’s Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis (2013).

When applied to the Monty Hall problem, the Bayesian approach proceeds by plugging the proper values into the theorem and then computing the results. In our example, the two possible locations of the car — doors A and C — are considered to be hypotheses (the only two possible), while the door opened by the host, door B, is the evidence, and the initial choice of the contestant, door A, is the context. So, if we let Car_A and Car_C stand for the hypotheses that the car is behind doors A and C respectively, and also let Host_B stand for the host’s revelation that door B contains a goat, we get the following equations to solve:

Switch: P(Car_C|Host_B) = P(Car_C)P(Host_B|Car_C) / P(Host_B)
Keep:   P(Car_A|Host_B) = P(Car_A)P(Host_B|Car_A) / P(Host_B)

In plain English, the first line reads: the probability that the car is behind door C, in the case where the host opened door B (and the contestant guessed door A), is equal to the probability that the car is behind door C in any case, times the probability that the host would open door B in the case that the car is behind door C — remember, he knows where the car is — divided by the probability that the host would open door B in any case. (Here, I have used the phrase “in the case that” instead of the usual “given that,” or the more obscure “conditioned on,” to clarify the logic.) Now, since we want to compare the two hypotheses, we can apply what is called Bayes’ rule — often confused with the theorem — to get the odds ratio for the two hypotheses, eliminating the need to calculate P(Host_B):

[P(Car_C)P(Host_B|Car_C)] / [P(Car_A)P(Host_B|Car_A)]

To calculate the value of this expression, we just need to calculate the values of each probability within it. These are as follows. First, we know the prior probabilities

P(Car_A) = P(Car_C) = 1/3

since the probability that the car is behind any door, without taking into account any information provided by the host’s opening of a door, is 1/3. Since the two priors are the same, we can remove them and just worry about the ratio of likelihoods:

P(Host_B|Car_C) / P(Host_B|Car_A)

Now, to calculate these, we have to do a little close reading of our problem. In effect, these likelihoods concern the choices available to the host based on his knowledge of both the location of the car and the door guessed by the contestant. In other words, it is in the likelihoods that we may observe the information that is conveyed by the host in making his decision. Earlier we noted that the incorrect deduction of the answer using basic probability theory is wrong because it assumes that the event Host_B is independent of either Car_C or Car_A, i.e. it assumes that P(Host_B|Car_C) = P(Host_B|Car_A) = P(Host_B), although this specific error is cloaked in the general phrasing that there is no information available to the contestant after the host opens the door. In fact, however, the host's decision regarding which door to open is highly dependent on which door the contestant has chosen. In the case where the contestant has guessed correctly — the case of Car_A — the host has two doors to choose from, and so P(Host_B|Car_A) is 1/2 (or 0.5), since he could have also opened door C. However, in the case where the contestant has guessed incorrectly — the case of Car_C — the host has only one choice, and so P(Host_B|Car_C) is 1. Given this, we get the following ratio:

P(Host_B|Car_C) / P(Host_B|Car_A) = 1 / 0.5 = 2 / 1 = WINS / LOSSES

Thus, we see that if the contestant switches and chooses door C, the odds of winning are 2 to 1, i.e. winning by switching has a probability of 2/3.
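The entire calculation fits in a few lines of Python; a sketch using exact fractions, with variable names following the notation above:

from fractions import Fraction

p_car_a = p_car_c = Fraction(1, 3)       # priors: car placed at random
p_host_b_given_car_a = Fraction(1, 2)    # host chooses between B and C
p_host_b_given_car_c = Fraction(1)       # host is forced to open B

# Bayes' rule: the odds ratio of the two hypotheses.
odds = (p_car_c * p_host_b_given_car_c) / (p_car_a * p_host_b_given_car_a)
print(odds)               # 2, i.e. 2-to-1 in favor of switching
print(odds / (odds + 1))  # 2/3, the probability of winning by switching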

The advantage of the Bayesian approach is that it arrives at the correct answer by means of deduction, and is therefore more satisfying mathematically. It also has the virtue of demonstrating the utility of Bayes’ theorem, which until relatively recently was the object of great suspicion among traditional statisticians (see McGrayne 2011). However, the problem with this approach is that, like the other methods described here, it hides the logic of the solution to the problem behind the artifacts of a computational process — and, no, the process is not the logic, not in the sense I have in mind. To use an idiom from journalism, every solution has buried the lede, the most important part of the story, which is that Monty Hall is constrained by a specific set of rules to such a degree that he can be replaced by a machine, a fact already implied by our ability to simulate the game programmatically. In each formulation of the correct solution, these rules are alluded to — by the “Host opens” column of the table in the first solution and by the calculation of the likelihoods in the Bayesian one — but they are never made explicit. This is a shame because, once this description is given, it is not only easy to understand the solution intuitively, it is possible to see that there are other solutions as well.