Pressure, probability, and risk — how does this confluence of factors determine the decisions made by contestants in Deal or No Deal?
In this series of blog posts, we’re exploring the mechanics of TV game show Deal or No Deal. In part 1, we examined how The Banker determines how much to offer. This time, we’ll be turning our attention to the contestant – how do they decide when to accept and when to reject The Banker’s offer?
We’ll start by exploring the offers that real contestants accepted. From there, we’ll replicate the analyses performed by Post et al. (2008), elaborating on the reasoning behind their model and exploring possible alternatives.
Last time, we learned that The Banker will offer extreme lowballs in the early stages of the game, steadily improving their offers as the round progress. With that in mind, let’s take a look at the rounds in which contestants accept The Banker’s offer…
As expected, not a single contestant accepts an offer made in the first two rounds of the game. In fact, most hold their nerve until at least round 6 before accepting an offer, with as many as 15 rejecting all offers and taking home the value of their selected case.
So what does that mean for the value of the offers that are accepted? Let’s have another look at the comparison between The Banker’s offers and the Expected Value (average amount in the remaining cases), this time dividing the offers between rejected and accepted…
Unsurprisingly, most of the extremely poor offers are rejected, with the accepted offers tending to align more closely with the true Expected Value of the remaining cases. However, it’s also apparent that contestants are often willing to accept well below the Expected Value—on average, cheating themselves out of substantial sums—why?
In order to understand these decisions, we’ll first need to break them down to their constituent parts…
Each round, contestants are presented with two options: accept the offer (Deal!) or keep on playing (No Deal). To decide between the two, they must assign a value to each and then compare the two. This is the logic followed by Post et al. (2008), who suggest that the probability an offer will be accepted is a function of the Deal! value minus the No Deal value.
The Deal! value is straightforward – it represents the subjective value (explained further below) of the current offer.
To determine the No Deal value, Post et al. (2008) suggest that contestants mentally project themselves one round into the future and estimate what the offer in the next round might be. The No Deal value is computed as the average subjective value of those estimated next-round offers.
So, if you think the offer you receive next round will, on average, be higher than the current offer, you reject the current offer—easy, right! Well, almost…
One complication arises when we consider that the subjective value of gaining and losing money is not related linearly to the objective value of that money. There are two key ways in which subjective value and objective value differ:
To calculate this subjective value, we can turn to the prospect theory value function (Tversky and Kahneman 1992):
\[\begin{equation} \tag{1} SV(x \mid RP) = \begin{cases} -\lambda(RP - x)^\alpha & \text{if } x \leq RP \\ (x - RP)^\alpha & \text{if } x > RP \end{cases} \end{equation}\]
Typically, \(\alpha\) values lie below 1, producing decelerating functions in both gains and losses. This means that the function accurately reproduces the diminishing returns we mentioned earlier. It is also standard to find \(\lambda > 1\), producing a function that is steeper for losses than gains. This allows the function to capture the relatively larger impact of gains than losses. In this function, \(RP\) refers to the reference point, which represents the value where the psychological impact is neutral – perceived as neither a gain nor a loss.
Let’s see what this function might look like with standard values of \(\lambda = 2.25\), \(\alpha = 0.88\), and a Reference Point of \(0\):
But, you might have noticed, in the game of Deal or No Deal the contestant can only win money, never lose it—surely anything below $0 won’t be relevant, here? Not quite! Remember that subjective value is determined with respect to some Reference Point (RP). While it might seem intuitive to use a Reference Point of 0, Post et al. (2008) suggest that the reference point for a contestant in Deal or No Deal starts at the average Expected Value of all possible outcomes (i.e., the average case value: $131,477.50 in the US edition).
To give this logic a sanity check, consider this: if you were given the opportunity to participate in Deal or No Deal and ended up taking home $500, how would you feel? In this scenario, you would end the game $500 richer than before the game started, but you would probably feel disappointed with the result. This is because $500 is well below the reference point we started at – even though you won $500, it would feel as though you lost.
Post et al. (2008) argue that contestants update their Reference Point as the game progresses. They suggest that, in a given round, the Reference Point is dictated primarily by The Banker’s offer for that round: \(B(x_r)\). Also contributing to the Reference Point are the difference between the Expected Value of the remaining cases and the initial Expected Value of all cases: \(d_r^{(0)}\). Finally, the Reference Point depends on the previous occurrences in the game, operationalized as the difference between the current Expected Value and the Expected Value from two rounds ago: \(d_r^{(r-2)}\)1. Putting that all together:
\[\begin{equation} \tag{2} RP(r) = (\theta_1 + \theta_2 \cdot d_r^{(r-2)} + \theta_3 \cdot d_r^{(0)}) \cdot B(x_r) \end{equation}\]
One interesting implication of this moving reference point is that allows contestants to judge a decision not only by the options presented to them, but also by the decisions they faced en-route to the current situation…
For example, imagine two contestants in the final round. Both contestants face the exact same decision, with the $100 case and the $50,000 remaining. In the last two rounds, Contestant 1 eliminated the $1 case and the $10 case, bringing their Expected Value up substantially. Contestant 2, on the other hand, just eliminated the $25,000 and $50 cases, leaving their Expected Value pretty much the same as it was. Given these differences, Contestant 1 will have a lower reference point than Contestant 2, as the reference point gravitates towards the Expected Value of previous rounds. As a consequence, Contestant 2 is more likely to continue playing than Contestant 12, despite being faced with exactly the same decision.
So far, we have established a descriptive account of contestants’ decisions and specified a function to model the subjective value of Bank offers. The final step for putting together a model of decision-making in Deal or No Deal is to make clear our assumptions about the contestants. Earlier, I mentioned that the No Deal value requires contestants to mentally project themselves forward one round to estimate what their next round offer might be.
To do this, we must first assume that the contestants have some mental model of how the Bank offers are computed. It’s fair to assume that contestants are not naive to the increasing proportion of the Expected Value that The Banker offers each round. Taking the equation we used to estimate bank offers in part 1, we find that bank offers for the next round can be derived by:
\[\begin{equation} \tag{3} b_{r+1} = b_{r} + (1 - b_{r})\cdot\rho^{9-r} \end{equation}\]
Post et al. (2008) assumed that participants accurately approximate the true value of \(\rho\). This isn’t necessarily true, though – we’ll leave \(\rho\) as a free parameter to see if a different value fits the data better.
Finally, we have to make an assumption about how contestants project into the future – which potential outcomes do they anticipate? In the modelling account of Post et al. (2008), they assumed contestants mentally simulate all possible outcomes for the next round, estimating the bank offer and computing the subjective value for each. This seems fairly implausible given the sheer number of possible outcomes each round, as well as being computationally impractical for our modelling approach. To make this more manageable, we’ll assume instead that they only simulate 20 possible future states.
That’s a lot to keep in mind! Let’s summarize everything we’ve outlined so far…
Each round, the contestant receives an offer. They also mentally project themselves one round into the future, estimating the offers they might receive in the next round. Both the current offer and these potential future offers are transformed using a value function that adjusts for risk preferences and a dynamically updating reference point. To predict the decisions, we use the difference between the subjective value of the current offer (the Deal! value) and that of the expected next-round offer (No Deal value):
\[\begin{equation} \tag{4} \Delta = Deal - No Deal \end{equation}\]
We then feed this difference into a logit function to compute the probability (\(p\)) of accepting the current offer:
\[\begin{equation} \tag{5} p = \sigma(\frac{\Delta}{\tau}) = \frac{1}{1 + e^{-\frac{\Delta}{\tau}}}) \end{equation}\]
The temperature parameter (\(\tau\)) influences the determinism of the decision: a lower \(\tau\) makes decisions more sensitive to differences in value, emphasizing a more deterministic choice pattern. The higher the subjective value of the current offer relative to the average subjective value of estimated future offers, the more likely it is that the current offer will be accepted.
For simplicity, we’ll fit the model to the combined dataset. First, let’s have a look at the parameter estimates from our data. Since we’re using Bayesian parameter estimation, we’ll obtain a distribution of values for each parameter. We can use the median of the distribution as a point estimate and the 95% highest density interval as the range of plausible values.
Let’s briefly review the parameters. We find an \(\alpha\) around 0.65, capturing the typical diminishing returns of the value function, while the \(\lambda\) of 2 reflects the expected loss-aversion. The \(\theta_1\) value indicates that contestants initially set their reference point just below the current offer, while \(\theta_2\) and \(\theta_3\) suggest that the reference point is (partially) drawn towards the expected value of recent rounds and of the game overall. Finally, contestants estimate \(\rho\) to be higher than it truly is, meaning they expect the offers to improve more quickly than they do.
Ok, great! Those parameters values look plausible, no major red flags there. This doesn’t really help us to understand what predictions the model will make, though. To achieve that, let’s plot the offers made by The Banker one more time—this time coloured according to the model’s predictions for the likelihood that the offer will be accepted. The Predicted Decision score here reflects the probability that the contestant will accept the offer, according to our model…
You’ll notice that these predictions look pretty reasonable – not too far from the pattern of decisions we saw in the data itself!
There are a few notable deviations from the predictions; these are likely driven by individual variability in the true underlying parameters. To address this, we would ideally be able to estimate the parameters for each contestant separately. Unfortunately, though, we don’t have enough data per contestant to do so—if we wanted to be able to fine-tune our model to each individual, we would need each contestant to play several rounds of the game.
So, there we have it, a comprehensive computational account of the decisions made by contestants in Deal or No Deal. What we can take away from this is that this game is not actually about “Beating the Banker”. Instead, it’s about holding your nerve until The Banker starts offering you something close to your subjective valuation of the remaining cases. Everyone’s subjective value calculation is slightly different—meaning there isn’t really a correct or incorrect way to play. If a contestant accepts an offer that’s way below the Expected Value, they might have miscalculated the Expected Value OR they might simply be extremely loss-averse.
In part 3, we’ll be turning the game on its head, embracing our ruthless capitalist and utilizing everything we’ve learned so far to answer the question: “How can The Banker minimize contestants’ winnings?”
Using the expected value from two rounds ago here is somewhat arbitrary—this could equally be the expected value from one round ago, for example. The main idea is that the reference point is drawn towards the expected value of some previous state. However, when testing the modelling approach, I tried substituting this for either (1) the expected value from one round ago or (2) the average expected value from the last two rounds and neither of these alternatives produced a better model fit.↩︎
We can estimate this using guideline parameters of \(\alpha = .88\) and \(\lambda = 2.25\) and by assuming Contestant 1 has a RP of $15,000 while Contestant 2 has a RP of $25,000. From these parameters, we can calculate the No Deal value—remembering that the higher the No Deal value is, the more the contestant would need to be offered to take the deal. Contestant 1 has a No Deal value of $14,735, while Contestant 2 has a No Deal value of $19,211. This means that Contestant 2 needs a significantly higher offer in order to be persuaded to take the deal.↩︎