Deal or No Deal | Part 3: Becoming the Banker

Is Deal or No Deal’s Banker giving out more money than they need to?

This is the final installment in a series of blog posts exploring the mechanics of the TV game show Deal or No Deal. In part 1, we examined how The Banker determines how much to offer. In part 2, we explored how contestants evaluate whether to accept or reject these offers.

In this third and final chapter, we’ll be seeing whether The Banker can leverage what we’ve learned to minimize the payouts of the game.

Simulating contestants

Before we can design our own algorithm for producing bank offers, we’ll first need to replicate the performance of The Banker. To achieve this, I’ll be using the parameters we estimated for both The Banker and the contestants to simulate a new batch of contestants – you can read more about those parameters in the previous two parts.

I simulated 1000 new contestants using this method. To verify that our simulation is roughly equivalent to the real games, I took 1000 bootstrap samples of 145 contestants (the sample size of the original dataset) to find the expected mean winnings of the contestants. Below, I’ve plotted the 95% highest density interval, with lines showing the median of the simulated samples and the observed mean of our original dataset (Post et al. 2008). It looks like the true mean was in our plausible range (albeit at the lower end), which is a good indicator that our simulation is approximating the contestants’ performance.
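The bootstrap check described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the winnings are stand-in values drawn from a log-normal distribution (an assumption purely for demonstration), and the helper names `bootstrap_means` and `highest_density_interval` are hypothetical.

```python
import math
import random
import statistics


def bootstrap_means(winnings, n_samples=1000, sample_size=145, seed=0):
    """Resample with replacement and return the mean of each bootstrap sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(winnings, k=sample_size))
        for _ in range(n_samples)
    ]


def highest_density_interval(values, mass=0.95):
    """Return the narrowest interval that contains `mass` of the values."""
    ordered = sorted(values)
    n = len(ordered)
    width = max(1, math.ceil(mass * n))  # number of points inside the interval
    # Slide a window of `width` points over the sorted values, keep the narrowest.
    start = min(range(n - width + 1),
                key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[start], ordered[start + width - 1]


# Stand-in data (assumption): NOT the simulated contestants from the post.
data_rng = random.Random(1)
simulated_winnings = [data_rng.lognormvariate(11, 1) for _ in range(1000)]

means = bootstrap_means(simulated_winnings)
low, high = highest_density_interval(means)
print(f"median of bootstrap means: {statistics.median(means):,.0f}")
print(f"95% HDI: {low:,.0f} to {high:,.0f}")
```

With real simulated winnings in place of the stand-in data, the observed mean from the original dataset can then be checked against the interval `(low, high)`.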

For a secondary check, I also plotted the number of the 1000 simulated contestants that accepted The Banker’s offer each round. Similar to our dataset, acceptance rates are low in the early rounds, but peak around round 6 before steadily declining:

Becoming the Banker

So, we have a benchmark to aim for. Our next goal is to design a Banker that outperforms the current one, paying out less by being more strategic in the offers it makes. First, let’s set out some ground rules:

  1. The distribution of accepted offers in each round should stay roughly the same. Our banker still has to produce broadcastable games – if all of the round 2 offers are accepted, that won’t make for very interesting TV.

  2. The Banker still has to be broadly predictable. The modelling approach we’re using relies on the contestants to approximate the offers they will receive in the next round. We don’t want to improve performance by generating a mismatch between the contestants’ expectations and the offers—that wouldn’t be a realistic long-term strategy. Our banker will follow the same rough trajectory as the original, increasing the proportion of the Expected Value it offers each round.

Exploiting subjective utility

Let’s start with something simple – contestants are willing to accept well below the Expected Value of the remaining cases. Despite this, The Banker starts out by offering a small percentage of the Expected Value, but quickly increases to approach the full Expected Value in the final few rounds. Instead, we could consider a banker that approaches a different limit—one a bit below the full expected value.

To put it formally, we’ll test how changing \(\gamma\) in the equation below affects behavior. Here, \(\gamma\) effectively dictates the maximum proportion of the Expected Value The Banker will offer. In the standard version of the model, \(\gamma = 1\):

\[\begin{equation} \tag{1} b_r = b_{r-1} + (\gamma - b_{r-1})\cdot\rho^{9-(r-1)} \end{equation}\]
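To get a feel for equation (1), here is a small sketch that applies it round by round. The starting proportion `b_start` (0.1 here) and the nine-round horizon are assumptions for illustration, the random noise that The Banker adds is left out, and `offer_proportions` is a hypothetical helper name.

```python
def offer_proportions(b_start, gamma=1.0, rho=0.787, n_rounds=9):
    """Apply equation (1) round by round and return [b_1, ..., b_n_rounds]."""
    proportions = [b_start]
    for r in range(2, n_rounds + 1):
        prev = proportions[-1]
        # b_r = b_{r-1} + (gamma - b_{r-1}) * rho^(9 - (r - 1))
        proportions.append(prev + (gamma - prev) * rho ** (9 - (r - 1)))
    return proportions


standard = offer_proportions(b_start=0.1, gamma=1.0)
capped = offer_proportions(b_start=0.1, gamma=0.8)

for r, (b_std, b_cap) in enumerate(zip(standard, capped), start=1):
    print(f"round {r}: gamma=1.0 -> {b_std:.3f}, gamma=0.8 -> {b_cap:.3f}")
```

Because each step closes only a fraction of the remaining gap to \(\gamma\), the proportion offered climbs every round but never exceeds \(\gamma\).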

It’s important to bear in mind that our model assumes the contestant is myopic: when deciding whether to continue playing, they only consider the expected offer from the next round. However, if we put a cap on those offers, we might inadvertently exploit this myopia, encouraging our simulated contestants to accept low offers simply because the expected next-round offer is lower. To ensure this is not the case, I allowed the contestant to consider not only the next round but also the final round (the contents of their own case) when deciding whether to continue. That is, we compute the subjective value of the expected next-round offer and the subjective value of the possible contents of the final case; the No Deal value is the maximum of the two.
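As a rough sketch of that comparison, the snippet below takes the maximum of the two subjective values. The square-root `power_utility` is a hypothetical stand-in for the reference-dependent value function estimated in part 2, and `no_deal_value` is an illustrative name. The sketch also assumes the expected next-round offer can be approximated as the next round's offer proportion times the current Expected Value, since removing cases at random leaves the Expected Value unchanged on average.

```python
import statistics


def power_utility(x, alpha=0.5):
    """Hypothetical stand-in for the contestant's subjective value function."""
    return x ** alpha


def no_deal_value(remaining_prizes, next_offer_proportion, utility=power_utility):
    """Subjective value of continuing: the better of 'wait one round' and 'go to the end'."""
    expected_value = statistics.fmean(remaining_prizes)
    # Option 1: the expected offer in the next round.
    expected_next_offer = next_offer_proportion * expected_value
    value_next_round = utility(expected_next_offer)
    # Option 2: hold out for the contestant's own case, which is equally
    # likely to contain any of the remaining prizes.
    value_final_case = statistics.fmean(utility(p) for p in remaining_prizes)
    return max(value_next_round, value_final_case)


print(no_deal_value([1, 1_000_000], next_offer_proportion=0.3))  # final case dominates
print(no_deal_value([1, 1_000_000], next_offer_proportion=0.9))  # next offer dominates
```

The contestant then continues whenever this No Deal value exceeds the subjective value of the offer currently on the table.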

So, how does altering \(\gamma\) affect the results? Below I’ve plotted the simulated winnings of 1000 bootstrap samples of 145 contestants each for \(\gamma = 1\) against those produced from different levels of \(\gamma\). I’ve annotated the median of the bootstrapped mean winnings for each condition, which can be compared to the \(\gamma = 1\) median of $97,247:

So, it seems that, if we reduce \(\gamma\), we reduce contestants’ winnings quite substantially—as much as 33% when we reduce \(\gamma\) down to 0.7.

We still need to make sure that we’re not breaking our rules, though – these games have to produce broadcastable TV. Let’s take a look at the distribution of rounds in which these simulated games ended, again comparing to the standard \(\gamma = 1\) condition. I’ve annotated the average length of the games in each condition, which can be compared to the \(\gamma = 1\) condition average of 6.03…

From these plots we can see that the general pattern of rejecting early offers and accepting those in the middle rounds replicates through most of our variations. When we bring \(\gamma\) down to 0.6, we notice a red flag starting to emerge: many contestants show a preference for holding out to the final round, where they receive the full value of the last remaining case.

Given these results, it seems as though lowering \(\gamma\) down as far as 0.7 is potentially feasible, reducing winnings by one third while only shortening the average game by 0.31 rounds. For the rest of the analyses, we’ll stick with a slightly more conservative \(\gamma\) of 0.8.

Individualizing offers

Recall how we determine how much to offer the contestant each round. We take the proportion of the Expected Value that we offered in the previous round and we increment it by some proportion of the difference between that proportion and \(\gamma\). The speed at which the proportion offered approaches \(\gamma\) is dictated by \(\rho\). The Banker uses a \(\rho\) of around 0.787, and adds random noise to make sure the offers are variable:

\[\begin{equation} \tag{1} b_r = b_{r-1} + (\gamma - b_{r-1})\cdot\rho^{9-(r-1)} \end{equation}\]

However, as we learned in the previous posts, two contestants facing the same decision may evaluate it differently due to the way they calculate the reference point against which they compare potential outcomes.

In the early rounds, we can leverage these differences by giving better offers to the contestants with the biggest gap between their Deal! Value and the Expected Value of the remaining cases.

So, rather than varying \(\rho\) randomly for each of the 1000 simulated contestants, on each round, we:

  1. Calculated the expected Reference Point by assuming the median parameter values, then used that to calculate the expected Deal! Value (i.e., the monetary value at which we expect the contestant to have a 50/50 preference for accepting/rejecting).

  2. Calculated z-scored residuals when regressing the Expected Value on the expected Deal! Value.

  3. Transformed those z-scores into adjustment values by scaling them down, then exponentiating the result. The scaling factor decreased linearly across rounds, starting from 0.05 and finishing at 0.

  4. Multiplied the \(\rho\) of each contestant by the resultant adjustment value.
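The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the post: the regression is plain ordinary least squares, the scale falls from 0.05 in round 1 to 0 in round 9, a positive z-score is taken to raise \(\rho\) (and so speed up the offers), and the function names and toy inputs are hypothetical.

```python
import math
import statistics


def zscored_residuals(expected_values, deal_values):
    """Regress Expected Value on expected Deal! Value (OLS), then z-score the residuals."""
    mean_x = statistics.fmean(deal_values)
    mean_y = statistics.fmean(expected_values)
    sxx = sum((x - mean_x) ** 2 for x in deal_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(deal_values, expected_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(deal_values, expected_values)]
    sd = statistics.pstdev(residuals)  # OLS residuals already have mean zero
    return [res / sd for res in residuals]


def adjustment_scale(round_number, n_rounds=9, start=0.05):
    """Linearly decreasing scale: `start` in round 1, 0 in the last round."""
    return start * (n_rounds - round_number) / (n_rounds - 1)


def adjusted_rhos(expected_values, deal_values, round_number, rho=0.787):
    """Multiply rho by exp(scale * z) for each contestant."""
    scale = adjustment_scale(round_number)
    z_scores = zscored_residuals(expected_values, deal_values)
    return [rho * math.exp(scale * z) for z in z_scores]


# Toy inputs (assumption): one Expected Value and one Deal! Value per contestant.
evs = [120_000, 80_000, 150_000, 60_000, 95_000]
deals = [70_000, 55_000, 60_000, 40_000, 65_000]
print(adjusted_rhos(evs, deals, round_number=1))
print(adjusted_rhos(evs, deals, round_number=9))  # scale is 0, so rho is unchanged
```

Since the z-scores sum to zero, the multipliers have a geometric mean of 1, so the adjustments redistribute \(\rho\) across contestants rather than shifting it overall.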

Using z-scores allows us to keep the mean \(\rho\) the same; we just employ a more targeted approach to making our offers¹. Put simply, we give better early-round offers to contestants our model suggests have a larger difference between the true Expected Value of their remaining cases and their subjective minimum acceptable offer.

We can compare the performance of this updated Banker to that of the regular Banker, both using a \(\gamma\) value of 0.8:

By strategically adjusting their offers, the updated Banker produced a further reduction in average winnings of $10,000 without meaningfully reducing the average duration of the games.

Bear in mind that this is just a proof-of-concept—these parameters aren’t necessarily optimized for reducing payouts. If we wanted to find the optimized parameters for minimizing winnings, we could define guidelines around the required distribution of accepted rounds and enter weights for the various candidate variables and their interactions.

That’s a wrap

And so ends our journey through the mechanisms of Deal or No Deal.

The Banker may be a ruthless capitalist but they’re not a particularly good one, as our analyses suggest they’re overpaying the average contestant by somewhere around 25-30%. By implementing some minor changes that reduced the average game duration by as little as 0.2 rounds (~3%), our Banker cut median winnings down from $97k to just $73k.

While some pilot testing would help to confirm the feasibility of these changes and more fine-tuning could benefit the exact parameters of the new Banker’s algorithm, it seems The Banker might benefit from acknowledging the more human aspects of contestants’ decisions.

  1. For a functional algorithmic Banker, we wouldn’t be able to rely on z-scores, as we wouldn’t be running 1000 games simultaneously in the real gameshow. However, we could estimate the intercept and slope of the regression alongside the variance of the residuals at each round provided we simulated enough contestants or simulated all possible outcomes.

References

Post, Thierry, Martijn J van den Assem, Guido Baltussen, and Richard H Thaler. 2008. “Deal or No Deal? Decision Making Under Risk in a Large-Payoff Game Show.” American Economic Review 98 (March): 38–71. https://doi.org/10.1257/aer.98.1.38.