Coin Tosses, Binomials and Dynamic Programming

Today someone asked about the probability of outcomes in relation to coin tosses on Stackoverflow. It’s an interesting question because it touches on several areas that programmers should know from maths (probability and counting) to dynamic programming.

Dynamic programming is a divide-and-conquer technique that is often overlooked by programmers. It can sometimes be hard to spot situation where it applies but when it does apply it will typically reduce an algorithm from exponential complexity (which is impractical in all but the smallest of cases) to a polynomial solution.

I will explain these concepts in one of the simplest forms: the humble coin toss.

Bernoulli Trials

A Bernoulli trial is an event (or experiment) that randomly has two outcomes. The probability of each outcome is known. The probability of success is p and the probability of failure is 1-p where 0 < p < 1. The outcome of each trial is independent of any other trial.

What constitutes success is arbitrary. For the purposes of this post, success will be defined as a coin coming up heads. To clarify the independence property, the probability of a coin coming up heads does not change regardless of any previous tosses of that (or any other) coin.

Assume a fair coin (p = 0.5). If you want to determine the probability of k successes (heads) from n trials in the general case, it is worth first recognizing that if you make n coin tosses there are 2n permutations. We are not interesting in permutations however. For n = 2, heads then tails is identical to tails then heads. If we represent 0 as tails and 1 as heads, the outcomes for the first few values of n are:

Number of Coins Number of Heads
0 1 2 3 4

























If you ignore the actual values and reduce it to the number of permutations:

Number of Coins Number of Heads Total
0 1 2 3 4





























Divide the number of desired outcomes by the total number of outcomes and you have your probability.

Binomial Distribution

The above number are obviously not random and correspond to the coefficients of a binomial function:

(1 + x)n

For example:

(1 + x)2 = x2 + 2x + 1 (1, 2, 1)
(1 + x)3 = x3 + 3x2 + 3x + 1 (1, 3, 3, 1)
(1 + x)4 = x4 + 4x3 + 6x2 + 4x + 1 (1, 4, 6, 4, 1)

Any of the above coefficients can be found with factorials:

f(n,k) = nCk = n! / (n-k)!k!

Binomial Probability

Since we can calculate the number of permutations for our desired outcome and we know the total outcomes are 2n then:

P(n,k) = nCk x 2-n


P = the probability of k coins coming up heads for n coin tosses
n = number of coin tosses
k = number of desired successes

Unfair Coins

The above is accurate for fair coins but what about unfair coins? Assume that a given coin has a 60% chance of coming up heads. What does that do to our formula? Very little actually. The above formula is simply a special case of a more general formula.

P(n,k,p) = nCk x pk x (1-p)n-k


P = the probability of k coins coming up heads for n coin tosses
n = number of coin tosses
k = number of desired successes
p = probability of a coin coming up heads

Plug in p = 0.5 and the formula reduces to the earlier version.

Pascal’s Triangle

Dealing with factorials is cumbersome and impractical for even small values of n (e.g., 1000! has 2,567 digits). Fortunately there is a much faster method of calculating coefficients known as Pascal's triangle.

If you look at the coefficients to the left, you’ll see that any coefficient is the sum of the two coefficients above it (eg 10 = 4 + 6). This leads to the following function:

C(n,k) = 1 when k = 0 or k = n
          = C(n-1,k-1) + C(n-1,k) for 1 < k < n

which could lead to the following naive implementation:

public static int coefficient(int n, int k) {
  return k == 0 || k == n ? 1 : coefficient(n - 1, k - 1) + coefficient(n - 1, k);

Unfortunately this simple function has terrible performance: O(2n) to be precise. The reason is simple: it does an awful lot of redundant calculations  There are two ways to solve this performance problem:


Memoization is an optimization technique that avoids making repeated calls to the same function with the same arguments. This assumes a pure function, meaning any two calls with the same arguments will evaluate to the same result and there are no relevant side effects. One such implementation is:

private static class Pair {
  public final int n;
  public final int k;

  private Pair(int n, int k) {
    this.n = n;
    this.k = k;
  public int hashCode() {
    return (n + 37881) * (k + 47911);
  public boolean equals(Object ob) {
    if (!(ob instanceof Pair)) {
      return false;
    Pair p = (Pair)ob;
    return n == p.n && k == p.k;

private static final Map<Pair, Integer> CACHE = new HashMap<Pair, Integer>();

public static int coefficient2(int n, int k) {
  if (k == 0 || k == n) {
    return 1;
  } else if (k == 1 || k == n - 1) {
    return n;
  Pair p = new Pair(n, k);
  Integer i = CACHE.get(p);
  if (i == null) {
    i = coefficient2(n - 1, k - 1) + coefficient2(n - 1, k);
    CACHE.put(p, i);
  return i;

To compare, calculating C(34,20) took 10.7 seconds on my PC with the first method and 2.4 milliseconds with the second. C(340,200) still only takes a mere 21.1 ms. There probably isn’t enough time left in the universe for the first one to finish that.

The second method uses O(n2) space and O(n2) time.

Introducing Dynamic Programming

The key to understanding dynamic programming and applying it to a problem is identifying a recurrence relation that expresses the solution for a given value n in terms of the solution for a lower value of n. We’ve already identified this.

It is worth noting that the only thing we need to calculate the values for any row of Pascal’s triangles are the values for the previous row. Therefore, we should never need to keep more than one row in memory at a time, which reduces our desired algorithm to O(n) space.

Consider this implementation:

public static int coefficient3(int n, int k) {
  int[] work = new int[n + 1];
  for ( int i = 1; i <=n ; i++ ) {
    for ( int j = i; j >= 0; j-- ) {
      if (j == 0 || j == i) {
        work[j] = 1;
      } else if (j == 1 || j == i - 1) {
        work[j] = i;
      } else {
        work[j] += work[j - 1];
  return work[k];

This algorithm simply calculates one row at a time from the first until the required row is found. Only one row (ie work) is ever kept. The inner loop satisfies the boundary conditions (arguably unnecessarily) and no recursion or more complex cache is required.

This algorithm is still O(n2) but is O(n) space and in practice will be faster than the previous version. My example of C(340,200) runs in 1.5 ms compared 21.6 ms.

Different Coins

Up until now it has been assumed that the same coin (or at least an identical coin) is used for every test. In mathematical terms, p is constant. What if it isn’t? Imagine we have two coins, the first has a 70% chance of coming up heads and the second has a 60% chance, what is the probability of getting one heads when tossing the pair of coins?

Our formula clearly breaks down because it has been assumed until now that HT and TH are interchangeably and (more importantly) equally likely. That is no longer the case.

The simplest solution traverses all permutations, finds those that satisfy the condition and works out their cumulative probability. Consider this terrible implementation:

private static final int SLICES = 10;
private static int[] COINS = {7, 6};

private static void runBruteForce(int... coins) {
  long start = System.nanoTime();
  int[] tally = new int[coins.length + 1];
  bruteForce(tally, coins, 0, 0);
  long end = System.nanoTime();
  int total = 0;
  for (int count : tally) {
    total += count;
  for (int i = 0; i < tally.length; i++) {
    System.out.printf("%d : %,.4f%n", i, (double) tally[i] / total);
  System.out.printf("%nBrute force ran for %d COINS in %,.3f ms%n%n",
      coins.length, (end - start) / 1000000d);

private static void bruteForce(int[] table, int[] coins, int index, int count) {
  if (index == coins.length) {
  for (int i = 0; i < coins[index]; i++) {
    bruteForce(table, coins, index + 1, count);
  for (int i = coins[index]; i < SLICES; i++) {
    bruteForce(table, coins, index + 1, count + 1);

For simplicity, this code is assuming the probability of heads is an integer multiple of 0.1. The result:

0 : 0.4200
1 : 0.4600
2 : 0.1200

Inspection determines this result is correct. Of course, this algorithm is O(10n) time. It can be improved to avoid some redundant calls but the resulting algorithm is still exponential.

Dynamic Programming: Part Two

The key to dynamic programming is to be able to state the problem with a suitable recurrence relation that expresses the solution in terms of smaller values of the solution.

General Problem: Given C, a series of n coins p1 to pn where pi represents the probability of the i-th coin coming up heads, what is the probability of k heads coming up from tossing all the coins?

The recurrence relation required to solve this problem isn’t necessarily obvious:

P(n,k,C,i) = pi x P(n-1,k-1,C,i+1) + (1-pi) x P(n,k,C,i+1)

To put it another way, if you take a subset of the the coins C from pi to pn, the probability that there will be k heads in the remaining coins can be expressed this way: if the current coin pi is heads then the subsequent coins need only have k-1 heads. But if the current coin comes up tails (at a chance of 1-pi) then the subsequent coins must contains k heads.

You may need to think about that before it sinks in. Once it does sink in it may not be obvious that this helps solve the problem. The key point to remember is that each step of this relation expresses the solution in terms of one less coin and possibly one less value of k. That divide-and-conquer aspect is the key to dynamic programming.

Algorithm Explained

Assume three coins (0.2, 0.3, 0.4 of coming up heads). Consider the simplest case: k = 0. The probability that all three coins are tails is equal to:

P(3,0,C,1) = (1-p1) x P(3,0,C,2)
= (1-p1) x (1-p2) x P(3,0,C,3)
= (1-p1) x (1-p2) x (1-p3)

which is clearly correct. It gets slightly more complicated when k > 0. We need to remember that we now have a table for k = 0. We can construct a table for k = 1 in terms of that table and so on.

P(3,1,C,1) = p1 x P(3,0,C,2) + (1-p1) x P(3,1,C,2)
= ...

Once again, if the current coin comes up heads we need k-1 coins from the remaining coins. If it doesn’t, we need k heads from the remaining coins. the above relation is adding those two things together.

In implementation these two cases (k = 0 and k > 0) tend to be treated the same since k = 0 is a special case of k > 0 given the probability of anything where k = –1 is 0.

Expressing this as a table:

Number of Heads 0.200 0.300 0.400  
  0.000 0.000 0.000 0.000
0 0.336 0.420 0.600 1.000
1 0.452 0.460 0.400 0.000
2 0.188 0.120 0.000 0.000
3 0.024 0.000 0.000 0.000

Remember each cell in the table is a function of the cell to its right and the cell above that one weighted by the probability of that particular coin. The first column represents the probabilities for 0 to 3 heads from the 3 coins.

Most importantly, this algorithm is O(nk) time and O(nk) space.


private static void runDynamic() {
  long start = System.nanoTime();
  double[] probs = dynamic(0.2, 0.3, 0.4);
  long end = System.nanoTime();
  int total = 0;
  for (int i = 0; i < probs.length; i++) {
    System.out.printf("%d : %,.4f%n", i, probs[i]);
  System.out.printf("%nDynamic ran for %d coinsin %,.3f ms%n%n",
      coins.length, (end - start) / 1000000d);

private static double[] dynamic(double... coins) {
  double[][] table = new double[coins.length + 2][];
  for (int i = 0; i < table.length; i++) {
    table[i] = new double[coins.length + 1];
  table[1][coins.length] = 1.0d; // everything else is 0.0
  for (int i = 0; i <= coins.length; i++) {
    for (int j = coins.length - 1; j >= 0; j--) {
      table[i + 1][j] = coins[j] * table[i][j + 1] +
          (1 - coins[j]) * table[i + 1][j + 1];
  double[] ret = new double[coins.length + 1];
  for (int i = 0; i < ret.length; i++) {
    ret[i] = table[i + 1][0];
  return ret;


0 : 0.3360
1 : 0.4520
2 : 0.1880
3 : 0.0240

Dynamic ran for 3 coins in 0.018 ms


I hope this serves as a useful introduction to dynamic programming. It can be an incredibly powerful and useful technique but can require some practice to identify where and how to apply it.

The last very short code segment does an awful lot. It can handle large numbers of coins, uniform or not, and deliver results in a timely fashion.

Google Wave, Microsoft and Engineers Running the Asylum

This week marked the official death of Google Wave and Steve Ballmer described the iPad as “just another PC form factor”. These seemingly unrelated events highlight the perils of both driving and not driving a company from an engineering standpoint.

Microsoft, The Meteor is Coming

For those of us who started using computers before the internet, there is a stark difference between now and 10-20 years ago in the perception of Microsoft. In the post-IBM era, Microsoft was the 800 pound gorilla in the room that crushed everything else, even Apple, Wordperfect, Lotus and Borland.

As much as Bill Gates is derided in some quarters he was and is a programmer. It has probably been many years since he wrote a line of code but the fact remains that has has written code and understands the process. He is a technical guy. Case in point: My First BillG Review.

In those days, Microsoft was a lot less bureaucratic. Instead of the 11 or 12 layers of management they have today, I reported to Mike Conte who reported to Chris Graham who reported to Pete Higgins, who reported to Mike Maples, who reported to Bill. About 6 layers from top to bottom. We made fun of companies like General Motors with their eight layers of management or whatever it was.


Bill Gates was amazingly technical. He understood Variants, and COM objects, and IDispatch and why Automation is different than vtables and why this might lead to dual interfaces. He worried about date functions. He didn't meddle in software if he trusted the people who were working on it, but you couldn't bullshit him for a minute because he was a programmer. A real, actual, programmer.

Watching non-programmers trying to run software companies is like watching someone who doesn't know how to surf trying to surf.

Steve Ballmer is no Bill Gates.

Ballmer might be a whiz at making org charts, discussing corporate strategy, managing a division, making M&A deals or whatever but he’s simply not technical. He was hired as a business manager and that’s what he is.

The problem is that Microsoft, in spite of everything it’s tried, is a software product company. If you don’t understand how software is created you have, in my opinion, no business running the company. It would be like me, as a programmer, running an airline.

Ballmer isn’t unique or even in the minority here. My personal experience has been the majority of places I have worked haven’t understood software development nor been driven by engineering rather than the business side. Far more common is that programmers are treated as an unavoidable cost centre for which the perfect management metric has yet to be found.

The problem here is twofold:

  1. Microsoft customers are enterprises and PC OEMs not end users; and
  2. Singularity of purpose: Microsoft exists to sell Windows licenses.

The first point is highlighted in another of this week’s stories, Inside Microsoft's internal IE8 privacy battles. Basically, advertising interests won out over user experience.

Windows is so ridiculously convoluted because Microsoft can’t say no to companies that want ridiculous customizations. This is why you have a horrific mishmash of policies, registry settings, etc that mean it’s typically uneconomic to figure out what’s wrong (if not virtually impossible). It’s easier to simply wipe it and start again.

In Commandos, Infantry, and Police, Jeff Atwood references Robert X. Cringely’s book Accidental Empires, which can be used to categorize companies. Startups are commandos. They are aggressively seeking to gain something with nothing to lose. Successful startups become infantry since the army is now so large that it requires structure and discipline. Truly large companies become police that are only interested in maintaining the status quo.

Microsoft’s sole purpose now is to protect its cash cow of Windows and Office. It’s leadership is probably terrified that something will kill the golden goose. What you fear you create. To quote Princess Leia:

The more you tighten your grip, Tarkin, the more star systems will slip through your fingers.

So this is why Ballmer describes the iPad as just another PC form factor because Microsoft just wants to sell more Windows licenses. Ballmer lacks the technical foundation to understand exactly why he’s running a dinosaur so doesn’t comprehend that the meteor is coming.

Google, Today’s Mammals Are Tomorrow’s Dinosaurs

Google has a similar problem to Microsoft in that it has a couple of core, hugely successful products (being the combined search and advertising business) and everything else has either failed, been lacklustre or is yet to bear fruit.

Of course there are successful ancillary products like GMail, Google Maps, Google Docs and so on. Many of these are completely dominant in their space but all of them merely enhance the search and advertising business or simply don’t generate a considerable amount of revenue.

To quote Google’s Revenue Breakdown Shows It Controls Even More Advertising Than You’d Think:

Google revenues last year were $23.6 billion, 99 percent of which came from advertising, the company said. Google doesn’t break down which of its services earn what, and it still hasn’t disclosed share breakdowns from content on YouTube or mobile devices.

Now I say this not to liken Google’s non-search forays as failures (as Microsoft’s forays beyond Windows and Office are almost entirely failures). Google has a different business model. Google is all about people using the Web. The more they use the Web, the more money Google makes. All their successful non-search products share this theme: they lower the cost of or increase the amount or effectiveness of using the Web.

Google is an unusual company in many respects. Larry and Sergey completely nailed search and will go down in history as tech pioneers for doing so. They have created a company that is largely engineering-driven, which shares a couple of parallels to the Microsoft of old except that early Microsoft was all about embracing the external developer community.

That can enable the company to focus on products rather than attempting to quantify programmer performance (a far more typical scenario) but it can have a downside too, which is the whole point of this post.

Despite it’s massive engineering talent, there are areas where Google has been unable to make any headway, namely in the much-hyped social space. Google Wave’s demise foreshadows the increasingly likely (re)entry into the social space, popularly referred to as Google Me.

But the engineering way of thinking left unchecked can have some serious downsides. Consider an example.


We programmers think differently to “normal” people. While we share a lot in common with practitioners of other scientific and engineering disciplines I think it’s fair to say that part of the programmer psyche is still unique. Many programmers don’t realize this and it’s hard to explain to “outsiders” but I shall try with an example.

Sudoku is a number-placement puzzle where the player tries to figure out how to place numbers in a partly filled grid such that certain rules are followed regarding the allowable repetition patterns of those numbers. Puzzles vary in difficulty based on how few numbers they reveal. A given puzzle has but one solution.

Many people enjoy this as a pastime. When first faced with Sudoku I like many programmers reacted in a very different way. I learnt the rules, devised a program that solved any given puzzle, satisfied myself it was correct and then forgot about Sudoku having never done one of these puzzles by hand, which basically defeats the whole point.

Non-programmers will probably find this bizarre. Programmers will almost certainly nod understanding. The point here is that programming trains you to find general solutions. So you can solve one Sudoku puzzle but what would be the point? There are billions of others unsolved! In terms of time investment, you’re better off solving all of them at once!

Google Wave

Google Wave was presented with much fanfare and hype at Google IO 2009. Spearheaded by the quite brilliant Rasmussens (who brought us Google Maps), Wave was touted as one communication platform to rule them all.

Wave was designed to be low level such that any messaging paradigm could be implemented in terms of Wave. Wave could handle a version of email, Twitteresque micro-blogging, IM, even bug tracking and so on. All of these things can be implemented on top of Google Wave.

Or at least that was the theory.

See the pattern here? Much like the Sudoku problem, Google was seeking a general solution to a set of problems.

So while this may be a great engineering feat, it ignored a fairly basic problem: what is the use case for Google Wave? Who will use it? Why? for what?

Those questions are perhaps unfair because many startup ventures begin with an idea and very little idea of what the end will look like. Or at least that end will change several times (dare I utter the overused “pivot” buzzword?). So you can argue that you create a platform and wait for others to find a use for it.

Clearly they didn’t.

Perhaps a better approach is to ask: what is the pain point? What problem is Google Wave solving?

The programmer in me understand the desire to create one platform that can do everything. It’s an alluring siren but don’t lose sight of the rocks drawing ever closer.

To quote Joel Spolsky from How Microsoft Lost the API War:

By the way, for those of you who follow the arcane but politically-charged world of blog syndication feed formats, you can see the same thing happening over there. RSS became fragmented with several different versions, inaccurate specs and lots of political fighting, and the attempt to clean everything up by creating yet another format called Atom has resulted in several different versions of RSS plus one version of Atom, inaccurate specs and lots of political fighting. When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything.

To paraphrase, Wave didn’t unite anything. In fact it did nothing other than create yet another way we could communicate.

As much as people complain about email, everyone understands the model. A user who gets on the internet for the first time quickly understand the concept of sending someone “electronic mail”. It parallels something with which they are familiar from real life.

Apple and the Click Wheel

Whatever the future holds for Apple, it (or perhaps more accurately, Steve Jobs) will be remembered as one of the most influential tech figures of the computing era to date. Apple popularized and revolutionized digital content distribution, has a brand synonymous with mobile digital music playback and singlehandedly restructured the telecommunications industry.

The first iPod had a speaker behind the click wheel. Its only function was to make clicking noises to give to give the user auditory feedback. Such a feature makes absolutely zero engineering sense. Can you see Microsoft doing this? I can’t. I can’t even see Google doing it.

There is no other company that completely “gets it” in terms of consumer user experience. No one is close. And before you decry Apple for its lack of Flash, the walled app garden, lack of “freedom” (whatever that means today), etc… nobody cares. That includes me.

The point here is that Apple isn’t run by engineers. Apple is completely focused on product and user experience. Anything can be sacrificed to that end. If Apple were run like Microsoft, the iPad would be running some variant of MacOS X. But it doesn’t. Why? Because it makes no sense to put a full desktop OS based on a pixel-perfect pointer interface (ie the mouse) onto a low-power touchscreen device. Apple isn’t concerned about selling OS licenses. They’re focused on the product.


I’m not against programmers running companies. Not at all. Google is a great example of what can be achieved when you give talented engineers the freedom to innovate. Microsoft suffers I believe because its leader has no understanding of how software developments works and how programmers think.

But both extremes can have a downside. This week it killed Google Wave. It’s food for thought as Google ramps up it’s next stab at social.