Interview Programming Problems Done Right

Introduction

Why 37signals Doesn't Hire Programmers Based on Brainteasers and my comment on HN generated a lot of responses, so much so that I'm writing this post to properly explain the essence of a good (IMHO) interview programming problem.

Pascal's Triangle

Pascal's Triangle is a shortcut for getting coefficients most often used binomial probability. The root element is 1. Every other element is the sum of the one or two above it (diagonally left and diagonally right).

There are several variations of the problem:

  • Print out the triangle to a specific row;
  • Return a given row of the triangle;
  • Return a given element (by row and index) of the triangle.

All of them use the same basic logic. You explain this to the interviewee and ask them to solve it on paper or on a whiteboard.

Recursive Solution

The simplest version is a recursive solution, something like:

#!/usr/bin/python
def value(row, index):
  if index < 0 or index > row:
    return 0
  if index == 0 or index == row:
    return 1
  return value(row-1, index-1) + value(row-1, index)

def row(n):
  return [value(n, x) for x in xrange(0, n+1)]

for i in xrange(10):
  print row(i)

If a candidate can produce this it is at least a working solution even though the performance (for non-trivial n) is prohibitive. Ideally they would be able to point this out (plus the exponential big-O performance).

On my Macbook Pro (2010) this runs for n=20 in about 0.8 seconds.

Memoization

Some candidates will improve this solution by identifying and caching the repeated calculations. Something like this:

import collections

values = collections.defaultdict(dict)

def value(row, index):
  result = values[row].get(index)
  if result is not None:
    return result
  if index < 0 or index > row:
    return 0
  if index == 0 or index == row:
    return 1
  result = value(row-1, index-1) + value(row-1, index)
  values[row][index] = result
  return result

Real time for n=20 is 0.03 seconds. Bonus points if the candidate correctly states this as "memorization" (rather than the more generic "caching").

Iterative Solution

A more common optimization is to use an iterative rather than recursive solution. The simplest version of this is something like:

#!/usr/bin/python
rows = [[1]]

for row in xrange(1, 20):
  values = [1]
  prev = rows[-1]
  for index in xrange(1, row):
    values.append(prev[index-1] + prev[index])
  values.append(1)
  rows.append(values)

for row in rows:
  print row

Performance is similar to the previous one (0.028 seconds). There are lots of subtle variations of this.

Dynamic Programming

The astute candidate will realize that you don't need to store the rows at all. You only ever need the current and the previous rows. This reduces space for O(n2) to O(n).

#!/usr/bin/python
prev = []
for row in xrange(20):
  curr = [1]
  for index in xrange(1, row):
    curr.append(prev[index-1] + prev[index])
  curr.append(1)
  print curr
  prev = curr

0.027 seconds.

Bonus Points

Assuming f(r, i) returns the value for row r and index i, a candidate may well point out that the triangle is symmetrical, specifically that f(r, i) == f(r, r-i), meaning you, at most, only have to calculate half of the triangle. This is particularly relevant if they are asked to return a specific value (ie f(117, 116) == f(117, 1) == 117).

The second optimization one could make is that since f(r, i) == f(r, i-1) + f(r, i) then to calculate f(r, i) you only need to calculate up to the i'th element of each row.

Why is this a good question?

As demonstrated, the code solutions are short. They're longer in, say, C, C++ or Java rather than Python but not that much longer. The point here isn't to get a perfect solution from the interviewee (meaning that deducting points for a missing colon or a typo would be silly). The purpose of the exercise is to demonstrate the they can turn a relatively simple algorithm into code. They have the thought process to do so. In doing so, their familiarity with their chosen language should be obvious.

Also, there are several degrees of solutions. Any solution trumps no solution but the quality of the solution will give you a useful signal (IMHO).

What should a coding test tell you?

A coding test like this is a negative filter. Assuming your chosen problem is sufficiently simple, if an interviewee can't turn it into at least the outline of code in a reasonable length of time then that's a red flag. If they can do it in record time it doesn't mean they're a rock star. You're just trying to weed out candidates who should've already been screened out.

What's a reasonable length of time? IMHO 5 minutes or less is fast (arguably blazing fast). Anything under 10 minutes is fine. If someone is taking more than say 15-20 minutes, that is possibly cause for concern.

Conclusion

The key qualities in a coding test are:

  1. It needs to be relatively simple. If it takes more than about 20-30 lines of Python to solve it's probably too complex;
  2. It needs to be easy to explain. If someone doesn't know what Pascal's Triangle is, that's not a problem. Explain it; and
  3. The goal is to put thought into code. That code doesn't need to be perfect. It just needs to be sufficiently expressive.

Did Michael Arrington Screw Jason Calacanis and the State of California?

This week lawyer turned blogger Michael Arrington sold the TecnCrunch network of sites to AOL for an undisclosed sum. That sum has been speculated to be between $25 and $40 million.

The timeline of events suggests a not-so-pretty picture to this saga. But first some background.

The Crunchpad

The Crunchpad was a project to create a $200 tablet started by Arrington in 2008. By June 2009 there had been several prototypes by then in partnership with Fusion Garage. In November 2009, the Crunchpad was dead as Fusion Garage had announced they were going it alone. In December 2009, filed a lawsuit against Fusion Garage from which not much was heard until a September 2010 update that included some fairly damning communications from discovery.

I, like many, felt that Arrington has been screwed on this one. It was clearly his idea and Fusion Garage had seemingly seriously misjudged both the market for their “JooJoo” and Arrington’s resolve to see justice done beyond any sense of any possible restitution.

No formal contract seems to exist between the two parties, which struck many as strange given that Arrington is a lawyer. Fusion Garage seems to have relied upon this fact too not fully understanding or appreciating that a partnership can be inferred from one’s actions.

Tech Crunch Conference

In January 2007, Arrington announced the TechCrunch 20 conference as a joint venture between himself and Jason Calacanis, founder of the Silicon Alley Reporter, Weblogs Inc and Mahalo. TC20 (later TC40 and TC50) was a conference for startups held in San Francisco annually that launched some highly successful companies including Mint.com and Yammer.

In May 2010, Arrington and Calacnis parted ways on TC50. Arrington had earlier (March) launched TechCrunch Disrupt, his own conference. Soon after the split, Calacanis went on to launch the Launch Conference.

Covering the Valley… from Seattle?!

TechCrunch is either a blog or a news organization (depending on whom you ask) that covers startups and entrepreneurs, primarily in the internet space. The Mecca for such companies is of course Silicon Valley, although there are nascent startup scenes in Boulder (eg TechStars), New York, Austin, Boston and other places.

On May 3, 2010, Arrington announced a move to Seattle:

My plan is to roughly split my time between Seattle and Silicon Valley … Seattle is sort of like the minor leagues of the startup world, … But to be honest the biggest reason I’ve moved is to simply mix things up in my life. Like many people I tend to get bored if I stay in one place too long – five years is the longest I’ve lived anywhere since high school. It was time for a change.

The Tax Man

California and Washington have one important difference: Washington has no state income tax. California’s marginal income tax rate is 10.3% (over $1 million). Why does this matter? Capital gains, such as the sale of a company, depending on the state are added to gross income.

That could be as much as $4 million!

There are however some problems. From Do you want to avoid California residency?

(1) California residents pays California tax on all their income.

(2) California generally taxes California-source income, …

So, who is a resident? In determining residency, California law provides two presumptions. The first presumption is that a taxpayer who, in the aggregate, spends more than 9 months of a taxable year in California will be presumed to be a California resident. The second presumption is that an individual whose presence in California does not exceed 6 months within a taxable year and who maintains a permanent home outside California is not considered a California resident provided the taxpayer does not engage in any activity or conduct within the State other than as a seasonal visitor, tourist, or guest.

Permanent residence in Washington? Check. Splitting time between Seattle and the Valley? Check.

But there is no hard and fast rule for what qualifies you as a California resident. It is a subjective test based on factors such as the location of your permanent residence, the location of your family, where you are registered to vote, how much time you spent in California and so on.

Now there’s nothing wrong with minimizing one’s tax liability. The late Australian media magnate Kerry Packer went so far as to say to a 1991 government inquiry:

Of course I am minimising my tax. And if anybody in this country doesn't minimise their tax, they want their heads read, because as a government, I can tell you you're not spending it that well that we should be donating extra!

The interesting part is that the move implies intent. If the California FTB (Franchise Tax Board) argues Arrington moved simply to avoid tax on a business that is clearly associated with, was created in and operates within the state of California, then tax is due. One could go so far as to argue the ethics of paying what’s owed.

But if you ignore that argument and instead argue that the May move to Seattle indicated Arrington’s intent to sell to AOL (rather than simply being a happy coincidence that he moved to an income tax free state 4 months before receiving $30 million), it puts subsequent events into a whole different context.

AOL Buys TechCrunch

On 29 September 2010, AOL Bought TechCrunch. No dollar amount for the sale has been disclosed. Rumours of $25 to $40 million abound. My personal opinion is that it was in the region of $30-35 million with a substantial ($10m+) earn out over 3+ years. The basis for this opinion is, in Arrington's own words

So we begin another journey. I fully intend to stay with AOL for a very, very long time. And the entire team has big incentives to stay on board for at least three years.

(emphasis added)

Plus earn-outs on an acquisition are common practice.

Where this gets really interesting is that according to Business Insider, AOL Tried To Buy TechCrunch Twice Before:

During the second run, AOL wanted to buy TechCrunch for something a little under $25 million. TechCrunch, in negotiations led by CEO Heather Harde, wanted closer to $30 million. AOL wouldn't go there because it thought too much of TechCrunch's revenues were from a conference business it didn't entirely own itself.

(emphasis added)

Tim Armstrong has been CEO of AOL since March 12, 2009. With TechCrunch Disrupt launched in March 2010 and Armstrong purportedly having made at least one other run at TechCrunch, it’s plausible that this move had been planned six months ago.

Jason Calacanis

Calacanis is an incendiary figure, deliberately so. As such he has many detractors. He also has a large media exit to AOL (Weblogs Inc) under his belt. When the sale announcement was imminent, Jason Calacanis Shreds Mike Arrington, Calls Him "A Trainwreck," "A Liability," And "A Sociopath":

@arrington told me he wouldn't sell @TechCrunch for <than $40M last year.TC has ~$6m in revenue/~$1.5m in profits (all TC50!)

Rather prophetic. From back in June:

The Timeline

Date Event
March 12, 2009 Tim Armstrongs becomes CEO of AOL
2009 Armstrong’s first attempt to buy TechCrunch
September 14-15, 2009 TechCrunch50 conference
March 3, 2010 Arrington announces TechCrunch Disrupt
May 3, 2010 Arrington announces move to Seattle
May 12, 2010 Arrington and Calacanis announce end to TechCrunch50 joint venture.
June 17, 2010 Calacanis first (publicly) accuses Arrington of screwing him out of a piece of the pie.
September 28, 1010 AOL acquires TechCrunch

Conclusion

Where once I felt sympathy for Arrington when he was (quite clearly, in my opinion) intentionally deceived by Fusion Garage, the pendulum has swung the other way.

It’s hard to look at that timeline and not see the intent of Arrington splitting with a partner over something that allegedly represents the majority of TechCrunch’s revenue and then going on to launch a similar conference, building on its success.

As divisive as Calacanis can be, it’s hard to argue that he wasn’t a big part of the TC50 conference at all levels: conceiving (jointly or solely) the idea for the conference, filtering and coaching startups, on-stage presentations, choosing the winners and so on.

Compare this to Chris Dixon’s story: How SiteAdvisor Went From Startup Vision To A Career-Making Exit – with Chris Dixon. There are many tales like this where the founders go above and beyond what’s required to look after those who worked so hard for them.

One could go so far as to call Arrington’s actions premeditated, even despicable.

Coin Tosses, Binomials and Dynamic Programming

Today someone asked about the probability of outcomes in relation to coin tosses on Stackoverflow. It’s an interesting question because it touches on several areas that programmers should know from maths (probability and counting) to dynamic programming.

Dynamic programming is a divide-and-conquer technique that is often overlooked by programmers. It can sometimes be hard to spot situation where it applies but when it does apply it will typically reduce an algorithm from exponential complexity (which is impractical in all but the smallest of cases) to a polynomial solution.

I will explain these concepts in one of the simplest forms: the humble coin toss.

Bernoulli Trials

A Bernoulli trial is an event (or experiment) that randomly has two outcomes. The probability of each outcome is known. The probability of success is p and the probability of failure is 1-p where 0 < p < 1. The outcome of each trial is independent of any other trial.

What constitutes success is arbitrary. For the purposes of this post, success will be defined as a coin coming up heads. To clarify the independence property, the probability of a coin coming up heads does not change regardless of any previous tosses of that (or any other) coin.

Assume a fair coin (p = 0.5). If you want to determine the probability of k successes (heads) from n trials in the general case, it is worth first recognizing that if you make n coin tosses there are 2n permutations. We are not interesting in permutations however. For n = 2, heads then tails is identical to tails then heads. If we represent 0 as tails and 1 as heads, the outcomes for the first few values of n are:

Number of Coins Number of Heads
0 1 2 3 4

1

0

1

 

 

 

2

00

01
10

11

 

 

3

000

100
010
010

011
101
110

111

 

4

0000

1000
0100
0010
0001

0011
0101
0110
1001
1010
1100

0111
1101
1011
1110

1111

If you ignore the actual values and reduce it to the number of permutations:

Number of Coins Number of Heads Total
0 1 2 3 4

1

1

1

 

 

 

2

2

1

2

1

 

 

4

3

1

3

3

1

 

8

4

1

4

6

4

1

16

Divide the number of desired outcomes by the total number of outcomes and you have your probability.

Binomial Distribution

The above number are obviously not random and correspond to the coefficients of a binomial function:

(1 + x)n

For example:

(1 + x)2 = x2 + 2x + 1 (1, 2, 1)
(1 + x)3 = x3 + 3x2 + 3x + 1 (1, 3, 3, 1)
(1 + x)4 = x4 + 4x3 + 6x2 + 4x + 1 (1, 4, 6, 4, 1)

Any of the above coefficients can be found with factorials:

f(n,k) = nCk = n! / (n-k)!k!

Binomial Probability

Since we can calculate the number of permutations for our desired outcome and we know the total outcomes are 2n then:

P(n,k) = nCk x 2-n

assuming:

P = the probability of k coins coming up heads for n coin tosses
n = number of coin tosses
k = number of desired successes

Unfair Coins

The above is accurate for fair coins but what about unfair coins? Assume that a given coin has a 60% chance of coming up heads. What does that do to our formula? Very little actually. The above formula is simply a special case of a more general formula.

P(n,k,p) = nCk x pk x (1-p)n-k

assuming:

P = the probability of k coins coming up heads for n coin tosses
n = number of coin tosses
k = number of desired successes
p = probability of a coin coming up heads

Plug in p = 0.5 and the formula reduces to the earlier version.

Pascal’s Triangle

Dealing with factorials is cumbersome and impractical for even small values of n (e.g., 1000! has 2,567 digits). Fortunately there is a much faster method of calculating coefficients known as Pascal's triangle.

If you look at the coefficients to the left, you’ll see that any coefficient is the sum of the two coefficients above it (eg 10 = 4 + 6). This leads to the following function:

C(n,k) = 1 when k = 0 or k = n
          = C(n-1,k-1) + C(n-1,k) for 1 < k < n

which could lead to the following naive implementation:

public static int coefficient(int n, int k) {
  return k == 0 || k == n ? 1 : coefficient(n - 1, k - 1) + coefficient(n - 1, k);
}

Unfortunately this simple function has terrible performance: O(2n) to be precise. The reason is simple: it does an awful lot of redundant calculations  There are two ways to solve this performance problem:

Memoization

Memoization is an optimization technique that avoids making repeated calls to the same function with the same arguments. This assumes a pure function, meaning any two calls with the same arguments will evaluate to the same result and there are no relevant side effects. One such implementation is:

private static class Pair {
  public final int n;
  public final int k;

  private Pair(int n, int k) {
    this.n = n;
    this.k = k;
  }
  
  @Override
  public int hashCode() {
    return (n + 37881) * (k + 47911);
  }
  
  @Override
  public boolean equals(Object ob) {
    if (!(ob instanceof Pair)) {
      return false;
    }
    Pair p = (Pair)ob;
    return n == p.n && k == p.k;
  }
}

private static final Map<Pair, Integer> CACHE = new HashMap<Pair, Integer>();

public static int coefficient2(int n, int k) {
  if (k == 0 || k == n) {
    return 1;
  } else if (k == 1 || k == n - 1) {
    return n;
  }
  Pair p = new Pair(n, k);
  Integer i = CACHE.get(p);
  if (i == null) {
    i = coefficient2(n - 1, k - 1) + coefficient2(n - 1, k);
    CACHE.put(p, i);
  }
  return i;
}

To compare, calculating C(34,20) took 10.7 seconds on my PC with the first method and 2.4 milliseconds with the second. C(340,200) still only takes a mere 21.1 ms. There probably isn’t enough time left in the universe for the first one to finish that.

The second method uses O(n2) space and O(n2) time.

Introducing Dynamic Programming

The key to understanding dynamic programming and applying it to a problem is identifying a recurrence relation that expresses the solution for a given value n in terms of the solution for a lower value of n. We’ve already identified this.

It is worth noting that the only thing we need to calculate the values for any row of Pascal’s triangles are the values for the previous row. Therefore, we should never need to keep more than one row in memory at a time, which reduces our desired algorithm to O(n) space.

Consider this implementation:

public static int coefficient3(int n, int k) {
  int[] work = new int[n + 1];
  for ( int i = 1; i <=n ; i++ ) {
    for ( int j = i; j >= 0; j-- ) {
      if (j == 0 || j == i) {
        work[j] = 1;
      } else if (j == 1 || j == i - 1) {
        work[j] = i;
      } else {
        work[j] += work[j - 1];
      }
    }
  }
  return work[k];
}

This algorithm simply calculates one row at a time from the first until the required row is found. Only one row (ie work) is ever kept. The inner loop satisfies the boundary conditions (arguably unnecessarily) and no recursion or more complex cache is required.

This algorithm is still O(n2) but is O(n) space and in practice will be faster than the previous version. My example of C(340,200) runs in 1.5 ms compared 21.6 ms.

Different Coins

Up until now it has been assumed that the same coin (or at least an identical coin) is used for every test. In mathematical terms, p is constant. What if it isn’t? Imagine we have two coins, the first has a 70% chance of coming up heads and the second has a 60% chance, what is the probability of getting one heads when tossing the pair of coins?

Our formula clearly breaks down because it has been assumed until now that HT and TH are interchangeably and (more importantly) equally likely. That is no longer the case.

The simplest solution traverses all permutations, finds those that satisfy the condition and works out their cumulative probability. Consider this terrible implementation:

private static final int SLICES = 10;
private static int[] COINS = {7, 6};

private static void runBruteForce(int... coins) {
  long start = System.nanoTime();
  int[] tally = new int[coins.length + 1];
  bruteForce(tally, coins, 0, 0);
  long end = System.nanoTime();
  int total = 0;
  for (int count : tally) {
    total += count;
  }
  for (int i = 0; i < tally.length; i++) {
    System.out.printf("%d : %,.4f%n", i, (double) tally[i] / total);
  }
  System.out.printf("%nBrute force ran for %d COINS in %,.3f ms%n%n",
      coins.length, (end - start) / 1000000d);
}

private static void bruteForce(int[] table, int[] coins, int index, int count) {
  if (index == coins.length) {
    table[count]++;
    return;
  }
  for (int i = 0; i < coins[index]; i++) {
    bruteForce(table, coins, index + 1, count);
  }
  for (int i = coins[index]; i < SLICES; i++) {
    bruteForce(table, coins, index + 1, count + 1);
  }
}

For simplicity, this code is assuming the probability of heads is an integer multiple of 0.1. The result:

0 : 0.4200
1 : 0.4600
2 : 0.1200

Inspection determines this result is correct. Of course, this algorithm is O(10n) time. It can be improved to avoid some redundant calls but the resulting algorithm is still exponential.

Dynamic Programming: Part Two

The key to dynamic programming is to be able to state the problem with a suitable recurrence relation that expresses the solution in terms of smaller values of the solution.

General Problem: Given C, a series of n coins p1 to pn where pi represents the probability of the i-th coin coming up heads, what is the probability of k heads coming up from tossing all the coins?

The recurrence relation required to solve this problem isn’t necessarily obvious:

P(n,k,C,i) = pi x P(n-1,k-1,C,i+1) + (1-pi) x P(n,k,C,i+1)

To put it another way, if you take a subset of the the coins C from pi to pn, the probability that there will be k heads in the remaining coins can be expressed this way: if the current coin pi is heads then the subsequent coins need only have k-1 heads. But if the current coin comes up tails (at a chance of 1-pi) then the subsequent coins must contains k heads.

You may need to think about that before it sinks in. Once it does sink in it may not be obvious that this helps solve the problem. The key point to remember is that each step of this relation expresses the solution in terms of one less coin and possibly one less value of k. That divide-and-conquer aspect is the key to dynamic programming.

Algorithm Explained

Assume three coins (0.2, 0.3, 0.4 of coming up heads). Consider the simplest case: k = 0. The probability that all three coins are tails is equal to:

P(3,0,C,1) = (1-p1) x P(3,0,C,2)
= (1-p1) x (1-p2) x P(3,0,C,3)
= (1-p1) x (1-p2) x (1-p3)

which is clearly correct. It gets slightly more complicated when k > 0. We need to remember that we now have a table for k = 0. We can construct a table for k = 1 in terms of that table and so on.

P(3,1,C,1) = p1 x P(3,0,C,2) + (1-p1) x P(3,1,C,2)
= ...

Once again, if the current coin comes up heads we need k-1 coins from the remaining coins. If it doesn’t, we need k heads from the remaining coins. the above relation is adding those two things together.

In implementation these two cases (k = 0 and k > 0) tend to be treated the same since k = 0 is a special case of k > 0 given the probability of anything where k = –1 is 0.

Expressing this as a table:

  Coins  
Number of Heads 0.200 0.300 0.400  
  0.000 0.000 0.000 0.000
0 0.336 0.420 0.600 1.000
1 0.452 0.460 0.400 0.000
2 0.188 0.120 0.000 0.000
3 0.024 0.000 0.000 0.000

Remember each cell in the table is a function of the cell to its right and the cell above that one weighted by the probability of that particular coin. The first column represents the probabilities for 0 to 3 heads from the 3 coins.

Most importantly, this algorithm is O(nk) time and O(nk) space.

Implementation

private static void runDynamic() {
  long start = System.nanoTime();
  double[] probs = dynamic(0.2, 0.3, 0.4);
  long end = System.nanoTime();
  int total = 0;
  for (int i = 0; i < probs.length; i++) {
    System.out.printf("%d : %,.4f%n", i, probs[i]);
  }
  System.out.printf("%nDynamic ran for %d coinsin %,.3f ms%n%n",
      coins.length, (end - start) / 1000000d);
}

private static double[] dynamic(double... coins) {
  double[][] table = new double[coins.length + 2][];
  for (int i = 0; i < table.length; i++) {
    table[i] = new double[coins.length + 1];
  }
  table[1][coins.length] = 1.0d; // everything else is 0.0
  for (int i = 0; i <= coins.length; i++) {
    for (int j = coins.length - 1; j >= 0; j--) {
      table[i + 1][j] = coins[j] * table[i][j + 1] +
          (1 - coins[j]) * table[i + 1][j + 1];
    }
  }
  double[] ret = new double[coins.length + 1];
  for (int i = 0; i < ret.length; i++) {
    ret[i] = table[i + 1][0];
  }
  return ret;
}

Output:

0 : 0.3360
1 : 0.4520
2 : 0.1880
3 : 0.0240

Dynamic ran for 3 coins in 0.018 ms

Conclusion

I hope this serves as a useful introduction to dynamic programming. It can be an incredibly powerful and useful technique but can require some practice to identify where and how to apply it.

The last very short code segment does an awful lot. It can handle large numbers of coins, uniform or not, and deliver results in a timely fashion.

Google Wave, Microsoft and Engineers Running the Asylum

This week marked the official death of Google Wave and Steve Ballmer described the iPad as “just another PC form factor”. These seemingly unrelated events highlight the perils of both driving and not driving a company from an engineering standpoint.

Microsoft, The Meteor is Coming

For those of us who started using computers before the internet, there is a stark difference between now and 10-20 years ago in the perception of Microsoft. In the post-IBM era, Microsoft was the 800 pound gorilla in the room that crushed everything else, even Apple, Wordperfect, Lotus and Borland.

As much as Bill Gates is derided in some quarters he was and is a programmer. It has probably been many years since he wrote a line of code but the fact remains that has has written code and understands the process. He is a technical guy. Case in point: My First BillG Review.

In those days, Microsoft was a lot less bureaucratic. Instead of the 11 or 12 layers of management they have today, I reported to Mike Conte who reported to Chris Graham who reported to Pete Higgins, who reported to Mike Maples, who reported to Bill. About 6 layers from top to bottom. We made fun of companies like General Motors with their eight layers of management or whatever it was.

... and THERE WERE NOTES IN ALL THE MARGINS. ON EVERY PAGE OF THE SPEC. HE HAD READ THE WHOLE GODDAMNED THING AND WRITTEN NOTES IN THE MARGINS.

Bill Gates was amazingly technical. He understood Variants, and COM objects, and IDispatch and why Automation is different than vtables and why this might lead to dual interfaces. He worried about date functions. He didn't meddle in software if he trusted the people who were working on it, but you couldn't bullshit him for a minute because he was a programmer. A real, actual, programmer.

Watching non-programmers trying to run software companies is like watching someone who doesn't know how to surf trying to surf.

Steve Ballmer is no Bill Gates.

Ballmer might be a whiz at making org charts, discussing corporate strategy, managing a division, making M&A deals or whatever but he’s simply not technical. He was hired as a business manager and that’s what he is.

The problem is that Microsoft, in spite of everything it’s tried, is a software product company. If you don’t understand how software is created you have, in my opinion, no business running the company. It would be like me, as a programmer, running an airline.

Ballmer isn’t unique or even in the minority here. My personal experience has been the majority of places I have worked haven’t understood software development nor been driven by engineering rather than the business side. Far more common is that programmers are treated as an unavoidable cost centre for which the perfect management metric has yet to be found.

The problem here is twofold:

  1. Microsoft customers are enterprises and PC OEMs not end users; and
  2. Singularity of purpose: Microsoft exists to sell Windows licenses.

The first point is highlighted in another of this week’s stories, Inside Microsoft's internal IE8 privacy battles. Basically, advertising interests won out over user experience.

Windows is so ridiculously convoluted because Microsoft can’t say no to companies that want ridiculous customizations. This is why you have a horrific mishmash of policies, registry settings, etc that mean it’s typically uneconomic to figure out what’s wrong (if not virtually impossible). It’s easier to simply wipe it and start again.

In Commandos, Infantry, and Police, Jeff Atwood references Robert X. Cringely’s book Accidental Empires, which can be used to categorize companies. Startups are commandos. They are aggressively seeking to gain something with nothing to lose. Successful startups become infantry since the army is now so large that it requires structure and discipline. Truly large companies become police that are only interested in maintaining the status quo.

Microsoft’s sole purpose now is to protect its cash cow of Windows and Office. It’s leadership is probably terrified that something will kill the golden goose. What you fear you create. To quote Princess Leia:

The more you tighten your grip, Tarkin, the more star systems will slip through your fingers.

So this is why Ballmer describes the iPad as just another PC form factor because Microsoft just wants to sell more Windows licenses. Ballmer lacks the technical foundation to understand exactly why he’s running a dinosaur so doesn’t comprehend that the meteor is coming.

Google, Today’s Mammals Are Tomorrow’s Dinosaurs

Google has a similar problem to Microsoft in that it has a couple of core, hugely successful products (being the combined search and advertising business) and everything else has either failed, been lacklustre or is yet to bear fruit.

Of course there are successful ancillary products like GMail, Google Maps, Google Docs and so on. Many of these are completely dominant in their space but all of them merely enhance the search and advertising business or simply don’t generate a considerable amount of revenue.

To quote Google’s Revenue Breakdown Shows It Controls Even More Advertising Than You’d Think:

Google revenues last year were $23.6 billion, 99 percent of which came from advertising, the company said. Google doesn’t break down which of its services earn what, and it still hasn’t disclosed share breakdowns from content on YouTube or mobile devices.

Now I say this not to liken Google’s non-search forays as failures (as Microsoft’s forays beyond Windows and Office are almost entirely failures). Google has a different business model. Google is all about people using the Web. The more they use the Web, the more money Google makes. All their successful non-search products share this theme: they lower the cost of or increase the amount or effectiveness of using the Web.

Google is an unusual company in many respects. Larry and Sergey completely nailed search and will go down in history as tech pioneers for doing so. They have created a company that is largely engineering-driven, which shares a couple of parallels to the Microsoft of old except that early Microsoft was all about embracing the external developer community.

That can enable the company to focus on products rather than attempting to quantify programmer performance (a far more typical scenario) but it can have a downside too, which is the whole point of this post.

Despite it’s massive engineering talent, there are areas where Google has been unable to make any headway, namely in the much-hyped social space. Google Wave’s demise foreshadows the increasingly likely (re)entry into the social space, popularly referred to as Google Me.

But the engineering way of thinking left unchecked can have some serious downsides. Consider an example.

Sudoku

We programmers think differently to “normal” people. While we share a lot in common with practitioners of other scientific and engineering disciplines I think it’s fair to say that part of the programmer psyche is still unique. Many programmers don’t realize this and it’s hard to explain to “outsiders” but I shall try with an example.

Sudoku is a number-placement puzzle where the player tries to figure out how to place numbers in a partly filled grid such that certain rules are followed regarding the allowable repetition patterns of those numbers. Puzzles vary in difficulty based on how few numbers they reveal. A given puzzle has but one solution.

Many people enjoy this as a pastime. When first faced with Sudoku I like many programmers reacted in a very different way. I learnt the rules, devised a program that solved any given puzzle, satisfied myself it was correct and then forgot about Sudoku having never done one of these puzzles by hand, which basically defeats the whole point.

Non-programmers will probably find this bizarre. Programmers will almost certainly nod understanding. The point here is that programming trains you to find general solutions. So you can solve one Sudoku puzzle but what would be the point? There are billions of others unsolved! In terms of time investment, you’re better off solving all of them at once!

Google Wave

Google Wave was presented with much fanfare and hype at Google IO 2009. Spearheaded by the quite brilliant Rasmussens (who brought us Google Maps), Wave was touted as one communication platform to rule them all.

Wave was designed to be low level such that any messaging paradigm could be implemented in terms of Wave. Wave could handle a version of email, Twitteresque micro-blogging, IM, even bug tracking and so on. All of these things can be implemented on top of Google Wave.

Or at least that was the theory.

See the pattern here? Much like the Sudoku problem, Google was seeking a general solution to a set of problems.

So while this may be a great engineering feat, it ignored a fairly basic problem: what is the use case for Google Wave? Who will use it? Why? for what?

Those questions are perhaps unfair because many startup ventures begin with an idea and very little idea of what the end will look like. Or at least that end will change several times (dare I utter the overused “pivot” buzzword?). So you can argue that you create a platform and wait for others to find a use for it.

Clearly they didn’t.

Perhaps a better approach is to ask: what is the pain point? What problem is Google Wave solving?

The programmer in me understand the desire to create one platform that can do everything. It’s an alluring siren but don’t lose sight of the rocks drawing ever closer.

To quote Joel Spolsky from How Microsoft Lost the API War:

By the way, for those of you who follow the arcane but politically-charged world of blog syndication feed formats, you can see the same thing happening over there. RSS became fragmented with several different versions, inaccurate specs and lots of political fighting, and the attempt to clean everything up by creating yet another format called Atom has resulted in several different versions of RSS plus one version of Atom, inaccurate specs and lots of political fighting. When you try to unify two opposing forces by creating a third alternative, you just end up with three opposing forces. You haven't unified anything and you haven't really fixed anything.

To paraphrase, Wave didn’t unite anything. In fact it did nothing other than create yet another way we could communicate.

As much as people complain about email, everyone understands the model. A user who gets on the internet for the first time quickly understand the concept of sending someone “electronic mail”. It parallels something with which they are familiar from real life.

Apple and the Click Wheel

Whatever the future holds for Apple, it (or perhaps more accurately, Steve Jobs) will be remembered as one of the most influential tech figures of the computing era to date. Apple popularized and revolutionized digital content distribution, has a brand synonymous with mobile digital music playback and singlehandedly restructured the telecommunications industry.

The first iPod had a speaker behind the click wheel. Its only function was to make clicking noises to give to give the user auditory feedback. Such a feature makes absolutely zero engineering sense. Can you see Microsoft doing this? I can’t. I can’t even see Google doing it.

There is no other company that completely “gets it” in terms of consumer user experience. No one is close. And before you decry Apple for its lack of Flash, the walled app garden, lack of “freedom” (whatever that means today), etc… nobody cares. That includes me.

The point here is that Apple isn’t run by engineers. Apple is completely focused on product and user experience. Anything can be sacrificed to that end. If Apple were run like Microsoft, the iPad would be running some variant of MacOS X. But it doesn’t. Why? Because it makes no sense to put a full desktop OS based on a pixel-perfect pointer interface (ie the mouse) onto a low-power touchscreen device. Apple isn’t concerned about selling OS licenses. They’re focused on the product.

Conclusion

I’m not against programmers running companies. Not at all. Google is a great example of what can be achieved when you give talented engineers the freedom to innovate. Microsoft suffers I believe because its leader has no understanding of how software developments works and how programmers think.

But both extremes can have a downside. This week it killed Google Wave. It’s food for thought as Google ramps up it’s next stab at social.

My Google Interview

In early June I was contacted by a Google recruiter to ask if I was interested in applying for an engineering role at Google. She had found me based on my Stackoverflow profile. The position she was recruiting for was in Mountain View, California.

Reasons Not to Apply

I had never considered applying to Google for several reasons.

The first is that I’m not a US citizen and I don’t have a green card. This means I would need a visa. For most people, this means the employer needs to sponsor you for a H-1B visa, which anyone familiar with them will tell you is a huge pain. Most employers simply aren’t interested or at least that’s my perception. I guess the larger employers will have the scale and in-house counsel to make this viable.

Many US employers don’t actually know that it’s far, far easier to employ Australians than any other nationality. In fact it’s almost as easy as employing Canadians. Much like the TN (NAFTA) visa, Australians have the E-3 visa but most people don’t know this and employers tend to assume you need a H-1B. US Employers take note: the process for employing Australian nationals is easy.

The second reason I never considered applying to Google is perhaps another misperception on my part. My view was that Googlers seem to fit a particular profile. That profile is of being a graduate of a top school (think Stanford, UW or MIT), typically in their mid to late 20s. That’s not to say all fit this profile but your path is certainly a lot easier if this is you. Again, I’m not claiming this is the case but it certainly was my perception.

The third reason is that Google uses C++. They also use Python, Java and JavaScript but C++ is my particular point of contention. It’s been nearly a decade since I’ve used C++ in anger. Personally I consider it a horrible language. I will go so far as to call it an abomination.

Others have argued this far better than I could, most notably Linus Torvalds, addressing it in 2007 and 2010. Suffice it to say I find C to be a far more elegant and easy-to-understand language for low-level programming.

You could argue that Java and Python can be used by many (most?) Googlers and it’s a reasonable position but one that I don’t think is correct based on my limited experience. More on this later.

Reasons to Apply

The timing of being contacted was somewhat strange and somewhat timely.

In April, a story blew up on Hacker News, proggit and elsewhere concerning an employee leaving Mahalo for Yahoo. Jason Calacanis (CEO of Mahalo) said some perhaps rash things, which the employee foolishly posted to his blog (soon thereafter taken down but you can’t put the genie back in the bottle). More here.

Jason tweeted about this. Entrepreneur-turned-VC Mark Suster chimed in, most notably with Never Hire Job Hoppers. Never. They Make Terrible Employees. This predictably caused an uproar in all the usual places.

My view is Mark makes some good points but it’s too hardline. Let me give you some examples of why.

I started out doing contract (short term) work because that’s all I could get. This all began with doing Perl CGI programming for an ISP that I was a telephone support person for and quickly transitioned into doing full-time programming (elsewhere). This was in part due to the fact that I was studying for my undergraduate degree part-time (so didn’t qualify for graduate programs), partly due to my location (most Australian companies are headquartered in Sydney in Melbourne so programming employment opportunities are disproportionately low in population terms in Perth) and partly due to timing (the early to mid-90s were a still a post-recession period).

Little did I know that this would brand me to some extent a contractor for life. In many organization contractors are viewed as some combination of second-class citizen (eg I worked at one place that didn’t give internet access to contractors), necessary evil and disloyal mercenary. Most importantly, they are expendable and first to go in tough times. This last part of it I’m fine with because I viewed it as the ultimate meritocracy: you’d stay employed as long as you were valuable.

In 2001 I moved to London, England and found this anti-contractor sentiment to be even more prevalent, which surprised me no end. So the cycle continued.

Stays at various companies varied from 3 months to 3 years (on two occasions; one ending when the company fired all contractors and half the salaried staff due to financial woes and the other ending when the company was acquired and it became clear the software’s future was limited).

But in this time I’ve been screwed over more times than I can recount.

I’ve had a client threaten to sue and terminate a contract leaving months unpaid simply as a tactic to avoid paying. I’ve been denied payments by a recruitment agent I was entitled to simply because it was too expensive and time consuming (for me) to pursue legal action. I’ve been thrown under the bus in a political move by a manager who had a project going south and was looking for a scapegoat so his boss wouldn’t fire or replace him. I’ve had someone promise to pay only later realizing they were hiding behind an offshore shelter and never had any intention of paying. The list goes on.

All in all I’ve probably lost $50,000 to $100,000 over the years from this kind of thing so when Mark (or anyone) likens this to a lack of loyalty, it’s fair to say it pisses me off.

That’s not to say I’ve never done anything on reflection I probably shouldn’t have but hey we all make mistakes. I’ve never intentionally screwed anyone over this way.

Perhaps the most erudite and eloquent take on this issue came from one of my favourite bloggers, Giles Bowkett, in Job-Hopping:

Sorry, tangent. Point is, a job-hopper is like Larry King. Larry King's about to have his 8th divorce. He has all these women asking him to marry them, he marries them, it doesn't work out, he moves on to the next one. I don't know Larry King and I wish him the best, but to me, it sure looks like every time he gets married, he's settling. If that wasn't the way it was, he would have fought to keep at least oneof those marriages intact. And that's kinda what's going on if you're a job-hopper. It means the companies want you bad, but you could care less about the companies. I hate to side with the VCs here, but if they want you more than you want them, maybe it means you need to aim higher.

I’ve spent too many years writing bullshit business software, wading through pointless process (eg spending a day in meetings to remove a comma; no I’m not kidding) and doing other brain dead nonsense. I’m tired of it. I’ve had enough. I’ve reached the point that I don’t want to do it anymore. It’s time to make a change even though I’m not exactly sure what that change is yet. It’s reached the point where I’d rather work for nothing on something remotely interesting than do this one more day.

So this had been percolating in my brain. To bring this back to Google, that’s why I say the timing was strange. The project I’d been working on (which was turning into another that was running out of money) was on hiatus and I was looking to do something different.

So I said yes I would apply.

Telephone Interviews

Let me first say it was hard to speak to them on the phone. By some quirk of geography, Mountain View, California and Perth, Western Australia and 16 hours apart (Perth is 16 hours ahead), which really limited times when they were at work and I was awake.

The recruiter told me there would be a couple of phone interviews over the next couple of weeks. I had one the next week that went through a couple of questions.

Note: for this and the on-site interviews I’m not going to reveal the exact content. For the on-sites I signed an NDA but more importantly (at least to me) I said I wouldn’t.

All I’ll say is that Google’s position seems to be that they want to assess your knowledge of computer science fundamentals and problem solving ability. If you Google “google interview questions” you’ll find the kind of problem that require some kind of recursive or other divide-and-conquer type technique. This makes sense and this theme gelled with my experience overall.

I got through two questions even though I stumbled somewhat through the second. I was trying to visualize the solution and that was the problem. As soon as I took a piece of paper and drew a diagram the solution was obvious. That’s just me: I like to whiteboard/diagram rather than trying to mentally visualize.

It took until next week to hear back. I’d been expecting a second phone screen. As it turned out they wanted to arrange a series of on-site interviews at their office in Sydney. I took this as a good sign: apparently I didn’t require a second idiot test.

Preparation

My field guide for this was Steve Yegge’s Get that job at Google. It’s good advice, not just for Google but for also being a well-rounded programmer.

Like I said, I’ve spent a lot of time writing bullshit business software. Frankly my day-to-day usage of graphs, dynamic programming and balanced trees is, well, almost nonexistent. So it was time to brush up. And brush up I did.

I did some thinking about Steve’s post being over two years old. The one thing missing from this is language theory (compilers, grammars, lexing/parsing and so on). Since Steve’s post Google has added:

  • V8: a Javascript engine;
  • Go: programming language; and
  • Unladen Swallow: an LLVM port of CPython aimed at hugely speeding it up.

So I concluded language theory is important but luckily I’d been doing that anyway as part of my (sadly stalled) Markdown project.

On-Site Interviews

A couple of weeks later I was flown to Sydney for 4 hours of back-to-back interviews. It’s been some years since I’ve been in Sydney so this was enjoyable anyway.

I am primarily a Java developer, more by circumstance than planning. I transitioned to Java in the late 90s from doing Perl, C and C++. Since then I’ve dabbled in many languages including Javascript, C#, Haskell, Ruby, Python, PHP and others.

The first problem I encountered was that at least two of my interviewers had only passing or no familiarity with Java. This made it particularly difficult as some concepts are unique to one language.

But everyone knew C++. I’ve read about this before. This combined with my own experience now leads me to believe that C++ isn’t optional for any Google applicant. Not because you need to use it to work there. I have no direct experience of this. But because of “interviewer lottery”. Some at Google (it seems) do nothing but C++. You might be interviewed by one of these people.

I had to write several code segments on a whiteboard. This I expected and was fine with. I realize this is necessary (see Why Can't Programmers.. Program? and The Non-Programming Programmer) and have no problem with it but it depends on what kind of problem you ask. Simple is usually best.

Two of the problems I had were extremely finnicky to solve. In one I think the interviewer was understanding and simply wanted to determine if I understand the relevant contract and I could see what the issues were more than coding a completely correct solution (which I appreciated). Another interview got caught short before an efficient solution could be developed and I really don’t think it’s the kind of problem that lends itself to writing a code solution in 40 minute. By this I mean I believe it would be more valuable to speak about the algorithms and problems involved as the code for an efficient solution would be quite complex.

The theme of problem solving and analyzing thought processes remained constant throughout.

For some reason I had expected there would be a lunch break (10-2). There wasn’t. By the end I was mentally exhausted, 3pm (it ran over time) and I hadn’t eaten since the day before. The next day I returned home.

Aftermath

I was told the results would be reviewed and I would hear from them in two weeks.

Two weeks rolled around. A time for a phone call was organized. That phone call was rather short: “strong but not strong enough for this particular position” was the crux of it.

I’d be lying if I said I wasn’t disappointed. I’d actually gone into the whole thing fairly neutral to begin with in that I wasn’t mad keen on working for Google but was open to the possibility. But as time went on and I spoke to a few Googlers, I got more excited about it. I started to hope it would work out.

I actually thought the on-sites went quite well and I did expect it to progress to the next level (whatever that would’ve been I’m not sure) but my hopes were dashed.

The recruiter told me no specific feedback would be given (since I asked on what the weaknesses were). The one outright bizarre thing I was told was I was “too Microsoft-centric” (direct quote). Apart from dabbling in C# I haven’t programmed for Windows since Visual Studio 6 in 2000. It made me wonder if they were looking at the right candidate. And if I’m too Microsoft-centric, what is Jon Skeet?

Feedback

I have obviously been supervised in the workplace, I’ve supervised other developers, I’ve done recruiting for companies and obviously been recruited. An important theme through all of these is the need for feedback. People need to know what they’re doing right and what they’re doing wrong.

In high school in Western Australia we have (had?) a thing called the TEE (Tertiary Entrance Exams). In year 12 (the last year of high school) you have a set of exames. Your coursework for the year and that exam result is added up with equal weighting (so 50-50), massaged through a bewildering array of scaling formulae and comes out as a score that is used for university admission. In Australia we don’t have the same system as the US of application essays, admissions committees looking at your extracurricular interests and so on. It’s all about that number.

Anyway to prepare for our TEE exams in all my subjects we went through old exam papers. For Maths I did the exam papers for the previous ten years. Our teacher would go through the problems and we could see what we did right and what we did wrong. This was incredibly valuable. Exams may test knowledge but taking exams is also a skill. If you’ve seen how to solve 50 problems and gone through the process of doing so you’re much more likely to be able to apply that knowledge to future problems. There are only so many ways you can state an integral calculus problem on an exam paper.

The Soviet Union dominated the chess world for much of the twentieth century. The Russian approach was to teach students Chess through old games.

When I got to university this all changed. You couldn’t keep the exam paper. Previous exam papers were never examined. Your exam was never given back to you so you could see what you got right and what you got wrong. The exam problems were never solved in class so you could see the right and wrong ways to go about it. This was probably so lecturers could reuse papers year after year rather than writing new papers but ultimately I considered it then (and still consider it) a failing.

Recently I did a Masters degree (in quantitative finance) and found it to be even worse: there were deliberate holes in your lecture notes about applying that knowledge to particular problems and you could bet those would be what shows up in the exam.

Why I consider this a failing is that if you’re truly meant to learn something you should concentrate on (and be told) what you don’t know, what you got wrong.

This is what I mean by feedback.

I don’t even know what position I was applying for. I don’t know what the requirements were. I don’t know what my perceived strengths and weaknesses were. So while I enjoyed much of the actual process it did require a significant time investment and the value proposition for me, as a candidate, is very low. I’ve been told that my application may be reevaluated in a year. The recruiter told me they’d like to keep me on file. I don’t know if this means anything but I’m assuming not simply because it’s the kind of “let’s still be friends” lip service companies tend to offer candidates.

One person asked if I would reapply. My inclination at this point is to say “no”. This might still be the disappointment talking (and I am disappointed) but a year is a long time. The timing worked out well this time around but the likelihood of it doing so a year from now are, well, unlikely. Taking a few days off to interview again for a low value proposition from the process may very well be hard to justify. At the risk of tooting my own horn, I tend to be free as I am now by choice rather than circumstance. It’s simply a question of finding something that doesn’t bore me shitless.

Conclusion

This has been a long post (now over 3,000 words), worthy of the best of Steve Yegge in length if not content.

My view of Google has changed. I don’t have the same view of Google being a Mecca for Ivy-shrouded twentysomethings. It does seem like they really are interested in talent in many forms.

But the process does seem to be a lottery to some extent. I don’t know what the selection criteria is but it wouldn’t surprise me if 4 (or even all 5) of your 5 interviewers need to give you a thumbs up for you to go on. This may represent the “anarchic” (even haphazard) culture of Google itself. I really don’t know.

I do get the impression that references from Googlers count for a lot.

I’ve read that Google receives some 1,500 CVs a day and they do on-site interviews for less than 1% of these so perhaps I should be flattered and should just take the experience for what it was. To some extent I do. I’m simply not sure I feel the need to repeat it next year.