Plain English Explanation of Big O Notation

I recently read A Beginners’ Guide to Big O Notation and while I appreciate such efforts, I don’t think it went far enough. I’m a huge fan of “plain English” explanations of, well, anything. Just look at the formal definition of Big O. The only people who can understand that already know what it means (and probably have a higher degree in mathematics and/or computer science).

On StackOverflow you often get comments like “you should do X because it’s O(2n) and Y is O(3n)”. Such statements originate from a basic misunderstanding of what Big O is and how to apply it. The material in this post is basically a rehash and expansion of what I've previously written on the subject.

What is Big O?

Big O notation seeks to describe the relative complexity of an algorithm by reducing it to its growth rate in terms of the key factor, as that factor tends towards infinity. For this reason, you will often hear the phrase asymptotic complexity. In doing so, all other factors are ignored. It is a relative representation of complexity.

What Isn’t Big O?

Big O isn’t a performance test of an algorithm. It is also notional or abstract in that it tends to ignore other factors. Sorting algorithm complexity, for example, is typically reduced to one key factor: the number of elements being sorted. This is fine, but it doesn’t take into account issues such as:

  • Memory Usage: one algorithm might use much more memory than another. Depending on the situation this could be anything from completely irrelevant to critical;
  • Cost of Comparison: it may be that comparing elements is really expensive, which can change any real-world comparison between algorithms;
  • Cost of Moving Elements: copying elements is typically cheap but this isn’t necessarily the case;
  • etc.

Arithmetic

The best example of Big-O I can think of is doing arithmetic. Take two numbers (123456 and 789012). The basic arithmetic operations we learnt in school were:

  • addition;
  • subtraction;
  • multiplication; and
  • division.

Each of these is an operation or a problem. A method of solving these is called an algorithm.

Addition is the simplest. You line the numbers up (to the right) and add the digits column by column, writing the last digit of each column's sum in the result. The 'tens' part of that sum is carried over to the next column.

Let's assume that adding two digits is the most expensive operation in this algorithm. It stands to reason that to add these two numbers together we have to add 6 columns of digits (and possibly carry a 7th). If we add two 100 digit numbers together we have to do 100 additions. If we add two 10,000 digit numbers we have to do 10,000 additions.

See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits. We call this O(n) or linear complexity. Some argue that this is in fact O(log n) or logarithmic complexity. Why? Because adding 10,000,000 to itself takes twice as long as adding 1,000 to itself as there are 8 digits instead of 4. But 10,000,000 is 10,000 times as large, so depending on your application it may be appropriate to define the problem in terms of the number of digits (i.e. order of magnitude) of the input. In other applications, the number itself may be the appropriate measure.
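If you prefer code to prose, here is a rough sketch (Python, purely for illustration; the function name and the operation counter are just mine) of the column-by-column addition described above, keeping a tally of the digit additions performed:

    def add_by_columns(a, b):
        """Add two non-negative integers the way we learnt in school,
        counting the column (digit) additions performed."""
        x, y = str(a), str(b)
        width = max(len(x), len(y))
        x, y = x.zfill(width), y.zfill(width)    # line the numbers up to the right
        result, carry, digit_additions = [], 0, 0
        for i in range(width - 1, -1, -1):       # work right to left, column by column
            column_sum = int(x[i]) + int(y[i]) + carry
            digit_additions += 1
            result.append(str(column_sum % 10))  # write the last digit of the column's sum
            carry = column_sum // 10             # carry the 'tens' part to the next column
        if carry:
            result.append(str(carry))
        return int("".join(reversed(result))), digit_additions

    print(add_by_columns(123456, 789012))        # (912468, 6): six columns, six additions

Feed it two 100-digit numbers and the count becomes 100; the work grows in step with the number of digits.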

Subtraction is similar (except you may need to borrow instead of carry).

Multiplication is different. You line the numbers up, take the first digit of the bottom number and multiply it in turn against each digit of the top number, then do the same for each remaining digit of the bottom number. So to multiply our two 6-digit numbers we must do 36 digit multiplications. We may need to do as many as 10 or 11 column adds to get the end result too.

If we have two 100-digit numbers we need to do 10,000 multiplications and 200 adds. For two one-million-digit numbers we need to do one trillion (10^12) multiplications and two million adds.

As the algorithm scales with n squared, this is O(n^2) or quadratic complexity. This is a good time to introduce another important concept:

We only care about the most significant portion of complexity.

The astute may have realized that we could express the number of operations as: n^2 + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for about 0.0002% of the total operations by that stage).
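The same sort of tally for long multiplication (again an illustrative Python sketch of the schoolbook method, not a sensible way to actually multiply) shows where the n^2 comes from:

    def long_multiply(a, b):
        """School-style long multiplication, counting the single-digit multiplications."""
        x, y = str(a), str(b)
        digit_multiplications, total = 0, 0
        for i, bottom in enumerate(reversed(y)):       # each digit of the bottom number...
            row = 0
            for j, top in enumerate(reversed(x)):      # ...against each digit of the top number
                row += int(bottom) * int(top) * 10 ** j
                digit_multiplications += 1
            total += row * 10 ** i                     # shift the row across and add it in
        return total, digit_multiplications

    print(long_multiply(123456, 789012))               # 36 digit multiplications for two 6-digit numbers

Two 100-digit numbers mean 10,000 digit multiplications, while the column adds grow only linearly, which is exactly why the 2n term gets dropped.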

The Telephone Book

The next best example I can think of is the telephone book, normally called the White Pages or similar, though the name varies from country to country. I'm talking about the one that lists people by surname and then initials or first name, possibly an address, and then telephone number.

Now if you were instructing a computer to look up the phone number for "John Smith", what would you do? Ignore the fact that you could guess how far in the S's start (let's assume you can't).

A typical implementation might be to open the book to the middle, take the 500,000th name and compare it to "Smith". If it happens to be "Smith, John", we just got real lucky. Far more likely is that "Smith, John" will be before or after that name. If it's after, we divide the last half of the phone book in half and repeat. If it's before, we divide the first half of the phone book in half and repeat. And so on.

This is called a binary search (also known as a bisection search) and is used every day in programming whether you realize it or not.

So in a phone book of a million names you can actually find any name by doing this at most 20 times. In comparing search algorithms, the comparison is the operation we count, and the number of names in the book is our 'n'.

For a phone book of 3 names it takes 2 comparisons (at most).
For 7 it takes at most 3.
For 15 it takes 4.
...
For 1,000,000 it takes at most 20.

That is staggeringly good isn't it?

In Big-O terms this is O(log n) or logarithmic complexity. Now the logarithm in question could be ln (base e), log10, log2 or some other base. It doesn't matter; it's still O(log n), just like O(2n^2) and O(100n^2) are both still O(n^2).
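Here's that phone-book search as a small Python sketch (the list of made-up names is mine, purely for illustration), counting comparisons as it goes:

    def phone_book_lookup(names, target):
        """Binary search over an alphabetically sorted list, counting comparisons."""
        low, high = 0, len(names) - 1
        comparisons = 0
        while low <= high:
            middle = (low + high) // 2       # open the book to the middle
            comparisons += 1
            if names[middle] == target:
                return middle, comparisons
            elif names[middle] < target:     # the name we want is in the later half
                low = middle + 1
            else:                            # the name we want is in the earlier half
                high = middle - 1
        return -1, comparisons               # not listed

    book = ["Name%07d" % i for i in range(1_000_000)]   # a million fake, already sorted names
    print(phone_book_lookup(book, "Name0876543"))       # found within at most 20 comparisons

Doubling the size of the book adds just one more comparison.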

It's worthwhile at this point to explain that Big O can be used to determine three cases with an algorithm:

  • Best Case: In the telephone book search, the best case is that we find the name in one comparison. This is O(1) or constant complexity;
  • Expected Case: As discussed above this is O(log n); and
  • Worst Case: This is also O(log n).

Normally we don't care about the best case. We're interested in the expected and worst case. Sometimes one or the other of these will be more important.

Back to the telephone book.

What if you have a phone number and want to find a name? The police have a reverse phone book but such lookups are denied to the general public. Or are they? Technically you can reverse lookup a number in an ordinary phone book. How?

You start at the first name and compare the number. If it's a match, great, if not, you move on to the next. You have to do it this way because the phone book is unordered (by phone number anyway).

So to find a name:

  • Best Case: O(1);
  • Expected Case: O(n) (around 500,000 comparisons on average); and
  • Worst Case: O(n) (all 1,000,000 comparisons).
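As a sketch (again illustrative Python with made-up entries), the reverse lookup has no choice but to walk the book entry by entry:

    def reverse_lookup(entries, number):
        """Linear scan: the book is ordered by name, not number, so check every entry."""
        checks = 0
        for name, phone in entries:
            checks += 1
            if phone == number:
                return name, checks
        return None, checks

    entries = [("Aardvark, A", "555-0001"), ("Smith, John", "555-7821"), ("Zwane, Z", "555-9999")]
    print(reverse_lookup(entries, "555-7821"))   # ('Smith, John', 2)

Double the book and you double the expected number of checks, which is the hallmark of O(n).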

The Travelling Salesman

This is quite a famous problem in computer science and deserves a mention. In this problem you have N towns. Each of those towns is linked to 1 or more other towns by a road of a certain distance. The Travelling Salesman problem is to find the shortest tour that visits every town.

Sounds simple? Think again.

If you have 3 towns A, B and C with roads between all pairs then you could go:

A -> B -> C 
A -> C -> B 
B -> C -> A 
B -> A -> C 
C -> A -> B 
C -> B -> A

Well actually there are fewer than that because some of these are equivalent (A -> B -> C and C -> B -> A are equivalent, for example, because they use the same roads, just in reverse).

In actuality there are 3 possibilities.

Take this to 4 towns and you have 12 possibilities. With 5 it's 60. 6 becomes 360.

These counts are a function of a mathematical operation called the factorial (each is half the factorial of the number of towns, because every route and its reverse are equivalent). Basically:

5! = 5 * 4 * 3 * 2 * 1 = 120
6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
7! = 7 * 6 * 5 * 4 * 3 * 2 * 1 = 5040
...
25! = 25 * 24 * ... * 2 * 1 = 15,511,210,043,330,985,984,000,000
...
50! = 50 * 49 * ... * 2 * 1 = 3.04140932... × 10^64

So far, no efficient way of solving the Travelling Salesman problem exactly is known. The straightforward approach is brute force, and such a technique has O(n!) complexity.

By the time you get to 200 towns there isn't enough time left in the universe to solve the problem with traditional computers.
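To make the blow-up tangible, here is a brute-force sketch (illustrative Python; the three-town distance table is made up, and it assumes a road between every pair of towns):

    from itertools import permutations
    from math import factorial

    def shortest_tour(towns, distance):
        """Brute-force Travelling Salesman: try every ordering of the towns and keep the
        shortest. `distance` maps an unordered pair of towns to the road length between them."""
        best_route, best_length = None, float("inf")
        for route in permutations(towns):            # n! orderings (half of them are reverses)
            length = sum(distance[frozenset(pair)] for pair in zip(route, route[1:]))
            if length < best_length:
                best_route, best_length = route, length
        return best_route, best_length

    distance = {frozenset(p): d for p, d in [(("A", "B"), 5), (("B", "C"), 3), (("A", "C"), 7)]}
    print(shortest_tour(("A", "B", "C"), distance))  # (('A', 'B', 'C'), 8)
    print(factorial(25))                             # the number of orderings for just 25 towns

Each extra town multiplies the number of routes to check by roughly the number of towns so far.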

Something to think about.

Polynomial Time

Another point I wanted to make quick mention of is that any algorithm that has a complexity of O(n^k) for any constant k is said to have polynomial complexity or is solvable in polynomial time.

Problems with polynomial complexity are the ones traditional computers can solve in a practical amount of time. Certain things in the world rely on the fact that, as far as we know, some problems can't be solved that fast. Public Key Cryptography is a prime example: it is believed to be computationally hard to find the two prime factors of a very large number. If it weren't, we couldn't use the public key systems we use.

Big Greek Letters

Big O is often misused. Big O or Big Oh is actually short for Big Omicron. It represents the upper bound of asymptotic complexity. So if an algorithm is O(n log n), there exists a constant c such that (for large enough n) the number of operations is bounded above by cn log n.

Θ(n log n) (Big Theta) is a tighter bound than that. It means there exist two constants c1 and c2 such that c1n log n < f(n) < c2n log n (again for large enough n), where f(n) is the number of operations.

Ω(n log n) (Big Omega) says that the algorithm has a lower bound of cn log n for some constant c.
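If you want the textbook versions (and feel free to skip these, at the risk of breaking my own plain-English rule), the three statements above are usually written along these lines, where f(n) is the number of operations and n0 is some cut-off beyond which the inequality holds:

    \begin{aligned}
    f(n) = O(g(n))      &\iff \exists\, c > 0,\, n_0 \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0 \\
    f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\, n_0 \text{ such that } f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0 \\
    f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))
    \end{aligned}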

There are others but these are the most common and Big O is the most common of all. Such a distinction is typically unimportant but it is worth noting. The correct notation is the correct notation, after all.

Determinism

Algorithms can also be classified as being either deterministic or probabilistic. It’s important to understand the difference. Sometimes requirements or constraints may dictate the choice of one over the other even if the expected case is worse. You should be able to classify an algorithm as one or the other.

A good example of this is comparing files. Say you have two files A and B and you want to determine if they are the same. The simple implementation for this is:

  1. If the sizes are different, the files are different; else
  2. Compare each file byte-for-byte. If two different bytes are found, the files are different; else
  3. The files are the same.
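In code, a minimal sketch of the deterministic version might look like this (Python, just for illustration; the chunk size is arbitrary):

    import os

    def files_identical(path_a, path_b, chunk_size=64 * 1024):
        """Deterministic comparison: different sizes mean different files,
        otherwise compare the contents chunk by chunk."""
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False
        with open(path_a, "rb") as a, open(path_b, "rb") as b:
            while True:
                chunk_a, chunk_b = a.read(chunk_size), b.read(chunk_size)
                if chunk_a != chunk_b:
                    return False
                if not chunk_a:          # both files exhausted at the same point: identical
                    return True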

This is a deterministic algorithm because the probability of a false positive (the algorithm saying the files are the same when they aren’t) or a false negative (saying they are different when they are in fact the same) is 0 in both cases.

For various reasons, however, it might be impractical or undesirable to implement the algorithm this way. Many file comparisons may be required, making the operation potentially very expensive for large files. Also, the files might be remote from each other, and it might be impractical to send a complete copy just so the remote system can see if it has changed.

A more common approach is to use a hash function. A hash function basically just converts a large piece of data into a smaller piece of data (called a hash), usually a 32-128 bit integer. A good hash function will distribute values in the new (smaller) data range as evenly as possible.

A common hash function is MD5, which generates a 128-bit hash. Let’s say files A and B were on different servers. One server could send an MD5 hash of its file to the other, which could compare it to the MD5 hash of its own copy. If they’re different, the files are different. If they’re the same, the files are highly likely to be the same.

An MD5 hash comparison is a probabilistic comparison algorithm for this reason.
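A sketch of the probabilistic version (Python's standard hashlib module provides MD5; everything else here is illustrative):

    import hashlib

    def md5_of_file(path, chunk_size=64 * 1024):
        """Return the MD5 hash of a file's contents as a hex string."""
        digest = hashlib.md5()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
        return digest.hexdigest()

    def probably_identical(path_a, path_b):
        """Different hashes: definitely different. Same hash: almost certainly the same,
        but collisions are possible, so this is a probabilistic answer."""
        return md5_of_file(path_a) == md5_of_file(path_b)

In practice only the 128-bit hash has to cross the network, not the file itself, which is the whole attraction.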

And before you say that the chance is so remote it’ll never happen, think again. A malicious exploit has been demonstrated in which two different files with the same MD5 hash were deliberately generated.

Hash algorithms such as this, which are only secure for as long as brute force is the best attack, age relatively quickly. Where once MD5 was considered safe, creating two messages with the same MD5 hash is now feasible (in a matter of days on not unreasonable hardware), so the more secure SHA-1 algorithm has largely replaced it.

Conclusion

Anyway, that's it for my (hopefully plain English) explanation of Big O. I intend to follow this up with applications to some common scenarios in weeks to come.

24 comments:

Anonymous said...

Thanks William! This is by far the best explanation of Big O in plain english.

Trey said...

You wrote "highly unlikely" when you meant "highly likely".

Anonymous said...

>> See the pattern? The complexity (being the number of operations) is directly proportional to the number of digits. We call this O(n) or linear complexity.

If an algorithm is directly proportional to *the number of digits* in the problem size, then that algorithm has logarithmic complexity.

An O(n) algorithm would be something like a linear search through the phone book, or a (very) naive primality test, but not simple addition.

Anonymous said...

regarding the MD5 hash at the end of your post, you say:

" If they’re different, the files are different. If they’re the same, the files are highly unlikely to be the same."

You're saying that if the hashes are the same then they are *unlikely* to be the same as the full resolution file that generated the hash?

I can't quite grok that, could you have another go at explaining it, please?

If the hashes match and that means that they are unlikely to be the same as the originals, and if the hashes don't match you're saying that they are not the same too; so what's the point of hashes if they can't distinguish between same/non-same files?

I'm sure I'm missing something, it's a great post, thank you for writing it.

Anonymous said...

"Public Key Cryptography is a prime example."
pun intended? ;)

Anonymous said...

@Anonymous commenter #2

>> If an algorithm is directly proportional to *the number of digits* in the problem size, then that algorithm has logarithmic complexity.

I am pretty sure the author was correct when he said that his human addition algorithm is linear complexity and that you are wrong when you say it is logarithmic. Logarithmic would mean having more digits in the numbers you are adding would yield smaller changes in the number of operations required. How can you add two numbers without adding each digit? And why would the positive slope of the number of additions you need to make fall as the two numbers got bigger? Maybe you are confused about the details of the addition algorithm.

Anonymous said...

Also, one more explanation. In the function y = Ax, where A is a constant, y and x are "directly proportional". In the function y = log x, their relationship is logarithmic. So saying that an algorithm which is directly proportional has logarithmic complexity is false.

Anonymous said...

>> If an algorithm is directly proportional to *the number of digits* in the problem size, then that algorithm has logarithmic complexity.

Put more accurately, the algorithm has logarithmic complexity relative to the size of the number. You are correct in this.

Unfortunately, computer scientists (largely) don't care about complexity relative to the size of the number. When dealing with numbers as input to a problem (as in the case of addition), computer scientists tend to care about complexity relative to the size of the *encoding* of the number (that is, its length when you write it down).

So: In a sense you're both right - Big Oh can be used to measure the complexity both ways. Normally, though, we talk about the one used in the article.

Xerxes said...

Excellent article. Thanks for taking the time :)

Dan said...

@Anonymous #3 -

See Trey's comment: "unlikely" in that section should read "likely". If you have the hashes of two files and the hashes are different, then the files are different. If the hashes are the same, then they're very likely the same - although they could be different (by the Pigeonhole Principle) since people have been able to generate collisions (which is the link provided).

>> “you should do X because it’s O(2n) and Y is O(3n)”

This always pisses me off, since it's people who should know better (since they can presumably determine that an algorithm is O(2n)). Big-O, as you say, deals with the algorithm when the factor in question goes to infinity. In this case, the "2" vs. "3" isn't going to be an issue. Big-O is meant to compare the order of an algorithm's performance, as opposed to the details.

The rule of thumb, when I was taught Big-O, was to use 1000000000 for n - at which point it becomes apparent what the dominant factor is. Big-O isn't meant for comparing 2000000000 [O(2n)] to 3000000000 [O(3n)] - it's meant for comparing 2000000000 [O(2n)] to 1000000000000000000 [O(n^2)]. The other one we were taught is that O(n) is really O(kn), where k is a constant that we suppress since it doesn't matter when making comparisons of this nature.

Jass said...

A very good explanation of Big O in plain english!

P.S: Been following your posts from a couple months now... "I'm lovin it" ;)

arachnode.net said...

Great article! Could you go into more detail about calculating the 'log'...

Anonymous said...

That is an excellent article! I wish I'd read it yesterday, might not have screwed up my job interview this afternoon so badly :(

Thanks for writing it.

NonComplacentLabRat said...

I'm a self-taught programmer who's still having some trouble with what O(log n) means. I understand logarithms (e.g. log base 2 of 1024 = 10), but what does the n mean in O(log n)? And what does the O mean? What is the base? 2? I think if the base is 2 then O means "the number of times you need to raise n to the power of 2 to get O?".

But I also think I'm totally wrong and I'm thinking too logically. Does O(log n) represent a concept as opposed to a formula? For instance does O(log n) mean that the worst case scenario is n times? and O(log n squared) mean the worst case scenario is n squared?

Please help!

William Shields said...

Searching for a name in the telephone book (example above) is a classic O(log n) algorithm.

1 name = 1 comparison
2-3 names = 2 comparisons
4-7 names = 3 comparisons
8-15 names = 4 comparisons
16-31 names = 5 comparisons
...
1,000,000 names = 20 comparisons

Logarithms are just the inverse of exponents ("to the power of").

2^20 = 1,048,576
log2 1,048,576 = 20

or

loga(a^n) = n

n in any O(...) expression is the meaningful quantifier and varies depending on the algorithm. For sorting it is typically the number of elements. For multiplication it is the number of digits.

Base isn't specified with O(log n). Whether it's log2 n, log10 n or log100 n is irrelevant, just like it's irrelevant whether it's 2n, 10n or 100n. Mathematically there is a linear relationship between loga n and logb n.

Big O generally refers to expected or worst cases.

Anonymous said...

"So the Big-O of the Travelling Salesman problem is O(n!) or factorial or combinatorial complexity."
I'm not clear what this means. Surely problems don't have complexities, only algorithms?

Also you give "Public Key Cryptography" is given as an algorithm of polynomial complexity. I thought it was unknown what the complexity of integer factorisation was.

Anonymous said...

Well, it's a good explanation but I would like to know what Big-O actually does. I mean, say we have 2 expressions, O(n^2) and O(n^2 log n)... how do we compare them and know which is better?

NonComplacentLabRat said...

Thanks William Shields! I just revisited your reply from 3 months ago and re-read the article. I understand now that the O doesn't stand for anything; it just tells you that this is notation called Big-O. And the Big-O notation means not to take the notation literally, but as a concept to describe the complexity of a problem. So, for any number n, O(log n^2) is more complex than O(log n) because there are more digits in the problem, thus increasing its complexity. And the base of the log doesn't matter because there is a linear relationship among the bases of logarithms that is insignificant when it comes to the complexity of the problem

similar to when you wrote "The astute may have realized that we could express the number of operations as: n2 + 2n. But as you saw from our example with two numbers of a million digits apiece, the second term (2n) becomes insignificant (accounting for 0.00002% of the total operations by that stage)."

The base portion is the insignificant portion (I think anyway).

Thanks for the "english" explanation!

andren.lars said...

Thank you William! This is a very concise yet clear explanation of Big-O and the other algorithm buzzwords.

Thanks for taking time to write this, I am sure many people will find this helpful.

Anonymous said...

Wow this was amazing - you should consider writing a book: Computer Science in PLAIN ENGLISH!

Gmust said...

Great article, commendable efforts too... although your treatment of the concept is largely incomplete. Try effecting some of the corrections pointed out and give thorough dissection of all concepts in the original article.

Anonymous said...

Please can someone tell me: is there any function for which Big O and Omega are the same? Do any such functions f(x) and g(x) exist for which both are the same? Please mail me ansary_90@yahoo.com. Thanks all

ruwan Indika Prasanna said...

cheers !!! ...good one

Anonymous said...

I could not understand this concept in 4 years of college. I read this article and it makes sense overnight. Awesome article. Good job and thanks a million.
