The Hysteria over URL Shortening

If you keep up with current events on the internet at large and the Web in particular, it's easy to become convinced that the World (Wide Web) is coming to an end. Both the good and the bad are exaggerated. Just as Opera hasn't reinvented the Web, URL shortening isn't the end.

For those who are (somehow) unfamiliar with URL shortening, it is the process of taking a long URL and turning it into a short one. Certain sites exist to do this, most notably TinyURL, but there are many others. So instead of http://www.cforcoding.com/2009/06/ibatis-tutorial-dynamic-sql.html I can have http://tinyurl.com/klvwuc.

The mechanism for doing this is simple: your browser goes to the shortened URL. That site simply does an HTTP redirect back to the original URL.
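
To make that concrete, here's a minimal sketch of such a service in Python. The mapping and names are mine, purely for illustration; a real shortener would generate the short codes and keep them in a database.

    # Minimal sketch of a URL shortening service using only the standard library.
    # The LINKS mapping is hard-coded for illustration; a real service would
    # generate short codes and store them in a database.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    LINKS = {
        "/klvwuc": "http://www.cforcoding.com/2009/06/ibatis-tutorial-dynamic-sql.html",
    }

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            target = LINKS.get(self.path)
            if target is None:
                self.send_error(404, "Unknown short URL")
                return
            # The whole trick: answer with a redirect pointing at the original URL.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Redirector).serve_forever()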

Although URL shortening predates Twitter, Twitter is the most common reason to use it. Tweets are limited to 140 characters, so every character counts. It's not the only reason, however. For example, StackOverflow can't autolink URLs containing parentheses, so shortening them is the easy workaround.

The main criticism of URL shortening is that it exacerbates link rot. Not only can the target change, but the URL shortening service itself can go offline, and I guarantee you that will happen.

The other major criticism of URL shortening is that it will break the Web by breaking search engines. To understand this argument, you simply need a high-level understanding of how search engines such as Google work. They look at the content of pages and create indexes. They crawl the links on those pages to find other pages. Based on the content of your page, which pages link to it, how long your domain has been registered and a number of other factors, Google assigns your page an authority (commonly referred to as "PageRank"). Google uses that PageRank as a key component of ordering search results.
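
To get a rough feel for the link-based part of that, here's a toy PageRank calculation: a simplified power iteration over a made-up link graph, not anything resembling Google's real code.

    # Toy PageRank: repeatedly let each page share its authority with the pages
    # it links to. A simplified illustration, not Google's actual algorithm.
    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        rank = {page: 1.0 / len(pages) for page in pages}
        for _ in range(iterations):
            new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
            for page, outgoing in links.items():
                if not outgoing:
                    continue
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            rank = new_rank
        return rank

    # "c" is linked to by both "a" and "b", so it ends up with the most authority.
    print(pagerank({"a": ["c"], "b": ["c"], "c": ["a"]}))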

Some argue that if URL shortening reaches some critical mass, the links between pages will devolve to the point where search engines can't rank pages.

This may sound familiar: some once feared that search engines (meaning "Google") were so good that no one would link to pages anymore, and that Google would become a victim of its own success (for the same reason: a lack of links).

Like I said, people love to be dread merchants. I guess it makes for a good headline. But, as just one example, the history of predicting Google's demise has been hysterically funny to date.

The latest to weigh in on this debate is Jeff Atwood, largely quoting Joshua Schachter, who decries the extra layer of indirection as well as doom-and-gloom scenarios such as a shortener's database being erased or its links being monetized.

While I love a good "Chicken Little" story as much as the next reader, one has to point out that none of this matters.

While there are some valid criticisms of URL shortening, such as the argument that it should be built into Twitter itself (to avoid the extra point of failure), the rest is largely just a storm in a teacup.

To illustrate my point, if you google "ibatis tutorial" and your results are like mine, you'll see two links on the first page pointing to posts I've written. One of them is a link to a DZone submission of one of those articles.

The astute reader will be able to view source on that DZone page and find not a single direct reference to my blog. So why does Google rank that page similarly? For those unfamiliar with DZone, it is a social news site for submitting programming-related articles: blog posts, tutorials, news and the like. Clicking on one of its links takes you back to DZone, which then redirects you to the target page. They do this to measure click-through rate, which is exactly the same technique used in Web advertising and URL shortening.
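
The count-then-redirect pattern looks something like this (my own illustration, not DZone's actual code), reusing the Redirector sketch from earlier:

    # Sketch of the count-then-redirect pattern (my illustration, not DZone's code).
    # Compared with the plain Redirector above, the only difference is that a
    # counter is bumped before the redirect is sent.
    from collections import Counter

    clicks = Counter()

    def record_and_redirect(handler, short_path, target):
        clicks[short_path] += 1            # measure the click-through
        handler.send_response(302)         # then send the visitor (or crawler) on
        handler.send_header("Location", target)
        handler.end_headers()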

The salient points to take from this observation are that Google is smart enough to follow a redirect and that, I guarantee you, Google has indexed every single shortened URL it has ever found. Google understands HTTP redirects.
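
You can verify this yourself: any HTTP client that follows redirects ends up at the real page, which is all a crawler needs to do.

    # Resolve a shortened URL by simply following the redirect chain.
    import urllib.request

    def resolve(short_url):
        # urlopen follows redirects by default; the response's .url attribute
        # is the address it finally landed on.
        with urllib.request.urlopen(short_url) as response:
            return response.url

    print(resolve("http://tinyurl.com/klvwuc"))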

That debunks the search problem, but what about link rot and monetization? If the problem ever becomes sufficiently large, the worst possible outcome is that Google will need to relaunch such a site, provide it with a backup of all its links, or simply provide you with a plugin (like the Google Toolbar) that automatically redirects you to the correct place.

That's all.

So don't panic. It's not the end of the Web. It's not the beginning of the end of the Web. Nor is it the thin end of the wedge of the slippery slope. This is something that, if it ever does become a problem, is trivial to solve. So no, the sky isn't falling.

4 comments:

jamesmurty said...

Google may be smart enough to unpack a shortened URL, but what is to prevent a shortening service from lying about the real link? The service could direct Google's bots to the wrong location, and everyone else to the right one.

This would be an easy way to hijack legitimate links for evil purposes.

Julien Sobrier said...

Safe.mn (http://safe.mn/) makes a dump of all short links available publicly through FTP (ftp://safe.mn/). Anybody can make a public mirror of this list.

Anonymous said...

Well, I suppose google can provide their own short link service! http://67c.biz

Anonymous said...

bit.ly used to be automatic on Twitter but it isn't anymore. Anyone know why?
