[UPDATE: the below was, as I mentioned, an excerpt from Andrew's lengthy post. He's now asked me to include the rest of it, too. His full post continues at the bottom.]
Andrew Odlyzko, a mathematician, long-time analyst of digital economics and the director of the University of Minnesota's Digital Technology Center, has written a response to Lee Gomes' latest WSJ column, which says hits still matter. (I know!)
Andrew kindly sent me a post he just made to an economics mailing list he participates in. Here's an excerpt (I've underlined the data that he quotes from Lee's column):
Lee Gomes' Aug. 2 WSJ column does make a valid point that hits still matter. But at the same time the evidence it cites actually confirms a quantitative form of the "long tail" hypothesis. If we assume that popularity of items in an online store follows the ubiquitous Zipf's Law, in which the k-th most popular item is 1/k times as popular as the most popular one, we find (by approximating 1 + 1/2 + 1/3 + ... + 1/k by log(k)) that the most popular k items out of a total of n items should be bought/viewed/...
log(k) / log(n)
fraction of the time.
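For anyone who wants to play with the rule, here is a minimal Python sketch of it. The function names `top_k_share` and `exact_zipf_share` are my own, not Andrew's, and the second one simply adds up the 1/k terms directly so you can see how close the logarithmic shortcut comes. The (k, n) pairs in the loop are arbitrary small examples chosen only so the exact sums run quickly.

```python
import math


def top_k_share(k: int, n: int) -> float:
    """Approximate share of the k most popular of n items: log(k) / log(n)."""
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need n >= 2 and 1 <= k <= n")
    return math.log(k) / math.log(n)


def exact_zipf_share(k: int, n: int) -> float:
    """Share from the harmonic sums: (1 + 1/2 + ... + 1/k) / (1 + 1/2 + ... + 1/n)."""
    if n < 1 or not 1 <= k <= n:
        raise ValueError("need n >= 1 and 1 <= k <= n")
    harmonic_k = sum(1.0 / i for i in range(1, k + 1))
    harmonic_n = harmonic_k + sum(1.0 / i for i in range(k + 1, n + 1))
    return harmonic_k / harmonic_n


# Arbitrary illustrative catalog sizes, not figures from the column.
for k, n in [(10, 1_000), (100, 10_000), (1_000, 100_000)]:
    print(k, n, round(top_k_share(k, n), 4), round(exact_zipf_share(k, n), 4))
```

The base of the logarithm does not matter, since it cancels in the ratio. The exact sums come out a little higher than the shortcut because each harmonic sum exceeds the natural logarithm by roughly the same constant, so treat log(k) / log(n) as a rough rule rather than a precise forecast.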
Now let's look at the numbers in Lee's column:
(a) For Amazon, Lee cites estimates that the top 100,000 sellers account for 60% to 80% of all sales, among the "millions of books" that Amazon lists. Let's assume that those millions of books are 2 million. Then the rule above suggests that the top 100,000 should account for
log(100000) / log(2000000) = 0.793...
which is at the boundary of the range Lee gave (and the boundary that corresponds to the least contribution of the "long tail").
If those "millions of books" are 6 million, the number we get is 73.7%, still safely inside the 60-80% range.
(b) Netflix: 50 out of 60,000 titles account for 30% of rentals. The rule above predicts
log(50) / log(60000) = 0.355...
which is more than the 30% actually observed (and so the "long tail" is even bigger than might be expected).
(c) YouTube: Top 10% of 5.1 million videos account for 79% of plays, and top 20% for 89%. The rule listed predicts
log(510000) / log(5100000) = 0.8509...
log(1020000) / log(5100000) = 0.8957...
so that in the first case the "long tail" is again bigger than predicted, while in the second case it is almost exactly on target.
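The three comparisons above can be reproduced in a few lines. This is only a sketch: the `CASES` table and the helper name `top_k_share` are mine, the observed figures are the ones quoted from Lee's column, and the 2 million and 6 million catalog sizes for Amazon are Andrew's assumptions rather than reported numbers.

```python
import math


def top_k_share(k, n):
    """Logarithmic ratio rule: log(k) / log(n)."""
    return math.log(k) / math.log(n)


# (label, top k items, total n items, observed share as (low, high))
CASES = [
    ("Amazon, 2 million titles assumed", 100_000, 2_000_000, (0.60, 0.80)),
    ("Amazon, 6 million titles assumed", 100_000, 6_000_000, (0.60, 0.80)),
    ("Netflix, top 50 titles", 50, 60_000, (0.30, 0.30)),
    ("YouTube, top 10% of videos", 510_000, 5_100_000, (0.79, 0.79)),
    ("YouTube, top 20% of videos", 1_020_000, 5_100_000, (0.89, 0.89)),
]

predictions = {}
for label, k, n, (low, high) in CASES:
    predictions[label] = top_k_share(k, n)
    observed = f"{low:.0%}" if low == high else f"{low:.0%} to {high:.0%}"
    print(f"{label}: predicted {predictions[label]:.1%}, observed {observed}")
```

Wherever the predicted share of the hits is above the observed one, the tail is carrying more than the rule would suggest.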
So the conclusion is that yes, Lee is right: one should not go overboard with the "long tail" thesis, and hits do continue to play a major role and should be expected to do so in the future. But at the same time, the long tail is there and can be expected to play an increasing role. It's just that it will take a while.
Andrew's post continues:
Consider the Amazon example. With the current 3.7 million titles, the top 100,000 should account (according to the logarithmic ratio rule) for 76% of sales. But how much larger can the 3.7 million figure grow? Books are not easy to write, and so even if every would-be author who manages to write a complete manuscript gets "published" in some form, we are unlikely to increase the total number of books by more than a factor of 10, say. So suppose that Amazon goes to 37 million books from 3.7 million.
Then the quantitative rule would suggest that the top 100,000 titles would account for 66% of the sales. That is a noticeable drop from the 76% today, but hardly earth-shattering.
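To see why a tenfold jump in the catalog moves the number so little, here is a small sketch of the same arithmetic. The inverse helper `catalog_size_for_share` is my own addition, not something in Andrew's post: it asks how large the catalog would have to be, under the same rule, before the top sellers fell to a given share.

```python
import math

TOP_SELLERS = 100_000


def top_k_share(k, n):
    """Logarithmic ratio rule: log(k) / log(n)."""
    return math.log(k) / math.log(n)


def catalog_size_for_share(k, share):
    """Solve log(k) / log(n) = share for n, which gives n = k ** (1 / share)."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return k ** (1.0 / share)


today = top_k_share(TOP_SELLERS, 3_700_000)
tenfold = top_k_share(TOP_SELLERS, 37_000_000)
print(f"3.7 million titles: {today:.0%}")
print(f"37 million titles:  {tenfold:.0%}")

# Hypothetical target chosen for illustration: top sellers at half of all sales.
print(f"titles needed for a 50% share: {catalog_size_for_share(TOP_SELLERS, 0.5):,.0f}")
```

Because the catalog size only enters through its logarithm, each extra factor of 10 adds the same fixed amount to the denominator, which is why the shift is gradual.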
On the other hand, the difference can be substantial in other settings.
For example, if historical patterns repeat, then home-made videos will become key to the growth in penetration of broadband. And with improved cameras, editing tools, and high-speed connectivity, it is easy to imagine billions of videos available on the Net. Let's assume we end up with a relatively modest figure of 6 billion videos (we already have over 5 million on YouTube).
Then the share of the top 50 titles on Netflix might drop from the 35% predicted by the rule for today to 17%, and the entire current inventory of 60,000 titles might account for just
log(60000) / log(6000000000) = 0.488...
or 49% of the total. That would be a major change.
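The video scenario works out the same way. In this sketch the 6 billion figure is Andrew's deliberately modest guess, not a measurement, and the variable names are mine.

```python
import math


def top_k_share(k, n):
    """Logarithmic ratio rule: log(k) / log(n)."""
    return math.log(k) / math.log(n)


NETFLIX_TITLES = 60_000
ASSUMED_VIDEOS = 6_000_000_000  # hypothetical future total from the post

top50_today = top_k_share(50, NETFLIX_TITLES)
top50_future = top_k_share(50, ASSUMED_VIDEOS)
inventory_future = top_k_share(NETFLIX_TITLES, ASSUMED_VIDEOS)

print(f"top 50 titles, 60,000 total:        {top50_today:.1%}")
print(f"top 50 titles, 6 billion total:     {top50_future:.1%}")
print(f"all 60,000 titles, 6 billion total: {inventory_future:.1%}")
```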
The quantitative version of the "long tail" hypothesis is developed in my paper with Ben Tilly, "A refutation of Metcalfe's Law and a better estimate for the value of networks and network interconnections" (which also gives references for Zipf's Law and related issues), and in a shorter form in the paper with Bob Briscoe and Ben Tilly, "Metcalfe's Law is wrong," which appeared in the July 2006 issue of IEEE Spectrum.
It can also be used to provide a quantitative justification for the observation that connectivity has traditionally been valued more highly than content, as was shown in my Feb. 2001 paper "Content is not king".
Basically the huge mass of trivial communications (such as your making a dinner reservation), mostly of very little importance to anyone beyond the two people involved, and so at the extreme tail of the long tail, outweighs the blockbusters. (Ordinary voice telephony in the US, wired and wireless, still produces well over $300 billion a year in revenues, while Hollywood brings in something like $80 billion, and much of that from overseas.)