[UPDATE: the below was, as I mentioned, an excerpt from Andrew's lengthy post. He's now asked me to include the rest of it, too. His full post continues at the bottom.]

Andrew Odlyzko, a mathematician, long-time analyst of digital economics and the director of the University of Minnesota's Digital Technology Center, has written a response to Lee Gomes' latest WSJ column, which says hits still matter. (I know!)

Andrew kindly sent me a post he just made to an economics mailing list he participates in. Here's an excerpt (I've underlined the data that he quotes from Lee's column):

Lee Gomes' Aug. 2 WSJ column does make a valid point that hits still matter. But at the same time the evidence it cites actually confirms a quantitative form of the "long tail" hypothesis. If we assume that popularity of items in an online store follows the ubiquitous Zipf's Law, in which the k-th most popular item is 1/k times as popular as the most popular one, we find (by approximating 1 + 1/2 + 1/3 + ... + 1/k by log(k)) that the most popular k items out of a total of n items should be bought/viewed/...

log(k) / log(n)

fraction of the time.
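For readers who want to try the rule themselves, here is a minimal Python sketch of it. The function names `log_ratio_share` and `harmonic_share` are made up for illustration and do not come from Andrew's post; the second one sums the Zipf weights 1/i directly so the logarithmic shortcut can be compared against the sums it approximates.

```python
import math


def log_ratio_share(k: int, n: int) -> float:
    """Approximate share of the top k of n items under Zipf's Law: log(k) / log(n)."""
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need n >= 2 and 1 <= k <= n")
    return math.log(k) / math.log(n)


def harmonic_share(k: int, n: int) -> float:
    """Share of the top k of n items using the exact sums 1 + 1/2 + ... + 1/k."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    top = math.fsum(1.0 / i for i in range(1, k + 1))
    rest = math.fsum(1.0 / i for i in range(k + 1, n + 1))
    return top / (top + rest)


if __name__ == "__main__":
    # Small catalog, so the exact sum is cheap to compute.
    print(f"log ratio: {log_ratio_share(100, 10_000):.3f}")
    print(f"exact sum: {harmonic_share(100, 10_000):.3f}")
```

The exact sums give the head a somewhat larger share than the log ratio does, because log(k) leaves out a constant of roughly 0.577 in each sum. The gap shrinks as k and n grow.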

Now let's look at the numbers in Lee's column:

(a) For Amazon, Lee cites estimates that

the top 100,000 sellers account for 60% to 80% of all sales, among the "millions of books" that Amazon lists. Let's assume that those millions of books are 2 million. Then the rule above suggests that the top 100,000 should account for

log(100000) / log(2000000) = 0.793...

which is at the boundary of the range Lee gave (and the boundary that corresponds to the least contribution of the "long tail").

If those "millions of books" are 6 million, the number we get is 73.7%, still safely inside the 60-80% range.

(b) Netflix:

50 out of 60,000 titles account for 30% of rentals. The rule above predicts

log(50) / log(60000) = 0.355...

which is even more than what we see (and so the "long tail" is even bigger than might be expected).

(c) YouTube:

Top 10% of 5.1 million videos account for 79% of plays, and top 20% for 89%. The rule listed predicts

log(510000) / log(5100000) = 0.8509...

and

log(1020000) / log(5100000) = 0.8957...

so that in the first case the "long tail" is again bigger than predicted, while in the second case it is almost exactly on target.
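The three comparisons above can be reproduced in a few lines of Python. This is only a sketch: the helper `log_ratio_share` and the `CASES` table are illustrative names, the 2 million figure for Amazon is the assumption made above, and the "reported" strings are the figures quoted from Lee's column.

```python
import math


def log_ratio_share(k: int, n: int) -> float:
    """Share of the top k of n items predicted by the log(k) / log(n) rule."""
    return math.log(k) / math.log(n)


# (label, top k items, total n items, share reported in the column)
CASES = [
    ("Amazon (2 million titles assumed)", 100_000, 2_000_000, "60% to 80%"),
    ("Netflix", 50, 60_000, "30%"),
    ("YouTube, top 10%", 510_000, 5_100_000, "79%"),
    ("YouTube, top 20%", 1_020_000, 5_100_000, "89%"),
]


def predictions(cases=CASES):
    """Return (label, predicted share, reported share) for each case."""
    return [(label, log_ratio_share(k, n), reported) for label, k, n, reported in cases]


if __name__ == "__main__":
    for label, predicted, reported in predictions():
        print(f"{label}: predicted {predicted:.1%}, reported {reported}")
```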

So the conclusion is that yes, Lee is right, one should not go overboard with the "long tail" thesis, and that hits do continue to play a major role, and should be expected to do so in the future. But at the same time, the long tail is there, and can be expected to play an increasing role; it's just that it will take a while.

Andrew's post continues:

Consider the Amazon example. With the current 3.7 million titles, the top 100,000 should account (according to the logarithmic ratio rule) for 76% of sales. But how much larger can the 3.7 million figure grow? Books are not easy to write, and so even if every would-be author who manages to write a complete manuscript gets "published" in some form, we are unlikely to increase the total number of books by more than a factor of 10, say. So suppose that Amazon goes to 37 million books from 3.7 million.

Then the quantitative rule would suggest that the top 100,000 titles would account for 66% of the sales. That is a noticeable drop from the 76% today, but hardly earth-shattering.

On the other hand, the difference can be substantial in other settings.

For example, if historical patterns repeat, then home-made videos will become key to the growth in penetration of broadband. And with improved cameras, editing tools, and high-speed connectivity, it is easy to imagine billions of videos available on the Net. Let's assume we end up with a relatively modest figure of 6 billion videos (we already have over 5 million on YouTube).

Then the top 50 titles on Netflix might drop from the 35% predicted by the rule for today to 17%, and the entire current inventory of 60,000 titles might account for just

log(60000) / log(6000000000) = 0.488...

or 49% of the total. That would be a major change.
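The projections above follow from the same log(k) / log(n) rule with a larger n. A rough Python sketch, using only the catalog sizes assumed in the text (the variable names are illustrative):

```python
import math


def log_ratio_share(k: int, n: int) -> float:
    """Share of the top k of n items predicted by the log(k) / log(n) rule."""
    return math.log(k) / math.log(n)


# Amazon: top 100,000 titles, today and after a tenfold growth in the catalog.
amazon_today = log_ratio_share(100_000, 3_700_000)
amazon_tenfold = log_ratio_share(100_000, 37_000_000)

# Video: assume 6 billion videos in total, as in the text.
top_50_of_6_billion = log_ratio_share(50, 6_000_000_000)
catalog_of_6_billion = log_ratio_share(60_000, 6_000_000_000)

if __name__ == "__main__":
    print(f"Amazon top 100,000 today:   {amazon_today:.0%}")
    print(f"Amazon top 100,000 at 10x:  {amazon_tenfold:.0%}")
    print(f"Top 50 of 6 billion:        {top_50_of_6_billion:.0%}")
    print(f"Top 60,000 of 6 billion:    {catalog_of_6_billion:.0%}")
```

Because the total only enters through its logarithm, a tenfold increase in n moves the head's share by about ten points in the Amazon case rather than collapsing it.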

The quantitative version of the "long tail" hypothesis is developed in my paper with Ben Tilly, "A refutation of Metcalfe's Law and a better estimate for the value of networks and network interconnections" (which also gives references for Zipf's Law and related issues), and in a shorter form in the paper with Bob Briscoe and Ben Tilly, "Metcalfe's Law is wrong," which appeared in the July 2006 issue of IEEE Spectrum.

It can also be used to provide a quantitative justification for the observation that connectivity has traditionally been valued more highly than content, as was shown in my Feb. 2001 paper "Content is not king".

Basically the huge mass of trivial communications (such as your making a dinner reservation), mostly of very little importance to anyone beyond the two people involved, and so at the extreme tail of the long tail, outweighs the blockbusters. (Ordinary voice telephony in the US, wired and wireless, still produces well over $300 billion a year in revenues, while Hollywood brings in something like $80 billion, and much of that from overseas.)

Let me be slightly naive and say that I'm unclear whether Lee and you and this researcher are talking about unit sales, retail price, or net profit to Amazon or another service when discussing this problem.

If the 100,000 best sellers produce 80% of the unit sales volume, they might only contribute 40% of the revenue and 10% of the profit. As we have talked about before, Amazon.com's great innovation in the book business was exhaustiveness in that every book in print is as easy to order (but not as quick to receive) as any other book in print.

(Out of print books are sometimes easier because the book is shipped directly to the customer from a used book seller's inventory, rather than from a publisher that takes 1 to 6 weeks to fulfill orders to the bookseller.)

Amazon charges list price, and sometimes a $1.99 surcharge, for a significant majority of its in-print titles, and has done so (except for the surcharge) since its inception.

Thus, Lee could be right about volume, you could be right about profit before taxes, etc. Is there a clear answer as to whether we're talking units, retail price, or the raw profit?

Posted by: Glenn Fleishman | August 04, 2006 at 12:34 PM

Glenn,

Excellent point. I do address the difference between viewing the Long Tail from a unit, revenue and profit perspective in the book. But unfortunately the current debate was started by Gomes, who has been unclear on definitions in his columns. And, as you know, definitions are everything.

FWIW, in my quantitative research I almost always use units exclusively. This allows for proper head-to-head comparisons between markets, even if their revenue and margin strategies are different.

Chris

Posted by: Chris Anderson | August 04, 2006 at 12:49 PM

That's interesting. I would expect that the $ amounts are actually more compelling than the unit amounts. I have long thought that Amazon essentially funds the discounts on popular books by charging full list price, plus surcharges, on typically much more expensive titles.

Selling one copy of a $100 book at list price and demanding 55% off from the grateful publisher (and free shipping because you're ordering multiple items from them each time) plus $1.99 -- well, my math says that's $56.99 you've just netted before any other expenses. (Free shipping being one of them.)

Meanwhile, that $34.99 hardcover, which may be sold at 40% off list, is purchased at 45 to 50% off list and then shipped free. That's, well, my math says that's nearly nothing.

Posted by: Glenn Fleishman | August 04, 2006 at 03:02 PM
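The per-copy arithmetic in the comment above can be checked with a short Python sketch. The function `gross_margin` is a hypothetical helper, not anything Amazon publishes, and the discount rates are simply the ones quoted in the comment. It ignores shipping and every other expense.

```python
def gross_margin(list_price: float, customer_discount: float,
                 wholesale_discount: float, surcharge: float = 0.0) -> float:
    """Per-copy margin before shipping and other expenses.

    customer_discount:  fraction off list price given to the buyer
    wholesale_discount: fraction off list price the retailer gets from the publisher
    """
    revenue = list_price * (1 - customer_discount) + surcharge
    cost = list_price * (1 - wholesale_discount)
    return revenue - cost


if __name__ == "__main__":
    # $100 title sold at list plus the $1.99 surcharge, bought at 55% off.
    print(f"$100 title:  ${gross_margin(100.00, 0.00, 0.55, 1.99):.2f}")
    # $34.99 hardcover sold at 40% off, bought at 45% to 50% off.
    print(f"Hardcover (45% wholesale): ${gross_margin(34.99, 0.40, 0.45):.2f}")
    print(f"Hardcover (50% wholesale): ${gross_margin(34.99, 0.40, 0.50):.2f}")
```

Under these assumptions the discounted hardcover leaves only a few dollars per copy before free shipping is paid for, against roughly $57 for the undiscounted title.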