13 posts from December 2005

December 29, 2005

Announcing the Fortune 500 Business Blog Index

Short form: In collaboration with Socialtext, we've created a wiki that tracks which of the Fortune 500 are blogging. We found that only 4% of the F500 are doing so. Check it out here.

Long form: Earlier this year I was at a dinner with Doc Searls and we got to talking about why some companies blog and some don't. Microsoft blogs, and Apple doesn't. Sun blogs and Intel doesn't. GM blogs and Toyota doesn't. And so on.

    Perhaps, Doc wondered, the risks and uncertainties of public business blogging are so great that big companies only do it under duress, when their traditional corporate messaging has lost traction. So companies on the way up don't want to mess with their success by introducing a new lens on the enterprise that isn't controlled by the PR department. But companies on the way down are willing to try anything to regain the confidence of their customers. [Update: Doc has posted more background on this here.]

    Hmm, I thought. That's testable. Let's look at which of the Fortune 500 companies are blogging and compare their past twelve month share performance with those that aren't. If this theory stands up, the blogging members of the F500 will have underperformed the nonblogging members. And then we can also see if blogging makes a difference going forward, by continuing to follow the two cohorts.

    So I asked our research department to check it out. And they quickly discovered that the problem is that there's no good list of F500 company blogs.

    Business blogging, as we defined it, is:

Active public blogs by company employees about the company and/or its products.

    This site has a list of F500 companies with blogs, but doesn't distinguish between internal and external ones (and doesn't link to any of them so they can be checked). This wiki has a linked list of blogs, but it's a bit of a mishmash, ranging from tiny companies to big ones, with some non-profits thrown in as well. (The NewPRwiki also has a list of CEO blogs, which is better, but still doesn't fit the bill.)

    So we decided to create a list ourselves, and put the ace Wired interns on the case. But they, too, discovered that it's tough to do right, given ambiguity about what is or isn't a proper business blog, what's still active and so on.

    Nevertheless, they did come up with a rough version, which you can see in this spreadsheet. By the above definition, they found only 18 members (4%) of the Fortune 500 with business blogs. [UPDATE: the wiki's working already; contributors found two more companies blogging, including (ulp!) IBM. Don't know how we missed that. The spreadsheet and the share price performance stats have been updated.] And, for what it's worth with such a small sample size, the average trailing 12-month share performance of the blogging members was +4%, while the performance of non-blogging members was +19%. So although the statistics aren't good enough to confirm Doc's theory, they do point in the right direction.

    The next step was to improve the data. So we've decided to open source the project. In collaboration with Ross Mayfield at Socialtext, we've created the Fortune 500 Business Blogging Wiki. It's a wiki'd version of the above spreadsheet that anyone can edit, adding new Fortune 500 blogs as they're found or revising existing entries. It's released under a Creative Commons attribution license, so anyone is free to use it any way as long as they point back to the wiki.

    Over time, as the list gets more robust, we'll add the share price performance back in and turn it into a Business Blogging Index so you can see if blogging is indeed correlated to company performance (and who knows, maybe someday some smart mutual fund will actually turn it into a fund you can buy). Have at it!

December 27, 2005

Microsoft and the Long Tail of search

A former senior Microsoft manager (he didn't want his name used) emailed me with an interesting perspective on the Long Tail of search:

As you know, search engine query logs have a Zipfian distribution (Rank * Frequency = Constant).  When I presented this concept to senior execs at Microsoft, including BillG, they never quite got it.  I would draw the graph just like the logo on your site, but it just wouldn't sink in.  I realized that the size of the X axis was the problem.  People see a graph, but they don't comprehend the scale of the X axis. 

Looking at your logo, for example, it looks as if the (yellow) long tail is what, 4x, 5x the size of the red portion?  And it is, if you are comparing the integral of the yellow portion (i.e. query volume) compared to the red.  But the X axis, on a linear scale, extends almost infinitely to the right, and no visual can communicate that.   In essence, in order to represent the long tail in graphical/visual form, you implicitly have to represent the X axis logarithmically, as you know, and people (generally) don't comprehend logarithmic / exponential scale.

He eventually found a way to explain the curve using stacks of pennies, which ends up with a few high stacks followed by a trail of single pennies extending out the door, down the hall and, if you have enough pennies, as far as the eye can see.
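The penny picture can be sketched in a few lines of Python. This is only an illustration, assuming an idealized Zipf curve in which the stack at a given rank holds the top stack's height divided by that rank, rounded down. The function name `penny_stacks` and the top height of 100 are made up for the example and do not come from the email.

```python
def penny_stacks(top_height):
    """Stack heights for an idealized Zipf curve (rank * height ~ constant).

    Hypothetical helper: the stack at rank r holds top_height // r pennies,
    and the list stops at the last rank that still gets at least one penny.
    """
    return [top_height // rank for rank in range(1, top_height + 1)]


stacks = penny_stacks(100)
single_pennies = sum(1 for height in stacks if height == 1)

print("tallest stacks:", stacks[:5])            # the few high stacks at the head
print("number of stacks:", len(stacks))         # how far the row extends
print("stacks of one penny:", single_pennies)   # the trail heading out the door
```

In this toy model, half of the stacks are a single penny high, and a taller top stack makes the trail of singles proportionally longer.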

He also found a handy rule of thumb to estimate the consequences of this distribution:

Loosely speaking, if you divide the number of queries by 4, you'll get the frequency of the most popular query, and if you divide by two you'll get the number of queries that occurred only once over whatever time period you are measuring.

Finally, it turned out that the Zipf distribution turned up in all sorts of unexpected places at Microsoft:

A second example refers to what the external world knows as Windows Crash Analysis but what inside Microsoft is known as Watson or Dr. Watson (of "come here, Watson, I need you" fame) because that's what it was first called. Anyway this is the dialog that appears when a Windows application crashes on Windows XP -- an alert appears and offers to send the information to Microsoft. Back at Microsoft, they compile that information in a SQL Server database, indexed by application name, module name, module version, and the internal address where the crash occurred. Back in 2001 or 2002, they started showing off a graph of all the application crashes, with Rank on the X axis and Frequency on the Y axis. Internally they called this the "Watson Curve," even in front of Gates.

When I saw the curve, I smiled, because it looked familiar. I asked one of the guys on that team to do me a favor and plot the results log-log. He got back to me a few days later and said "wow, it's a straight line!" I wasn't surprised. I don't think they call it the Watson curve anymore because it's just yet another example of a Zipfian distribution at work.

For more, check out this amazing site, which is all about the many Zipfian distributions found in the real world.
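Why the log-log plot comes out straight: if Rank * Frequency = Constant, then log(Frequency) = log(Constant) - log(Rank), which is a line with slope -1. Here is a minimal sketch of that check, using made-up counts that follow the rule exactly. The helper `loglog_slope` and the numbers are assumptions for illustration, not the Watson team's data or method.

```python
import math


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Ranks are assumed to be 1, 2, 3, ... in the order the frequencies appear.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Made-up counts that obey rank * frequency = constant exactly.
constant = 10_000
frequencies = [constant / rank for rank in range(1, 1001)]

slope = loglog_slope(frequencies)
print(f"log-log slope: {slope:.3f}")
```

Real logs are noisier than this, so the fitted slope will only be roughly -1, but a straight line on log-log axes is the signature to look for.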

December 19, 2005

One year anniversary stats

Today is this blog's one-year anniversary. Here are the current stats, rounded to two significant digits:

  • Posts:  180 (3.5 a week)
  • Total words written:  120,000 (the equivalent of 1.5 books)
  • Comments:  1,800 (10 per post)
  • Trackbacks:  860 (5 per post)
  • "Long Tail comment elsewhere" entries: 300 (6 a week)
  • RSS subs on Bloglines (including all feed variants):  2,700
  • RSS subs elsewhere: Who knows? Maybe another 3,000 or so?
  • Average daily traffic:  1,600 visitors (that doesn't include RSS readers)
  • Total daily readers: Given the uncertainties in RSS readership (how many of those Bloglines subs are active?), I can only guess that it's somewhere between 3,000 and 5,000.
  • Current Technorati rank (combining my old "longtail.typepad.com" URL, which still works and continues to be linked to, and my current one, "thelongtail.com"): around 200.
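For anyone who wants to check the per-week and per-post figures in the list above, here is a quick sketch. It assumes the year is treated as 52 weeks and uses the rounded totals as given; the variable names are just for illustration.

```python
WEEKS_PER_YEAR = 52  # assumption: the blog's first year counted as 52 weeks

# Rounded totals taken from the list above.
posts = 180
comments = 1_800
trackbacks = 860
sidebar_entries = 300

rates = {
    "posts per week": posts / WEEKS_PER_YEAR,
    "comments per post": comments / posts,
    "trackbacks per post": trackbacks / posts,
    "sidebar entries per week": sidebar_entries / WEEKS_PER_YEAR,
}

for label, value in rates.items():
    print(f"{label}: {value:.1f}")
```

The trackback and sidebar rates come out slightly under the whole numbers quoted in the list, which are rounded further for readability.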

Some lessons from the first year:

  • People seem forgiving of the occasional times of crazy travel when I don't post for a week. No apology required.
  • People also seem forgiving of occasional deviations from the mission of this blog. My Friday Fanboy series (usually on a Friday but sometimes just over a weekend), where I talk about what I'm really into in any given week, is fun for me and often (but not always) related to Long Tail themes. But there's no need to overstate the relevance to the theory. Sometimes a cigar is just a smoke. 
  • The appetite for really wonky posts on Long Tail economic and statistical methodology is limited. There are, as it turns out, subjects too obscure even for a blog.
  • On topical subjects, post sooner. Better to add something to the debate in its early hours than to wait too long trying to craft the definitive post after the crowd has moved on.
  • I don't know if my "Long Tail comment elsewhere" sidebar is working as well as it might. It's meant to be a way to collect and call attention to relevant and interesting writing elsewhere without deviating too much from the mission of the mainbar/feed, which is to feature my own mini-essays on Long Tail-related topics. Although it's now possible to subscribe to the sidebar's feed separately, there are other limitations of Typelists (which is what that sidebar is) that I find frustrating, such as no posting dates and no archive (old ones just disappear from the public site). Any suggestions on a better approach? Or is it working well enough already?
  • Finally, I should try to respond to more of the comments. I read them all and always find the ideas and advice of great value in steering the book. But I don't reply to many of them, largely because in instances where I agree no reply is necessary and in instances where I don't, I'm loath to take the debate off into what I fear will be an unproductive direction. It's in that latter category where I should more often dive in and see where it goes. Maybe when the book is done (by the end of the year, with luck).

December 18, 2005

The Probabilistic Age

Q: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing?

A: Because these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.

Q: Huh?

A: Exactly. Our brains aren't wired to think in terms of statistics and probability. We want to know whether an encyclopedia entry is right or wrong. We want to know that there's a wise hand (ideally human) guiding Google's results. We want to trust what we read.

    When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy. But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. These probabilistic systems aren't perfect, but they are statistically optimized to excel over time and large numbers. They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.

    But how can that be right when it feels so wrong?

    There's the rub. This tradeoff is just hard for people to wrap their heads around. There's a reason why we're still debating Darwin. And why Jim Surowiecki's book on Adam Smith's invisible hand is still surprising (and still needed to be written) more than 200 years after the great Scotsman's death. Both market economics and evolution are probabilistic systems, which are simply counterintuitive to our mammalian brains. The fact that a few smart humans figured this out and used that insight to build the foundations of our modern economy, from the stock market to Google, is just evidence that our mental software has evolved faster than our hardware.

    Probability-based systems are, to use Kevin Kelly's term, "out of control". His seminal book by that name looks at example after example, from democracy to bird-flocking, where order arises from what appears to be chaos, seemingly reversing entropy's arrow. The book is more than a dozen years old and decades from now we'll still find the insight surprising. But it's right.

    Is Wikipedia "authoritative"? Well, no. But what really is? Britannica is reviewed by a smaller group of reviewers with higher academic degrees on average. There are, to be sure, fewer (if any) total clunkers or fabrications than in Wikipedia. But it's not infallible either; indeed, it's a lot more flawed than we usually give it credit for.

    Britannica's biggest errors are of omission, not commission. It's shallow in some categories and out of date in many others. And then there are the millions of entries that it simply doesn't--and can't, given its editorial process--have. But Wikipedia can scale to include those and many more. Today Wikipedia offers 860,000 articles in English - compared with Britannica's 80,000 and Encarta's 4,500. Tomorrow the gap will be far larger.

    The good thing about probabilistic systems is that they benefit from the wisdom of the crowd and as a result can scale nicely both in breadth and depth. But because they do this by sacrificing absolute certainty on the microscale, you need to take any single result with a grain of salt. As Zephoria puts it in this smart post, Wikipedia "should be the first source of information, not the last. It should be a site for information exploration, not the definitive source of facts."

    The same is true for blogs, no single one of which is authoritative. As I put it in this post, "blogs are a Long Tail, and it is always a mistake to generalize about the quality or nature of content in the Long Tail--it is, by definition, variable and diverse." But collectively they are proving more than an equal to mainstream media. You just need to read more than one of them before making up your own mind.

    Likewise for Google, which seems both omniscient and inscrutable. It makes connections that you or I might not, because they emerge naturally from math on a scale we can't comprehend. Google is arguably the first company to be born with the alien intelligence of the Web's large-N statistics hard-wired into its DNA. That's why it's so successful, and so seemingly unstoppable.

    Paul Graham puts it beautifully:

"The Web naturally has a certain grain, and Google is aligned with it.  That's why their success seems so effortless.  They're sailing with the wind, instead of sitting becalmed praying for a business model, like the print media, or trying to tack upwind by suing their customers, like Microsoft and the record labels. Google doesn't try to force things to happen their way.  They try to figure out what's going to happen, and arrange to be standing there when it does."

The Web is the ultimate marketplace of ideas, governed by the laws of big numbers. That grain Graham sees is the weave of statistical mechanics, the only logic that such really large systems understand. Perhaps someday we will, too.

[Update: Nicholas Carr, who seems to have inherited the Clifford Stoll chair of reliable techno-skepticism, has a clever and well-written response here.]

Autohornblowing

Three gratifying call-outs landed on my desk this week:

  • In anticipation of the book's release (either June or Sept, depending on how this last burst of writing goes) I was named one of the 26 "People to Watch in 2006" by SciFi magazine (just ahead of Ben Affleck!).
  • And the Guardian UK wrote about the theory, particularly in regard to books:

"For every punter who strolls out of Waterstone's with a heavily plugged copy of the latest Lynne Truss, there is another who will not rest until they have tracked down an obscure volume on Venetian calligraphy, or the Bob Monkhouse bumper book of jokes. Add up all those niche and downright obscure purchases and you have a business worth billions.

If the 20th-century entertainment industry was about hits and blockbusters, according to Anderson, the 21st will be about a multiplicity of misses. The long tail, he predicts, is giving rise to an entirely new economic model for the media and entertainment industries, and one that is just beginning to show its mettle."

December 15, 2005

VC advice on finding the money in the Long Tail

David Hornik, a VC at August Capital who has been a good sounding board for me in my Long Tail research, has a long and thoughtful post on where the money in the Long Tail is. He's been pitched countless Long Tail business plans, and his conclusion after all this is that the ones that make sense are not so much content creators as aggregators and filters:

The aggregators are those web businesses that seek to collect up as much of the Long Tail content as is possible, so as to make their "stores" a one stop shop for content no matter how popular or obscure. That aggregation may be on a horizontal basis, as is the case with Amazon or Netflix, or it may be on a vertical basis, as is the case with WantedList or GameFly (the Netflix of porn and video games respectively). The value to consumers from these content aggregators is that they need not shop in dozens of places on the web in order to acquire a diverse set of content. As a result, aggregators are able to extract a disproportionate amount of value for the sale of each individual piece of content. And while creators are likely to sell slightly more content as a result of the increased ease of salability, they will not likely emerge from the obscurity of the Tail merely because they are made available for sale on Amazon or iTunes.

The filterers are those businesses that make it easier to find the content in which we are interested, despite the increasing proliferation of content creators, hosts, aggregators, etc. The purest form of filterer is the search engine. But the more obscure the content, the less effective the generalized search engine will be. Thus, I have been pitched on an increasingly large number of vertical search engines that use their thematic focus (shopping, real estate, employment, etc.) as a proxy to increase search effectiveness. And I have also seen an increasing variety of clever technical solutions to help filter the myriad of available content (for example, Pandora uses professional musicians analyzing songs based upon a standard set of characteristics and Delicious and Flickr use forms of end user tagging to characterize a disparate set of content). Again, while these different filtering technologies may make it slightly more likely that an end user finds his or her way to a piece of obscure content, it will not likely be sufficient to catapult an artist into the mainstream. The beneficiary of the filtering is the end user and the filterer, not the content owner per se.

Cheating Getting Sued By Google 101

Law blog Infamy or Praise warns me that I could get sued for my AdWords stunt because I asked readers not to click on my ad.

California law, which, according to the AdWords Program Terms, applies to Anderson's transaction with Google, implies a covenant of good faith and fair dealing in all contracts between parties entered into in the State of California. Does Anderson's request violate this implied covenant?

....I think Anderson's on shaky legal ground driving prospective ad-clickers away from his ads and denying Google their reasonably-anticipated AdWords revenues from him.

So, if my guess is correct and Anderson's in breach of his implied covenant of good faith and fair dealing with Google, what's his liability? Since this is Google and not Microsoft with which he's dealing, it's unlikely that Anderson will be half-hung, drawn and quartered. The Program Terms disclaim "consequential, special, indirect, exemplary, punitive and other damages" and provide that Anderson's probable liability would be measured by Google's lost per-click values. As Anderson cheerfully admits, his ad "sucks"; Google's losses due to his passing interference with his audience's natural ad-clicking tendencies are almost certainly negligible. A measure of damages more to Google's liking might be that suggested in a comment to Anderson's post: "Chris, how much will you pay us not to click on your ad?"

Read the whole post, especially the fun part at the end when he notes my comment that the actual benefit of my ads is probably close to zero: "It wouldn't take much legal woe to create an actual detriment somewhere south of zero." I think he's joking.

December 13, 2005

Cheating Google 101

[Image: the ad]

[UPDATE: After several months of this, Google finally raised the cost-per-click on my underperforming ads above my $1.00 daily budget, so I ended the campaign. The final stats: over tens of thousands of impressions, my net CPM (cost per thousand impressions) for these highly targeted ads was $0.36. That's not free, but it's really, really cheap. My conclusion: there's something to this strategy, and I may well use it again.]

I wanted to understand clickfraud a bit better, so I started advertising this blog on Google and trying to see what sort of fraud it could catch. Then when I had finished the experiments I left the ads running. Now I've just checked the latest stats, and I appear to have found a loophole in Google's revenue model. I seem to be advertising on Google for free.

Google ads are pay-per-click. They're based on an auction model, so for each keyword/phrase the best performing ads (some combination of those that generate the most clicks and those who will pay the most for those clicks) rise to the top, displacing others. For popular keywords, I'm sure that's an efficient, highly-optimized model.

But I chose a bunch of very obscure terms to advertise against. And my ad sucks (see above) and nobody ever clicks on it. The result is that I get hundreds (sometimes thousands) of highly-targeted impressions a day for free. Every now and then Google notices that my ad isn't performing, so I have to raise the price I'll pay for each click (I'm now at $0.40). But since I get no clicks it doesn't matter.

I have to admit to a slight rush of pride that I've managed to outsmart Google in some tiny way and get free impressions. Granted, the value of those impressions is at most a couple bucks a day. And because my ad, as mentioned, sucks, the actual benefit to me is probably close to zero. Furthermore, if anyone were to actually click on the ads, I'd quickly lose whatever gains I've made (if you do happen to see my ad out in the wild, please don't click on it). But still! I've hacked Google! Woot!
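For reference, CPM on a pay-per-click campaign is just the total spent on clicks divided by impressions, times a thousand. A minimal sketch of the arithmetic follows. The spend and impression totals are hypothetical, picked only so the result matches the $0.36 figure quoted at the top of this post; the actual campaign totals aren't given here.

```python
def cpm(total_cost, impressions):
    """Cost per thousand impressions, in the same currency as total_cost."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return total_cost / impressions * 1000


# Hypothetical totals, chosen only to illustrate a $0.36 CPM.
example_cost = 18.00          # dollars spent on the occasional click
example_impressions = 50_000  # times the ad was shown

print(f"CPM: ${cpm(example_cost, example_impressions):.2f}")
```

The fewer clicks an ad draws per impression, the lower this number goes, which is the whole loophole.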

December 12, 2005

The Long Tail of Cakes

Today we had a bake-off contest at the office. It was fiercely competitive; so much so that this masterpiece only won 3rd prize in the Most Creative category. I couldn't bear to eat it myself, but am reliably informed that it was delicious.

[Photo: the cake]

And the making of...

[Photo]

[Photo: Lisa Katayama]

[Photo: from left, Greta Lorge, Joanna Pearlstein, Angela Watercutter]

December 11, 2005

Redistributing those excess search profits

Bill Gates thinks there are excess profits in search and suggests that users should be paid for searching.

No sooner do I read that than I stumble across Blingo, a Google search affiliate whose consumer proposition is this:

  1. We pick a bunch of random winning times.
  2. Search at the right time and you win. No registration required.
  3. Then tell us where to send your prize.

It sounds totally 1999 ("Get paid to click on ads!"), but there's something here. I'm no Wall Street quant, but Google's outsized profits scream "arbitrage opportunity" to me.

December 08, 2005

The Long Tail at CSFB Media Week

Thanks to PaidContent's comprehensive coverage of CSFB's Media Week conference in New York, we've got some sense of how the Long Tail meme is playing in the boardroom.

Dec 6 @ Media Week: Google's Rosenberg Serves The Kool-Aid: The UBS lunch crowd for Jonathan Rosenberg, Google VP-product development is into the hallway, drawn by the strength of a presentation on one of the market's hot stocks. Rosenberg gives a very energetic presentation but I'm not quite sure he realizes how far away some of the people here are from the tech/web universe. One example: He started to quote "The Long Tail", assumed everyone (probably more fair to say most) had read it and then, as he saw some quizzical responses, urged everyone to do so. He followed that up by referring people to Chris Anderson's blog: "It's a very insightful blog." ... I just want a hands-up show of how many people in this room know what a blog is; many do and some are even our readers. (Hi, Sam.) But this is still a crowd on a learning curve when it comes to the internet. Then again, Google is still on a learning curve when it comes to Wall Street, $400-plus shares aside.

Dec 7 @ Media Week: Creative Answers Needed for A La Carte: When I asked about using Warner's video library on AOL instead of offering it on iTunes at $1.99 a pop, [Time Warner Chairman and CEO Dick] Parsons replied: "How many video iTune (iPod) owners are there? One million maybe? We've got 20 million people who subscribe to AOL and 45 million who use it all the time so you're putting this in front of a hugely larger audience." Then he went for the long tail -- and borrowed my legal pad and pen to illustrate with a graph. Luckily, I had a tape recorder: "Most of the stuff starts over here and ends way, way out here. You monetize this piece of it because that's the 80 percent we can make all the money in. ... Out here (the end) there's 152 people in America who want to see `The Fugitive' again. They're not going to go out and buy iPods to see it but if you get it online this enables to monetize the whole thing. There's currently no business model for that in a profitable way so we'll see." He said that strategy doesn't preclude charging for downloads.

Xbox 360 Media Center Update

After my enthusiastic post about the Xbox 360 as a Media Center Extender, I belatedly discovered that updating the software for the 360 had disabled my older extenders, including the original Xboxes and a Linksys hardware extender.

I hunted around online for help and didn't find any, so I spent a fruitless hour and a half on the phone with Microsoft tech support (being bounced from Xbox support to Xbox 360 support to Windows support, each time eventually escalating to managers who weren't able to help). Finally, at wit's end, I asked Charlie Owen, a Microsoft project manager who runs a great Media Center blog, if he had any suggestions.

Charlie put me in touch with the team that had worked on the Extender port. After a few days of running diagnostic tests, we discovered that it was due to a version conflict between some of the earlier Extender software I'd been running on the original Xboxes and the new Extender manager that you download and run on the Media Center PC as part of installing the 360. It's now solved and I'm happy.

Two lessons from this:

1) If anyone else is having trouble getting older extenders to work once you've got an Xbox 360 on the network, do this:

Make sure you have the latest Extender software. That's 1.01 for the Xbox, and to use the below process you need the DVD version that was sent out earlier this year. Delete any files from the previous version by going to the Xbox dashboard, selecting "memory" and deleting the "Media Center Extender" entry. On the 360, disconnect the Media Center (it's on the media tab). Uninstall the extender software from the Media Center PC, and reboot.

Then download and install the new PC extender software. Put the 1.01 extender disk in the original Xbox and go through the usual 8-digit code entry to associate it. On the 360, use the media tab options to do the same. This should clear your system and ensure that everything's working with the latest versions of the software.

2) The Microsoft team (Rob Lehew, the MCX project manager, and his colleagues) were totally great and quickly got me to the solution by diagnosing packet traces and otherwise walking me through some process-of-elimination steps. Obviously I'm not the average customer and they don't usually have project managers doing tech support. But because the team has a number of active bloggers who are accessible and willing to respond to users, it's much easier for anyone to find answers quickly from people who know the most about the product.

This is a great example of how company blogs can improve consumer relations by putting a human face on the development team. That's helpful in problem solving, as in my case, but it's even more useful in passing on tips and tricks from the pros and inviting suggestions from users on future development.

The old model was mostly to use newsgroups and forums for this, and that still has its place for really specific tech support. But I find blogs far easier to navigate and read, and you can subscribe to them in a way that you can't with newsgroups. Obviously not all developers want to take on the email and comment burden that comes with having a blog, but it only takes a few to really improve the customer relationship. Hats off to Charlie and the rest of the Media Center team for the fine role model.

Squidoo Long Tail "lens"

The irrepressible Seth Godin has a new company called Squidoo that is in part setting out to provide a solution to the blogger's dilemma: how to improve the "what's this all about?" experience for first-time visitors.

I've tried to do that with a relatively comprehensive About page, which has an accompanying FAQ. Squidoo does a more comprehensive job with its Long Tail "lens". It combines my own pick of "start here" posts with a sampling of recent posts and comments.

It's certainly better looking than my About page, but is it more useful? More importantly, is this a better way to discover new blogs, by sampling them in a common format?  I'll have to try it more before I can answer that.

Tidbits