
December 18, 2005

TrackBack

Listed below are links to weblogs that reference The Probabilistic Age:

» Why tagging and Wikipedia work from Ton's Interdependent Thoughts
Chris Anderson writes a piece that I recommend you go read in full, to prevent me from quoting it here in full. With clients and others I often have a hard time explaining my information strategy when it comes to blogreading... [Read More]

» Excellent post on the Long Tail from On IT and beyond
Chris Anderson has another great piece on the Long Tail. Generally, I have nothing to add in this context. Interestingly enough, there is enough software that is expected to behave in a non-Gaussian way - that is: they have to work perfectly with no fl... [Read More]

» Have faith from Rough Type: Nicholas Carr's Blog
Wired editor Chris Anderson offers a spirited defense of internet "systems" like Wikipedia, Google, and the blogosphere. Criticism of these systems, he argues, stems largely from our incapacity to comprehend their "alien logic." Built on the mathematic... [Read More]

» Chris Anderson on Probabilistic Thinking from The Stalwart
Forgive the spate of link entries, The Stalwart is on partial vacation this week in Austin, TX. We like to rib the whole long-tail crowd for letting one idea so dominate their worldview, that almost everything can be seen through [Read More]

» Probability, Superstition and Ideology from alex wright
Nick Carr makes the humanist case against Chris Anderson's defense of probabilistic systems like Google and Wikipedia, taking issue with Anderson's argument that qualitative criticisms of these systems fail to recognize the virtues of sacrificing "perf... [Read More]

» Emergent Properties of the Long Tail from Emergent Chaos
Chris Anderson warms the cockles of our heart as he discusses the psychological acceptability of "The Probabilistic Age:" When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out fo... [Read More]

» Lots of links from JD on [TBD]
Lots of links: I've kept a lot of tabbed windows open in my browser the past week, but didn't have sufficient original commentary to justify sending each to the aggregator... here's a bunch of recent postings, some of which you may find of interest too... [Read More]

» The Politics of Statistics from Ryan Shaw
Chris Anderson has posted an absurd piece called The Probabilistic Age in which he suggests that the reason people aren't comfortable with Wikipedia and Google is that they are systems that operate according to the laws of probabilistic statisti... [Read More]

» "The Probabilistic Age" - Why Wikipedia Works from Influence
A topic of continuing interest here is why and how wikipedia works (which we think it does), and so this commentary by Chris Anderson, writer for Wired, is insightful: The Probabilistic Age: "Q: Why are people so uncomfortable with Wikipedia? And [Read More]

» Probability the Mammalian Brain from exoskeleton
Check out this essay on the Long Tail blog (which I think is written by one of the Wired editors) which answers the question: Why are people so uncomfortable with Wikipedia? And Google? And, well, that whole blog thing? Our brains aren't... [Read More]

» Probablistic systems from Johnnie Moore's Weblog
There's a thought provoking post by Chris Anderson on probablistic systems - and some good debate in the comments and trackbacks. One of those led me to this post by Wiggy:This is a battle. Wikipedia is under attack by those... [Read More]

» Google e Wikipedia: por que o desconforto? from De Gustibus Non Est Disputandum
The link came from Marginal Revolution (permanent link over in the sidebar), but the interview is here and I recommend it. Claudio... [Read More]

» Micro vs. Macro in a Duel to the Death from Snarkmarket
Get ready: I am about to compare Wikipedia to Wal-Mart. Chris Anderson says the magic of Wikipedia (and other internet systems, e.g. Google) is that they work on hugely macro "probabilistic" scales. Think of it like this: To put it... [Read More]

» 蓋然的(確率的)時代 from The Croton
Unofficial Japanese translation of "The Probabilistic Age" by Chris Anderson. [Read More]

» Probabalistic Information Flow from Toomre Capital Markets LLC
At TCM, we spend a lot of time talking about the convergence of asset markets, the liability markets and the liquidity markets. The liability markets and to a lesser extent, the liquidity markets are focused significantly on probabilistic statistics. [Read More]

» Links from 2005-12-22 from k-mrkt
Jeffrey Zeldman: Style vs Design. Zeldman on web development: "Design is communication", "Most web pages are meant to be used", "That's why web pages must... [Read More]

» Probability, Superstition and Ideology revisited from alex wright
Gartner's Nick Gall sent along a few thoughts on my earlier post Probability, Superstition and Ideology (itself a commentary on earlier posts by Nick Carr and Chris Anderson). With Nick's permission, I've excerpted his comments here: "The image of a... [Read More]

» Amazon's Recommendations are Probabilistic from Kaedrin Weblog
Amazon.com is a fascinating website. It's one of the first eCommerce websites, but it started with a somewhat unique strategy.... [Read More]

» The Probabilistic Age from Musings From Alfheim
But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. (Chris Anderson) Chris is Patient Zero of the Long Tail meme. I finally got around to giving an in-depth read of his lat... [Read More]

» Challenges for Blog Analysts from Netcoms
"Blogs are a long tail" Chris Anderson recently observed, in a post otherwise dedicated to explaining... [Read More]

» Cheating Probabilistic Systems from Kaedrin Weblog
Further discussion of probabilistic systems like Amazon.com recommendations, Google, and Wikipedia, including specific references to "cheating" in those systems. Also noted is how these new systems are not meant to replace the old, but in the words of ... [Read More]

» How Many Worms In A Can? from theQview
The problem with James Surowiecki's book The Wisdom of Crowds is not in its logic but in its application. Instead of understanding and questioning the limits of group wisdom, it is currently in vogue to simply cite the book, drink the Kool-Aid [Read More]

» The Anti-Authoritarian Age from Mike Linksvayer
In a compelling post Chris Anderson claims that people are uncomfortable with distributed systems [b]ecause these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at t... [Read More]

» Probabilistic accuracy v definitive authority from aTypical Joe: A gay New Yorker living in the rural south.
Chris Anderson has a wonderful post on why people are uncomfortable with Wikipedia, Google and blogs. It's because these systems "sacrifice perfection at the microscale for optimization at the macroscale." He says we're living in a probabilistic age: T... [Read More]

Comments

Brock

Wikipedia is not a probabilistic system.

I do not really "understand" Google because the math is beyond me, but I trust it. I understand Wikipedia just fine, which is why I don't trust it.

Information systems are only useful to the user at the point in time at which the system is accessed. At the time of a Google search you are presented with a mathematically determined 'average' value; the sum wisdom of the internet's hyperlinks. It is an average value, and even if 30% of the links on the web are "wrong" you still get the right answer.

Wikipedia does not work like that. When you access Wikipedia you do not get the average value of an article; you get the last author's value only. Instead of getting a probabilistic average you instead are getting a single data-point.

Google is "wrong" only when the entire web is wrong. This happens on occasion, such as when an urban legend becomes more popular than the truth (when it's done purposefully it's called a Google Bomb). Wikipedia is wrong when a single person is wrong. It is also incredibly easier to "bomb" Wikipedia. Anyone with a login can do it with 1 minute's work. With 860,000 articles an error in an obscure article can remain undetected for some time.

(I found an article where someone had inserted "Jake is the best!" or something like that in the middle of a sentence. As an experiment I left it there to see how long it took for someone to find it. It's still there 4 months later, and that's with an obvious error. An error in the data that only an authoritative source would know was wrong is likely to last even longer.)

To use an analogy most survivors of the Dot.Bomb would understand, a Google search is like predicting stock performance by taking the average stock price of every Wall St. analyst (occasionally wrong and sometimes very wrong, but usually close); while a Wikipedia search is like doing the same by trolling chat rooms for tips.

chris anderson

Brock,

In the popular entries with many eyes watching, Wikipedia becomes closer to the statistical average of the views of the participants, weighted by such factors as the authority of each as defined by the others (frequent contributors to any entry tend to win any vote-offs). Studies have shown that for such entries, the mean time to repair vandalism of the sort you describe is measured in minutes. As Wikipedia grows, that rapid self-repairing property will spread to more entries.

But the main point I was making about Wikipedia was not that any single entry is probabilistic, but that the *entire encyclopedia* is probabilistic. Your odds of getting a substantive, up-to-date and accurate entry for any given subject are excellent on Wikipedia, even if every individual entry isn't excellent.

To put it another way, the quality range in Britannica goes from, say, 5 to 9, with an average of 7. Wikipedia goes from 0 to 10, with an average of, say, 5. But given that Wikipedia has ten times as many entries as Britannica, your chances of finding a reasonable entry on the topic you're looking for are actually higher on Wikipedia.

That doesn't mean that any given entry will be better, only that the overall value of Wikipedia is higher than Britannica when you consider it from this statistical perspective.
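
A back-of-the-envelope sketch makes that arithmetic concrete. The quality ranges and the ten-to-one entry ratio come from the comment above; the size of the topic pool and the "useful means 6 or better" cutoff are purely illustrative assumptions:

```python
import random

# Illustrative sketch only: 1,000,000 topics a reader might look up;
# "Britannica" covers 80,000 of them with quality drawn uniformly from 5-9;
# "Wikipedia" covers 800,000 with quality drawn uniformly from 0-10.
# "Useful" is assumed to mean quality >= 6.

random.seed(0)
TOPICS = 1_000_000
USEFUL = 6

def chance_of_useful_entry(coverage, quality_sampler, trials=100_000):
    hits = 0
    for _ in range(trials):
        topic = random.randrange(TOPICS)
        if topic < coverage:                      # the topic has an entry at all
            if quality_sampler() >= USEFUL:       # ...and the entry is good enough
                hits += 1
    return hits / trials

britannica = chance_of_useful_entry(80_000, lambda: random.uniform(5, 9))
wikipedia = chance_of_useful_entry(800_000, lambda: random.uniform(0, 10))
print(f"Britannica: {britannica:.3f}  Wikipedia: {wikipedia:.3f}")
# With these made-up numbers, Britannica's higher per-entry quality is
# outweighed by Wikipedia's far broader coverage.
```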

kitchen hand

Either way it takes the academics in ivory towers out of the equation, which is both a very good and a very bad thing.

Brock

Chris,

I agree that Wikipedia as a whole has more total value than Britannica as a whole. It probably does produce more social utility than Britannica, just as the Web + Google produces more utility than a good library + a card catalog.

But no one needs the whole of Wikipedia. They need the article they need, and they need it to be (mostly) right.

My point was that individual Google searches are probabilistic, but that individual Wikipedia articles (the ones in the Long Tail at any rate) are not. Since individual searches and articles are what matter to individual people, I think that's the more important thing to focus on.

I think Wikipedia would be more probabilistic to the user if disputed issues, history of changes, and "voting" were displayed in the actual article without having to comb through the changes. Put the statistics of opinion right out in front where the intelligent reader can judge them for himself.

I just want to make clear that I think Wikipedia is great in a lot of ways, but it is engineered poorly. Wikipedia is a lot like Communism - a nice idea, but inappropriate for humans. Too many of us have motivations far from the pursuit of objective truth. It would be far better if each author could write his own, complete version (perhaps borrowing sections using a Creative Commons license). If you don't like it, write your own, but don't mess with his. Then all readers have to do is find both articles, read them, and judge for themselves.

Of course Step 1, "finding", brings us back to Google ... :-)

JCJ

Brock;
You don't care about the entire Google database either, just one or two entries. PageRank isn't an average either, it's basically whoever has the most links today (with weighting).

I think you make an erroneous argument, that the latest Wikipedia article is the result of only the last person's edit. This would be true if every edit involved a complete rewrite of the article. This is astronomically rare. Almost all changes are incremental, and as a matter of practical interest they're often reviewed by the most recent contributors. As such, the wiki article you view is more of an average, or better, an aggregation, of all previous edits. The most recent edit might be less trusted than the previous ten, but it usually represents a small portion of the article.

Add to that, if you have even the slightest doubt about something, you can peruse the article history to find when such a crazy thing was added.

Then there's the human habit of yielding to people who seem to know what they're talking about. This means that uninformed people tend to avoid putting in the work to contest something they don't understand, and informed and motivated people tend to do most of the work. Wikipedia's NPOV policy, maintained by a crowd without the natural stimuli toward mob mentality, means that demagogues naturally lose. This is rather unlike the practice of mid-sized groups that produce traditional encyclopedias.

And finally, I have to say that anyone who regards any *single* source as authoritative gets what they deserve. Wikipedia is my first stop, and it's sometimes my last stop (for revisions) when I find out most authoritative sources say something a little different.

Just try to write a report on something like witchcraft based on the Encyclopedia Britannica I grew up with. It won't even get you started. Wikipedia will though, because contributors try to be comprehensive to all input, not authoritative about what something should be. That's precisely Wikipedia's strength: it's not meant to be authoritative, but it will take authoritative input (even when two authorities viciously disagree). It doesn't take academic authorities out of the equation--they're reduced from all-powerful to merit-weighted influence.

jelons17

Brock makes some good points about Wikipedia. Surowiecki explains in WoC that a good "aggregation function" is critical to extracting the wisdom from the crowd, such as a voting mechanism or calculating the average. Wikipedia doesn't really have one. Chris suggests that "frequent contributors" win vote-offs, but that is rare, and it puts the quality issue back in the hands of a few. (Google's aggregation function is the math that Brock and I don't understand, and is their core asset).

There is another concept relevant to the WoC that Surowiecki does not spend much time on called the Condorcet Jury Theorem, which says that if the members of the crowd each individually have a less than 50% chance of getting the answer right, then the chance the crowd will get it right by majority vote approaches zero as the crowd grows. (See http://www.lessig.org/blog/archives/003027.shtml). That is a real likelihood in Wikipedia, especially if the "frequent contributors" are few and in the < 50% category.
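
For the curious, here is a minimal sketch of the Condorcet Jury Theorem's arithmetic. The voter counts and accuracies below are arbitrary illustrations, not claims about Wikipedia's actual contributor base:

```python
from math import comb

# Probability that a simple majority of n independent voters is correct
# when each voter is individually right with probability p.
def majority_correct(n, p):
    # Sum the binomial probabilities of more than n/2 correct votes (n odd).
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for p in (0.45, 0.55):
    for n in (11, 101, 1001):
        print(f"p={p} n={n}: {majority_correct(n, p):.4f}")
# With p just below one half, the crowd's accuracy collapses toward zero as
# the crowd grows; just above one half, it climbs toward one.
```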

Chris has faith that as "wikipedia grows" it will become better. I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable.

Seamus

jelons17 -

You say "I fear that the growth necessary is similar to that of a Ponzi scheme: every human being on Earth will need to be doing nothing but editing the wiki entries they have knowledge on all the time for it to be reliable."

Not so, I think - there are technical fixes around that problem. An expert on, say, the First World War only really needs a stored RSS search that informs them if the pages on that subject change, or even better one that informs them if the pages on that subject change in particular ways. Any given person might need to keep a feed of the page about them (if there is one); their company (if they have one); and whatever other tiny number of things they happen to be sufficiently expert in that they would be expected to constantly edit those pages on Wikipedia in your Ponzi model.

Now, admittedly, Wikipedia doesn't have RSS searches that tell you when pages have changed. Yet. But lots of newspapers - the Baltimore Sun-Times is, I think, the longest-running example and NYT the most recent - have saved RSS search facilities; it's not especially hard to do.

Paul Robinson

I really wish I'd read this post before I wrote my post concerning what I think is going on:

http://www.well.com/~wiggy/2005/12/battle-in-new-war.html

The interesting thing is that people don't see this for what it is: an outright philosophical war. Some people think a small group who are qualified in some capacity can produce 'better' information than a much larger group that is on average less qualified, but INCLUDES the small 'highly qualified' group anyway.

It's OK to think about these things in terms of probabilistic systems, but it's much simpler than that: you either believe in democracy and freedom of speech or you don't. Twenty years from now, information will be a more valued resource than oil. In some industries, it already is. We have a choice: do we want to put the systems in place now to make sure we all own it, or do we actively fight against a system that seems counter-intuitive, thereby putting the ball back into the court of a very small group of people?

The Internet needs Wikipedia and sites like it. It needs information to be free and editable by anybody. To fail to work out the very small glitches and protect assets from the attacks predicted by game theory would be to plan to lose to the Murdochs, the Turners, the Rumsfelds of this World. Simple as that.

Piers Fawkes

re.: "the *entire encylopedia* is probabilistic."

Doesn't this ignore the way users access the content on Wikipedia? Sure, some scholars may browse subject areas and therefore the greater content is probabilistic - but most folk engage in hit and run activity. Quick in, quick out. This is the age of attention deficit - we want single entries now not subject areas or entire encycolpedias. Wikipedia has broken our trust in the single entries of content (and not doing much to rebuild it, to be honest) - and this could be Wikipedia's downfall.

robbie

"Twenty years from now, information will be a more valued resource than oil"

This ignores supply and demand. In twenty years we'll be saturated in information and thirsting for oil.

Tony Mendoza

Actually, I couldn't disagree more with your observation. All biological systems are emergent, including ourselves. If you notice, the folks having the problem with these types of systems are scientists, engineers or business folk. These people have been trained since their formative years to think in a quite unnatural way when dealing with the world. The world does not follow a simple set of linear equations that can be pulled from your typical college textbook; it is emergent. Yes, they may be using a mathematical technique to exploit emergent properties in an information space, but that doesn't make it any less emergent. For most of us, it actually feels right already. It's most of you who it feels wrong for, with "you" being the scientists, academics, professionals, etc.

Andrew Thomas

I always think the best example of "Wisdom of Crowds" is the "Ask the audience" part of Who Wants to be a Millionaire. The crowd is almost never wrong. The people who don't know the answer make a random guess, but all the random guesses cancel each other out and you're left with the people who really DO know the answer.
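
A quick simulation shows why the guesses wash out. The audience size, the share of people who know the answer, and the four choices below are assumed numbers for illustration only:

```python
import random
from collections import Counter

# Toy model: 100 audience members, 4 answer choices, only 30% actually
# know the answer, and everyone else guesses uniformly at random.
random.seed(1)

def ask_the_audience(size=100, knowers=0.30, correct="A"):
    options = ["A", "B", "C", "D"]
    votes = Counter()
    for _ in range(size):
        if random.random() < knowers:
            votes[correct] += 1                  # people who know vote correctly
        else:
            votes[random.choice(options)] += 1   # everyone else guesses
    return votes.most_common(1)[0][0]            # the poll's winning answer

wins = sum(ask_the_audience() == "A" for _ in range(1000))
print(f"Audience picked the right answer in {wins}/1000 polls")
# Even with only 30% knowers, the random guesses spread across all four
# options, so the correct answer almost always tops the poll.
```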

Seth Finkelstein

Umm, all you seem to be saying is that these systems are built to be mostly right, most of the time, and we strange weird primitives don't "GET IT" when we are bothered that they're notably wrong many times.

That's a comprehensible view - but not necessarily an easily defensible view!

Mike Purvis

Piers: I can't speak for anyone, but I find that due to the interlinked structure of Wikipedia, it's rare that I view only a single entry. Typically, I surf broadly related entries for a half-hour or more, absorbing information on a variety of topics.

Obviously, this is pure, not applied, research. But is Wikipedia really the tool for applied research anyways?

Nate

The "Rumsfelds of the world"? Is he a media mogul now, too?

Daniel

Re: everyone editing Wikipedia all the time. It should not be necessary to edit a topic more than *once* or to monitor it constantly. The "technical fixes" should take care of all that. Once information is entered it should be preserved, not hidden away under "changes" where a casual reader may not see it. If each entry could be made probabilistic as well as the whole site, it would increase the value of each entry. The value of the entry is what most visitors will be interested in, especially in our consumer-based society.

Trent

For those of you who don't trust Wikipedia, I pose the question, "Do you trust Britannica?" If so, check out this link. Seems there are errors either way. The search for truth is an endless quest.

joe

Wikipedia is probabilistically successful even if you hit and run. The question is "Given a query, what is the chance that you will get an answer, and that it will be correct?" With Britannica, the latter half of that question is a bit higher, but the first half is much lower. Getting no answer at all is a failure, too. Overall, your odds of getting useful information are higher on Wikipedia.

I do think it could do with better aggregation. There are plenty of experiments out there; Wikipedia's just one of them... we'll get there.

j-lon

I trust that someone is accountable for mistakes in Britannica. That may be misguided too. But it's also why we have defamation law. Wikipedia is kind of defamation-proof. Sure, if someone complains, the offending publication will be removed. But in at least some cases, the damage is already done, and the distributed nature of the WP makes it difficult or impossible to hold anyone accountable (particularly given the ease with which people can post anonymously). Conversely, if Britannica did the same thing, they'd face a defamation suit. This, I would imagine, is a pretty profound incentive to err on the side of not publishing untruths or information damaging to someone's reputation.

I like WP quite a bit myself. It's a very nice way to interface with info to the extent the info is accurate. I love being able to read one article and then drill down on a term by clicking on a link. That's a really great way to explore.

But I think the WP should do a better job making clear to the users the inherent limitations of the WP at the micro level (i.e., there's a pretty good chance that any given article could be wrong in a pretty major way).

I'm one of those over educated academic/professional people someone was complaining about above. But from time to time, I teach college students. The limitations of the WP are not at all obvious to them. They just want the easiest path to getting an answer (or at least the feeling of getting the answer), regardless of whether the answer is accurate. Clearly, it's the job of teachers to help educate students about the limitations of things like the WP, but it sure would help if the WP folks were a bit more forthright with the user about the WP limitations.

WP does have a disclaimer. But you must click an 8 point type link below the fold at the bottom of the page to get to it. How many people ever click on links like that? Not many.

Instead, I think each article should begin with some language like this followed by a link to the longer disclaimer:

"WIKIPEDIA IS A PLACE TO START RESEARCH, NOT A PLACE TO FINISH IT. THE WIKIPEDIA COMMUNITY DOES ITS BEST TO POLICE THE ACCURACY OF THE INFORMATION HERE. BUT BECAUSE WIKIPEDIA ALLOWS ANONYMOUS CONTRIBUTORS, NO INDIVIDUAL OR INSTITUTION IS LEGALLY ACCOUNTABLE FOR THE ACCURACY OF THIS INFORMATION. THEREFORE, THIS INFORMATION IS PRESENTED "AS IS," WITH NO WARRANTY TO ITS ACCURACY, AND THE BEST PRACTICE IS TO CHECK WIKIPEDIA ENTRIES AGAINST OTHER MORE EASILY VERIFIABLE SOURCES."

Mike Stone

With respect, I don't buy the idea that the human mind can't handle the notion of a micro/macro tradeoff. In point of fact, the human brain is BUILT to discard information at the microscale and produce decent average results at the macroscale.

A simple case in point is the concept of temperature. There's no such thing at the microscale, in this case meaning atomic scale. Temperature is an aggregate property of the average motion of huge numbers of atoms, not the instantaneous, or even long-term-average, motion of a single atom.

Even at the macroscale, human perception of temperature involves more loss of low-level precision. Most people can't tell you how the temperature at their elbows compares to the temperature at their knees, let alone how much signal they're getting from a single, specific nerve. Nor can most people give you a precise statement of the absolute temperature around them at the moment.

It's hard to find a part of the human information-processing system that doesn't characterize information and throw away the detail before passing the message up to the next level of processing, in fact.

IMO, the real trouble is that people want to believe that every problem has a simple, easily-stated, one-size-fits-all solution that will always provide good answers. No such solution exists, or ever has, but in time, people get used to the inaccuracies of whatever system is in use at the time, and learn to ignore them.

We discount the fact that many specific news stories about, say, atrocities in the Superdome following Katrina, were completely inaccurate, because we believe that on average the mechanism of news production gives reasonably good results.

People have trouble with Google and such because they haven't had time to develop a blind spot that lets them ignore the erroneous results, and go back to their comfortable assumption that the system is Platonically perfect.

jelons17

If Wikipedia is a "place to start", that "shouldn't be cited" and beneficial for the ability to "surf a bunch of interrelated topics through links to get a quick overview" built by anonymous contributors who can't be check for authority, how is it any different from the Web with Google?

Also, does the success/quality of Wikipedia require that there only be one Wikipedia? If there are more than one Wikipedia, doesn't that make it harder for each individual article to have the many eyes necessary to improve quality? If so, who gets to decide which Wikipedia is the one?

Glen Raphael

"Given a query, what is the chance that you will get an answer, and that it will be correct?"

Wiki does much better at this than one would initially assume because queries aren't randomly distributed through wiki-space -- people share common interests. Queries cluster. The more likely it is that you are interested in a particular topic, the more likely it is that other people were interested too. Interested enough to create, modify, and watchlist that topic.

Thus, Wikipedia could easily be >99% accurate (measured as percentage of accurate answers returned) even if half the articles in the database were complete nonsense, so long as the /right/ articles are in the accurate half. The important question is whether the articles being given the most attention are the ones people care most about the answer to. Which is where the probability comes in.
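
Here is a rough sketch of that effect, assuming a Zipf-like query distribution and assuming (generously for the skeptics) that only the most-visited half of the articles are accurate; all numbers are illustrative, not measurements:

```python
import random

# 100,000 articles ranked by popularity; queries follow a 1/rank (Zipf-ish)
# distribution over that ranking; only the top half by attention is assumed
# accurate.
random.seed(2)
N = 100_000
weights = [1 / rank for rank in range(1, N + 1)]   # query popularity by rank
accurate = set(range(N // 2))                      # well-tended articles

queries = random.choices(range(N), weights=weights, k=100_000)
hit_rate = sum(q in accurate for q in queries) / len(queries)
print(f"Accurate articles: 50%   Accurate answers: {hit_rate:.1%}")
# Because queries cluster on the popular (well-tended) articles, the share
# of accurate *answers* comes out far higher than the share of accurate
# *articles*.
```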

Brock

Ok, two last points.

What I've been trying to say is that WP does not provide enough of a filter. Information gets in too easily. "Correct" information almost always has a higher signal strength than incorrect information, so raising the bar should not damage WP.

Soundbite: With each additional user WP gets less reliable and Google gets more so. (And the evidence of WP's co-founder Wales editing his own bio should make my point quite clearly)

And on Daniel's point, he's right. Correct information shouldn't have to be constantly guarded. Vigilance is a high-cost activity and I have better things to do with my time than constantly watch out for people editing the article about me, or mentioning me in other articles.

And as a last, third point, Paul Robinson (above) is full of crap. This is not a philosophical war. This is a straight-up social engineering question of how information is processed within a society, how information is filtered, and how decisions are made. Some systems are better than others at different kinds of tasks. The only "War" is the war to improve Wikipedia.

Eric Johnson

Re: "It's most of you who it feels wrong for, with 'you' being the scientists, academics, professionals, etc."

Posted by: Tony Mendoza | December 19, 2005 at 07:15 AM
-----
There is a reason scientists would have a concern about this. There are "laws" of nature that are immutable as far as we are concerned. We can analyze a process, and if done the same way every time we will get the same results. This ONLY applies to the real, observable science fields (mathematics, physics, etc) - not to fields like archaeology, anthropology, etc where people see what they want to see. There is truth, and there are embellishments of it, retractions from it, etc. It only takes a person with an "agenda" to put their slant on the information to make it "tainted." Same applies for standard encyclopedias.

Adam Moskowitz

What if filters worked together on an aggregation platform focused on the specific content the filters cared about and completely eliminated the clutter found around it? What if filters could start aggregating the specific content they like, like content from other sites? What if there was a web search engine that did not contain web pages but instead only contained the specific stuff a user wanted from any given page? What if the scalability of filters was infinite and their actions over time created a social engine of purely filtered content? We are attempting all of this and more at clipmarks.com. Click my name to see what I am filtering...

Pablo

I don't buy the "not wired" argument, that seems more a matter of how we're trained to think. Many people are quite good at figuring out things intuitively, and I'd be inclined to guess that what we call intuition is an innate capacity for sorting out imprecise probabilities without consciously working out special algorithms.

And evolution? That's not a question of probabilities confusing people, that's a question of refusing to accept notions that counter what you've been raised to think.

Stephen Downes

Article prompted by this post: An Introduction to Connective Knowledge

-- Stephen

Jake Kaldenbaugh

I think Chris' point with Wikipedia was to illustrate how people often think in binary terms - things are 'bad' or 'good'. Further, they make these determinations by examining the exceptions to the rule rather than the rule itself. For instance, the media recently publicized a few very prominent examples of incorrect entries on Wikipedia, which caused a lot of people to doubt the 'referencability' of the entire body of work (see Nicholas Carr's Rough Type for an illustration). However, this type of thinking ignores the fact that Wikipedia is as accurate as other historically accepted reference materials *on average*.
People who dismiss Wikipedia because of a few instances of bad entries are missing out on the rest of the material, which may be as good as any other source. This kind of thinking can be limiting, whether for personal research or building enormous businesses (Google).
Think about how much Google was denigrated when it IPO'd. People just don't get it. Why? Because they thought Google was a "dot-com". They couldn't appreciate the fact that conditions may have changed that would allow the company to leverage distributed information using an innovative business model as many smart people had predicted.
I think Wikipedia has an opportunity to adjust the system to improve its accuracy. And if they do, then it goes from being widely disputed to the global standard. Wikipedia may undergo a similar transformation to Google: a little tweak could make all of the difference.
In other words, I think it's probable that Wikipedia's shortcomings can be addressed.
;)

Marko

I keep forgetting about Wikipedia all the time, I just hit the homepage button and I am on Google, but I will change that.

cwolf

An insightful piece and well placed in context, although I reject your premise based solely on the fact that known data about mammalian brains shows operation based on noise reduction and recursion. The systems you mention, such as Google and Wikipedia, currently work as a magnifying glass on the output side and as a massively backpropagating neural net on the input side; in each case there is random input and well-filtered output. Obviously the number of calculations and bias/dampening on the whole is far less than that of a rodent's brain at this stage; the point is the interface - we are all looking through ACME brand scale systems.

Given Bayes' Theorem applied to a multi-user state-enabled system like Wikipedia, you will find that the difference in the ratio of input to output quality is decreasing but the interface is still the same. Wikipedia view models have a much closer correlation to input than Google (hence the scheduled crawl and whatnot). Given that Google bases results on known indexed web pages, video, and other items, Wikipedia is very different and bases all results on directly written items. If the system continues at its current pace there will be lengthy entries for the word "the" and the word "this". Disruptions in the quality of both systems benefit the back propagation and enhance the returned results in the long run en masse. Now we CRAWL, soon we will WALK, eventually they will run.

The human brain, like that of most other animals, does large amounts of filtering and pre-processing before input is deemed conscious. The brain absorbs more audio input than is consciously recognized, as well as tactile and so on. Signal degradation among the pathways in the limbic system, hypothalamus and such dampens this, causing less processing to be done in the cerebrum. Google and Wikipedia operate in similar fashion but the interface is much smaller and the output is very generalized. A close example is the stop words you see in results returned from Google, which exclude words such as "the", "him", "it". Kurzweil identifies that this gap is shortening in his statement about the "law of accelerating returns".

Invoking Darwinism, I will agree that it is better to be on the side of the masses in this respect. How many editors revert or change Britannica entries on a daily basis? How many books has the librarian read versus texts indexed by Google? But in defense of Britannica, how many similar methods used to build the volumes were around before it?

I will agree that my statement is pure bullshit. We are just building stuff off the only thing we know which is "us". At the current state we haven't hit the level of "cave art" yet.

Larry Irons

Speaking of Wikipedia and Google,

"They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale."

But what if it matters whether your use of the information is "right" or "wrong"? When your use of information from these sources is judged by others who "know" the information, it does matter whether one micro-fact is correct or not!

Peter da Silva

"When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy."

That's the same old "Crazy Yenta Gossip Line" argument that Harlan Ellison was on about ten years ago.

It's still wrong. Journalists aren't any smarter than the rest of us, and every time I've run across a story about something I'm an expert in, they've gotten it so wrong that these days I treat them as no more reliable than blogs... if they can't get it right when I can tell they're wrong, how can I trust them to get it right when I don't know better?

See my response to Ellison by following the link...

Pointer Institute for Media Studies

Wikipedia is so inaccurate that its writers are referred to as wikiling writers. Another problem that the article above overlooks is that wikiling administrators actually ban writers who post correct info that is disliked by the administrators. That skews all "accuracy" discussions. http://rexcurry.net/wikipedialies.html

Recently, someone who posts to Wikipedia wised up and improved the "Roman salute" article somewhat so that it recognizes and repeats some of Dr. Rex Curry's discoveries. http://rexcurry.net/wikipedia-lies.html

Wikiling writers cover up new discoveries by Professor Curry that the salute of the horrid National Socialist German Workers' Party originated from the USA's Pledge of Allegiance. http://rexcurry.net/book1a1contents-pledge.html

Wikiling writers cover up Dr. Curry's discovery that although the swastika was an ancient symbol, it was also used sometimes by German National Socialists to represent "S" letters for their "socialism." Hitler altered his own signature to use the same stylized "S" letter for "socialist" and similar alphabetic symbolism still shows on Volkswagens. http://rexcurry.net/book1a1contents-swastika.html

azeem

Chris,

The claim that humans don't get statistics is something that we should be able to verify--perhaps by looking at 'experimental economics' and Kahneman's work. Perhaps this is something you can do to support that claim.

The Wikipedia vs. Britannica debate is a funny one. Surely the point is that both Wikipedia and Britannica are jumping-off points, unless you are checking the really, really trivial.

In most cases, Wikipedia is a 'good enough' jumping-off point, but for anyone doing anything more detailed, you may want to be furnished with longer bibliographies (of both journals and books), which EB may (or may not) help you with.

Even for canonical resources, there is revisionism and post-revisionism.

Earl

The problem with Wikipedia is exactly that at a microscopic scale, it's often wrong. As someone pointed out above, there is no built in aggregator to display group wisdom; you merely see the last edit at the given point in time you read it. Further, it's often (perhaps always?) worse to 'know' something that is wrong than not know anything (the so-called failing of Britannica.)

For example, I'm a mathematician. I was reading a topology book and wanted examples of a particular idea. I went and looked on Wikipedia and found such examples; unfortunately, the five examples listed were entirely wrong. I had the ability to recognize this, but in general most of the audience for the article wouldn't (or else wouldn't need to look at the article in the first place). In mathematics, knowing something wrong is definitely worse than not knowing anything; hence, the problem that Wikipedia isn't trustworthy means it's close to useless for many subject fields in which correctness is important.

Worse yet, I annotated the five examples with a note explaining how they were wrong, said that I didn't have time to fix them, then gave one correct example and a dividing line (a horizontal rule) to offset my one correct example. Some retard then came along and removed the horizontal line, so there was a note saying "the following examples are all wrong for blah blah blah reasons but here is a correct example" with no divider between the correct example and the incorrect ones. The article remained this way for another month or so, IIRC. Hence, for subjects such as mathematics where there is a distinct right and wrong, MathWorld is far more useful than Wikipedia will ever be until they find authorities and start locking articles.

earl

chris anderson

Earl,

You raise an interesting point when you say "As someone pointed out above, there is no built in aggregator to display group wisdom; you merely see the last edit at the given point in time you read it."

But I think there is, in fact, a cumulative effect that comes from "community of ownership" that is formed by the contributors. If someone takes the trouble to edit an entry, they're more likely to put it on their watchlist and take an interest in its further development. Thus over time more contributors equals more people invested in improving and protecting the quality of the entry.

That's why they tend to get better, not worse, as time goes on.

Earl

Chris,

You're wrong -- you still only see a snapshot. Now, community interest may (though often doesn't) mean that errors in wikipedia are corrected. Nonetheless, the key difference is that while google *always* displays aggregated community knowledge, wikipedia *always* displays some instance of one person's edit. Thus google always approximates correctness while wikipedia is often quite wrong. And that, of course, is the reason wikipedia isn't at all useful for finding correct facts -- you may well visit during one of the periods of incorrectness and these periods last a highly variable (and potentially quite long) amount of time.

earl

chris anderson

Earl,

Well, of course it's just a snapshot. But my point was that as the community of ownership grows, the change delta between snapshots and the average length of time to correct errors will shrink. No guarantees for any particular moment, but over time the entry becomes statistically more likely to be accurate.
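
One way to picture that claim is a toy model in which vandalism arrives at a fixed rate and the time to notice it shrinks as more contributors watchlist the entry. All the numbers below are assumptions for illustration, not measurements of Wikipedia:

```python
# Assumed model: each watcher checks the entry a couple of times a day, so
# the mean time to revert bad edits falls as the watchlist grows.

def fraction_of_bad_snapshots(vandalism_per_day, watchers, checks_per_watcher_per_day=2):
    # Mean time (in days) until some watcher looks at the entry and reverts.
    mean_time_to_repair = 1 / (watchers * checks_per_watcher_per_day)
    # Fraction of time the live snapshot sits in a vandalized state
    # (a reasonable approximation while vandalism is rare relative to repairs).
    return vandalism_per_day * mean_time_to_repair

for watchers in (1, 10, 100):
    print(f"{watchers} watchers: {fraction_of_bad_snapshots(0.5, watchers):.2%} of visits hit a bad snapshot")
# With these assumed rates the output runs roughly 25%, 2.5%, 0.25%: the
# chance that any given visit lands on a bad snapshot falls roughly in
# proportion to the number of people invested in the entry.
```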

A.R.Yngve

If you keep thinking in terms of probability instead of simple either/or logic, another issue comes up which REALLY unnerves some people:

Countless other universes.

I know, I know: the existence of other universes (with their own natural laws, energies, matter, life and other things) cannot be proved with the old either/or logic.

But in terms of probability, the idea that ONLY this universe exists, and still has intelligent life in it, is absurdly improbable.

The "Shmintelligent Design" crowd loves to point this out as "proof" of God's objective existence... but they hate the idea of many other universes because it annihilates the ID argument: if our cosmos is one of (infinitely) many, our existence is not just likely -- it's inevitable.

Mention the probability argument for the existence of other universes at a party, and watch the other guests explode: "YOU CAN'T PROVE THAT!!"

You can of course reply: "I can't logically prove that other people really exist as thinking beings and not as automatons, but probability logic argues that they DO exist. In your case, though, I'll make an exception..."

Then run -- fast.
;)

James Heckman

I just wanted to take a moment to thank you for writing your article, The Probabilistic Age. I read it about a month ago and it really expanded my thinking about how to operate a content-based Web site in the age of the blog.

I run the American Marketing Association's Web site at http://www.marketingpower.com. I've been aware of blogs for a while, but I didn't really get the concept.

Based on some of the ideas I found in your article, I'm developing a new content strategy for the site that ties into all the blog and audience generated content activity.

I've even started my own blog -- all of one day ago -- to work through the ideas, Little Wolf (http://littlewolfpack.blogspot.com).

You have a fascinating site, thanks again!

James Heckman

The comments to this entry are closed.
