
August 06, 2008

Thirteen words that lose their meaning when the denominator approaches infinity

Comments

Varun Mahajan

Most of these words still make sense. Say, even after new additions on YouTube every day, if the ratio of useless to useful videos remains about 90:10 (this ratio changes continuously), then you can always use 'Most'.

Bertil

Hail to that!
— from the “I'm a statistician, but I don't need that to tell you that your prejudiced, off-the fly, narrowly personal analysis of ‘all the blogs’ is cr*p”-support group.

For most areas I can think of outside of blogs and forums, there is a representative, well-accepted measurement of audience, and speaking of ‘head’ and ‘tail’ makes sense, provided you say where you draw the line (as in ‘most nationally syndicated newspapers’), because, let's face it, there is increasingly hardly any representative scale.

“Blogosphere”, given the vast set of things that have been thrown under that term, certainly challenges anyone's imagination: how many times have I read “blogs (the other source of news)” or “blogs (the sub-academic article thingy)” or “blogs (those places where you can find LolCats)”?

I've been arguing in the desert for people to mention “[their] blogroll”; Scoble has been doing it, I think, but I always assumed that he did it to boast (because he thinks his roll is bigger than the blogosphere; it might be).

However, you can give ratios in an unbounded world: that's what samples are for. Good sampling is not complicated sampling, but explicit sampling: “random URI” is very inefficient, but OK; “Feedburner entry base” is biased, but all samples are.

So let's try:
Most blog entries that I read (including this one) are very thoughtful. That I read.
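Bertil's point about explicit samples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic, the 90:10 split is borrowed from Varun's comment as a made-up starting value, and `estimate_share` is a hypothetical helper, not anything the commenters used. It draws a simple random sample and reports the observed share with a rough 95% margin (normal approximation).

```python
import math
import random

def estimate_share(population, sample_size, predicate, seed=0):
    """Estimate the share of items satisfying `predicate` from a simple random sample."""
    rng = random.Random(seed)                      # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)   # sampling without replacement
    hits = sum(1 for item in sample if predicate(item))
    p = hits / sample_size
    # Rough 95% margin of error using the normal approximation
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Hypothetical population: 100,000 videos, 90% labelled "useless" (assumed, not measured)
population = ["useless"] * 90_000 + ["useful"] * 10_000
share, margin = estimate_share(population, 1_000, lambda v: v == "useless")
print(f"estimated share: {share:.3f} +/- {margin:.3f}")
```

The estimate is only as good as the sampling frame: if the list you sample from is itself biased (a "Feedburner entry base"), the margin of error says nothing about that bias, which is why the sample has to be stated explicitly.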

César

Add me to the skeptics about this post...

At the very least, it's not infinity that takes the meaning out of those words: most real numbers are irrational, the average real number is zero, and all rationals are real. And you can also make statements about the average human being...

I'll agree there are 'hard to define sets', so your 'average blogger' and my 'average blogger' might not be the same. And, even, my today's 'average blogger' and my tomorrow's 'average blogger' might not be the same, and that might cause quite a lot of definition trouble.

But what that has to do with the denominator approaching infinity (and 10^8 or 10^10 is not such a good approximation for infinity) I cannot see.

SWIMMER21

On this question of language, you should have a look at the general semantics theory by Korzybski.

bokus

I would agree that when people apply these qualifiers to large groups like "bloggers" (or "Americans" for that matter), they often don't really understand the scope of the group they are characterizing. But these groups certainly can be quantified. For example, see this research:
http://www.slideshare.net/mickstravellin/universal-mccann-international-social-media-research-wave-3
which estimates that there are 184 million bloggers globally - 42 million in China alone (compared to 26 million in the US). That's a lot, but does not "approach infinity". Most bloggers write a personal blog - 63.5%, in fact.

Steve

The opening sentence is perhaps a great illustration of your point?

Aaron

Words like most and average do make sense over an infinite or extremely large number of items even if they are increasing rapidly.

Examples:


  • The universe is mostly empty space.

  • Most numbers will never be used by the average person.

  • The average of all numbers is 0.

  • The average size of all living creatures that will ever exist is smaller than a penny.

  • For most of time, humans didn't and won't exist.

  • Most people are innately good.

Nearly everything in large groups can be defined on a Bell curve which means that there is an average and a most for everything.

The only thing to watch out for is applying the words haphazardly to make negative generalizations. "Typical men are..." Once generalizations are used to make a negative point, they become dangerous.

Chris Anderson

@Varun:

Yes, you can say "90% of the videos I've seen on YouTube were useless to me" but then you've closed the set. You can't say "90% of the videos on YouTube are useless" because that's two open sets (videos on YouTube and range of measures of usefulness).

Chris Anderson

@César:

When I say "approach infinity" I mean "unbounded set growing at an unknown rate". But the first sounds cooler than the second ;-) Yes, you can count me among those who sometimes use mathematical language sloppily to make a point. But at least I admit it!

Chris Anderson

@Bokus:

That study is a perfect example of what I mean. What's a blogger? How did they count (did they only count active blogs on major platforms? Whoops!)? What does "personal blog" mean? Is this one? Etc...

Chris Anderson

@Aaron:

You make a good point, but I'd argue that although those are all, technically, open sets, they are actually pretty well understood open sets. So the human population and the set of numbers are something we understand pretty well, well enough to make sweeping observations like that. But the open sets I'm talking about are not.

I would say that your last point verges into the danger zone I'm talking about. Although I agree with it and use that phrase a lot, "good" is an ill-defined and unbounded set. I'm sure there is some definition of "good" that would falsify that sentence.

Alberto Cottica

Ok, there seems to be an agreement: Chris's basic point is meaningful, without this having a lot to do with openness or closedness of sets, or infiniteness of denominators, or even computability. In fact, Taleb in "The Black Swan" makes exactly the same point with reference to perfectly closed sets: what makes the thirteen words useless is, in his opinion, the shape of the distribution. When your frequency distribution is a power law, averages, standard deviations, percentiles ("90% of blog posts...") are still computable, but they do not mean much.

Whatever the reasons, I find Chris's remarks really really useful and will try to stick to them in the future.
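Alberto's remark about power laws can be illustrated with a small simulation. This is a sketch under assumptions, not data from any real blog measurement: the "audience" values are drawn from a Pareto distribution with shape 1.5 (an arbitrary choice), and `summarize` is a hypothetical helper. It shows that the mean is still computable, but sits far from what a typical item looks like.

```python
import random
import statistics

def summarize(values):
    """Return mean, median, share of items below the mean, and the top 1% share of the total."""
    ordered = sorted(values)
    mean = statistics.fmean(ordered)
    median = statistics.median(ordered)
    below_mean = sum(1 for v in ordered if v < mean) / len(ordered)
    top_count = max(1, len(ordered) // 100)        # size of the top 1%
    top_share = sum(ordered[-top_count:]) / sum(ordered)
    return mean, median, below_mean, top_share

rng = random.Random(42)                            # fixed seed so the sketch is repeatable
# Assumed power-law "audience sizes"; the shape parameter 1.5 is illustrative only
audience = [rng.paretovariate(1.5) for _ in range(100_000)]
mean, median, below_mean, top_share = summarize(audience)
print(f"mean={mean:.2f} median={median:.2f}")
print(f"share of items below the mean: {below_mean:.1%}")
print(f"share of the total held by the top 1%: {top_share:.1%}")
```

In a run like this the mean lands well above the median and a large majority of items fall below it, so "the average blog" describes almost none of the blogs in the set.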

GK

Thank you Alberto. The (most) illuminating part of this entry (for me) is not whether standard statistical information can be derived from open sets, but is whether the implications are to remain the same. To a certain degree we need to generalize, which is why it makes it all the more important to be specific about the limits set in place.

The real question for me is how do we get useful information...but that's another post.

To Chris- how do we get to an open set "we understand pretty well" without making a series of wrong generalizations and moving on?

This topic may be getting a bit out of context. When I read, “Blogs are personal. Bloggers are passionate. Journalism is institutional. Journalists are dispassionate,” I realize it may not only be the old world making the generalizations.

j h h l

What Chris seems to have picked up is that nobody knows what they are talking about (as denominators). So how can the numerator make any sense? And averages make sense as representatives only in distributions where there's a symmetrical lump in the middle. So there's three flaws right there.

As you remember from Logic, a false premise implies any conclusion, so these hasty generalizations are very useful!

Whenever I hear a generalization, I wonder how it applies to the Inuit.

Ole Eichhorn

An interesting point but I think it is dead wrong; just because you don't know the absolute quantity doesn't mean the ratio isn't meaningful. You can say "most blogs" and make a point without knowing how many blogs you're talking about...

Chris Anderson

@Ole:

Can you give me an example? I really don't know how anyone can say "most blogs..." if they don't have a way to measure the implicit ratio one way or another.

Private Eye

Great post!
I agree with every word, especially with your point about using 'most' - I mean, come on, how can you even use it without sounding arrogant.

Michael A. Banks

"Many" and "most" are among the most over-used words in the English language. Some readers glaze on past these words, assigning neither any meaning. Depending on the context and the reader, some readers take "many" to mean most.

"A large number of" and "the majority" are overused, as well, as they are used to avoid repeating "many" or "most" within paragraphs.

The better approach for the writer or speaker is to seek out definite percentages, numbers, or proportions. Even better is to avoid writing oneself into the position of having to use one of the imprecise terms or a synonymous word or phrase. Writers would do well to search manuscripts for occurrences of "many" and "most," and then go back and write them out.
--Mike

Michael A. Banks

I remember as a child assigning values to "few" (3 or 4), "several" (5 or 6) and "many" (7 or more).
--Mike

Mohit Hira

I'd like to suggest another addition to that list: 'generally'. As in, "generally speaking" which is as obscure as one can get...

Phil Osborne

As a university lecturer who gets to mark many assignments / essays / marketing plans each year, I will be tagging this post for the students to digest... of course you can argue that the words have meaning for the person who has used them BUT the problem is for the reader to interpret them in the same way (in order that the communication is clear)... just because a word has a link to a scientific meaning, that doesn't give it any more legitimacy or usefulness... the key to good description is the decoding

all generalizations (including this one) are dangerous
average simply means 49.9% of the population are less and 49.9% are more (in other words, 99.8% don't reflect the average no matter what the population is). I know I could be more precise with the numbers

examples for the usefulness of generalized words are boundless, while they have a place in 'conversations' they have little use in a coherent argument

Thanks for the provocation!

