Corante

About the Author
Stowe Boyd is a well-known media subversive, and an internationally recognized authority on real-time, collaborative and social technologies. His new blog is Message.

Get Real

August 09, 2005

Technorati 100: Inside And Outside The List

Posted by Stowe Boyd

In a recent post, Jason Calacanis on The Blog 500, I suggested that Jason Calacanis was off base when he said that the Technorati 100 are selected -- unlike the rest of the index -- "based on the number of links for all time." Jason also asserts that the T 100 don't change over time.

I asked the nice folks over at Technorati to demystify all this:

[from Adam Hertz email]

Stowe,

We use the same authority calculation for Top 100 as we do elsewhere in our service: it is based on the number of unique sources that are currently linking to the blog.

Let me know if you have more questions on this.

Best,
-A-

So it seems that the Technorati 100 is just the first one hundred blogs of the total index, all of which are recalculated on the same basis. Those 100 really do have large numbers of inbound links from a large number of sources, and those links are "current," which in Technorati-speak means that they appear on the front pages of the source blogs.
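To make that reading concrete, here is a minimal sketch of a "unique current sources" count. It is only an illustration of the idea in Adam's email, not Technorati's actual code: the function names, the (source, target) input format, and the choice to ignore self-links are all my assumptions.

```python
from collections import defaultdict

def authority(front_page_links):
    """Count the distinct source blogs currently linking to each target.

    front_page_links: iterable of (source_blog, target_blog) pairs, assumed
    to have been scraped from the sources' current front pages.
    """
    sources = defaultdict(set)
    for source, target in front_page_links:
        if source != target:  # ignoring self-links is an assumption
            sources[target].add(source)
    return {target: len(linkers) for target, linkers in sources.items()}

def top_n(front_page_links, n=100):
    """Return the n targets with the most unique sources, ties broken by name."""
    scores = authority(front_page_links)
    return sorted(scores, key=lambda blog: (-scores[blog], blog))[:n]
```

Note that repeated links from the same source count once, and nothing in the count depends on how old a link is, only on whether it is still visible.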

As I said, this makes most of what Jason is asking for in his Blog 500 post irrelevant, since Technorati is actually implementing pretty much what he says he wants. And I don't blame Jason for being confused about how this works: Technorati's inner workings seem to be a mystery to us all, no matter how critical Technorati has become.

I still maintain that this "hit parade" approach is less interesting than some long-term reputation model (as I outlined here and here). Bob Wyman at PubSub offers a bunch of useful insights on the pros and cons of using ageless links (like blogrolls, that change very slowly), here:

A "for-all-time" ranking system rewards people simply for having been blogging longer than others. It gives weight to seniority not quality. In a "for-all-time" system, a blog that accumulates 1,000 InLinks over the last five years is given the same rank as one that has generated 1,000 InLinks since it was first created 10 days ago. This just doesn't make sense. Imagine a blog that carried links to pictures of Janet Jackson's "wardrobe event" at the Superbowl and as a result gained 10's of thousands of InLinks in a matter of hours. Imagine also that that blogger hasn't had much to say since that event that anyone has found to be worthy of an InLink. Does it make sense that years later all those stale links should be lifting the rank of the now boring or even dormant blog over that of people blogging interesting content today? I don't think so. One important question that a ranking system should answer is: "What have you done for me lately?"...

Although the Technorati 100 is not based on a "for-all-time" system of weighting links, it is based on "ageless" links. I'm sure there are some uses for such ranking systems, but I must say that this attribute of the Technorati 100 is the one that contributes most to my failing to find it to be useful. Apparently, the Technorati system only considers links that are still visible on the blog when they scrape it. (Unlike PubSub, which is feed oriented, Technorati scrapes blog pages...) However, they give to all such links an equal weighting in their ranking -- no matter how old they might be. What this means is that you can give your blog more say in the Technorati system simply by showing more history on your blog! Also, it means that if you abandon your blog, your links will continue indefinitely to have weight in the Technorati system. Given that a massive number of blogs are abandoned, any ranking system based on ageless link weights will have a persistent bias towards bloggers that used to be popular whether or not they are still popular.

PubSub does not provide a "for-all-time" ranking system nor do we base our LinkRanks on "ageless" links. As mentioned before, the Daily PubSub LinkRanks are computed using a window of only a couple weeks of LinkCounts data. Thus, very old InLinks have no impact on current Daily LinkRank. If a blog is abandoned, its influence on our rankings will rapidly disappear. We decay the value of more recent InLinks according to their age in somewhat the same way that we decay the value of multiple InLinks from a single site (see discussion above). What we do is give more value to an InLink created today than to one created yesterday and we give less value to a two day-old InLink, etc. until an InLink created a couple of weeks ago has no value to contribute to a blog's rank. The result is a much more accurate and current measure of a blog's current popularity, importance, impact, whatever...
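The age decay Bob describes can be sketched in a few lines. This is my own rough illustration, not PubSub's algorithm: Bob gives neither the shape of the decay curve nor the exact window, so the linear falloff and the 14-day figure below are assumptions, as are the function names.

```python
from datetime import date

WINDOW_DAYS = 14  # "a couple of weeks"; the exact figure is an assumption

def inlink_weight(age_days, window=WINDOW_DAYS):
    """Linear decay (assumed): 1.0 for a link created today, 0.0 at
    `window` days old or older."""
    if age_days < 0:
        raise ValueError("link date is in the future")
    return max(0.0, 1.0 - age_days / window)

def decayed_score(inlink_dates, today, window=WINDOW_DAYS):
    """Sum the age-decayed weights of a blog's InLinks as of `today`."""
    return sum(inlink_weight((today - d).days, window) for d in inlink_dates)

today = date(2005, 8, 9)
inlinks = [date(2005, 8, 9), date(2005, 8, 2), date(2005, 1, 1)]
print(len(inlinks))                   # "ageless" count: every link weighs 1
print(decayed_score(inlinks, today))  # decayed count: the January link adds nothing
```

Under an ageless count the three links are worth 3; under this decay they are worth 1.5, and an abandoned blog drifts to zero once its newest InLink falls out of the window.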

As I suggested in a post yesterday, based on Mary Hodder's notion of taking control of the algorithms being used by services like Google, Technorati, and PubSub, I don't really want services like PubSub or Technorati just to grind away with their internalized algorithms, however well-motivated and rational: I want them to export the raw data in a structured format (to be determined what that is), so that we can determine what our own top 100 or 500 or 1000 blogs ought to be, based on the weighting that we place on the various factors. Since I believe that longtail reputation is more important than current number of links, I could put a higher weighting on that, and based on a candidate set of 100 or 1000 blogs, come up with an ordering based on my own recipe.
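Here is a minimal sketch of what "my own recipe" might look like, assuming the services exported per-blog numbers for each factor. Everything in it is hypothetical: the factor names, the min-max normalization, and the weighted sum are just one simple way to do it, not anything these services offer today.

```python
def normalize(values):
    """Scale a {blog: raw value} mapping to the 0..1 range (min-max)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {blog: 0.0 for blog in values}
    return {blog: (v - lo) / (hi - lo) for blog, v in values.items()}

def rank_blogs(metrics, weights):
    """Order blogs by a weighted sum of normalized factors.

    metrics: {factor: {blog: raw value}}; weights: {factor: weight}.
    A blog missing from a factor scores 0 on that factor.
    """
    normalized = {factor: normalize(vals) for factor, vals in metrics.items()}
    blogs = set().union(*(vals.keys() for vals in metrics.values()))
    scores = {
        blog: sum(weights.get(f, 0.0) * normalized[f].get(blog, 0.0) for f in metrics)
        for blog in blogs
    }
    return sorted(blogs, key=lambda blog: (-scores[blog], blog))
```

With made-up data such as {"current_links": {"a": 100, "b": 50, "c": 0}, "long_term": {"a": 0, "b": 5, "c": 10}}, weights of 0.2 and 0.8 put "c" first, while weighting only current_links puts "a" first. Same raw data, different top of the list, and the choice is the reader's.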

Even better, I could combine metrics derived from different services -- Technorati, PubSub, and the fictional Blognetter I outlined yesterday, for example -- in a spreadsheet, or even better, in a new meta-ranking service I envisioned called RankOut.
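As a rough sketch of the spreadsheet version, suppose each service exported nothing more than an ordered list of blogs. One simple way to merge them is to average each blog's position across the lists. The function name, the input format, and the rule for blogs missing from a list are my assumptions, and this is not a description of how RankOut would have to work.

```python
def combine_rankings(rankings):
    """Merge several ranked lists into one, ordered by mean position.

    rankings: {service: [blog, ...]} with the best blog first.
    A blog missing from a service's list is placed one past the end of
    that list (an assumption; other penalties are just as defensible).
    """
    blogs = {blog for ranked in rankings.values() for blog in ranked}

    def mean_position(blog):
        total = 0
        for ranked in rankings.values():
            total += ranked.index(blog) if blog in ranked else len(ranked)
        return total / len(rankings)

    return sorted(blogs, key=lambda blog: (mean_position(blog), blog))
```

Averaging positions rather than raw scores sidesteps the problem that each service counts links on its own scale, at the cost of throwing away how far apart two blogs really are.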

As these sorts of metrics become increasingly relevant, we need an open model to emerge. Jason's call for a different list of 500 A-listers is not the answer, and neither are the carefully tweaked algorithms buried within Technorati, BlogPulse, PubSub, or other services.

I am fine with Technorati and the others having their own closed algorithms, and offering the results up as one element of their value add. But I believe that this information -- the raw data they are amassing -- is not theirs: it is our data, it is the accumulation of our "gestures" -- our links, our trackbacks, our blogrolls, our tags, and so on -- and we have a right to ask these services to give us back the data that they have spidered from the Blogosphere, so that we can fiddle with it however we want to.


COMMENTS

1. Jason on August 9, 2005 10:33 AM writes...

Not confused at all. You were clearly wrong in your first post when you said it was only links on the top page of people's sites. That's obviously not the case.

I'm asking for a list which only counts the last--say--year. The Technorati 100 favors long-term players who started blogging in 2002, 2003, 2004... I'd like to see another couple of lists based on recent links in (i.e. 30 days, 90 days, 1 year) and have the list increased to, say, 500.

Also, getting things like Yahoo instant messenger off the list would be good! :-)

2. Stowe Boyd on August 9, 2005 10:57 AM writes...

Jason -

In Technorati speak, "the number of unique sources that are currently pointing to the blog" means scraping the top pages of other sites. As Adam points out in his comments (see above) the same technique is used for the T 100 as elsewhere in the index. So, yes, you are confused, and no, I'm not clearly wrong. Ask the folks at Technorati, if you don't buy it.

3. Bob Wyman on August 9, 2005 01:59 PM writes...

Stowe, PubSub openly provides a good bit of the raw data that we use to compute LinkRanks. Take a look at our LinkCounts pages starting at: http://pubsub.com/linkcounts.php and the various pages that are linked to there. For instance, if you go to http://www.pubsub.com/site_stats.php you can input the url of your site. You will then get the page which lists all the InLinks, OutLinks and Entries that we have seen for your blog for the last 30 days. Click on the InLink numbers for any day and you'll get a list of everyone who linked to you on that day... This stats page will contain a link to an Atom file containing daily stats for your site. It will also contain Structured Blogging data (see: http://structuredblogging.org/) that makes it easy to do reports. To get the structured data, "View Source" and search for the script tag that contains "x-subnode". Inside that tag you'll find a chunk of XML that contains the data found on the page. This should make it trivial for you to build apps, etc. that will keep track of the stats that we have for you.

On the subject of who "owns" the data... Please remember that it is very expensive for us to gather this data and it is very expensive to provide access to it (bandwidth, servers, liability, etc.). Gathering, cleaning and correlating data for over 14 million blogs isn't cheap. By doing this work, I believe that we add a great deal of "value" to the community. Who owns the "value" that we add?

bob wyman
CTO, PubSub.com
