April 16, 2006

Blocking Technorati

Posted by bopuc at April 16, 2006 10:15 AM

If you're one of the people who prefer to disallow Technorati's bot, which does not respect the robots.txt convention and its directives (talk about a good net citizen...), you may have resorted to blocking the bot's IP, 209.237.230.104.

As of today, Technorati has changed the originating IP of its crawler to 209.237.228.227.

Update your block-lists accordingly. ;)

(I don't currently block TR myself. I enjoy the blog-ego hand-job it provides... when it works. ;)
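
For anyone who keeps their block-list in application code rather than in server config, here is a minimal sketch of the check, assuming Python. The two addresses are the ones named above; the `BLOCKED_IPS` name and the `is_blocked` helper are made up for illustration, and given how this kind of address moves around, the list will need upkeep.

```python
import ipaddress

# The two crawler addresses named in the post. Illustrative only:
# the list goes stale whenever the crawler moves to a new address.
BLOCKED_IPS = {
    ipaddress.ip_address("209.237.230.104"),  # the earlier address
    ipaddress.ip_address("209.237.228.227"),  # the address as of April 16, 2006
}


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the blocked addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in BLOCKED_IPS
    except ValueError:
        # Not a parseable IP address, so it cannot match the list.
        return False
```

A request handler would call `is_blocked()` with the client address and refuse the request when it returns True.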

Comments

Hey Boris,

I hadn't heard that Technorati doesn't respect robots.txt files. Do you know where I could find more information about this? Who has reported on it and such?

Posted by: Michael Boyle at April 16, 2006 11:46 AM

Hi Michael,
Niall Kennedy, who later joined TR and then recently left, reported it over a year ago.
In an update on that post he mentions, ambiguously, the use of the "robots meta tag", but I do not know if they respect that. It seems silly to actually expend the energy to retrieve every file (and thus download and store, even temporarily, the person's content) just to then read the meta tag and decide to "drop it"... (Dunno. I know the TR developers personally and I know they are very smart people; making their bot respect robots.txt is trivial...)

Also, the empirical proof is that there are no requests for robots.txt in your server logs.

Also, *someone we know* has been blocking them, and all search engines, for 6 months, and yesterday found TR's bot crawling his site from the new IP.

Posted by: Boris Anthony at April 16, 2006 07:42 PM
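
The server-log check mentioned in the comment above can be sketched roughly as follows, assuming Python and an Apache-style access log (the exact log format is an assumption, as are the function names). It lists client addresses that made requests but never asked for /robots.txt. Keep in mind that a crawler could fetch robots.txt from a different address than the one it crawls from, so an empty result is a hint, not proof.

```python
import re

# Assumed Apache-style line:
#   IP ident user [timestamp] "METHOD /path PROTOCOL" status size ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def robots_requests_by_ip(lines):
    """Return (total requests per IP, /robots.txt requests per IP)."""
    totals = {}
    robots = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ip, _method, path = match.groups()
        totals[ip] = totals.get(ip, 0) + 1
        # Ignore any query string when comparing the path.
        if path.split("?", 1)[0] == "/robots.txt":
            robots[ip] = robots.get(ip, 0) + 1
    return totals, robots


def never_asked_for_robots(lines):
    """Return the sorted IPs that made requests but never fetched /robots.txt."""
    totals, robots = robots_requests_by_ip(lines)
    return sorted(ip for ip in totals if ip not in robots)
```

Passing an open log file to `never_asked_for_robots()` works, since a file object iterates over its lines.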

Technorati's bot works primarily in response to pings, whereby you notify us that you have updated. For certain blogging tools, we gather bulk update information and index those blogs based on that.
If ceasing to ping us doesn't work, and you wish to be removed from our index, please email support@technorati.com, and we can mark your blog to not be indexed again.

Posted by: Kevin Marks at April 16, 2006 09:40 PM

Thanks Kevin!

The site in question was not pinging TR directly, but other services, one (or more) of which may have then pinged TR. (ping-o-matic was not listed however)

To be clear, this is not so much a dig at TR, but rather information for those who might care.

Awareness is a good quality to have, and knowledge a good resource. We agree I'm sure. ;)

Posted by: Boris Anthony at April 16, 2006 10:22 PM

Pissed off. :)

Technorati is at it again. I haven't pinged any services for the last week. Technorati has changed the bot IP, AGAIN.

I think I will have to change strategy and block by User Agent. So what Kevin Marks says is just. plain. wrong.

Posted by: karl at April 25, 2006 04:53 PM
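
Blocking by User Agent, as suggested in the comment above, could look roughly like this for a site served by a Python WSGI application. This is only a sketch: the thread never gives the bot's actual User-Agent string, so the "technorati" substring is an assumption, and the names `BLOCKED_AGENT_SUBSTRINGS`, `is_blocked_agent` and `block_agents` are made up for illustration.

```python
# Hypothetical marker: the crawler's real User-Agent string is not given
# in the thread, so matching on "technorati" is an assumption.
BLOCKED_AGENT_SUBSTRINGS = ("technorati",)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent contains a blocked substring."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BLOCKED_AGENT_SUBSTRINGS)


def block_agents(app):
    """Wrap a WSGI app so that blocked user agents receive a 403."""
    def wrapped(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapped
```

Unlike an IP block, this keeps working when the crawler changes address, but only for as long as the bot keeps identifying itself honestly.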

Karl, which blog url do you want me to block from indexing? It's easily done. Maybe someone else is pinging your blog on your behalf.

Posted by: Kevin Marks at May 2, 2006 02:38 AM

Kevin: I don't know whether my mail reached your services. I sent this email to support@technorati.com on April 26, 2006, with the subject "technorati bot".

================
Hi,

Could you add my weblog to your list of site not indexed again?

After trying robots.txt
http://www.la-grange.net/robots.txt
After trying IP blocking
TR changed for the 3rd time at least
After not pinging anyone else

All of this combined together, Technorati bot still comes.

Thanks
Best Regards
==========================

Posted by: karl at May 2, 2006 03:06 AM

They changed the IP again and started looking for data again.

Posted by: カ−ル at September 27, 2006 02:14 PM

Technorati is annoying as hell. It totally ignores both meta tags and robots.txt.

As to Kevin Marks, I shouldn't have to write to ANYONE to make technorati stop. It should see my robots.txt file and go away.

It's unethical to do what you people do.

Posted by: Donna at February 19, 2007 10:55 PM
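
For reference, the check that commenters in this thread expect from a crawler is small. The sketch below uses Python's standard `urllib.robotparser`; it is not Technorati's code, and the robots.txt content, the "ExampleBot" name and the example.com URLs are all placeholders invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt content, not taken from any site in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler that honors the convention would skip this URL.
allowed = may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/post.html")
```

With the "Disallow: /" rule above, `allowed` comes out False, so a compliant bot would stop before requesting the page at all.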