April 16, 2006 10:15 | WebBlogging / WebTech

Blocking Technorati

If you're one of the people who prefer to disallow Technorati's bot, which does not respect the robots.txt convention and its directives (talk about a good net citizen...), you may have resorted to blocking the bot's IP, 209.237.230.104.

As of today, Technorati has changed the originating IP of its crawler to 209.237.228.227.

Update your block-lists accordingly. ;)
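
For anyone keeping the block-list in application code rather than in the web server or firewall config, here is a minimal sketch of the idea. The two addresses are the ones mentioned in this post; the names `BLOCKED` and `is_blocked` are made up for illustration, and your own setup will likely look different.

```python
import ipaddress

# The two crawler addresses mentioned in this post; extend as they change.
BLOCKED = {
    ipaddress.ip_address("209.237.230.104"),
    ipaddress.ip_address("209.237.228.227"),
}


def is_blocked(remote_addr: str) -> bool:
    """Return True if remote_addr is on the block list."""
    try:
        return ipaddress.ip_address(remote_addr) in BLOCKED
    except ValueError:
        # Not a valid IP address string, so it cannot be on the list.
        return False
```

Keeping the addresses in one set means the next IP change is a one-line update.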

(I don't currently block TR myself. I enjoy the blog-ego hand-job it provides... when it works. ;)

Comments

Hey Boris,

I hadn't heard that Technorati doesn't respect robots.txt files. Do you know where I could find more information about this? Who has reported on it and such?


Hi Michael,
Niall Kennedy, who later joined TR and then recently left, reported it over a year ago.
In an update on that post he mentions, ambiguously, the use of the "robots meta tag", but I do not know if they respect that. It seems silly to expend the energy to retrieve every file (and thus download and store, even temporarily, the person's content) just to then read the meta tag and decide to "drop it"... (Dunno. I know TR developers personally, and I know they are very smart people; making their bot respect robots.txt is trivial...)

Also, the empirical proof is that there are no requests for robots.txt in your server logs.

Also, *someone we know* has been blocking them, and all search engines, for 6 months, and yesterday found TR's bot crawling his site with the new IP.
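
The server-log check mentioned above can be sketched roughly like this. It assumes your logs are in the Apache/NCSA common or combined format; the function name `robots_stats` and the regular expression are illustrative, not anything Technorati or your web server provides.

```python
import re

# Assumes Apache/NCSA common or combined log format:
# IP ident user [date] "METHOD path PROTOCOL" status size ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')


def robots_stats(log_lines, ip):
    """Return (total_requests, robots_txt_requests) for one client IP."""
    total = robots = 0
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m or m.group(1) != ip:
            continue
        total += 1
        # Strip any query string before comparing the path.
        if m.group(3).split("?", 1)[0] == "/robots.txt":
            robots += 1
    return total, robots


# Usage (the log path is hypothetical):
# with open("access.log") as f:
#     print(robots_stats(f, "209.237.228.227"))
```

If the first number is large and the second is zero, the client at that address fetched pages without ever asking for robots.txt during the period the log covers.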


Technorati's bot works primarily in response to pings - whereby you notify us to tell us that you have updated. For certain blogging tools, we gather bulk update information and index them based on that.
If ceasing to ping us doesn't work, and you wish to be removed from our index, please email support@technorati.com, and we can mark your blog to not be indexed again.


Thanks Kevin!

The site in question was not pinging TR directly, but other services, one (or more) of which may have then pinged TR. (ping-o-matic was not listed however)

To be clear, this is not so much a dig at TR, but rather information for those who might care.

Awareness is a good quality to have, and knowledge a good resource. We agree I'm sure. ;)


Pissed off. :)

Technorati is at it again. I haven't pinged any services for the last week. Technorati has changed the bot IP, AGAIN.

I think I will have to change strategy and block by User-Agent instead. So what Kevin Marks says is just. plain. wrong.
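
A rough sketch of what blocking by User-Agent could look like, written as a small WSGI middleware. The substring "technorati" is an assumption about what the bot sends; check your own logs for the real string. The names `is_blocked_agent` and `block_agents` are made up for this example. Keep in mind that a User-Agent header can change just as easily as an IP.

```python
# Assumed marker: a substring expected in the bot's User-Agent header.
# Verify against your own logs before relying on it.
BLOCKED_AGENT_MARKERS = ("technorati",)


def is_blocked_agent(user_agent):
    """Case-insensitive substring match against the blocked markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


def block_agents(app):
    """WSGI middleware: answer 403 for blocked user agents."""
    def wrapper(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return wrapper
```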


Karl, which blog url do you want me to block from indexing? It's easily done. Maybe someone else is pinging your blog on your behalf.


Kevin: I don't know if my mail has reached your services. I sent this email to support@technorati.com on April 26, 2006, with the subject "technorati bot".

================
Hi,

Could you add my weblog to your list of site not indexed again?

After trying robots.txt
http://www.la-grange.net/robots.txt
After trying IP blocking
TR changed for the 3rd time at least
After not pinging anyone else

All of this combined together, Technorati bot still comes.

Thanks
Best Regards
==========================


8- カ−ル

They changed the IP again and resumed crawling for data.


Technorati is annoying as hell. It totally ignores both meta tags and robots.txt.

As to Kevin Marks, I shouldn't have to write to ANYONE to make Technorati stop. It should see my robots.txt file and go away.

It's unethical to do what you people do.