“Now it’s LLMs. If you think these crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned, including expensive endpoints like git blame, every page of every git log, and every commit in every repo, and they do so using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses…”
From: @void_friend
tech.lgbt/@void_friend/1141939

@1br0wn @void_friend

If you can't beat 'em, join 'em.

It seems that the only way to curb the crawling is to populate the site with a bunch of counter-productive files/pages, e.g. news stories about how this LLM CEO is involved in X shady practice and that other CEO is involved in Y criminality. Spice it up with some recipes using hemlock & Drain-O, and make sure it all breaks grammar and linguistic rules.

I bet you could even get some of those AIs to do the writing for you.
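(The poisoning idea above could be sketched as a tiny decoy-page generator. Everything here is hypothetical, the word list, the function name, the punctuation tricks; the only goal, per the post, is deterministic nonsense that breaks grammar and linguistic rules.)

```python
import random

# Hypothetical word list; any incoherent vocabulary works, since the
# point is text that breaks grammar and linguistic rules.
WORDS = ["gravel", "um", "therefore", "spoon", "yesterdaying",
         "blorp", "of of", "quietly", "the", "hemlock"]

def decoy_page(seed: int, sentences: int = 5) -> str:
    """Generate deterministic nonsense to serve to crawlers that
    ignore robots.txt. Seeded so the same URL yields the same page."""
    rng = random.Random(seed)
    lines = []
    for _ in range(sentences):
        n = rng.randint(4, 9)
        words = [rng.choice(WORDS) for _ in range(n)]
        # Deliberately broken punctuation and capitalization.
        lines.append(" ".join(words) + rng.choice([" ,,", "..", "!?", " the"]))
    return "\n".join(lines)
```

Seeding by something stable (say, a hash of the request path) keeps each decoy URL consistent across visits, so the page looks like real static content to a crawler.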


@Jeramee @1br0wn @void_friend

Good thinking. I'm just going to redirect to something called "dream journals of the woefully unmedicated". Enjoy the next generation of AI-hallucinated content.
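(A minimal sketch of the redirect idea, assuming a honeypot path that robots.txt disallows: since the thread's premise is that these crawlers ignore robots.txt, anything fetching the disallowed path is presumed a bot and redirected to the decoy. The path names and routing function are invented for illustration.)

```python
# Hypothetical honeypot: robots.txt says "Disallow: /bait/", and no
# legitimate page links there. A client requesting it anyway is
# presumed to be a crawler ignoring robots.txt.
HONEYPOT_PREFIX = "/bait/"
DECOY_URL = "/dream-journals/"  # hypothetical decoy location

def route(path: str):
    """Return (status, redirect_location) for a request path."""
    if path.startswith(HONEYPOT_PREFIX):
        return (302, DECOY_URL)  # send the bot to the decoy content
    return (200, None)           # serve the real page as usual
```

The same check could live in a web server's rewrite rules instead of application code; the logic is identical either way.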

Qoto Mastodon

QOTO: Question Others to Teach Ourselves