
ROFLMAO.

Claude decided to crawl one of the sites on my new server, where known bots are redirected to an iocaine maze. Claude has been in the maze for 13k requests so far, over the course of 30 minutes.

I will need to fine-tune the rate limiting, because it didn't hit any rate limits: it scanned using 902 different client IPs. So simply rate limiting by IP doesn't fly. I'll rate limit by (possibly normalized) user agent instead (they all used the same UA).
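
(For illustration only; iocaine's real config surely looks different, but keying a rate limit on a normalized User-Agent could look something like this Go sketch, where the version-stripping regex is my own guess at what "possibly normalized" might mean:)

```go
package main

import (
	"regexp"
	"strings"
)

// versionRe matches runs of digits and dots, so that e.g.
// "ClaudeBot/1.0" and "ClaudeBot/1.1" collapse into one key.
var versionRe = regexp.MustCompile(`[0-9.]+`)

// normalizeUA reduces a User-Agent to a stable rate-limit key:
// lowercased, version numbers stripped, whitespace collapsed.
func normalizeUA(ua string) string {
	ua = strings.ToLower(ua)
	ua = versionRe.ReplaceAllString(ua, "")
	return strings.Join(strings.Fields(ua), " ")
}
```

With something like this, "ClaudeBot/1.0" and "ClaudeBot/1.1" land in the same bucket even if the bot bumps its version string between visits.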

Over the course of those 30 minutes, it downloaded roughly 300 times less data than if I had let it scrape the real thing, and each request took about a tenth of the time to serve. So I saved bandwidth, saved processing time, likely saved RAM too, and served garbage to Claude.

Job well done.

iocaine (MadHouse Git Repositories): "The deadliest poison known to AI."

Once I fix rate limiting, the savings will be even larger.

Wolf480pl

@algernon would adding a sleep before returning the response help?

@wolf480pl No, that would hog my resources (an extra connection held open). I want to serve garbage fast. I'll fix the rate limiting so these bots get a 429 sooner. Then it's their problem to remember to come back, and my server will just idle in the meantime.

Granted, a 429 is still some bandwidth and whatnot, possibly more than a slow socket, but I feel that a 429 would waste their time more than a sleep would.

@DamonHD @wolf480pl I don't mind if they don't care much about the 429. My 429 has no body, so that's ~3-6 kB less served per request, and the garbage generator isn't involved at all.

If they come back, good! Eventually they'll get to train on more remixed Bee Movie. The rate limiting gives the garbage generator a pause. It's mostly there to limit that, not the visitors. If the visitors obey, that's a bonus, but only a side effect.

If I wanted to remove them, I'd block their IP when they hit a rate limit in the maze. But I don't. I want them to see the garbage, and come back for more.
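
(A rough Go sketch of that flow, building on the normalizeUA helper above; the 2 req/s rate and burst of 10 are invented numbers, and golang.org/x/time/rate stands in for whatever the real setup uses:)

```go
package main

import (
	"net/http"
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex
	limiters = map[string]*rate.Limiter{}
)

// limiterFor hands out one token bucket per normalized UA key.
// 2 requests/second with a burst of 10 are made-up numbers.
func limiterFor(key string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	l, ok := limiters[key]
	if !ok {
		l = rate.NewLimiter(2, 10)
		limiters[key] = l
	}
	return l
}

// rateLimited wraps the garbage generator: over-limit clients get a
// bodyless 429, so the generator never runs for them.
func rateLimited(garbage http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiterFor(normalizeUA(r.UserAgent())).Allow() {
			w.WriteHeader(http.StatusTooManyRequests) // headers only, no body
			return
		}
		garbage.ServeHTTP(w, r)
	})
}
```

The point being that the 429 path writes a status line and headers and nothing else; the garbage generator only ever runs for requests under the limit.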

@algernon @wolf480pl I haven't looked at your implementation yet, but I'm tempted to build into my static sites a version of what I think you may be doing: an #MD5-URL-hash-driven maze in a few lines of #Apache config, into which any dodgy request (such as one for a PHP file) flings the visitor. It won't care about the UA or much else, though it could return 503s/429s randomly for fun.

@DamonHD @wolf480pl That works to trap them, too, yup!

I do the UA stuff so that they never even see the real content: known bots get only garbage. Unknown bots are led into a maze, and then get garbage too (and once I adjust my config, they'll get only that as well).
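
(For the curious, DamonHD's hash-driven maze fits in a handful of lines outside Apache too. A Go sketch, with the /maze/ prefix, the port, and the five-links-per-page count entirely made up: the MD5 of the requested path seeds a PRNG that picks the outgoing links, so every page deterministically links deeper in while the server stays stateless.)

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"log"
	"math/rand"
	"net/http"
)

// mazePage derives a page purely from the requested path: the MD5 of
// the URL seeds a PRNG, which picks the outgoing links. Every page
// links deeper into the maze, and the server keeps no state at all.
func mazePage(w http.ResponseWriter, r *http.Request) {
	sum := md5.Sum([]byte(r.URL.Path))
	rng := rand.New(rand.NewSource(int64(binary.BigEndian.Uint64(sum[:8]))))
	fmt.Fprintf(w, "<html><body><p>page %x</p>\n", sum[:4])
	for i := 0; i < 5; i++ {
		fmt.Fprintf(w, "<a href=\"/maze/%x\">further in</a>\n", rng.Uint64())
	}
	fmt.Fprint(w, "</body></html>")
}

func main() {
	http.HandleFunc("/maze/", mazePage)
	log.Fatal(http.ListenAndServe(":8080", nil)) // hypothetical port
}
```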