I had a website earlier this year running on Hetzner. It was purely for experimenting with some ASP.NET stuff, but when looking at the logs, I noticed a shit-load of attempts at various WordPress-related endpoints.
I then read something about a guy who deliberately put a honeypot in his robots.txt file, pointing to a completely bogus endpoint. The theory was: humans won't read robots.txt, so there's no danger to them, but bots and the like will often read it (at least to figure out what you have... they'll ignore the "Disallow" directives for the most part!), and if something requests that fake endpoint you can be 100% sure (well, as close as possible) that it's not a human, and you can ban it.
So I tried that.
I auto-generated the robots.txt file on the fly. It was cached for 60 seconds or so, as I didn't want to expend too many resources on it: when you asked for it, you either got the cached one or I created a new one. The CPU usage was negligible.
I changed the disallowed endpoint each time I rebuilt the file, in case the baddies cached it; every variant still routed to the same ASP.NET controller method. Hitting it got you a 10GB zip bomb, and your IP was automatically added to the firewall block list.
It was quite simple: anyone who hit that endpoint MUST be dodgy... I believe I even had comments in the file letting any humans who stumbled across it know that visiting this endpoint in their browser meant an automatic addition to the firewall blocklist.
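A minimal sketch of the scheme described above, in C#. This is my reconstruction, not the commenter's actual code; all names (`HoneypotRobots`, `GetRobotsTxt`, `Ban`) and the GUID-based trap path are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch of the rotating robots.txt honeypot described above.
// Names and details are illustrative, not the commenter's actual code.
public static class HoneypotRobots
{
    private static readonly TimeSpan CacheTtl = TimeSpan.FromSeconds(60);
    private static string _cached = "";
    private static string _currentTrap = "";
    private static DateTime _builtAt = DateTime.MinValue;
    private static readonly object _lock = new object();

    // Banned IPs with the time of the offence; in the real setup this
    // fed a firewall block list rather than living in-process.
    public static readonly ConcurrentDictionary<string, DateTime> Banned = new();

    public static string GetRobotsTxt(DateTime now)
    {
        lock (_lock)
        {
            if (now - _builtAt > CacheTtl)
            {
                // A fresh random trap path on each rebuild, so any copy
                // of robots.txt the baddies cached goes stale quickly.
                _currentTrap = "/" + Guid.NewGuid().ToString("N");
                _cached =
                    "# Humans: do NOT visit the path below.\n" +
                    "# Requesting it adds your IP to the firewall blocklist.\n" +
                    "User-agent: *\n" +
                    $"Disallow: {_currentTrap}\n";
                _builtAt = now;
            }
            return _cached;
        }
    }

    // Called by the catch-all controller action sitting behind the trap
    // path (the same action regardless of which trap path was current).
    public static void Ban(string ip, DateTime now) => Banned[ip] = now;
}
```

The 60-second cache keeps rebuild cost negligible while still rotating the trap often enough that a cached robots.txt betrays the bot, exactly the effect the comment describes.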
Anyway... at first I caught a shitload of bad guys. There were thousands to begin with, and then the numbers dropped and dropped to only tens per day.
Granted, this is a single data point, but for me it worked... and I have no regrets about the zip bomb either :)
I have another site that I'm working on, so I may evolve the idea a bit: you get banned for a short time, and if you come back to the dodgy endpoint after that, I know you're a bot, so into the abyss with you!
This is approximately my approach, minus the zip bomb. I use a piece of middleware in my ASP.NET Core pipeline that tracks logical resource consumption rates per IPv4 address. If a client trips any of the limits, its IP goes into a HashSet for a period of time. If a client's IP is in this set, it gets a simple UTF-8 constant string in the response body: "You have exceeded resource limits, please try again later".
The other aspect of my strategy is to use ASP.NET Core (Kestrel). It is so fast that you can mostly ignore the noise, as long as things are configured properly and you make reasonable attempts to address the edge case of an asshole trying to break your particular system on purpose. A HashSet<int> lookup in the very first piece of middleware, rejecting bad clients, is exceedingly efficient. We aren't even into URL routing at that point.
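A rough sketch of that front-of-pipeline gate, assuming the HashSet<int> detail from the comment (an IPv4 address packs into 32 bits, which is why an int works as the key). The limits, window, and all names (`IpGate`, `Allow`) are my illustrative choices, not the commenter's code.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Sketch of the per-IPv4 rate gate described above. An IPv4 address
// packs into 32 bits, so bans live in a HashSet<int> and the membership
// test at the front of the pipeline is nearly free.
// Limits and names are illustrative.
public static class IpGate
{
    private const int MaxRequestsPerWindow = 100;                  // hypothetical limit
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(10);

    private static readonly HashSet<int> _banned = new();
    private static readonly Dictionary<int, (DateTime start, int count)> _rates = new();
    private static readonly object _lock = new();

    public static int Key(IPAddress ip) =>
        BitConverter.ToInt32(ip.GetAddressBytes(), 0);             // IPv4 only

    // Returns false when the caller should short-circuit with the
    // constant "You have exceeded resource limits..." response body.
    public static bool Allow(IPAddress ip, DateTime now)
    {
        int key = Key(ip);
        lock (_lock)
        {
            if (_banned.Contains(key)) return false;

            // Fixed-window counter; reset the window when it expires.
            if (!_rates.TryGetValue(key, out var r) || now - r.start > Window)
                r = (now, 0);
            r.count++;
            _rates[key] = r;

            if (r.count > MaxRequestsPerWindow)
            {
                _banned.Add(key);  // the real version expires this after a while
                return false;
            }
            return true;
        }
    }
}
```

In an ASP.NET Core app this check would sit in the very first `app.Use(...)` delegate, returning the constant rejection body before routing or anything else runs, which is what makes rejecting banned clients so cheap.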
I have found that attempting to catalog and record all of the naughty behavior my web server sees is the biggest DDoS risk so far. Logging lines like "banned client rejected" every time they try to come in the door is shooting yourself in the foot with regard to disk wear, IO utilization, et al. There is no reason you should be logging all of that background radiation to disk, or even thinking about it. If your web server can't handle direct exposure to the hard vacuum of space, it can be placed behind a proxy/CDN (i.e., another web server that doesn't suck).
> they get a simple UTF8 constant string in the response body "You have exceeded resource limits, please try again later"
I imagine they get a 429 response code, but if they don't, you may want to change that.
I do think you're on the right track in that it's important to let those requests get the correct error, so if innocent people are affected, they at least get to see that something's wrong.
> If a client has an IP in this set, they get a simple UTF8 constant string in the response body "You have exceeded resource limits, please try again later".
Would a simple 429 not do the same thing? You could log repeated 429s and banish accordingly.
Both are important - the response code for well-behaved machines, as many tools intrinsically know that 429 means to slow down (also send a Retry-After header if you want more customization), and the text message for humans, as they don't see the response code and would otherwise see a blank page.
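Putting the two suggestions together, here is what that rejection would look like on the wire. This is a hedged sketch of a raw HTTP/1.1 response (the `RateLimitResponse` helper and the 120-second value are my inventions for illustration):

```csharp
using System;

// Sketch: the combined advice above, as a raw HTTP response. A
// well-behaved client backs off on the 429 + Retry-After; a human
// still sees the plain-text message instead of a blank page.
public static class RateLimitResponse
{
    public static string Build(int retryAfterSeconds)
    {
        // ASCII body, so char count == byte count for Content-Length.
        string body = "You have exceeded resource limits, please try again later";
        return
            "HTTP/1.1 429 Too Many Requests\r\n" +
            $"Retry-After: {retryAfterSeconds}\r\n" +
            "Content-Type: text/plain; charset=utf-8\r\n" +
            $"Content-Length: {body.Length}\r\n" +
            "\r\n" +
            body;
    }
}
```

`Retry-After` takes either a delay in seconds or an HTTP date; the seconds form shown here is the simpler one for rate limiting.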
Reddit is guilty of sending 429 with no message - try browsing it through Tor and you'll see.
It's interesting to study, right? This is the Internet equivalent of background radiation. Harmless in most cases. Exploit scanners aren't new to the LLM age and shouldn't overload your server - unless you're vulnerable to the exploit.
Fun fact: Some people learn about new exploits by watching their incoming requests.
Definitely! I wasn't experiencing any issues; hell, it wasn't even for public consumption at that time, so no great loss to me. But I found a few things fascinating (and somewhat stupid!) about it:
1. The sheer number of automated requests to scrape my content
2. That a massive number of the bots openly had "bot" or some derivative in the user agent, and they were accessing a page I'd explicitly disallowed! :D
3. That an equally large number were faking their user agents to look like regular users and still hitting a page that a regular user couldn't possibly ever hit!
Something I did notice, though only towards the end, so I didn't pursue it (I should log it better next time for analysis!): the trap endpoint was dynamically generated and only existed in robots.txt for a short time, yet there were bots I caught later on, long after that auto-generated path had gone (and after the IP was banned), that still went for that same page: clearly the same entities!
My spidey senses are tingling. Next time, I'm going to log the shit out of these requests and publish as much as I can for others to analyse and dissect... might be interesting.
I admit, my approach was rather nuclear but it worked at the time.
I think an evolution would be to use some sort of exponential backoff, e.g. first-time offenders get banned for an hour, the second time it's 4 hours, and the third time you're sent into the abyss!
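The escalation above boils down to a tiny function. A sketch using the comment's own example numbers (1 hour, 4 hours, then permanent); the `Backoff` name and the null-means-permanent convention are mine:

```csharp
using System;

// Escalating bans as floated above: 1 hour, then 4 hours, then
// permanent ("the abyss"). Thresholds are the commenter's example
// numbers, not a recommendation; null means a permanent ban.
public static class Backoff
{
    public static TimeSpan? BanDuration(int offenseCount) => offenseCount switch
    {
        <= 1 => TimeSpan.FromHours(1),
        2    => TimeSpan.FromHours(4),
        _    => null
    };
}
```

Strictly speaking this is a step function rather than exponential, but swapping the arms for something like `TimeSpan.FromHours(Math.Pow(4, offenseCount - 1))` would make it genuinely exponential if that's preferred.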
It's not perfect, but it worked for me anyway.