The decision to block bots is not always about protecting intellectual property. A practical consideration I haven't seen mentioned is that some of these AI bots are stupidly aggressive with their requests, even ignoring robots.txt. I had to activate Cloudflare WAF and block a variety of bots to prevent my web app servers from crashing. At least they're reasonable enough to identify themselves!
Yeah, we had a bunch of them crawling our git repositories very aggressively, repeating the crawl within a few days, etc. 403 to the lot of them, regardless of the bot's purpose.
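For anyone who wants to do the "403 to the lot of them" part at the application layer rather than in a WAF, here is a rough sketch of the idea as a WSGI middleware in Python. It only works against bots that identify themselves in the User-Agent header. The names `block_bots` and `is_blocked` are made up for this example, and the pattern list is purely illustrative, so swap in whatever tokens actually show up in your own access logs.

```python
import re

# Illustrative tokens only (an assumption, not a vetted blocklist):
# replace with the user-agent strings you see in your own logs.
BLOCKED_AGENT_PATTERN = re.compile(r"GPTBot|CCBot|Bytespider", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent string matches a blocked token."""
    return bool(user_agent) and BLOCKED_AGENT_PATTERN.search(user_agent) is not None


def block_bots(app):
    """Wrap a WSGI app so matching crawlers get a 403 before reaching it."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the real application untouched.
        return app(environ, start_response)
    return middleware
```

You would wrap your existing app object once at startup, e.g. `application = block_bots(application)`. Rejecting at the edge (WAF or reverse proxy) is still cheaper, since the request never reaches your app servers, but this is a reasonable fallback if you don't have one in front.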