If that's correct, then it's a significant problem with LLMs that needs to be addressed. Would it work to have the agent keep the talky, verbose answer to itself and only return a final summary to the user?
That's what the "reasoning" models do, effectively. Some LLM services hide or summarize that part for you, others return it verbatim, and of course you get the full thing if you're running a local reasoning model.
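As a rough sketch of the idea: local reasoning models typically emit their verbose thinking inside delimiter tags (the `<think>…</think>` convention, used by e.g. DeepSeek-R1), so a thin wrapper can keep that part to itself and surface only the final summary. The `llm_complete` callable here is a hypothetical stand-in for whatever completion API you use.

```python
import re

def summarized_answer(prompt, llm_complete):
    """Let the model think out loud, but show the user only the conclusion.

    `llm_complete` is a placeholder for your completion call; the
    <think>...</think> tags mirror the convention local reasoning
    models use to delimit their chain of thought.
    """
    raw = llm_complete(prompt)
    # Strip everything inside <think> tags, keeping the final summary.
    return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
```

Hosted services that hide reasoning for you are doing essentially this on their side, just before the response ever reaches the client.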