I think it's certainly possible if you compromise on accuracy. https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking has been around since the late 90s, there are various (rather robotic-sounding) free speech synths available which don't require much processing power at all (look at the system requirements of https://en.wikipedia.org/wiki/Software_Automatic_Mouth ), and of course machine translation has been an active topic of research since the mid-20th century.
IMHO it's unfortunate that everyone jumps to "use AI!" as the default now, when competitive approaches developed over the past few decades could provide decent results at a fraction of the computing resources, i.e. with much higher efficiency.
Why online? Why would I want some third party to (a) listen to my conversations; (b) receive a copy of my voice that hackers could download; (c) analyze my private conversations for marketing purposes; (d) hobble my ability to translate when their system goes down or goes permanently offline; or (e) require me to pay for a software service that's feasible to run locally on a smartphone?
Why would I want to have my ability to translate tied to internet connectivity? Routers can fail. Coverage in rural areas can be spotty. Cell towers can be downed by hurricanes. Hikes can take people out of cell tower range. People are not always inside a city.
Hosting is annoyingly expensive. Ping latency between us-east-1 and ap-southeast-1 is 230 ms. So you either set up shop in one location and accept that latency for users on the far side, or go multi-region (which adds up).
Also, there are many environments (especially when you travel) where your phone is not readily connected.