As someone who has run LLMs in production, I'd say using Ray is probably the worst idea. It's not optimized for language models and is extremely slow. There's no KV caching, no model parallelism, and none of the other basic table-stakes features offered by Dynamo or other open-source inference frameworks. It's useful only if you have <1 QPS.
Use SGLang, vLLM, or text-generation-inference instead.
It really depends on the task. If you have one massive job, Ray sucks and doesn't provide the table-stakes features. If you have 50M tiny jobs, Ray and KubeRay are great and serve as the backbone of several billion-dollar products.