Balancing compute-bound prefill against memory-bound decode is a fine art. Luckily, there are plenty of improvements (and incentives) on offer if you can tune that balance to your use case (coding assistants, this time), but it is generally a lonely journey. Good to see you paired up with Colfax International.