128 is still not 300. Something like 4x 6000 blackwell is the minimum to run any model that is going to feel anything like claude locally.
To my deep disappointment the economics are simply not there at the moment. Openrouter using only providers with zero data retention policies is probably the best option right now if you care about openness, privacy and vendor lock-in.
For local use and experimentation you don't need to match a top of the line model. In fact something that you train or rather fine-tune locally might be better for certain use cases.
If I was working with sensitive data I sure would only use on prem models.
To my deep disappointment the economics are simply not there at the moment. Openrouter using only providers with zero data retention policies is probably the best option right now if you care about openness, privacy and vendor lock-in.