Indeed, the "no-moat" dynamic (weight exfiltration of any openly accessible model) is something that makes me optimistic. So does the fact that most tasks have "capacity thresholds", beyond which extra model capacity adds little, and which on-device models will increasingly be able to saturate. One example is SQL query generation from text (see duckdb-text2sql: https://motherduck.com/blog/duckdb-text2sql-llm/).