Admittedly, I know little about how multi-core CPUs work, but I've always thought the next breakthrough in CPU tech would be a hardware-based scheduler: a chip that load-balances threads so that all cores are used equally, which would simplify writing software. The dev writes thread-safe code and the hardware does the rest. I wonder how feasible that really is.
That sounds perfectly reasonable to me, but John Hennessy literally wrote the book on computer architecture. Towards the end of the deck he has a slide saying that we shouldn't expect large gains from improved architecture (on general-purpose chips) in the future. I'm inclined to believe him, although I would be interested in hearing a deeper proof/disproof of the architecture you proposed.