Although what you described towards the end of your original comment is, conceptually, an MoE-like architecture, you explicitly said you didn't understand why larger models are smarter and then tried to armchair-engineer a new, smaller architecture. That suggests you weren't actually aware of the mixture-of-experts (MoE) architecture, so the ELI5 LEGO approach was reasonably helpful. I've read your question carefully several times, and I've read the other comments in the thread. You seem frustrated that people aren't answering your questions, when in fact they have been answered, just not in the way you seem to want. How can we fix this?