
I think the paper, on the whole, completely oversells the idea of "universality".

For CNNs, the 'Universal Subspace' is simply the strong inductive bias of locality forcing filters into standard signal-processing shapes (Laplacian- and Gabor-like) regardless of the data. Since convolutions are already a tightly constrained family of operations, this convergence is not that surprising (a quick sanity check of the shared-subspace claim is sketched below).

For Transformers, which lack these local constraints, the authors had to rely on fine-tuning from a shared initialization to find a subspace. This confirms that the 'universality' here is really just a mix of CNN geometric constraints and the stability of pre-training, rather than a discovered intrinsic property of learning.
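
To make the shared-subspace claim concrete, here is a minimal sketch of the kind of check I mean, assuming torchvision's pretrained ResNet-18 and ResNet-34 (whose first conv layers happen to share the 64x3x7x7 shape); the choice of k and the variable names are arbitrary illustration, not anything from the paper:

    import numpy as np
    from torchvision import models

    def first_conv_filters(model):
        # Flatten each 3x7x7 filter into a 147-dim row vector.
        w = model.conv1.weight.detach().cpu().numpy()
        return w.reshape(w.shape[0], -1)

    # Two independently trained ImageNet models with identically shaped first convs
    # (weights are downloaded on first use).
    a = first_conv_filters(models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1))
    b = first_conv_filters(models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1))

    # Principal k-dimensional subspace of model A's (centered) filters.
    k = 10
    _, _, vt = np.linalg.svd(a - a.mean(axis=0), full_matrices=False)
    basis = vt[:k]  # (k, 147), orthonormal rows

    # Fraction of model B's filter energy lying in model A's subspace.
    b_c = b - b.mean(axis=0)
    captured = np.linalg.norm(b_c @ basis.T) ** 2 / np.linalg.norm(b_c) ** 2
    print(f"ResNet-34 conv1 energy in ResNet-18's top-{k} subspace: {captured:.1%}")

If the captured energy comes out far above what a random 10-dimensional subspace of a 147-dimensional space would capture (roughly 10/147, about 7%, for isotropic data), the filters really do share structure. The interesting question is how much of that is explained by locality alone versus anything deeper about learning.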





I, for one, didn't even realize this was a possible research angle to begin with. Crazy stuff that people are trying, and very cool too!


