Lower latency is definitely a thing. With an FPGA it's possible to 'chase the beam' like the original hardware and get much lower input latency from devices. With an emulator you're going to be fighting the OS and the frameworks you've built on top of. Even if you go "bare metal" (like my friend's BMC64 project, which runs a C64 emulator like a unikernel on the RPi with no OS), you are still dealing with hardware built for usage patterns very different from the classic systems. You're always going to be one or more frames behind.
That is true. There are, however, techniques such as run-ahead that let a software emulator on a PC get lower latency than even the original hardware: https://near.sh/articles/input/run-ahead
The caveat is that it doesn't always work, and it makes the power requirements even more unbalanced, since the emulator has to emulate each frame more than once. Some might also see it as a form of cheating to go below the original game's latency. If you want to match the original game's latency precisely, FPGAs are the way to go right now for sure.
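For anyone who wants to see the shape of the idea, here is a rough toy sketch of run-ahead in Python. It is not taken from any real emulator: `ToyCore`, its fixed two-frame internal lag, and the `run_ahead_frame` and `play` helpers are all made up for illustration, and the speculation simply assumes the current input stays held.

```python
import copy


class ToyCore:
    """Hypothetical emulator core whose game shows input LAG frames late."""

    LAG = 2  # assumed internal input lag of the game, in frames

    def __init__(self):
        self.pending = [0] * self.LAG  # inputs not yet visible on screen
        self.frames_emulated = 0       # host-side counter, not part of the saved state

    def run_frame(self, pad):
        """Advance one frame and return the value 'drawn' on that frame."""
        self.frames_emulated += 1
        self.pending.append(pad)
        return self.pending.pop(0)

    def save_state(self):
        return copy.deepcopy(self.pending)

    def load_state(self, state):
        self.pending = copy.deepcopy(state)


def run_ahead_frame(core, pad, frames_ahead):
    """Emulate one real frame, then display a speculative future frame."""
    core.run_frame(pad)             # the "real" frame; its picture is thrown away
    state = core.save_state()
    shown = None
    for _ in range(frames_ahead):   # speculate: assume the same input stays held
        shown = core.run_frame(pad)
    core.load_state(state)          # rewind so the real timeline is unaffected
    return shown


def play(inputs, frames_ahead=0):
    """Return (values shown per frame, total frames emulated)."""
    core = ToyCore()
    if frames_ahead == 0:
        shown = [core.run_frame(pad) for pad in inputs]
    else:
        shown = [run_ahead_frame(core, pad, frames_ahead) for pad in inputs]
    return shown, core.frames_emulated
```

In this toy, `play(inputs, frames_ahead=2)` shows each button press on the same frame it arrives, where `play(inputs)` shows it two frames late, but it emulates three frames for every frame displayed. That extra work is the cost mentioned above.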
Run-ahead seems pretty cool, great technical write-up. How would you compare it to frame-skipping, a feature I often see implemented in software emulators?
Frame-skipping is just a speed hack that skips rendering every other frame or so, and it makes games very unenjoyable to play. It won't help with input lag at all.
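A quick toy sketch of why, again in Python with made-up names (`play_with_frameskip`, `first_frame_showing`) and an assumed two-frame internal lag: frame-skipping still emulates every frame in order and only drops some of the pictures, so a button press can never reach the screen sooner.

```python
def play_with_frameskip(inputs, frameskip=0, lag=2):
    """Emulate every frame, but render only one in every (frameskip + 1).

    Returns a list of (frame_number, value_shown) for the rendered frames.
    """
    pending = [0] * lag  # inputs not yet visible (assumed internal game lag)
    rendered = []
    for t, pad in enumerate(inputs):
        pending.append(pad)
        out = pending.pop(0)            # emulation always advances one frame
        if t % (frameskip + 1) == 0:    # only the rendering step is skipped
            rendered.append((t, out))
    return rendered


def first_frame_showing(rendered, value):
    """Frame number of the first rendered frame that shows `value`."""
    return next(t for t, shown in rendered if shown == value)
```

With a press arriving on frame 3, the unskipped run first shows it on frame 5, while `frameskip=1` first shows it on frame 6 because frame 5 was never drawn. At best the latency is unchanged, and it can get worse.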