My understanding is that it can't. The proof is "this photo was taken with this real camera and is unmodified". There's no way to know whether the photo's subject is another image generated by AI, a painting made by a human, etc.
I remember when Snapchat was touting "send pictures that delete within timeframes set by you!", and all that would happen is you'd turn to your friend and have them take a picture of your phone.
In the above case, the outcome was messy. But with some effort, people could make reasonable-quality "certified" pictures of damn near anything by taking a picture of a picture. Then there is the more technical approach of physically cracking a device that's in your hands so you can sign whatever you want anyway...
I think the aim should be less about camera hardware attestation and more about the user: "It is signed with their key! They take responsibility for it!"
But then we need:
1. a fully deployed, at-scale public/private key infrastructure, so every user can sign whatever they want to (see the sketch after this list)
2. a world where people are held responsible for their actions...
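
To make point 1 concrete, here's a minimal sketch of the per-user signing step in Python, using the `cryptography` package and Ed25519. The key handling and file name are hypothetical; a real deployment would keep the key in a secure enclave or hardware token bound to a verified identity.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Hypothetical user key; in practice it would live in a secure
    # enclave or hardware token, tied to the user's identity.
    user_key = Ed25519PrivateKey.generate()
    public_key = user_key.public_key()

    photo_bytes = open("photo.jpg", "rb").read()

    # "Signed with their key": the user vouches for these exact bytes.
    signature = user_key.sign(photo_bytes)

    # Anyone holding the public key can check the claim.
    try:
        public_key.verify(signature, photo_bytes)
        print("signature valid")
    except InvalidSignature:
        print("bytes were altered after signing")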
I don’t disagree with including user attestation in addition to hardware attestation.
The notion of there being an “analog hole” for devices that attest that their content is real is correct on its face, but it's a flawed criticism. Right now, anybody on earth can open up an LLM and generate an image. Anybody on earth can open up Photoshop and manipulate an image. And there’s no accountability for where that content came from. But not everybody on earth is capable of projecting an image and photographing it in a way that is indistinguishable from taking a photo of reality, especially once you take into account that these cameras are capturing depth-of-field information, location information, and other metadata.
I think it’s a mistake to demand perfection. This is about trust in media and creating foundational technologies that allow that trust to be restored. Imagine if every camera and every piece of editing software had the ability to sign its output with a description of any mutations. That is a chain of metadata where each link in the chain can be assigned a trust score. If, in addition to device signatures, human signatures are included, that builds additional trust. At some point, it would be inappropriate for news or social media not to use this information when presenting content.
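
To illustrate the chain idea, here's a toy sketch in Python (this is not the C2PA format, just the shape of the idea): each camera or editor appends a record naming itself and the mutation it applied, committing to the current image hash and the previous link, and then signs the record. All names here are invented for illustration.

    import hashlib
    import json

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def append_link(chain, image_bytes, actor, mutation, key):
        # Each link commits to the current pixels, the mutation applied,
        # and a hash of the previous link, then gets signed by the tool.
        record = {
            "actor": actor,          # e.g. "camera-model-x" or "editor-y"
            "mutation": mutation,    # e.g. "capture", "crop", "levels"
            "image_sha256": hashlib.sha256(image_bytes).hexdigest(),
            "prev_sha256": hashlib.sha256(
                json.dumps(chain[-1], sort_keys=True).encode()
            ).hexdigest() if chain else None,
        }
        record["sig"] = key.sign(
            json.dumps(record, sort_keys=True).encode()
        ).hex()
        return chain + [record]

    # The camera signs the capture; an editor then signs its crop.
    camera_key = Ed25519PrivateKey.generate()
    editor_key = Ed25519PrivateKey.generate()
    chain = append_link([], b"raw pixels", "camera-model-x", "capture", camera_key)
    chain = append_link(chain, b"cropped pixels", "editor-y", "crop", editor_key)

A verifier would walk the chain, check each signature against the signer's published key, and assign each link a trust score based on who that signer is.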
As others have mentioned, C2PA is a reasonable step in this direction.
Perhaps if the camera measured depth it could detect a "flat surface" and flag that in the recorded data. Cameras already "know" what is near or far simply by focusing.
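
As a rough sketch of that check, assuming the camera exposes a per-pixel depth map (a made-up API): fit a plane to the depth samples and flag the frame when the plane explains nearly all of the depth variation, as it would for a photo of a photo. The threshold here is arbitrary.

    import numpy as np

    def looks_flat(depth, tol=0.01):
        # Least-squares plane fit: depth ~ a*x + b*y + c.
        h, w = depth.shape
        ys, xs = np.mgrid[0:h, 0:w]
        A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        coeffs, *_ = np.linalg.lstsq(A, depth.ravel(), rcond=None)
        residual = depth.ravel() - A @ coeffs
        # Tiny residual relative to mean depth => scene is one flat surface.
        return np.sqrt(np.mean(residual**2)) < tol * depth.mean()

    # depth = camera.get_depth_map()               # hypothetical API
    # if looks_flat(depth):
    #     metadata["flat_scene_suspected"] = True  # hypothetical flag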
I wonder if a 360-degree image captured alongside the 'main' photo could show that the photo was part of a real scene and not just a photo of an image? Not proof exactly, but getting closer to it.