Creating parsers that entirely ignore parts of the input is generally a bad idea, because you lose the ability to round-trip. That is, it's often desirable to be able to go text1 -> DOM -> text2 and have text2 be identical to text1, or at least very close to it. This is particularly true for markup languages, which intermix text and tags.
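To make the round-trip property concrete, here is a minimal sketch. It assumes Python's `xml.dom.minidom` as a stand-in for "a parser that keeps everything" (it is an XML parser, not an HTML one), and the `round_trip` helper and sample markup are made up for illustration.

```python
from xml.dom.minidom import parseString


def round_trip(markup: str) -> str:
    """Parse markup into a DOM, then serialize the root element back to text."""
    return parseString(markup).documentElement.toxml()


# Hypothetical sample: runs of spaces and a newline inside the text.
text1 = "<p>Hello   <b>world</b>\n  again</p>"
text2 = round_trip(text1)

# The whitespace survives because the parser kept it in the text nodes.
print(text2 == text1)
```

A parser that dropped or normalized that whitespace while building the tree could not give back text1, which is the loss being described.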
But somehow almost every programming language and data format manages to define these equivalences without ruining its editors. JSON is whitespace-insensitive, yet syntax highlighting it in my editor works fine; I don't know or care how the parser accomplishes that, but it's never caused any problems I've heard of.
I really don't get what you mean. HTML and JSON behave essentially the same way with respect to whitespace. You're the one who seems to be asking for HTML parsers to apply display logic in the parsing step. And sure, JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.
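A quick way to see the JSON side of this, sketched with Python's standard `json` module (the sample documents are invented for illustration): whitespace between tokens disappears on parse, whitespace inside a string does not.

```python
import json

# Same data, different whitespace *between* tokens.
compact = '{"a":[1,2],"s":"two  spaces"}'
spaced = '{\n  "a": [ 1,\t2 ],\n  "s": "two  spaces"\n}'

# Inter-token whitespace is not part of the data model, so these are equal.
same_outside = json.loads(compact) == json.loads(spaced)

# Whitespace inside a string is data and is kept exactly.
kept_inside = json.loads(compact)["s"] == "two  spaces"
differs_inside = json.loads('"a  b"') != json.loads('"a b"')

print(same_outside, kept_inside, differs_inside)
```

The parser can only make that distinction because the string delimiters tell it where user-visible text starts and stops.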
In fact JSON is the perfect example - if you have multiple spaces or \n in a JSON string and load that into some DOM element with JS at runtime, those spaces will be collapsed by the browser renderer just like any whitespace that was part of the original HTML. Because, again, neither the HTML parser nor the DOM does any general whitespace collapsing; only the browser's render step does that, as instructed by CSS.
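Here's a rough sketch of that split between parsing and display, using Python's `html.parser` rather than a browser. `TextCollector`, `parsed_text` and `collapse_for_display` are hypothetical names, and the regex is only a crude approximation of what CSS `white-space: normal` does at render time, not the actual algorithm.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text content exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parsed_text(markup: str) -> str:
    """Return the concatenated text the parser saw, whitespace included."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def collapse_for_display(text: str) -> str:
    """Crude stand-in for render-time collapsing (assumption, not the CSS spec)."""
    return re.sub(r"[ \t\n\r\f]+", " ", text)


markup = "<p>one   two\n\nthree</p>"
raw = parsed_text(markup)          # what the parse step hands over
shown = collapse_for_display(raw)  # what a renderer would roughly display

print(repr(raw))
print(repr(shown))
```

The point of keeping the two functions separate is that the collapsing is a display decision applied afterwards; the parsed text still has every space and newline.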
> JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.
Well, sure. The point is that's an unfortunate design.
But it's a core part of the concept, the whole idea behind a markup language. Basically the whole point of HTML, and of SGML before it, is that you are adding annotations in-line to a text rather than representing the text as a tree-like data structure, at least for much of the document.