As this article shows, browsers actually collapse spaces differently based on the specific CSS applied - and this is in fact intended behavior, not some corner case.

Also, the output of HTML parsers is the HTML structure, and changing that to collapse spaces would break numerous tools. So while probably all HTML renderers do some kind of space collapsing, there are many other uses of HTML parsing that don't. Most likely the syntax highlighting in your HTML editor of choice in fact relies on a space-preserving HTML parser, just for one example.
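To illustrate the point about parsers preserving whitespace, here is a small sketch using Python's standard-library `html.parser`. It is chosen only because it ships with Python; no claim is made about what any particular editor uses. The parser passes text to its `handle_data` callback exactly as it appeared in the source, and any collapsing is left to whatever consumes the output. `TextCollector` and `text_chunks` are hypothetical names for illustration.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Records every text chunk the parser reports, without modifying it."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Text arrives as written, including runs of spaces and newlines.
        self.chunks.append(data)

def text_chunks(source):
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    return parser.chunks

chunks = text_chunks("<p>two  spaces</p>\n<pre>  indented\n   code</pre>")
# The double space, the newline between elements and the indentation all survive parsing.
assert "".join(chunks) == "two  spaces\n  indented\n   code"
```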



> browsers actually collapse spaces differently based on the specific CSS applied

Sure. But they do all collapse spaces. I don't think anyone wants their browser to always preserve all the spaces that are in the source.

> and this is in fact intended behavior, not some corner case.

Eh, maybe. They collapse the spaces of block elements like block elements and the spaces of inline elements like inline elements; that seems like the obvious thing your renderer would do if you didn't make any deliberate design decision.

> So while probably all HTML renderers do some kind of space collapsing, there are many other uses of HTML parsing that don't. Most likely the syntax highlighting in your HTML editor of choice in fact relies on a space-preserving HTML parser, just for one example.

I very much doubt it. And even if it did, that would be an incredibly backwards reason to keep that behaviour - "we've spent all this effort working around our bad standard, that would be wasted if we fixed the standard".


Creating parsers which entirely ignore parts of the input is generally a bad idea, because you lose the ability to round-trip. That is, it's often a desirable property to have a way to go text1 -> DOM -> text2, and have text2 be identical to text1, or at least very close to it. This is particularly true for markup languages, which intermix text and tags.
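A toy sketch of what round-tripping means, under a deliberately simplified grammar (a tag is anything from `<` to the next `>`; comments, scripts and `>` inside attribute values are not handled). Each token keeps its raw source text, so concatenating the tokens reproduces the input exactly. A variant that collapses whitespace at parse time cannot do that. `tokenize`, `serialize` and `collapse` are hypothetical helpers, not any real parser's API.

```python
import re

# Toy grammar: '<...>' is a tag, runs without '<' are text, and a stray '<'
# is kept as text so that no input character is ever dropped.
TOKEN = re.compile(r"<[^>]*>|[^<]+|<")

def tokenize(source):
    """Split source into (kind, raw_text) tokens, keeping every character."""
    tokens = []
    for m in TOKEN.finditer(source):
        raw = m.group()
        kind = "tag" if raw.startswith("<") and raw.endswith(">") else "text"
        tokens.append((kind, raw))
    return tokens

def serialize(tokens):
    return "".join(raw for _, raw in tokens)

def collapse(tokens):
    """Lossy variant: collapses whitespace in text tokens, as a renderer might."""
    return [(kind, re.sub(r"\s+", " ", raw) if kind == "text" else raw)
            for kind, raw in tokens]

doc = "<ul>\n  <li>one</li>\n  <li>two  words</li>\n</ul>\n"
assert serialize(tokenize(doc)) == doc            # text1 -> tokens -> text2 == text1
assert serialize(collapse(tokenize(doc))) != doc  # the layout cannot be recovered
```

The lossless version is what an editor or formatter needs. The collapsed version is only useful once you have already decided how the text will be displayed.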


But somehow almost every programming language and data format manages to define these equivalences without it ruining their editors. JSON is whitespace-insensitive, but syntax highlighting it in my editor works fine; I don't know or care what parser implementation accomplishes that, but it's never caused any problems I've heard of.


I really don't get what you mean. HTML and JSON behave essentially the same way in relation to spaces. It's you who seems to be asking for the HTML parsers to apply display logic in the parsing step. And sure, JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.
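The JSON half of the comparison is easy to check with Python's standard `json` module. Whitespace between tokens has no effect on the parsed value, while whitespace inside a string literal is part of the value. This is a minimal illustration, and the sample documents are made up.

```python
import json

compact = '{"name":"a  b","tags":["x","y"]}'
spaced = '{\n  "name" :   "a  b",\n  "tags" : [ "x" ,\n "y" ]\n}'

# Insignificant whitespace between tokens is discarded by the parser...
assert json.loads(compact) == json.loads(spaced)
# ...but whitespace inside a string is data and is kept verbatim.
assert json.loads(spaced)["name"] == "a  b"
# Re-serializing yields the compact form; the layout of `spaced` is gone.
assert json.dumps(json.loads(spaced), separators=(",", ":")) == compact
```

HTML has no equivalent of "outside a string" for most of its content, which is why the same trick does not carry over.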

In fact JSON is the perfect example - if you have multiple spaces or \n in a JSON string and load that into some DOM element with JS at runtime, those spaces will be eaten up just as much by the browser renderer as any spaces that were part of the original HTML. Because, again, HTML and even the DOM don't do any kind of space collapsing; only the browser render step does that, as instructed by CSS.
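A rough sketch of that render-time step follows. Under the CSS default `white-space: normal`, a run of spaces, tabs and line feeds is displayed as a single space, while `white-space: pre` keeps the text as-is. The text node itself is unchanged; only what gets painted differs. `rendered_text` is a hypothetical helper, and the model is a simplification. The actual CSS Text rules also cover trimming at line starts and ends, collapsing across inline element boundaries, segment-break handling in some scripts, and other `white-space` values.

```python
import re

def rendered_text(text, white_space="normal"):
    """Simplified model of how a renderer displays a DOM text node.

    Only two CSS white-space values are modeled: 'normal' collapses every run
    of spaces, tabs and line feeds into one space; 'pre' leaves text untouched.
    """
    if white_space == "pre":
        return text
    if white_space == "normal":
        return re.sub(r"[ \t\n]+", " ", text)
    raise ValueError(f"unsupported white-space value: {white_space!r}")

# For example, a string that came from JSON and was set as an element's text.
node_text = "line one\n    line   two"
assert rendered_text(node_text) == "line one line two"
assert rendered_text(node_text, "pre") == node_text
```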


> JSON parsers discard whitespace information outside of JSON strings, but that only works because JSON has an explicit string type. In HTML everything is a user-visible string unless it's a tag, so the same logic fundamentally can't be applied.

Well, sure. The point is that's an unfortunate design.


But it's a core part of the concept, the whole idea behind a markup language. Basically the whole point of HTML, and of SGML before it, is that you add annotations inline in a text rather than representing the text as a tree-like data structure, at least for much of the document.
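One way to see the "annotations inside a text" model is how a tree API has to represent mixed content. The sketch below uses Python's `xml.etree.ElementTree`. That is an XML parser, chosen only because it ships with Python and builds a tree; HTML parsing rules differ in detail. The prose ends up split across the `text` and `tail` attributes around each tag, and reading only the text back out recovers the sentence.

```python
import xml.etree.ElementTree as ET

# A well-formed snippet so the XML parser accepts it.
p = ET.fromstring("<p>Some <b>bold</b> text, then <i>more</i>.</p>")
b, i = list(p)  # the two inline annotations

assert p.text == "Some "           # text before the first tag
assert b.text == "bold"            # text inside <b>
assert b.tail == " text, then "    # text after </b>, stored on the <b> element
assert i.tail == "."
# Dropping the tags and reading only the text gives back the sentence.
assert "".join(p.itertext()) == "Some bold text, then more."
```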
