
How do garbage collected languages like F# or Dlang deal with this? Is everything just boxed?


In modern JS engines on 64-bit CPUs, when the engine cannot deduce types and must use a generic word to represent any kind of value, numbers (doubles) are not boxed. Instead, everything else is encoded with a NaN tag bit pattern. That is, the code checks whether the word matches the NaN pattern. If it does not, the word is taken to be a double. Otherwise the word is checked to see whether it is a real NaN or something else masked as a NaN.

This slows down object access, since detecting and stripping the NaN tag requires a few CPU instructions. Plus, it assumes that pointers only have 48 bits, with the rest being zeros (true for AMD, ARM and Intel) or at least having fixed values (which can be arranged on more exotic CPUs). But it means numbers do not have to be boxed, which greatly reduces GC pressure.
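A rough sketch of the check described above, in Python for readability. The tag values (`PTR_TAG`, `TAG_MASK`) and all helper names are made up for illustration and are not taken from any particular engine; real engines pick their own layouts.

```python
import math
import struct

# Hypothetical layout, for illustration only.
TAG_MASK = 0xFFFF_0000_0000_0000       # top 16 bits of the word
PTR_TAG = 0xFFFC_0000_0000_0000        # a quiet-NaN prefix reserved for pointers
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF   # low 48 bits carry the pointer
CANONICAL_NAN = 0x7FF8_0000_0000_0000  # the only bit pattern used for real NaNs


def double_to_bits(x: float) -> int:
    """Reinterpret a double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(word: int) -> float:
    """Reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def box_double(x: float) -> int:
    # Real NaNs are canonicalized so they can never collide with PTR_TAG.
    if math.isnan(x):
        return CANONICAL_NAN
    return double_to_bits(x)


def box_pointer(addr: int) -> int:
    # This is where the 48-bit pointer assumption bites.
    if addr < 0 or addr & ~PAYLOAD_MASK:
        raise ValueError("pointer does not fit in 48 bits")
    return PTR_TAG | addr


def is_pointer(word: int) -> bool:
    return (word & TAG_MASK) == PTR_TAG


def unbox(word: int):
    """Return ("pointer", address) or ("double", value)."""
    if is_pointer(word):
        return ("pointer", word & PAYLOAD_MASK)  # strip the tag
    return ("double", bits_to_double(word))


if __name__ == "__main__":
    print(unbox(box_double(1.5)))
    print(unbox(box_pointer(0x7FFF_DEAD_BEE0)))
```

The mask-and-compare in `is_pointer` plus the mask in `unbox` are the "few CPU instructions" of overhead; doubles themselves pass through untouched.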


> Plus, it assumes that pointers only have 48 bits, with the rest being zeros (true for AMD, ARM and Intel)

Except when it isn't: https://en.wikipedia.org/wiki/Intel_5-level_paging

That's not something you're likely to run into on consumer hardware, but with JS being used on the server I wonder if/when JS engines will need to deal with that.


That's probably quite some time off. As an application, you'd only get the new level of paging if you asked for it, like with PAE. And you're only going to ask for it if you need more than 256TB of address space, which seems like a rather large space to need.

I guess you could have a lot of files mmaped though?
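For scale, here is the arithmetic behind those figures as a quick Python check. The 57-bit width for 5-level paging comes from the linked Wikipedia page, not from this thread, and the function name is just illustrative.

```python
TIB = 2**40  # bytes in a tebibyte
PIB = 2**50  # bytes in a pebibyte


def address_space_bytes(address_bits: int) -> int:
    """Size of a flat address space with the given number of address bits."""
    return 2**address_bits


four_level = address_space_bytes(48)  # 4-level paging: 48-bit virtual addresses
five_level = address_space_bytes(57)  # 5-level paging: 57-bit virtual addresses

print(four_level // TIB, "TiB")  # 256 TiB
print(five_level // PIB, "PiB")  # 128 PiB
```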


ARM also has pointer authentication (PAC), which stores a cryptographic signature in the upper bits of a pointer and may one day become a blocker.


When a pointer has more bits than the 52 bits of a NaN allow, the trick is to replace pointers with indexes from the start of the JS heap. This is less efficient, even with an arrangement such as aligning the heap on, say, a 4GB boundary or some other power-of-two granularity, so that the full pointer can be recovered with a single bit operation rather than an add. But if one wants efficiency, the thing to do is make sure the types in the code are stable, and the JIT will generate type-specific code.
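A minimal sketch of that heap-index idea, assuming the heap base is aligned on 4GB and the heap is no larger than 4GB. The names `compress` and `decompress` and the constants are hypothetical, not any engine's actual API.

```python
HEAP_ALIGN = 1 << 32           # assumed: heap base is a multiple of 4GB
OFFSET_MASK = HEAP_ALIGN - 1   # low 32 bits hold the index into the heap


def compress(ptr: int, heap_base: int) -> int:
    """Turn a full pointer into a 32-bit offset from the heap base."""
    if heap_base & OFFSET_MASK:
        raise ValueError("heap base is not 4GB-aligned")
    offset = ptr - heap_base
    if not 0 <= offset < HEAP_ALIGN:
        raise ValueError("pointer is outside the heap")
    return offset


def decompress(offset: int, heap_base: int) -> int:
    """Recover the full pointer from the offset."""
    # The low 32 bits of heap_base are zero, so OR gives the same result as add.
    return heap_base | offset


if __name__ == "__main__":
    base = 0x0123_4567_0000_0000  # needs 57 bits, too wide for a NaN payload
    ptr = base + 0x1000
    small = compress(ptr, base)
    print(hex(small), hex(decompress(small, base)))
```

The offset fits comfortably in a NaN payload regardless of how wide the real pointers are, at the cost of an extra operation on every dereference.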



This describes a 32-bit CPU. As opposed to V8, SpiderMonkey, Mozilla’s JS engine, uses 64-bit words and NaN boxing, even on a 32-bit CPU, to represent a generic JS value.


I'm pretty sure V8 doesn't use NaN boxing even on 64-bit. Do you have docs or code that show it does?

Yes other engines use NaN boxing, but not all engines do.


No, mixing boxed and unboxed types is very typical for garbage collected languages. As the article shows, you can distinguish with a bit whether something is a boxed type or not, so you do that during GC. For many GCs, you typically have another bit for marking (whether you need it depends on the GC technique -- copying GC may not need it). With modern 64-bit CPUs, usually the three lowest bits are free (because every struct will be 8-byte aligned and thus pointers end in 0b000) that you can use for tagging. This will give you 61 bit ints, which work just as described in the article.
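To make the low-bit tagging concrete, here is a small Python sketch that simulates 64-bit words. The tag assignments (`0b000` for pointers, `0b001` for small ints) are an assumption for illustration; actual runtimes differ in which tags they use and how many bits they spend.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1   # 0b111
TAG_POINTER = 0b000              # assumed: 8-byte-aligned pointers are stored as-is
TAG_INT = 0b001                  # assumed: tag for unboxed small integers
WORD_MASK = (1 << 64) - 1        # simulate a 64-bit machine word
INT_MIN = -(1 << 60)             # 61-bit signed range
INT_MAX = (1 << 60) - 1


def tag_int(n: int) -> int:
    """Pack a 61-bit signed integer into a tagged 64-bit word."""
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("value does not fit in 61 bits")
    return ((n << TAG_BITS) | TAG_INT) & WORD_MASK


def untag_int(word: int) -> int:
    """Recover the integer: reinterpret as signed, then arithmetic shift right."""
    if word >= 1 << 63:
        word -= 1 << 64
    return word >> TAG_BITS


def tag_pointer(addr: int) -> int:
    """Aligned pointers already end in 0b000, so they need no rewriting."""
    if addr & TAG_MASK:
        raise ValueError("pointer is not 8-byte aligned")
    return addr


def is_int(word: int) -> bool:
    return (word & TAG_MASK) == TAG_INT


def is_pointer(word: int) -> bool:
    return (word & TAG_MASK) == TAG_POINTER


if __name__ == "__main__":
    for value in (5, -1, INT_MAX, INT_MIN):
        word = tag_int(value)
        print(hex(word), is_int(word), untag_int(word))
```

A GC scanning a word only needs `is_pointer` to decide whether to follow it, which is the distinction described above.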


> No, mixing boxed and unboxed types is very typical for garbage collected languages. As the article shows, you can distinguish with a bit whether something is a boxed type or not, so you do that during GC.

Both languages I mentioned use regular 8/16/32/64 bit size values, which is why I asked.


In F# (.NET) primitives aren't boxed unless they are stored in a variable/field of type System.Object. Generic methods are specialized by the JIT compiler when their generic arguments are primitive types.


Upd: I noticed that someone mentioned upthread that OCaml uses this to fit sum types that are either an int or a pointer into a single 64-bit value.

F# doesn't do that and always stores a separate tag field for sum types. This is marginally less efficient, but doesn't make interop with 99% of languages and data formats in existence awkward.


In my experience with Dlang, I can't remember seeing 63-bit ints or boxing, in either the language or the standard lib.



