
Yes, but hasn't "separate compression dictionary" been a core feature of Zstd since its inception, at least under that name?


Not even close; zlib had one. (Search for "FDICT" in RFC 1950.) What Zstandard added was a CLI that automatically generates a good-enough dictionary from sample inputs.


      FDICT (Preset dictionary)
         If FDICT is set, a DICT dictionary identifier is present
         immediately after the FLG byte. The dictionary is a sequence of
         bytes which are initially fed to the compressor without
         producing any compressed output. DICT is the Adler-32 checksum
         of this sequence of bytes (see the definition of ADLER32
         below).  The decompressor can use this identifier to determine
         which dictionary has been used by the compressor.
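To make the RFC text concrete, here is a minimal sketch using Python's standard `zlib` module, whose `compressobj`/`decompressobj` accept a `zdict` argument (Python 3.3+) for exactly this preset-dictionary feature. `PRESET_DICT`, `compress_with_dict`, `decompress_with_dict` and `read_dictid` are hypothetical names made up for illustration; the dictionary contents are an assumed example, not anything from the thread. The sketch also reads the FDICT bit and DICTID back out of the header to check that DICTID is the Adler-32 of the dictionary.

```python
import struct
import zlib

# Hypothetical shared dictionary (illustration only): strings we expect to recur
# across many small records. The zlib manual recommends putting the most common
# strings toward the END of the dictionary, closest to the data.
PRESET_DICT = b'{"user_id": , "country": "US", "status": "active", "plan": "premium"}'

FDICT_BIT = 0x20  # bit 5 of the FLG byte, per RFC 1950


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Compress data into a zlib (RFC 1950) stream primed with a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(stream: bytes, zdict: bytes) -> bytes:
    """Decompress a zlib stream; the caller must supply the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(stream) + decomp.flush()


def read_dictid(stream: bytes):
    """Return the DICTID from a zlib header if FDICT is set, otherwise None."""
    flg = stream[1]  # byte 0 is CMF, byte 1 is FLG
    if not flg & FDICT_BIT:
        return None
    # DICTID is 4 bytes, most significant byte first, right after CMF and FLG.
    return struct.unpack(">I", stream[2:6])[0]


if __name__ == "__main__":
    record = b'{"user_id": 42, "country": "US", "status": "active", "plan": "premium"}'
    with_dict = compress_with_dict(record, PRESET_DICT)
    without_dict = zlib.compress(record, 9)

    assert decompress_with_dict(with_dict, PRESET_DICT) == record
    # The header only identifies the dictionary by its Adler-32 checksum.
    assert read_dictid(with_dict) == zlib.adler32(PRESET_DICT)
    assert read_dictid(without_dict) is None
    print(len(with_dict), "bytes with dictionary vs", len(without_dict), "without")
```

Note that the stream carries only the checksum, not the dictionary itself, so both sides have to agree on the dictionary out of band; plain `zlib.decompress` on such a stream fails because the decompressor has nothing to prime itself with.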

Well, wow. I have to wonder why this wasn't used more widely, then. There are a ton of contexts (columnar data in databases, for example) where shared-dictionary compression might have helped a ton well before now.
