Way back, I used delta encoding for storing posting lists (the inverted index of a search engine), and I experimented with using GPUs to decode them. It turned out that, as another reply mentioned, copying a posting list from CPU memory to GPU memory took far too long. If the posting list is static it can be copied to GPU memory once, which makes decoding faster, but there is still the bottleneck of copying the results back into CPU memory.
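For anyone unfamiliar: delta encoding stores each doc ID as the gap from its predecessor, and decoding is a prefix sum, which is part of what makes GPU decoding tempting since prefix sums parallelize well. A minimal sketch (plain Python ints; real systems would pack the gaps with a varint or PFOR scheme):

```python
# Hypothetical sketch of delta-encoding a sorted posting list of doc IDs.

def delta_encode(postings):
    """Store each doc ID as the gap from the previous one."""
    deltas = []
    prev = 0
    for doc_id in postings:
        deltas.append(doc_id - prev)
        prev = doc_id
    return deltas

def delta_decode(deltas):
    """Prefix-sum the gaps to recover the original doc IDs."""
    postings = []
    total = 0
    for gap in deltas:
        total += gap
        postings.append(total)
    return postings

postings = [3, 7, 21, 22, 90]
deltas = delta_encode(postings)  # [3, 4, 14, 1, 68] -- small gaps compress well
assert delta_decode(deltas) == postings
```

The gaps are much smaller than the raw IDs, so a variable-length code spends far fewer bits per entry.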
Nvidia's unified memory architecture may improve this, since the same memory can be shared between the CPU and GPU.
AMD has had unified memory for ages in HPC and for a while now in the Strix Halo systems. I haven't had the chance to play with one yet, but I have high hopes for some of our complex simulation workloads.
Oh neat. I have some related unpublished SOTA results I want to release soon: PEF/BIC-like compression ratios, with faster boolean algebra than Roaring Bitsets.
If the CPU touches an address mapped to the GPU, doesn't it fault a page into the CPU address space? I mean, the program doesn't do anything special, but a page gets faulted in, I believe.
Very true, but in recent years feature development has taken precedence over efficiency. VP of whatever says hardware is cheap, software engineers are not.