Agree with others saying HN needs more content like this!

After reading, I don't get how locks held in memory affect WAL shipping. The WAL reader reads it in a single thread and updates in-memory data structures, periodically dumping them to disk. Perhaps you want to read one big instruction from the WAL and apply it to many buffers using multiple threads?

>Adapting algorithms to work atomically at the block level is table stakes for physical replication

Why? To me, the only thing you have to do atomically is the WAL write. WAL readers can read and write however they want, given that they can detect partial writes and replay the WAL.
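The partial-write detection this relies on is usually a per-record checksum: the reader replays records until one fails its check and treats everything after as a torn tail. A minimal sketch (illustrative record format, not Postgres's actual xlog layout):

```python
import struct
import zlib

def append_record(wal: bytearray, payload: bytes) -> None:
    # Record = payload length (4 bytes) + CRC32 of payload (4 bytes) + payload.
    wal += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def replay(wal: bytes):
    """Yield complete records; stop at the first torn/partial record."""
    off = 0
    while off + 8 <= len(wal):
        length, crc = struct.unpack_from("<II", wal, off)
        payload = wal[off + 8 : off + 8 + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # partial write detected: ignore the tail
        yield payload
        off += 8 + length

wal = bytearray()
append_record(wal, b"insert k=1")
append_record(wal, b"delete k=1")
wal += b"\x0a\x00\x00\x00"  # simulate a torn write at the tail
print(list(replay(wal)))  # -> [b'insert k=1', b'delete k=1']
```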

>If a VACUUM is running on the primary at the same time that a query hits a read replica, it's possible for Postgres to abort the read.

The situation you're referring to is:

1. Record inserted.
2. Long query started on the standby.
3. Record removed.
4. Vacuum started on the primary.
5. Vacuum replicated.
6. Vacuum on the standby cannot remove the record because it is being read by the long query.
7. PG cancels the query to let the vacuum proceed.
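Steps 6-7 can be modeled simply: the vacuum WAL record carries the newest xid whose dead tuples were removed, and any standby query whose snapshot could still see those tuples must be cancelled. A toy model (names are illustrative, not Postgres's actual fields):

```python
# Simplified recovery-conflict check: a vacuum record reports the newest
# transaction id whose removed tuples it cleaned up ("latest_removed_xid").
# A standby query conflicts if its snapshot might still see those tuples,
# i.e. its snapshot xmin is at or below that xid.

def conflicting_queries(latest_removed_xid, standby_snapshots):
    """Return names of queries whose snapshot could see a removed tuple."""
    return [name for name, snapshot_xmin in standby_snapshots
            if snapshot_xmin <= latest_removed_xid]

snapshots = [("long_query", 100), ("new_query", 250)]
# Vacuum record: all tuples deleted by xid <= 180 were removed.
print(conflicting_queries(180, snapshots))  # -> ['long_query']
```

Postgres first waits up to `max_standby_streaming_delay` before cancelling; `hot_standby_feedback` avoids the conflict by holding back vacuum on the primary instead.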

I guess your implementation generates a lot of dead tuples during compaction. You're clearly fighting PG here. Could a custom storage engine be a better option?



Thanks for the questions!

    After reading I don’t get how locks held in memory affect WAL shipping.
    WAL reader reads it in a single thread, updates in-memory data structures
    periodically dumping them on disk. Perhaps you want to read one big
    instruction from WAL and apply it to many buffers using multiple threads?
We currently use unmodified/generic WAL entries and don't implement our own replay. That means we don't control the order of locks acquired/released during replay; the default is to acquire exactly one lock to update a buffer.

But as far as I know, even with a custom WAL entry implementation, the maximum size of one entry would still be ~8kB, which might not be sufficient for a multi-block atomic operation. So the data structure needs to support block-at-a-time atomic updates.
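One common way to get block-at-a-time atomicity is to order the per-block records so that each one leaves the structure consistent on its own: write new data into an unreferenced block first, then flip a single pointer in a metadata block. A hypothetical illustration (not pg_search's actual on-disk format):

```python
# A two-block "add segment" operation split into two single-block WAL
# records. A crash between them is safe: the new block exists but is
# unreachable until the single-block metadata update links it.

class BlockStore:
    def __init__(self):
        self.blocks = {0: {"segments": []}}  # block 0 holds segment metadata

    def apply(self, record):
        kind, blockno, data = record
        if kind == "write_block":
            # Step 1: fill a fresh block that nothing references yet.
            self.blocks[blockno] = data
        elif kind == "link_segment":
            # Step 2: atomic single-block update that publishes the segment.
            self.blocks[0]["segments"].append(blockno)

store = BlockStore()
for rec in [("write_block", 1, {"terms": ["hello"]}),
            ("link_segment", 1, None)]:
    store.apply(rec)
print(store.blocks[0]["segments"])  # -> [1]
```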

    I guess your implementation generates a lot of dead tuples during
    compaction. You clearly fighting PG here. Could a custom storage
    engine be a better option?
`pg_search`'s LSM tree is effectively a custom storage engine, but it is an index (Index Access Method and Custom Scan) rather than a table. See more on it here: https://www.paradedb.com/blog/block_storage_part_one

LSM compaction does not generate any dead tuples on its own; what is dead is controlled by what is "dead" in the heap/table due to deletes/updates. Instead, the LSM cycles blocks into and out of a custom free space map (which we implemented to reduce WAL traffic).
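The allocate/merge/recycle cycle can be sketched as a free list: compaction writes merged segments into allocated blocks and returns the old segments' blocks to the map for reuse, rather than leaving dead tuples behind. A rough sketch (pg_search's actual free space map is a custom on-disk structure; this only shows the recycling idea):

```python
# Blocks are recycled through a free list instead of becoming dead tuples:
# compaction allocates blocks for the merged segment, then returns the
# merged-away segments' blocks to the free list for later reuse.

class FreeSpaceMap:
    def __init__(self):
        self.free = []      # recycled block numbers, reused LIFO
        self.next_new = 0   # next never-used block number

    def alloc(self):
        if self.free:
            return self.free.pop()
        blockno = self.next_new
        self.next_new += 1
        return blockno

    def recycle(self, blocks):
        self.free.extend(blocks)

fsm = FreeSpaceMap()
seg_a, seg_b = fsm.alloc(), fsm.alloc()  # two small segments: blocks 0, 1
merged = fsm.alloc()                     # compacted segment: block 2
fsm.recycle([seg_a, seg_b])              # old segments go back to the map
print(fsm.alloc(), fsm.alloc())          # reuses the recycled blocks
```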



