Your exact command works reliably but is inefficient. And it works by design, not accident. For starters, the default block size in most netcat implementations is tiny like 4 kB or less. So there is a higher CPU and I/O overhead. And if netcat does a partial or small read less than 4 kB, when it writes the partial block to the nvme disk, the kernel would take care of reading a full 4kB block from the nvme disk, updating it with the partial data block, and rewriting the full 4kB block to the disk, which is what makes it work, albeit inefficiently.