1. outperforms a bunch of binary formats (as well as all other text formats)
2. has zero dependencies with fast compile times (even the derive macros are custom and don't use syn family crates as build-time deps)
Mixing units from milliseconds down to picoseconds made the columns too wide. All of the benchmark data is also available as JSON in the benchmark_results directory if you want to drop it into Sheets or some other analysis tool.
After trying to like rkyv, I came to the conclusion that the serialization/deserialization API is not ergonomic due to its heavy use of generics.
My first thought was "How does it handle untrusted input?" and they have a page dedicated to it: https://rkyv.org/validation.html
But the phrasing on that page does not exactly inspire confidence ("...good defaults that will work for most archived types...", "...it's not possible or feasible to ensure data integrity with these use cases..."). Is this actually usable for untrusted data or is it mostly used in scenarios where you already know the data is fine?
As for the second quote, the surrounding context explains that the validator will by default return an error if you point to a single object using two different pointers with different opinions on what the pointee's type is. This doesn't sound like a safety issue, since the validator is being too conservative rather than not conservative enough.
The first quote is probably in part referring to the second quote. If that is all it is referring to, then there is no safety issue. If there are other similar issues but rkyv chooses to reject valid archives rather than accept invalid ones, then there also is no safety issue. However, that isn't unambiguous, so I can't say for certain that it isn't possible to misuse the library from safe Rust.
Author here, you’re correct. You can customize your validation context for your specific needs. For example, if you don’t have allocation available (i.e. `#![no_std]` without the alloc crate) then you’ll probably need to write your own mapping system to handle shared pointers. Or you can just not use them if that works better for you. That’s also a large part of why rkyv uses generics so heavily.
If your data is read-only then pointing to the same object from two locations is (usually) fine. But rkyv also supports in-place mutability, which requires validating that no two pointers overlap each other. Otherwise you could have simultaneous mutable borrows of the same value, which is UB.
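To ground the validation discussion in code: below is a minimal sketch of the two access paths from safe Rust, assuming rkyv ~0.7 with the `validation` feature enabled (the API has moved around between versions). `check_archived_root` walks and validates untrusted bytes before handing out a reference, while `archived_root` skips validation and is therefore `unsafe`.

```rust
// Sketch only: assumes rkyv ~0.7 with the `validation` feature.
use rkyv::{Archive, Serialize};

#[derive(Archive, Serialize)]
// Derive CheckBytes for the generated ArchivedEvent so it can be validated.
#[archive(check_bytes)]
struct Event {
    id: u64,
    payload: Vec<u8>,
}

fn main() {
    let event = Event { id: 7, payload: vec![1, 2, 3] };
    // Serialize into an aligned buffer (256 bytes of scratch space).
    let bytes = rkyv::to_bytes::<_, 256>(&event).expect("serialize");

    // Untrusted input: validate the whole archive before touching it.
    let archived = rkyv::check_archived_root::<Event>(&bytes[..])
        .expect("archive failed validation");
    assert_eq!(archived.id, 7);

    // Trusted input only: no validation at all, hence `unsafe`.
    let unchecked = unsafe { rkyv::archived_root::<Event>(&bytes[..]) };
    assert_eq!(unchecked.id, 7);
}
```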
I tried Rkyv and one aspect that I came to dislike was the lack of an explicit schema. I know that some people consider this a feature, but after using it I wish there was a more clear path for migrations.
Sorry to be pedantic, but there’s nothing “zero copy” in a computer. Every single operation copies a bunch of things around.
Performance in computing is about managing copy granularity by trying to align spatial and temporal locality (optimal batching), which relies on end-to-end analysis rather than just one step in the computation (whether a specific operation is a copy, a virtual copy (CoW), a move, or a virtual move by reference, etc.). Avoiding a copy has time costs, such as indirection and incidental coupling: you give up time to save space. Sometimes that's smart, sometimes not.
We have countless stories about how removing poor caching and going brute force speeds up an app.
Specifically for serialization and storage, we also have increasingly many examples where storing something compressed and computing from the compressed form on demand is faster than keeping an uncompressed copy you can access without transformation, because of how the ratio of memory/IO bandwidth to compute bandwidth has shifted over time (i.e. slow IO, fast compute).
For example, it's often better to pack data into varints or delta-compress sequences than to naively store them raw in RAM, as long as they're read roughly in sequence or in groups.
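As a concrete (and deliberately simple) illustration of that idea, not taken from any particular crate: delta-encode a sorted `u64` sequence into LEB128 varints and walk it back in order. The space saving only pays off when reads are roughly sequential, which is the point above.

```rust
// Pack a sorted sequence as varint-encoded deltas (illustrative sketch).
fn pack_deltas(sorted: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &v in sorted {
        let mut delta = v - prev; // small if neighboring values are close
        prev = v;
        loop {
            let byte = (delta & 0x7f) as u8;
            delta >>= 7;
            if delta == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80); // continuation bit
        }
    }
    out
}

// Decode sequentially, accumulating deltas back into absolute values.
fn unpack_deltas(mut bytes: &[u8]) -> Vec<u64> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    while !bytes.is_empty() {
        let mut delta = 0u64;
        let mut shift = 0;
        loop {
            let byte = bytes[0];
            bytes = &bytes[1..];
            delta |= ((byte & 0x7f) as u64) << shift;
            shift += 7;
            if byte & 0x80 == 0 {
                break;
            }
        }
        prev += delta;
        out.push(prev);
    }
    out
}

fn main() {
    let ids: Vec<u64> = (0..10_000u64).map(|i| 1_000_000 + i * 3).collect();
    let packed = pack_deltas(&ids);
    // 10k u64s are 80,000 bytes raw; closely spaced values pack ~8x smaller.
    println!("raw: {} bytes, packed: {} bytes", ids.len() * 8, packed.len());
    assert_eq!(unpack_deltas(&packed), ids);
}
```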
Avoiding a copy and a transform feels nice, but often isn't a win. It should only be applied when specific use cases show that compute is the bottleneck.
Just to chime in on a detail: often the "zero-copy" term is used in the context of data "forwarding" use cases (serving data loaded from disks or passed through a proxy etc).
The idea there is to have zero "extra" copies of the data.
Obviously if you're reading data from disk and sending it over the network there is at least one copy going on: the copy from the disk to the network card buffer.
And unless your network controller buffer is directly addressable by the disk controller, there is also going to be another copy of the data sitting in RAM.
That's often still considered zero copy IO mainly because the CPU is not involved in the copying (which happens via DMA).
In case of network to network forwarding the data can even stay within the same buffers on the same network controller.
Protocols that involve sending large blobs of uninterpreted bytes are particularly amenable to this kind of optimization.
For example, you can send a large blob of data from disk and pump it onto a TCP connection without that data going through the CPU at all. Practical necessities such as encryption make that even more challenging, but there are some network controllers supporting hardware encryption, so at least in principle that's still possible.
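As a rough, Linux-only sketch of that disk-to-TCP path in Rust, going through the `libc` crate's sendfile binding directly (the file name and port are placeholders, and error handling is minimal):

```rust
// Sketch: stream a file to a TCP peer via sendfile(2); Linux-only,
// requires the `libc` crate. File name and port are made up.
use std::fs::File;
use std::net::TcpListener;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let file = File::open("blob.bin")?; // hypothetical payload on disk
    let len = file.metadata()?.len() as usize;

    let listener = TcpListener::bind("127.0.0.1:9000")?;
    let (socket, _peer) = listener.accept()?;

    let mut sent = 0usize;
    while sent < len {
        // The kernel moves file pages into the socket buffers; the bytes
        // never pass through a userspace buffer in this process.
        let n = unsafe {
            libc::sendfile(
                socket.as_raw_fd(),
                file.as_raw_fd(),
                std::ptr::null_mut(), // use and advance the file offset
                len - sent,
            )
        };
        if n < 0 {
            return Err(std::io::Error::last_os_error());
        }
        if n == 0 {
            break; // unexpected EOF (e.g. file truncated underneath us)
        }
        sent += n as usize;
    }
    Ok(())
}
```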
However, when you also throw data serialization into the mix, you often end up getting the CPU involved in converting data between the raw form you read from disk and the encoded/serialized form in the network response.
For example consider a server that returns a stream of data using the gRPC ByteStream API which is a gRPC stream of ReadResponse messages that contain a single `bytes` field.
The protobuf encoding of that field is just a field tag plus the field size plus the raw bytes of the payload (see the sketch below). In principle it could be possible to implement a gRPC serializer that avoids materializing the body of that data field in memory and instead relies on sendfile or other primitives that can potentially apply further optimizations, or at least take the CPU out of the loop.
But doing that is hard, and affects the API of the serializer and every library that uses it.
I'm not aware of any open-source protobuf+gRPC implementations that do this (I may be wrong; it's been a long time since I checked and I may be missing something).
I don't mind that the rest of the serialized message would need to be unpacked/repacked to/from different representations, as generally the size of that part is dwarfed by the large blobs.
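To illustrate why the blob itself never needs re-encoding, here is a hedged sketch of the wire framing for a length-delimited `bytes` field: the header is just a varint key plus a varint length, and the payload follows verbatim. The field number below is illustrative, not taken from the actual ByteStream proto.

```rust
// Illustrative only: hand-rolled protobuf framing for a `bytes` field.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // continuation bit
    }
}

// Build only the few header bytes for field `field_number`, wire type 2
// (length-delimited); the payload itself is appended verbatim elsewhere.
fn bytes_field_header(field_number: u32, payload_len: usize) -> Vec<u8> {
    let mut header = Vec::new();
    encode_varint(((field_number as u64) << 3) | 2, &mut header); // key
    encode_varint(payload_len as u64, &mut header); // length prefix
    header
}

fn main() {
    let payload = vec![0u8; 4 << 20]; // stand-in for a 4 MiB blob on disk
    let header = bytes_field_header(1, payload.len());
    // A zero-copy-aware serializer could write `header` from memory and then
    // hand the payload to sendfile/writev instead of concatenating them.
    println!("header: {:?}, payload: {} bytes", header, payload.len());
}
```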
> In principle it could be possible to implement a gRPC serializer
FWIW it isn't that much work to rig up a C++ gRPC implementation that just does whatever you want with some buffers. That is, you provide SerializationTraits for your type, and you provide the function that converts your type to what is the functional equivalent of an iovec. This helps minimize copies because, as in your example, it's often useful to add a preamble to some large byte buffer; doing so in a naive codelab style with the vanilla protobuf API would probably result in extra copies. However, I also note that recent releases of protobuf C++ allow bytes fields to be represented by an absl::Cord, which enables you to refer to existing large buffers instead of copying them. In either case you need a pointer that can be sent with `sendmsg`, not `sendfile`.
We use Rkyv heavily at Climatiq to embed already-parsed data into our binaries, which we then serve on an edge network. For the happy path it works very nicely and is extremely fast. However, the fact that there's a separate archived version of each struct does become a bit of a bother.
apache/arrow-rs: https://github.com/apache/arrow-rs
From https://arrow.apache.org/faq/ :
> How does Arrow relate to Flatbuffers?
> Flatbuffers is a low-level building block for binary data serialization. It is not adapted to the representation of large, structured, homogenous data, and does not sit at the right abstraction layer for data analysis tasks.
> Arrow is a data layer aimed directly at the needs of data analysis, providing a comprehensive collection of data types required to analytics, built-in support for “null” values (representing missing data), and an expanding toolbox of I/O and computing facilities.
> The Arrow file format does use Flatbuffers under the hood to serialize schemas and other metadata needed to implement the Arrow binary IPC protocol, but the Arrow data format uses its own representation for optimal access and computation.