> What led us here was a need to create an additional thread within an existing ...

mark_undoio · on Feb 28, 2022

> I'm curious to hear more. What's its purpose?

Sure! I'll try to illustrate the general idea, though I'm taking liberties with a few of the details to keep things simple(r).

Our software (see https://undo.io) does record and replay (including the full set of Time Travel Debug stuff - executing backwards, etc) of Linux processes. Conceptually that's similar to `rr` (see https://rr-project.org/) - the differences probably aren't relevant here.

We're using `ptrace` as part of monitoring process behaviour (we also have in-process instrumentation). This reflects our origins in building a debugger - but it's also because `ptrace` is just very powerful for monitoring a process / thread. It is a very challenging API to work with, though.

One feature / quirk of `ptrace` is that you can't really do anything useful with a traced thread that's currently running - including peeking its memory. So if a program we're recording is just getting along with its day we can't just examine it whenever we want.
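To make that quirk concrete, here is a minimal sketch (not Undo's code) that attaches to a forked child with `PTRACE_SEIZE`, which does not stop it, and then tries to peek its memory straight away. The names `peek_running_tracee_errno` and `running_marker` are made up for illustration. On Linux the peek is expected to fail with `ESRCH`, because the tracee is not in a ptrace-stop.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical word we try to read from the child (same address after fork). */
static long running_marker = 0x1234;

/*
 * Returns the errno produced by peeking a tracee that has NOT been stopped
 * (expected: ESRCH), 0 if the peek unexpectedly succeeded, or -1 if the
 * setup (fork / attach) failed, e.g. because ptrace is not permitted.
 */
int peek_running_tracee_errno(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child just idles until it is killed */
    }

    int result = -1;
    /* PTRACE_SEIZE attaches without stopping the tracee. */
    if (ptrace(PTRACE_SEIZE, child, NULL, NULL) == 0) {
        errno = 0;
        long value = ptrace(PTRACE_PEEKDATA, child, (void *)&running_marker, NULL);
        /* -1 can be valid data in general, so errno decides whether it failed. */
        result = (value == -1 && errno != 0) ? errno : 0;
    }

    kill(child, SIGKILL);       /* clean up the child in every case */
    waitpid(child, NULL, 0);
    return result;
}
```

Calling `peek_running_tracee_errno()` from a small `main` and comparing the result with `ESRCH` is enough to see the behaviour.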

Our first choice is simply to avoid messing with the process, but sometimes we really do need to interact with it. We could just interrupt a thread, use `ptrace` to examine it, then start it up again. But there's a problem - in the corners of Linux kernel behaviour there's a risk that this will have a program-visible side effect. Specifically, you might cause a syscall restart not to happen.
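As a rough sketch of that "interrupt, examine, resume" sequence (an illustration only, not how Undo does it): the tracer seizes a forked child, stops it with `PTRACE_INTERRUPT`, waits for the stop, peeks one word, and continues it. `peek_after_interrupt` and `stopped_marker` are hypothetical names. The child here sits in `pause()`, so the interrupt lands in the middle of a blocking syscall, which is exactly the kind of situation where restart behaviour comes into play.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical word we read from the child (same address after fork). */
static long stopped_marker = 0x5eed;

/*
 * Interrupts a seized child, peeks one word from it, then resumes it.
 * Returns 0 and stores the word in *value_out on success, -1 on any failure
 * (including ptrace not being permitted in this environment).
 */
int peek_after_interrupt(long *value_out)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child blocks in a syscall */
    }

    int rc = -1;
    int status = 0;
    if (ptrace(PTRACE_SEIZE, child, NULL, NULL) == 0 &&
        ptrace(PTRACE_INTERRUPT, child, NULL, NULL) == 0 &&
        waitpid(child, &status, 0) == child && WIFSTOPPED(status)) {
        /* The tracee is now in a ptrace-stop, so peeking is allowed. */
        errno = 0;
        long value = ptrace(PTRACE_PEEKDATA, child, (void *)&stopped_marker, NULL);
        if (errno == 0) {
            *value_out = value;
            rc = 0;
        }
        /* Let the tracee run again; the kernel decides how pause() resumes. */
        ptrace(PTRACE_CONT, child, NULL, NULL);
    }

    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return rc;
}
```

This works, but it means stopping a thread that belongs to the program being recorded, which is the part we'd like to avoid.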

So when we're recording a real process we need something that:

* acts like a thread in the process - so we can peek / poke its memory, etc via `ptrace`
* is always in a known, quiescent state - so that we can use `ptrace` on it whenever we want
* doesn't impact the behaviour of the process it's "in" - so we don't affect the process we're trying to record
* doesn't cause SIGCHLD to be sent to the process we're recording when it does stuff - so we don't affect the process we're trying to record

Our solution is double clone + magic flags. There are other points in the solution space (manage without, handle the syscall restarting problem, ...) but this seems to be a pretty good tradeoff.
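I'm not spelling out the exact flags here, so the following is only an assumption-laden sketch of one ingredient, not the real thing: a single `clone()` with `CLONE_VM` and no exit signal in the low byte of the flags. That gives a helper that shares the caller's memory and, per clone(2), does not signal the parent when it terminates. It does not show the second clone or anything `ptrace`-related, and `run_quiet_helper`, `helper_main` and friends are made-up names.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define HELPER_STACK_SIZE (64 * 1024)

static volatile sig_atomic_t sigchld_seen;  /* counts SIGCHLD deliveries */
static volatile int shared_value;           /* written by the helper */

static void on_sigchld(int signo)
{
    (void)signo;
    sigchld_seen++;
}

/* Runs in the helper; it shares our address space because of CLONE_VM. */
static int helper_main(void *arg)
{
    (void)arg;
    shared_value = 42;
    return 0;
}

/*
 * Creates a memory-sharing helper with no exit signal, reaps it, and reports
 * the value it wrote plus how many SIGCHLDs arrived. Returns 0 on success.
 */
int run_quiet_helper(int *value_out, int *sigchld_out)
{
    struct sigaction sa = {0}, old;
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigchld_seen = 0;
    shared_value = 0;
    if (sigaction(SIGCHLD, &sa, &old) == -1)
        return -1;

    int rc = -1;
    char *stack = malloc(HELPER_STACK_SIZE);
    if (stack != NULL) {
        /* Assumed flags: CLONE_VM only, exit signal 0 (so no SIGCHLD). */
        pid_t pid = clone(helper_main, stack + HELPER_STACK_SIZE, CLONE_VM, NULL);
        int status;
        /* Children without SIGCHLD as exit signal need __WALL (or __WCLONE). */
        if (pid != -1 && waitpid(pid, &status, __WALL) == pid) {
            *value_out = shared_value;
            *sigchld_out = sigchld_seen;
            rc = 0;
        }
        free(stack);
    }
    sigaction(SIGCHLD, &old, NULL);
    return rc;
}
```

If this behaves as the man page describes, the helper's write is visible to the caller and the SIGCHLD handler never runs, which covers the "shares memory" and "no SIGCHLD" bullets in a toy setting.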

[edit: fixed a typo]

aidenn0 · on March 1, 2022

I looked into something similar for implementing a concurrent GC. I ended up just using mmap() and ptrace() since I did have to manipulate the process for certain barrier operations; I probably could have done it with non-ptrace system calls; there are tradeoffs to be made (either way you need to interrupt any pending system calls, but there are multiple ways of doing that).

switch33 · on March 2, 2022

The problem record and replay is expansions of languages and apis too. That is a good thing for some things but it needs to be reworded sometimes too and implementations of things aren't always newer versions of things either.

mark_undoio · on March 2, 2022

> The problem record and replay is expansions of languages and apis too. That is a good thing for some things but it needs to be reworded sometimes too and implementations of things aren't always newer versions of things either.

Changes to languages and APIs can be a problem for record/replay, depending on exactly how it's implemented.

Undo's core tech, rr (and, arguably, GDB's built-in record/replay) operate at the level of machine instructions and operating system calls, so changes to language and library behaviours don't generally affect us, outside of a few corner cases.

When you have that, you don't need to even know what the language is in order to operate - though if you want source-level debugging then it does matter as you have to be able to map from "your program counter is here" to "you're at this source line".

We occasionally need to add support for new system calls but an advantage of Linux is that the kernel ABI is very stable. New extensions to CPU instruction sets also require work - these can be harder to support but they change more slowly.

Of course, operating at such a low level isn't the only way to record/replay - there are distinct costs and benefits to operating at a higher level in the stack.

kccqzy · on Feb 28, 2022

Maybe some kind of snapshotting for an in-memory database?