Well, they also claim to be able to cache build steps in a way that is somehow independent of the build system.
> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused
> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.
Yeah, I agree. This part is hand-waved away without any technical description of how they manage to pull this off, since knowing what a build step even is, and what its dependencies and outputs are, is only possible at the process level (to disambiguate multi-threaded builds). And then there are build steps that have side effects, which come up a lot with CMake+ninja.
So they could in principle get a full list of dependencies of each build step. Though I'm not sure how they would skip those steps without having an interposer in the build system to shortcut it.
But initially the article sounded like it was describing a mix of tup and Microsoft's git vfs (https://github.com/microsoft/VFSForGit) mushed together. But doing that by itself is probably a pile of work already.
Yes, you are correct - SourceFS also caches and replays build steps in a generic way.
It works surprisingly well, to the point where it’s hard to believe until you actually see it in action (here is a short demo video, but it probably isn't the best way to showcase it: https://youtu.be/NwBGY9ZhuWc?t=76 ).
We intentionally kept the blog post light on implementation details - partly to make it accessible to a broader audience, and partly because we will be posting more details gradually. Sounds like build caching/replay is high on the list of desired blog posts - ack :-).
The build-system integration used here was a one-line change in the Android build tree. That said, you’re right - deeper integration with the build system could push the numbers even further, and that’s something we’re actively exploring.
Yeah, that’s what I meant. I bet you the build must be invoked through a wrapper script that interposes all executables launched within the product tree. Complicated, but I think it could work. Skipping steps correctly is the hard part, but maybe you do that by somehow knowing ahead of time which files that process will access, and then skipping the launch and materializing the output (they also mention they have to run it once in a sandbox to detect the dependencies). But still, side effects in build systems seem difficult to account for correctly; I bet you that’s why it’s a “contact us” kind of product - there’s work needed to make sure it actually works on your project.
I think I got the magic part. You can store all build system binaries in the VFS itself. When any binary gets executed, VFS can return a small sham binary instead that just checks command line arguments, if they match, checks the inputs, and if they match, applies the previous output. If there is any mismatch, it can execute the original binary as usual and make the new output. Easy and no process hacking necessary.
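To make that guess concrete, here is a minimal Python sketch of the check-then-replay logic such a stand-in binary could run. It is only an illustration of the idea in this comment, not how SourceFS works: `run_step`, `step_key` and the cache layout are made-up names, and the inputs and outputs are passed in explicitly rather than discovered by the filesystem.

```python
import hashlib
import json
import shutil
import subprocess
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def step_key(argv, inputs):
    """Hash the command line together with the contents of every input file."""
    record = {
        "argv": list(argv),
        "inputs": {str(p): file_digest(p) for p in sorted(map(str, inputs))},
    }
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def run_step(argv, inputs, outputs, cache_dir):
    """Replay cached outputs on an exact match, otherwise run the real command.

    Hypothetical helper: returns "replayed" or "executed".
    """
    entry = Path(cache_dir) / step_key(argv, inputs)
    if entry.is_dir():
        # Same arguments and same input contents: drop the old outputs in place.
        for i, out in enumerate(outputs):
            shutil.copyfile(entry / str(i), out)
        return "replayed"

    # Any mismatch: run the original binary as usual and record its outputs.
    subprocess.run(argv, check=True)
    tmp = entry.with_name(entry.name + ".tmp")
    tmp.mkdir(parents=True, exist_ok=True)
    for i, out in enumerate(outputs):
        shutil.copyfile(out, tmp / str(i))
    tmp.rename(entry)  # publish the record only once it is complete
    return "executed"
```

This ignores environment variables, the working directory, and side effects outside the listed outputs, which is exactly where the correctness concerns raised above come in.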
I used to use a Python program called ‘fabricate’ which did this. If you track every file a compiler opens, then if the same compiler is run with the same flags, and no input changed, you can just drop a cached copy of the outputs in place.
I’m actually disappointed this type of thing never caught on. It’s fairly easy on Linux to track every file a program accesses, so why do I need to write dependency lists?
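As a rough illustration of the "track every file" part, the sketch below parses a syscall trace such as one produced by `strace -f -e trace=open,openat -o trace.log <command>` and splits the opened paths into inputs and outputs. The function name `classify_accesses` is made up for this example, the parsing is deliberately naive (it ignores renames, relative directories and escaped quotes), and it is not a claim about how fabricate or SourceFS do it.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "src/main.c", O_RDONLY|O_CLOEXEC) = 3
#   [pid 4242] open("out.o", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]*)",\s*([A-Z_|0-9]+)[^)]*\)\s*=\s*(-?\d+)'
)


def classify_accesses(trace_text):
    """Return (inputs, outputs) as sets of paths seen in an strace-style log."""
    reads, writes = set(), set()
    for line in trace_text.splitlines():
        match = OPEN_RE.search(line)
        if match is None:
            continue  # not an open/openat call
        path = match.group(1)
        flags = match.group(2).split("|")
        result = int(match.group(3))
        if result < 0:
            # Failed opens are skipped here; a stricter tracker would record
            # them too, since a file appearing later can change the build.
            continue
        if "O_WRONLY" in flags or "O_RDWR" in flags:
            writes.add(path)
        else:
            reads.add(path)
    # A file that was both read and written is treated as an output.
    return reads - writes, writes
```

The two sets are what a cache needs: hash the inputs to build the key, and store the outputs as the result to replay.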
> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused
> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.
Which sounds way too good to be true.