It's written in Go now, which does have GC, and ultimately all memory is freed ... but there are some issues around intermediate retention of memory, taking more memory than one would expect.
Agreed, very much a user tool -- once it's released most people will get it via brew/yum/apt/conda/chocolatey and compile time will not be relevant for most folks.
Miller has always been a user tool and will remain so.
I think the confusion arose because I did the port per se and the feature-adds first (which took most of the time), and left analysis/optimization (even low-hanging fruit like output-buffering) until the very end -- https://github.com/johnkerl/miller/pull/786 for example merged just a couple of days ago.
Thanks -- my apologies for the sequencing of the dev work. I focused first on the port per se, second on feature-adds, and left benchmarking, performance analysis, and optimizations (even some real low-hanging fruit) until last. The 10x number was indeed the case pre-optimization ... on https://github.com/johnkerl/miller/pull/786 you can see tabulated results where Miller 6 is now on par with Miller 5, and in some cases far faster.
The 10x number was from before the improvements on https://github.com/johnkerl/miller/pull/786 et al. The earlier negative perf results were my fault, not Go's -- I focused initially on the port and feature development, leaving benchmarking and optimization until the end. That said, Go is a bit slower than C line for line; however, Miller 5 (in C) was single-threaded while Miller 6 (in Go) actively uses multicore. This is why complex processing chains now run much faster in Go than in C -- multicore and pipelining are much easier to do in Go.
Indeed -- that was before final optimization work on https://github.com/johnkerl/miller/pull/786 et al., thanks to which Miller 6 performance is now on par with Miller 5 for simple processing, and far better than Miller 5 for complex processing chains. See that PR for tabulated processing-time results.
Distros are still at 5.10, pending a few final issues; once those are resolved, 6.0 can be generally released. Hopefully in a month or so.
The most important release blocker (now resolved) is https://github.com/johnkerl/miller/pull/786 et al., thanks to which Miller 6 performance is now on par with Miller 5 for simple processing, and far better than Miller 5 for complex processing chains.
I saw in some other comments you mentioned some performance lift coming from thread-level parallelism in Go. I wonder -- can you get oversubscription issues if the user is doing their own parallelism? (I assume some folks will implement 'parallelism' by throwing a bunch of independent processes at a bunch of independent records.)
I know in OpenMP (for example) this would be something the user is expected to keep track of, but maybe the Go runtime handles this stuff gracefully?
Also re 'hopefully the distros will keep up' -- some do, some don't. I hope the C-to-Go transition isn't too much of a speedbump, but we should know a couple of distros in.
The other day I ran across https://repology.org/project/miller/versions -- something autogenerated on the web; I don't maintain it. Anyway, you can see most distros are on 5.10, but some are further behind ... I will probably start with Conda and Brew, since those are more personal/interactive, then maybe Fedora and Ubuntu -- ?
Indeed -- in Go technically it's "concurrency" not "parallelism" (something the Go folks are careful to articulate) but with multiple processors available, in effect you get multicore processing so calling it "parallelism" isn't far off :).
More specifically, Miller uses separate "goroutines" (more or less threads) -- two for input (one for raw byte-stream ingest and one for forming records), one for each verb (sort then head would be two verbs), and one for output (records back to strings), all pipelined up. Some of the recent perf improvements came from splitting the record-reader into two concurrent goroutines like that; some of the older perf improvements came from pipelining the verbs.
But yes, if you have say 16 CPUs and you launch 10, or 20, or 30 Miller executables -- in the C impl each would soak a single CPU, and beyond 16 the OS would have to multitask. With Miller 6 it's basically the same, except each executable will try to use more than one CPU if it can. If it can't, the Go runtime and the OS will multitask. And both do tend to handle this gracefully, without tuning on the part of the user.
Also I'd point out that even with big files & deeper multi-verb processing chains, htop rarely shows over 250% CPU, maybe 350% for deeper chains -- the input and output processing typically take most of the time, and the verbs in between not as much.
When we run out of memory, systems crash; when we run out of CPU, things just take a bit longer. I.e. oversubscription is a real but non-fatal issue, for C or for Go ...
I like upbeat (not droopy) and not too many lyrics (which can distract the language-processing parts of the brain that I need to work). Trip-hop fits the bill well for me.
This would be exciting even if I weren't joining TileDB in a couple of weeks -- I'm looking forward to seeing this from the inside out as well as from the outside in.
Some gains were made on https://github.com/johnkerl/miller/pull/1133 and https://github.com/johnkerl/miller/pull/1132