The article has fair points, but after trying OCaml and Rust... I chose Rust. Without going into huge amounts of detail, a compiler is more than simply a parser/AST/code generator, and there are other aspects to consider, such as the richness of the ecosystem, editor support, etc. Also, I suspect the author is more familiar with OCaml than Rust, as you wouldn't typically box everything but would more likely use an arena for the AST. In the same way, I am more familiar with Rust than OCaml, so some of the warts I observed may be due to lack of familiarity. As such, I suspect the author's perspective is biased... as is mine. Nothing wrong with that.
Can you write an equivalent piece of code that shows why Rust wins here with more familiarity and leveraging the ecosystem? With compilers, I don't think there's a huge amount of ecosystem to lean on. I think what TFA does is a good case study in trying to be objective: write it both ways and compare.
I'm just saying a compiler is a program, and while the more important things are heavy algorithm related, supporting libraries like those for error handling, etc. all still matter and add up. No problem if you disagree - just my perspective. This isn't so much an objective thing as it is a personal opinion.
Personally, it’s not so much that I disagree. It’s more that OCaml has a top-notch ecosystem for compiler writing. That’s probably by far its strongest point, with libraries like Menhir having few equivalents in other ecosystems.
Not that there is anything wrong with Rust if you are ready to pay the price of having so many low-level things to deal with.
Can you explain what Menhir does better than other parser frameworks? For what it's worth, I'm not a huge fan of parser frameworks. I tend to prefer hand written parsers, either via a combinator library or fully manually.
Rust has a pretty darn good ecosystem too btw. chumsky for parsing, rowan for syntax trees, salsa for incremental computation, miette for errors, inkwell/walrus for code generation.
Yeah chumsky is fantastic, albeit with some rough compile errors and some unsafe. Still my favorite parser library by far and I’ve used a bunch by now.
You use the generated parser as a platform for experimentation.
If you know the language, and you have a bunch of users, and you are writing a parser for it, by all means, write a parser by hand and give it the best error recovery that you can muster. If you are developing a language and want to do a bunch of experiments, it pays dividends to use a parser generator. And then there is the whole space of DSLs and mini-languages you encounter, where beautiful error messages are a nice-to-have, but you would rather ship a generated parser and move on to more important work.
It’s easy to focus on compilers from the perspective of familiar languages like writing compilers for Rust or for OCaml, but you may end up writing a compiler that gets used by a much smaller number of people, for smaller tasks.
That's great, and I agree that for these use cases a parser generator makes sense. But that doesn't answer my question about what makes these parsers particularly elegant. Nor does it seem to be a benefit specific to Menhir, since any parser generator has the quick iteration speed.
I don't mean to blame you for that; you are answering a question about something that you did not say. But I find it frustrating that Menhir seems to be cited as this fantastic cornerstone of the OCaml ecosystem when I haven't been presented with a good example of how it's better than any other parser generator. Not just my parent comment (who also decided to cast aspersions on my knowledge), but others in the OCaml community too.
> with a good example of how it's better than any other parser generator.
I already told you that it has full support for disambiguating LR(1) grammars while still generating a parser which is easy to read. How do you want me to paste a full parser into an HN comment?
Most generators only support LALR(1) grammars, which is limiting, and don’t deal with corner cases as gracefully.
I get that you are hell-bent on wanting Rust to prevail here, but Rust will always be a subpar experience for writing anything which doesn’t strongly benefit from its low-level primitives. Rust has annoying semantics and a convoluted syntax. I can bear with that when the performance is needed, but writing a compiler in it is just unnecessary pain. It’s also one of the only things for which I would actually use OCaml.
Okay… so it accepts more grammars than other parser generators. That doesn’t seem massive, if I’m being honest. If you had said Menhir works with your IDE (navigating C code inside bison drove me nuts), had good support for error recovery, and, idk, gave you syntax highlighting for free, I’d agree. But a minor upgrade in grammars? Not exactly Christmas here.
I have a question for you then: why is it that so many projects that are not performance bound and are not low-level systems projects use Rust and not OCaml? OCaml had a 22-year head start, after all.
I didn’t intend to answer your question—your comment had multiple aspects to it and I am responding to one part of it, rather than every single part of it.
My main take on the “which language do I write my compiler in” conversation is that some of the various code transformation passes are just more convenient to write in a language which makes it easy to use both mutation and garbage collection, which puts OCaml above both Rust and Haskell. I think parsing is an easier problem to solve in the first place.
I don’t think so - I’ve written parsers using flex/lex, yacc/bison, ANTLR, a bunch of functional combinator libraries, and maybe others I’ve forgotten, but now I would never consider anything except hand-written recursive descent with an embedded Pratt parser for expressions and precedence.
Simple to write, debug, recover from errors, provide decent error messages, unit test, integrate into build systems, IDEs etc.
I also believe that nearly all the popular compilers these days do something similar - gcc’s parsers were rewritten in this fashion years ago because of the technical benefits I’ve listed above.
There was nothing objective about the article. The moment I saw seemingly random new lines in the Rust for no reason, I knew there was going to be a “line counts!!!!” sentence. When there was, I stopped reading, because it was apparent that any semblance of objectivity was completely missing.
It would be nice if somebody who believes the author is being intentionally deceptive could offer what they consider to be an idiomatic Rust solution to this very routine problem in compilation. The Rust and OCaml code from the article looked decent to me.
I don't think anyone was claiming the author was being intentionally deceptive, just that they're not very good at writing idiomatic Rust.
The first thing I'd suggest, as someone who's done their fair share of Rust compiler dev, is to stop using `Box` and `Ref` everywhere. Terms should not own their child terms; the idiomatic way to handle this is to use an arena and typed IDs, and pass a reference to the arena when you need access to the underlying data from the ID.
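To make that concrete, here is a minimal sketch of the arena-plus-typed-ID pattern, with purely illustrative names (`Term`, `TermId`, `Arena`) rather than anything from the article:

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct TermId(u32);

enum Term {
    Bool(bool),
    Not(TermId),
    And(TermId, TermId),
}

#[derive(Default)]
struct Arena {
    terms: Vec<Term>,
}

impl Arena {
    // Push a term and hand back a small, copyable ID instead of a Box or reference.
    fn alloc(&mut self, term: Term) -> TermId {
        let id = TermId(self.terms.len() as u32);
        self.terms.push(term);
        id
    }

    // Deref an ID: the borrow checker only sees a short-lived borrow of the arena.
    fn get(&self, id: TermId) -> &Term {
        &self.terms[id.0 as usize]
    }
}

fn main() {
    let mut arena = Arena::default();
    let t = arena.alloc(Term::Bool(true));
    let not_t = arena.alloc(Term::Not(t));
    if let Term::Not(child) = arena.get(not_t) {
        assert!(matches!(arena.get(*child), Term::Bool(true)));
    }
}

Terms hold IDs rather than owning their children, and anything that needs the underlying data takes a reference to the arena alongside the ID.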
So the solution to writing idiomatic code in the language whose essential feature is the ownership semantics and borrow checker is to not use the ownership semantics and borrow checker?
This is sort of a flippant response, but I'll answer it seriously.
The premise of this question is incorrect: you can't not use the borrow checker in Rust. Instead, you are satisfying the borrow checker by ensuring at the point of use (deref of the ID) that there is valid data by providing a provably-live reference to the arena in which it was allocated.
In another language like C/C++, you'd just use a raw pointer, which is ergonomically-nice, but there's no guarantee that it actually points to a valid object. Rust forces you toward a solution that is guaranteed to be safe, and logically makes sense when you think about it -- nodes in a graph should not own other nodes in the graph; instead, they should be owned by the graph itself, and other nodes should simply hold references to each other. This is how you tend to implement safe and efficient graphs in C++ anyways, the only difference being that in Rust your references are indices, whereas in C++ your references are raw pointers.
Although I think the implementation of the gensym function is broken, the article does explain that it wasn’t possible to use &mut u32 because multiple references to the value were required. It would be more idiomatic to use Cell rather than RefCell, but there’s nothing really wrong with using RefCell as far as I can see.
I like Rust, but this isn't one of its strong suits. The code in the article is what rustfmt outputs (playground link below), so it's at least syntactically idiomatic.
I've been seriously at it with Rust for about six months now and really loving it. What do you mean by "use an arena for the AST?" What is an arena in this context?
If you're curious about arena allocators, look at the rust compiler itself. It uses one for all of its allocations as regular heap allocation was not performant enough.
I figured as much but was unsure. So the gain is that you still take heap allocations, but you do it in a "novelty" heap allocator that you will probably dispose of entirely at some point?
1. If current top of heap + allocation size > buffer size, allocate an extra buffer for the arena.
2. Save the current top in a temp variable.
3. Add the allocation size to the top.
4. Return the temp variable.
It's fast, and the amortized memory overhead per allocation is near zero because you don't need to track allocation sizes or free lists, since you only free the whole arena at once (one or more buffers).
It's ideal for anything where the lifetime of allocations is known to be roughly the same. E.g. a data structure you'll free all at once.
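A minimal sketch of that bump-allocation scheme in Rust, assuming a single fixed-size buffer (chaining extra buffers, as described above, is left out; the `Bump` name and API are just illustrative):

struct Bump {
    buf: Vec<u8>,
    top: usize,
}

impl Bump {
    fn with_capacity(cap: usize) -> Self {
        Bump { buf: vec![0; cap], top: 0 }
    }

    // Returns the offset of the new allocation, or None if the buffer is full.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.top + size > self.buf.len() {
            return None; // a real arena would chain an extra buffer here
        }
        let start = self.top; // save the current top
        self.top += size;     // bump the top by the allocation size
        Some(start)           // return the saved top
    }

    // Freeing is wholesale: resetting the top releases every allocation at once.
    fn reset(&mut self) {
        self.top = 0;
    }
}

fn main() {
    let mut arena = Bump::with_capacity(1024);
    let a = arena.alloc(16).unwrap();
    let b = arena.alloc(32).unwrap();
    assert_eq!((a, b), (0, 16));
    arena.reset();
}

No per-allocation headers, no free lists: just a pointer bump on allocation and a single reset to free everything.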
I don't quite follow the algorithm here, but I'm not sure the `gensym` Rust implementation works as expected. `RefCell::clone` does not return a copy of the reference; it returns a new `RefCell` with the current `RefCell`'s value, resulting in duplicate IDs. However, a `RefCell` isn't even necessary here, since a `Cell` would do just fine - and you'd pass around a reference to that `Cell` instead of cloning it.
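A quick sketch of that `Cell`-based gensym, assuming an illustrative symbol format (not the article's):

use std::cell::Cell;

fn gensym(counter: &Cell<u32>) -> String {
    let n = counter.get();
    counter.set(n + 1);
    format!("_x{n}")
}

fn main() {
    let counter = Cell::new(0);
    // Sharing one &Cell hands out distinct IDs; cloning a RefCell would
    // instead fork the counter and produce duplicates.
    assert_eq!(gensym(&counter), "_x0");
    assert_eq!(gensym(&counter), "_x1");
}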
It does feel like the code was ported as-is to Rust, and only adjusted slightly to compile; there are going to be pain points as a result of this process. I suspect this is the source of some of the author's complaints, especially given:
> Although it provides us with a greater sense of how the code is executing, it brings very little value to the algorithm itself.
Rust is, in general, for people who find value in having that information; it is okay to not want to have to worry about ownership, borrowing, safety, etc., but it seems a bit odd to complain about this when that's what Rust is for? If you want to focus on just the algorithm, and not how it's executing, then OCaml is definitely a valid choice.
However, the point about GADTs - can Rust's recently-stabilized GATs not work in the same way? Though I will admit that Rust's GATs don't seem nearly as powerful as OCaml's GADTs in this regard.
> it seems a bit odd to complain about this when that's what Rust is for
that's the point of the article - rust gives you a lot of low-level control, but if you don't actually need that control then you're paying the cost in ergonomics for nothing.
Exactly. I did get the impression that the author is more familiar with OCaml than Rust. However I don't think they were claiming Rust's greater low-level control makes it inferior to OCaml in general. They're just saying it makes it less suitable for writing compilers, since (in the author's opinion) this level of low-level control isn't necessary for that task.
I skimmed the article, and comparing the two programs, yes, the OCaml one is shorter and more elegant. But it also reads like a math magic spell. There's no type annotations for me to figure out what the heck each term is. The naming conventions lean extremely terse. Perhaps it's my lack of experience with OCaml, but it doesn't feel as legible.
The Rust one reads like...well a program. A program that's not as beautiful, but is very much designed to be taken apart, debugged, improved, etc.
I fully agree that if you're writing pure, recursive, data structure manipulations, OCaml is likely a better fit. It's closer to mathematical notation and I see the elegance in that. But if I were to take that data structure manipulation and turn it into a compiler with thousands of lines that I navigate with an IDE, with logging, with annoying little edge cases, with dozens of collaborators, I'd choose Rust.
> There's no type annotations for me to figure out what the heck each term is
You can add additional annotations in OCaml if you want, or just query the type of a term in Merlin.
> a compiler with thousands of lines that I navigate with an IDE, with logging, with annoying little edge cases, with dozens of collaborators, I'd choose Rust.
Why? OCaml supports logging and IDEs. Simple elegant code without the burden of manual memory management, makes it better able to cope with edge cases, being taken apart and refactored etc. Less of the complexity budget has already been spent.
I think it’s your inexperience talking here, as programs in both languages use mostly the same naming conventions and abbreviations; the main difference is that the OCaml is its usual terse, to-the-point self, while the Rust is, well, far less terse.
Yeah isn't that my point? Rust isn't trying to be short or elegant. There's no zen of Rust. There are elegant aspects of Rust, but it's not a central goal. Whereas with OCaml it's trying to be elegant. It's encouraging people to write a program where you read it, go "wait, how does that work?", then re-read it and marvel at the beauty of it.
To be clear, elegance is important. A language absent of elegance would be a bore to write (cough Java cough). But too much elegance and it can eclipse the legibility of the language. No type annotations is elegant. Is it legible? Not in my opinion. But perhaps it is in yours.
It is definitely very legible here and at least as legible as the Rust equivalent. It’s actually a lot more legible than the Rust equivalent here to be fair.
The OCaml program is mostly matching followed by a chain of operations. It’s far removed from elegance for elegance’s sake. Meanwhile, Rust is handicapped by the machinery it forces you to deal with as a lower-level language that introduces lifetimes.
Type annotations are a non-issue. They are systematically provided in OCaml in a separate interface (.mli) file rather than alongside the code. That interface file is not provided here because, well, it’s a blog post.
I suppose that's ultimately in the eye of the beholder. I suspect that if this compiler were to be 10x, 100x bigger, you'd see a pretty big difference. Like rustc, as massive as it is, is pretty legible as compilers go.
You can browse compiler source code here with type annotation and code navigation if you like https://htzh.github.io/browse-ocaml/ . The editor plugins typically don't work on the compiler itself. These annotations and links were generated fairly painlessly from Ocaml compiler features (most hard work done by the compiler already).
Thanks for the links! I'll try to give it a read sometime.
But also, the editor plugins don't work on the compiler itself? Why's that? So you can't use Merlin on the OCaml compiler? That doesn't seem great.
And what's with the syntax highlighting being funky? The highlighter appears to think that a lot of these files have unterminated comments or string literals.
> There's no type annotations for me to figure out what the heck each term is.
If types can be perfectly inferred, I don't see why I wouldn't take advantage of it. Whatever Ocaml IDE you use will easily be able to tell you the types anyways
I have no problems with type inference that happens within a limited scope. Once you get into the territory of type inference across multiple functions, you get a problem similar to C++ template errors (before C++20), where the compiler can tell that the types aren't correct, but it's not clear where the actual mistake was made: did you pass a value with the wrong type to a function at the top, did you use a wrong function call five levels deeper, or did you access a wrong property somewhere in the middle? I am not sufficiently familiar with OCaml to tell how much of a problem it is for the specific example in the article, but I have some experience with C++ template errors, and I remember a similar problem in Haskell when type annotations were used too sparingly. Not every programming language will have errors as bad-looking as C++ templates, but even if the errors are short, it doesn't change the inherent problem: the compiler can't know what the intention behind a large type-inferred region was. More explicit types split the inference into smaller regions, which means the error message will be closer to the actual place where the programmer made the mistake; it also allows checking each region individually for correctness.
As someone who wrote a fair amount of Rust and OCaml code, I have to agree with the author.
While working at Routine (YC W21), I was tasked with porting our core library to iOS to minimize duplication of business logic. This was a lucky opportunity to write something resembling a compiler: it took in schemas described with our in-house data exchange library and generated C (for FFI) and Swift code (for the end users, i.e., iOS developers).
Since Routine uses OCaml for everything (which was a big motivator for joining the company—I wanted to see how that would work out), I wrote it in OCaml. The end result is a 3-5k LOC project. It's by no means a full compiler, but it was lots of fun to write. The language got in the way incredibly rarely. On average, it made my life a lot easier. We did encounter our fair share of issues, mostly due to the cross-compilation tooling[1], third-party libraries, and intricacies of FFI. Those do take their toll on sanity.
I tried my hand at writing small compilers / interpreters in Rust, and the experience was nowhere near as smooth. It was fun, and the runtime performance is definitely there, but the ergonomics aren't the same. I especially miss first-class modules whenever I code in something other than OCaml now.
[1]: we initially used esy [2], flirted with Nix, and eventually switched to opam-cross-ios [3].
[2]: https://github.com/esy/esy/
[3]: https://github.com/ocaml-cross/opam-cross-ios
Compilers are in this weird spot where they are really mathematically defined programs (which OCaml excels at implementing), while also having high runtime efficiency as a requirement (the reason why C/C++ are such prominent languages for compilers).
With such requirements, I think a point that is fair to make is that Rust acts as a great middle-ground. It avoids the cost of automatic memory management and provides low-level control while also having a more powerful type system and a more "functional" style.
Brushing off the actual efficiency of the produced binary seems like a huge oversight when dealing with a compiler.
I am not sure that the runtime efficiency of the compiler binary is that important. People like fast compile times, but that is more to do with language design than the choice of language for the compiler.
You could write a compiler for Pascal in Python or another very slow language and it would be faster than a Rust or C++ compiler written in Rust or C++. That is because those languages have designs that make compilation algorithmically slow, while Pascal was designed to be fast to compile.
Almost every compiler is fast on toy-sized programs. E.g. the standard Java compiler is pretty fast, and uses little resources.
It becomes visible when you build a large project: you notice that when you face 100k LOCs, efficiency of every compiler's part starts to matter, and RAM usage may grow to uncomfortable levels if your compiler does not care enough.
>People like fast compile times, but that is more to do with language design than the choice of language for the compiler.
People like fast compile times and people either like to use (or are forced to use) languages that are inherently slow to compile. That's exactly why compiler performance is absolutely critical.
The cost of automatic memory management is latency and increased memory usage. In a soft real-time system like a game, the garbage collector may cause lag spikes so you miss the head-shot on your opponent. You also require at least 50% more memory for efficient automatic memory management. Throughput, however, is not one of the costs. You can in fact achieve higher throughput with automatic memory management than with manual memory management.
Any compiler that gets used at runtime (usually branded a JIT) ends up growing performance hacks or being written from scratch to run quickly. JavaScript engines are prone to using multiple compilers based on how frequently code is executed. That's also what the whole -O0 -O3 -flto -thin-lto -pgo etc. flags are about: granting permission to burn different amounts of time during compilation.
It's really easy to accidentally write code that walks off a performance cliff on unexpected input, but that's likely to get hacked around if someone reports it as slow compilers do annoy people.
The author doesn't seem to have much experience in writing Rust.
For instance, passing RefCell<u32> by value as their code does makes no sense (just use u32...), and the code seems to have a lot of clones, most of which are probably unnecessary, while not having a single instance of the "mut" keyword.
In fact, I'm pretty sure it's completely broken, since their gensym doesn't do what they want, due to their wrong use of clones and refcells (it should take an &mut u32 and just increment it).
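For what it's worth, a sketch of the `&mut u32` version this comment suggests, with an illustrative symbol format:

fn gensym(counter: &mut u32) -> String {
    let n = *counter;
    *counter += 1;
    format!("_x{n}")
}

fn main() {
    let mut counter = 0u32;
    // Each call reads and advances the same counter, so IDs never repeat.
    assert_eq!(gensym(&mut counter), "_x0");
    assert_eq!(gensym(&mut counter), "_x1");
}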
Why does it seem like I'm hearing about OCaml all the time now? It could just be frequency bias but it wasn't that long ago that I'd never heard of it and now it seems to be getting a lot of attention online.
I think we're cycling back towards a general preference for statically typed languages, for one thing. Additionally, a number of traditionally functional language characteristics have been finding more widespread adoption among popular languages. Putting these together, OCaml is on a short list of languages that are functional and statically typed and, uh, perhaps "intuitive" is the word I want — Haskell is not very intuitive for many people due to its lazy evaluation scheme.
In my opinion, OCaml would see even more widespread use if the documentation were improved. I find it a chore to figure out how to use OCaml well. I also would like to use third-party libraries like Jane Street's Base because they've put a lot of work into providing even more functionality in their standard library, but their documentation is absolutely atrocious (where it exists at all).
OCaml is a mature language but does not have a very supportive ecosystem. I'm hoping the renewed interest will prompt changes there.
> Why does it seem like I'm hearing about OCaml all the time now?
I felt that way about a dozen years ago. These things have cycles, apparently. But they also recently released multi-core OCaml in OCaml 5 which opens some doors for OCaml that were previously not open.
the ocaml ecosystem has been going through something of a renaissance over the last few years (i believe because the build and package management tooling hit some sort of inflection point with dune and opam respectively), so there's been a lot of increased interest in it. it was (imo, of course) always a very pleasant language to use, and produced small and fast executables; the tooling was what was really holding it back.
To some degree a lot of people are now finding OCaml via dev YouTube influencers who highlight OCaml without actually having used it. There's a lot of enthusiasm, kind of like how lots of people were loving and super excited about Rust without ever using it even for toy programs.
Edit:
I have used OCaml in production, and currently I don't see a point in doing it again for the vast majority of problems. From a holistic language + runtime point of view, OCaml occupies a space where it's not useful enough from a runtime perspective to replace any of the more convenient languages that exist, and not low-level enough to fill the spot of any of the good alternatives in that space. Modularity-wise, functors are nice, but ultimately plenty of alternatives exist even in the lower-level languages.
With all that said, people should probably use the hell out of it if they're excited. It's a bit tiring seeing the constant stream of misinformation regarding alternatives to OCaml, though. There are good reasons it's losing out in industrial use to even languages like Haskell.
> Lisps can be very flexible, but they usually lack static type safety, opening a wide and horrible door to run-time errors.
People should do basic research before writing something silly like this. Qualifying your statement with 'usually' is just a chicken-sh*t approach. Common Lisp and Racket have optional strong typing, leaving the responsibility and choice to the developer. Common Lisp is great for implementing compilers. You also have things like Typed Racket and Coalton; the latter is completely statically typed, à la MLton.
This starts out as a fair comparison but evolves pretty quickly towards a one-sided recommendation for Ocaml. I'm quite sure that there are _some_ advantages of Rust that are not listed here and would be curious to learn more about them too.
I normally love articles comparing programming languages at real tasks, but this article seems very low quality to me. The author clearly doesn't understand how rust thinks about programs. Instead, they're trying to pretend that rust is an alternate syntax for ocaml and being surprised to find it comes up short.
The same article could easily be written the other way around. We could start with a high performance rust program (which makes use of arena allocators, internal mutation and any other rust features you love) and then try and convert it line by line into ocaml. We would find that many of rust's concepts can't be clearly expressed in ocaml. The ocaml code would end up uglier and measurably slower than rust. And just like that the article would reach the opposite conclusion - that rust is clearly the better language!
But this is silly.
In general, you obviously can't translate between languages line by line like this and expect to have a good time. A beautiful C program is constructed using different ideas than a beautiful Lua program. And a beautiful Ocaml program is very different from a beautiful rust program.
Some obvious examples of ocaml ideas being overapplied to rust in this article:
1. The types don't really need to be wrapped in Rc here.
2. Rust generally prefers mutable imperative code over applicative code. And if you insist on applicative patterns, functions should take a &Foo.
3. Rust code usually doesn't rely on recursion that much, so the lack of guaranteed TCO isn't something people in the community care about.
4. Rust is optimized for runtime performance over code beauty or code size. Of course rust is less elegant looking than a garbage collected language! The trade is that it should also run faster. But where are the benchmarks to make the comparison fair?
The match example is just straight out bad rust code. This code:
fn eval(term: &Term) -> Value {
    match term {
        Bool(b) => Value::Bool(*b),
        Not(m) => match eval(m) {
            Value::Bool(b) => Value::Bool(!b),
            _ => panic!("`Not` on a non-boolean value"),
        },
        // ... lots more nested matches & panics
    }
}
Can be flattened, to approximately halve the length of the program like this:
fn eval(term: &Term) -> Value {
    match term {
        Bool(b) => Value::Bool(*b),
        Not(Value::Bool(b)) => Value::Bool(!b),
        // ... (all other valid patterns)
        _ => panic!("{term} invalid"),
    }
}
There's an old saying: "Every programming language you learn should teach you to see programs in a new way". Rust is not a crappy alternate syntax for ocaml any more than ocaml is a crappy, alternate syntax for rust. The only thing I learned from the article is that the author doesn't know rust well enough to evaluate it.
I would probably put a trait implementation on Value that does `.invert_bool`, and have that panic. That way eval is just `eval(m).invert_bool()`, and if it panics, it panics.
What you really probably want is to make this return a Result, and then have `not` be a function of type Value -> Result<Value, ErrType>, so you can do not(eval(m)) and panic at the top level.
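Something like this minimal sketch, where `Value`, `EvalError`, and the exact shape of `not` are illustrative rather than taken from the article:

#[derive(Debug)]
enum Value {
    Bool(bool),
    Int(i64),
}

#[derive(Debug)]
enum EvalError {
    NotABool,
}

// `not` reports a type error instead of panicking; the caller decides
// whether to propagate it or panic at the top level.
fn not(v: Value) -> Result<Value, EvalError> {
    match v {
        Value::Bool(b) => Ok(Value::Bool(!b)),
        _ => Err(EvalError::NotABool),
    }
}

fn main() {
    println!("{:?}", not(Value::Bool(true)).unwrap()); // Bool(false)
    assert!(not(Value::Int(3)).is_err());
}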
You know how hard it is to take someone seriously after they embarrass themselves like that while also trying to bring somebody's code down, & ultimately being wrong?
I was going to ask, but the author answers this in the end:
> Other alternatives to consider is Haskell and various Lisp dialects. If you have already “tamed” Haskell (my congratulations and condolences), probably learning OCaml just for writing a compiler is not going to be worth it; if you have not, OCaml is a much more approachable language.
This is an interesting claim, as I thought Haskell and OCaml were more or less equivalently inscrutable.
Here are the features that I think are most important for compiler development:
1) built in eval -- this allows you to transpile to the host language which is invaluable for writing small tests
2) multiline string syntax -- for evaling more than just one liners
3) built in associative and sequential arrays (for the ast)
4) first class closures
5) panic support (for aborting early from unimplemented use cases)
The AST can be represented as an associative array. Each element type can have a 'type' field and rather than pattern matching, you can use if/else. Performance doesn't really matter for the bootstrap compiler because it will only ever be run on relatively small input sets. To get started, you simply walk the ast to transpile to the host language. The snippet is then evaled in the host language to test functionality. Closures allow you to implement the visitor pattern for each ast node, which allows contextual information to be seamlessly interwoven amongst ast nodes during the analysis/transpilation steps.
Keeping all of this in mind, I have identified luajit as my personal favorite language for compiler development. It checks the boxes above, has excellent all around performance for a dynamic language (particularly when startup time is included -- js implementations may beat it on many benchmarks but almost always have slow start up time relative to luajit) and provides a best in class ffi for host system calls. You can run 5000+ line lua scripts faster than most compilers can compile hello, world.
The other reason I like lua(jit) is the minimalism. Once you master lua (which is possible because of its small size), it becomes very obvious that if you can implement something in lua, you can translate the lua implementation to essentially any other language. In this way, there is a sense in which writing a lua implementation becomes almost like a Rosetta Stone from which a translation can be produced for nearly any other language. With more powerful languages, it is hard to resist the temptation to utilize features that can't always be easily transported to another language. In other words, lua makes it easy to write portable code. This is true both in the sense that lua can be installed on practically any computer in at most a few minutes, and in the sense that the underlying structure of a lua program, which transcends the syntax of the language, can be ported to another computing environment/language.
Another benefit of transpiling to lua is that your new language can easily inherit lua's best properties such as embeddability, fast start up time and cross platform support while removing undesirable features like global variables. Your language can then also be used to replace lua in programs like nginx, redis and neovim that have lua scripting engines. This of course extends to transpiling to any language, which again should be relatively easy if you have already transpiled to lua.
I chose luajit for my language, and while I agree with many of your points, I really miss a type system. Somewhat ironically, I'm working on a type system for luajit...
I also wish it were a bit more performant, but here it's likely my medium- to high-level code and not luajit's fault. However, running the test suite in plain Lua seems an order of magnitude or so slower than luajit, so it's a lot faster than plain Lua at least.
I would add pattern matching. I've found this really helpful for manipulating ASTs by matching multiple levels of the tree and pulling out values simultaneously.
No, it's not. For a toy compiler (or for compiling programs with small CFGs), you can use Rc/Weak to represent cycles. For a "real" compiler you'd be using an arena for allocations anyway, which amounts to pointers-are-array-indices.
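A rough sketch of the Rc/Weak approach for a back-edge in a toy CFG (the `BasicBlock` shape here is made up for illustration):

use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct BasicBlock {
    name: String,
    // Forward edges own their targets...
    successors: RefCell<Vec<Rc<BasicBlock>>>,
    // ...while loop back-edges are Weak, so the cycle doesn't leak.
    back_edges: RefCell<Vec<Weak<BasicBlock>>>,
}

fn block(name: &str) -> Rc<BasicBlock> {
    Rc::new(BasicBlock {
        name: name.into(),
        successors: RefCell::new(vec![]),
        back_edges: RefCell::new(vec![]),
    })
}

fn main() {
    let header = block("loop_header");
    let body = block("loop_body");

    // header -> body is a strong edge; body -> header is the loop back-edge.
    header.successors.borrow_mut().push(Rc::clone(&body));
    body.back_edges.borrow_mut().push(Rc::downgrade(&header));

    // Weak edges must be upgraded (and checked) before use.
    let target = body.back_edges.borrow()[0].upgrade().unwrap();
    println!("{} loops back to {}", body.name, target.name);
}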