Why I write recursive descent parsers, despite their issues (2020)

JonChesterfield · 2025-07-28T08:42:37 1753692157

Pet subject of the week here.

The big choices are hand-rolled recursive descent vs LALR, the latter probably backed by the bison or lemon generator, with re2c for a lexer.

Passing the LALR(1) check, i.e. having bison actually accept the grammar without complaining about ambiguities, is either very annoying or requires thinking clearly about your language, depending on your perspective.

I claim that a lot of the misfires in language implementations are from not doing that work, and using a hand rolled approximation to the parser you had in mind instead, because that's nicer/easier than the formal grammar.

The parser generators emit useless error messages, yes. So if you want nice user feedback, that'll be handrolled in some fashion. Sure.

Sometimes people write a grammar and use a hand rolled parser, hoping they match. Maybe with tests.

The right answer, used by no one as far as I can tell, is to parse with the LALR generated parser, then if that rejects your string because the program was ill formed, call the hand rolled one for guesswork/diagnostics. Never feed the parse tree from the hand rolled parser into the rest of the compiler, that way lies all the bugs.

As alternative phrasing, your linter and your parser don't need to be the same tool, even if it's convenient in some senses to mash them together.
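A minimal sketch of that split, under loud assumptions: `strict_parse` is only a stand-in for a generated LALR parser (here it recognises nested `()` and `[]`), and `diagnose` plays the hand-rolled guesser. All three function names are hypothetical. The point is the control flow: the tree only ever comes from the strict parser, and the hand-rolled code only ever produces messages.

```python
PAIRS = {")": "(", "]": "["}

def strict_parse(text):
    """Stand-in for the generated parser: returns a nested-list tree, or None on rejection."""
    stack = [[]]   # stack[-1] is the node currently being filled
    opens = []     # open brackets not yet closed
    for ch in text:
        if ch in ("(", "["):
            stack.append([])
            opens.append(ch)
        elif ch in PAIRS:
            if not opens or opens.pop() != PAIRS[ch]:
                return None
            node = stack.pop()
            stack[-1].append(node)
        else:
            return None
    return stack[0] if not opens else None

def diagnose(text):
    """Hand-rolled guesswork: returns a message only, never a tree."""
    opens = []
    for i, ch in enumerate(text):
        if ch in ("(", "["):
            opens.append((ch, i))
        elif ch in PAIRS:
            if not opens:
                return f"column {i}: unmatched {ch!r}"
            open_ch, open_i = opens.pop()
            if open_ch != PAIRS[ch]:
                return f"column {i}: {ch!r} closes {open_ch!r} opened at column {open_i}"
        else:
            return f"column {i}: unexpected character {ch!r}"
    if opens:
        ch, i = opens[-1]
        return f"column {i}: {ch!r} is never closed"
    return "no diagnosis available"

def compile_source(text):
    tree = strict_parse(text)
    if tree is not None:
        return ("ok", tree)              # only this tree reaches later compiler stages
    return ("error", diagnose(text))     # the guesser is consulted only after rejection
```

If the two disagree about what is valid, the worst case in this arrangement is a misleading error message, not a miscompiled program.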

mrkeen · 2025-07-28T16:04:29 1753718669

> parse with the lalr generated parser, then if that rejects your string because the program was ill formed, call the hand rolled one for guesswork/diagnostics

This feels like a recipe for disaster. If the hand-rolled parser won't match a formal grammar, why would it match the generated parser?

The poor programmer will be debugging the wrong thing.

It reminds me of my short stint writing C++ where I'd read undefined memory in release mode, but when I ran it under debug mode it just worked.

senkora · 2025-07-29T04:08:47 1753762127

> It reminds me of my short stint writing C++ where I'd read undefined memory in release mode, but when I ran it under debug mode it just worked.

I assume it’s far too late at this point, but that almost always means that you’re invoking UB. Your next step should be enabling UBSan.

JonChesterfield · 2025-07-29T00:17:27 1753748247

The generated parser will match the grammar.

The hand rolled parser might do, but also might not, what with software being difficult and testing being boring and so forth.

8n4vidtmkvmk · 2025-07-28T22:01:18 1753740078

There's risk, but it seems like you could run both parsers against the same unit tests to help mitigate.

jasperry · 2025-07-28T00:49:57 1753663797

> If I was routinely working in a language that had a well respected de facto standard parser generator and lexer, and regularly building parsers for little languages for my programs, it would probably be worth mastering these tools.

In OCaml, a language highly suited for developing languages in, that de facto standard is the Menhir LR parser generator. It's a modern Yacc with many convenient features, including combinator-like library functions. I honestly enjoy the work of mastering Menhir, poring over the manual, which is all one page: https://gallium.inria.fr/~fpottier/menhir/manual.html

debugnik · 2025-07-28T15:50:51 1753717851

I gave up on Menhir after I understood how allocation-heavy it is during the hot path, at least in the incremental API which is needed for proper errors; and how much of a giant hack you need to force extra lookahead, which shouldn't be such a big deal for parser generators.

These days I just handroll recursive descent parsers with a mutable stream record, `raise_notrace` and maybe some combinators inspired by FParsec for choices, repetition and error messages. I know it's not as rigorous, but at least it's regular code without unexpected limitations.

jasperry · 2025-07-28T16:28:17 1753720097

Could be, I'm not that far along yet. I've only just peeked into the incremental API. I'm still using the error token to try to improve my messages. It's just for syntax errors anyway, right?

fuzztester · 2025-07-28T07:51:10 1753689070

>In OCaml, a language highly suited for developing languages in,

What makes OCaml suited for that?

mjburgess · 2025-07-28T08:31:27 1753691487

algebraic datatypes (tagged unions + pattern matching); compiled, garbage collected (you don't really need memory management for a compiler), statically typed with inference

hibikir · 2025-07-28T09:42:18 1753695738

Yeah, the same reasons Scala has a built in parser combinator module in the standard library: Just easy to use with those features in the language

fuzztester · 2025-07-28T17:17:34 1753723054

thanks.

greggyb · 2025-07-28T08:40:07 1753692007

ML, the language heritage from which OCaml derives, was explicitly designed with interpreters and compilers in mind.

nicoburns · 2025-07-28T00:49:16 1753663756

I wonder who it is that likes other kinds of parser. Over the last ~10 years or so I've read several articles arguing that recursive descent parsers are in fact great on HN. And they seem to be both the easiest to get started with and what almost all production-grade systems use. I've seen very little in the way of anything arguing for any other approaches.

o11c · 2025-07-28T01:22:03 1753665723

Recursive descent is fine if you trust that you won't write buggy code. If you implement a generator for it (easy enough), this may be a justifiable thing to trust (though this is not a given). I am assuming you're willing to put up with the insanity of grammar rewriting, one way or another.

LR however is more powerful, though this mostly matters if you don't have access to automatic grammar rewriting for your LL. More significantly, however, there's probably more good tooling for LR (or perhaps: you can assume that if tooling exists, it is good at what it is designed for); one problem with LL being so "simple" is that there's a lot of bad tooling out there.

The important things are 1. that you meaningfully eliminate ambiguities (which is easy to enforce for LR and doable for LL if your tooling is good), and 2. that you keep linear time complexity. Any parser other than LL/LR should be rejected because it fails at least one of these, and often both.

Within the LL and LR families there are actually quite a few members. SLR(1) is strong enough to be interesting but too weak for anything I would call a "language". LALR(1) is probably fine; I have never encountered a useful language that must resort to LR(1) (though note that modern tooling can do an optimistic fallback, avoiding the massive blowups of ancient LR tools). SLL(1) I'm not personally familiar with. X(k), where X is one of {SLL, LL, SLR, LALR, LR} and where k > 1, are not very useful; k=1 suffices. LL(*) however should be avoided due to backtracking, but in some cases consider whether you can parse token trees first (this is currently poorly represented in the literature; you want to be doing some form of this for error recovery anyway - automated error recovery is a useless lie) and/or defer the partial ambiguity until the AST is built (often better for error messages anyway, independent of using token trees).

kerkeslager · 2025-07-28T03:35:15 1753673715

> Recursive descent is fine if you trust that you won't write buggy code. If you implement a generator for it (easy enough), this may be a justifiable thing to trust (though this is not a given).

The idea that you're going to hand-roll a parser generator and then use that to generate a parser and the result is going to be less buggy than just hand-rolling a recursive descent parser, screams "I've never written code outside of an academic context".

pjc50 · 2025-07-28T10:22:12 1753698132

One of the smartest projects I've ever seen was a tool that took the human-readable tables of the HEVC and AV1 specs, used them as input to https://en.wikipedia.org/wiki/OMeta parser-generator, and then output both HEVC parsers in a variety of languages and also auto-fuzzers for test coverage. Ended up at https://www.graphcore.ai/posts/graphcore-open-sources-argon-...

Personally I've also written a parser-generator for XML in C# to overcome some of the odd limitations of Microsoft's one when used in AOT contexts.

Hand-rolling is easy if the grammar is small. The larger it gets (and video codecs are huge!) the more you want something with automatic consistency.

kerkeslager · 2025-07-29T01:27:11 1753752431

Sure, if you need parsers in a dozen languages, then a parser generator might make sense because you're not writing one parser, you're writing a dozen.

But, the vast majority of parsers I've written didn't have this requirement. I needed to write one parser in one language.

maxbond · 2025-07-28T03:54:37 1753674877

> [It] screams "I've never written code outside of an academic context".

SQLite, perhaps the most widely deployed software system, takes this approach.

https://sqlite.org/lemon.html

> The Lemon LALR(1) Parser Generator

> The SQL language parser for SQLite is generated using a code-generator program called "Lemon".

> ...

> Lemon was originally written by D. Richard Hipp (also the creator of SQLite) while he was in graduate school at Duke University between 1987 and 1992.

Here are the grammars, if you're curious.

https://github.com/sqlite/sqlite/blob/master/src/parse.y

mpyne · 2025-07-28T04:41:59 1753677719

SQLite is kind of cheating here, you won't catch me writing my own source control management system either.

But I do think the wider point is still true, that there can be real benefit to implementing 2 proper layered abstractions rather than implementing 1 broader abstraction where the complexity can span across more of the problem domain.

kerkeslager · 2025-07-29T01:28:45 1753752525

Yeah, let me know when you're writing the next SQLite. For your average parser, you're not writing the SQLite parser, you don't have the SQLite parser's problems, and you don't need SQLite's solutions.

maxbond · 2025-07-29T02:29:32 1753756172

Most people aren't writing something as complex as SQLite, but most people aren't writing parsers either. Those writing parsers are disproportionately writing things like programming languages and language servers that are quite complex.

SQLite isn't some kind of universal template, I'm not saying people should copy it or that recursive descent is a bad choice. But empirically parser generators are used in real production systems. SQLite is unusual in that they also wrote the parser generator, but otherwise is in good company. Postgres uses Bison, for example.

Additionally, I think the fact that Lemon was started as a personal learning project in grad school (as academic a project as it gets) and evolved into a component of what is probably the most widely deployed software system of all time shows that this distinction between what is academic and what is practical isn't all that meaningful to begin with. What's academic becomes practical when the circumstances are right. Better to evaluate a technique in the context of your problem than to prematurely bin things into artificial categories.

kerkeslager · 2025-07-30T21:03:34 1753909414

> Those writing parsers are disproportionately writing things like programming languages and language servers that are quite complex.

Sure, but adding the complexity of a parser generator doesn't help with that complexity in most cases.

[General purpose] programming languages are a quintessential example. Yes, a compiler or an interpreter is a very complex program. But unless your programming language needs to be parsed in multiple languages, you definitely do not need to generate the parser in many languages like SQLite does. That just adds complexity for no reason.

You can't just say "it's complex, therefore it needs a parser generator" if adding the parser generator doesn't address the complexity in any way.

maxbond · 2025-08-02T02:56:37 1754103397

I'm not saying everyone or even anyone in particular needs a parser generator. I'm saying real, widely deployed projects - as far as you can get from academic - empirically find it useful.

Creating abstractions does decrease complexity if one (or more) of the following is true:

- The abstraction generates savings in excess of its own complexity

- The abstraction is shared by enough projects to amortize the cost of writing/maintaining it to a tolerable level

- There are additional benefits like validating your grammars are unambiguous or generating flow charts of your syntax in your documentation, amortizing the cost across different features of the same project

It's up to you as the implementer to weigh the benefits and costs. If you choose to use recursive descent, more power to you. (For what it's worth, I personally use parser combinators to split the difference between writing grammars and hand-rolling parsers. But I've used parser generators before and found them helpful.)

kerkeslager · 2025-08-11T21:16:10 1754946970

I agree, I just don't understand why you seem to think this is a correction to anything I said.

If your goal is simply to reduce bugs--not something more complex like generating parsers in a bunch of languages--then hand rolling a parser generator and then using it to generate your parser [singular] is not a path to achieving your goals. That's what I said, and that's actually just true, which you probably know.

This is not an invitation to bring up irrelevant, exceptional cases, it's the rule of thumb you should operate on. Put another way, don't add layers when there isn't a reason to do so. If there is a reason to do so, have at it. Obviously.

In a meta sense, it's pretty socially inept to jump in with corrections like this. In a complex field like programming, of course there are exceptions, and it's disrespectful to the group of professionals in the room to assume that they don't know about the exceptions. I'm guilty of this myself: it's because I was brought up being praised for knowing things, so I want to demonstrate that I know things. But as an adult, I had to learn that I'm not the only knowledgeable person in the room, and it's rude to assume that I am.

lanstin · 2025-07-30T15:27:04 1753889224

There is a great Steve Yegge post on how useful ad-hoc transformation of source code is: http://steve-yegge.blogspot.com/2007/06/rich-programmer-food...

The only time I have used this myself was an expat style transformer for terraform (HCL). We had a lot of terraform and they kept changing the language, so I would build a fixer to make code written for say 0.10 to work with 0.12 and then again for 0.14. It was very fun and let us keep updating to newer terraform versions. Pretty simple language except for distinguishing quoted blocks from non-quoted.

kerkeslager · 2025-08-11T21:20:37 1754947237

> The only time I have used this myself was an expat style transformer for terraform (HCL). We had a lot of terraform and they kept changing the language, so I would build a fixer to make code written for say 0.10 to work with 0.12 and then again for 0.14. It was very fun and let us keep updating to newer terraform versions. Pretty simple language except for distinguishing quoted blocks from non-quoted.

I hear stories like this and I just wonder how we got here. Like, did this work provide any monetary value to anyone? It sounds like your team just got way too lost in the abstractions and forgot that they were supposed to make a product that did something, ostensibly something that makes money.

I mean, I guess if you can persuade people to give you money to do something, it's profitable. :shrug:

lanstin · 2025-08-16T06:23:33 1755325413

Pre terraform it would take a half day to set up a new account, create buckets meeting our policies, replicate to backup buckets, etc. With terraform we could do the same in thirty minutes, so we could service many more development teams with the same head count.

This work made it easier to keep terraform in sync with AWS, and so eased maintenance (e.g. adding a new policy to existing S3 buckets was just editing the S3 bucket module and re-applying to every account). The parser was way easier than manually editing the files (especially since I had so much terraform as a test bed of data).

kerkeslager · 2025-08-18T20:48:22 1755550102

Nobody was saying Terraform is worse than configuring servers by hand--I certainly wasn't. Automation here is obviously right. If you're comparing Terraform to no automation at all, of course Terraform comes out ahead, but that's not saying much.

What isn't right is changing the language and policies constantly, and if editing your configs is so difficult that writing a parser to do it was easier, one begins to think that Terraform wasn't the right automation tool.

motorest · 2025-07-28T04:48:17 1753678097

> The idea that you're going to hand-roll a parser generator and then use that to generate a parser and the result is going to be less buggy than just hand-rolling a recursive descent parser, screams "I've never written code outside of an academic context"

Your comment is quite funny as hand-rolling a recursive descent parser is the kind of thing that is often accused of being a) bug-prone, b) only done in academic environments.

paddim8 · 2025-07-28T07:00:16 1753686016

What? Accused of only being done in academic environments? Never heard that. Academics seem to spend 99% of their time talking about parser generators and LR parsing for some reason while most production compilers have handwritten recursive descent parsers...

jfyi · 2025-07-28T12:11:40 1753704700

Same here, recursive descent is what I have run into in the real world.

I'm just happy when parsing isn't being done with some absurdly long regex with no documentation.

vidarh · 2025-07-28T13:47:08 1753710428

Having written several parser generators, all my production parsers are hand-written - either pure recursive descent or a combination of recursive descent and operator precedence parsing.

The reason being that the reason there are so many parser generators is largely that we keep desperately looking for a way of writing one that isn't sheer pain in production use.

rstuart4133 · 2025-08-07T11:23:35 1754565815

> LALR(1) is probably fine; I have never encountered a useful language that must resort to LR(1)

An LR(1) parser can have many more states in its DFA than LALR(1). That was important back in the 1970s when I was fighting for every byte of RAM, but now it's a total non-issue. I don't know why you would bother with LALR(1) now if you had an LR(1) parser generator.

nrds · 2025-07-28T18:11:10 1753726270

> ambiguities

It's important to note that ambiguities are something which exist in service of parser generators and the restricted formal grammars that drive them. They do not actually exist in the language to be parsed (unless that language is not well-specified, but then all bets are off and it is meaningless to speak of parsing), because they can be eliminated by side-conditions.

For example, one famous ambiguity is the dangling 'else' problem in C. But this isn't an actual ambiguity in the C language: the language has a side-condition which says that 'else' matches to the closest unmatched 'if'. This is completely unambiguous and so a recursive descent parser for C simply doesn't encounter this problem. It is only because parser generators, at least in their most academic form, lack a way to specify this side-condition that their proponents have to come up with a whole theory of "ambiguities". (Shockingly, Wikipedia gets this exactly right in the article on dangling else which I just thought to look up: "The dangling else is a problem in programming of parser generators".)
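To make the side-condition concrete, here is a hedged sketch of a recursive descent statement parser for a toy grammar (not real C; the function name `parse_stmt` and the one-token conditions are made up for illustration). The innermost active call gets the first chance to consume an `else`, so "closest unmatched `if`" simply falls out of the call structure.

```python
def parse_stmt(tokens, pos=0):
    """stmt := 'if' COND stmt ['else' stmt] | ATOM

    Tokens are plain strings; COND and ATOM are any single token.
    Returns (tree, next_pos).
    """
    if pos < len(tokens) and tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        else_branch = None
        # The innermost 'if' still on the call stack checks for 'else' first,
        # so it claims the 'else' before any enclosing 'if' can.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return tokens[pos], pos + 1
```

Parsing `if a if b x else y` with this sketch attaches the `else` to the inner `if b`, and the outer `if a` ends up with no else branch.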

Likewise goes the problem of left-recursion. Opponents of recursive descent always present left-recursion as a gotcha which requires some special handling. Meanwhile actual programmers writing actual recursive descent parsers don't have any idea what these academics are talking about because the language that they're parsing (as it exists in their mind) doesn't feature left-recursion, but instead iteration. Left-recursion is only introduced in service of restricted formal grammars in which recursion is the only available primitive and iteration either doesn't exist or is syntactic sugar for recursion. For the recursive descent user, iteration is a perfectly acceptable primitive. The reason for the discrepancy goes back to side-conditions: iteration requires a side-condition stating how to build the parse tree; parser generators call this "resolving the ambiguity" because they can't express this in their restricted grammar, not because the language was ambiguous.
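A small sketch of iteration as the primitive, assuming a toy language of integers joined by `+` and `-` (the names `tokenize` and `parse_sum` are hypothetical). The loop is the iteration; folding each new operand into the tree on the left is the side-condition that says how to build the parse tree.

```python
import re

def tokenize(text):
    # Integer literals and the two operators; anything else is silently skipped.
    return re.findall(r"\d+|[-+]", text)

def parse_sum(tokens):
    """sum := NUMBER (('+' | '-') NUMBER)*   written as a loop, not left recursion."""
    pos = 0
    tree = int(tokens[pos])
    pos += 1
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right = int(tokens[pos + 1])
        # Fold to the left as we go: this is what makes '-' left-associative.
        tree = (op, tree, right)
        pos += 2
    return tree
```

With this, `1 - 2 - 3` comes out as `("-", ("-", 1, 2), 3)`, the same tree a left-recursive grammar rule would describe.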

rstuart4133 · 2025-08-07T11:38:22 1754566702

> It's important to note that ambiguities are something which exist in service of parser generators and the restricted formal grammars that drive them. They do not actually exist in the language to be parsed

Only partially true. How do you define the language to be parsed? It's with a grammar. If the grammar can yield two different parse trees for the same input, it's ambiguous. In LR parlance, if your grammar is ambiguous because of a shift-reduce conflict, it's because you stuffed up your grammar.

That's a real problem. It's the difference between parsing "1 + 2 / 3" as "(1 + 2) / 3" and "1 + (2 / 3)". The two interpretations yield very different outcomes. The reason you see so many people here say "use a generated LL or LR parser" is the generator will find and report that mistake. It's a very easy mistake to make, and you won't realise you've made it.

Then there are what LR calls reduce-reduce conflicts. Yes, that may happen because the LR parser can't look far enough ahead. Or, it may again be because you've stuffed up your grammar. Or it may be because the language you have in your head really isn't context free. Perl is in the last category. They claim to have got around it by saying it's a "do what I mean" language. Fine, but it turns out in some cases what they think a string obviously means doesn't agree with what I thought it obviously meant.

nrds · 2025-08-18T03:30:10 1755487810

> How do you define the language to be parsed? It's with a grammar.

False. This is how you define a language _to a parser generator_, but it is not how humans (and/or developers) define languages to each other.

> you won't realise you've made it

This is literally impossible in a recursive descent parser. I'm not saying getting it wrong is impossible, of course not. But what you literally cannot do (without concerted intentional effort) is make it ambiguous. Your parser will parse one first, or the other first, or either one left-to-right; and you will know which of these it does by reading the code.

layer8 · 2025-07-28T19:32:41 1753731161

> They do not actually exist in the language to be parsed (unless that language is not well-specified

How do you specify your language “well” when you don’t know if your grammar is unambiguous? Determining whether a grammar is ambiguous is famously undecidable in the general case. So how do you decide, if you don’t restrict your grammar to one of the decidable forms checkable by parser generators? You can add some disambiguation rules, but how do you know they cover all ambiguities?

We use formal systems exactly to make sure that the language is well-defined.

o11c · 2025-07-28T20:24:41 1753734281

Dangling "else" isn't actually a problem for parser generators. All you have to do is either:

* use proper rules rather than cramming everything into "statement", or

* specify explicit precedence rules, which is just a shortcut for the above (also skipping useless reductions)

Doing this is ubiquitous with parser generators when dealing with vaguely Algol-like languages, and is no different than the fact that you have to do the same thing for expressions.

jasperry · 2025-07-28T00:55:20 1753664120

But remember that the articles arguing for recursive descent parsers are arguing against the long-dominant paradigm of using LR parsers. Plenty of us still like LR parser generators (see my other comment.)

In between "easiest to get started with" and "what production-grade systems use", there is "easy to actually finish a medium-sized project with." I think LR parsers still defend that middle ground pretty well.

nicoburns · 2025-07-28T01:53:08 1753667588

> But remember that the articles arguing for recursive descent parsers are arguing against the long-dominant paradigm of using LR parsers

That was part of my question I think. I wouldn't have been able to tell you that the dominant paradigm being argued against was LR parsers, because I've never come across even one that I'm aware of (I've heard of them, but that's about it). Perhaps it's academia where they're popular?

jasperry · 2025-07-28T02:23:47 1753669427

I did learn about LR parser generators first in my Compilers class in college, but I assumed they were generally known about in language development communities.

userbinator · 2025-07-28T03:02:11 1753671731

> I wonder who it is that likes other kinds of parser.

It seems to be mainly academics and others interested in parsing theory, and those who like complexity for the sake of complexity.

masklinn · 2025-07-28T14:08:35 1753711715

Pratt parsers are really fun if slightly mind-bending, their ability to handle odd associativities, precedences, and arities is basically unmatched making them really useful to embed inside recursive descent for when you reach expressions. If you need infix and mixfix operators anyway.
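For readers who haven't seen one, a compact sketch of the idea in Python. The binding-power numbers, the `neg` node name and the tuple-based tree are arbitrary choices for this illustration, not taken from Pratt's paper. Each infix operator has a left and a right binding power; making the right one lower than the left is all it takes to get right associativity.

```python
import re

# (left, right) binding powers. left < right: left-associative.
# left > right: right-associative.
BINDING = {"+": (10, 11), "-": (10, 11), "*": (20, 21), "/": (20, 21), "^": (31, 30)}
PREFIX_POWER = 25  # unary minus: tighter than '*', looser than '^'

def tokenize(text):
    return re.findall(r"\d+|[-+*/^()]", text)

def parse(tokens):
    tree, pos = parse_expr(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def parse_expr(tokens, pos, min_power):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    # Prefix position: branch on the current token.
    if tok.isdigit():
        left, pos = int(tok), pos + 1
    elif tok == "-":
        operand, pos = parse_expr(tokens, pos + 1, PREFIX_POWER)
        left = ("neg", operand)
    elif tok == "(":
        left, pos = parse_expr(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        pos += 1
    else:
        raise SyntaxError(f"unexpected token {tok!r}")
    # Infix position: keep absorbing operators that bind at least as tightly
    # as the caller requires.
    while pos < len(tokens) and tokens[pos] in BINDING:
        op = tokens[pos]
        left_power, right_power = BINDING[op]
        if left_power < min_power:
            break
        right, pos = parse_expr(tokens, pos + 1, right_power)
        left = (op, left, right)
    return left, pos
```

`parse(tokenize("1 + 2 / 3"))` gives `("+", 1, ("/", 2, 3))`, and `2 ^ 3 ^ 2` nests to the right. Adding an operator is one new table entry, which is most of the appeal when embedding this inside a recursive descent parser.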

lenkite · 2025-07-28T01:58:57 1753667937

The literature for incremental parsing doesn't appear to have much for recursive descent. Everyone appears to use the LR tree sitter approach.

cxr · 2025-07-28T23:49:36 1753746576

The post by Laurence Tratt, which this piece is a response to, argues for another approach and is mentioned in the first sentence.

o11c · 2025-07-28T00:55:38 1753664138

In terms of language-agnosticism, you can use Bison to calculate the tables (the hard part) and dump an xml file, then implement the machine yourself trivially.

I get really annoyed when people still complain about YACC while ignoring the four decades of practical improvement that Bison has given us if you bother to configure it.

randomNumber7 · 2025-07-28T14:38:44 1753713524

The paper "Top Down Operator Precedence" also called "Pratt's Paper" introduced a very elegant algorithm for recursive descent parsers in 1973.

It is also written in a badass style and argues that this is superior to parser generators.

https://dl.acm.org/doi/pdf/10.1145/512927.512931

pratt4the_win · 2025-07-28T22:14:55 1753740895

Pratt parsers are elegant. I really like them.

For those to whom they are new: I found them a little tricky to implement directly from Pratt's paper or even Crockford's JavaScript that popularized them.

So, through trial and error I figured out how to actually implement them in regular languages (i.e. not in Lisp).

If it helps, examples in C and Go are here:

https://github.com/glycerine/PrattParserInC

https://github.com/glycerine/zygomys/blob/master/zygo/pratt....

I find them easier to work with than the cryptic LALR(1) bison/yacc tools, but then I never really felt like I mastered yacc to begin with.

ivanjermakov · 2025-07-27T23:59:45 1753660785

Related: Resilient LL Parsing Tutorial https://matklad.github.io/2023/05/21/resilient-ll-parsing-tu...

ufo · 2025-07-27T23:53:59 1753660439

A middle ground that I think is sometimes useful is to use an LR parser generator to check if the grammar is ambiguous, but use recursive descent for the actual implementation. Since we won't actually use any code from the LR parser generator, you can pick whatever one you prefer regardless of the programming language.

sirwhinesalot · 2025-07-28T08:21:36 1753690896

It's trivial to get a recursive descent parser without any ambiguities hidden in it if you don't go the PEG route (which is only unambiguous because you always pick the first choice, which might not be what you want). Just always branch on the current token. No way to have an ambiguity like that.

ufo · 2025-07-28T12:22:09 1753705329

I disagree. When writing recursive descent by hand, it's easy to miss an ambiguity because of miscomputed FIRST and FOLLOW sets.

In practice most recursive descent parsers use if-else liberally. Thus, they effectively work like pegs where the first match wins (but without the limited backtracking of pegs). They are deterministic in the sense that the implementation always returns a predictable result. But they are still ambiguous in the sense that this behavior might not have been planned by the language designer, and the ambiguity may not have been resolved how the programmer expected.

kazinator · 2025-07-28T14:49:06 1753714146

Without a comprehensive test suite you can easily break a recursive descent parser. By adding code into some function to handle something new, you can accidentally prevent some existing syntax from being recognized.

It has been my experience that if you have a LALR parser that reports no errors at generation time, and you add something such that there are still no errors, you've not ruined any existing syntax. That could be a theorem.

sirwhinesalot · 2025-07-28T16:14:22 1753719262

Don't compute first and follow sets. Just branch on the current token. It is trivially unambiguous since 1 token = 1 branch. Expressions can be dealt with using precedence climbing / pratt, which still just amounts to branching on the current token after the "lhs" has been computed.

If the language doesn't fit this LL1 + operator precedence mold then I would not use a recursive descent parser.

ufo · 2025-07-28T16:44:12 1753721052

The whole issue is that, without computing first and follow sets, it's easy to make a mistake and think that the grammar is LL(1) when it actually isn't. When that happens, the if statement does you no good. It'll be deterministic, but will silently mask the ambiguity.
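As an illustration of the kind of check being described, here is a deliberately simplified sketch: it computes FIRST sets for a grammar with no empty alternatives and reports alternatives of the same rule that can start with the same token. It does not compute FOLLOW sets, so it only catches FIRST/FIRST conflicts; the grammar encoding and the function names are assumptions made for this example.

```python
def first_sets(grammar):
    """grammar maps each nonterminal to a list of alternatives (non-empty symbol lists).

    Symbols that are not keys of `grammar` are terminals.
    Assumes no alternative is empty, so only the first symbol matters.
    """
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def ll1_conflicts(grammar):
    """Return (rule, token, earlier_alt_index, later_alt_index) for each overlap."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alts in grammar.items():
        seen = {}  # token -> index of the first alternative that can start with it
        for i, alt in enumerate(alts):
            head = alt[0]
            starts = first[head] if head in grammar else {head}
            for tok in sorted(starts):
                if tok in seen:
                    conflicts.append((nt, tok, seen[tok], i))
                else:
                    seen[tok] = i
    return conflicts

# Hypothetical toy grammar: both 'assign' and 'call' begin with NAME.
GRAMMAR = {
    "stmt": [["assign"], ["call"], ["return", "NAME"]],
    "assign": [["NAME", "=", "NAME"]],
    "call": [["NAME", "(", ")"]],
}
```

`ll1_conflicts(GRAMMAR)` reports that alternatives 0 and 1 of `stmt` both start with `NAME`. A hand-written `if tok is NAME: parse_assign()` would take the first branch and never try `call`, which is the silent masking in question.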

sirwhinesalot · 2025-07-28T17:19:13 1753723153

The only case I can think of is something like the dangling else problem or similar, which is very easy to detect.

You need to have a recursive call to the same rule or a "parent" rule, followed by an optional token match (note that this is already more than a simple branch on the current token since it is effectively a 1 token backtrack).

If there's any extra token match in between (like mandatory {}) you've already dodged the problem.

So I agree a mistake is possible, but "easy" I would not call it. It only appears in very specific conditions. The issue is way more prevalent in general PEGs.

thechao · 2025-07-28T01:54:41 1753667681

I've been having thoughts along these lines. Earley parsers match recursive descent really nicely. In my head there'd be an Earley parser "oracle": you'd tell the oracle about the operations you've performed (in terms of terminal consumption); and, then, you can ask the oracle which recursive descent subfunctions are safe to call (based on the prediction phase).

marssaxman · 2025-07-28T01:36:36 1753666596

I have never found parser generators to be worth the hassle. Recursive descent with a little Pratt-style precedence climbing is all you need.

derriz · 2025-07-28T05:34:22 1753680862

Agree completely and I’ve used a bunch of them and also functional combinator libraries. I‘d go further and say the recursive descent and Pratt approach is the way if you want to offer useful error messages and feedback to the user. They’re also trivial to debug and test unlike any generation based approach.

fuzztester · 2025-07-28T07:54:50 1753689290

>functional combinator libraries

By that, do you mean parser combinators?

derriz · 2025-07-28T09:59:29 1753696769

Yes - but this was decades ago so my memory is hazy. It was with an early Haskell variant called Gofer - which had a nice feature which allowed using list comprehension notation with arbitrary monads - which for simple grammars produced very readable - even beautiful - parser code. But like with parser generators, once the grammar became complex, the beauty and simplicity disappeared.

Actually I wish this generalization of list comprehensions had been taken up by Haskell or other languages. Haskell decided on the do notation while Python users these days seem to shun the feature.

themk · 2025-07-28T12:36:48 1753706208

GHC has the MonadComprehensions extension, which does what you desire.

https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/mona...

fuzztester · 2025-07-28T10:05:31 1753697131

Thanks.

On a side note, I do use Python list comprehensions, and like them.

zahlman · 2025-07-28T01:18:58 1753665538

> But in practice I bounce back and forth between two languages right now (Go and Python, neither of which have such a standard parser ecology)

https://pypi.org/project/pybison/ , or its predecessors such as https://pypi.org/project/ply/ ?

But yes, the decidedly non-traditional https://github.com/pyparsing/pyparsing/ is certainly more popular.

somat · 2025-07-28T17:19:41 1753723181

To add to your survey, I have been reading the lark documentation https://github.com/lark-parser/lark and like the cut of its jib. I have not used it yet, as I don't really have any projects that need a full parser.

fjfaase · 2025-07-24T06:23:43 1753338223

I recently wrote a small C compiler that uses a recursive descent parser, even though this should not be possible if you just look at the syntax grammar. Why? Because it looks at some semantic information about the class of identifiers, for example whether they are variables or typedefs. On the other hand this is not very surprising, because in the days C was developed, easy parsing was a practical implication of it not being an academic research thing, but something that just had to work.

Recursive descent parsers can simply be implemented with recursive functions. Implementing semantic checks becomes easy with additional parameters.

ufo · 2025-07-27T23:46:06 1753659966

It sounds like you're describing the Lexer Hack[1]. That trick works just the same in an LR parser, so I wouldn't count it as an advantage of recursive descent.

[1] https://en.wikipedia.org/wiki/Lexer_hack

fjfaase · 2025-07-28T06:08:44 1753682924

Yes, it is basically this. In my experience, writing a recursive descent parser with recursive functions is a bit easier than using an LR parser generator or a backtracking PEG parser. It also does not require any third-party tools or libraries, which I see as an advantage.

WalterBright · 2025-07-27T23:28:13 1753658893

When I developed ImportC (which enables D compilers to read and use C code) I tried hard to build it and not require semantic analysis.

What a waste of time. I failed miserably.

However, I also realized that the only semantic information needed was to keep track of typedefs. That made recursive descent practical and effective.
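A toy illustration of why typedef tracking is enough for the classic case, `T * x;`, which is a pointer declaration if `T` names a type and a multiplication otherwise. This is only a sketch under heavy assumptions: tokens are whitespace-separated, statements have exactly the shapes shown, and `parse_statements` is a made-up name. It is not how ImportC or any real C front end is structured.

```python
def parse_statements(source):
    """Classify toy statements, using the set of typedef names
    seen so far as the only piece of semantic information."""
    typedef_names = set()
    results = []
    for stmt in source.split(";"):
        tokens = stmt.split()
        if not tokens:
            continue
        if tokens[0] == "typedef" and len(tokens) == 3:
            # typedef <existing-type> <new-name>
            typedef_names.add(tokens[2])
            results.append(("typedef", tokens[1], tokens[2]))
        elif len(tokens) == 3 and tokens[1] == "*":
            # Same token sequence, two meanings. The parser decides
            # by asking whether the first identifier is a known type.
            if tokens[0] in typedef_names:
                results.append(("declare_pointer", tokens[0], tokens[2]))
            else:
                results.append(("multiply", tokens[0], tokens[2]))
        else:
            raise SyntaxError(f"unsupported statement: {stmt.strip()!r}")
    return results


for item in parse_statements("typedef int T; T * x; a * b;"):
    print(item)
```

In a hand-written parser that set is just one more piece of state the parsing functions can consult, which is the convenience described above.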

norir · 2025-07-28T15:00:02 1753714802

There is a straightforward technique for writing unambiguous recursive descent parsers. The high level algorithm is this: parsing always consumes one character, never backtracks and is locally unambiguous.

You then construct the parser by combining unambiguous parsers from the bottom up. The result ends up unambiguous by construction.

This high level algorithm is much easier to implement without a global lexer. Global lexing can be a source of inadvertent ambiguity. Strings make this obvious. If instead, you lex in a context specific way, it is usually easy to efficiently eliminate ambiguities.

coldcode · 2025-07-28T14:07:17 1753711637

Funny, I wrote a recursive descent parser in 1982, in Fortran, to parse the syntax of the Jovial programming language. That was my first ever professional programming project, with no university degree in CS, or job experience. Note, Fortran (78) is a terrible language to write a parser in.

I wish I could have saved the source. It would be fun to see it.

pklausler · 2025-07-28T17:49:17 1753724957

Recursive descent is a cleaner way to go when the language cannot be lexed without feedback from the parser and semantics, like Fortran. And parser combinators make RD straightforward to code.

favorited · 2025-07-28T19:21:26 1753730486

Which mainline compilers or runtimes use a generated parser? I know that CRuby does, though they've recently standardized on Prism as their public AST, and it's possible that they'll switch to Prism for parsing eventually. I know that Go used to, as well as ancient versions of GCC.

It seems that, from the outside looking in, ~all significant PL projects end up using a hand-written recursive descent parser, eventually.

layer8 · 2025-07-28T19:24:29 1753730669

The problem remains how to verify that the hand-written parser matches the purported grammar, and that the grammar isn’t ambiguous in the first place.

keithnz · 2025-07-27T23:53:04 1753660384

recursive descent parsers are usually what I do for my little domain specific scripting languages. They are just easy and straightforward. I do like things like ANTLR, but most of the time it seems unnecessary.

fuzztester · 2025-07-28T08:47:16 1753692436

Got any open source ones you can share links / code of?

I am interested in that area, and reading up and learning about it.

deterministic · 2025-07-31T06:08:20 1753942100

I use recursive descent parsers all the time for small DSLs and for a JIT-compiled, optimizing, production-quality compiler. It works great.

markus_zhang · 2025-07-28T11:06:00 1753700760

I have heard that RDP is prominent in production parsers. Is that true? And is it purely handwritten RDP, or is it combined with other automated techniques?

o11c · 2025-07-28T17:18:32 1753723112

One reason hand-written recursive-descent parsers are common is because a lot of languages are poorly designed, and it's easier to hack around the mistakes in a hand-written parser.

For new languages this should be avoided - just design a sane grammar in the first place.

chadcmulligan · 2025-07-28T03:14:35 1753672475

fwiw LLMs seem very good at writing recursive descent parsers, at least for the small experiments I've done (wrote a Lua parser in Delphi).

UncleOxidant · 2025-07-28T04:56:46 1753678606

Agreed. I recently had Gemini write a recursive descent parser for a specified subset of C in C and it did quite well. I've tried similar with Claude 4 and Qwen3 Coder and again, both did quite well.

ogogmad · 2025-07-28T04:31:39 1753677099

Have people heard of the following top-down parsing algorithm for mathematical expressions:

  1. Replace any expression that's within parentheses by its parse tree by using recursion
  2. Find the lowest precedence operator, breaking ties however you'd like. Call this lowest precedence operator OP.
  3. View the whole unparsed expression as `x OP y`
  4. Generate a parse tree for x and for y. Call them P(x) and P(y).
  5. Return ["OP", P(x), P(y)].

It's easy to speed up step 2 by keeping a table of all the operators in an expression, sorted by their precedence levels. For this table to work properly, the positions of all the tokens must never change.

da-bacon · 2025-07-28T08:44:40 1753692280

For 2, I don’t think you can break ties however you like because this would give you random left or right associativity https://en.m.wikipedia.org/wiki/Operator_associativity For example 2-4-7 would be either (2-4)-7 or 2-(4-7), depending on how you broke the tie.
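A small sketch of the five steps, with the tie in step 2 broken toward the rightmost operator so that binary operators come out left-associative. The precedence table, the list-shaped trees and the function names are assumptions for illustration. Unary operators are not handled, and the sort-table speedup is left out.

```python
# Hypothetical precedence table; all operators here are binary.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def is_op(item):
    return isinstance(item, str) and item in PRECEDENCE


def parse(tokens):
    # Step 1: replace each parenthesised group by its parse tree.
    items = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            depth = 1
            j = i + 1
            while depth:  # scan forward to the matching ")"
                if j >= len(tokens):
                    raise SyntaxError("unbalanced parentheses")
                if tokens[j] == "(":
                    depth += 1
                elif tokens[j] == ")":
                    depth -= 1
                j += 1
            items.append(parse(tokens[i + 1:j - 1]))
            i = j
        else:
            items.append(tokens[i])
            i += 1
    return parse_flat(items)


def parse_flat(items):
    op_positions = [k for k, item in enumerate(items) if is_op(item)]
    if not op_positions:
        if len(items) != 1:
            raise SyntaxError(f"expected a single operand, got {items!r}")
        return items[0]
    # Step 2: lowest precedence operator; ties go to the RIGHTMOST
    # one, which makes the result left-associative.
    lowest = min(PRECEDENCE[items[k]] for k in op_positions)
    split = max(k for k in op_positions if PRECEDENCE[items[k]] == lowest)
    # Steps 3-5: view the input as x OP y and recurse on both sides.
    return [items[split],
            parse_flat(items[:split]),
            parse_flat(items[split + 1:])]


print(parse("2 - 4 - 7".split()))
print(parse("a * ( b + c ) - d".split()))
```

Swapping `max` for `min` in the tie-break splits at the leftmost operator instead, which turns `2 - 4 - 7` into `2 - (4 - 7)`. So the tie rule is where associativity gets decided.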

johnwbyrd · 2025-07-28T22:07:54 1753740474

I'm surprised, and a little disappointed, that no one in this thread has mentioned parsing expression grammars (https://en.wikipedia.org/wiki/Parsing_expression_grammar) which are a much more human-friendly form of grammar for real-world parsing tasks.

sparkie · 2025-07-28T23:47:36 1753746456

PEGs are closely related to recursive descent, and have some of the same problems.

A PEG is always unambiguous because it picks the first option - but whether that was the intended parse is not necessarily straightforward. In practice these problems don't usually show up, so they're fine to work with.
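A minimal sketch of how the first-option rule can surprise you, using hand-rolled combinators rather than any real PEG library (the names `literal`, `choice` and `matches_fully` are made up here). The two grammars differ only in the order of the alternatives.

```python
def literal(text):
    """Parser that matches `text` at `pos` and returns the new
    position, or None on failure."""
    def parser(source, pos):
        if source.startswith(text, pos):
            return pos + len(text)
        return None
    return parser


def choice(*alternatives):
    """PEG-style ordered choice: commit to the first alternative
    that succeeds and never revisit the others."""
    def parser(source, pos):
        for alternative in alternatives:
            end = alternative(source, pos)
            if end is not None:
                return end
        return None
    return parser


def matches_fully(parser, source):
    # True only if the parser consumes the entire input.
    return parser(source, 0) == len(source)


short_first = choice(literal("in"), literal("int"))
long_first = choice(literal("int"), literal("in"))

print(matches_fully(short_first, "int"))
print(matches_fully(long_first, "int"))
```

With `"in"` listed first, the choice commits after two characters and the trailing `t` is left over, so the `"int"` alternative is never tried. Neither ordering is ambiguous, but only one of them does what the author presumably meant.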

The advantage LR gives you is that it produces a parser where there are no ambiguities and every successful parse is the one intended. An LR grammar is a proof, as well as a means of producing a parser. A decent LR parser generator is like a simple proof assistant - it will find problems with your language before you do, so you can fix your syntax before putting it into production.

In "real-world" parsing tasks, as you put it, the problem with LR parser generators is that they're not well suited to parsing languages that have ambiguities, like C, C++ and many others. Some of the complaints about LR are about the workarounds needed to parse these languages, where it's obviously the wrong tool for the job because those languages aren't described by proper LR grammars.

But if you're designing a new language from scratch, surely it's better to not repeat those mistakes? If you carefully design your language to be parsed by an LR grammar then other developers who come to parse your language won't encounter those issues. They won't need lexical tie-ins and other nonsense that complicates the process.