Two Algorithms, One Intuition: Shunting Yard and Pratt Parsing

Parsing arithmetic expressions looks simple… until precedence enters the picture. Why does * bind tighter than +, and how can a parser enforce these rules without complex grammar machinery? Two classic algorithms provide radically different answers: Dijkstra’s stack-based Shunting Yard and Pratt’s recursive Top-Down Operator Precedence. Side by side, they reveal the same underlying intuition expressed through two very different machines.

Winter is the perfect time of year to rediscover vintage algorithms and savour them between one line of code and the next. It may seem odd at a time when everyone seems busy chasing the next big thing in software development; and yet, every now and then a walk outside the usual paths can only do good. Let us call it my hour of freedom.

It is precisely in this space of freedom that I stumbled upon the Shunting Yard, an algorithm described by Dijkstra in 1961 in the report ALGOL 60 Translation. In the literature, it has never been the subject of extensive treatment and, curiously, the original text assigns it no name at all. The nickname “Shunting Yard” has settled over time, more through teaching tradition than by the author’s designation.

I am not sure why, but reading Dijkstra brought to mind another example of an algorithm that, while elegant and surprisingly modern, went for decades without much visibility in traditional compilation textbooks. I am referring to the Pratt parser, formally introduced by Vaughan R. Pratt in 1973 in the paper Top Down Operator Precedence, presented at the first POPL symposium.

Here too the trajectory is curious: the idea never quite disappears, yet it remains at the margins of mainstream teaching, overshadowed by the dominance of LL and LR parsers. Only many years later is it rediscovered and brought to the attention of a new generation of developers, notably through Douglas Crockford’s popularising work and, more recently, through Robert Nystrom with Crafting Interpreters.

It is hard not to notice a certain historical symmetry: two algorithms born in eras when compilation theory was still taking shape, both elegantly simple, both left for a long time outside the main spotlight.

Perhaps, even more than the historical context, it was the weight both place on precedence that led me to set them side by side. Barely a decade apart, Dijkstra and Pratt seem to answer — each in their own way — the same fundamental question: how to handle precedence rules in an expression in a systematic way?

The solutions are profoundly different in their tools, yet surprisingly akin in their goal. To clarify this parallelism, it is worth briefly telling the story of both algorithms, without resorting to heavy formalisms.

The Shunting Yard

The algorithm assumes a small conceptual “machine” composed of four elements:

an input tape containing the expression in infix notation;
an operator stack, used as a waiting area;
an output tape, on which the expression is built in RPN (Reverse Polish Notation);
a precedence (and associativity) table that orders operators from least binding to most binding.

The idea is simple: operands pass immediately to output, while operators are held on the stack until we are certain that no other operator with higher precedence (or equal precedence, in the case of left-associativity) needs to be emitted first.

The algorithm

Scanning the input from left to right, for each token t:

If t is a number (or an identifier): write it to the output tape.
If t is an operator op: while there is an operator top on top of the stack with precedence greater than or equal to op (left-associativity), move top from the stack to the output tape; then push op onto the stack.
If t is (: push it onto the stack (it acts as a “barrier”).
If t is ): move operators from the stack to the output tape until ( is encountered; remove ( from the stack (but do not write it to output).
At the end of input: flush the stack, moving all remaining operators to output.

This is the operational rule that, in effect, implements precedence: a * stays “suspended” above a +, and is not emitted until it becomes inevitable.

Worked example

Let us take the expression:

1 + 2 * 4 - (10 + 1 * 2)

For brevity, I denote the state as:

OUT = output tape (RPN)
STK = operator stack (top on the right)

The main states, token by token:

Token  Action                          OUT                        STK
─────────────────────────────────────────────────────────────────────────────────────
1      output                          1
+      push                            1                          +
2      output                          1 2                        +
*      push (prec > +)                 1 2                        + *
4      output                          1 2 4                      + *
-      pop *, pop +, push -            1 2 4 * +                  -
(      push                            1 2 4 * +                  - (
10     output                          1 2 4 * + 10               - (
+      push                            1 2 4 * + 10               - ( +
1      output                          1 2 4 * + 10 1             - ( +
*      push                            1 2 4 * + 10 1             - ( + *
2      output                          1 2 4 * + 10 1 2           - ( + *
)      pop *, pop +, remove (          1 2 4 * + 10 1 2 * +       -
end    flush stack: pop -              1 2 4 * + 10 1 2 * + -

The result on the output tape is:

1 2 4 * + 10 1 2 * + -

This is simply the RPN form of the original infix expression, and it can in turn be evaluated by an even simpler machine: a value stack on which operators are applied as they appear.

Shunting Yard conceptual machine. Operator precedence is enforced by an explicit stack: operators are temporarily held in suspension and flushed to the output when precedence rules require it. The result is a flat RPN sequence ready for evaluation.

Shunting Yard conceptual machine. Operator precedence is enforced by an explicit stack: operators are temporarily held in suspension and flushed to the output when precedence rules require it. The result is a flat RPN sequence ready for evaluation. (Generated with AI tool by Christian Del Monte)

The Pratt Parser

The method, introduced by Vaughan Pratt in 1973 under the name Top-Down Operator Precedence, rests on a simple and powerful idea: assign each operator a binding strength — the so-called binding power — and let that decide whether an operator belongs to the current expression or to a higher syntactic level. Where the Shunting Yard’s precedence governs when a stack must be flushed, the Pratt parser’s precedence governs how far the recursion is allowed to reach. This is a significant shift in perspective: instead of holding operators in suspension and releasing them at the right moment, the parser directly builds a syntactic structure in which precedence is already embedded in the shape of the tree.

The Pratt parser can be described as a surprisingly compact machine:

an input tape with a cursor advancing over tokens,
a central function parse_expr(min_bp) that drives the entire process,
and a binding power table that assigns each operator a numeric pair (left_bp, right_bp).

Unlike the Shunting Yard, there is no explicit operator stack, no structure accumulating symbols waiting to be released. Precedence is not implemented through push and pop, but as a numeric threshold that bounds the expansion of recursion. Each call to parse_expr establishes a boundary, min_bp, and only operators whose left binding power is strictly greater than this threshold can be absorbed into the current context; the rest are left to the level above. The hierarchy of operations thus emerges from the depth of the recursive calls, not from the discipline of an explicit stack.

Of course, at the level of a concrete implementation, a stack does exist: it is the call stack of the recursion. But here the conceptual difference from the Shunting Yard becomes apparent: in the Shunting Yard the stack is an integral part of the algorithmic model and is manipulated directly; in the Pratt parser the stack is implicit, an artefact of top-down execution. In the Pratt parser, precedence does not discipline a pile of operators but delimits the syntactic region that can be constructed before returning control to the calling level. And it is precisely in this shift — from the temporal management of operators to the structural delimitation of recursion — that the comparison with Dijkstra becomes truly interesting.

The algorithm

Let lhs be the expression produced by the initial token. Given a function parse_expr(min_bp):

Read the next token t.
Interpret t as an initial-position token (a number yields an immediate value; an opening parenthesis triggers a recursive sub-expression; a prefix minus builds a unary node).
While the next token op is a binary operator with left_bp(op) > min_bp:
- Consume op.
- Compute r_bp = right_bp(op).
- Compute rhs = parse_expr(r_bp).
- Update lhs = Bin(op, lhs, rhs).
Return lhs.

Left-associativity arises naturally from choosing asymmetric pairs such as (10, 11) for + and - and (20, 21) for * and /: when parse_expr is called recursively with min_bp = 11 and the next operator is another + with left_bp = 10, the condition 10 > 11 is false, so the second + is not consumed by the current level but handled by the outer one, yielding left-associativity with no explicit check. Were the condition >= instead of >, the operator would be absorbed by the current level, producing right-associative behaviour.

Worked example

Let us trace the same expression as before through the Pratt parser:

1 + 2 * 4 - (10 + 1 * 2)

The table below includes an OUT_DBG column: a conceptual debug band that records the order in which values and operators are finalised. This is not part of the actual algorithm; it is here so we can compare it with the Shunting Yard’s RPN output.

CTX   Token       Action                          lhs                      OUT_DBG
──────────────────────────────────────────────────────────────────────────────────────────────────
 0    1           initial → number               1                        1
 0    +           10 > 0 → consume               —                        1
11    2           initial → number               2                        1 2
11    *           20 > 11 → consume              —                        1 2
21    4           initial → number               4                        1 2 4
21    -           10 > 21 false → return         4                        1 2 4
11    (return)    combine 2 * 4                  (2*4)                    1 2 4 *
11    -           10 > 11 false → return         (2*4)                    1 2 4 *
 0    (return)    combine 1 + (2*4)              (1+(2*4))                1 2 4 * +
 0    -           10 > 0 → consume               —                        1 2 4 * +
11    (           initial → parse_expr(0)        —                        1 2 4 * +
 0    10          initial → number               10                       1 2 4 * + 10
 0    +           10 > 0 → consume               —                        1 2 4 * + 10
11    1           initial → number               1                        1 2 4 * + 10 1
11    *           20 > 11 → consume              —                        1 2 4 * + 10 1
21    2           initial → number               2                        1 2 4 * + 10 1 2
21    )           stop → return                  2                        1 2 4 * + 10 1 2
11    (return)    combine 1 * 2                  (1*2)                    1 2 4 * + 10 1 2 *
11    )           stop → return                  (1*2)                    1 2 4 * + 10 1 2 *
 0    (return)    combine 10 + (1*2)             (10+(1*2))               1 2 4 * + 10 1 2 * +
11    (return)    close (...), return to outer   (10+(1*2))               1 2 4 * + 10 1 2 * +
 0    end         return, combine lhs - rhs      ((1+(2*4))-(10+(1*2)))   1 2 4 * + 10 1 2 * + -

Final output

AST (the true Pratt output):

((1 + (2 * 4)) - (10 + (1 * 2)))

OUT_DBG (conceptual debug band):

1 2 4 * + 10 1 2 * + -

Notice anything? It is identical to the RPN produced by the Shunting Yard:

Algorithm            Real output   Debug output
────────────────────────────────────────────────────────────────────────
Shunting Yard        RPN           1 2 4 * + 10 1 2 * + -
Pratt                AST           ((1 + (2 * 4)) - (10 + (1 * 2)))
Pratt (debug band)   —             1 2 4 * + 10 1 2 * + -

Pratt (Top-Down Operator Precedence) conceptual machine. Operator precedence is encoded as binding power and enforced through recursion: instead of flushing a stack, the parser limits how far the recursive descent may expand. The result is a syntax tree where precedence is visible in the structure itself.

Pratt (Top-Down Operator Precedence) conceptual machine. Operator precedence is encoded as binding power and enforced through recursion: instead of flushing a stack, the parser limits how far the recursive descent may expand. The result is a syntax tree where precedence is visible in the structure itself. (Generated with AI tool by Christian Del Monte)

Closing the Circle

The parallel can now be seen in full. The Shunting Yard and the Pratt parser solve the same problem, giving an unambiguous order to operations in an expression, but they do so with profoundly different tools and philosophies. The Shunting Yard is an iterative machine: it scans tokens one by one, routes them between a stack and an output tape, and entrusts precedence to a numeric comparison that decides when to flush the stack. Its final product is a flat sequence in reverse Polish notation, free of parentheses and ready to be evaluated left to right. The Pratt parser is a recursive machine: each call to parse_expr carves out a fragment of the expression, and precedence governs not a stack but the depth to which the recursion is allowed to reach. Its final product is not a flat sequence but a syntax tree, in which the hierarchy of operations is already visible in the structure of the nodes.

In a sense, the Shunting Yard defers decisions, it holds operators back until their place is certain, while the Pratt parser anticipates them, building the correct structure as it goes. The explicit stack of the one and the recursion of the other are two different ways of keeping track of the same state: which operators are still pending and at what level of precedence they belong.

If one accepts this reading, the two algorithms are not competing alternatives but complementary viewpoints on the same problem, separated by a decade and reunited by the same insight: that precedence, when treated as data rather than as a grammatical rule, makes expression parsing far simpler than classical theory might suggest. Precedence is not merely a property of grammar, but a strategy for controlling evaluation structure.

This equivalence holds for fixed-precedence binary expressions, the domain explored here. Beyond it, the two designs begin to diverge in ergonomics rather than expressive power: Pratt parsing extends more naturally to prefix, postfix, and mixfix operators, while the Shunting Yard can support them only with additional machinery. The underlying intuition is shared; the engineering trade-offs are not.

References

Primary sources

Dijkstra, E. W. (1961). ALGOL 60 Translation. Mathematisch Centrum Report MR 35. PDF
Pratt, Vaughan R. (1973). Top Down Operator Precedence. Proceedings of the 1st ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL). ACM

Secondary sources

Crockford, Douglas (n.d.). Top Down Operator Precedence. Link
Kladov, Alex (matklad) (2020). Simple but Powerful Pratt Parsing. Link
Kladov, Alex (matklad) (2020). From Pratt to Dijkstra. Link
Nystrom, Robert (2011). Pratt Parsers: Expression Parsing Made Easy. Link
Nystrom, Robert (2021). Crafting Interpreters. Genever Benning.

The Shunting Yard#

The algorithm#

Worked example#

The Pratt Parser#

The algorithm#

Worked example#

Final output#

Closing the Circle#

References#

Primary sources#

Secondary sources#

The Shunting Yard

The algorithm

Worked example

The Pratt Parser

The algorithm

Worked example

Final output

Closing the Circle

References

Primary sources

Secondary sources