The Hidden DSL in Catalyst

How Spark’s internal rewriting framework — Catalyst — exposes an embedded DSL with a public extension surface, the same one Delta Lake and Iceberg use to plug into the optimizer pipeline.

April 26, 2026 · 18 min

Anti-patterns in Catalyst rules

Six concrete anti-patterns I encountered building a real Catalyst extension — from the wrong rule type for throws, to mutable state under AQE and Spark Connect, to JVM bootstrap traps in PySpark.

April 26, 2026 · 14 min