The Hidden DSL in Catalyst
How Spark’s internal rewriting framework — Catalyst — exposes an embedded DSL with a public extension surface, the same one Delta Lake and Iceberg use to plug into the optimizer pipeline.
Six concrete anti-patterns I encountered while building a real Catalyst extension: from using the wrong rule type for rules that throw, to keeping mutable state under AQE and Spark Connect, to JVM bootstrap traps in PySpark.