cdelmonte.dev

I write about data systems, distributed execution, and infrastructure boundaries.

This blog focuses on how modern platforms actually behave under load: Spark and Delta Lake internals, execution models, control planes, trust boundaries, and the architectural trade-offs hidden behind clean APIs.

Where Spark Changes Shape: UnsafeRow, the JVM, and the Relocation of Portability

Run df.explain on a Parquet scan and you find ColumnarToRow, an operator whose only job is to change the data’s shape. It is the seam where two architectural eras meet, and a record of how portability in analytical systems has relocated from the JVM runtime to the data format and the query plan.

May 26, 2026 · 18 min

The Hidden DSL in Catalyst

How Spark’s internal rewriting framework, Catalyst, exposes an embedded DSL with a public extension surface, the same one Delta Lake and Iceberg use to plug into the optimizer pipeline.

April 26, 2026 · 19 min

delta-explain: Making Delta Lake Pruning Visible

Partition pruning and data skipping are invisible by default. delta-explain reads the Delta log directly and shows, step by step, how a WHERE predicate narrows down candidate files, with no engine required.

April 13, 2026 · 19 min

Anti-patterns in Catalyst rules

Six concrete anti-patterns I encountered while building a real Catalyst extension, from choosing the wrong rule type for rules that throw, to mutable state under AQE and Spark Connect, to JVM bootstrap traps in PySpark.

April 26, 2026 · 16 min

The Disaggregation of the Lakehouse Stack

How Delta Kernel, Arrow, and pluggable execution are disaggregating the lakehouse stack. The stack is not converging on a new dominant engine; it is converging on a layered architecture in which protocol, data representation, and query execution are increasingly isolated behind stable interfaces.

March 8, 2026 · 15 min

Where Data System Abstractions Break: A Semiotic Reading

Many of the most surprising performance pathologies in modern data systems are semiotic failures — structural divergences between what an interface signifies and what the underlying system does.

March 4, 2026 · 13 min