Where Data System Abstractions Break: A Semiotic Reading

Many of the most surprising performance pathologies in modern data systems are semiotic failures — structural divergences between what an interface signifies and what the underlying system does.

March 4, 2026 · 13 min
Delta Lake MERGE Is Not a Simple Upsert. What Actually Happens at Scale.

At 10 TB, updating 200k rows can mean rewriting thousands of files. Here’s why, and what to do about it.

March 2, 2026 · 15 min
Spark Is Not Just Lazy. Spark Compiles Dataflow.

Why calling Spark ‘lazy’ is technically reductive, and how thinking of it as a dataflow compiler changes the way you design pipelines.

November 3, 2025 · 12 min
Fixing Skewed Nested Joins in Spark with Asymmetric Salting

In large-scale Spark pipelines, skew can occur when a single key carries a disproportionately large nested payload. Asymmetric salting offers a targeted solution: explode, salt, join in parallel, and optionally re-aggregate.

December 1, 2025 · 17 min