Unpacking Parquet: Where Explicit SIMD Actually Matters
On the JVM, optimizing a hot kernel is not only about writing faster code; it is also about controlling how much the result depends on the compiler recognizing the code’s shape. Using Parquet bit-unpacking as a concrete case, our experiment shows that a SIMD speedup depends on which scalar baseline HotSpot's C2 compiler is handed, identifies when explicit vectorization is actually justified, and explains why a more specialized scalar routine is not necessarily faster.