Adaptive Query Execution

Stories about how Spark rewrites query plans at runtime using real shuffle statistics.

Stories

AQE: How Spark Rewrites Plans After the Shuffle — coalescing shuffle partitions, join conversion, skew join handling, and dynamic partition pruning

How Spark Chooses a Join — the join strategies AQE can switch between at runtime
What Spark Knows About Your Data: Statistics and the Cost-Based Optimizer — how AQE complements static CBO with real runtime statistics
The Journey of a Shuffle Record — the shuffle mechanism AQE observes to make its decisions
When One Partition Holds Up Everyone: The Data Skew Story — the skew problem AQE’s skew join handler solves