sparklearning

Catalyst: Query Planning & Optimization

Stories about how Spark SQL turns a query into an optimized physical plan and executes it.

Stories

To follow the Catalyst pipeline from start to finish, read the stories in this order:

  1. From SQL to a Running Plan — the big picture
  2. Expressions All the Way Down — the data model rules operate on
  3. Making Sense of Names: Analyzer Rules — phase 2: resolution
  4. The Optimizer’s Rulebook — phase 3: logical optimization
  5. From Logic to Execution: Physical Planning Rules — phase 4: physical planning
  6. EXPLAIN Yourself — reading the output of all four phases
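
Before starting the series, it may help to see the core idea in miniature: a plan is a tree of expressions, and a rule is a function that rewrites that tree. The sketch below is a toy written for illustration only. The names `Literal`, `Column`, `Add`, `transform_up`, `fold_constants`, and `run_to_fixed_point` are hypothetical and are not Spark's classes or API. It assumes a single constant-folding rule applied repeatedly until the tree stops changing.

```python
from dataclasses import dataclass

# Toy expression nodes. Hypothetical names, not Spark's Catalyst classes.
@dataclass(frozen=True)
class Literal:
    value: int

@dataclass(frozen=True)
class Column:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def transform_up(node, rule):
    """Rebuild the tree bottom-up, applying `rule` to every node."""
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)

def fold_constants(node):
    """Rule: replace Add(Literal, Literal) with a single Literal."""
    if (isinstance(node, Add)
            and isinstance(node.left, Literal)
            and isinstance(node.right, Literal)):
        return Literal(node.left.value + node.right.value)
    return node

def run_to_fixed_point(tree, rules, max_iterations=100):
    """Apply all rules repeatedly until the tree stops changing."""
    for _ in range(max_iterations):
        new_tree = tree
        for rule in rules:
            new_tree = transform_up(new_tree, rule)
        if new_tree == tree:
            return tree
        tree = new_tree
    return tree

# x + (1 + 2) is rewritten to x + 3
plan = Add(Column("x"), Add(Literal(1), Literal(2)))
optimized = run_to_fixed_point(plan, [fold_constants])
print(optimized)
```

Frozen dataclasses give structural equality for free, which is what lets the loop detect that a pass changed nothing and stop.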