sparklearning
Scheduler
Stories about how Spark’s DAG Scheduler and Task Scheduler turn a job into running tasks.
Stories
From One Action to Many Tasks
— DAG Scheduler, stage boundaries, Task Scheduler, task assignment
Locality and Delay Scheduling
— data locality levels, delay scheduling, when Spark waits for a better executor
Scheduling Pools and Fair Sharing
— FIFO vs fair scheduler, pools, minimum share, weight-based ordering
Related stories
The Driver, the Executors, and How a Job Actually Runs
— the driver hosts the DAG and Task Schedulers described in these stories
How Spark Survives Failure
— what the DAG Scheduler does when a stage or task fails
Partitions: The Grain of Parallelism
— each partition becomes one task; partition count determines task count
Elastic Executors: How Dynamic Allocation Grows and Shrinks the Cluster
— the scheduler’s backlog of pending tasks triggers dynamic allocation scale-up