Broadcast Variables & Accumulators
Stories about how Spark shares read-only data with executors and aggregates counters and metrics from distributed tasks back to the driver.
Stories
Shared State in a Distributed Job
— broadcast variables, tree-based distribution, accumulators, and their limitations
Related stories
How Spark Chooses a Join
— broadcast hash join uses broadcast variables to distribute the small side
Bytes on the Wire: How Spark Serializes Data for Tasks and Shuffles
— how broadcast data is serialized before being sent to executors
The Driver, the Executors, and How a Job Actually Runs
— the driver/executor model that broadcast and accumulators sit on top of