This is the story of what happens between typing spark-submit and the first task actually executing on a worker node. This journey involves the Spark application packaging, driver launch, cluster resource negotiation, executor acquisition, and task scheduling—a chain of events spanning the client, the cluster manager, and the distributed worker nodes. Understanding this chain explains what each component of the infrastructure is responsible for, why some failures happen before any data is read, and how to diagnose the gap between “submit” and “running.”
Before tracing the journey, the cast of characters:
spark-submit: the command-line script that packages your application, resolves dependencies, and launches the driver (or submits to a cluster manager that will launch the driver).
Driver: the JVM process that hosts your SparkContext, runs your main program, and coordinates everything. It communicates with the cluster manager to acquire resources and with executors to schedule work.
Cluster manager: the resource scheduling layer—YARN ResourceManager, Kubernetes API Server, or Spark’s standalone master. It manages worker nodes and allocates containers/pods/workers to applications.
Executor: a JVM process on a worker node that runs tasks and stores cached data. Each executor is a container/pod/process allocated by the cluster manager to your application.
Think of spark-submit like a hiring agency submission. You submit your CV (the application JAR), describe what resources you need (cores, memory), and the agency (cluster manager) decides whether to place you in an available office (worker node). The cluster manager first assigns a manager role (driver), who then requests staff (executors). No work starts until staff are in their seats.
spark-submit is a shell script that:
--jars), and any files (--files) to the classpath.SPARK_HOME/conf/spark-defaults.conf, environment variables, and command-line arguments (--conf key=value). Command-line arguments take highest precedence.--deploy-mode client (driver runs on the machine where spark-submit was run) or --deploy-mode cluster (driver is launched on a worker node by the cluster manager).The --master argument determines which cluster manager to use: yarn, k8s://https://..., spark://host:7077 (standalone), or local[*] (local mode).
When the driver JVM starts and runs new SparkContext(conf) (or SparkSession.builder.getOrCreate()):
YarnClient, Kubernetes client, or standalone client, depending on the master URL.SchedulerBackend and TaskScheduler are created: the scheduler backend talks to the cluster manager to acquire executors; the task scheduler assigns tasks to available executor slots.The SchedulerBackend contacts the cluster manager to request executor resources. The request specifies:
spark.executor.memory: memory per executor (e.g., 4g)spark.executor.cores: CPU cores per executor (e.g., 4)spark.executor.instances: total number of executors (or min/max for dynamic allocation)spark.executor.memoryOverhead: off-heap overhead (JVM overhead, off-heap buffers)Requesting executors is like a project manager calling the HR department (cluster manager) to request office space and staff. “I need 10 desks, each with 4 computers and 8GB of RAM.” HR checks availability across the office building (worker nodes) and assigns desks where space is available. Until desks are assigned, the project manager waits.
For YARN: the SchedulerBackend requests containers from the ResourceManager. Each container satisfies executor.memory + memoryOverhead in RAM and executor.cores in virtual CPU. YARN’s scheduler places containers on NodeManagers (worker nodes) subject to queue policies and node capacity.
For Kubernetes: the SchedulerBackend creates Pod specs and submits them to the Kubernetes API Server. The K8s scheduler places pods on nodes subject to node selector, affinity, resource limits, and namespace quotas.
Once the cluster manager allocates resources:
CoarseGrainedExecutorBackend, which immediately contacts the driver’s RPC endpoint and registers itself: “I’m Executor 1, I have 4 cores and 4GB of memory.”As executors register, task slots become available. With 10 executors × 4 cores each, the driver has 40 task slots. Tasks are sent to executors as soon as slots are available.
Executor registration is like new employees showing up for their first day and checking in at the front desk. The project manager (driver) now knows how many people are ready to work. Until at least one person checks in, no work assignments can be sent out.
When an action is called (e.g., df.count()), the DAG Scheduler creates stages and the Task Scheduler assigns tasks to executor slots. For each task:
The complete time from spark-submit to first task running is:
spark-submit startup + JAR upload (a few seconds to tens of seconds)For large clusters (100+ executors), the first tasks may start running while other executors are still starting.
Client mode (--deploy-mode client): the driver runs on the machine where spark-submit was invoked. This is convenient for development (driver logs appear in the terminal) but means the driver is not managed by the cluster. If you lose the terminal or the submit machine restarts, the driver dies and the job fails.
Cluster mode (--deploy-mode cluster): the cluster manager launches the driver as a container/pod on a worker node. The driver is managed by the cluster—if the driver container fails, the cluster manager may restart it. For production jobs, cluster mode is preferred because the driver’s lifecycle is tied to the cluster, not to the user’s laptop.
Client mode is working from your laptop in the office (convenient, fragile). Cluster mode is working from a dedicated workstation locked in the server room (less convenient to interact with, more resilient). In cluster mode, you don’t get real-time console output on your machine—you view logs through the Spark UI or the cluster’s logging system.
Many failures happen before any tasks run:
--jars or spark.jars configuration.The journey from spark-submit to running tasks has five acts: package and launch (spark-submit assembles classpath and starts/submits the driver), driver initializes (SparkContext starts, scheduler components are created), executors are requested (SchedulerBackend asks the cluster manager for containers/pods with the specified cores and memory), executors start and register (worker JVMs come up and check in with the driver), and tasks are dispatched (the first action triggers DAG/Task Scheduler to send work to executor slots). Every step is a potential failure point—from authentication errors at submission to classpath errors at executor startup. Understanding the chain gives you the mental model to locate failures quickly: is the problem before the driver starts? Between driver and cluster manager? Between cluster manager and executors? Or after executors are up, in task execution itself?