# Kubeflow Spark Operator

A Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
[Releases](https://github.com/kubeflow/spark-operator/releases)
[Go Report Card](https://goreportcard.com/report/github.com/kubeflow/spark-operator)
[Integration Tests](https://github.com/kubeflow/spark-operator/actions/workflows/integration.yaml)
[Kubeflow Slack](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels)
[DeepWiki](https://deepwiki.com/kubeflow/spark-operator)
[OpenSSF Best Practices](https://www.bestpractices.dev/projects/10524)
[Apache 2.0 License](https://opensource.org/licenses/Apache-2.0)
[FOSSA Status](https://app.fossa.com/projects/git%2Bgithub.com%2Fkubeflow%2Fspark-operator?ref=badge_shield&issueType=license)
## What is Spark Operator?
The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses
[Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.
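For illustration, the `spark-pi` application used in the Quick Start below corresponds to a `SparkApplication` resource along these lines. The image tag, jar path, and service account name here are assumptions for the sketch; treat the checked-in `examples/spark-pi.yaml` as authoritative:

```bash
# Submit a SparkApplication custom resource. Field values below are
# illustrative -- see examples/spark-pi.yaml for the maintained version.
kubectl apply -f - <<EOF
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:4.0.0                       # assumed image tag
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: 4.0.0
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark   # assumed service account name
  executor:
    instances: 1
    cores: 1
    memory: 512m
EOF
```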
## Quick Start
For a more detailed guide, please refer to the [Getting Started guide](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).
```bash
# Add the Helm repository
helm repo add --force-update spark-operator https://kubeflow.github.io/spark-operator
# Install the operator into the spark-operator namespace and wait for deployments to be ready
helm install spark-operator spark-operator/spark-operator \
--namespace spark-operator \
--create-namespace \
--wait
# Create an example application in the default namespace
kubectl apply -f https://raw.githubusercontent.com/kubeflow/spark-operator/refs/heads/master/examples/spark-pi.yaml
# Get the status of the application
kubectl get sparkapp spark-pi
# Delete the application
kubectl delete sparkapp spark-pi
```
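After the `kubectl apply`, the operator submits the application and runs the Spark driver as a pod. The commands below assume the operator's default naming convention of `<app-name>-driver` for the driver pod:

```bash
# Watch the application move through its states (e.g. SUBMITTED, RUNNING, COMPLETED)
kubectl get sparkapp spark-pi -w

# Tail the driver logs; the driver pod is conventionally named <app-name>-driver
kubectl logs -f spark-pi-driver
```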
## Overview
For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). The operator requires Spark 2.3 or above, the versions that support Kubernetes as a native scheduler backend.
The Kubernetes Operator for Apache Spark currently supports the following list of features:
- Supports Spark 2.3 and up.
- Enables declarative application specification and management of applications through custom resources.
- Automatically runs `spark-submit` on behalf of users for each `SparkApplication` eligible for submission.
- Provides native [cron](https://en.wikipedia.org/wiki/Cron) support for running scheduled applications (see the sketch after this list).
- Supports customization of Spark pods beyond what Spark natively supports through a mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.
- Supports automatic re-submission of `SparkApplication` objects whose specification has been updated.
- Supports automatic application restart with a configurable restart policy.
- Supports automatic retries of failed submissions with optional linear back-off.
- Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.
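As an example of the cron support above, scheduled runs are expressed with a `ScheduledSparkApplication` resource whose `template` carries an ordinary `SparkApplication` spec. A minimal sketch, reusing the assumed `spark-pi` settings from earlier; the schedule, history limits, and image are illustrative:

```bash
# A ScheduledSparkApplication wraps a SparkApplication template in a cron
# schedule; the schedule and limits below are illustrative.
kubectl apply -f - <<EOF
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: spark-pi-scheduled
  namespace: default
spec:
  schedule: "@every 10m"                   # standard cron expressions also work
  concurrencyPolicy: Allow
  successfulRunHistoryLimit: 1
  failedRunHistoryLimit: 3
  template:                                # same shape as a SparkApplication spec
    type: Scala
    mode: cluster
    image: spark:4.0.0                     # assumed image tag
    mainClass: org.apache.spark.examples.SparkPi
    mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
    sparkVersion: 4.0.0
    restartPolicy:
      type: Never
    driver:
      cores: 1
      memory: 512m
      serviceAccount: spark-operator-spark # assumed service account name
    executor:
      instances: 1
      cores: 1
      memory: 512m
EOF
```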
## Project Status
**Project status:** *beta*
**Current API version:** *`v1beta2`*
**If you are currently using the `v1beta1` version of the APIs in your manifests, please update them to use the `v1beta2` version by changing `apiVersion: "sparkoperator.k8s.io/<version>"` to `apiVersion: "sparkoperator.k8s.io/v1beta2"`. You will also need to delete the previous version of the CustomResourceDefinitions named `sparkapplications.sparkoperator.k8s.io` and `scheduledsparkapplications.sparkoperator.k8s.io`, and replace them with the `v1beta2` version, either by installing the latest version of the operator or by running `kubectl create -f config/crd/bases`.**
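Concretely, the CRD replacement described above can be done as follows. Note that deleting a CRD also deletes all existing objects of that kind, so export any applications you want to keep first:

```bash
# Remove the old v1beta1 CRDs; NOTE: this deletes all existing
# SparkApplication/ScheduledSparkApplication objects, so back them up first.
kubectl delete crd sparkapplications.sparkoperator.k8s.io
kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io

# Recreate the v1beta2 CRDs from this repository (or upgrade the Helm chart
# to the latest operator version instead).
kubectl create -f config/crd/bases
```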
## Prerequisites
- [Kubernetes](https://kubernetes.io) >= 1.16
- [Helm](https://helm.sh) >= 3
- [kubectl](https://kubernetes.io/docs/reference/kubectl) >= 1.16
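A quick sanity check that the client tools meet these minimums (output format varies across versions):

```bash
# Verify client tool versions against the prerequisites above
kubectl version --client
helm version --short
```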
## Getting Started
To get started with the Spark operator, please refer to [Getting Started](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).
## User Guide
For detailed user guide and API documentation, please refer to [User Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/) and [API Specification](docs/api-docs.md).
If you plan to run Spark workloads with Spark operator on [Alibaba Cloud Container Service for Kubernetes (ACK)](https://www.alibabacloud.com/product/kubernetes), also refer to the [ACK guide](https://www.alibabacloud.com/help/ack/ack-managed-and-ack-dedicated/use-cases/run-apache-spark-workloads-on-ack) for optimized configuration practices.
If you are running Spark operator on Google Kubernetes Engine (GKE) and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/gcp/).
## Blog Posts
- [[2025-03] Announcing the Kubeflow Spark Operator Benchmarking Results](https://blog.kubeflow.org/operators/benchmarking/performance/2025/03/15/kubeflow-spark-operator-benchmarks.html)
- [[2024-04] Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html)
## Version Matrix
The following table lists recent versions of the operator.
| Operator Version | API Version | Kubernetes Version | Base Spark Version |
| --------------------- | ----------- | ------------------ | ------------------ |
| `v2.3.x` | `v1beta2` | 1.16+ | `4.0.0` |
| `v2.2.x` | `v1beta2` | 1.16+ | `3.5.5` |
| `v2.1.x` | `v1beta2` | 1.16+ | `3.5.3` |
| `v2.0.x` | `v1beta2` | 1.16+ | `3.5.2` |
| `v1beta2-1.6.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.5.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.4.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.3.x-3.1.1` | `v1beta2` | 1.16+ | `3.1.1` |
| `v1beta2-1.2.3-3.1.1` | `v1beta2` | 1.13+ | `3.1.1` |
| `v1beta2-1.2.2-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.1-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.0-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.1.x-2.4.5` | `v1beta2` | 1.13+ | `2.4.5` |
| `v1beta2-1.0.x-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` |
## Developer Guide
For developing with Spark Operator, please refer to [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).
## Contributor Guide
For contributing to Spark Operator, please refer to [Contributor Guide](CONTRIBUTING.md).
## Community
- Join the [CNCF Slack workspace](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) and then join the `#kubeflow-spark-on-kubernetes` channel.
- Check out our blog post [Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html).
- Join our monthly community meeting [Kubeflow Spark Operator Meeting Notes](https://bit.ly/3VGzP4n).
## Adopters
Check out [adopters of Spark Operator](ADOPTERS.md).
", Assign "at most 3 tags" to the expected json: {"id":"10046","tags":[]} "only from the tags list I provide: []" returns me the "expected json"