Example of a Spark job on Kubernetes with Spark Operator
Archived (pre-2022)
Preserved for reference only -- likely outdated. Last updated: May 2020
- Create file SparkPI.yaml
    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi-s3logs-1
      namespace: spike-streaming
    spec:
      type: Scala
      mode: cluster
      image: "767648288756.dkr.ecr.eu-west-1.amazonaws.com/bln-spark-k8s:spark2.4.5-scala2.12-hadoop2.8.5-v1"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.SparkPi
      sparkConf:
        "spark.eventLog.enabled": "true"
        "spark.eventLog.dir": "s3a://k8s-spark-operator-eu-west-1/logs"
      mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-2.4.5.jar"
      sparkVersion: "2.4.5"
      restartPolicy:
        type: Never
      driver:
        envFrom:
          - secretRef:
              name: spike-aws-creds
        tolerations:
          - key: spark
            operator: Exists
            effect: NoSchedule
          - key: spark
            operator: Exists
            effect: NoExecute
        nodeSelector:
          spark: test
        cores: 1
        memory: "512m"
        labels:
          version: 2.4.5
        serviceAccount: spark
      executor:
        envFrom:
          - secretRef:
              name: spike-aws-creds
        terminationGracePeriodSeconds: 60
        tolerations:
          - key: spark
            operator: Exists
            effect: NoSchedule
          - key: spark
            operator: Exists
            effect: NoExecute
        nodeSelector:
          spark: test
        cores: 1
        instances: 1
        labels:
          version: 2.4.5
        serviceAccount: spark
- Run on Kubernetes
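
The page does not show the command for this step. A minimal sketch, assuming `kubectl` points at the target cluster and the Spark Operator CRDs are already installed there; the file name, application name, and namespace are taken from the manifest above, and the shell variables are only for illustration:

```shell
# Values copied from the manifest above.
APP_FILE="SparkPI.yaml"
APP_NAME="spark-pi-s3logs-1"
NAMESPACE="spike-streaming"

# Submit the SparkApplication; the operator then creates the driver pod.
kubectl apply -f "$APP_FILE"

# Show the application object as tracked by the operator.
kubectl get sparkapplication "$APP_NAME" -n "$NAMESPACE"
```

Because the manifest sets `metadata.namespace`, `kubectl apply` needs no `-n` flag; the `get` call does, unless the current context already defaults to that namespace.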
- Monitor containers
    > kubectl get po
    NAME                                  READY   STATUS    RESTARTS   AGE
    spark-history-server-8476d667-j4r4s   1/1     Running   0          174m
    spark-pi-driver                       0/1     Pending   0          31s
    sparkoperator-f7b57cd86-2hst7         1/1     Running   0          5d23h
- Check logs
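
No command is given for this step either. One way to do it, assuming the driver pod is named as in the listing above and lives in the namespace of the current `kubectl` context (add `-n <namespace>` otherwise); `DRIVER_POD` is an illustrative variable, not something the operator defines:

```shell
# Driver pod name as it appears in the pod listing above; adjust to the actual name.
DRIVER_POD="spark-pi-driver"

# Print the driver's log; SparkPi writes its result to the driver's stdout.
kubectl logs "$DRIVER_POD"

# If the pod stays Pending, the Events section of this output usually says why
# (for example, no node matching the nodeSelector or tolerations).
kubectl describe pod "$DRIVER_POD"
```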
- Check history server
Spark History Server
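
How the history server is exposed is not described here. If it has no ingress or external service, a port-forward is one way to reach the UI. This sketch assumes the pod name from the listing above and the Spark History Server's default UI port, 18080; both may differ in a real deployment:

```shell
# Pod name copied from the listing above; the suffix changes whenever the pod is recreated.
HISTORY_POD="spark-history-server-8476d667-j4r4s"
# Default Spark History Server UI port; assumed not to be overridden.
HISTORY_PORT=18080

# Forward the UI to localhost (runs until interrupted), then browse to
# http://localhost:18080
kubectl port-forward "pod/$HISTORY_POD" "$HISTORY_PORT:$HISTORY_PORT"
```

The finished job should appear in the UI only if the history server reads the same location that the manifest sets in `spark.eventLog.dir`.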