
Mount SSD ephemeral disks for Spark

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: March 2022

Add instance types with local SSD disks to VNG config

For example, the following instance types need to be added both to the cluster's global instance-type whitelist and to the VNG:

    "c5ad.xlarge",
    "c5ad.2xlarge",
    "c5ad.4xlarge",
    "c5ad.8xlarge",
    "c5d.xlarge",
    "c5d.2xlarge",
    "c5d.4xlarge",
    "m5ad.xlarge",
    "m5ad.2xlarge",
    "m5ad.4xlarge",
    "m5ad.8xlarge",
    "m5d.xlarge",
    "m5d.2xlarge",
    "m5d.4xlarge",
    "m5d.8xlarge",
    "r5ad.xlarge",
    "r5ad.2xlarge",
    "r5ad.4xlarge",
    "r5ad.8xlarge",
    "r5d.xlarge",
    "r5d.2xlarge",
    "r5d.4xlarge",
    "r5d.8xlarge"

Format and mount local SSD disk on node bootstrap

Add the following to the node's user-data so the local disk is formatted and mounted at bootstrap:

# Install the NVMe CLI to identify instance-store devices
yum install -y nvme-cli

# Collect all local NVMe instance-store devices (some instance types have more than one)
EPHEMERAL_DISK=$(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }')

# Pool the device(s) into a single LVM volume group and logical volume
pvcreate $EPHEMERAL_DISK
vgcreate ephemeral $EPHEMERAL_DISK
lvcreate -n ephemeral -l 100%FREE ephemeral

# Format as XFS and mount
mkfs.xfs /dev/ephemeral/ephemeral
mkdir -p /ephemeral
mount -t xfs -o defaults,noatime /dev/ephemeral/ephemeral /ephemeral
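The grep/awk pipeline above selects instance-store devices by their model string in the `nvme list` output. As a sketch of what it extracts, given hypothetical output (the sample lines below are illustrative, not captured from a real node):

```shell
# Hypothetical `nvme list` output; only the model string matters for the filter
sample='/dev/nvme0n1  vol0123456789abcde  Amazon Elastic Block Store
/dev/nvme1n1  AWS1AAAABBBBCCCCD   Amazon EC2 NVMe Instance Storage'

# Same filter as the bootstrap script: keep instance-store rows, print column 1
echo "$sample" | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }'
# -> /dev/nvme1n1
```

EBS root and data volumes report a different model string, so they are excluded and only the local SSDs end up in the LVM pool.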

Mount ephemeral storage in Pod

Example:

spec:
  ...
  containers:
    ...
    volumeMounts:
      - name: ephemeral
        mountPath: /ephemeral
        subPathExpr: $(POD_NAME)
  volumes:
    - name: ephemeral
      hostPath:
        path: /ephemeral
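Note that `subPathExpr: $(POD_NAME)` only resolves if a `POD_NAME` environment variable is defined in the container, typically injected via the Kubernetes downward API. A sketch of the required env entry (the container name is illustrative):

```yaml
containers:
  - name: spark          # illustrative container name
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name   # resolves to the Pod's own name
```

With this in place, each Pod writes under its own subdirectory of /ephemeral, so concurrent Pods on the same node do not collide.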

Mount ephemeral storage in Spark

To use the ephemeral storage as Spark local storage with the Spark Operator, use the following configuration:

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
...
spec:
  ...
  sparkConf:
    ...
    spark.local.dir: "/ephemeral"
  volumes:
    ...
    - name: spark-local-dir-1
      hostPath:
        path: /ephemeral
  driver:
    volumeMounts:
      ...
      - name: spark-local-dir-1
        mountPath: /ephemeral
  executor:
    volumeMounts:
      ...
      - name: spark-local-dir-1
        mountPath: /ephemeral

Note

The names of the local volume and its mounts must start with spark-local-dir-.
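If you submit without the operator, plain spark-submit can express the same hostPath mounts through Spark's Kubernetes volume properties. This is a sketch, not a verified submission; the spark-local-dir- naming convention for local storage requires Spark 3.x, so verify against your Spark version:

```properties
spark.local.dir=/ephemeral
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.options.path=/ephemeral
spark.kubernetes.driver.volumes.hostPath.spark-local-dir-1.mount.path=/ephemeral
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/ephemeral
spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/ephemeral
```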