Mount SSD ephemeral disks for Spark¶
Imported from Confluence
Content may be outdated. Verify before following any procedures. Last updated: March 2022
Add instance types with local SSD disks to VNG config¶
For example, add the following instance types to both the global cluster whitelist and the VNG:
"c5ad.xlarge",
"c5ad.2xlarge",
"c5ad.4xlarge",
"c5ad.8xlarge",
"c5d.xlarge",
"c5d.2xlarge",
"c5d.4xlarge",
"m5ad.xlarge",
"m5ad.2xlarge",
"m5ad.4xlarge",
"m5ad.8xlarge",
"m5d.xlarge",
"m5d.2xlarge",
"m5d.4xlarge",
"m5d.8xlarge",
"r5ad.xlarge",
"r5ad.2xlarge",
"r5ad.4xlarge",
"r5ad.8xlarge",
"r5d.xlarge",
"r5d.2xlarge",
"r5d.4xlarge",
"r5d.8xlarge"
Format and mount local SSD disk on node bootstrap¶
The following code needs to be added to user-data:
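# Install nvme-cli so the NVMe devices can be listed
yum install -y nvme-cli
# Select the instance-store devices; larger instance types expose more than one
EPHEMERAL_DISK=$(nvme list | grep 'Amazon EC2 NVMe Instance Storage' | awk '{ print $1 }')
# Left unquoted on purpose: each device path must be passed as a separate argument
pvcreate $EPHEMERAL_DISK
vgcreate ephemeral $EPHEMERAL_DISK
# One logical volume spanning the whole volume group
lvcreate -n ephemeral -l 100%FREE ephemeral
mkfs.xfs /dev/ephemeral/ephemeral
mkdir -p /ephemeral
mount -t xfs -o defaults,noatime /dev/ephemeral/ephemeral /ephemeral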
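The commands above assume the node has at least one instance-store device. If the same user-data also runs on an instance type without local SSDs, pvcreate receives no device argument and fails. The following is a minimal sketch of a guard, not part of the original procedure; the function names ephemeral_disks and setup_ephemeral are hypothetical, and it assumes mountpoint (util-linux) is available on the node.

```shell
#!/usr/bin/env bash

# Read `nvme list` output on stdin and print the instance-store device
# paths, one per line (same filter as the bootstrap commands).
ephemeral_disks() {
  awk '/Amazon EC2 NVMe Instance Storage/ { print $1 }'
}

# Hypothetical wrapper: format and mount only when a device was found
# and /ephemeral is not mounted yet.
setup_ephemeral() {
  local disks
  disks=$(nvme list | ephemeral_disks)
  if [ -z "$disks" ]; then
    echo "no instance-store disks found, skipping /ephemeral setup" >&2
    return 0
  fi
  if mountpoint -q /ephemeral; then
    echo "/ephemeral is already mounted, nothing to do" >&2
    return 0
  fi
  # $disks is left unquoted so each device becomes its own argument.
  pvcreate $disks &&
    vgcreate ephemeral $disks &&
    lvcreate -n ephemeral -l 100%FREE ephemeral &&
    mkfs.xfs /dev/ephemeral/ephemeral &&
    mkdir -p /ephemeral &&
    mount -t xfs -o defaults,noatime /dev/ephemeral/ephemeral /ephemeral
}
```

With this in place, user-data would call setup_ephemeral after installing nvme-cli instead of running the commands inline.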
Mount ephemeral storage in Pod¶
Example:
spec:
  ...
  containers:
    ...
    volumeMounts:
    - name: ephemeral
      mountPath: /ephemeral
      # POD_NAME must be defined as an environment variable of this container
      subPathExpr: $(POD_NAME)
  volumes:
  - name: ephemeral
    hostPath:
      path: /ephemeral
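The subPathExpr: $(POD_NAME) entry is expanded from the container's environment, so each Pod gets its own subdirectory under /ephemeral only if POD_NAME is defined in that container. The following is a minimal sketch of a complete manifest, assuming the variable is populated from the downward API (metadata.name). The Pod name ephemeral-demo, the busybox image, and the function name render_pod_manifest are hypothetical. The heredoc delimiter is quoted so that the shell does not treat $(POD_NAME) as a command substitution.

```shell
#!/usr/bin/env bash

# Print a Pod manifest that mounts a per-Pod subdirectory of /ephemeral.
# The quoted 'EOF' keeps $(POD_NAME) literal for Kubernetes to expand.
render_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo          # hypothetical name
spec:
  containers:
    - name: app
      image: busybox:1.36       # hypothetical image
      command: ["sh", "-c", "df -h /ephemeral && sleep 3600"]
      env:
        # Downward API: expose the Pod's own name to the container
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      volumeMounts:
        - name: ephemeral
          mountPath: /ephemeral
          subPathExpr: $(POD_NAME)
  volumes:
    - name: ephemeral
      hostPath:
        path: /ephemeral
EOF
}
```

To try it on a cluster, the output can be piped to kubectl: render_pod_manifest | kubectl apply -f -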
Mount ephemeral storage in Spark¶
To use ephemeral storage as Spark local storage with the Spark operator, use the following config:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
...
spec:
  ...
  sparkConf:
    ...
    spark.local.dir: "/ephemeral"
  volumes:
    ...
    - name: spark-local-dir-1
      hostPath:
        path: /ephemeral
  driver:
    volumeMounts:
      ...
      - name: spark-local-dir-1
        mountPath: /ephemeral
  executor:
    volumeMounts:
      ...
      - name: spark-local-dir-1
        mountPath: /ephemeral
Note
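The name of the local volume, and of the matching volumeMounts entries, must start with spark-local-dir-. Spark on Kubernetes uses volumes with this prefix as its local scratch storage.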
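The prefix is easy to mistype, and a volume without it is mounted but not treated as Spark local storage. A small sketch of a pre-submit name check follows; the function name is_spark_local_dir_name is hypothetical and the two names in the loop are only sample input.

```shell
#!/usr/bin/env bash

# Succeed only when the given volume name starts with spark-local-dir-
is_spark_local_dir_name() {
  case "$1" in
    spark-local-dir-*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report every sample name that lacks the required prefix.
for name in spark-local-dir-1 ephemeral; do
  is_spark_local_dir_name "$name" || echo "missing spark-local-dir- prefix: $name"
done
```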