
Running workloads on different instance type with fallback

Imported from Confluence. Content may be outdated; verify before following any procedures. Last updated: November 2023

We would like to run workloads on instance types that are suitable and available to us on the GCP Spot market, for the best price/performance ratio. To do that, we prioritize some instance types over others, with the possibility to fall back to On-Demand instances in case of a lack of Spot resources.

Below is an example of a configuration that can be used to achieve this:

  1. Deploy node pools (test-app-n2d, test-app-c2) for two different instance types (n2d-standard-4, c2-standard-4) with taints and labels as below.

[Screenshots from 2023-11-03: node pool taint and label configuration]
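The screenshots are not reproduced here, but based on the tolerations and node affinity used by the Deployment below, both node pools would carry configuration along these lines (the exact keys and values are assumptions inferred from the manifest; verify against the original page):

```yaml
# Assumed node pool configuration, inferred from the Deployment's
# tolerations and nodeAffinity. Applies to both test-app-n2d
# (n2d-standard-4) and test-app-c2 (c2-standard-4).
labels:
  service: test-app          # matched by the required nodeAffinity term
taints:
- key: noschedule
  value: test-app
  effect: NoSchedule         # tolerated by the Deployment below
- key: noexecute
  value: test-app
  effect: NoExecute          # tolerated by the Deployment below
```

The taints keep unrelated workloads off these pools, while the `service: test-app` label lets the Deployment require them.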

  2. Deploy the application (example YAML):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devops-test-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: devops-test
  template:
    metadata:
      labels:
        app: devops-test
    spec:
      tolerations:
      - effect: NoSchedule
        key: noschedule
        operator: Equal
        value: test-app
      - effect: NoExecute
        key: noexecute
        operator: Equal
        value: test-app
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: service
                    operator: In
                    values:
                      - test-app
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - c2-standard-4
            - weight: 2
              preference:
                matchExpressions:
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - n2d-standard-4
      containers:
      - name: base
        image: ubuntu:latest
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo $(date -u); sleep 5; done"]
        resources:
          requests:
            memory: "10Gi"
            cpu: "1"
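Assuming the manifest above is saved to a file (the filename here is arbitrary), it can be applied with:

```shell
# Apply the example Deployment to the cluster
kubectl apply -f devops-test-deployment.yaml
```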

  3. Check where the application was deployed. It should land on n2d-standard-4, because weight 2 is higher.
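Placement can be verified with kubectl, using the labels from the manifest above:

```shell
# Show which node the pod landed on
kubectl get pods -l app=devops-test -o wide

# Show the instance type of each node
kubectl get nodes -L node.kubernetes.io/instance-type
```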

  4. Update the test-app-n2d node pool's "Maximum number of all nodes" parameter to 1 and scale the deployment up to 3 pods.
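The node pool limit is set in the GCP console or via gcloud; the commands below are a sketch (the cluster name and zone are placeholders, and the autoscaling flag names may vary by gcloud version):

```shell
# Cap the n2d pool at one node in total
gcloud container clusters update CLUSTER_NAME \
  --node-pool test-app-n2d \
  --enable-autoscaling --total-min-nodes 0 --total-max-nodes 1 \
  --zone ZONE

# Scale the test deployment to 3 replicas
kubectl scale deployment devops-test-deployment --replicas=3
```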

  5. The new pods should be placed on test-app-c2, due to the lack of capacity in test-app-n2d.
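If a pod ends up Pending instead of falling back, the scheduler's reasoning can be inspected via pod events (the pod name below is a placeholder):

```shell
# See why a pod was (or was not) scheduled onto a given pool
kubectl describe pod POD_NAME | grep -A 5 Events
```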