GKE Backup POC

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: September 2024

Disaster Recovery for GKE Workloads

Overview

Disaster Recovery (DR) is essential for maintaining the availability and resilience of stateful workloads in Google Kubernetes Engine (GKE). We use GKE Backup as our DR tool to schedule and manage backups for stateful workloads, including persistent volume data. If a critical stateful application suffers an outage, the last successful backup can be restored to any cluster in any Google Cloud Platform (GCP) project.

Backup Plan (test MR)

For the Proof of Concept (POC), we selected the direct-core-dev cluster. A backup plan named gke-backup-core-dev-cluster has been created for all Elasticsearch-related namespaces (eck-operator, eck-elasticsearch). This backup is scheduled to run daily at 10:50 AM.
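The page does not show how the backup plan itself was created, so the following is only a sketch of one way to define an equivalent plan with gcloud. The cluster path is taken from the restore plan commands on this page; the cron expression is an assumption, because the page gives no time zone for "10:50 AM" (Backup for GKE appears to evaluate cron schedules in UTC, and the backup name used later, sched-2024-0829-0850, suggests the run may actually start at 08:50 UTC). The run helper is hypothetical and only prints the command unless RUN=1 is set.

```shell
# Sketch only: prints the command by default; set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
CLUSTER="projects/${PROJECT}/locations/${LOCATION}/clusters/gke-core-direct-dev-useast1"
# Assumption: 10:50 every day. The original page gives no time zone,
# so adjust this value to match the real plan.
CRON_SCHEDULE="50 10 * * *"

create_backup_plan() {
  # Back up both Elasticsearch namespaces, including persistent volume data.
  run gcloud beta container backup-restore backup-plans create gke-backup-core-dev-cluster \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --cluster="${CLUSTER}" \
    --selected-namespaces=eck-operator,eck-elasticsearch \
    --include-volume-data \
    --cron-schedule="${CRON_SCHEDULE}"
}

create_backup_plan
```

When executed for real, the cron value is passed as a single quoted argument; the dry-run output shows it unquoted only because the helper joins the arguments with spaces.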

Restore Plan

To restore a GKE backup to a cluster, first create a restore plan in the target project. The examples below create restore plans for the direct-infra-dev cluster (a different cluster from the one that was backed up) and for the direct-core-dev cluster (the same cluster).

Create a Restore Plan for All Namespaces or Selected Namespaces

#### Restore plan to restore backup in different cluster (infra)  

gcloud beta container backup-restore restore-plans create test-restore-plan \
--location=us-east1 \
--backup-plan=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster  \
--cluster=projects/agp-direct-dev-45/locations/us-east1/clusters/gke-infra-direct-dev-useast1 \
--namespaced-resource-restore-mode=fail-on-conflict \
--all-namespaces \
--cluster-resource-conflict-policy=use-existing-version \
--cluster-resource-scope-all-group-kinds \
--volume-data-restore-policy=restore-volume-data-from-backup \
--project=agp-direct-dev-45  

#### Restore plan to restore backup in same cluster (core)   

gcloud beta container backup-restore restore-plans create test-plan-core \
--location=us-east1 \
--backup-plan=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster  \
--cluster=projects/agp-direct-dev-45/locations/us-east1/clusters/gke-core-direct-dev-useast1 \
--namespaced-resource-restore-mode=fail-on-conflict \
--selected-namespaces=eck-operator \
--cluster-resource-conflict-policy=use-existing-version \
--cluster-resource-scope-all-group-kinds \
--volume-data-restore-policy=restore-volume-data-from-backup \
--project=agp-direct-dev-45  

Restore plans: https://console.cloud.google.com/kubernetes/backups/restorePlans?project=agp-direct-dev-45

Restore from backup using restore Plan

If the restored pods must run on specific node pools in the target cluster, create those node pools before running the restore command.
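As an illustration of that prerequisite, the sketch below creates a node pool in the infra cluster before the restore. The pool name, machine type, and node label are hypothetical placeholders that do not appear on the original page; copy the real values from the node pool that hosts the workloads in the source cluster so that node selectors and tolerations in the restored pods still match. The run helper is also hypothetical and only prints the command unless RUN=1 is set.

```shell
# Sketch only: prints the command by default; set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
TARGET_CLUSTER="gke-infra-direct-dev-useast1"
# Hypothetical values: replace with the settings of the source node pool.
POOL_NAME="elasticsearch-pool"
MACHINE_TYPE="e2-standard-4"
NODE_LABELS="workload=elasticsearch"

create_node_pool() {
  # In a regional cluster, --num-nodes is the node count per zone.
  run gcloud container node-pools create "${POOL_NAME}" \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --cluster="${TARGET_CLUSTER}" \
    --machine-type="${MACHINE_TYPE}" \
    --num-nodes=1 \
    --node-labels="${NODE_LABELS}"
}

create_node_pool
```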

### Restore in infra cluster  
gcloud beta container backup-restore restores create test-restore-stack \
--project=agp-direct-dev-45 \
--location=us-east1 \
--restore-plan=test-restore-plan \
--backup=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster/backups/sched-2024-0829-0850  

### Restore in core  cluster  
gcloud beta container backup-restore restores create eck-operator-restore \
--project=agp-direct-dev-45 \
--location=us-east1 \
--restore-plan=test-plan-core \
--backup=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster/backups/sched-2024-0829-0850  

Restore list: https://console.cloud.google.com/kubernetes/backups/restores?project=agp-direct-dev-45

After running the above commands, the backup is restored to the direct-infra-dev cluster (eck-operator and eck-elasticsearch namespaces) and to the direct-core-dev cluster (eck-operator namespace). Verify that all resources, including persistent volume data, have been restored correctly.
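Besides the console pages linked above, the same information should be available from the command line. The sketch below lists the backups of the plan, which is one way to find a backup name such as sched-2024-0829-0850, and prints the state of each restore created on this page (expected to be a value such as SUCCEEDED or FAILED). The function names and the run helper are hypothetical; the helper only prints the commands unless RUN=1 is set.

```shell
# Sketch only: prints the commands by default; set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
BACKUP_PLAN="gke-backup-core-dev-cluster"

list_backups() {
  # Lists the backups that belong to the backup plan.
  run gcloud beta container backup-restore backups list \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --backup-plan="${BACKUP_PLAN}"
}

restore_state() {
  # $1 = restore name, $2 = restore plan name
  run gcloud beta container backup-restore restores describe "$1" \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --restore-plan="$2" \
    --format="value(state)"
}

list_backups
restore_state test-restore-stack test-restore-plan
restore_state eck-operator-restore test-plan-core
```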

Verification

After restoration, ensure that all Elastic Cloud on Kubernetes (ECK) stacks, including persistent volumes, are successfully restored.
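One possible way to perform this check from the command line is sketched below for the infra cluster. It assumes that kubectl is installed, that the Elasticsearch custom resources live in the eck-elasticsearch namespace, and that the ECK custom resource definitions were restored with the cluster-scoped resources; none of these details are confirmed by the original page. The run helper is hypothetical and only prints the commands unless RUN=1 is set.

```shell
# Sketch only: prints the commands by default; set RUN=1 to execute them.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
TARGET_CLUSTER="gke-infra-direct-dev-useast1"

verify_restore() {
  # Point kubectl at the cluster that received the restore.
  run gcloud container clusters get-credentials "${TARGET_CLUSTER}" \
    --project="${PROJECT}" \
    --location="${LOCATION}"

  # Pods should be Running and persistent volume claims should be Bound.
  for ns in eck-operator eck-elasticsearch; do
    run kubectl get pods,pvc --namespace="${ns}"
  done

  # Assumption: the Elasticsearch resources are in the eck-elasticsearch namespace.
  run kubectl get elasticsearch --namespace=eck-elasticsearch

  # Persistent volumes are cluster-scoped, so no namespace is given.
  run kubectl get pv
}

verify_restore
```

For the core cluster, the same checks apply with TARGET_CLUSTER set to gke-core-direct-dev-useast1 and only the eck-operator namespace.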

Screenshot 2024-08-30 at 10.59.51 AM.png

Additional Notes

  • Selective Namespace Restore: You can create a restore plan for specific namespaces by using the --selected-namespaces flag and specifying the desired namespaces.
  • Backup Schedule: The backup runs daily to ensure that recent changes to workloads and persistent volumes are captured regularly.
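To illustrate the selective namespace note, the sketch below creates a restore plan that covers both Elasticsearch namespaces by passing a comma-separated list to --selected-namespaces. The restore plan name is a hypothetical placeholder; the remaining flags mirror the restore plan commands earlier on this page. The run helper is hypothetical and only prints the command unless RUN=1 is set.

```shell
# Sketch only: prints the command by default; set RUN=1 to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
BASE="projects/${PROJECT}/locations/${LOCATION}"
# Hypothetical restore plan name, not used elsewhere on this page.
RESTORE_PLAN="eck-selected-restore-plan"
NAMESPACES="eck-operator,eck-elasticsearch"

create_selected_restore_plan() {
  run gcloud beta container backup-restore restore-plans create "${RESTORE_PLAN}" \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --backup-plan="${BASE}/backupPlans/gke-backup-core-dev-cluster" \
    --cluster="${BASE}/clusters/gke-infra-direct-dev-useast1" \
    --namespaced-resource-restore-mode=fail-on-conflict \
    --selected-namespaces="${NAMESPACES}" \
    --cluster-resource-conflict-policy=use-existing-version \
    --cluster-resource-scope-all-group-kinds \
    --volume-data-restore-policy=restore-volume-data-from-backup
}

create_selected_restore_plan
```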