GKE Backup POC
Imported from Confluence
Content may be outdated. Verify before following any procedures. Last updated: September 2024
Disaster Recovery for GKE Workloads
Overview
Disaster Recovery (DR) is essential for keeping stateful workloads in Google Kubernetes Engine (GKE) available and resilient. We use GKE Backup (Backup for GKE) as our DR tool to schedule and manage backups of stateful workloads, including persistent volume data. If a critical stateful application suffers an outage, its last successful backup can be restored through a restore plan. The target can be the original cluster or another cluster, including one in a different Google Cloud Platform (GCP) project.
Backup Plan
For the Proof of Concept (POC), we selected the direct-core-dev cluster (gke-core-direct-dev-useast1 in project agp-direct-dev-45, region us-east1). A backup plan named gke-backup-core-dev-cluster backs up all Elasticsearch-related namespaces (eck-operator and eck-elasticsearch), including their persistent volume data. The backup runs daily at 10:50 AM.
- Backup Plan: View GKE Backup Plan
- Last Successful Backup: View Last Successful Backup
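A plan equivalent to gke-backup-core-dev-cluster could be created from the CLI roughly as follows. This is a sketch, not the command used for the POC. It assumes the flags of gcloud beta container backup-restore backup-plans create in a recent Cloud SDK. Several values are assumptions:

- The cron expression 50 10 * * * stands for "daily at 10:50 AM". Confirm which time zone the schedule is evaluated in before reusing it.
- --include-secrets is included because ECK keeps credentials and certificates in Secrets.
- The run helper and the DRY_RUN variable are hypothetical conveniences that print the command instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch: create a backup plan equivalent to gke-backup-core-dev-cluster.
# Not the command used for the POC; flags and schedule are assumptions.
set -euo pipefail

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
CLUSTER="projects/${PROJECT}/locations/${LOCATION}/clusters/gke-core-direct-dev-useast1"

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

create_backup_plan() {
  # --include-volume-data captures PV contents; --include-secrets keeps ECK
  # credentials and certificates. "50 10 * * *" means daily at 10:50 in the
  # time zone the scheduler uses (assumption: verify before reuse).
  run gcloud beta container backup-restore backup-plans create gke-backup-core-dev-cluster \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --cluster="${CLUSTER}" \
    --selected-namespaces=eck-operator,eck-elasticsearch \
    --include-volume-data \
    --include-secrets \
    --cron-schedule="50 10 * * *"
}
```

To preview the command, run DRY_RUN=1 create_backup_plan. The plan gke-backup-core-dev-cluster already exists, so a real run under the same name would fail. Change the plan name and cluster when reproducing the setup in another environment.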
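Restore Plan
To restore a GKE backup to any cluster in any project, first create a restore plan in the target cluster's project. The restore plan links the backup plan to the target cluster and defines how conflicts and volume data are handled. The examples below create one restore plan for the direct-infra-dev cluster (a different cluster) and one for the direct-core-dev cluster (the source cluster).
Create a Restore Plan for All Namespaces or Selected Namespaces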
```shell
# Restore plan to restore the backup into a different cluster (infra)
gcloud beta container backup-restore restore-plans create test-restore-plan \
  --location=us-east1 \
  --backup-plan=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster \
  --cluster=projects/agp-direct-dev-45/locations/us-east1/clusters/gke-infra-direct-dev-useast1 \
  --namespaced-resource-restore-mode=fail-on-conflict \
  --all-namespaces \
  --cluster-resource-conflict-policy=use-existing-version \
  --cluster-resource-scope-all-group-kinds \
  --volume-data-restore-policy=restore-volume-data-from-backup \
  --project=agp-direct-dev-45

# Restore plan to restore the backup into the same cluster (core)
gcloud beta container backup-restore restore-plans create test-plan-core \
  --location=us-east1 \
  --backup-plan=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster \
  --cluster=projects/agp-direct-dev-45/locations/us-east1/clusters/gke-core-direct-dev-useast1 \
  --namespaced-resource-restore-mode=fail-on-conflict \
  --selected-namespaces=eck-operator \
  --cluster-resource-conflict-policy=use-existing-version \
  --cluster-resource-scope-all-group-kinds \
  --volume-data-restore-policy=restore-volume-data-from-backup \
  --project=agp-direct-dev-45
```
Restore plans in the Cloud console: https://console.cloud.google.com/kubernetes/backups/restorePlans?project=agp-direct-dev-45
Restore from a Backup Using a Restore Plan
If the restored pods must run on specific node pools in the target cluster (for example, because of node selectors, node affinity, or tolerations), create those node pools before you run the restore command.
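For the node-pool step, a minimal sketch with gcloud container node-pools create could look like the block below. The pool name eck-pool, the machine type e2-standard-4, the node count and the label workload=elasticsearch are hypothetical placeholders. Copy the real values from the source cluster's node pools (list_source_node_pools prints them) and from the node selectors or tolerations in the ECK manifests. For a regional cluster such as gke-infra-direct-dev-useast1, --num-nodes applies per zone. The run/DRY_RUN helper is a hypothetical convenience.

```shell
#!/usr/bin/env bash
# Sketch: prepare node pools in the target (infra) cluster before a restore.
set -euo pipefail

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
SOURCE_CLUSTER="gke-core-direct-dev-useast1"
TARGET_CLUSTER="gke-infra-direct-dev-useast1"

# Hypothetical placeholders: replace with the values the ECK pods actually require.
POOL_NAME="${POOL_NAME:-eck-pool}"
MACHINE_TYPE="${MACHINE_TYPE:-e2-standard-4}"
NUM_NODES="${NUM_NODES:-1}"   # per zone for a regional cluster
NODE_LABELS="${NODE_LABELS:-workload=elasticsearch}"

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the source cluster's node pools so the target can mirror them.
list_source_node_pools() {
  run gcloud container node-pools list \
    --project="${PROJECT}" \
    --region="${LOCATION}" \
    --cluster="${SOURCE_CLUSTER}"
}

# Create the node pool in the target cluster.
create_node_pool() {
  run gcloud container node-pools create "${POOL_NAME}" \
    --project="${PROJECT}" \
    --region="${LOCATION}" \
    --cluster="${TARGET_CLUSTER}" \
    --machine-type="${MACHINE_TYPE}" \
    --num-nodes="${NUM_NODES}" \
    --node-labels="${NODE_LABELS}"
}
```

Preview with DRY_RUN=1 create_node_pool. Run create_node_pool without DRY_RUN once the placeholders match the source pools.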
```shell
# Restore in the infra cluster
gcloud beta container backup-restore restores create test-restore-stack \
  --project=agp-direct-dev-45 \
  --location=us-east1 \
  --restore-plan=test-restore-plan \
  --backup=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster/backups/sched-2024-0829-0850

# Restore in the core cluster
gcloud beta container backup-restore restores create eck-operator-restore \
  --project=agp-direct-dev-45 \
  --location=us-east1 \
  --restore-plan=test-plan-core \
  --backup=projects/agp-direct-dev-45/locations/us-east1/backupPlans/gke-backup-core-dev-cluster/backups/sched-2024-0829-0850
```
Restores in the Cloud console: https://console.cloud.google.com/kubernetes/backups/restores?project=agp-direct-dev-45
When these restores complete, the backup is restored to two clusters. The direct-infra-dev cluster receives eck-operator and eck-elasticsearch, and the direct-core-dev cluster receives eck-operator only.
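The commands above pin a specific backup (sched-2024-0829-0850). During a real incident you would normally restore the most recent successful backup instead. The sketch below finds that backup with gcloud beta container backup-restore backups list, filtered on state SUCCEEDED and sorted newest first by createTime, and passes it to restores create. It assumes that --format="value(name)" prints the full backup resource name. If your gcloud version prints only the short ID, build the full path as in the commands above. The BACKUP override and the run/DRY_RUN helper are hypothetical additions. The restore plans above use --namespaced-resource-restore-mode=fail-on-conflict, so a restore into the source cluster is expected to fail while conflicting resources still exist there.

```shell
#!/usr/bin/env bash
# Sketch: restore the most recent successful backup of gke-backup-core-dev-cluster.
set -euo pipefail

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"
BACKUP_PLAN="gke-backup-core-dev-cluster"

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the name of the newest backup in the plan whose state is SUCCEEDED.
latest_successful_backup() {
  run gcloud beta container backup-restore backups list \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --backup-plan="${BACKUP_PLAN}" \
    --filter="state=SUCCEEDED" \
    --sort-by="~createTime" \
    --limit=1 \
    --format="value(name)"
}

# Usage: restore_latest RESTORE_NAME RESTORE_PLAN
# Set BACKUP to a full backup resource name to pin a specific backup instead.
restore_latest() {
  local restore_name="$1" restore_plan="$2" backup
  backup="${BACKUP:-$(latest_successful_backup)}"
  if [[ -z "${backup}" ]]; then
    echo "no successful backup found in ${BACKUP_PLAN}" >&2
    return 1
  fi
  run gcloud beta container backup-restore restores create "${restore_name}" \
    --project="${PROJECT}" \
    --location="${LOCATION}" \
    --restore-plan="${restore_plan}" \
    --backup="${backup}"
}
```

For example, restore_latest test-restore-stack test-restore-plan restores the newest successful backup into the infra cluster. In dry-run mode the lookup is only printed, not executed, so set BACKUP explicitly when previewing a restore.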
Verification
After each restore completes, verify in every target cluster that all Elastic Cloud on Kubernetes (ECK) stacks and their persistent volume data have been restored correctly.
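A minimal kubectl sketch of these checks follows. It assumes the Elasticsearch resources live in eck-elasticsearch, as in this POC, and that kubectl is at least 1.23, which kubectl wait --for=jsonpath requires. ECK reports cluster health in .status.health. A green status means all shards were allocated, which is a reasonable signal that the restored volumes were mounted and read. The 15-minute timeouts and the run/DRY_RUN helper are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: check a restored ECK stack in a target cluster.
set -euo pipefail

PROJECT="agp-direct-dev-45"
LOCATION="us-east1"

# Print the command instead of running it when DRY_RUN=1 (hypothetical helper).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Point kubectl at the cluster that received the restore.
use_cluster() {
  run gcloud container clusters get-credentials "$1" \
    --project="${PROJECT}" \
    --region="${LOCATION}"
}

# Usage: verify_eck NAMESPACE
verify_eck() {
  local ns="$1"
  # The restored operator pods should be Running.
  run kubectl get pods -n eck-operator
  # Restored PersistentVolumeClaims should reach the Bound phase.
  run kubectl wait pvc --all -n "${ns}" \
    --for=jsonpath='{.status.phase}'=Bound --timeout=15m
  # Every Elasticsearch resource in the namespace should report green health.
  run kubectl wait elasticsearch --all -n "${ns}" \
    --for=jsonpath='{.status.health}'=green --timeout=15m
  # Summary view: ECK health/phase columns plus the claims.
  run kubectl get elasticsearch,pvc -n "${ns}"
}
```

For example, use_cluster gke-infra-direct-dev-useast1 && verify_eck eck-elasticsearch checks the infra restore. In the core cluster, only eck-operator was restored, so the kubectl get pods -n eck-operator check is the relevant one there.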

Additional Notes
- Selective Namespace Restore: To restore only specific namespaces, create the restore plan with the --selected-namespaces flag instead of --all-namespaces and list the desired namespaces, as test-plan-core does for eck-operator.
- Backup Schedule: The backup runs once a day. A restore therefore returns workloads and persistent volumes to their state at the last successful daily backup, and changes made after that backup are not included.