
Aerospike Backup and Restore

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: October 2024

This article explains how to create and restore Aerospike backups.

Create and restore PROD cluster

  1. Adjust and run the Terragrunt code
    Change the cluster name in the Terragrunt code:
name          = "vm-aerospike-growth-userdata-prod-useast1-02"

Change the service account (SA) name:

running_service              = "aerospike-userdata-02"

Check or add your current SSH key.

Info

To create a new cluster in the same project, copy the entire prod/aerospike-userdata folder and rename it to aerospike-userdata2.

Review and change other parameters if needed.
Run terragrunt plan and terragrunt apply after review.

➜  growth-iac git:(master) cd terraform/configs/prod/aerospike-userdata/us-east1
➜  us-east1 git:(master) ls
inventory.yml  terragrunt.hcl
➜  us-east1 git:(master) terragrunt plan --terragrunt-source-update
...
➜  us-east1 git:(master) terragrunt apply
  2. Update the Terragrunt Cloud DNS code with the new cluster IPs, or create a new set of records for each node.
    Change the cluster name, for example from 01 to 02:
          records = [
            dependency.aerospike-userdata.outputs.instances_ip.vm-aerospike-growth-userdata-prod-useast1-02-0,
          ]...

Info

If you create a new set of records, update the host field for each node as well.

If you need metrics scraping, the Prometheus configuration also needs to be updated.

  3. Modify and run the Ansible code
Change the cluster name in the Ansible vars file, or create and source a new vars file.
Update the inventory file with the new DNS records.
Use the new Ansible inventory.yml file created by Terraform and run the production playbook:

Info

On macOS, install GNU tar for compatibility: brew install gnu-tar

ansible-playbook -v -i ../terraform/configs/prod/aerospike-userdata/us-east1/inventory.yml \
playbook-vm-aerospike-growth-userdata-prod-useast1-01.yml --private-key /Users/andriyshamray/.ssh/google_compute_engine
  4. Get the list of disk snapshots

Use the gcloud command to generate a full list of snapshots sorted by source disk, and grep by timestamp (specify the latest available date):

gcloud compute snapshots list --project agp-growth-prod-d1 --filter="sourceDisk~vm-aerospike-growth-userdata-prod-useast1-01" --sort-by=SRC_DISK | grep 2024060919581 | awk '{print $1}'

or list the latest one per disk:

gcloud compute snapshots list --project agp-growth-prod-d1 --filter="sourceDisk~vm-aerospike-growth-userdata-prod-useast1-01" --sort-by=SRC_DISK,~creationTimestamp|uniq -f2|cut -d ' ' -f1
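The `uniq -f2 | cut` pipeline above keeps only the newest snapshot per source disk: with the list sorted by source disk and newest-first within each disk, `uniq -f2` skips the first two columns and collapses consecutive lines that share the same source disk. A minimal standalone sketch of that logic, using made-up snapshot and disk names:

```shell
# Hypothetical listing (columns: NAME SIZE SRC_DISK), pre-sorted the way
# --sort-by=SRC_DISK,~creationTimestamp would sort it.
sample='snap-b-20240610 100 disk-0
snap-a-20240609 100 disk-0
snap-d-20240610 100 disk-1
snap-c-20240609 100 disk-1'
# uniq -f2 compares lines ignoring the first two fields, so only the first
# (newest) line per source disk survives; cut keeps the snapshot name.
echo "$sample" | uniq -f2 | cut -d ' ' -f1
```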
  5. Add snapshots and run Terragrunt
    Shut down the VMs of the newly created cluster from the Google console or using gcloud:
gcloud compute instances stop VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE

Update the terragrunt.hcl file with two new variables, snapshot_names_enabled and snapshot_names:

snapshot_names_enabled = true
...attached_disks_parameters = [
         {
          ...
          snapshot_names = [
                 "vm-aerospike-bkp-gr-us-east1-b-20240609195810-jifl2tfu",
                 "vm-aerospike-bkp-gr-us-east1-b-20240609195810-5w4822r1",
                  ...
                 "vm-aerospike-bkp-gr-us-east1-d-20240609195810-gd8l5171"
            ]
}
]

Run terragrunt apply twice: the first run creates the disks and the second attaches each disk to the corresponding node.

  6. Update the firewall rule to allow the new cluster's internal node communication - firewall
  7. Start the cluster and validate its health.
    Start the VMs from the console or using the gcloud CLI:

gcloud compute instances start VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE

Connect via SSH to one of the VMs and validate the cluster startup. The service can take ~30 minutes to come up, depending on cluster size.

gcloud compute ssh --zone "us-east1-b" "vm-aerospike-growth-userdata-prod-useast1-01-0" --tunnel-through-iap --project "agp-growth-prod-d1"

To check the status and connect to the cluster, run:

sudo systemctl start aerospike
systemctl status aerospike
asinfo
sudo asadm

To check cluster health:

info
info namespace
info set

You should see non-zero Rx/Tx values, since the cluster was restored from a backup. Full data synchronization can take up to 2 days, but clients can already connect to the cluster.

Official docs

Live Cluster Mode Guide

Troubleshoot

Full restore of a new cluster in DEV

  1. Create a new cluster using Terraform code
    For the POC, a new Terraform folder was created at:
    /growth-iac/terraform/configs/dev/aerospike-userdata-bkp
    Before running, please review and adjust the terragrunt.hcl file. As of now, it will create a 3-node cluster. Also, new DNS records were created for *aerospike-bkp-userdata*. Gitlab MR: appgrowthplatform (Gitlab)
  2. Run ansible-playbook to set up a clean cluster
    For Dev, a new Ansible playbook file was created: playbook-vm-aerospike-bkp-growth-userdata-dev-useast1-01.yml

* Dry-run debug mode:
ansible-playbook -v -i ../terraform/configs/dev/aerospike-userdata-bkp/us-east1/inventory.yml \
  playbook-vm-aerospike-bkp-growth-userdata-dev-useast1-01.yml --check --diff --private-key /Users/andriyshamray/.ssh/google_compute_engine
* Apply mode:
ansible-playbook -v -i ../terraform/configs/dev/aerospike-userdata-bkp/us-east1/inventory.yml \
  playbook-vm-aerospike-bkp-growth-userdata-dev-useast1-01.yml --private-key /Users/andriyshamray/.ssh/google_compute_engine

  3. Run gcloud script commands to create snapshots and attach new disks.
#list disk names
for i in $(gcloud compute instances list --filter="name~vm-aerospike-growth-userdata-prod-useast1-*" \
--project agp-growth-prod-d1 --format="value(disks[].deviceName)" | tr ";" "\n")
do
  echo $i | grep shadow
done

##Output example
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-0
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-1
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-2
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-3
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-4
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-5
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-6
vm-aerospike-growth-userdata-prod-useast1-01-0-shadow-7
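The listing loop relies on `gcloud` joining each instance's device names with semicolons; `tr` turns those separators into newlines so `grep` can filter per device. A standalone sketch of just that split, with made-up device names:

```shell
# Hypothetical disks[].deviceName value as gcloud formats it: one
# semicolon-joined string per instance.
devices='vm-x-0-boot;vm-x-0-shadow-0;vm-x-0-shadow-1'
# Split on ';' and keep only the shadow disks.
echo "$devices" | tr ';' '\n' | grep shadow
```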

#create snapshot
for i in {0..7}
do 
  gcloud compute snapshots create vm-aerospike-growth-userdata-prod-useast1-01-1-shadow-"$i"snapshot \
  --source-disk https://www.googleapis.com/compute/v1/projects/agp-growth-prod-d1/zones/us-east1-c/disks/vm-aerospike-growth-userdata-prod-useast1-01-1-shadow-$i \
  --project agp-growth-dev-fm
done

#!/bin/bash
#create disks from snapshots
n='0'
z='us-east1-b'
for i in {0..7}
do
gcloud compute disks create "vm-aerospike-bkp-growth-userdata-dev-useast1-01-$n-shadow-$i" \
    --zone=$z \
    --source-snapshot=vm-aerospike-growth-userdata-prod-useast1-01-"$n"-shadow-"$i"snapshot \
    --project=agp-growth-dev-fm
done

#attach disk
n='0'
z='us-east1-b'
for i in {0..7}
do
gcloud compute instances attach-disk "vm-aerospike-bkp-growth-userdata-dev-useast1-01-$n" \
  --disk "vm-aerospike-bkp-growth-userdata-dev-useast1-01-$n-shadow-$i" \
  --device-name="vm-aerospike-bkp-growth-userdata-dev-useast1-01-$n-shadow-$i" \
  --project=agp-growth-dev-fm \
  --zone=$z
done

On-demand backup

1. Backup creation

To take an on-demand backup, we can adjust the snapshot-policy schedule or create snapshots using the gcloud command line.
Terraform location for the snapshot-policy:

appgrowthplatform (Gitlab)

2. Restore backup
For now, the only tested way to restore a backup is to create and map disks using the gcloud CLI.
In the future, we will work on doing this via Terraform code as well.

#list snapshots
gcloud compute snapshots list --project $project --filter="sourceDisk~vm-aerospike-bkp-growth-userdata-dev-useast1-01" --sort-by=SRC_DISK | grep 20240602195811

Here is an example of a bash script that can be used for the full process.

project='agp-growth-dev-fm'
clustername='vm-aerospike-bkp-growth-userdata-dev-useast1-01'
DATE='20240530195810'
for i in {0..7}
do
  for n in {0..2}
  do
    # find the snapshot name for this node/disk and date
    snap=$(gcloud compute snapshots list --project $project | grep \
      $clustername-$n-shadow-$i | grep $DATE | awk '{print $1}')
    disk=$clustername-$n-shadow-new-$i
    # note: pass --zone explicitly if your gcloud config has no default
    gcloud compute disks create $disk \
      --source-snapshot=$snap --project=$project
    gcloud compute instances attach-disk $clustername-$n --disk \
      $disk --device-name=$disk --project=$project
  done
done
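When capturing gcloud output into a variable in scripts like the one above, prefer command substitution over piping into `read`: in bash, each pipeline stage runs in a subshell, so a variable set by `cmd | read var` is lost when the pipeline finishes. A minimal demonstration with no gcloud involved:

```shell
# Piping into read sets the variable only inside a subshell (bash default),
# so it is empty afterwards.
echo "snap-123" | read lost
# Command substitution keeps the value in the current shell.
kept=$(echo "snap-123")
echo "lost='$lost' kept='$kept'"
```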