Aerospike in AWS¶
Archived (pre-2022)
Preserved for reference only -- likely outdated. Last updated: September 2018
General Overview¶
The current AES (Aerospike) cluster running in AWS is an extension of the AES cluster running in the DC (TASK).
The AES cluster in AWS runs in a single region: aws-production-aerospike-core → eu-west-1

Setup and Configuration¶
As mentioned in the previous section, the AES core cluster in AWS extends the cluster running in the on-premise data centre. The cluster uses the service bundled with Aerospike EE called "Cross Datacenter Replication", XDR for short. This feature replicates each namespace to geographically diverse clusters with zero downtime.
The XDR setup tolerates remote cluster node failures by suspending shipment to the failed remote cluster and resuming once it becomes available again. This feature is commonly used during service migrations and between data centers with different cluster setup modes (a mix of mesh and multicast layouts).
To read more about XDR, refer to the official Aerospike documentation.
The AES configuration for the AWS production environment must have the same namespace names and configuration as the on-premise cluster, as follows:
- counters
- advanced_counters
- netscores
- cookies
- fyber_test
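Namespace parity between the two sites can be spot-checked over the info protocol with asinfo (part of the Aerospike tools package); the hosts below are the example nodes used elsewhere on this page:

```shell
# List the namespaces defined on a DC node and on an AWS node;
# the two semicolon-separated lists must match for XDR to cover every namespace.
asinfo -h aes001.prd.fyber.com -v namespaces
asinfo -h 10.37.120.25 -v namespaces
```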
The following section demonstrates how to enable XDR and start replicating a given namespace from the on-premise data center to AWS.
XDR - DC && AWS¶
- Make a backup of one of the namespaces in the DC using the asbackup command line with at least the following flags:
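A representative invocation (the host, namespace, and backup directory below are examples; adjust them to the node being backed up):

```shell
# Back up the "cookies" namespace from a DC node into /tmp/backup/.
# nohup + & keeps the long-running backup alive if the session drops.
nohup asbackup -h 10.99.36.97 -n cookies -d /tmp/backup/ &
```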
Where:
- asbackup: An AES built-in command line tool to back up namespaces in a given data center
- -h : The target AES instance containing the target namespace (default: localhost)
- -n : The name of the namespace
- -d : The directory where the backup files of the target namespace will be stored
- Note: Use the "nohup" or "screen" command to prevent hangups or asbackup interruption; asbackup can take a long time depending on the namespace size.
- Restore the namespace on one of the instances in AWS (make sure it has a replication factor of at least 2) using the following command line:
$ nohup asrestore -h 10.37.120.25 -d /tmp/backup/
2018-09-14 14:31:08 GMT [INF] [ 5498] Expired 9801 : skipped 0 : inserted 978151 : failed 0 (existed 0, fresher 0)
2018-09-14 14:31:08 GMT [INF] [ 5498] 2% complete, ~20h5m32s remaining
2018-09-14 14:31:18 GMT [INF] [ 5498] 2 UDF file(s), 0 secondary index(es), 994495 record(s) (105 KiB/s, 652 rec/s, 166 B/rec, backed off: 0)
2018-09-14 14:31:18 GMT [INF] [ 5498] Expired 9907 : skipped 0 : inserted 984588 : failed 0 (existed 0, fresher 0)
2018-09-14 14:31:18 GMT [INF] [ 5498] 2% complete, ~20h36m34s remaining
2018-09-14 14:31:28 GMT [INF] [ 5498] 2 UDF file(s), 0 secondary index(es), 1000867 record(s) (103 KiB/s, 641 rec/s, 164 B/rec, backed off: 0)
Where:
- asrestore: An AES built-in command line tool to restore namespaces in the target data center (AWS in our example).
- -h : The target AES instance into which the namespace will be restored. Note that it is sufficient to provide one host of the cluster (given a replication factor greater than 1): the specified host acts as an entry point to the target cluster, and the rest of the AES nodes are discovered automatically so that the namespace is replicated across all nodes.
- -d : The directory containing the backup files (generated by asbackup in the previous step)
- Enable XDR in the namespace configuration stanza on the original DC instance node:
$ vim /etc/aerospike/aerospike.conf
...
namespace cookies {
    replication-factor 2
    memory-size 15G
    default-ttl 2d
    enable-xdr true
    xdr-remote-datacenter AWS

    storage-engine device {
        device /dev/sdg1
        scheduler-mode noop
        write-block-size 1M
    }
}
...
xdr {
    enable-xdr true
    xdr-digestlog-path /opt/aerospike/xdr/digestlog 5G

    datacenter AWS {
        dc-node-address-port 10.37.133.194 3000
        dc-node-address-port 10.37.120.25 3000
    }
}
...
In the namespace stanza, the following configuration lines are needed:
- enable-xdr : Enables XDR for the specific namespace
- xdr-remote-datacenter : The name of the target data center (different namespaces can target different data centers)
Add a new xdr stanza with the following basic configuration options:
- enable-xdr : Enables XDR at the service level (in the xdr stanza, as opposed to the per-namespace option above)
- xdr-digestlog-path : The path of the digest log file that XDR writes to, followed by its size. Make sure the file and its directory path have the right permissions, and size the digest log generously enough to keep up with the writes logged for larger namespaces.
- datacenter : The name of the target data center referenced from the replicated namespace.
- dc-node-address-port : A sub-stanza of the datacenter option providing at least one entry describing the IP address and service port of a remote host in the target cluster (AWS in our case).
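After a restart, the effective XDR settings can be read back over the info protocol; this sketch assumes the 3.x-era `get-config` info command with the xdr context:

```shell
# Dump the running XDR configuration from a DC node and pick out the
# options set above (output is a semicolon-separated key=value list).
asinfo -h aes001.prd.fyber.com -v 'get-config:context=xdr' \
  | tr ';' '\n' | grep -E 'enable-xdr|xdr-digestlog-path'
```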
Important Note: For each configuration parameter, check whether it is static (requiring a node restart) or dynamic before updating it on each cluster node; the official parameter reference gives more details on each parameter. Make sure to verify the AES version first. The previous configuration implies a rolling restart of the AES cluster to reflect the XDR configuration, because the xdr stanza is being added for the first time.
To read more about advanced XDR configuration for AES, please refer to the official AES documentation.
- Verify that XDR is working by checking the replication activity in the AES log file on the AES DC cluster as follows:
$ tail -f /var/log/aerospike/aerospike.log | grep xdr
...
Sep 17 2018 10:43:34 GMT: INFO (xdr): (xdr_dlog.c:92) dlog: free-pct 100 reclaimed 44600 glst 1537181013834 (2018-09-17 10:43:33 GMT)
Sep 17 2018 10:43:34 GMT: INFO (xdr): (xdr.c:610) [AWS]: dc-state CLUSTER_UP timelag-sec 0 lst 1537181013834 mlst 1537181013834 (2018-09-17 10:43:33 GMT) fnlst 0 (-) wslst 0 (-) shlat-ms 34 rsas-ms 0.000 rsas-pct 0.0 con 128 errcl 1185 errsrv 978 sz 2
...
Important Note: Rolling restart is only needed when adding XDR config for the first time.
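Beyond grepping the logs, a quick end-to-end check is to write a record on the source and read it on the destination. The sketch below uses aql (from the Aerospike tools package) with example hosts from this page and a hypothetical test set/key:

```shell
# On a DC node: write a marker record into the replicated "cookies" namespace
# (set name "xdr_smoke" and key "probe1" are made up for this test).
aql -h 10.99.36.97 -c "INSERT INTO cookies.xdr_smoke (PK, val) VALUES ('probe1', 'hello')"

# On an AWS node: after a few seconds of XDR lag, the record should appear.
aql -h 10.37.120.25 -c "SELECT * FROM cookies.xdr_smoke WHERE PK = 'probe1'"
```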
XDR - Monitoring && Troubleshooting¶
- How to check XDR status in the target AWS cluster:
Admin> show stat xdr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~XDR Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE : ip-10-37-120-25.eu-west-1.compute.internal:3000 ip-10-37-133-194.eu-west-1.compute.internal:3000
dlog_free_pct : 100 100
dlog_logged : 0 0
dlog_overwritten_error : 0 0
dlog_processed_link_down : 0 0
dlog_processed_main : 0 0
dlog_processed_replica : 0 0
dlog_relogged : 0 0
dlog_used_objects : 0 0
xdr_active_failed_node_sessions : 0 0
xdr_active_link_down_sessions : 0 0
xdr_global_lastshiptime : 18446744073709551615 18446744073709551615
xdr_hotkey_fetch : 0 0
.....
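On the source side, the same statistics can be polled non-interactively over the info protocol; this assumes the 3.x-era `statistics/xdr` info command. A steadily falling dlog_free_pct indicates XDR cannot ship as fast as writes are being logged:

```shell
# Poll XDR stats from a source node every 10s and watch the digest-log headroom.
watch -n 10 "asinfo -h aes001.prd.fyber.com -v 'statistics/xdr' | tr ';' '\n' | grep dlog_free_pct"
```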
- How to check all metrics at namespace granularity in the original cluster:
Admin> show stat AWS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~netscores Namespace Statistics~~~~~~~~~~~~~~~~~
NODE : 10.99.36.97:3000 10.99.36.98:3000 aes001.prd.fyber.com:3000
allow-nonxdr-writes : true true true
allow-xdr-writes : true true true
available_bin_names : 32760 32760 32760
batch_sub_proxy_complete : 0 0 0
batch_sub_proxy_error : 0 0 0
batch_sub_proxy_timeout : 0 0 0
batch_sub_read_error : 0 0 0
batch_sub_read_not_found : 0 0 0
batch_sub_read_success : 0 0 0
batch_sub_read_timeout : 0 0 0
batch_sub_tsvc_error : 0 0 0
batch_sub_tsvc_timeout : 0 0 0
client_delete_error : 0 0 0
client_delete_not_found : 0 0 0
client_delete_success : 0 0 0
client_delete_timeout : 0 0 0
client_lang_delete_success : 0 0 0
client_lang_error : 0 0 0
client_lang_read_success : 0 0 0
client_lang_write_success : 0 0 0
client_proxy_complete : 36 0 8
client_proxy_error : 0 0 0
client_proxy_timeout : 1 0 0
client_read_error : 0 0 0
client_read_not_found : 90529538 74713542 61851130
client_read_success : 821838850 747114969 726169201
client_read_timeout : 0 0 0
client_tsvc_error : 0 0 0
client_tsvc_timeout : 0 0 0
client_udf_complete : 0 0 0
client_udf_error : 0 0 0
client_udf_timeout : 0 0 0
client_write_error : 0 0 0
client_write_success : 18529414 18454711 17637322
client_write_timeout : 0 0 0
cold-start-evict-ttl : 4294967295 4294967295 4294967295
conflict-resolution-policy : generation generation generation
current_time : 274886012 274886012 274886012
....
- How to briefly determine the status of the remote DC (AWS) and details of new write shipments:
Admin> show stat dc
~~~~~~~~~~~~~~~~~~AWS DC Statistics~~~~~~~~~~~~~~~~~~~
NODE : aes001.prd.fyber.com:3000
dc_open_conn : 128
dc_ship_attempt : 313325210
dc_ship_bytes : 372698884639
dc_ship_delete_success : 0
dc_ship_destination_error: 978
dc_ship_idle_avg : 0.000
dc_ship_idle_avg_pct : 0.0
dc_ship_inflight_objects : 0
dc_ship_latency_avg : 43
dc_ship_source_error : 1185
dc_ship_success : 313323047
dc_size : 2
dc_state : CLUSTER_UP
dc_timelag : 0
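The same DC view is scriptable via asinfo, which is handy for alerting on replication lag; this assumes the 3.x-era `dc/<name>` info command:

```shell
# Fetch the AWS DC stats from a source node and extract state, lag, and errors.
asinfo -h aes001.prd.fyber.com -v 'dc/AWS' \
  | tr ';' '\n' | grep -E 'dc_state|dc_timelag|dc_ship_destination_error'
```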
- How to check which nodes are involved in shipping new writes from the original cluster, and their success/error rates:
Admin> info xdr
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~XDR Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Build Data Free Lag Req Req Req Cur Avg
. . Shipped Dlog% (sec) Outstanding Shipped Shipped Throughput Latency
. . . . . . Success Errors . (ms)
aes001.prd.fyber.com:3000 3.13.0.8 347.224 GB 100 00:00:00 0.000 313.435 M 2.163 K 364 42
Number of rows: 1