
FairBid data transfer between S3 and GCS

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: June 2023

Description

The FairBid project requires data replication between AWS and GCP object storage buckets (in both directions) on a specific schedule. Since neither cloud vendor provides a tool to move data out of its own management, we are forced to use two services simultaneously for this purpose: when we copy data FROM GCP TO AWS we use AWS DataSync, and FROM AWS TO GCP we use GCP Storage Transfer.

Info

It is worth mentioning that in other projects we have already used another solution: a third-party tool called flexify.io, which is cloud agnostic but does not provide any embedded scheduling out of the box. They did tell us that scheduling could be set up on their side, but then we would have no transparent way to manage that automation, so it should not be considered a production-grade solution.

Data transfer from AWS (S3) to GCP (GCS)

As mentioned earlier, GCP's Storage Transfer service is used for this direction.
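A job like this can be sketched with the gcloud CLI. This is a sketch only: the schedule, the credentials file name, and its contents are assumptions, and the actual job was configured through the console.

```shell
# Assumptions: the daily schedule and the creds file are illustrative.
# aws-creds.json holds the migration user's keys in the form:
#   {"accessKeyId": "...", "secretAccessKey": "..."}
gcloud transfer jobs create \
  s3://fairbid-analytic/druid_lookups/ \
  gs://gcs-core-services-agp-fairbid-silver-regional-useast1-prod/druid_lookups/ \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=1d \
  --project=agp-fairbid-prod-7i
```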

Info

FROM: 003250186609 → fairbid-analytic://druid_lookups/*

TO: ss-shared-ent-data-prod → gcs-core-services-agp-fairbid-silver-regional-useast1-prod://druid_lookups/

The job runs under the agp-fairbid-prod-7i project:

image-2023-6-20_14-55-19.png

Configuration-wise, the only prerequisite on the AWS side is a user with read/list access to the source bucket, whose access keys are used when setting up the migration task (arn:aws:iam::003250186609:user/sa-fairbid-migration-tmp). On the GCP side you only need the destination bucket to exist.
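The read/list access on the source bucket can be expressed as an IAM policy along these lines (a sketch: the statement IDs are illustrative, and the bucket name is taken from the source path above):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListSourceBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::fairbid-analytic"
    },
    {
      "Sid": "AllowReadSourceObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::fairbid-analytic/*"
    }
  ]
}
```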

Data transfer from GCP (GCS) to AWS (S3)

For this direction, AWS DataSync is used.

Info

FROM: ss-shared-ent-data-prod → gcs-core-services-agp-fairbid-bronze-regional-useast1-prod://sdk-events/valid_events_creation_ts/*

TO: 003250186609 → fairbid-sdk-events://valid_events_creation_ts/

Configuration-wise, on the GCP side you need a service account with an HMAC key (ss-shared-ent-data-prod-fn → Cloud Storage → Settings → INTEROPERABILITY → Access keys for service accounts → CREATE A KEY).

Keep in mind that an HMAC key can only be created for a service account that belongs to the same GCP project.
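The same setup can be sketched with the gcloud CLI; the service-account name below is an illustrative assumption, not the one actually used:

```shell
# Assumption: "fairbid-datasync" is a placeholder service-account name.
PROJECT="ss-shared-ent-data-prod-fn"
SA="fairbid-datasync@${PROJECT}.iam.gserviceaccount.com"

# Create the service account (skip if it already exists)
gcloud iam service-accounts create fairbid-datasync --project "$PROJECT"

# Grant it read access to the source bucket
gcloud storage buckets add-iam-policy-binding \
  gs://gcs-core-services-agp-fairbid-bronze-regional-useast1-prod \
  --member "serviceAccount:${SA}" \
  --role roles/storage.objectViewer

# Create an HMAC key for the service account
# (prints the access key ID and the secret; store them securely)
gcloud storage hmac create "$SA" --project "$PROJECT"
```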

On the AWS side, a DataSync agent was launched as an EC2 instance and onboarded using the service's built-in tooling. A transfer task was then created (the read/write IAM role is added automatically by the service):
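The equivalent setup can be sketched with the AWS CLI; all shell variables and the task name below are placeholders for values from the actual environment (the real resources were created through the console):

```shell
# Assumption: $HMAC_ACCESS_ID/$HMAC_SECRET come from the GCP HMAC key,
# $AGENT_ARN is the onboarded DataSync agent, and the role/location ARNs
# are placeholders.

# Register the GCS side as an "object storage" location,
# authenticated with the HMAC key created in the GCP project.
aws datasync create-location-object-storage \
  --server-hostname storage.googleapis.com \
  --bucket-name gcs-core-services-agp-fairbid-bronze-regional-useast1-prod \
  --access-key "$HMAC_ACCESS_ID" \
  --secret-key "$HMAC_SECRET" \
  --agent-arns "$AGENT_ARN"

# Register the destination S3 bucket.
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::fairbid-sdk-events \
  --s3-config BucketAccessRoleArn="$S3_ROLE_ARN"

# Create the transfer task between the two locations.
aws datasync create-task \
  --source-location-arn "$GCS_LOCATION_ARN" \
  --destination-location-arn "$S3_LOCATION_ARN" \
  --name fairbid-gcs-to-s3
```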

image-2023-6-20_15-24-17.png

DEVOPSBLN-3584

Migrating Google Cloud Storage to Amazon S3 Using AWS DataSync