
Migrate or Recover Druid Cluster data (TBD)

Archived (pre-2022)

Preserved for reference only -- likely outdated. Last updated: October 2021

How Druid deletes data

Druid's data deletion is best described as two separate actions:

Logical Removal - the segment is marked as unused ('not published' in Druid terms), meaning it is still present in Deep Storage but no longer served by the cluster. This can be done via the Druid GUI, either manually or by an automatic retention Rule.

Physical Removal - a complete, irreversible deletion: the segment is gone forever. It applies only to Logically Removed segments and is performed by so-called Kill Tasks, which are automated in the current setup, so every 'unused' segment is removed automatically.

Info

From a technical/backend perspective, logical removal simply writes "0" (a tinyint that effectively acts as boolean "false") into the "used" column of the "druid_segments" table in the metadata storage database (covered below) for the specified segment id.
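To make the mechanics concrete, here is a minimal sketch of that flag flip, using an in-memory SQLite table as a stand-in for the MySQL metadata store (the real `druid_segments` schema has more columns, and the segment id below is hypothetical):

```python
import sqlite3

# SQLite stand-in for the MySQL metadata store; schema simplified for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE druid_segments (
        id TEXT PRIMARY KEY,
        dataSource TEXT,
        used INTEGER  -- tinyint in MySQL: 1 = published, 0 = unused
    )
""")
# Hypothetical segment id, just for the demo.
conn.execute("INSERT INTO druid_segments VALUES (?, ?, 1)",
             ("wikipedia_2021-10-01_2021-10-02_v1_0", "wikipedia"))

def mark_unused(conn, segment_id):
    """Logical removal: flip the 'used' flag; the segment file stays in deep storage."""
    conn.execute("UPDATE druid_segments SET used = 0 WHERE id = ?", (segment_id,))

mark_unused(conn, "wikipedia_2021-10-01_2021-10-02_v1_0")
used = conn.execute("SELECT used FROM druid_segments").fetchone()[0]
print(used)  # 0
```

Note that nothing in Deep Storage changes at this point; only the metadata row does, which is why the segment can still be re-enabled until a Kill Task runs.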

So what do we fight against

  • The most trivial scenario is data corruption by the app itself, which obviously calls for some kind of backup;

  • Another thing we would like to cover is unintended deletion of data. Imagine that someone (it might actually be you) accidentally marks a segment as 'disabled' in the Druid GUI or, even worse, removes a Data Source entry, which disables every one of its segments. Once a segment is marked unused it gets deleted by the aforementioned Kill Task, and in the blink of an eye the data is gone.

So how do we solve it? For any conventional database the obvious solution would be to back up the whole thing and simply restore it in case of data loss; with Druid it's slightly trickier.

Solution theory

Recovery-wise we are only interested in the parts that are stateful, and Druid has only two of those:

1. Deep Storage - where segments are stored. We use an S3 bucket for this purpose and treat it as highly reliable, which means all we need is the ability to recover segments that were removed from the bucket.

AWS provides a tool for exactly that - Versioning. Since versioning keeps multiple variants of an object in the same bucket, we can easily recover from unintended user actions (as well as application failures).
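With versioning enabled, deleting an object only places a "delete marker" on top of it; removing that marker brings the previous version back. A small sketch of the selection logic, assuming boto3-style `ListObjectVersions` response dictionaries (the listing below is made up for the demo):

```python
def delete_markers_to_remove(versions_response):
    """Given a boto3-style ListObjectVersions response, return (Key, VersionId)
    for every *latest* delete marker. Deleting a delete marker un-hides the
    previous object version, which is how the recovery works."""
    return [
        (m["Key"], m["VersionId"])
        for m in versions_response.get("DeleteMarkers", [])
        if m.get("IsLatest")
    ]

# Hypothetical listing for a segments/<DataSource> prefix.
sample = {
    "DeleteMarkers": [
        {"Key": "segments/wikipedia/seg1.zip", "VersionId": "v2", "IsLatest": True},
        {"Key": "segments/wikipedia/seg1.zip", "VersionId": "v0", "IsLatest": False},
    ],
    "Versions": [
        {"Key": "segments/wikipedia/seg1.zip", "VersionId": "v1", "IsLatest": False},
    ],
}
markers = delete_markers_to_remove(sample)
print(markers)  # [('segments/wikipedia/seg1.zip', 'v2')]
```

In a real recovery you would feed the pairs returned here to `s3api delete-object --version-id`, which removes the marker rather than the data.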

2. Metadata Storage - which stores various metadata about the system, in particular references to the 'deep stored' segments. In our case it is MySQL managed by RDS, so we can simply create recovery points via AWS RDS Automated Backups.

Solution actual steps

  1. Restore the MySQL metadata database from the RDS automated backup (point-in-time recovery).
  2. Restore the latest object versions of the affected segments in the S3 bucket (remove their delete markers).

DataSource segments → ${S3bucket}/segments/${DataSource}


Druid Storage Migration Plan

TBD

Links

115004960053 Migrate Existing Druid Cluster To A New Imply Cluster