
Druid external monitoring

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: May 2022

Description

We decided to monitor Druid as an external service, without relying on the metrics exposed by the application itself. To do so, we implemented a Kubernetes CronJob that runs a simple Python script, which queries the cluster and measures the latency of a set of queries.

Implementation

As described above, a Python script does the following:

  1. sends a set of queries to Druid and measures the response time of each;
  2. pushes the measured values to the Prometheus Pushgateway as the druid_query_latency metric (configurable via values.yaml).
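The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual druid_query_latency.py: the URLs, the query set, the job name, and the `query` label are assumptions made for the example, and the real script may use a client library instead of the standard library. It relies only on Druid's SQL HTTP endpoint (`/druid/v2/sql`) and the Pushgateway's `/metrics/job/<job>` endpoint.

```python
import json
import time
import urllib.request

# Hypothetical values for illustration; the real ones would come from values.yaml.
DRUID_SQL_URL = "http://druid-router:8888/druid/v2/sql"
PUSHGATEWAY_URL = "http://pushgateway:9091"
METRIC_NAME = "druid_query_latency"
QUERIES = {"count_rows": "SELECT COUNT(*) FROM example_datasource"}


def run_query(sql, url=DRUID_SQL_URL, timeout=30):
    """POST one SQL query to Druid and return the elapsed wall-clock seconds."""
    body = json.dumps({"query": sql}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the body so the whole response is timed
    return time.monotonic() - start


def render_metrics(latencies, metric=METRIC_NAME):
    """Render {query_name: seconds} in the Prometheus text exposition format."""
    lines = [f"# TYPE {metric} gauge"]
    for name, seconds in sorted(latencies.items()):
        lines.append(f'{metric}{{query="{name}"}} {seconds:.6f}')
    return "\n".join(lines) + "\n"


def push_metrics(payload, gateway=PUSHGATEWAY_URL, job="druid_external_monitoring"):
    """PUT the payload to the Pushgateway, replacing earlier metrics for this job."""
    request = urllib.request.Request(
        f"{gateway}/metrics/job/{job}",
        data=payload.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main(queries=QUERIES, measure=run_query, push=push_metrics):
    """Time every query, push the results, and return the measured latencies."""
    latencies = {name: measure(sql) for name, sql in queries.items()}
    push(render_metrics(latencies))
    return latencies
```

Calling `main()` once per CronJob run would perform one measurement cycle. The `measure` and `push` parameters exist only so the flow can be exercised without a live Druid cluster or Pushgateway.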

To run the script automatically, we wrapped it in a Kubernetes CronJob resource: we built an image containing the script and placed the CronJob under Helm's management.

Description | Link
PromQL getting druid_query_latency for FairBid Druid | Prometheus - Graph
Dedicated Grafana Dashboard for Druid query latency monitoring | Grafana - Druid Query Latency
Epic ticket | DEVOPSBLN-2296
Helm chart | latest (Bitbucket)
Script | druid_query_latency.py (GitHub)