Druid external monitoring

Imported from Confluence

Content may be outdated. Verify before following any procedures. Last updated: May 2022

Description

We decided to build a solution that monitors Druid as an external service, without relying on the metrics exposed by Druid itself. To that end, we implemented a Kubernetes CronJob that runs a simple Python script, which queries the cluster and measures the latency of a set of queries.

Realisation

As described above, a Python script does the following:

sends queries to Druid and measures the response time;
pushes the measured values to the Prometheus Pushgateway as the druid_query_latency metric (the name can easily be changed via values.yaml).
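
The two steps can be sketched as follows. This is a minimal illustration and not the actual druid_query_latency.py: the endpoint URLs, the job name, the `query` label, and the query set are all assumptions made for the example. It uses only the standard library, posting SQL to Druid's `/druid/v2/sql` endpoint and sending the result to the Pushgateway's `/metrics/job/<job>` endpoint in the Prometheus text exposition format.

```python
import json
import time
import urllib.request

# Hypothetical endpoints and names; the real values live in the chart's values.yaml.
DRUID_SQL_URL = "http://druid-router:8888/druid/v2/sql"
PUSHGATEWAY_URL = "http://pushgateway:9091"
METRIC_NAME = "druid_query_latency"

# Hypothetical query set, keyed by the label value reported with each sample.
QUERIES = {
    "select_one": "SELECT 1",
}


def run_druid_query(sql, url=DRUID_SQL_URL, timeout=30):
    """POST a SQL query to Druid's SQL endpoint and read the full response."""
    body = json.dumps({"query": sql}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def measure_latency(run_query, sql):
    """Return the wall-clock seconds that run_query(sql) takes."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def render_metrics(samples, metric_name=METRIC_NAME):
    """Render {query_name: seconds} in the Prometheus text exposition format."""
    lines = [f"# TYPE {metric_name} gauge"]
    for query_name, seconds in sorted(samples.items()):
        lines.append(f'{metric_name}{{query="{query_name}"}} {seconds:.6f}')
    return "\n".join(lines) + "\n"


def push_metrics(payload, job="druid_external_monitoring", url=PUSHGATEWAY_URL):
    """PUT the payload to the Pushgateway, replacing this job's metric group."""
    request = urllib.request.Request(
        f"{url}/metrics/job/{job}", data=payload.encode("utf-8"), method="PUT"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


def main():
    """Time every configured query once and push all samples in one request."""
    samples = {
        name: measure_latency(run_druid_query, sql) for name, sql in QUERIES.items()
    }
    push_metrics(render_metrics(samples))
```

In this sketch the container entrypoint would call `main()` once per CronJob run. Passing the query function into `measure_latency` keeps the timing logic testable without a live Druid cluster.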

To run this script automatically, we wrapped it in a Kubernetes CronJob resource: we built an image containing the script and manage its deployment with Helm.
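
For orientation, the shape of such a CronJob resource is sketched below. The actual Helm chart templates are YAML and are not reproduced here; to stay in the same language as the script, the sketch builds an equivalent manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The resource name, image, schedule, and environment variable are hypothetical, and `batch/v1` assumes Kubernetes 1.21 or newer (older clusters use `batch/v1beta1`).

```python
import json


def build_cronjob(name, image, schedule, env=None):
    """Build a batch/v1 CronJob manifest as a plain dict."""
    container = {
        "name": name,
        "image": image,
        # Environment variables are sorted by name for a stable output.
        "env": [{"name": k, "value": v} for k, v in sorted((env or {}).items())],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Do not start a new run while the previous one is still going.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [container],
                        }
                    }
                }
            },
        },
    }


# Hypothetical values; in the real setup these would come from values.yaml.
manifest = build_cronjob(
    name="druid-query-latency",
    image="registry.example.com/druid-query-latency:latest",
    schedule="*/5 * * * *",
    env={"PUSHGATEWAY_URL": "http://pushgateway:9091"},
)
print(json.dumps(manifest, indent=2))
```

Setting `concurrencyPolicy` to `Forbid` is a design choice for this sketch: overlapping runs would themselves add load to the cluster and could skew the latency measurements.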

Links and other useful stuff

Description	Link
PromQL getting druid_query_latency for FairBid Druid	Prometheus - Graph
Dedicated Grafana Dashboard for Druid query latency monitoring	Grafana - Druid Query Latency
Epic ticket	DEVOPSBLN-2296
Helm chart	latest (Bitbucket)
Script	druid_query_latency.py (GitHub)