Druid external monitoring
Imported from Confluence
Content may be outdated. Verify before following any procedures. Last updated: May 2022
Description
We decided to monitor Druid as an external service, without relying on the metrics provided by the application itself. To do so, we implemented a Kubernetes CronJob that runs a simple Python script, which queries the cluster and measures the latency of a set of queries.
Realisation
As described above, a Python script does the following:
- sends queries to Druid and measures the response time;
- pushes the measured values to the Prometheus Pushgateway as the druid_query_latency metric (configurable via values.yaml).
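The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual druid_query_latency.py script, which may be structured differently. It assumes Druid is queried through its SQL endpoint (`/druid/v2/sql`) and that the Pushgateway is reached over its plain HTTP API. The host names, the job name, the `query` label, and the probe query are all hypothetical placeholders.

```python
import json
import time
import urllib.request

# Hypothetical endpoints and names; the real values would come from values.yaml.
DRUID_SQL_URL = "http://druid-router:8888/druid/v2/sql"
PUSHGATEWAY_URL = "http://pushgateway:9091"
METRIC_NAME = "druid_query_latency"
JOB_NAME = "druid_external_monitoring"

# Hypothetical probe queries, keyed by the label value attached to each sample.
QUERIES = {"select_one": "SELECT 1"}


def measure_latency(run_query):
    """Return the wall-clock time in seconds taken by run_query()."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def run_druid_sql(sql, url=DRUID_SQL_URL, timeout=30):
    """POST a SQL query to Druid and read the full response body."""
    body = json.dumps({"query": sql}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def format_metric(name, labels, value):
    """Render one sample in the Prometheus text exposition format.

    Label values are not escaped here, so they must not contain
    quotes, backslashes, or newlines.
    """
    label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}\n"


def push_metrics(payload, job, base_url=PUSHGATEWAY_URL, timeout=10):
    """PUT the payload to the Pushgateway under the given job name."""
    request = urllib.request.Request(
        f"{base_url}/metrics/job/{job}",
        data=payload.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def main():
    lines = [f"# TYPE {METRIC_NAME} gauge\n"]
    for name, sql in QUERIES.items():
        seconds = measure_latency(lambda: run_druid_sql(sql))
        lines.append(format_metric(METRIC_NAME, {"query": name}, seconds))
    # A single PUT replaces every metric in the job's group, so all
    # samples are sent together rather than one request per query.
    push_metrics("".join(lines), JOB_NAME)


if __name__ == "__main__":
    main()
```

Timing the request from outside the cluster's own metrics pipeline is what makes the check external: a failed or slow query shows up even when Druid's own metric emission is broken.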
To run this script automatically, we wrapped it in a Kubernetes CronJob resource: we built an image containing the script and put its deployment under Helm's management.
Links and other useful stuff
| description | link |
|---|---|
| PromQL query returning druid_query_latency for FairBid Druid | Prometheus - Graph |
| Dedicated Grafana Dashboard for Druid query latency monitoring | Grafana - Druid Query Latency |
| Epic ticket | DEVOPSBLN-2296 |
| Helm chart | latest (Bitbucket) |
| Script | druid_query_latency.py (GitHub) |