Canary¶
Archived (pre-2022)
Preserved for reference only -- likely outdated. Last updated: December 2019
For canary deployments, we use the Kubernetes operator Flagger.
Flagger is a Kubernetes operator that automates the promotion of canary deployments using NGINX (optionally Istio, Linkerd, App Mesh or Gloo) routing for traffic shifting and Prometheus metrics for canary analysis. The canary analysis can be extended with webhooks for running system integration/acceptance tests, load tests, or any other custom validation.
Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA) and creates a series of objects (Kubernetes deployments, ClusterIP services, virtual service, traffic split or ingress) to drive the canary analysis and promotion.
Flagger implements a control loop that gradually shifts traffic to the canary while measuring key performance indicators such as HTTP request success rate, average request duration and pod health. Based on the analysis of these KPIs, a canary is promoted or aborted, and the analysis result is published to Slack.
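The control loop described above can be sketched as a small simulation. This is a minimal illustration, not Flagger's implementation: the function name, the parameters (step_weight, max_weight, max_failed_checks) and their default values are assumptions chosen for the example, and success_rate_at is a hypothetical stand-in for a Prometheus query.

```python
from typing import Callable, List, Tuple


def run_canary_analysis(
    success_rate_at: Callable[[int], float],
    min_success_rate: float = 99.0,   # assumed KPI threshold, in percent
    step_weight: int = 5,             # assumed traffic increment per step, in percent
    max_weight: int = 50,             # assumed weight at which the canary is promoted
    max_failed_checks: int = 3,       # assumed number of failed checks before abort
) -> Tuple[str, List[int]]:
    """Simulate shifting traffic to a canary in steps.

    success_rate_at(weight) returns the canary's HTTP success rate (percent)
    while it receives `weight` percent of the traffic.
    Returns the outcome ("promoted" or "aborted") and the weight in effect
    at every check.
    """
    weight = step_weight
    failed_checks = 0
    history: List[int] = []
    while True:
        history.append(weight)
        if success_rate_at(weight) < min_success_rate:
            # KPI is unhealthy: hold the current weight and count the failure.
            failed_checks += 1
            if failed_checks >= max_failed_checks:
                return "aborted", history
            continue
        if weight >= max_weight:
            # KPI stayed healthy up to the maximum weight: promote.
            return "promoted", history
        weight = min(weight + step_weight, max_weight)
```

With a canary that always reports a 100% success rate, this sketch walks the weight up in steps of 5 and promotes at 50. With a canary that stays below the threshold, it holds the weight and aborts after three failed checks.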

Flagger overview diagram

Install¶
Flagger requires a Kubernetes cluster v1.11 or newer and NGINX ingress 0.24 or newer. Both our production and staging clusters satisfy these prerequisites; for details check this link.
Flagger is deployed as a Kubernetes operator via helmfile; the related chart is v0.20.0 (Bitbucket):
helmfile --interactive --environment production_eu_west_1 --file helmfile.yaml apply
# from the bln-k8s-common-helm/helm/config/flagger
To generate traffic during canary analysis, we deploy the flagger-loadtester service with its Helm chart, v0.9.0 (Bitbucket):
helmfile --interactive --environment production_eu_west_1 --file helmfile.yaml apply
# from the bln-k8s-common-helm/helm/config/flagger_loadtester
Configure¶
To configure the canary workflow for your project, deploy the following Kubernetes resources:
- A HorizontalPodAutoscaler that defines how far the deployment may be scaled up. For example, the acp-edge-ui HPA:
# Source: acp-edge-ui/templates/hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: acp-edge-ui-main
  namespace: acp-edge-ui
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: acp-edge-ui-main-acp-edge-ui
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 99
- A Canary custom resource that points Flagger at the deployment, its ingress and its HPA. For example, the acp-edge-ui canary:
# Source: acp-edge-ui/templates/canary.yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: acp-edge-ui-main
  namespace: acp-edge-ui
spec:
  provider: nginx
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: acp-edge-ui-main-acp-edge-ui
  ingressRef:
    apiVersion: extensions/v1beta1
    kind: Ingress
    name: acp-edge-ui-main-acp-edge-ui
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: acp-edge-ui-main
  progressDeadlineSeconds: 300
Canary analysis and traffic routing are driven by Prometheus metrics. The logic is described in this block of the Canary resource:
metrics:
  - name: request-success-rate
    threshold: 99
    interval: 2m
webhooks:
  - name: acceptance-test
    type: pre-rollout
    url: http://flagger-loadtester.kube-system/
    timeout: 30s
    metadata:
      type: bash
      cmd: "curl -I -L http://acp-edge-ui-main-acp-edge-ui.acp-edge-ui:3000"
  - name: load-test
    url: http://flagger-loadtester.kube-system/
    timeout: 10s
    metadata:
      type: cmd
      cmd: "hey -z 1m -q 10 -c 2 http://acp-edge.fyber.com/"
  - name: promotion approve
    type: confirm-promotion
    url: http://flagger-loadtester.kube-system/gate/check
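There are two stages of testing: an acceptance test, which runs as a pre-rollout webhook before traffic is shifted to the canary, and a load test, which generates traffic during the analysis.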
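To make the request-success-rate check in the block above more concrete, here is a minimal sketch of how such a percentage could be computed and compared with the threshold of 99. It assumes that every response outside the 5xx range counts as a success and that an interval with no requests cannot be evaluated; both are assumptions of this example, and the function names are hypothetical rather than part of Flagger.

```python
from typing import Mapping


def request_success_rate(requests_by_status: Mapping[int, int]) -> float:
    """Return the percentage of requests that did not end in a 5xx status.

    requests_by_status maps an HTTP status code to the number of requests
    observed with that status during one analysis interval.
    """
    total = sum(requests_by_status.values())
    if total == 0:
        # Assumption: without traffic there is nothing to evaluate.
        raise ValueError("no requests observed in the interval")
    non_5xx = sum(
        count
        for status, count in requests_by_status.items()
        if not 500 <= status <= 599
    )
    return 100.0 * non_5xx / total


def passes_check(requests_by_status: Mapping[int, int], threshold: float = 99.0) -> bool:
    """Return True when the success rate meets the minimum threshold."""
    return request_success_rate(requests_by_status) >= threshold
```

This also shows why the load test matters: without generated traffic during the analysis, there would be no requests to measure.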
Canary → Jenkins¶
A Flagger canary deployment can be promoted manually or automatically. By default, automatic promotion is enabled through the approve flag in the Jenkins job pipeline, and the canary deployment is promoted to production automatically if all automated tests pass.

For manual approval of a canary deployment, you can use the confirm-rollout and confirm-promotion webhooks. The confirmation rollout hooks are executed before the pre-rollout hooks. Flagger will halt the canary traffic shifting and analysis until the confirm webhook returns HTTP status 200.
- name: promotion approve
  type: confirm-promotion
  url: http://flagger-loadtester.kube-system/gate/check
When this flag is disabled, Flagger processes the deployment as usual up to the promotion step. It then holds the canary deployment at 5% of traffic until manual approval is sent (this can be done via Jenkins).
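The gating behaviour described above, where promotion waits until the confirm webhook returns HTTP status 200, can be sketched as a polling loop. This is an illustration only: check_gate is a hypothetical callable standing in for the HTTP call to the gate URL, max_polls is an assumed limit added so the example terminates, and the 403 status used for a closed gate in the usage lines is likewise an assumption.

```python
from typing import Callable


def wait_for_promotion_approval(check_gate: Callable[[], int], max_polls: int) -> bool:
    """Poll the gate until it returns HTTP 200 or the poll budget runs out.

    check_gate performs one gate check and returns the HTTP status code.
    Returns True once the gate is open (promotion may proceed), or False
    if it stayed closed for all max_polls checks (canary remains on hold).
    """
    for _ in range(max_polls):
        if check_gate() == 200:
            return True
    return False


# Usage: the gate stays closed for two checks, then approval arrives.
responses = iter([403, 403, 200])
approved = wait_for_promotion_approval(lambda: next(responses), max_polls=5)
```

While the loop keeps receiving a non-200 status, nothing else changes: the canary keeps its current share of traffic and no promotion takes place.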

The logic for automatic and manual approval is configured through Jenkins shared libraries (method canaryProcessAction): canaryProcessAction.groovy (Bitbucket)
Slack¶
After a successful canary promotion, a notification is sent to Slack.
