Canary rollouts to Kubernetes without additional tools

  • 28/01/2025 (updated 31/03/2025)
  • by Martin Ehrnst
I know you like my cheesy PowerPoint designer illustration

Whenever I search (google) for how to do canary, A/B, progressive or whatever type of deployment to Kubernetes, the solutions are either Argo or Flagger. Both are great tools. In fact, I have used Flagger many times with my previous employer, and I will probably do so at my current one some day as well. After all, we already use Flux. But did you know you can do a controlled deployment to a subset of your users with NGINX Ingress alone?

Progressive delivery

The concept of progressive delivery is already known to most, and you can find multiple definitions. But the whole point is to deploy your latest application version and slowly shift users over to it, hopefully minimizing errors and keeping customers happy. The default deployment strategy for Kubernetes is RollingUpdate, which is kind of progressive in itself, but it can only handle container errors, not functional ones. You control your deployment rollout using maxUnavailable and maxSurge. The latter controls how many extra pods can be created during the rollout, and the former how many can be unavailable.

If your deployment has 3 replicas and you set maxSurge: 1 and maxUnavailable: 0, Kubernetes will create one additional pod before removing an existing one, keeping three pods available at all times.
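
In manifest form that strategy looks roughly like the sketch below. This is a minimal example for illustration only; the my-app name and the nginx image are placeholders, not part of the setup described later in this post:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical application, for illustration only
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1             # create one extra pod before an old one is removed
      maxUnavailable: 0       # never drop below the desired three available pods
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.27   # placeholder image
          ports:
            - containerPort: 80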

Canary with NGINX

Where I work, many of our applications are tightly coupled. Some because they need to be, others because it just happened. While the older application stack is slowly being decoupled by the introduction of newer services, the need for a more controlled rollout is emerging. Switching Kubernetes clusters is, in my opinion, a bit drastic, but since our customers are in the more conservative end of the spectrum, we need to be in full control and still be able to move fast. And since only a subset of our applications are ready for a canary approach, I do not see the need to introduce Flagger just yet.

Our setup uses Flux for GitOps and NGINX Ingress for the canary, and we control the traffic either through a header or a percentage of the traffic. A combination of header present and header value can also be used. In our case we know, for example, which country our customers come from, so we can use nginx.ingress.kubernetes.io/canary-by-header: customer-region and nginx.ingress.kubernetes.io/canary-by-header-value: norway to route all Norwegian customers to the latest version.
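
As a sketch, the header-based variant looks like the canary ingress from the full example further down, but with header annotations instead of a traffic weight. The customer-region header is our own; requests carrying customer-region: norway hit the canary service, everything else stays on the stable version:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "customer-region"
    nginx.ingress.kubernetes.io/canary-by-header-value: "norway"
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-canary-service
                port:
                  number: 80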

With this our developers can monitor and validate new versions with minimal impact, and quickly revert (or promote) the canary by updating their deployment manifests.
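
Promoting or reverting can be as small as a one-line change in Git. For the weight-based variant shown in the full example below, setting the canary weight to 0 stops all canary traffic, while 100 sends everything to the new version (a sketch; after promoting you would typically update the stable deployment's image and remove the canary resources):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "0"     # revert: no traffic to the canary
    # nginx.ingress.kubernetes.io/canary-weight: "100" # promote: all traffic to the canary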

Below is a simple setup with two deployments, two services, and two ingresses.

# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
      version: canary
  template:
    metadata:
      labels:
        app: echo
        version: canary
    spec:
      containers:
        - name: echo-container
          image: hashicorp/http-echo:latest
          args:
            - "-listen=:80"
            - "-text=Hello, Canary"
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-canary-service
spec:
  selector:
    app: echo
    version: canary
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: echo
      version: stable   # without a version label, the stable Service below would also match the canary pod
  template:
    metadata:
      labels:
        app: echo
        version: stable
    spec:
      containers:
        - name: echo-container
          image: hashicorp/http-echo:latest
          args:
            - "-listen=:80"
            - "-text=Hello, World"
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-service
spec:
  selector:
    app: echo
    version: stable
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
# ingresses.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-service
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # Percentage of traffic sent to the canary. You can also choose other strategies, like cookie based or header based.
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-canary-service
                port:
                  number: 80


Azure managed Prometheus for existing AKS clusters

  • 04/11/2024 (updated 07/01/2025)
  • by Martin Ehrnst

In the company I work for, we have multiple AKS clusters. They have different purposes and are split between environments, like dev and prod. Nothing special at all. We recently decided to enable the clusters for Prometheus and build our application dashboards using Grafana. A lot has changed since my last look at Azure Managed Prometheus, and this time around I encountered a few challenges; below you'll find a summary.

Our clusters are deployed using Terraform, and the monitoring "stack" with Bicep. Apart from a small difference in language, we have also decided that Prometheus and Grafana should exist in one region only, and that we would only split the data source between the dev and production environments. The Grafana instance is the same for both.

Enable monitoring

Azure portal button to enable managed prometheus.

The button above makes it pretty simple to enable Azure Managed Prometheus for that specific cluster, but since we want to do this using code, we need to modify our modules. And what exactly does this Configure button do? It creates a deployment which consists of a data collection rule, a data collection endpoint, and a few Prometheus recording rules. During the process it also allows you to specify an existing managed Prometheus (Azure Monitor metrics workspace) and managed Grafana instance.

Deployments created by the automatic onboarding to prometheus

The data collection rule and association are similar to what we already have with Log Analytics and container insights. That would mean a quick change to our existing Terraform code, adding a new collection rule. Or so I thought…

All my issues are explained in various Microsoft docs and GitHub repositories. However, piecing everything together took a bit of time.

  • With clusters in multiple regions, the data collection rule needs to exist in the same location as the Azure Monitor workspace (Prometheus). Unless you want the collection endpoint to also be in that same region, you will need to create two: one in the cluster region and one in the monitor workspace region. I used this example as an inspiration, and this doc as a deployment reference guide.
  • The automatic onboarding process deploys a 1:1 set of recording rules for each cluster. I did not want to manage the recording rules together with our clusters, and ended up creating them alongside Prometheus instead. By only specifying the prometheusWorkspaceId in the scope, these rules are applied to all clusters sending data to that specific workspace. An example Bicep module here. You will also find them here, but without the UX rules.
  • We did not want to keep sending performance metrics to Log Analytics. If you don't want that either, you'll need to modify the data collection rule by specifying only the streams you want. Specifically, remove Microsoft-Perf and Microsoft-InsightsMetrics.
Portal experience with UX rules.
