Canary rollouts to Kubernetes without additional tools

  • 28/01/2025 (updated 31/03/2025)
  • by Martin Ehrnst
I know you like my cheesy PowerPoint designer illustration

Whenever I search (google) for how to do canary, A/B, progressive or whatever type of deployment to Kubernetes, the solutions are either Argo or Flagger. Both are great tools. In fact, I have used Flagger many times with my previous employer, and I will probably do so at my current one some day as well. After all, we already use Flux. But did you know you can do a controlled deployment to a subset of your users with NGINX Ingress alone?

Progressive delivery

The concept of progressive delivery is already known to most, and you can find multiple definitions. But the whole point is to deploy your latest application version and slowly shift users over to it, hopefully minimizing errors and keeping customers happy. The default deployment strategy for Kubernetes is RollingUpdate, which is kind of progressive in itself, but it can only handle container errors, not functional ones. You control your deployment rollout using maxUnavailable and maxSurge. The latter controls how many extra pods can be created during the rollout, and the former how many can be unavailable.

If your deployment has 3 replicas and you set maxSurge: 1 and maxUnavailable: 0, Kubernetes will create one additional pod before removing an existing one, keeping three pods available at all times.
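
In manifest form that strategy looks roughly like the sketch below. This is a minimal example for illustration only; the my-app name and the nginx image are placeholders, not part of the setup described later in this post:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical application, for illustration only
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1             # create one extra pod before an old one is removed
      maxUnavailable: 0       # never drop below the desired three available pods
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.27   # placeholder image
          ports:
            - containerPort: 80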

Canary with NGINX

Where I work, many of our applications are tightly coupled. Some because they need to be, others because it just happened. While the older application stack is slowly being decoupled by the introduction of newer services, the need for a more controlled rollout is emerging. Switching Kubernetes clusters is, in my opinion, a bit drastic, but since our customers are in the more conservative end of the spectrum, we need to be in full control and still be able to move fast. And since only a subset of our applications are ready for a canary approach, I do not see the need to introduce Flagger just yet.

Our setup uses Flux for GitOps and NGINX Ingress for the canary, and we control the traffic either through a header or a percentage of the traffic. A combination of header present and header value can also be used. In our case we know, for example, which country our customers come from, so we can use nginx.ingress.kubernetes.io/canary-by-header: customer-region and nginx.ingress.kubernetes.io/canary-by-header-value: norway to route all Norwegian customers to the latest version.
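
As a sketch, the header-based variant looks like the canary ingress from the full example further down, but with header annotations instead of a traffic weight. The customer-region header is our own; requests carrying customer-region: norway hit the canary service, everything else stays on the stable version:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "customer-region"
    nginx.ingress.kubernetes.io/canary-by-header-value: "norway"
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-canary-service
                port:
                  number: 80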

With this our developers can monitor and validate new versions with minimal impact, and quickly revert (or promote) the canary by updating their deployment manifests.
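
Promoting or reverting can be as small as a one-line change in Git. For the weight-based variant shown in the full example below, setting the canary weight to 0 stops all canary traffic, while 100 sends everything to the new version (a sketch; after promoting you would typically update the stable deployment's image and remove the canary resources):

metadata:
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "0"     # revert: no traffic to the canary
    # nginx.ingress.kubernetes.io/canary-weight: "100" # promote: all traffic to the canary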

Below is a simple setup with two deployments, two services, and two ingresses.

# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
      version: canary
  template:
    metadata:
      labels:
        app: echo
        version: canary
    spec:
      containers:
        - name: echo-container
          image: hashicorp/http-echo:latest
          args:
            - "-listen=:80"
            - "-text=Hello, Canary"
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-canary-service
spec:
  selector:
    app: echo
    version: canary
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: echo
      version: stable   # without a version label, the stable Service below would also match the canary pod
  template:
    metadata:
      labels:
        app: echo
        version: stable
    spec:
      containers:
        - name: echo-container
          image: hashicorp/http-echo:latest
          args:
            - "-listen=:80"
            - "-text=Hello, World"
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: echo-service
spec:
  selector:
    app: echo
    version: stable
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: ClusterIP
# ingresses.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-ingress
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-service
                port:
                  number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo-canary-ingress
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # Percentage of traffic sent to the canary. You can also choose other strategies, like cookie based or header based.
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
    - host: echo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo-canary-service
                port:
                  number: 80


Azure managed Prometheus for existing AKS clusters

  • 04/11/2024 (updated 07/01/2025)
  • by Martin Ehrnst

In the company I work for, we have multiple AKS clusters. They have different purposes and are split between environments, like dev and prod. Nothing special at all. We recently decided to enable the clusters for Prometheus and build our application dashboards using Grafana. A lot has changed since my last look at Azure Managed Prometheus, and this time around I encountered a few challenges; below you'll find a summary.

Our clusters are deployed using Terraform, and the monitoring "stack" with Bicep. Apart from a small difference in language, we have also decided that Prometheus and Grafana should exist in one region only, and that we would only split the data source between the dev and production environments. The Grafana instance is the same for both.

Enable monitoring

Azure portal button to enable managed prometheus.

The button above makes it pretty simple to enable Azure Managed Prometheus for that specific cluster, but since we want to do this using code, we need to modify our modules. And what exactly does this Configure button do? It creates a deployment which consists of a data collection rule, a data collection endpoint, and a few Prometheus recording rules. During the process it also allows you to specify an existing managed Prometheus (Azure Monitor metrics workspace) and managed Grafana instance.

Deployments created by the automatic onboarding to prometheus

The data collection rule and association are similar to what we already have with Log Analytics and container insights. That would mean a quick change to our existing Terraform code, adding a new collection rule. Or so I thought…

All my issues are explained in various Microsoft docs and GitHub repositories. However, piecing everything together took a bit of time.

  • With clusters in multiple regions, the data collection rule needs to exist in the same location as the Azure Monitor workspace (Prometheus). Unless you want the collection endpoint to also be in that same region, you will need to create two: one in the cluster region and one in the monitor workspace region. I used this example as an inspiration, and this doc as a deployment reference guide.
  • The automatic onboarding process deploys a 1:1 set of recording rules for each cluster. I did not want to manage the recording rules together with our clusters, and ended up creating them alongside Prometheus instead. By only specifying the prometheusWorkspaceId in the scope, these rules are applied to all clusters sending data to that specific workspace. An example Bicep module here. You will also find them here, but without the UX rules.
  • We did not want to keep sending performance metrics to Log Analytics. If you don't want that either, you'll need to modify the data collection rule by specifying only the streams you want. Specifically, remove Microsoft-Perf and Microsoft-InsightsMetrics.
Portal experience with UX rules.
