Azure Monitor

Azure managed Prometheus for existing AKS clusters

  • 04/11/2024 (updated 07/01/2025)
  • by Martin Ehrnst

In the company I work for, we have multiple AKS clusters. They have different purposes and are split between environments, like dev and prod. Nothing special at all. We recently decided to enable the clusters for Prometheus and build our application dashboards using Grafana. A lot has changed since my last look at Azure Managed Prometheus. This time around I encountered a few challenges; below you'll find a summary.

Our clusters are deployed using Terraform, and the monitoring "stack" with Bicep. Apart from the difference in language, we have also decided that Prometheus and Grafana should exist in one region only, and that we would only split the data sources between the dev and production environments. The Grafana instance is the same for both.

Enable monitoring

Azure portal button to enable managed prometheus.

The button above makes it pretty simple to enable Azure Managed Prometheus for that specific cluster – but since we want to do this using code, we need to modify our modules. And what exactly does this Configure button do? It creates a deployment which consists of a data collection rule, a data collection endpoint, and a few Prometheus recording rules. During the process it also allows you to specify an existing managed Prometheus (Azure Monitor metrics workspace) and managed Grafana.

Deployments created by the automatic onboarding to prometheus

The data collection rule and association are similar to what we already have with Log Analytics and Container insights. That would mean a quick change to our existing Terraform code, adding a new collection rule. Or so I thought…

All my issues are explained in various Microsoft docs and GitHub repositories. However, piecing everything together took a bit of time.

  • With clusters in multiple regions, the data collection rule needs to exist in the same location as the Azure Monitor workspace (Prometheus). Unless you want the collection endpoint to also be in that region, you will need to create two endpoints: one in the cluster region and one in the monitor workspace region. I used this example as an inspiration, and this doc as a deployment reference guide. A Bicep sketch follows after this list.
  • The automatic onboarding process deploys the recording rules in a 1:1 relationship with the clusters. I did not want to manage the recording rules together with our clusters, and ended up creating them alongside Prometheus. By only specifying the prometheusWorkspaceId in the scope, the rules apply to all clusters sending data to that workspace. An example Bicep module here. You will also find them here, but without the UX rules. A second sketch further below shows the idea.
  • We did not want to keep performance metrics sent to Log Analytics. If you don't want that either, you'll need to modify the data collection rule by specifying the streams you want - specifically, remove Microsoft-Perf and Microsoft-InsightsMetrics (also shown in the sketch below).
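Pieced together, the region split and the stream selection look roughly like the Bicep below. This is a minimal sketch based on my reading of the linked example and reference guide, not a drop-in module: parameter and resource names are mine, and you should verify the API versions and association names against the docs.

param clusterName string
param clusterLocation string
param prometheusWorkspaceId string
param prometheusWorkspaceLocation string

// Endpoint in the Azure Monitor workspace region, referenced by the data collection rule
resource dceWorkspaceRegion 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: 'dce-prom-${prometheusWorkspaceLocation}'
  location: prometheusWorkspaceLocation
  kind: 'Linux'
  properties: {}
}

// Second endpoint in the cluster region, associated with the cluster for configuration access
resource dceClusterRegion 'Microsoft.Insights/dataCollectionEndpoints@2022-06-01' = {
  name: 'dce-prom-${clusterLocation}'
  location: clusterLocation
  kind: 'Linux'
  properties: {}
}

// Data collection rule in the same region as the Azure Monitor workspace.
// Only the Prometheus stream is listed, so Microsoft-Perf and
// Microsoft-InsightsMetrics never reach Log Analytics.
resource dcrProm 'Microsoft.Insights/dataCollectionRules@2022-06-01' = {
  name: 'dcr-prom-${clusterName}'
  location: prometheusWorkspaceLocation
  kind: 'Linux'
  properties: {
    dataCollectionEndpointId: dceWorkspaceRegion.id
    dataSources: {
      prometheusForwarder: [
        {
          name: 'PrometheusDataSource'
          streams: [ 'Microsoft-PrometheusMetrics' ]
          labelIncludeFilter: {}
        }
      ]
    }
    destinations: {
      monitoringAccounts: [
        {
          name: 'MonitoringAccount1'
          accountResourceId: prometheusWorkspaceId
        }
      ]
    }
    dataFlows: [
      {
        streams: [ 'Microsoft-PrometheusMetrics' ]
        destinations: [ 'MonitoringAccount1' ]
      }
    ]
  }
}

resource aks 'Microsoft.ContainerService/managedClusters@2023-05-01' existing = {
  name: clusterName
}

// Associate the rule with the AKS cluster
resource dcrAssociation 'Microsoft.Insights/dataCollectionRuleAssociations@2022-06-01' = {
  name: 'MSProm-${clusterLocation}-${clusterName}'
  scope: aks
  properties: {
    dataCollectionRuleId: dcrProm.id
  }
}

// Associate the cluster-region endpoint with the AKS cluster (configuration access)
resource dceAssociation 'Microsoft.Insights/dataCollectionRuleAssociations@2022-06-01' = {
  name: 'configurationAccessEndpoint'
  scope: aks
  properties: {
    dataCollectionEndpointId: dceClusterRegion.id
  }
}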
Portal experience with UX rules.
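For the recording rules themselves, the essence is simply to scope the rule group to the workspace and leave the cluster out of it. A minimal sketch, assuming the rule group lives in the same region as the workspace and showing only one of the standard node recording rules (the full set is in the linked examples):

param prometheusWorkspaceId string
param location string = resourceGroup().location

// Recording rule group scoped to the Azure Monitor workspace only.
// With no cluster resource id in the scope, the rules apply to every
// cluster that ships metrics to this workspace.
resource nodeRecordingRules 'Microsoft.AlertsManagement/prometheusRuleGroups@2023-03-01' = {
  name: 'NodeRecordingRulesRuleGroup'
  location: location
  properties: {
    enabled: true
    interval: 'PT1M'
    scopes: [
      prometheusWorkspaceId
    ]
    rules: [
      {
        record: 'instance:node_num_cpu:sum'
        expression: 'count without (cpu, mode) (node_cpu_seconds_total{job="node",mode="idle"})'
      }
      // ...the remaining recording rules from the linked repository go here
    ]
  }
}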


Azure Monitor Managed Prometheus

  • 29/11/2022 (updated 07/01/2025)
  • by Martin Ehrnst

Azure Monitor for Prometheus was released in preview in the fall of 2022. Now let's just get our terminology right. The Prometheus offering is part of Azure Monitor Metrics: one part of Azure Monitor is Logs, and the other is Metrics. Prometheus scraping has been available for some time, but the previous implementation utilized the log part of Azure Monitor. This new offering uses the metric store, which is a far better solution, and that is partially why I am interested in checking it out.

At the time of writing, Azure Monitor Managed Prometheus was in preview. At Microsoft Build 2023 the service became generally available. See the Build Book of News.

Why choose a managed Prometheus offering

Generally speaking, if there's a managed offering, I am keen to see if it's better than what you can do yourself. Currently, there are multiple managed Prometheus offerings out there, Grafana being one. Microsoft recently jumped in with managed Grafana, and now Azure Monitor Managed Prometheus.

In our environment, we introduced Prometheus before any large players had a solid offering, and for a long time it did not have persistent storage. So if the service was restarted, metrics were lost. That's not a good solution in the long run. After a while, we jumped in with both feet and set up persistent storage using Thanos, backed by Azure storage for long-term retention. However, this comes with a cost. As with any other system you host yourself, you need to maintain it, and sometimes things go south. Like when I had to clean up our storage. Since we are 100 percent in Azure, it makes total sense to play around with the new managed Prometheus offering.

Adding managed Prometheus to an existing AKS cluster

To enable managed Prometheus, you need an Azure Monitor workspace. A Monitor workspace is like a Log Analytics workspace, but for the Azure Monitor metrics store. This is confusing to many (myself included), and I hope Microsoft cleans up the terminology once things are released. I am not going to dive into how you can start from scratch, or how you can create any of this using Azure Bicep. Stanislav Zhelyazkov has already gone through this in an excellent post with a lot of good comments. Instead, let's take a look at how Prometheus can be enabled on an existing cluster, and how you can tune your setup.
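I won't repeat that walkthrough here, but for orientation: the workspace itself is a tiny resource. A minimal Bicep sketch, with a made-up name:

param location string = resourceGroup().location

// The Azure Monitor workspace - the metrics store that Prometheus data lands in
resource prometheusWorkspace 'Microsoft.Monitor/accounts@2023-04-03' = {
  name: 'amw-prometheus-dev'
  location: location
}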

The backdrop here is that we already had a Prometheus installation in our cluster, and we also had it connected to Log Analytics using Azure Monitor for Containers. I did not want to interfere with any of the existing stuff, so I manually added the preview extension. This will deploy monitor-metrics pods in your kube-system namespace, and a default scraping configuration.

You can now port-forward to any of the ama-metrics pods and see the configuration and targets. If you are familiar with Prometheus, you would probably expect to find the basic, built-in dashboard solution. But that’s not available. This installation is running in agent mode, using the remote write (or Microsoft’s own abbreviation of it) feature to ship the time series to Azure Monitor. This means Grafana is your simplest solution to query any data.

Tuning the default Prometheus configuration

cAdvisor metrics are good to have. However, the default configuration does not scrape any of our custom application metrics. For that, you need to enable it by writing a Prometheus scraping config.

Many existing Prometheus setups utilize either a ServiceMonitor or a PodMonitor. However, these types of CRDs are not supported.

Microsoft has provided documentation on how to create a Prometheus scrape config, so if you know your way around Prometheus this is familiar stuff. However, there are some quirks. The files you create are merged with the default config, and if you do not follow the documentation point-by-point you are bound to create something that does not work.

There are two config maps to be created: one to enable or disable "system" scraping, and one for your custom scrape configuration. The documentation states you should do a kubectl create […] –from-file prometheus-config and that the name of the file is very important. The name is indeed very important if you follow the Microsoft docs. But if you find your way to the GitHub repository with the example files, a keen eye will see these are Kubernetes manifest files.

If you (like I did) try to kubectl create any of these files with that very special name, you'll end up with a broken YAML file, as it will try to create the manifest for you.

After some support from a Microsoft representative, I understood what they were trying to explain, so I expect the documentation to change pretty soon.

Anyway, instead of following the documentation, I modified the K8s manifest files and did a kubectl apply, which creates the manifest exactly as you wrote it (and as the GitHub examples look). Below you can see a custom scraping configuration. This example will scrape all pods with the annotation prometheus.io/scrape: true.

kind: ConfigMap
apiVersion: v1
data:
  prometheus-config: |-
    global:
      scrape_interval: 30s
      scrape_timeout: 10s
    scrape_configs:
    - job_name: testjob
      honor_labels: true
      scrape_interval: 30s
      scheme: http
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      # Only keep pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Rewrite the scrape address to use the port from the prometheus.io/port annotation
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Use the path from the prometheus.io/path annotation if it is set
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Add the pod name as a label on every scraped series
      - target_label: pod_name
        source_labels: [__meta_kubernetes_pod_name]

  debug-mode: |-
    enabled = true
metadata:
  name: ama-metrics-prometheus-config
  namespace: kube-system

We got metrics!

During Christmas 2022, Microsoft released a fix for the content length issue mentioned below. I have confirmed the fix in our clusters, and everything works as expected.

Grafana prometheus graph

In my Azure hosted Grafana I am now able to query and display our custom metrics, as well as the default cAdvisor data. However, after some time I noticed drops in the collected metrics. I tried to find the same drops in our other Prometheus/Thanos setup, but there everything looked normal. Checking the logs gave me the answer.

Managed Prometheus content length issue

"ContentLengthLimitExceeded\",\"Message\":\"Maximum allowed content length: 1048576 bytes (1 MB). Provided content length: 25849524

We are producing time series payloads that are too large, so the data is not shipped to Azure Monitor. Because of this we also see pretty high memory usage for the ama-metrics pod, which I can check using our other Prometheus instance.

Azure Monitor Prometheus pricing

During the preview, the service is free of charge. However, the pricing is publicly available.

Feature                        Price
Metrics ingestion (preview)    $0.16 / 10 million samples ingested
Metrics queries (preview)      $0.10 / 1 billion samples processed

Azure Monitor Prometheus pricing

USD $0.16 per 10 million metric samples is good compared to the other options out there, but you need to factor in the queries as well. I suspect that in larger environments, with dashboards running on auto-refresh and so on, queries will be a significant part of the total bill. Not to mention that you also need some kind of graphical interface to query your metrics. You could host your own Grafana, or check out Azure Managed Grafana, and remember to include it in the total cost.
Grafana Cloud is another option; they offer Prometheus and log ingestion on a per-user basis in their Pro plan.
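To put the sample-based pricing into perspective, here is a purely hypothetical back-of-the-envelope calculation (the volumes are made up; only the list prices above are real): a cluster ingesting 100 million samples a day lands around 3 billion samples a month, which is 300 units of 10 million, or roughly $48 in ingestion. If dashboards on auto-refresh then process, say, 500 billion samples a month, that adds another $50 in query charges, so queries alone can match or exceed the ingestion cost before you have paid for Grafana itself.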

I have yet to sit down and compare pricing between the options, and will make sure to update this or write a new post when we have done that work. At first glance, Azure seems favorable, but we will find out once the numbers are crunched.

Summary

The previous limitation in Azure Managed Prometheus pushed us to tune our existing scraping configuration. By removing unused time series and labels, we reduced memory consumption by 1 GB. Unfortunately, the tuning was not enough to get under the 1 MB content limit.
Since a fix was released in December 2022, I will continue testing with our pre-production cluster. I am really keen to replace the current Thanos setup, and Azure Monitor managed Prometheus looks like a promising option.


Jumpstart your Azure Monitor journey

  • 08/04/2020 (updated 07/01/2025)
  • by Martin Ehrnst

For the past decade, monitoring has been my main responsibility. I have had my hands on many of the enterprise monitoring systems out there, but System Center Operations Manager (SCOM) is where most of my working hours were spent. Now I spend my time in Azure, and since monitoring is just as relevant in the public cloud, Azure Monitor is my primary tool for my applications (and servers).

I know that starting off with an entirely new monitoring platform can be challenging, at best. Instead of figuring out all the bits and pieces by yourself, I will introduce you to the key features of Azure Monitor, such as visualization and alerting. I will also briefly touch on the more advanced capabilities, like custom log ingestion using Azure Monitor's REST API.

After reading this, you should have a basic knowledge of how to monitor your applications and servers using Azure Monitor. Details on the various topics can be found in the official Azure Monitor documentation.


Azure Monitor Logs

Logs in Azure Monitor are backed by a Log Analytics workspace. To fully utilize Azure Monitor, a Log Analytics workspace is mandatory.

With Logs, you can extend your Azure Activity Log retention, and collect and analyze server event logs (both built-in and custom logs are supported). Azure Monitor Logs, or Log Analytics, is Microsoft's equivalent to, for example, Splunk.

To perform analysis and query data, you use a language called KQL.

Guest blog for Nigel Frank

This is a piece written for Nigel Frank International's Azure blog. Click here to continue reading this post on how to jumpstart your Azure Monitor journey.
