Metric alerts for Azure monitor logs

A common thing for traditional companies is to have one team responsible for monitoring. A few years ago, this team where close friends with the team provisioning infrastructure. Now, more and more companies are shifting to the “DevOps” world. Even Microsoft have killed SCOM and are only using Azure Monitor. Meaning that the one deployed the code (and the infrastructure) should be responsible for monitoring.  In essence, this is great. But this transition takes time, and one should not underestimate the knowledge of the team who have been responsible for monitoring your entire infrastructure for decades.

If you are familiar with SCOM, you know that rules and monitors is targeted against a class of objects. IE, Windows 2016 operating system. When we move our workloads to Azure, we want to use Azure Monitor to monitor our workloads and VMs.

Enter Log alerts

Log Alerts has been around for quite some time and is commonly used to alert on actual log data. IE custom application logs, Windows event log and so on. But Log alerts has a “hidden” feature, especially for your monitoring teams, not wanting to manage hundreds of duplicate rules.

By using Log alerts with metric measurements you can almost replicate the what discoveries in SCOM does- find resources of a specific type, and attach some kind of monitoring to them. For example, you can create a search query for all your IaaS VMs and alert on their CPU counter.

This will let your monitoring team recreate all their logic, and have control over the entire infrastructure, almost as they had on-permises. At the same time you can leverage more DevOps practices and at the end have every team responsible for their own work.

Kusto examlpe

Below is a simple example that will list all VMs and their processor time. You can create an alert straight from Azure Monitor logs (former Log Analytics) or start from a new alert.

Perf | where ObjectName == "Processor" and CounterName == "% Processor Time" | summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer

Summary

You have the option to monitor multiple VMs using one Alert Rule in Azure Monitor already. But one limitation is that this solution will not add new VMs to the alert rule. And for the time being, it only supports virtual machines
Log alerts are dependent on your query. So as long as your data is available, you can alert on it. Whether it is a web app, a SQL server or a custom log.

With Log Alerts, the transition to a public cloud-based infrastructure might be easier. Your operations teams can use their knowledge and re-create their on-premises monitoring logic as searches.
Application alerts could still be handled by the developers, and you can provision those using ARM templates or similar.

PS: I was going to write a longer post on how to manage and programmatically create log alerts, but with these great examples in Microsoft docs, there’s no need to re-invent the wheel.

Azure monitoring, connecting the dots

Azure Monitoring

Welcome to the continuing saga on how to monitor your customers Azure tenants being a service provider. Previously we have covered how to authenticate against Microsoft CSP, using Azure Resource Health API with Powershell and more.

This post is all about connecting the dots. We are far away from finished, but things are moving in this project and at the time of writing, we have two separate projects going.
The first one  is focused on creating a single pane of glass for all our customers’ workflows. This involves custom coding and management pack development for SCOM. The second one, which this post will cover, is how we have designed each customer tenant and how we plan to use built-in Azure monitoring functionality.

 

Customer tenant setup

Working for a service provider we need to construct Azure tenants by taking in to account that we are going to manage cloud resources, so using many cloud features makes a lot of sense. The challenge ist that we always have to think about how we can integrate with an existing deployment and work with monitoring solutions on premises.

When we first started out this project we looked in-to what have been done before, and most of the examples we found wouldn’t scale to our requirements or used OMS/Log Analytics only. We wanted to use our SCOM environment for alert handling, dashboard and platform health as SCOM is already integrated with customer portals, CMDBs and more. We will discuss more on that later in this blog post.

Things are moving very fast in Azure, we have changed our inital customer tenant setup twice before we found a structure we believe is future friendly.
When a customer sign up for an Azure Subscription, we populate their tenant with a default monitoring resource group and a OMS/Log Analytics workspace (LA). Along with the default LA workspace we add the Azure Activity Log, Web Apps and Office 365 solutions as standard.
For “bread and butter” type of Azure Resources, such as compute and web apps we setup the same type of monitoring regime we provide for on-premise resources, but we use alerts in Azure Monitor. This approach works well for Azure Resources which do not have existing, custom Log Analytics solutions and searches to provide health state. This means that VMs deployed using our custom ARM template will also include Monitor Alerts such as “CPU Usage % above 95” and “Web app response time above x”. In conjunction with Azure Monitor we use Azure Resource Health wich will provide health state data regardless of resource type, and custom alerts in monitor or Log Analytics.

Below is a (not so detailed) illustration on our default tenant.

 

SCOM and Azure Integration

We use System Center Operations Manager (SCOM) as our main monitoring platform for operating systems and applications. As SCOM is already integrated with our ticketing system, CMDB and other internal tools it seems reasonable to provide insight to application and workloads running in Azure on the same monitoring platform. That way we optimally can provide a single pane of glass in to the on premise, hybrid and cloud only workloads.

 

Azure Management Packs

To get monitoring data in to our on prem SCOM we looked in to two major options.

Option #1:
The official Azure Management pack from Microsoft. he official MP discovery process/adding new tenants cannot be automated. It relies on a GUI where you sign it to the tenant etc. neither does it provide any “umbrella” functionality for companies enrolled in the CSP program.

Option #2:
Daniele Grandini’s Azure/OMS management pack. Daniele’s management packs provide insight to Log Analytics, Azure Backup and Automation, but relies on the official Microsoft MP for initial discovery. Daniele’s management packs focuses on the solutions within the “Monitoring + management” (formerly known as OMS) space in Azure. Since much of the alerting features from OMS/Log Analytics are moving to Azure Monitor, I reached out to Daniele and asked if he had looked in to creating a management pack for that. He had looked a little in to it, but was also concerned about the rapid changes. Unfortunately this MP is bound to the initial discovery from the official Azure MP. A service provider managing several hundred tenants (and growing) cannot have that limitation. I hope to be able to help Daniele with the upcoming Azure Monitor MP.

Here’s where our problems started. I wanted to discover all our manged tenants automatically. Take advantage of being a CSP we set out to create our own management pack(s). I have create one management pack for the CSP platform that integrate with the Partner Center API (see example in this blog post) to do the initial discovery. Tenants and subscriptions are populated as objects in SCOM. Further, using a Partner Center Managed Application we can pre-consent access to all managed tenants. That means we can use this applications credentials to authenticate against each of our managed tenants, by-passing the limitation within the official management pack. All resources are the created as object with a hosting relationship to resourcegroup, subscription and tenant. Basic monitoring is done through Azure Resource Health API.

Below is a diagram showing the structure of our CSP management pack

Credentials used to authenticate against partner center and the Azure tenants is provided through SCOM RunAs accounts.

Our next step in SCOM and Azure integration is to create an Azure Monitor Management pack that reference the CSP management pack. This will provide the more enriched monitoring provided by Azure Monitor. Due to many recent changes to the monitor platform I have decided to wait and see where we end up. At the time of writing Azure Monitor have two new alert features in preview and none of their API’s are officially documented – i will come back with examples when I have something tangible.

Summary

To provide effective monitoring as a service provider for customers which span on-prem and cloud environments, we recommend the following:

  1. For “bread and butter” monitoring use a combination of SCOM and Azure Monitor
  2. If in the CSP program. Create a management pack using CSP rest API’s (hopefully I can share our MP later) combined with a custom Azure Monitor MP
  3. Not a CSP? Look in to a combination of the official MP and Daniele’s management packs.
  4. Deploy Log Analytics as default to all tenants. This will give you an advantage when customers require custom solutions and log sources.

Wrapping up

All service providers do their monitoring differently, but hopefully you have gotten some ideas on how you can do yours. Our solution is far from being finished, but I feel we have a structure that are future proof (the modern type of future). Hopefully we can share the SCOM management packs later, but feel free to contact me on specifics. Just remember I cannot share the MP itself at this point in time.

Until further notice, this will be the closing post on how you can do Azure Monitoring as a service provider.

 

Big thanks to Kevin Green and Cameron Fuller for providing feedback and to reach out to other community friends on my behalf.

Working with Azure Monitor Rest API

Before delving in to Azure monitor Rest API and powershell, let’s take a little step back. Azure monitor released in public preview a little over a year ago (September 2016). Introduced as “The built-in solution to make monitoring available for all Azure users”.
At that time I was personally all over the Operations Management Suite or OMS which is now a deprecated brand. All features from OMS is now available under “Monitoring + Management” in the Azure Marketplace.

Therefore, Azure Monitor went a bit under my radar, but when OMS shifted we started to see more and more about Azure Monitor being the one stop shop for Azure monitoring. Especially the alerting feature seems to be richer in Monitor than in Log Analytics and it is my (and others) anticipation that we will see Monitor as the default alert tool, for both metrics and activity logs.

So as part of my never-ending story, Monitoring Azure as a CSP provider. Let’s take a look at Azure Monitor and it’s REST API

We want to retrieve alert rules and incidents (alerts) programmatically, but first we create an alert rule to work with through the GUI. In th Azure portal:

At this time of writing the alert rule is bound to a specific resource. I hope we will see the ability to create rules based on resource type. ie: you want all web apps to have the same standard alert rules for response time.

Locate Azure monitor

Find your resource and metric (or you can jump straight in to alerts)


Verify your alert rule exist

Retrieve Azure Monitor alerts and incidents

From Powershell I am connecting to AAD and generating authentication header, reusing code from earlier blog post about Azure Resource health. For the purpose of this post I will focus on these two Azure Monitor API endpoints: Alert Rules – List By Resource Group and Alert Rule Incidents – List By Alert Rule.

After you have authenticated against Azure AD, and if using my previous sample you should have the following header available.

Using this header we call the alert rules endpoint to get our alert rules. Pay attention to the URL, as it requires you to specify a subscription id and the resource group name.

 

The output from the above should look something like this. I only have one alert configured for the resource group


id : /subscriptions/a2782f8e/resourceGroups/2017/providers/microsoft.insights/alertrules/name
name : name
type : Microsoft.Insights/alertRules
location : westeurope
tags : @{$type=Microsoft.WindowsAzure.Management.Common.Storage.CasePreservedDictionary,}
properties : @{name=No runs; description=; isEnabled=True; condition=; action=; lastUpdatedTime=2017-11-02T12:52:26.9091865Z; provisioningState=Succeeded; actions=System.Object[]}

 

Next we use the ID from our previus result as part of the URL to get our alert incidents

I am jumping straight in to the ‘value’ node at this point. If youre alert rule have triggered an incident it will return a result. We see the time it was activated (and resolved if it’s old), a boolean value of its status, and some information on the resource it self.


id : L3N1YnNjcmlwdGlvbnMvZTUxNGRhY2EtYTA3Ny00NGYwLTljZmEtNjBlMzRjOTk1Zjk3L3Jlc291cmNlR3JvdXBzL1NlbWluYXIyMDE3L3Byb3ZpZGVycy9taWNyb3NvZnQuaW5zaWdodHMvYWxlcnRydWxlcy9ObyUyMHJ1bnMwNjM2NDUyMjM3NjcyODMwODI2
ruleName : /subscriptions/a2782f8e/resourceGroups/2017/providers/microsoft.insights/alertrules/name
isActive : True
activatedTime : 2017-11-02T12:49:27.2830826+00:00
resolvedTime :
targetResourceId :
targetResourceLocation :
legacyResourceId :

 

 

Below I have tied everything together, using my Azure App authentication function we generate the auth header and retrieves alert rules based on user input.

 

 

Wrapping up

We now know the basic consepts on how we authenticate and retrieve alert rules and incidents from Azure Monitor through it’s rest API using Powershell. From here we can easially expand our work to create new alert rules or retrieve metrics from our resources wich lets us build custom solutions on prem or any where else.

If you want to continue exploring Azure Monitoring capabilities i suggest you follow Adin Ermies series describing all the different solutions.

Hopefully I can soon provide some insights on how we are building our CSP monitoring solution in a single blog post, using the different tools mentioned in my latest posts.