adatum
Operations Manager

Cookdown for SCOM monitor, extend and integrate

  • 08/04/2019 (updated 07/01/2025)
  • by Martin Ehrnst

It’s been a while since I worked with SCOM daily, but I still get my hands dirty with my old friend from time to time. For many years I spent most of my time extending SCOM’s functionality and integrating it with other enterprise systems. I created a REST API before SCOM had one available, and I have also created a lot of custom management packs with PowerShell script monitors.
SCOM is one of the most used enterprise monitoring systems around, and companies will rely on it for many years to come. Integrations with SCOM will remain key for many organizations. Luckily, you’ve got a friend.

Cookdown launch

Cookdown is a new initiative aiming to breathe new life into your existing investment in SCOM. It delivers things like ServiceNow integration and Easy Tune to help you out with those pesky overrides.

The team behind Cookdown will host a launch webinar on April 10. If you’re interested in integrations and extensions for SCOM, you should definitely attend.

Community

Speaking at SCOM Dagen 2018

  • 01/10/2018 (updated 07/01/2025)
  • by Martin Ehrnst

I am speaking at SCOM dagen 2018, which this year focuses on multi-cloud, on-premises and hybrid monitoring using Azure and SCOM.

The event focuses on how organisations can leverage new technology and use their existing systems to become a better IT organisation. SCOM day is hosted by approved.se.

My talk will scratch the surface of custom management pack development. Custom MP development is used by service providers and larger organizations to gain better visibility in multi-cloud and hybrid cloud scenarios.

The event will also feature much more seasoned presenters like Thomas Maurer and Marcel Zehner.

Hopefully I will see you at SCOM dagen 2018 in Gothenburg.

Operations Manager

SCOM Virtualization host CPU spikes

  • 14/03/2018 (updated 07/01/2025)
  • by Martin Ehrnst

A lot of the core functionality SCOM 2016 has today was released with SCOM 2007. SCOM 2007 was released (as the name states) in 2007, at the very, very early stages of virtualization. 2007 was also the start of my professional IT career, and I remember that only the most assertive companies with the most capital were thinking about or using SAN and virtualization. I am talking about oil companies, large architectural firms and the like. Even so, they kept their environments in-house, which made the virtualization environments small.

In 2018 most companies have much larger environments in-house, or have moved everything to a service provider or a public cloud, and now old SCOM 2007 implementations are beginning to play a part.

Virtualization hosts

I work for a service provider in Norway, and we have around 4000 VMs running on VMware ESX. The environment is monitored in different ways, but visualization uses Grafana and InfluxDB, which provide very good insight for analyzing the environment. See how you can create your own solution by following Rudi Martinsen’s blog series on VMware performance data.

This chart shows CPU Ready for around 3000 VMs spiking every 15 minutes. Previously we had these spikes at both 5 and 15 minute intervals. More on that later.

 

Collect Distributed Workflow Test Event

Collect Distributed Workflow Test Event is the rule that logs event id 6022 on all agent managed computers. It is used to “test event collection”.

Here’s a quote from the rule’s KB:

This rule runs for each System Center Management Health Service and logs an event. This event is collected and used to verify that the end-to-end workflow to collect events properly is functioning as expected. If you alter the interval for this rule, it can cause the corresponding monitors to change state or generate an alert. The corresponding monitors are “No End to End Event for 45 Minutes (Critical Level)” and “No End to End Event for 30 Minutes (Warning Level)”.
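
The KB's warning about altering the interval can be illustrated with a small Python sketch. This is not the monitors' actual implementation, which the KB does not show: the function name `end_to_end_state` is hypothetical, and the only inputs taken from the KB are the 30 and 45 minute thresholds in the monitor names. Treating the thresholds as inclusive is an assumption.

```python
WARNING_AFTER_MINUTES = 30   # from "No End to End Event for 30 Minutes (Warning Level)"
CRITICAL_AFTER_MINUTES = 45  # from "No End to End Event for 45 Minutes (Critical Level)"


def end_to_end_state(minutes_since_last_event: float) -> str:
    """Hypothetical state for a given gap since the last test event."""
    if minutes_since_last_event >= CRITICAL_AFTER_MINUTES:
        return "critical"
    if minutes_since_last_event >= WARNING_AFTER_MINUTES:
        return "warning"
    return "healthy"


# The longest gap between events roughly equals the rule interval.
print(end_to_end_state(15))  # healthy: a 15 minute interval stays under both thresholds
print(end_to_end_state(60))  # critical: a 60 minute interval would trip both monitors
```

Under these assumptions, raising the rule interval past 30 minutes would be enough to change monitor state, provided the monitors are enabled.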

 

The rule refers to two monitors that use this event to check that the “end-to-end” workflow is working. By default these two monitors are disabled, so what is the purpose of this rule? I already know from investigation that this rule does cause the CPU spikes every 15 minutes, because it does not implement “spread initialization”, which would be the preferred method. Instead it has a sync time, forcing the same start time for all agents. Even though it doesn’t create a noticeable overhead by itself, multiply it by X VMs on a host and you will see the impact.

I was not sure if the event logged by the rule was used for something else, so I reached out to Microsoft Premier Support. After a few phone calls and emails referring to my UserVoice idea explaining the issue, we got the following reply.
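
To make the difference between a shared sync time and a spread start concrete, here is a minimal Python sketch. It is a toy model and not SCOM's scheduler: the function names, the evenly spaced offsets and the figure of 50 VMs per host are all assumptions for illustration. It only counts how many agents on one host would start the workflow in the same second.

```python
from collections import Counter

INTERVAL_SECONDS = 900  # 15 minutes, the spike period observed in this post


def synced_start_offsets(agent_count: int) -> list[int]:
    """With a shared sync time, every agent starts at the same offset (0)."""
    return [0] * agent_count


def spread_start_offsets(agent_count: int, interval: int = INTERVAL_SECONDS) -> list[int]:
    """Toy spread (assumption): space agent start offsets evenly over the interval."""
    return [(i * interval) // agent_count for i in range(agent_count)]


def peak_concurrent_starts(offsets: list[int]) -> int:
    """Largest number of agents that start within the same second."""
    return max(Counter(offsets).values(), default=0)


vms_on_host = 50  # hypothetical number of agent-managed VMs on one host
print(peak_concurrent_starts(synced_start_offsets(vms_on_host)))  # 50
print(peak_concurrent_starts(spread_start_offsets(vms_on_host)))  # 1
```

In this model the per-agent cost is unchanged. Only the timing differs, which is why a workflow that is cheap on one machine can still show up as a spike on the host.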

[…]

To summarize, if you did not enable the two monitors and if you have disabled the collection rule, logging the event is quite useless. There is no point in logging an event that no one checks afterwards. From this perspective, you could disable the rule logging the event and the collection rule as well, if this is not already disabled.

That confirmed my suspicions. This rule has no value (to our environment) and I can disable the whole thing.

Collect agent processor utilization

I wrote about this rule exactly a year ago, and I was not the first. It is the worse of the two and runs a script every five minutes to collect agent performance data. If you don’t use this data, disable the rule.

Fun fact: Kevin Holman was the one who suggested running this rule every 321 seconds, as he was tired of every workflow running every 300 seconds by default.
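
A quick calculation shows why an odd interval like 321 seconds helps. The Python sketch below assumes, for illustration only, that a 300 second workflow and a 321 second workflow start at the same instant, and computes how long it takes before they fire together again. The function name is hypothetical.

```python
from math import gcd

DEFAULT_INTERVAL = 300  # seconds, the common default mentioned above
OFFSET_INTERVAL = 321   # seconds, the interval suggested for this rule


def seconds_between_coincidences(a: int, b: int) -> int:
    """Least common multiple: how often two fixed intervals line up."""
    return a * b // gcd(a, b)


overlap = seconds_between_coincidences(DEFAULT_INTERVAL, OFFSET_INTERVAL)
print(overlap)              # 32100 seconds
print(divmod(overlap, 60))  # (535, 0), i.e. 535 minutes or 8 hours 55 minutes
```

Two workflows that both run every 300 seconds would coincide on every run. Note that this only separates the rule from other workflows on the same agent. It does not spread the rule across agents, which still share the same sync time.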

 

Summary

Every SCOM environment is different, but I strongly believe you are impacted by these two rules. “Collect Distributed Workflow Test Event” and “Collect agent processor utilization” both run on a fixed interval with a sync time instead of using spread initialization.

The impact depends on the size of your environment, but if you don’t use the data generated by these rules I recommend you disable them. Here is a graph showing our two largest clusters, hosting around 1000 VMs.
Just before 11 I disabled “Collect Distributed Workflow Test Event” and you can clearly see the difference.

 

Let me know if you have experienced similar issues or have comments on this post.

 


