SCOMpercentageCPUTimeCounter causes CPU spikes


To be honest, this problem has existed for years and was written about back in 2014. Now, in 2017, SCOM 2016 UR2 has been released and the problem remains, perhaps with greater consequences due to virtualization.

If you’re unfamiliar with the problem: SCOMpercentageCPUTimeCounter.vbs (.ps1 in SCOM 2016) is a script included in the “System Center Core Monitoring” management pack. It is used as the data source for a rule and a monitor that determine agent health by gathering ‘HealthService’ CPU usage. The rule and monitor are set to run at a fixed interval of 321 seconds (I assume the person who wrote the MP just tapped 3-2-1 on their numpad 🙂 ), with the sync time set to 00:00.
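
As a rough illustration of what a fixed interval combined with a shared sync time implies, the sketch below lists the first few start times of a day. It assumes a simplified model in which runs fire at the sync time plus whole multiples of the interval; this is not SCOM's actual scheduler code, and the function name `scheduled_runs` and the example date are hypothetical.

```python
from datetime import datetime, timedelta

INTERVAL_SECONDS = 321  # interval used by the rule and monitor
SYNC_TIME = "00:00"     # sync time used by the rule and monitor


def scheduled_runs(day, count, interval=INTERVAL_SECONDS, sync_time=SYNC_TIME):
    """Return the first `count` start times on `day`.

    Assumption: runs fire at sync time + k * interval. This is a
    simplified model for illustration, not SCOM's real scheduler.
    """
    hour, minute = (int(part) for part in sync_time.split(":"))
    start = datetime(day.year, day.month, day.day, hour, minute)
    return [start + timedelta(seconds=k * interval) for k in range(count)]


# Example date chosen arbitrarily for the illustration.
runs = scheduled_runs(datetime(2017, 3, 1), 4)
for run in runs:
    print(run.strftime("%H:%M:%S"))
# prints 00:00:00, 00:05:21, 00:10:42, 00:16:03 (one per line)
```

Under this model every agent with the same settings computes the same start times, which is why the executions line up across machines.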

 


If you want to look at the actual code, you will find the data source on SystemCenterCore.com.

 

Running this script every 5 minutes isn’t exactly a problem when you have physical servers or a small number of virtual machines on your hypervisor. But if you run 100 or 300 VMs on one host and every VM starts this script simultaneously, it creates unnecessary load on the host. If the host is overcommitted as well, CPU wait time could cause a ‘freeze’ on your tenant machines.

To illustrate the problem, I have attached a graph that clearly shows spikes during script execution.

vcenter host cpu spike SCOM

 

On a monitored computer you will see a cscript.exe process executing the following command line: “c:\windows\system32\cscript.exe” /nologo “SCOMpercentageCPUTimeCounter.vbs”

Cscript.exe running the SCOM CPU percentage script

 

Unfortunately, there isn’t much you can do out of the box. Sync time and interval are the only overridable parameters, and changing them will only help reduce the load on the agent machine itself. So if you experience CPU utilization peaks due to this script, I see only two options:

  • Disable the rule and monitor
    • Then you will have to rely on the CPU utilization monitor from the operating system management pack
  • Create a new rule and monitor, using the SpreadInitializationOverInterval parameter
    • Reduces load, as executions occur randomly within the set interval
    • Requires authoring skills, but possible. Some information here.
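
To make the difference between the two scheduling approaches concrete, here is a minimal simulation sketch. It assumes a simplified model: with a shared sync time every VM starts the script in the same second, while spreading gives each VM a random offset inside the 321-second interval. The function name `peak_simultaneous_starts` is hypothetical, and the random offsets only approximate how SpreadInitializationOverInterval behaves.

```python
import random
from collections import Counter

INTERVAL_SECONDS = 321  # interval used by the rule and monitor


def peak_simultaneous_starts(vm_count, spread, seed=0):
    """Return the largest number of VMs starting the script in the same
    second during one interval.

    spread=False: every VM starts at offset 0 (shared sync time).
    spread=True:  every VM picks a random offset within the interval,
                  which is how this sketch models spreading (assumption).
    """
    rng = random.Random(seed)
    if spread:
        offsets = [rng.randrange(INTERVAL_SECONDS) for _ in range(vm_count)]
    else:
        offsets = [0] * vm_count
    # Count how many VMs share each start second and take the busiest one.
    return max(Counter(offsets).values())


synced_peak = peak_simultaneous_starts(300, spread=False)
spread_peak = peak_simultaneous_starts(300, spread=True)
print("synced:", synced_peak)
print("spread:", spread_peak)
```

With a shared start second, all 300 VMs hit the host at once; with random offsets, only a handful are expected to collide in any given second, which is the effect the second option is after.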

 

So that this doesn’t fall into oblivion, I have left feedback on the Operations Manager UserVoice. Hopefully, Microsoft will make some changes in the future. If you have suggestions or other experiences, please let me know and I will update accordingly.

Posted by Martin Ehrnst

Working as a systems engineer at one of Norway's leading enterprise cloud providers. Mainly working with System Center, Azure and Windows Server products.

*All posts are personal

6 Comments

  1. As I am entirely new to SCOM and just stood up my first soon to become production environment, out of the box this script has become the worst! It is saying it has failed to run on all my management servers and my web server (which is all it has been deployed to). I haven’t installed any MPs yet as I really know nothing about the product. I am not certain what I should do but I would be happy to hear some suggestions

  2. As they changed the monitor in SCOM 2016 and introduced some other problems, I went ahead and rewrote the monitor to fix some other things. Since I already had the code, it was quick to convert the monitor to use SpreadInitializationOverInterval. In my limited testing (I only have two agents running in my test environment) it’s hard to tell if it actually does randomize the run time of the monitor. Cookdown is said not to play well with the scheduler (https://operatingquadrant.com/2009/12/14/scalability-and-performance-design-and-testing-in-the-xsnmp-management-packs/).
    Also discussed here (https://social.technet.microsoft.com/Forums/en-US/e6f9d58c-ca69-4ec8-9662-0f0bc3c21263/systemsystemscheduler-spreadinitializationoverinterval-not-working?forum=operationsmanagerauthoring)

    If you want to try it for yourself the code is here (https://github.com/mortenlerudjordet/lerunTools/tree/master/SCOM/MPs/Microsoft.SystemCenter.2007.Addendum)

      1. Hi again, after working some more on the script logic I found lots of other issues. I will be publishing a new version of the MP during the weekend that fixes a lot of stability issues with the script. Stay tuned 🙂
