adatum
Azure Monitor

Microsoft killed SCOM internally

  • 11/03/2019
  • by Martin Ehrnst

Microsoft no longer uses SCOM to monitor its own workloads. The company has replaced its entire SCOM-based monitoring stack with Azure Monitor, which allegedly reduced alert noise and administration overhead.

Even though SCOM is no longer my main responsibility, I am still very much involved in monitoring and management as a whole. Over the last few years we have heard a lot of talk about Azure Monitor replacing SCOM. That talk cooled off after a while, maybe until now?

Technology change or cultural change

Microsoft’s story on how they killed SCOM internally was released one day before the official announcement of Operations Manager 2019, but we first heard the story at Ignite in 2018. One may ask: why bring this topic up again now?
For SCOM 2019, the focus is on better support for hybrid cloud environments, which is good. But if Microsoft doesn’t want to use it, should you?

I have written and spoken about using SCOM as your hub for Azure Monitor, and my opinion hasn’t changed much. I believe the transition to a new monitoring stack will happen along with changes to the infrastructure.

When you read the article you’ll see that this was the case for Microsoft as well. There are two quotes I find particularly interesting in the announcement.

“This is not just a technology change, but a culture change,” Baxter says. “It wasn’t only that we would remove SCOM central monitoring, but we had to tell our application teams, now you’re going to manage alerts..”

It was January of 2017 when Baxter got the call. “Our goal was not just to get rid of SCOM, but to move to a Software as a Service (SaaS) solution and retire Virtual Machine (VM) based infrastructure,” she says.


The key here is the change in culture. Microsoft went all in on DevOps for their internal IT. When you do that, the technology changes, and your monitoring follows.
Further, the showcase mentions that monitoring was decentralized, which is true. But there’s another key part of this story. The monitoring team built an integration service between their monitoring stack (Azure Monitor, Application Insights) and their ITSM system. This service allows more metadata to be added to each alert before it ends up as a ticket.
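
Microsoft's article does not publish how that integration service works, so the following is only a rough Python sketch of the general idea: take a raw alert, attach ownership metadata, and shape it as a ticket. The `Alert` class, the `OWNERS` lookup table and the ticket field names are all hypothetical and do not correspond to any real Azure Monitor or ITSM API.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """A simplified, hypothetical alert as it might leave the monitoring stack."""
    resource: str
    severity: str
    message: str
    metadata: dict = field(default_factory=dict)


# Hypothetical lookup table standing in for a CMDB or resource tags.
OWNERS = {
    "web-frontend": {"team": "app-team-a", "priority": "P2"},
}


def enrich(alert, owners=OWNERS):
    """Return a copy of the alert with ownership metadata attached."""
    extra = owners.get(alert.resource, {"team": "unassigned", "priority": "P4"})
    return Alert(alert.resource, alert.severity, alert.message,
                 {**alert.metadata, **extra})


def to_ticket(alert):
    """Shape the enriched alert as a ticket payload (field names are made up)."""
    return {
        "title": f"[{alert.severity}] {alert.resource}: {alert.message}",
        "assignment_group": alert.metadata.get("team", "unassigned"),
        "priority": alert.metadata.get("priority", "P4"),
    }


if __name__ == "__main__":
    raw = Alert("web-frontend", "Critical", "HTTP 5xx rate above threshold")
    print(to_ticket(enrich(raw)))
```

The point of the sketch is the order of operations: enrichment happens in one central place, so the application teams that now own their alerts still get tickets routed with consistent metadata.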

Final notes

If your organization runs most of its IaaS on premises, you don’t have to make the change yet. Allow the culture to drive the change. Along the way, your SCOM environment can be that integration service between Azure PaaS, FaaS, XaaS and ITSM.

Operations Manager

SCOM Virtualization host CPU spikes

  • 14/03/2018 (updated 06/04/2018)
  • by Martin Ehrnst

A lot of the core functionality SCOM 2016 has today was released with SCOM 2007. SCOM 2007 was released (as the name states) in 2007, in the very early stages of virtualization. 2007 was also the start of my professional IT career, and I remember that only the most assertive companies with the most capital were thinking about or using SAN and virtualization. I am talking about oil companies, large architectural firms, etc. Even they kept the environments in-house, which kept the virtualization environments small.

In 2018 most companies have much larger environments in-house, or have moved everything to a service provider or a public cloud, and now old SCOM 2007 implementations are beginning to play a part.

Virtualization hosts

I work for a service provider in Norway, and we have around 4000 VMs running on VMware ESX. The environment is monitored in different ways, but visualization uses Grafana and InfluxDB, which provides very good insight for analyzing the environment. See how you can create your own solution by following Rudi Martinsen’s blog series on VMware performance data.

This chart shows CPU Ready for around 3000 VMs spiking every 15 minutes. Previously we had these spikes at both 5 and 15 minutes. More on that later.

 

Collect Distributed Workflow Test Event

Collect Distributed Workflow Test Event is the rule that logs event id 6022 on all agent managed computers. It is used to “test event collection”.

Here’s a quote from the rule’s KB

This rule runs for each System Center Management Health Service and logs an event. This event is collected and used to verify that the end-to-end workflow to collect events properly is functioning as expected. If you alter the interval for this rule, it can cause the corresponding monitors to change state or generate an alert. The corresponding monitors are “No End to End Event for 45 Minutes (Critical Level)” and “No End to End Event for 30 Minutes (Warning Level)”.

 

The rule refers to two monitors that use this event to check that the “end-to-end” workflow is working. By default these two monitors are disabled, so what is the purpose of this rule? I already know from investigation that this rule does cause the CPU spikes every 15 minutes, because it does not implement “spread initialization”, which would be the preferred method. Instead it has a sync time that forces the same start time for all agents. Even though a single run doesn’t create noticeable overhead by itself, multiply it by X VMs on a host and you will see the impact.
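
To illustrate why a shared sync time hurts on a virtualization host, here is a minimal Python simulation. It is not SCOM code and does not model the real scheduler. It assumes a hypothetical host with 50 agent VMs and a 900-second interval, and compares how many workflows fire in the same second when every agent starts on the same boundary versus when each agent gets its own random offset inside the interval, which is roughly the idea behind spread initialization. All names and numbers are illustrative assumptions.

```python
import random
from collections import Counter

INTERVAL = 900   # seconds between runs (15 minutes, the spike interval in the post)
AGENTS = 50      # hypothetical number of agent VMs on one host
WINDOW = 3600    # simulate one hour


def fire_times(offset, interval=INTERVAL, window=WINDOW):
    """Seconds within the window at which one agent's workflow runs."""
    return list(range(offset, window, interval))


def peak_concurrency(offsets):
    """Largest number of agents whose workflow fires in the same second."""
    counts = Counter(t for off in offsets for t in fire_times(off))
    return max(counts.values())


def synced_offsets(agents=AGENTS):
    """Sync time: every agent starts on the same boundary."""
    return [0] * agents


def spread_offsets(agents=AGENTS, interval=INTERVAL, seed=1):
    """Spread: each agent picks its own offset somewhere inside the interval."""
    rng = random.Random(seed)
    return [rng.randrange(interval) for _ in range(agents)]


if __name__ == "__main__":
    print("synced peak:", peak_concurrency(synced_offsets()))
    print("spread peak:", peak_concurrency(spread_offsets()))
```

With a shared start time the peak equals the number of agents, because all of them run at once. With per-agent offsets the same total work is done, but only a handful of agents ever coincide, so the host never sees the burst.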

I was not sure whether the event logged by the rule was used for something else, so I reached out to Microsoft Premier Support. After a few phone calls and emails referring to my UserVoice idea explaining the issue, we got the following reply.

[…]

To summarize, if you did not enable the two monitors and if you have disabled the collection rule, logging the event is quite useless. There is no point in logging an event that no one checks afterwards. From this perspective, you could disable the rule logging the event and the collection rule as well, if this is not already disabled.

That confirmed my suspicions. This rule has no value (to our environment) and I can disable the whole thing.

Collect agent processor utilization

I wrote about this rule exactly a year ago, and I was not the first. It is the worse of the two and runs a script every five minutes to collect agent performance data. If you don’t use this data, disable the rule.

Fun fact: Kevin Holman was the one who suggested running this rule every 321 seconds, as he was tired of every workflow running every 300 seconds by default.
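
A quick way to see why an odd interval such as 321 seconds helps: two workflows that start together only line up again at the least common multiple of their intervals. The Python sketch below is plain arithmetic under the assumption that both workflows start at the same moment. It says nothing about how SCOM actually schedules them.

```python
from math import gcd


def realign_seconds(a, b):
    """Seconds until two workflows that start together fire together again."""
    return a * b // gcd(a, b)


if __name__ == "__main__":
    for interval in (300, 321):
        again = realign_seconds(300, interval)
        print(f"300 s vs {interval} s: coincide every {again} s ({again / 3600:.2f} h)")
```

Two 300-second workflows collide on every run. A 321-second workflow and a 300-second one coincide only every 32100 seconds, a little under nine hours.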

 

Summary

Every SCOM environment is different, but I strongly believe you are impacted by these two rules. “Collect Distributed Workflow Test Event” and “Collect agent processor utilization” both run on a fixed interval with a sync time instead of using spread initialization.

The impact depends on the size of your environment, but if you don’t use the data generated by these rules I recommend you disable them. Here is a graph showing our two largest clusters, hosting around 1000 VMs.
Just before 11 I disabled “Collect Distributed Workflow Test Event”, and you can clearly see the difference.

 

Let me know if you have experienced similar issues or have comments on this post.

 

Operations Manager

SCOM 1801 REST API Interfaces

  • 19/02/2018
  • by Martin Ehrnst

For many years SCOM has delivered state-of-the-art infrastructure monitoring. The platform itself is very flexible, but it has lacked an easy integration interface. This has now changed.

SCOM Unofficial REST API

A year ago we needed an easier way to integrate monitoring data with non-Microsoft products such as customer portals, CMDBs, etc. Some of these systems also needed the ability to trigger maintenance mode and create maintenance schedules. As an internal project with a very steep learning curve, I started on a SCOM Web API. In May 2017 everyone on the Internet could see how poorly I knew C# when I pushed the whole project to GitHub (First commit).

The latest version supports many new features and includes a lot of code changes.

 

SCOM Official REST API

As I follow Microsoft’s monitoring space closely, I was very surprised when Jasper VanDamme started talking about an official SCOM REST API released with SCOM 1801. It was never mentioned (as far as I saw) in the release notes and not talked about at all. If we had gotten this news when 1801 was first announced, I believe people would have seen it as one of the big news items alongside the HTML5 dashboards, which I understand are the reason the API now exists.

Being very passionate about SCOM and its possibilities, despite it being an old dinosaur, I feel this official API can open doors for many non-SCOM admins to create very cool solutions. I was hoping this could happen with the one I created (and to some extent it has), but now we have an officially backed SCOM API which is consistent and professional in every corner. The future looks promising.

 

Resources

Official REST API Reference

Custom Dashboard Example

SCOM REST API on GitHub

 

Remarks

When I find the time to upgrade my labs to 1801 I will write a blog post dedicated to the new API. Please let me know if you have developed anything cool using either of the available APIs. I’m happy to check it out and provide feedback.

Operations Manager

SCOM 1801 released

  • 08/02/2018 (updated 13/02/2018)
  • by Martin Ehrnst

[Quick publish]

Today Microsoft released System Center 1801, which includes the semi-annual release of Operations Manager. That in itself is a huge step for SCOM, but the latest release also includes a lot of fixes and many new features. Read the announcement here.

SCOM 1801 Announced

What’s in System Center, version 1801?

System Center, version 1801 focuses on enhancements and features for System Center Operations Manager, Virtual Machine Manager, and Data Protection Manager. Additionally, security and bug fixes, as well as support for TLS 1.2, are available for all System Center components including Orchestrator, Service Management Automation, and Service Manager.

I am pleased to share the capabilities included in this release:

  • Support for additional Windows Server features in Virtual Machine Manager: Customers can now setup nested virtualization, software load balancer configuration, and storage QoS configuration and policy, as well as migrate VMware UEFI VM to Hyper-V VM. In addition to supporting Windows Server, version 1709, we have added support for host monitoring, host management, fall back HGS, configuration of encrypted SDN virtual network, Shielded Linux VMs on Hyper-V management, and backup capabilities.
  • Linux monitoring in Operations Manager: Linux monitoring has been significantly improved with the addition of a customizable FluentD-based Linux agent. Linux log file monitoring is now on par with that of Windows Server (Yes, we heard you! Kick the tires, it really works).
  • Improved web console experience in Operations Manager: The System Center Operations Manager web console is now built on HTML5 for a better experience and support across browsers.
  • Updates and recommendations for third-party Management Packs: System Center Operations Manager has been extended to support the discovery and update of third-party MPs.
  • Faster, cost-effective VMware backup: Using our Modern Backup Storage technology in Data Protection Manager, customers can backup VMware VMs faster and cut storage costs by up to 50%.
  • And much more including Linux Kerberos support and improved UI responsiveness when dealing with many management packs in Operations Manager. In Virtual Machine Manager, we have enabled SLB guest cluster floating IP support, added Storage QoS at VMM cloud, added Storage QoS extended to SAN storage, enabled Remote to VMs in Enhanced Session mode, added seamless update of non-domain host agent, and made host Refresher up to 10X faster.
  • As well as consistent evaluation and license experiences across components.

Customers should consider supplementing System Center with Azure security & management capabilities for enhanced on-premises management and for the management of Azure resources. We have included the following updates in System Center, version 1801:

  • Service Map integration with Operations Manager: Using the Distributed Application Diagram function in SCOM, you can automatically see application, server, and network dependencies deduced from Service Map. This deeper endpoint monitoring from SCOM is surfaced in the diagram view for better diagnostics workflows.
  • Manage Azure ARM VMs and special regions: Using a Virtual Machine Manager add-in, you can now manage Azure ARM VMs, Azure Active Directory, and more regions (China, US Government, and Germany).
  • Service Manager integration with Azure: Using the Azure ITSM integration with Azure Action Groups you can set up rules to create incidents automatically in System Center Service Manager for alerts fired on Azure and non-Azure resources.

Operations Manager

System Center 1801 preview release

  • 08/11/2017 (updated 08/02/2018)
  • by Martin Ehrnst

Update February 8 2018:

Microsoft released System Center 1801 today. Read more here: https://adatum.no/operationsmanager/scom-1801-released


Today, Microsoft announced that a preview of the next System Center release (1801) is available for download. This is a preview of the upcoming version, scheduled for Q1 2018. The key focus points for 1801 were announced at Ignite and include:

  • Support for Windows Server version 1709: Support for the latest version of Windows Server with host monitoring, host management, configuration of encrypted SDN virtual network, Shielded Linux VMs on Hyper-V management, and backup capabilities.
  • Support for additional Windows Server 2016 features in Virtual Machine Manager: Now customers can setup nested virtualization, software load balancer configuration, Storage QoS configuration and policy settings, and migrate VMware UEFI VM to Hyper-V VM.
  • Linux monitoring in Operations Manager: Customers can now realize granular log file monitoring in Linux using a customizable FluentD-based Linux agent. Linux log file monitoring is now at par with that of Windows Server.
  • Improved web console experience in Operations Manager: The SCOM web console has been moved completely to HTML5 with support for all browsers, out-of-the-box widgets, and widget customization.
  • Updates and recommendations for third-party Management Packs: We released the MP Updates and Recommendations feature in System Center 2016. This has been expanded now to support the discovery and update of third-party MPs.
  • Faster, cost-effective VMware backup: Using our Modern Backup Storage technology in Data Protection Manager, customers can backup VMware VMs faster and cut storage costs by up to 50%.
  • In Operations Manager:
    • Improvements to Linux MPs
    • Linux Kerberos support
    • Improvements to Windows Server OS MP
    • One setup for all languages
    • Improved UI responsiveness with large number MPs
    • Visual Studio 2017 support in VSAE
  • In Virtual Machine Manager:
    • SLB Guest cluster floating IP support
    • Storage QoS at VMM Cloud
    • Storage QoS extended to SAN storage
    • Remote to VMs in Enhanced Session mode
    • Seamless update of non-domain host agent
    • Support for fallback HGS for Shielded VM
    • Host Refresher made up to 10X faster

Read the full announcement of the preview here, and the new release cadence for more information.

