One of the coolest things about SCOM is how much monitoring you get out of the box.
That said, one of the biggest performance impacts to SCOM is all the monitoring out of the box, plus all the Management Packs you import. This has a cumulative effect, and over time, can impact the speed of the console, because of all the activity happening.
I have long stated that the biggest performance relief you can give to SCOM is to reduce the number of workflows, reduce the classes and relationships, and keep things simple.
SCOM 2007 shipped back in March 2007. In the 10 years since, we have continuously added management packs to a default installation of SCOM, and continuously added workflows to the existing MPs.
For the most part – this is good. These packs add more and more monitoring and capabilities “out of the box”. However, in many cases, they can also add load to the environment. They discover class instances, relationships, add state calculation, etc. In small SCOM environments (under 1000 agents) this will have very little impact. But at large enterprise scale, every little thing counts.
I have already written about some of the optional things you can consider (IF you don’t use the features), such as removing the APM MPs, and removing the Advisor MPs.
Here is one I came across today with a customer:
I noticed on the server that hosts the “All Management Servers Resource Pool” we have some out of the box PowerShell script based rules that were timing out after 300 seconds, and running every 15 minutes:
Collect Agent Health States (ManagementGroupCollectionAgentHealthStatesRule)
Collect Management Group Active Alerts Count (ManagementGroupCollectionAlertsCountRule)
These scripts do things like “Get-SCOMAgent” and “Get-SCOMAlert”. They were timing out, running constantly for 5 minutes, then getting killed by the timeout limit, then starting over again. This kind of thing will have significant impact on SQL blocking, SDK utilization, and overall performance.
Now, in small environments, this isn’t a big deal, and these will return results quickly with little impact. However, in a VERY large environment, Get-SCOMAgent can take 10 minutes or more just to return the data!!!! If you have hundreds of thousands of open alerts, it can take just as long to run the Alert SDK queries as well.
The only thing these two rules are used for is to populate a SCOM Health dashboard, and the data they provide is of little value.
I recommend that larger environments disable these two rules, as they are very resource intensive for very minimal value. If you would rather keep them, override the interval to 86400 seconds (once per day), set the timeout to 600 seconds, and set a sync time so that each rule runs off peak at a slightly different time: for example, 23:00 (11pm) for one rule and 23:20 (11:20pm) for the other, so they aren't both running at the same time. If a rule cannot complete in 10 minutes, then disable it.
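To make the arithmetic behind that recommendation concrete, here is a minimal sketch in Python. It is not SCOM code and does not touch the SDK; the function names are made up for illustration, and the only inputs are the numbers already mentioned above (300 second timeout every 15 minutes by default, versus a 600 second timeout once per day at 23:00 and 23:20). It checks that the two staggered runs cannot overlap even in the worst case, and compares how much of each interval a rule can spend running before and after the override.

```python
from datetime import datetime, timedelta

# Values from the recommendation above. Function names are illustrative
# only and are not part of SCOM or its PowerShell module.
INTERVAL_SECONDS = 86400   # overridden interval: once per day
TIMEOUT_SECONDS = 600      # overridden timeout: killed after 10 minutes


def latest_finish(sync_time: str, timeout_seconds: int) -> datetime:
    """Latest moment a run started at sync_time (HH:MM) can still be alive."""
    start = datetime.strptime(sync_time, "%H:%M")
    return start + timedelta(seconds=timeout_seconds)


def runs_overlap(first_sync: str, second_sync: str, timeout_seconds: int) -> bool:
    """True if the second rule could start before the first has finished or timed out.

    Assumes both sync times fall on the same day, with first_sync earlier;
    wrap-around past midnight is not handled in this sketch.
    """
    second_start = datetime.strptime(second_sync, "%H:%M")
    return second_start < latest_finish(first_sync, timeout_seconds)


def busy_fraction(timeout_seconds: int, interval_seconds: int) -> float:
    """Worst-case share of each interval that one rule spends running."""
    return timeout_seconds / interval_seconds


if __name__ == "__main__":
    # 23:00 + 10 minutes = 23:10, which is before 23:20, so no overlap.
    print("Overlap with 23:00 / 23:20:", runs_overlap("23:00", "23:20", TIMEOUT_SECONDS))
    # Default settings when the script always hits the timeout.
    print("Worst case, default:   ", busy_fraction(300, 900))
    # Overridden settings when the script always hits the timeout.
    print("Worst case, overridden:", busy_fraction(TIMEOUT_SECONDS, INTERVAL_SECONDS))
```

With the defaults, a rule that always times out is running for a third of every 15 minute window. With the overrides it is running for well under 1% of the day, and the 20 minute stagger leaves a 10 minute gap between the worst-case end of the first rule and the start of the second.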
Additionally, in this same MP (Microsoft.SystemCenter.OperationsManager.SummaryDashboard) there are two discoveries.
Collect Agent Versions (ManagementGroupDiscoveryAgentVersions)
Collect agent configurations (ManagementGroupDiscoveryAgentConfiguration)
These discoveries run once per hour, and also run things like Get-SCOMAgent – which is bad for large environments, especially with that frequency.
The only thing they do is populate a dashboard, which I rarely ever see being used. I recommend large environments disable these discoveries as well.
Speed up that SCOM deployment!