Kevin Holman's System Center Blog

Useful MOM 2005 SQL queries


SCDW/DTS:

 

The 6 primary tables in which SCDW data is retained:

SC_AlertFact_Table
SC_AlertHistoryFact_Table
SC_AlertToEventFact_Table
SC_EventFact_Table
SC_EventParameterFact_Table
SC_SampledNumericDataFact_Table

Commands to execute to modify the data retention age:  Please run these commands, changing the value 385 to the number of days of data to be retained:

Exec p_updategroomdays 'SC_AlertFact_Table', 385
Exec p_updategroomdays 'SC_AlertHistoryFact_Table', 385
Exec p_updategroomdays 'SC_AlertToEventFact_Table', 385
Exec p_updategroomdays 'SC_EventFact_Table', 385
Exec p_updategroomdays 'SC_EventParameterFact_Table', 385
Exec p_updategroomdays 'SC_SampledNumericDataFact_Table', 385

Queries to verify above setting:

For example, to verify the number of days that the SC_AlertFact_Table retains data, use the following command:

select cs.cs_tablename 'Table Name', wcs.wcs_groomdays 'Groom Days' from warehouseclassschema wcs
join classschemas cs
on cs.cs_classID = wcs.wcs_classID
where cs.cs_tablename = 'SC_AlertFact_Table'
and wcs.wcs_mustbegroomed = 1

Grooming the SCDW: 
If you try to groom too much data at once, the job will likely fail because the transaction log fills up.  First ensure your transaction log has plenty of free space, grow it manually, or set it to auto-grow (if you have ample disk space).  Ensure tempdb has ample room (disk space) to grow temporarily, and then run the SCDWgroomjob.  Start by grooming out only 10 days at a time.
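For example, here is a hedged sketch of stepping the retention down on one table, 10 days at a time (the values are illustrative; repeat until you reach your target retention):

Exec p_updategroomdays 'SC_SampledNumericDataFact_Table', 375
-- run the SCDWgroomjob and verify it succeeds, then step down another 10 days:
Exec p_updategroomdays 'SC_SampledNumericDataFact_Table', 365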

Queries to find out how old the existing data is:

SELECT DATEDIFF(d, MIN(DateTimeLastModified), GETDATE()) AS [Current] FROM SC_AlertFact_View
SELECT DATEDIFF(d, MIN(DateTimeLastModified), GETDATE()) AS [Current] FROM SC_AlertHistoryFact_View
SELECT DATEDIFF(d, MIN(DateTimeEventStored), GETDATE()) AS [Current] FROM SC_AlertToEventFact_View
SELECT DATEDIFF(d, MIN(DateTimeStored), GETDATE()) AS [Current] FROM SC_EventFact_View
SELECT DATEDIFF(d, MIN(DateTimeEventStored), GETDATE()) AS [Current] FROM SC_EventParameterFact_View
SELECT DATEDIFF(d, MIN(DateTimeAdded), GETDATE()) AS [Current] FROM SC_SampledNumericDataFact_View

Delete event data from SCDW (NOT SUPPORTED):

USE SystemCenterReporting
DELETE ft
FROM SC_EventFact_Table AS ft
INNER JOIN SC_EventDetailDimension_Table AS ed
ON ft.EventDetail_FK = ed.SMC_InstanceID
WHERE ed.EventID_PK = '1234'

Query to examine the last groom job and how long it took in the SCDW:

Use SystemCenterReporting
GO
Select MWCS.GroomDays, MCS.ClassName, MCS.TableName,MWGI.*, DATEDIFF(s,MWGI.StartTime, MWGI.EndTime) AS GroomTookSeconds FROM dbo.SMC_Meta_WarehouseGroomingInfo MWGI
INNER JOIN SMC_Meta_ClassSchemas MCS
ON MWGI.ClassID=MCS.ClassID
INNER JOIN SMC_Meta_WarehouseClassSchema MWCS
ON MWGI.ClassID=MWCS.ClassID
ORDER BY MCS.ClassName

Running the DTS job with latency:
Add the “/latency:90”  switch to the scheduled task command line for the DTS job (MOM.Datawarehousing.DTSPackageGenerator.exe)
(The latency switch specifies the number of days to skip)
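For example, the scheduled task command line might end up looking something like the following (the path and other switches are placeholders for whatever your existing task already uses; only /latency:90 is the addition):

"…\MOM.Datawarehousing.DTSPackageGenerator.exe" <existing switches> /latency:90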

Setting query timeout value:
If a *remote* query or DTS job times out, you can change the remote query timeout value in Enterprise Manager, on the server properties, Connections tab.
The default is 600 seconds; set this to 0 for unlimited.
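If you prefer T-SQL over the Enterprise Manager UI, the same server-wide setting can be changed with sp_configure (a quick sketch; 0 = unlimited):

EXEC sp_configure 'remote query timeout', 0
RECONFIGURE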

Query to look at oldest date of event records:

use SystemCenterReporting
select min(timestored) from sdkeventview

Given the event ID – a quick query to view the number of distinct events for each message, and view the text of the event message:

select distinct message, count(*) as number from sdkeventview where nteventid = '7036' group by message order by number desc

Faking the Onepoint database into thinking DTS has run, in order to groom:
Change the date of "TimeDTSLastRan" to today's date in the ReportingSettings table of the Onepoint DB.
Then run the MOMX Partitioning and Grooming job.  This will delete all data in Onepoint based on the global data retention setting, even if it has not been transferred to the SCDW.

The GroomingSettings table also contains a “TimeDTSLastRan” value – but this will be an old date, as it is typically the time and date of the reporting server install and is apparently not used by the grooming stored procedure.
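A sketch of the update described above (NOT SUPPORTED - the table and column names are taken from the text above; run at your own risk):

USE OnePoint
UPDATE ReportingSettings SET TimeDTSLastRan = GETDATE()  -- set to today's date, per the note above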

DTS Logging:

1. Setup Error Logging for the DTS Package. You can use info from the following MS KB to setup DTS Package Logging:
316043 HOW TO: Log Data Transformation Services Package Execution to a Text File
http://support.microsoft.com/?id=316043

2. SQL Log location:  Zip up all the files in the SQL Server log folder "..\program files\microsoft sql server\MSSQL\LOG".  Note: if you have a SQL named instance, the LOG folder path will be "..\program files\microsoft sql server\SQLNamedInstance\MSSQL\LOG".

3. SQL Profiler settings to capture for DTS logging:
- Errors and Warnings ---> add all events under this
- Stored Procedures ---> RPC/SP Starting/Completed, SP:Stmt Starting/Completed, and SP:Recompile
- Transactions ---> all of these
- TSQL ---> SQL:Batch/SQL:Stmt Starting/Completed

 

 

Onepoint:

 

 

Alerts:

Most common alerts in a Onepoint DB:
These queries show how many alerts we are generating per day, and which alerts (by culprit and name) are the most common:

SELECT CONVERT(char(10), TimeRaised, 101) AS "Alert Date (by Day)", COUNT(*) AS "Number of Alerts"
FROM SDKAlertView
GROUP BY CONVERT(char(10), TimeRaised, 101)
ORDER BY "Alert Date (by Day)" DESC

SELECT Culprit, Name, SUM(1) AS AlertCount, SUM(RepeatCount+1) AS AlertCountWithRepeatCount
FROM Alert WITH (NOLOCK)
WHERE ResolutionState = 0 -- use 0 for new alerts, or 255 for closed alerts
GROUP BY Culprit, Name
ORDER BY AlertCount DESC
-- ORDER BY AlertCountWithRepeatCount DESC

Events:

Most common events in a Onepoint DB:

SELECT CONVERT(char(10), TimeGenerated, 101) AS "Events Date (by Day)", COUNT(*) AS "Number of Events"
FROM SDKEventView
GROUP BY CONVERT(char(10), TimeGenerated, 101)
ORDER BY "Events Date (by Day)" DESC

SELECT NTEventID, COUNT(*) AS "Number of Events"
FROM SDKEventView
GROUP BY NTEventID
ORDER BY "Number of Events" DESC

To view the oldest events in the database (this should mirror the row with LastGroomed = 1 in the PartitionTables query under Misc Onepoint below):

select min(timestored) from sdkeventview

To count the number of events in Onepoint:

select count(*) from sdkeventview

To determine computers that are generating the most event data:

select distinct ComputerName, count(*) as NumberOfOccurences from SDKEventView
group by Computername order by numberofoccurences desc

To determine the noisiest computers and see the events they are generating:

select top 100 Computername, Message, NTEventID, Source, Count(*) AS TOTAL
from SDKEventView
Group by Computername, Message, NTEventID, Source
Order by TOTAL DESC

Selecting multiple events:

Select * from SDKeventview where NTeventID in ('9980','9981','9982','9983')

Given the event ID – a quick query to view the number of distinct events for each message, and view the text of the event message:

Select distinct message, count(*) as number from sdkeventview where nteventid = '7036' group by message order by number desc

Performance:

To display the most common perf insertions:

select performanceobjectname, performancecountername, count(performanceobjectname) as 'count' from sdkperformanceview
group by performanceobjectname, performancecountername
order by 'count' desc

To determine which computers are generating the most perf insertions:

select distinct server, count(*) as NumberOfOccurences from SampledNumericDataPerformanceReportView
group by server order by numberofoccurences desc

Top 10% Computers with High Perf Volume, by Counter, by Object, by Day

SELECT top 10 percent CONVERT(char(10), TimeSampled, 101) AS "Perf Date (by Day)"
, Computername, PerformanceObjectName, PerformanceCounterName, COUNT(*) AS "Number of PerfObjects"
FROM SDKPerformanceView
GROUP BY CONVERT(char(10), TimeSampled, 101), computername, PerformanceObjectName, PerformanceCounterName
ORDER BY "Number of PerfObjects" DESC 

Top 10% Computers with High Perf Volume, by Day

SELECT  top 10 percent CONVERT(char(10), TimeSampled, 101) AS "Perf Date (by Day)", computername, COUNT(*) AS "Number of PerfObjects"
FROM SDKPerformanceView
GROUP BY CONVERT(char(10), TimeSampled, 101), computername
ORDER BY "Number of PerfObjects" DESC

Top 10% Computers with High Perf Volume, by Object, by Day

SELECT top 10 percent CONVERT(char(10), TimeSampled, 101) AS "Perf Date (by Day)", computername, performanceobjectname, COUNT(*) AS "Number of PerfObjects"
FROM SDKPerformanceView
GROUP BY CONVERT(char(10), TimeSampled, 101), computername, performanceobjectname
ORDER BY "Number of PerfObjects" DESC

To display the counters causing the most perf impact per computer:

select performanceobjectname, performancecountername, count(performanceobjectname) as 'count'
from sdkperformanceview
where computername = 'COMPUTERNAME'
group by performanceobjectname, performancecountername
order by 'count' desc

Misc Onepoint:

To view the number of Operations Consoles currently open:

SELECT program_name, count(*)
FROM Master..sysprocesses
WHERE ecid=0 and program_name='Microsoft Operations Manager - DAS Operations Console'
GROUP BY program_name
ORDER BY count(*) desc 

To view the grooming information on tables in Onepoint, especially with respect to the rows where the "Current" and "LastGroomed" columns = 1:
SELECT * FROM dbo.PartitionTables

To view or modify the TimeDTSLastRan field to allow grooming to occur:
select * from ReportingSettings

To display the rule name given a GUID:

select Name FROM onepoint.dbo.ProcessRule WHERE idProcessRule='GUIDstringHERE'

To find all rules associated with a provider:

select pr.name 'Rule Name', pi.name 'Provider Name' from processrule pr
join ProviderInstance pi
on pr.idproviderinstance = pi.idproviderinstance
where pi.name like 'ISA Server%'

To display the Total number of agents (all agent types)

SELECT * FROM ManagedCountsView WHERE ManagedType = '-1'

To display the total number of Unmanaged agents

SELECT * FROM ManagedCountsView WHERE ManagedType = '0'

To display the total number of Managed agents

SELECT * FROM ManagedCountsView WHERE ManagedType = '2'

To display the total number of Windows Server Cluster computers

SELECT * FROM ManagedCountsView WHERE ManagedType = '3'

To display the total number of Agents reporting to a specific Mgmt Server

SELECT COUNT(*) FROM MOMv2_ComputerAllPropertiesView WHERE ConfigManagerName = 'mgmt server name'

or to list the total number of Agents reporting to all Mgmt Servers (NULL are unmanaged)

SELECT configmanagername, COUNT(*) FROM MOMv2_ComputerAllPropertiesView group by ConfigManagerName

To display the agent action account for all servers:

use OnePoint
Select Name, Value
from Attribute INNER JOIN Computer ON DISCOVERYCOMPUTERID = IDCOMPUTER
WHERE ClassAttributeID IN (Select ClassAttributeID from ClassAttribute where
ClassAttributeName = 'Action Account Identity')
AND IDComputer IN (Select DiscoveryComputerID from Attribute WHERE
ClassAttributeID IN (Select ClassAttributeID from ClassAttribute where
ClassAttributeName = 'Action Account Identity'))
order by value, name

MISC:

Simple query to display large tables:

SELECT
so.name,
8 * Sum(CASE WHEN si.indid IN (0, 1) THEN si.reserved END) AS data_kb,
Coalesce(8 * Sum(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) AS index_kb,
Coalesce(8 * Sum(CASE WHEN si.indid IN (255) THEN si.reserved END), 0) AS blob_kb
FROM dbo.sysobjects AS so JOIN dbo.sysindexes AS si ON (si.id = so.id)
WHERE 'U' = so.type GROUP BY so.name  ORDER BY data_kb DESC

Simple query for Onepoint or SCDW to dump a perfcounter to a table output.  Modify Computername and timestamps:

DECLARE @BeginDate datetime
DECLARE @EndDate datetime
SET @BeginDate = '2006-10-18 05:45:00.287'
SET @EndDate = '2006-10-19 05:45:00.287'
SELECT * FROM dbo.SDKPerformanceView WHERE Computername = 'EXCH1' and PERFORMANCECOUNTERNAME = 'Local Queue Length' and TimeSampled BETWEEN @BeginDate AND @EndDate ORDER BY TimeSampled DESC

Database Performance:

The System Center DW does not come with any maintenance. 
I really like the maintenance plan at:  http://systemcenterforum.org/wp-content/uploads/scdw_reindex1.zip

To get better performance manually:

Update Statistics (will help speed up reports and takes less time than a full reindex):

EXEC sp_updatestats

Show index fragmentation (to determine how badly you need a reindex – logical scan frag > 10% = bad.  Scan density below 80 = bad):

DBCC SHOWCONTIG
DBCC SHOWCONTIG WITH FAST (less data than above – in case you don’t have time)

Reindex the database:

Onepoint:

USE OnePoint
go
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
EXEC SP_MSForEachTable "Print 'Reindexing '+'?' DBCC DBREINDEX ('?')"
SystemCenterReporting:

DBCC DBREINDEX ('TableName')
DBCC DBREINDEX ('SC_SampledNumericDataFact_Table')

Update usage (use when SQL reports a table size that you know not to be correct):

DBCC updateusage ('systemcenterreporting')
DBCC updateusage ('systemcenterreporting','sc_samplednumericdatafact_table')

How to identify your version of SQL server:

SELECT  SERVERPROPERTY('productversion'), SERVERPROPERTY ('productlevel'), SERVERPROPERTY ('edition')

SQL 2005:

SQL Server 2005 RTM                           2005.90.1399
SQL Server 2005 SP1                           2005.90.2047
SQL Server 2005 SP1 plus 918222         2005.90.2153
SQL Server 2005 SP2                           2005.90.3042

SQL 2000:

SQL Server 2000 RTM               2000.80.194.0
SQL Server 2000 SP1                2000.80.384.0
SQL Server 2000 SP2                2000.80.534.0
SQL Server 2000 SP3                2000.80.760.0
SQL Server 2000 SP3a              2000.80.760.0
SQL Server 2000 SP4                2000.80.2039.0

SQL2000 Reporting RTM             8.00.743.00
SQL2000 Reporting SP1              8.00.878.00
SQL2000 Reporting SP2              8.00.1038.00


How grooming and auto-resolution work in the OpsMgr 2007 Operational database


How Grooming and Auto-Resolution work in the OpsMgr 2007 Operations DB

 

 

Warning – don’t read this if you are bored easily. 

 

 

Here is a simplified view of how alerts are groomed…..

 

Grooming of the ops DB is called once per day at 12:00am…. by the rule:  "Partitioning and Grooming".  You can search for this rule in the Authoring space of the console, under Rules.  It is targeted to the "Root Management Server" and is part of the System Center Internal Library.

 

It calls the "p_PartitioningAndGrooming" stored procedure, which calls p_Grooming, which calls p_GroomNonPartitionedObjects (alerts are not partitioned), which inspects the PartitionAndGroomingSettings table… and executes each stored procedure listed there.  The alert grooming stored procedure in that table is referenced as p_AlertGrooming, which has the following SQL statement:

 

    SELECT AlertId INTO #AlertsToGroom
    FROM dbo.Alert
    WHERE TimeResolved IS NOT NULL
    AND TimeResolved < @GroomingThresholdUTC
    AND ResolutionState = 255

 

So…. the criteria for what is groomed is pretty simple:  In a resolution state of “Closed” (255) and older than the 7 day default setting (or your custom setting referenced in the table above)
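A hedged sketch (not part of the product sproc) to count how many closed alerts are currently eligible for grooming, assuming the default 7 day setting:

SELECT COUNT(*) AS AlertsEligibleForGrooming
FROM dbo.Alert
WHERE TimeResolved IS NOT NULL
AND TimeResolved < DATEADD(dd, -7, GETUTCDATE())
AND ResolutionState = 255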

 

We won’t groom any alerts that are in New (0), or any custom resolution-states (custom ID #).  Those will have to be set to “Closed” (255)…. either by autoresolution of a monitor returning to healthy, direct user interaction, our built in autoresolution mechanism, or your own custom script.

 

Ok – that covers grooming.

 

However – I can see that brings up the question – how does auto-resolution work?

 

 

 

 

The auto-resolution setting in the console specifically states "alerts in the new resolution state".  I don't think that is completely correct:

 

Auto-resolution is driven by the rule "Alert Auto Resolve Execute All", which runs p_AlertAutoResolveExecuteAll once per day at 4:00am.  This calls p_AlertAutoResolve twice…. once with a parameter of "0" and once with a "1".

 

Here is the sql statement:

 

IF (@AutoResolveType = 0)
    BEGIN
        SELECT @AlertResolvePeriodInDays = [SettingValue]
        FROM dbo.[GlobalSettings]
        WHERE [ManagedTypePropertyId] = dbo.fn_ManagedTypePropertyId_MicrosoftSystemCenterManagementGroup_HealthyAlertAutoResolvePeriod()

        SET @AutoResolveThreshold = DATEADD(dd, -@AlertResolvePeriodInDays, getutcdate())
        SET @RootMonitorId = dbo.fn_ManagedTypeId_SystemHealthEntityState()

        -- We will resolve all alerts that have green state and are un-resolved
        -- and haven't been modified for N number of days.
        INSERT INTO @AlertsToBeResolved
        SELECT A.[AlertId]
        FROM dbo.[Alert] A
        JOIN dbo.[State] S
            ON A.[BaseManagedEntityId] = S.[BaseManagedEntityId] AND S.[MonitorId] = @RootMonitorId
        WHERE A.[LastModified] < @AutoResolveThreshold
        AND A.[ResolutionState] <> 255
        AND S.[HealthState] = 1

<snip>

    ELSE IF (@AutoResolveType = 1)
    BEGIN
        SELECT @AlertResolvePeriodInDays = [SettingValue]
        FROM dbo.[GlobalSettings]
        WHERE [ManagedTypePropertyId] = dbo.fn_ManagedTypePropertyId_MicrosoftSystemCenterManagementGroup_AlertAutoResolvePeriod()

        SET @AutoResolveThreshold = DATEADD(dd, -@AlertResolvePeriodInDays, getutcdate())

        -- We will resolve all alerts that are un-resolved
        -- and haven't been modified for N number of days.
        INSERT INTO @AlertsToBeResolved
        SELECT A.[AlertId]
        FROM dbo.[Alert] A
        WHERE A.[LastModified] < @AutoResolveThreshold
        AND ResolutionState <> 255

 

 

So we are basically checking that ResolutionState <> 255….. not specifically "New" (0), as the wording in the interface would lead you to believe.  There are simply two types of auto-resolution:  resolve all alerts where the object has returned to a healthy state in "N" days….. and resolve all alerts no matter what, as long as they haven't been modified in "N" days.
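To see the two auto-resolve periods your management group is currently using, the settings can be read straight from GlobalSettings with the same functions the sproc above uses:

SELECT [SettingValue] AS HealthyAlertAutoResolvePeriodDays
FROM dbo.[GlobalSettings]
WHERE [ManagedTypePropertyId] = dbo.fn_ManagedTypePropertyId_MicrosoftSystemCenterManagementGroup_HealthyAlertAutoResolvePeriod()

SELECT [SettingValue] AS AlertAutoResolvePeriodDays
FROM dbo.[GlobalSettings]
WHERE [ManagedTypePropertyId] = dbo.fn_ManagedTypePropertyId_MicrosoftSystemCenterManagementGroup_AlertAutoResolvePeriod()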

Moving the Data Warehouse and ACS databases in OpsMgr 2007


Moving the Data Warehouse and ACS databases in OpsMgr 2007

 

Instructions for how to move the data warehouse database and the ACS database have been published.

 

The Operations Manager 2007 Backup and Recovery Guide has been updated as of December 2007, to include moving the DW and ACS DB's, in addition to the previously published move of the OpsDB.

Perhaps you deployed your databases to interim hardware?  Perhaps you outgrew your current disk I/O?  Perhaps you want to move to a high availability cluster?

 

Move ‘em!

 

http://technet.microsoft.com/en-us/opsmgr/bb498235.aspx

Failed tasks aren't groomed from the Operational Database


This appears to be present up to RC-SP1 version, build 6.0.6246.0

 

In the Task Status console view - I noticed an old failed task from 2 months ago..... however, my task grooming is set to 7 days.

 

To view the grooming process:

http://blogs.technet.com/kevinholman/archive/2007/12/13/how-grooming-and-auto-resolution-work-in-the-opsmgr-2007-operational-database.aspx

Basically – select * from PartitionAndGroomingSettings will show you all grooming going on.

Tasks are kept in the jobstatus table.

Select * from jobstatus will show all tasks.

p_jobstatusgrooming is called to groom this table.

Here is the text of that SP:

--------------------------------

USE [OperationsManager]
GO
/****** Object:  StoredProcedure [dbo].[p_JobStatusGrooming]    Script Date: 02/05/2008 10:49:32 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[p_JobStatusGrooming]
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @Err int
    DECLARE @Ret int
    DECLARE @RowCount int
    DECLARE @SaveTranCount int
    DECLARE @GroomingThresholdLocal datetime
    DECLARE @GroomingThresholdUTC datetime
    DECLARE @TimeGroomingRan datetime
    DECLARE @MaxTimeGroomed datetime

    SET @SaveTranCount = @@TRANCOUNT
    SET @TimeGroomingRan = getutcdate()

    SELECT @GroomingThresholdLocal = dbo.fn_GroomingThreshold(DaysToKeep, getdate())
    FROM dbo.PartitionAndGroomingSettings
    WHERE ObjectName = 'JobStatus'

    EXEC dbo.p_ConvertLocalTimeToUTC @GroomingThresholdLocal, @GroomingThresholdUTC OUT
    IF (@@ERROR <> 0)
    BEGIN
        GOTO Error_Exit
    END

    -- Selecting the max time to be groomed to update the table
    SELECT @MaxTimeGroomed = MAX(LastModified)
    FROM dbo.JobStatus
    WHERE TimeFinished IS NOT NULL
    AND LastModified < @GroomingThresholdUTC

    IF @MaxTimeGroomed IS NULL
        GOTO Success_Exit

    BEGIN TRAN

    -- Change the Statement below to reflect the new item
    -- that needs to be groomed
    DELETE FROM dbo.JobStatus
    WHERE TimeFinished IS NOT NULL
    AND LastModified < @GroomingThresholdUTC

    SET @Err = @@ERROR
    IF (@Err <> 0)
    BEGIN
        GOTO Error_Exit
    END

    UPDATE dbo.PartitionAndGroomingSettings
    SET GroomingRunTime = @TimeGroomingRan,
        DataGroomedMaxTime = @MaxTimeGroomed
    WHERE ObjectName = 'JobStatus'

    SELECT @Err = @@ERROR, @RowCount = @@ROWCOUNT
    IF (@Err <> 0 OR @RowCount <> 1)
    BEGIN
        GOTO Error_Exit
    END

    COMMIT TRAN

Success_Exit:
    RETURN 0

Error_Exit:
    -- If there was an error and there is a transaction
    -- pending, rollback.
    IF (@@TRANCOUNT > @SaveTranCount)
        ROLLBACK TRAN
    RETURN 1
END

------------------------------------

 

 

Here is the problem in the SP:

 

DELETE FROM dbo.JobStatus
WHERE TimeFinished IS NOT NULL
AND LastModified < @GroomingThresholdUTC

 

 

We only delete (groom) tasks that have a timestamp in TimeFinished.  If a failed task doesn’t finish – this field will be NULL and never gets groomed.
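If you want to see which task entries are stuck because of this, a query along these lines (a sketch, using the columns from the sproc above) should show them:

SELECT * FROM dbo.JobStatus
WHERE TimeFinished IS NULL
ORDER BY LastModified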

Print Server management pack fills the Operational DB with TONS of perf data


This is something I have noticed in MOM 2005, and seems to be the same in the conversion MP for OpsMgr 2007.  (Version 6.0.5000.0 of the Microsoft.Windows.Server.Printserver (Converted) MP).  When you import this MP, it will fill the Operational and reporting databases with performance data about print jobs and queues, if you have a large number of print servers/queues in your environment.

If reporting on this perf data is not critical to your environment, you should disable these rules:

(screenshot: the print job and print queue performance collection rules to disable)

Grooming process in the Operations Database


This is a continuation of my other post, on general alert grooming:

How grooming and auto-resolution work in the OpsMgr 2007 Operational database

 

Grooming of the OpsDB is called once per day at 12:00am…. by the rule:  “Partitioning and Grooming” You can search for this rule in the Authoring space of the console, under Rules. It is targeted to the “Root Management Server” and is part of the System Center Internal Library.


 

It calls the “p_PartitioningAndGrooming” stored procedure.  This SP calls two other SP's:  p_Partitioning and then p_Grooming

p_Partitioning inspects the table PartitionAndGroomingSettings, and then calls the SP p_PartitionObject for each object in the PartitionAndGroomingSettings table where "IsPartitioned = 1"   (note - we partition event and perf into 61 daily tables - just like MOM 2005)

(screenshot: the PartitionAndGroomingSettings table)

 

The p_PartitionObject SP first identifies the next partition in the sequence, truncates it to make sure it is empty, and then updates the PartitionTables table in the database, to update the IsCurrent field to the next numeric table for events and perf.  Then it calls the p_PartitionAlterInsertView sproc, to make new data start writing to the current event and perf table.

To review which tables you are writing to - execute the following query:   select * from partitiontables where IsCurrent = '1'

A select * from partitiontables will show you all 61 event and perf tables, and when they were used.  You should see a PartitionStartTime updated every day - around midnight (time is stored in UTC in the database).  If partitioning is failing to run, then we won't see this date changing every day.

 

Ok - that's the first step of the p_PartitioningAndGrooming sproc - Partitioning.  Now - if that is all successful, we will start grooming!

The p_Grooming is called after partitioning is successful.  One of the first things it does - is to update the InternalJobHistory table.  In this table - we keep a record of all partitioning and grooming jobs.  It is a good spot check to see what's going on with grooming.  To have a peek at this table - execute a select * from InternalJobHistory order by InternalJobHistoryId


 

The p_Grooming sproc then calls p_GroomPartitionedObjects  This sproc will first examine the PartitionAndGroomingSettings and compare the days to keep column, against the current date, to figure out how many partitions to groom.  It will then inspect the partitions to ensure they have data, and then truncate the partition, by calling p_PartitionTruncate.  The p_GroomPartitionedObjects sproc will then update the PartitionAndGroomingSettings table with the current time, under the GroomingRunTime column. 

Next - the p_Grooming sproc continues, by calling p_GroomNonPartitionedObjects.  p_GroomNonPartitionedObjects is a short, but complex sproc - in that it calls all the individual sprocs listed in the PartitionAndGroomingSettings table where IsPartitioned = 0.  (see my other post at the link above to follow the logic of one of these non-partitioned sprocs)

Next - the p_Grooming sproc continues, by updating the InternalJobHistory table, to give it a status of success (StatusCode of 1 = success, 2= failed, 0 appears to be never completed?)

 

If you ever have a problem with grooming - or need to get your OpsDB database size under control - simply reduce the data retention days, in the console, under Administration, Settings, Database Grooming.  To start with - I recommend setting all these to just 2 days, from the default of 7.  This keeps your OpsDB under control until you have time to tune all the noise from the MPs you import.  So just reduce this number, then open up Query Analyzer, and execute p_PartitioningAndGrooming.  When it is done, check the job status by executing select * from InternalJobHistory order by InternalJobHistoryId.  The last groom job should be present, and successful.  The OpsDB size should be smaller, with more free space.  And to validate, you can always run my large table query, found at:   Useful Operations Manager 2007 SQL queries
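Here is what that manual run looks like in a query window (a sketch of the steps described above; the DESC sort shows the newest job first):

USE OperationsManager
GO
EXEC p_PartitioningAndGrooming
SELECT * FROM InternalJobHistory ORDER BY InternalJobHistoryId DESC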

What SQL maintenance should I perform on my OpsMgr databases?


This question comes up a lot.  The answer is really - not what maintenance you should be performing... but what maintenance you should be *excluding*.... or when.  Here is why:

Most SQL DBA's will set up some pretty basic default maintenance on all SQL DB's they support.  This often includes, but is not limited to:

DBCC CHECKDB  (to look for DB errors and report on them)

UPDATE STATISTICS  (to boost query performance)

DBCC DBREINDEX  (to rebuild the table indexes to boost performance)

BACKUP

SQL DBA's might schedule these to run via the SQL Agent to execute nightly, weekly, or some combination of the above depending on DB size and requirements.

On the other side of the coin.... in some companies, the MOM/OpsMgr team installs and owns the SQL server.... and they don't do ANY default maintenance to SQL.  Because of this - a focus in OpsMgr was to have the Ops DB and Data Warehouse DB be fully self-maintaining.... providing a good level of SQL performance whether or not any default maintenance is being done.

Operational Database:

Reindexing is already taking place against the OperationsManager database for some of the tables.  This is built into the product.  What we need to ensure - is that any default DBA maintenance tasks are not redundant nor conflicting with our built-in maintenance, and our built-in schedules:

There is a rule in OpsMgr, targeted at the Root Management Server, that executes the "p_OptimizeIndexes" stored procedure every day at 2:30AM.


This rule cannot be changed or modified.  Therefore - we need to ensure there is no other SQL maintenance (including backups) running at 2:30AM, or performance will be impacted.

If you want to view the built-in UPDATE STATISTICS and DBCC DBREINDEX jobs history - just run the following queries:

select *
from DomainTable dt
inner join DomainTableIndexOptimizationHistory dti
on dt.domaintablerowID = dti.domaintableindexrowID
ORDER BY optimizationdurationseconds DESC

select *
from DomainTable dt
inner join DomainTableStatisticsUpdateHistory dti
on dt.domaintablerowID = dti.domaintablerowID
ORDER BY UpdateDurationSeconds DESC

Take note of the update/optimization duration seconds column.  This will show you how long your maintenance is typically running.  In a healthy environment these should not take very long.

 

If you want to view the fragmentation levels of the current tables in the database, run:

DBCC SHOWCONTIG WITH FAST

Here is some sample output:

----------------------------------------------------------------------------------------------

DBCC SHOWCONTIG scanning 'Alert' table...
Table: 'Alert' (1771153355); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 936
- Extent Switches..............................: 427
- Scan Density [Best Count:Actual Count].......: 27.34% [117:428]
- Logical Scan Fragmentation ..................: 60.90%

----------------------------------------------------------------------------------------------

In general - we would like the "Scan density" to be high (Above 80%), and the "Logical Scan Fragmentation" to be low (below 30%).  What you might find... is that *some* of the tables are more fragmented than others, because our built-in maintenance does not reindex all tables.  Especially tables like the raw perf, event, and localizedtext tables.

That said - there is nothing wrong with running a DBA's default maintenance against the Operational database..... reindexing these tables in the database might also help console performance.  We just don't want to run any DBA maintenance during the same time that we run our own internal maintenance, so try not to conflict with this schedule.  Care should also be taken in any default DBA maintenance, that it does not run too long, or impact normal operations of OpsMgr.  Maintenance jobs should be monitored, and should not conflict with the backup schedules either.

Here is a reindex job you can schedule with SQL agent.... for the OpsDB:

USE OperationsManager
go
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
EXEC SP_MSForEachTable "Print 'Reindexing '+'?' DBCC DBREINDEX ('?')"

 

Data Warehouse Database:

The data warehouse DB is also fully self maintaining.  This is called out by a rule "Standard Data Warehouse Data Set maintenance rule" which is targeted to the "Standard Data Set" object type.  This stored procedure is called on the data warehouse every 60 seconds.  It performs many, many tasks, of which Index optimization is but one.


This SP calls the StandardDatasetOptimize stored procedure, which handles any index operations.

To examine the index and statistics history - run the following query for the Alert, Event, Perf, and State tables:

 

select basetablename, optimizationstartdatetime, optimizationdurationseconds,
      beforeavgfragmentationinpercent, afteravgfragmentationinpercent,
      optimizationmethod, onlinerebuildlastperformeddatetime
from StandardDatasetOptimizationHistory sdoh
inner join StandardDatasetAggregationStorageIndex sdasi
on sdoh.StandardDatasetAggregationStorageIndexRowId = sdasi.StandardDatasetAggregationStorageIndexRowId
inner join StandardDatasetAggregationStorage sdas
on sdasi.StandardDatasetAggregationStorageRowId = sdas.StandardDatasetAggregationStorageRowId
ORDER BY optimizationdurationseconds DESC

 

Then examine the default domain tables optimization history.... run the same two queries as listed above for the OperationsDB.

In the data warehouse - we can see that all the necessary tables are being updated and reindexed as needed.  When a table is 10% fragmented - we reorganize.  When it is 30% or more, we rebuild the index.
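If you want to spot-check the current fragmentation in the warehouse yourself, here is a hedged sketch using the SQL 2005 DMV (this assumes the default warehouse database name, OperationsManagerDW):

USE OperationsManagerDW
GO
SELECT OBJECT_NAME(ips.object_id) AS TableName, ips.index_id, ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC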

Therefore - there is no need for a DBA to execute any UPDATE STATISTICS or DBCC DBREINDEX maintenance against this database.  Furthermore, since we run our maintenance every 60 seconds, and only execute maintenance when necessary, there is no "set window" where we will run our maintenance jobs.  This means that if a DBA team also sets up a UPDATE STATISTICS or DBCC DBREINDEX job - it can conflict with our jobs and execute concurrently.  This should not be performed. 

 

For the above reasons, I would recommend against any maintenance jobs on the Data Warehouse DB, beyond a CHECKDB (only if DBA's mandate it) and a good backup schedule. 

 

For the OpsDB: any standard maintenance is fine, as long as it does not conflict with the built-in maintenance, or impact production by taking too long, or having an impact on I/O.

 

Lastly - I'd like to discuss the recovery model of the SQL database.  We default to "simple" for all our DB's.  This should be left alone.... unless you have *very* specific reasons to change this.  Some SQL teams automatically assume all databases should be set to "full" recovery model.  This requires that they back up the transaction logs on a very regular basis, but give the added advantage of restoring up to the time of the last t-log backup.  For OpsMgr, this is of very little value, as the data changing on an hourly basis is of little value compared to the complexity added by moving from simple to full.  Also, changing to full will mean that your transaction logs will only checkpoint once a t-log backup is performed.  What I have seen, is that many companies aren't prepared for the amount of data written to these databases.... and their standard transaction log backups (often hourly) are not frequent enough to keep them from filling.  The only valid reason to change to FULL, in my opinion, is when you are using an advanced replication strategy, like log shipping, which requires full recovery model.  When in doubt - keep it simple.  :-)
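To check which recovery model your databases are currently using (a quick sketch; the database names assume the defaults):

SELECT DATABASEPROPERTYEX('OperationsManager', 'Recovery') AS OpsDBRecoveryModel
SELECT DATABASEPROPERTYEX('OperationsManagerDW', 'Recovery') AS DataWarehouseRecoveryModel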

 

 

P.S....  The Operations Database needs 50% free space at all times.  This is for growth, and for re-index operations to be successful.  This is a general supportability recommendation, but the OpsDB will alert when this falls below 40%. 

For the Data warehouse.... we do not require the same 50% free space.  That would be a tremendous requirement if we had a multiple-terabyte database!

Think of the data warehouse as having 2 stages... a "growth" stage (while it is adding data and not yet grooming much, because it hasn't hit the default 400 days retention) and a "maturity" stage where agent count is steady, MP's are not changing, and grooming is happening because we are at 400 days retention.  During "growth" we need to watch and maintain free space, and monitor for available disk space.  In "maturity" we only need enough free space to handle our index operations.  When you start talking 1 terabyte of data.... that means 500GB of free space, which is expensive.  If you cannot allocate it.... then just allow auto-grow and monitor the database.... but always plan for it from a volume size perspective.
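A quick way to spot-check database size and free space is sp_spaceused, which reports the database size and unallocated space for the database you are connected to:

USE OperationsManagerDW
GO
EXEC sp_spaceused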

For transaction log sizing - we don't have any hard rules.  A good rule of thumb for the OpsDB is ~20% to 50% of the database size.... this all depends on your environment.  For the Data warehouse, it depends on how large the warehouse is - but you will probably find steady state to require somewhere around 10% to 20% of the warehouse size.  Any time we are doing additional grooming after an alert/event/perf storm.... or changing grooming from 400 days to 300 days - this will require a LOT more transaction log space - so keep that in mind as your databases grow.

Event ID 2115 A Bind Data Source in Management Group


I see this event a lot in customer environments.  I am not an expert on troubleshooting this here... but saw this post in the MS newsgroups and felt it was worth capturing....

My experience has been that it is MUCH more common to see these when there is a management pack that collects way too much discovery data.... than any real performance problem with the data warehouse.  In most cases.... if the issue just started after bringing in a new MP.... deleting that MP solves the problem.  I have seen this repeatedly after importing the Cluster MP, Or Exchange 2007 MP.... but haven't been able to fully investigate the root cause yet:

 

In a nutshell.... if they are happening just a couple times an hour.... and the time in seconds is fairly low (under a few minutes) then this is normal. 

If they are happening very frequently - like every minute, and the times are increasing - then there is an issue that needs to be resolved.

 

Taken from the newsgroups:

-------------------------------------------

In OpsMgr 2007 one of the performance concerns is DB/DW data insertion performance. Here is a description of how to identify and troubleshoot problems with DB/DW data insertion.

Symptoms:

DB/DW write action workflows run on a Management Server, they first keep data received from Agent / Gateway in an internal buffer, then they create a batch of data from the buffer and insert the data batch to DB / DW, when the insertion of the first batch finished, they will create another batch and insert it to DB / DW. The size of the batch depends on how much data is available in the buffer when the batch is created, but there is a maximum limit on the size of the batch, a batch can contain up to 5000 data items.  If data item incoming (from Agent / Gateway) throughput becomes larger, or the data item insertion (to DB/DW) throughput becomes smaller, then the buffer will tend to accumulate more data and the batch size will tend to become larger.  There are different write action workflows running on a MS, they handle data insertion to DB / DW for different type of data:

  • Microsoft.SystemCenter.DataWarehouse.CollectEntityHealthStateChange
  • Microsoft.SystemCenter.DataWarehouse.CollectPerformanceData
  • Microsoft.SystemCenter.DataWarehouse.CollectEventData
  • Microsoft.SystemCenter.CollectAlerts
  • Microsoft.SystemCenter.CollectEntityState
  • Microsoft.SystemCenter.CollectPublishedEntityState
  • Microsoft.SystemCenter.CollectDiscoveryData
  • Microsoft.SystemCenter.CollectSignatureData
  • Microsoft.SystemCenter.CollectEventData

When a DB/DW write action workflow on Management Server notices that the insertion of a single data batch is slow (ie. slower than 1 minute), it will start to log a 2115 NT event to OpsMgr NT event log once every minute until the batch is inserted to DB / DW or is dropped by DB / DW write action module.  So you will see 2115 events in management server's "Operations Manager" NT event log when it is slow to insert data to DB /DW.  You might also see 2115 events when there is a burst of data items coming to Management server and the number of data items in a batch is large.  (This can happen during a large amount of discovery data being inserted - from a freshly imported or noisy management pack.)

2115 events have 2 important pieces of information: the name of the workflow that has the insertion problem, and the pending time since the workflow started inserting the last data batch.  Here is an example of a 2115 event:

------------------------------------

A Bind Data Source in Management Group OpsMgr07PREMT01 has posted items to the workflow, but has not received a response in 3600 seconds.  This indicates a performance or functional problem with the workflow.

Workflow Id : Microsoft.SystemCenter.CollectSignatureData

Instance    : MOMPREMSMT02.redmond.corp.microsoft.com

Instance Id : {6D52A6BB-9535-9136-0EF2-128511F264C4}

------------------------------------------

This 2115 event is saying DB write action workflow "Microsoft.SystemCenter.CollectSignatureData" (which writes performance signature data to DB) is trying to insert a batch of signature data to DB and it started inserting 3600 seconds ago but the insertion has not finished yet. Normally inserting a batch should finish within 1 minute.

Normally, there should not be many 2115 events happening on a Management server. If they happen less than 1 or 2 times every hour (per write action workflow), then it is not a big concern, but if they happen more than that, there is a DB /DW insertion problem.

The following performance counters on Management Server give information about DB / DW write action insertion batch size and insertion time. If batch size is becoming larger (by default maximum batch size is 5000), it means management server is either slow in inserting data to DB/DW or is getting a burst of data items from Agent/Gateway. From the DB / DW write action's Avg. Processing Time, you will see how much time it takes to write a batch of data to DB / DW.

  • OpsMgr DB Write Action Modules(*)\Avg. Batch Size
  • OpsMgr DB Write Action Modules(*)\Avg. Processing Time
  • OpsMgr DW Writer Module(*)\Avg. Batch Processing Time, ms
  • OpsMgr DW Writer Module(*)\Avg. Batch Size

Possible root causes:

  • In OpsMgr, discovery data insertion is relatively expensive, so a discovery burst (a discovery burst is referring to a short period of time when a lot of discovery data is received by management server) could cause 2115 events (complaining about slow insertion of discovery data), since discovery insertion should not happen frequently.  So if you see consistent 2115 events for discovery data collection, that means you either have a DB /DW insertion problem or some discovery rules in an MP are collecting too much discovery data.
  • OpsMgr Config update caused by instance space change or MP import will impact the CPU utilization on DB and will have impact on DB data insertion.  After importing a new MP or after a big instance space change in a large environment,  you will probably see more than normal 2115 events.
  • Expensive UI queries can impact the resource utilization on DB and could have impact on DB data insertion. When user is doing expensive UI operation, you will probably see more than normal 2115 events.
  • When DB / DW is out of space / offline you will find Management server keeps logging 2115 events to NT event log and the pending time is becoming higher and higher.
  • Sometimes invalid data item sent from agent /Gateway will cause DB / DW insertion error which will end up with 2115 event complaining about DB /DW slow insertion. In this case please check the OpsMgr event log for relevant error events. It's more common in DW write action workflows.
  • If DB / DW hardware is not configured properly, there could be a performance issue, and it could cause slow data insertion to DB / DW. The problem could be: 
    • The network link between DB / DW to MS is slow (either bandwidth is small / latency is large, as a best practice we recommend MS to be in the same LAN as DB/DW).
    • The data / log / tempdb disk used by DB / DW is slow, (we recommend separating data, log and tempdb to different disks, we recommend using RAID 10 instead of using RAID 5, we also recommend turning on write cache of the array controllers). 
    • The OpsDB tables are too fragmented (this is a common cause of DB performance issues).  Reindex affected tables will solve this issue.
    • The DB / DW does not have enough memory.

 

Now - that is the GENERAL synopsis and how to attack them.  Next - we will cover a specific issue we are seeing with a specific type of 2115 Event:

-----------------------------------------------

It appears we may be hitting a cache resolution error we were trying to catch for a while. This is about the CollectEventData workflow.  The error is very hard to catch and we're including a fix in SP2 to avoid it.  There are two ways to resolve the problem in the meantime.  Since the error happens very rarely, you can just restart the Health Service on the Management Server that is affected.  Or you can prevent it from blocking the workflow by creating overrides in the following way:

-----------------------------------------------


1) Launch Console, switch to Authoring space and click "Rules"
2) In the right top hand side of the screen click "Change Scope"
3) Select "Data Warehouse Connection Server" in the list of types,. click "Ok"
4) Find "Event data collector" rule in the list of rules;
5) Right click "Event data collector" rule, select Overrides/Override the Rule/For all objects of type...
6) Set Max Execution Attempt Count to 10
7) Set Execution Attempt Timeout Interval Seconds to 6

That way if the DW event writer fails to process an event batch for ~ a minute, it will discard the batch.  2115 events related to Datawarehouse.CollectEventData should go away after you apply these overrides.  BTW, while you're at it you may want to override "Max Batches To Process Before Maintenance Count" to 50 if you have a relatively large environment.  We think 50 is a better default setting than SP1's 20 in this case and we'll switch the default to 50 in SP2.

-------------------------------------------------

 

Essentially - to know if you are affected by the specific 2115 issue described above - here are the criteria:

 

1.  You are seeing 2115 bind events in the OpsMgr event log of the RMS or MS, and they are recurring every minute.

2.  The events have a Workflow ID of:  Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData

3.  The "has not received a response" time is increasing, and growing to be a very large number over time.

 

Here is an example of a MS with the problem:  Note consecutive events, from the CollectEventData workflow, occurring every minute, with the time being a large number and increasing:

 

Event Type:      Warning
Event Source:   HealthService
Event Category:            None
Event ID:          2115
Date:                5/5/2008
Time:                2:37:06 PM
User:                N/A
Computer:         MS1
Description:
A Bind Data Source in Management Group MG1 has posted items to the workflow, but has not received a response in 706594 seconds.  This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance    : MS1.domain.com
Instance Id : {646486D0-E366-03CA-38E7-79A0D6F34F82}

 

Event Type:      Warning
Event Source:   HealthService
Event Category:            None
Event ID:          2115
Date:                5/5/2008
Time:                2:36:05 PM
User:                N/A
Computer:         MS1
Description:
A Bind Data Source in Management Group MG1 has posted items to the workflow, but has not received a response in 706533 seconds.  This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance    : MS1.domain.com
Instance Id : {646486D0-E366-03CA-38E7-79A0D6F34F82}

 

Event Type:      Warning
Event Source:   HealthService
Event Category:            None
Event ID:          2115
Date:                5/5/2008
Time:                2:35:03 PM
User:                N/A
Computer:         MS1
Description:
A Bind Data Source in Management Group MG1 has posted items to the workflow, but has not received a response in 706471 seconds.  This indicates a performance or functional problem with the workflow.
Workflow Id : Microsoft.SystemCenter.DataWarehouse.CollectEventData
Instance    : MS1.domain.com
Instance Id : {646486D0-E366-03CA-38E7-79A0D6F34F82}


DBcreatewizard or just run good old SetupOM.exe - which should I use to install the Database component of OpsMgr?


There has always been a bit of confusion on when to run the DBCreateWizard.exe tool, or when to just use SetupOM.exe to create the Operational DB or Data Warehouse DB.

Historically.... in MOM 2005, we used the DBcreate Wizard in order to create the Onepoint database on Active/Active clusters..... or when SQL DBA teams refused to run a MSI based setup on one of their SQL servers.  The DB create wizard was a better option for them.... since it did not have to install any binaries on a SQL server.  In practice.... it was pretty rare to see this in widespread use.

 

In OpsMgr 2007, we haven't really documented all the scenarios for when you should run the DBcreate Wizard.... and I will try and do that here. 

 

The DB create wizard is located on the CD - In the \SupportTools folder.  It does require some additional files to run it - these don't have to be "installed", just need to be copied over to the SQL DB server where you will run the wizard.  Follow:  http://support.microsoft.com/kb/938997/en-us

***  Note - the additional files required to run DBCreateWizard.exe are documented in the KB article above.  They were also provided on the SP1 Select CD.  However - the files provided on CD are for 32bit x86 only.  If you are using the DBCreateWizard on a x64 platform - you MUST copy these files listed in the KB article from an x64 server.... any x64 server with the console installed will have them.

Note - there were some significant issues with the RTM version of this tool... in detecting the correct SQL instance on a multi-instance cluster, and leaving some table information blank (http://support.microsoft.com/kb/942865/en-us).  When deploying SP1 - Use the SP1 version of this tool.  If you MUST deploy the RTM version - I would recommend using SetupOM.exe for all installs.

 

Ok.... first, you will notice in the OpsMgr Deployment guide, they instruct to use the DBcreateWizard when installing the database on an Active/Passive cluster.  That's pretty much our first introduction to this tool.  While this isn't required (you can simply run SetupOM.exe on the Active node) it is recommended to use DBCreateWizard.  Essentially, our recommendation is that anytime you have a dedicated SQL server for the OpsDB role... with no other OpsMgr role present, then you should use the DBcreateWizard to create the Operational database.  The reason for this, from an internal discussion I have been involved in.... is because using SetupOM.exe will create some additional registry entries on the database server... and will change how updates are applied to the server from an OpsMgr perspective.  Another scenario to leverage this tool, is anytime your SQL DBA teams refuse to allow you to run a MSI based setup on their SQL servers/clusters.

 

Below, I will just walk through some of the scenarios where using this stand-alone tool really makes good sense.

 

Scenarios:

1.  All in one role/shared roles.  This is where a single server hosts SQL Server 2005 and the Operational Database role, along with the RMS role.  In this case.... you might as well just run SetupOM.exe and create the database while installing the management group.  You potentially could run the DBcreatewizard first.... but this would be an additional step and provides no value.

2.  Split roles:  Dedicated SQL server (Server A) and dedicated RMS (Server B).   In this scenario - we recommend using DBcreatewizard.exe instead of just running SetupOM.exe on the SQL server.   However - you certainly can do either one.... both are fully supported.

3.  Split roles - clustered DB:  Dedicated cluster for SQL (can be A/P or A/A or multi-instance or multi node.... doesn't matter)  In this scenario - we recommend using DBcreatewizard.exe instead of just running SetupOM.exe on the SQL server.  That said.... you can run SetupOM.exe on any node that owns the SQL instance you are creating the DB in.... we just favor using DBcreateWizard.

4.  Draconian DBA's.  In general.... DBA's are used to creating an empty database for an application, then granting permissions to the DB only.... then washing their hands of it.  They don't like running setup's... or even running tools on their SQL servers....  If they must have an application create a database as part of that application install - they MUCH prefer that all the DB creation be handled remotely.  Unfortunately.... MOM 2005 and OpsMgr 2007 do not support what DBA's would most like to see.  We must run our setup or tool on the database server/node in order to install that component.  I suppose we could install the OpsDB using the DBcreatewizard in a test lab SQL box, then detach it.... then hand the files to a SQL team and have them drop it into a production environment to make them happier.... but I haven't really done much testing there.  Anyway.... the DBcreateWizard is the best option when working with a rigid DBA team.  Just follow the KB article listed above... and have the SQL team run the tool to create the database.... then they can delete the tool from the server.  We will still require SA priv over the instance to complete the RMS setup.... but once that is done, they can remove these advanced rights, per my previous post:  http://blogs.technet.com/kevinholman/archive/2008/04/15/opsmgr-security-account-rights-mapping-what-accounts-need-what-privileges.aspx

5.  Multiple Operational Databases in the same SQL instance.  It is possible, if you have multiple management groups, that you could place all the Operational DB's into a single SQL instance.  Now - these had better be small environments (test/dev) or a beefy SQL server to handle all that I/O.... but just for grins.... lets say you are doing it.  If you tried to run SetupOM.exe and install the database component multiple times.... it would detect it was already installed and ask you if you wish to repair or remove OpsMgr.  No good.  In comes the DBcreateWizard.  This tool is the supported method for creating multiple OpsDB's in a single SQL instance.

Agent Pending Actions can get out of synch between the Console, and the database


When you look at your agent pending actions in the Administration pane of the console.... you will see pending actions for things like approving a manual agent install, agent installation in progress, approving agent updates, like from a hotfix, etc.

 

This pending action information is also contained in the SQL table in the OpsDB - agentpendingaction

 

It is possible for the agentpendingaction table to get out of synch with the console, for instance, if the server was in the middle of updating/installing an agent - and the management server Healthservice process crashed or was killed.

 

In this case, you might have a lingering pending action, that blocks you from doing something in the future.  For instance - if you had a pending action to install an agent, that did not show up in the pending actions view of the console.  What might happen, is that when you attempt to discover and push the agent to this same server, you get an error message:

 

"One or more computers you are trying to manage are already in the process of being managed.  Please resolve these issues via the Pending Management view in Administration, prior to attempting to manage them again"


 

The problem is - they don't show up in this view!

 

To view the database information on pending actions:

select * from agentpendingaction

You should be able to find your pending action there - that does not show up in the Pending Action view in the console, if you are affected by this.

 

To resolve - we should first try and reject these "ghost" pending actions via the SDK... using powershell.  Open a command shell, and run the following:

get-agentpendingaction

To see a prettier view:

get-agentpendingaction | ft agentname,agentpendingactiontype

To see a specific pending action for a specific agent:

get-agentPendingAction | where {$_.AgentName -eq "servername.domain.com"}

To reject the specific pending action:

get-agentPendingAction | where {$_.AgentName -eq "servername.domain.com"}|Reject-agentPendingAction

We can use the last line - to reject the specific pending action we are interested in.

 

You might get an exception running this:

Reject-AgentPendingAction : Microsoft.EnterpriseManagement.Common.UnknownServiceException: The service threw an unknown exception. See inner exception for details. ---> System.ServiceModel.FaultException`1[System.ServiceModel.ExceptionDetail]: Exception of type 'Microsoft.EnterpriseManagement.Common.DataItemDoesNotExistException' was thrown.

If this fails, such as gives an exception, or if our problem pending action doesn't even show up in Powershell.... we have to drop down to the SQL database level.  This is a LAST resort and NOT SUPPORTED.... run at your own risk.

There is a stored procedure to delete pending actions.... here is an example, to run in a SQL query window:

exec p_AgentPendingActionDeleteByAgentName 'agentname.domain.com'

Change 'agentname.domain.com' to the agent name that is showing up in the SQL table, but not in the console view.
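After running the stored procedure, you can re-run a query against the table to confirm the row is gone.  This is just a sketch - it assumes the agentpendingaction table exposes the agent name in an AgentName column, so check the column names from the earlier select * output first:

-- Sketch: confirm the ghost pending action is gone (the AgentName column name is an assumption - verify it first)
select * from agentpendingaction
where AgentName = 'agentname.domain.com'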

Does your OpsDB keep growing? Is your localizedtext table using all the space?


This post is about an issue in OpsMgr SP1 AND R2 – where the localizedtext table in the database may fill and consume large amounts of space.

OpsMgr 2007 no longer has a hard database limit of 30GB like MOM 2005 did.  For this reason, most OpsMgr administrators don't watch this very closely anymore, or freak out when it gets big.

However - it must be noted... console and operational performance are still impacted when this DB gets big.  You really should keep an eye on it and try to keep it as small as possible.  In general, I recommend keeping only 2 days of operational data (the Database Grooming global setting), down from the default of 7 days, until everything is tuned.

One thing I have noticed at several locations is that there are a couple of tables that often grow quite large, depending on the agent count and which management packs are installed.  These are LocalizedText and PublisherMessages.  This is caused by management packs that create a large number of events from script.  I have seen this mostly in environments that have funky converted MOM 2005 MPs that run a lot of backward-compatibility scripts, or in large Exchange 2007 and SCCM deployments.  Like I said - this won't affect all customers... just those with specific management packs that expose this.  What happens is that each event writes additional data to these tables, and they are not groomed or pruned.... so they keep growing.  Over time, the impact is that your DB might keep filling and run out of disk space, or your performance might be impacted when you use a view that queries LocalizedText.

 

* Am I impacted by this issue? *

 

To know if you are impacted - I would run the following query against your OpsDB:

Simple query to display large tables, to determine what is taking up space in the database:

SELECT so.name,
8 * Sum(CASE WHEN si.indid IN (0, 1) THEN si.reserved END) AS data_kb,
Coalesce(8 * Sum(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) AS index_kb,
Coalesce(8 * Sum(CASE WHEN si.indid IN (255) THEN si.reserved END), 0) AS blob_kb
FROM dbo.sysobjects AS so JOIN dbo.sysindexes AS si ON (si.id = so.id)
WHERE 'U' = so.type GROUP BY so.name  ORDER BY data_kb DESC

Normally, in most typical environments with typical MP's, we'd expect perf data to be the largest tables, followed by event, state, and alert.  If localizedtext is your largest table, this is impacting you.  You can run the following query:

select count(*) from localizedtext

Generally, if this table is your largest in the database and has over a million rows, you are impacted.  The impact is low, however - mostly just hogging space in the DB, and possibly impacting console performance.
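Since PublisherMessages grows alongside LocalizedText in this scenario, it is worth checking its row count at the same time:

select count(*) from publishermessages with(nolock)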

 

* OK – I am impacted.  What do I do? *

 

You need to run the attached SQL statements to clean this up.  You might need to run these on a regular basis (once a week to once a month) if it grows back quickly.  To run these – you open SQL Server Management Studio, connect to the SQL instance that hosts the OperationsManager DB, and run a “New Query”.  Then paste the text from one of the scripts attached into the query window, and run it.

When you upgrade to R2 – most of this is resolved…. we no longer fill this table, however, you WILL need to run the cleanup at least once to get rid of all the old junk leftover from SP1 days.

I am attaching TWO scripts below, which clean up these tables.  That being said - these scripts are NOT supported by Microsoft, as they have not been thoroughly tested.  They are being provided "AS IS" with no warranties, and confer no rights. Use of the included script samples is subject to the terms specified in the Terms of Use

***UPDATED for R2:

There are now TWO scripts.

If you are on SP1 – you run both on a regular basis.

If you are on R2 – you only need to run the LocalizedTextCleanupforSP1.txt script ONCE, and then run the LocalizedTextCleanupSP1andR2.txt script on a regular basis.

This core issue was fixed in R2, however – since R2 released we found another type of data that gets left in the LocalizedText table, so this second script was developed.

 

*** Critical Note:

These scripts will require a LARGE amount of TempDB (mostly TempDBLog) space - make sure your TempDB is on a volume with lots of space to grow... if not - add an additional TempDB file on another volume just in case.  Make sure you take a good SQL backup of your OpsDB FIRST.  The script in general, takes about 20 minutes per million rows in the LocalizedText table, depending on the hardware capabilities of the SQL server.  I have seen it take 10 minutes per million rows on a fast server. 

Now – when I say LOTS of space for your tempDB – I mean it.  LOTS.  I believe it is the tempDBlog that needs most of the space.  Just make sure you have at least as much tempDB space as the size of your LocalizedText table.  That means if your LT table is 40 million rows (~40GB) then I would plan to have at LEAST 40GB of free space for your TempDB/TempDBLog to grow.  Changing the default autogrow on these to a larger value, and growing them out in advance will help speed up the process as well.

 

When the script is done, you won't see the freed space immediately.  You need to run - DBCC DBREINDEX ('localizedtext') - to reindex the table and show the newly freed space.  It would likely be a good idea to reindex the entire database at this point, which you can do by running the following:

Reindex the database:

USE OperationsManager
go
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET ARITHABORT ON
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET NUMERIC_ROUNDABORT OFF
EXEC SP_MSForEachTable "Print 'Reindexing '+'?' DBCC DBREINDEX ('?')"

If you first want to troubleshoot, and try to determine what is consuming your tables... or which MPs are generating the most noise in this table.... you can run the following (they might take a LONG time to complete, depending on how big your tables are):

Most common events:

select messageid, ltvalue, count(*) as Count from publishermessages with(nolock)
inner join localizedtext with(nolock)
on messagestringId = localizedtext.ltstringid
group by messageid, ltvalue
order by Count DESC

LT insertions per day/month:

SELECT
DATEPART(mm,timeadded) AS 'MONTH',
DATEPART(dd,timeadded) AS 'DAY',
count(*)
from localizedtext with(nolock)
group by
DATEPART(mm,timeadded),
DATEPART(dd,timeadded)
order by
DATEPART(mm,timeadded),
DATEPART(dd,timeadded)

Boosting OpsMgr performance - by reducing the OpsDB data retention


Here is a little tip I often advise my customers on.....

The default data retention in OpsMgr is 7 days for most data types:

 

image

 

These are default settings which work well for a large cross section of different agent counts.  In MOM 2005 - we defaulted to 4 days.  Many customers, especially with large agent counts, would have to reduce that in MOM 2005 down to 2 days to keep a manageable Onepoint DB size.

 

That being said - to boost UI performance, and reduce OpsDB database size - consider reducing these values down to your real business requirements.  For a new, out of the box management group - I advise my customers to set these to 2 days.  This will keep less noise in your database as you deploy and tune agents and management packs.  This keeps a smaller DB, and a more responsive UI, in large agent count environments.

Essentially - set each value to "2" except for Performance Signature, which we will change to 1.  Performance Signature is unique.... the setting here isn't actually "days" of retention, it is "business cycles".  This data is used ONLY for calculating business-cycle-based self-tuning thresholds.  There is NO REASON for this ever to be larger than the default of "2" business cycles.... and large agent count environments can see a performance benefit by bumping this down to only keeping "1" business cycle.
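If you want to verify what the OpsDB is actually configured to keep, you can also check the grooming settings directly with a query.  This is just a sketch - it assumes the standard OpsMgr 2007 PartitionAndGroomingSettings table in the OperationsManager database:

-- Sketch: view the current OpsDB retention/grooming settings (assumes the standard PartitionAndGroomingSettings table)
select * from PartitionAndGroomingSettings with(nolock)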

 

image

 

Then - once your Management group is fully deployed, and you have tuned your alert, performance, event, and state data.... IF you have a business requirement to keep this data for longer - bump it up.

Keep in mind - this will NOT cause you to groom out Alerts that are open - only closed alerts, and still will keep your closed alerts around for a couple days.

These settings have no impact on the data that is being written to the data warehouse - so any alert, event, or perf data needed will always be there.

Populating groups from a SQL server CMDB – step by step


Boris wrote a cool article HERE on how to populate a group of computers in OpsMgr, from an external source…. such as Active Directory.  In his published example – you run an LDAP query against AD, to return a recordset list of computers, in order to populate them into a group. 

This post will extend that, by showing how to do the same thing – but using a SQL database as the CMDB source for populating groups, instead of AD.  I had a customer who wanted to do this – to dynamically create groups for the purpose of scoping console views and notifications, to the teams that “owned” the different servers.  The CMDB contained this data, but it changes often, so manually controlling this proved to be a pain.

Here is a very simple example of the CMDB, which contains the ServerName, and the team that owns the server, in the “ServerList” table:

image

As you can see… I can easily write a SQL query to show ONLY servers owned by TEAM 1:

image

Let’s use this data source… to populate three groups.  Team 1 Group, Team 2 Group, and Team 3 group.
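If you want to mock this up in a lab, here is a rough sketch of what that CMDB table and the per-team query could look like.  The ServerName and MonitorGroup column names match what the discovery script below expects - the server names themselves are just made-up examples, and must be the FQDNs (principal names) of your agents:

-- Example CMDB table and sample data (hypothetical names - use your agents' FQDNs)
CREATE TABLE ServerList (ServerName varchar(255), MonitorGroup varchar(50))
INSERT INTO ServerList VALUES ('server1.domain.com','Team 1')
INSERT INTO ServerList VALUES ('server2.domain.com','Team 2')
INSERT INTO ServerList VALUES ('server3.domain.com','Team 3')
-- Return only the servers owned by Team 1:
SELECT ServerName FROM ServerList WHERE MonitorGroup = 'Team 1'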

First – I will post my finished XML example at the bottom – go grab that and open it – it will help you follow along with the XML requirements.

In the XML… we basically need 4 components:

1.  We need to define the name of the MP, and add references to other MP’s we will depend on. (<Manifest> Section)

2.  We need to define the groups themselves, and then define the relationships (stating that they will contain Windows Computer Objects) (<TypeDefinitions> section)

3.  We need to run a discovery to populate the groups… this will be a script based discovery that runs only on the RMS, queries the CMDB, matches on the servername FQDN, and populates the group with a windows computer object. (<Monitoring> section)

4.  We need to modify the display strings in the XML MP – in order to show friendly display names for each of the above, in the UI. (<LanguagePacks> Section)

You can simply take the XML posted below, and just modify each section with your custom group names… or add new groups by adding a new class, relationship, discovery/script, and presentation section to each.

Here we go:

 

Section 1:  <Manifest>

Simply modify the <ID>, <Version>, and <Name> sections based on your custom MP naming standard.

<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>SQLBasedGroupDemo</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>SQLBasedGroupDemo</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.0.6278.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.0.6278.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.0.6278.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.0.6278.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>

 

 

Section 2:  <TypeDefinitions>

The example below defines 3 groups, and the relationships for those groups (contains windows computer objects).  Simply replace the bolded red “Team1Group” example with a short name for your custom groups.  (We will define the UI friendly name later, in the <LanguagePacks> section.)

<TypeDefinitions>
  <EntityTypes>
    <ClassTypes>
      <ClassType ID="GroupPopulation.Team1Group" Accessibility="Internal" Abstract="false" Base="System!System.Group" Hosted="false" Singleton="true" />
      <ClassType ID="GroupPopulation.Team2Group" Accessibility="Internal" Abstract="false" Base="System!System.Group" Hosted="false" Singleton="true" />
      <ClassType ID="GroupPopulation.Team3Group" Accessibility="Internal" Abstract="false" Base="System!System.Group" Hosted="false" Singleton="true" />
    </ClassTypes>
    <RelationshipTypes>
      <RelationshipType ID="GroupPopulation.Team1GroupContainsWindowsComputers" Accessibility="Internal" Abstract="false" Base="System!System.Containment">
        <Source>GroupPopulation.Team1Group</Source>
        <Target>Windows!Microsoft.Windows.Computer</Target>
      </RelationshipType>
      <RelationshipType ID="GroupPopulation.Team2GroupContainsWindowsComputers" Accessibility="Internal" Abstract="false" Base="System!System.Containment">
        <Source>GroupPopulation.Team2Group</Source>
        <Target>Windows!Microsoft.Windows.Computer</Target>
      </RelationshipType>
      <RelationshipType ID="GroupPopulation.Team3GroupContainsWindowsComputers" Accessibility="Internal" Abstract="false" Base="System!System.Containment">
        <Source>GroupPopulation.Team3Group</Source>
        <Target>Windows!Microsoft.Windows.Computer</Target>
      </RelationshipType>
    </RelationshipTypes>
  </EntityTypes>
</TypeDefinitions>

 

 

Section 3:  <Monitoring>

In this section – we define the discovery, and add the script to run.  In this example – I am running a VBscript with a SQL query to the CMDB.  You will need to modify the group name – just like you did above.  This is where we create a discovery to populate each group – so we will need one of these sections for each group in the MP.  Each of these sections will run a distinct script, with a different query, depending on which computers you want populated.

I bolded in RED the group name sections you will need to modify, just like we did above… and you will need to modify the SQL DB information, and the script name.

I set the discovery time to every 3600 seconds in this example…. you should probably set this to once or twice a day max…. no need to keep re-running it for groups that won't change that often.

  <Monitoring>
    <Discoveries>
      <Discovery ID="Team1Group.Discovery" Enabled="true" Target="SC!Microsoft.SystemCenter.RootManagementServer" ConfirmDelivery="true" Remotable="true" Priority="Normal">
        <Category>Discovery</Category>
        <DiscoveryTypes>
          <DiscoveryClass TypeID="GroupPopulation.Team1Group" />
        </DiscoveryTypes>
        <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.TimedScript.DiscoveryProvider">
          <IntervalSeconds>3600</IntervalSeconds>
          <SyncTime />
          <ScriptName>Team1GroupDiscovery.vbs</ScriptName>
          <Arguments>$MPElement$ $Target/Id$</Arguments>
          <ScriptBody><![CDATA[Dim SourceId
Dim objConnection
Dim oRS
Dim sConnectString
Dim ManagedEntityID
Dim oAPI
Dim oDiscoveryData
SourceId                = WScript.Arguments(0)
ManagedEntityId         = WScript.Arguments(1)
Set oAPI                = CreateObject("MOM.ScriptAPI")
Set oDiscoveryData      = oAPI.CreateDiscoveryData(0,SourceId,ManagedEntityId)
' Connect to the CMDB and query the list of servers owned by this team
sConnectString = "Driver={SQL Server}; Server=OMDW; Database=CMDB;"
Set objConnection = CreateObject("ADODB.Connection")
objConnection.Open sConnectString
Set oRS = CreateObject("ADODB.Recordset")
oRS.Open "select ServerName from ServerList where MonitorGroup = 'Team 1'", objConnection
' Create the (singleton) group instance we are populating
Set groupInstance = oDiscoveryData.CreateClassInstance("$MPElement[Name='GroupPopulation.Team1Group']$")
While Not oRS.EOF
' For each row returned, reference the Windows Computer by its principal name (FQDN)
Set serverInstance = oDiscoveryData.CreateClassInstance("$MPElement[Name='Windows!Microsoft.Windows.Computer']$")
serverInstance.AddProperty "$MPElement[Name='Windows!Microsoft.Windows.Computer']/PrincipalName$",oRS.Fields("ServerName")
' Add a containment relationship between the group and the computer
Set relationshipInstance = oDiscoveryData.CreateRelationshipInstance("$MPElement[Name='GroupPopulation.Team1GroupContainsWindowsComputers']$")
relationshipInstance.Source = groupInstance
relationshipInstance.Target = serverInstance
oDiscoveryData.AddInstance relationshipInstance
oRS.MoveNext
Wend
objConnection.Close
Call oAPI.Return(oDiscoveryData)
                     ]]></ScriptBody>
          <TimeoutSeconds>120</TimeoutSeconds>
        </DataSource>
      </Discovery>

 

 

Section 4:  <LanguagePacks>

Here – we will take ALL of the modifications we made in the prior three steps, and match them up with friendly names to show in the UI:

Essentially – just modify the MP name, group name, relationship name, and discovery name, to match what you did above…. and assign each a friendly name that you want to see in the UI.  I will bold in red the sections to modify, for the “Team 1 servers”.  You would continue this for as many groups as you used:

<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="SQLBasedGroupDemo">
        <Name>SQL Based Group Population Demo MP</Name>
      </DisplayString>

      <DisplayString ElementID="GroupPopulation.Team1Group">
        <Name>Team 1 Servers Group</Name>
      </DisplayString>
      <DisplayString ElementID="GroupPopulation.Team1GroupContainsWindowsComputers">
        <Name>Team 1 SQL Based Group Contains Windows Computers</Name>
      </DisplayString>
      <DisplayString ElementID="Team1Group.Discovery">
        <Name>Team 1 SQL Based Group Discovery</Name>
        <Description />
      </DisplayString>

      <DisplayString ElementID="GroupPopulation.Team2Group">
        <Name>Team 2 Servers Group</Name>
      </DisplayString>
      <DisplayString ElementID="GroupPopulation.Team2GroupContainsWindowsComputers">
        <Name>Team 2 SQL Based Group Contains Windows Computers</Name>
      </DisplayString>
      <DisplayString ElementID="Team2Group.Discovery">
        <Name>Team 2 SQL Based Group Discovery</Name>
        <Description />
      </DisplayString>

      <DisplayString ElementID="GroupPopulation.Team3Group">
        <Name>Team 3 Servers Group</Name>
      </DisplayString>
      <DisplayString ElementID="GroupPopulation.Team3GroupContainsWindowsComputers">
        <Name>Team 3 SQL Based Group Contains Windows Computers</Name>
      </DisplayString>
      <DisplayString ElementID="Team3Group.Discovery">
        <Name>Team 3 SQL Based Group Discovery</Name>
        <Description />
      </DisplayString>

    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>

That’s it!  If you get errors trying to import – you most likely modified a definition incompletely…. the import error should help you figure out what's wrong.

Now I can go to my groups – find “Team 1 Group” and see if it is populated:  SUCCESS!

image

 

 

I am attaching my working sample MP below

Maintenance mode – tying the text of the category to the database


I haven't seen this discussed before – so I figured I would post this.

In the OpsDB and DWDB – we keep some tables named MaintenanceMode and MaintenanceModeHistory.

When you place an object into maintenance mode – we will log a row in the database for this object.  You could potentially write reports against this data in the data warehouse, and report on MM history. 

From the following query: 

select * from dbo.vMaintenanceModeHistory

There is a column named “ReasonCode”.  This has a numeric value.  However – in the UI – this correlates to the “Category” that you must select when you place an object into MM:

 

image

 

 

Here is a table which sorts out the numeric reason code vs the text in the UI:

 

Other (Planned) = 0
Other (Unplanned) = 1
Hardware: Maintenance (Planned) = 2
Hardware: Maintenance (Unplanned) = 3
Hardware: Installation (Planned) = 4
Hardware: Installation (Unplanned) = 5
Operating System: Reconfiguration (Planned) = 6
Operating System: Reconfiguration (Unplanned) = 7
Application: Maintenance (Planned) = 8
Application: Maintenance (Unplanned) = 9
Application: Installation (Planned) = 10
Application: Unresponsive = 11
Application: Unstable = 12
Security Issue = 13
Loss of network connectivity (Unplanned) = 14
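
If you want the friendly text in your query output, you can decode the ReasonCode inline.  Here is a simple sketch against the view shown above (only the ReasonCode column is assumed here - everything else comes back via the *):

select mmh.*,
case mmh.ReasonCode
when 0 then 'Other (Planned)'
when 1 then 'Other (Unplanned)'
when 2 then 'Hardware: Maintenance (Planned)'
when 3 then 'Hardware: Maintenance (Unplanned)'
when 4 then 'Hardware: Installation (Planned)'
when 5 then 'Hardware: Installation (Unplanned)'
when 6 then 'Operating System: Reconfiguration (Planned)'
when 7 then 'Operating System: Reconfiguration (Unplanned)'
when 8 then 'Application: Maintenance (Planned)'
when 9 then 'Application: Maintenance (Unplanned)'
when 10 then 'Application: Installation (Planned)'
when 11 then 'Application: Unresponsive'
when 12 then 'Application: Unstable'
when 13 then 'Security Issue'
when 14 then 'Loss of network connectivity (Unplanned)'
end as ReasonText
from dbo.vMaintenanceModeHistory mmh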

Understanding and modifying Data Warehouse retention and grooming


You will likely find that the default retention in the OpsMgr data warehouse will need to be adjusted for your environment.  I often find customers are reluctant to adjust these – because they don't know what they want to keep.  So – they assume the defaults are good – and they just keep EVERYTHING. 

This is a bad idea. 

A data warehouse will often be one of the largest databases supported by a company.  Large databases cost money.  They cost money to support.  They are more difficult to maintain.  They cost more to back up in time, tape capacity, network impact, etc.  They take longer to restore in the case of a disaster.  The larger they get, the more they cost in hardware (disk space) to support.  The larger they get, the longer reports can take to complete.

For these reasons – you should give STRONG consideration to reducing your warehouse retention to your reporting REQUIREMENTS.  If you don't have any – MAKE SOME!

Originally – when the product released – you had to directly edit SQL tables to adjust this.  Then – a command line tool was released to adjust these values – making the process easier and safer.  This post is just going to be a walk through of this process to better understand using this tool – and what each dataset actually means.

Here is the link to the command line tool: 

http://blogs.technet.com/momteam/archive/2008/05/14/data-warehouse-data-retention-policy-dwdatarp-exe.aspx

 

Different data types are kept in the Data Warehouse in unique “Datasets”.  Each dataset represents a different data type (events, alerts, performance, etc..) and the aggregation type (raw, hourly, daily)

Not every customer will have exactly the same data sets.  This is because some management packs will add their own dataset – if that MP has something very unique that it will collect – that does not fit into the default “buckets” that already exist.

 

So – first – we need to understand the different datasets available – and what they mean.  All the datasets for an environment are kept in the “Dataset” table in the Warehouse database.

select * from dataset
order by DataSetDefaultName

This will show us the available datasets.  Common datasets are:

Alert data set
Client Monitoring data set
Event data set
Microsoft.Windows.Client.Vista.Dataset.ClientPerf
Microsoft.Windows.Client.Vista.Dataset.DiskFailure
Microsoft.Windows.Client.Vista.Dataset.Memory
Microsoft.Windows.Client.Vista.Dataset.ShellPerf
Performance data set
State data set

Alert, Event, Performance, and State are the most common ones we look at.

 

However – in the warehouse – we also keep different aggregations of some of the datasets – where it makes sense.  The most common datasets that we will aggregate are Performance data, State data, and Client Monitoring data (AEM).  The reason we have raw, hourly, and daily aggregations – is to be able to keep data for longer periods of time – but still have very good performance on running reports.

In MOM 2005 – we used to stick ALL the raw performance data into a single table in the Warehouse.  After a year of data was reached – this meant the perf table would grow to a HUGE size – and running multiple queries against this table with acceptable performance would be nearly impossible.  It also meant grooming this table would take forever, and would be prone to timeouts and failures.

In OpsMgr – now we aggregate this data into hourly and daily aggregations.  These aggregations allow us to “summarize” the performance, or state data, into MUCH smaller table sizes.  This means we can keep data for a MUCH longer period of time than ever before.  We also optimized this by splitting these into multiple tables.  When a table reaches a pre-determined size, or number of records – we will start a new table for inserting.  This allows grooming to be incredibly efficient – because now we can simply drop the old tables when all of the data in a table is older than the grooming retention setting.

 

Ok – that’s the background on aggregations.  To see this information – we will need to look at the StandardDatasetAggregation table.

select * from StandardDatasetAggregation

That table contains all the datasets, and their aggregation settings.  To help make more sense of this -  I will join the dataset and the StandardDatasetAggregation tables in a single query – to only show you what you need to look at:

SELECT DataSetDefaultName,
AggregationTypeId,
MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName

This query will give us the common dataset name, the aggregation type, and the current maximum retention setting.

For the AggregationTypeId:

0 = Raw

20 = Hourly

30 = Daily

Here is my output:

DataSetDefaultName AggregationTypeId MaxDataAgeDays
Alert data set 0 400
Client Monitoring data set 0 30
Client Monitoring data set 30 400
Event data set 0 100
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 30 91
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 0 7
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 30 182
Microsoft.Windows.Client.Vista.Dataset.Memory 0 7
Microsoft.Windows.Client.Vista.Dataset.Memory 30 91
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 30 91
Performance data set 0 10
Performance data set 20 400
Performance data set 30 400
State data set 0 180
State data set 20 400
State data set 30 400
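
If you would rather see the aggregation names spelled out instead of the numeric AggregationTypeId, a small variation of the same query (using only the tables and columns already shown above) does the translation:

SELECT ds.DataSetDefaultName,
CASE sda.AggregationTypeId
WHEN 0 THEN 'Raw'
WHEN 20 THEN 'Hourly'
WHEN 30 THEN 'Daily'
END AS AggregationType,
sda.MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds ON ds.datasetid = sda.datasetid
ORDER BY ds.DataSetDefaultName, sda.AggregationTypeId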

 

You will probably notice – that we only keep 10 days of RAW Performance by default.  Generally – you don't want to mess with this.  This is simply to keep a short amount of raw data – to build our hourly and daily aggregations from.  All built in performance reports in SCOM run from Hourly, or Daily aggregations by default.

 

Now we are cooking!

Fortunately – there is a command line tool published that will help make changes to these retention periods, and provide more information about how much data we have currently.  This tool is called DWDATARP.EXE.  It is available for download HERE.

This gives us a nice way to view the current settings.  Download this to your tools machine, your RMS, or directly on your warehouse machine.  Run it from a command line.

Run just the tool with no parameters to get help:    

C:\>dwdatarp.exe

To get our current settings – run the tool with ONLY the –s (server\instance) and –d (database) parameters.  This will output the current settings.  However – it does not format well to the screen – so output it to a TXT file and open it:

C:\>dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW > c:\dwoutput.txt

Here is my output (I removed some of the vista/client garbage for brevity)

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 400 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 153,016 ( 4%)
Event data set Raw data 100 1,348,168 ( 37%)
Performance data set Raw data 10 467,552 ( 13%)
Performance data set Hourly aggregations 400 1,265,160 ( 35%)
Performance data set Daily aggregations 400 61,176 ( 2%)
State data set Raw data 180 13,024 ( 0%)
State data set Hourly aggregations 400 305,120 ( 8%)
State data set Daily aggregations 400 20,112 ( 1%)

 

Right off the bat – I can see how little space the daily performance aggregations actually consume.  I can also see how much space only 10 days of RAW perf data consumes.  I also see a surprising amount of event data consuming space in the database.  Typically – you will see that perf hourly will consume the most space in a warehouse.

 

So – with this information in hand – I can do two things….

  • I can know what is using up most of the space in my warehouse.
  • I can know the Dataset name, and Aggregation name… to input to the command line tool to adjust it!

 

Now – on to the retention adjustments.

 

First thing – I will need to gather my Reporting service level agreement from management.  This is my requirement for how long I need to keep data for reports.  I also need to know “what kind” of reports they want to be able to run for this period.

From this discussion with management – we determined:

  • We require detailed performance reports for 90 days (hourly aggregations)
  • We require less detailed performance reports (daily aggregations) for 1 year for trending and capacity planning.
  • We want to keep a record of all ALERTS for 6 months.
  • We don't use any event reports, so we can reduce this retention from 100 days to 30 days.
  • We don't use AEM (Client Monitoring Dataset) so we will leave this unchanged.
  • We don't report on state changes much (if any) so we will set all of these to 90 days.

Now I will use the DWDATARP.EXE tool – to adjust these values based on my company reporting SLA:

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Daily aggregations" -m 365

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Alert data set" -a "Raw data" -m 180

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Event data set" -a "Raw Data" -m 30

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Raw data" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Daily aggregations" -m 90

 

Now my table reflects my reporting SLA – and my actual space needed in the warehouse will be much reduced in the long term:

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 180 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 152,944 ( 4%)
Event data set Raw data 30 1,348,552 ( 37%)
Performance data set Raw data 10 468,960 ( 13%)
Performance data set Hourly aggregations 90 1,265,992 ( 35%)
Performance data set Daily aggregations 365 61,176 ( 2%)
State data set Raw data 90 13,024 ( 0%)
State data set Hourly aggregations 90 305,120 ( 8%)
State data set Daily aggregations 90 20,112 ( 1%)

 

Here are some general rules of thumb (might be different if your environment is unique)

  • Only keep the maximum retention of data in the warehouse per your reporting requirements.
  • Do not modify the performance RAW dataset.
  • Most performance reports are run against Perf Hourly data for detailed performance throughout the day.  For reports that span long periods of time (weeks/months) you should generally use Daily aggregation.
  • Daily aggregations should generally be kept for the same retention as hourly – or longer.
  • Hourly datasets use up much more space than daily aggregations.
  • Most people don't use events in reports – and these can often be groomed much sooner than the default of 100 days.
  • Most people don't do a lot of state reporting beyond 30 days, and these can be groomed much sooner as well if desired.
  • Don't modify a setting if you don't use it.  There is no need.
  • The Configuration dataset generally should not be modified.  This keeps data about objects to report on, in the warehouse.  It should be set to at LEAST the longest of any perf, alert, event, or state datasets that you use for reporting.

Installing SQL 2008 into a Windows 2008R2 failover cluster? Slipstream SP1 into the SQL media!


Today I was setting up a new lab server for SQL 2008 x64 on a new Windows 2008R2 x64 failover cluster.

It wasn't exactly what I would call fun.  :-)  I imagine this is because Server 2008 R2 released after SQL 2008 did… so you have to do a little “running around” to get SQL 2008 installed in a Server 2008 R2 failover cluster.

 

I am using my typical iSCSI solution for building a cluster under Hyper-V documented here:

http://blogs.technet.com/kevinholman/archive/2008/10/20/setting-up-a-2-node-server-2008-failover-cluster-under-hyperv.aspx

 

However – I hit all the snags discussed in this KB article:

http://support.microsoft.com/kb/955725/EN-US

 

The solution to all this is to create a new SQL media source, and slipstream the SQL 2008 SP1 update into the SQL 2008 RTM media.  This is documented here:   http://support.microsoft.com/kb/955392

 

Creating a SP1 Slipstream into the SQL 2008 RTM media:

---------------------------------------------------------------------------

  • Copy the original SQL Server 2008 RTM source media to c:\SQLServer2008_FullSP1.
  • Download the Service Pack 1 package to a new folder, C:\SQLSP1. The package names are as follows:
    • SQLServer2008SP1-KB968369-IA64-ENU.exe
    • SQLServer2008SP1-KB968369-x64-ENU.exe
    • SQLServer2008SP1-KB968369-x86-ENU.exe
  • Extract the packages as follows:
    • C:\SQLSP1\SQLServer2008SP1-KB968369-IA64-ENU.exe /x:c:\SQLServer2008_FullSP1\PCU
    • C:\SQLSP1\SQLServer2008SP1-KB968369-x64-ENU.exe /x:c:\SQLServer2008_FullSP1\PCU
    • C:\SQLSP1\SQLServer2008SP1-KB968369-x86-ENU.exe /x:c:\SQLServer2008_FullSP1\PCU
    Note: Make sure that you complete this step for all architectures to ensure the original media is updated correctly.
  • Run the following commands to copy the Setup.exe file and the Setup.rll file from the extracted location to the original source media location.
  • robocopy C:\SQLServer2008_FullSP1\PCU c:\SQLServer2008_FullSP1 Setup.exe
    robocopy C:\SQLServer2008_FullSP1\PCU c:\SQLServer2008_FullSP1 Setup.rll

  • Run the following commands to copy all files (not the folders), except the Microsoft.SQL.Chainer.PackageData.dll file, from C:\SQLServer2008_FullSP1\PCU\<architecture> to C:\SQLServer2008_FullSP1\<architecture> to update the original files.

  • robocopy C:\SQLServer2008_FullSP1\pcu\x86 C:\SQLServer2008_FullSP1\x86 /XF Microsoft.SQL.Chainer.PackageData.dll
    robocopy C:\SQLServer2008_FullSP1\pcu\x64 C:\SQLServer2008_FullSP1\x64 /XF Microsoft.SQL.Chainer.PackageData.dll
    robocopy C:\SQLServer2008_FullSP1\pcu\ia64 C:\SQLServer2008_FullSP1\ia64 /XF Microsoft.SQL.Chainer.PackageData.dll

    Determine if you have the Defaultsetup.ini file in the following folders:

    • C:\SQLServer2008_FullSP1\x86
    • C:\SQLServer2008_FullSP1\x64
    • C:\SQLServer2008_FullSP1\ia64

    If you have the Defaultsetup.ini file in the folders, open the Defaultsetup.ini file, and then add PCUSOURCE=".\PCU" to the end of the file as follows:

         ;SQLSERVER2008 Configuration File
         [SQLSERVER2008]
         ...
         PCUSOURCE=".\PCU"

    If you do not have the Defaultsetup.ini file in the folders, create the Defaultsetup.ini file in the folders, and add the following content to the file:

        ;SQLSERVER2008 Configuration File
        [SQLSERVER2008]
        PCUSOURCE=".\PCU"

    Note: This file tells the Setup program where to locate the SP1 source media that you extracted in step 3.

    -----------------------------------------------------------------------------

     

     

    This completes the slipstream for SP1 SQL 2008 media.

    HOWEVER…. there is still another issue when installing the cluster, which requires a Cumulative Update for SQL.  When installing the cluster, you might see this message:

     

    "Invoke or BeginInvoke cannot be called on a control until the window handle has been created."

    image

     

    This issue is documented here:  http://support.microsoft.com/kb/975055/

    Just start setup again… and you can get past this error.  Then – when complete with all the steps and nodes – apply the latest SQL 2008 SP1 cumulative update from http://support.microsoft.com/kb/970365/  (Alternatively – you could slipstream in the current CU for SQL 2008 SP1)

    I also kept getting an error about my network bindings not being correct – even though they are… this is due to this being a Hyper-V guest, where the registry contains information about previously detected network adapters that were in my base image.  I just ignored this and moved on – but if you care you can check out http://support.microsoft.com/kb/955963

     

    Next up – you have to install SQL 2008 slipstreamed SP1 on the other nodes in the cluster.  In SQL 2005 – setup configured all nodes with SQL.  In SQL 2008 – you must run setup on all nodes that will be possible owners of the SQL instance.  This part went smoothly….

     

    Lastly – I will apply Cumulative Update 7 for SQL 2008 SP1 (the current one at the time of this writing).  I apply it to the passive node, then fail the cluster over, and apply it to the other node – it goes smoothly.  Generally – following this procedure:  http://support.microsoft.com/kb/958734

    Management group checkup – a database perspective


    Attached you will find a PowerPoint slide deck that I used to present to the System Center Virtual Users Group meeting on June 11th.

    This discussion was looking at your management group's overall health from a database perspective.  There are many facets to the health of SCOM; this is just one perspective.  I discussed SQL back-end configuration and best practices, and the importance and methodology of tuning.  I also discussed what to look for in the database to discover unknown issues that might be impacting you.  Out of scope for this conversation, but important, are sizing and performance metrics of the SQL server.

     

    You can watch the presentation from a link here:

    http://www.systemcentercentral.com/UserGroups/tabid/120/view/groupdetail/groupid/13/Default.aspx

     

    In the slide deck – I have commented each slide with notes and details to provide more data on each condition.  My goal was for others to be able to use this as a “pocket handbook” when looking at their SCOM database servers from a configuration and tuning perspective. 

    Why do I have duplicate SQL databases or logical disks in the console after a version upgrade?


    This is a rare but interesting scenario… which can cause you to see and monitor duplicate objects (and get duplicate alerts) for specific types of discovered hosted objects that have a parent class which was upgraded from one version to another.

    For instance – if you upgrade SQL 2005 > SQL 2008, or Windows 2000 > Windows 2003.  Some of the child objects could potentially get “left behind” and I will explain how, using the SQL upgrade scenario.

     

    When you upgrade a monitored SQL instance from 2005 > 2008, a few things must happen in the MP.  We need to discover the SQL instance as SQL 2008, which, while technically the same instance, is seen as a new object of a new class to OpsMgr.  We also need to “undiscover” the SQL 2005 instance, and delete the previously discovered objects hosted by that class.

    So… when you upgrade SQL from 2005 > 2008, IF (big IF) your SQL 2005 discovery runs FIRST, before we discover SQL 2008, then all will be well.  We will undiscover the SQL 2005 DB engine instance, and mark all hosted objects (i.e. SQL Databases) as “deleted”, since they cannot exist without the parent SQL DB engine.

    However – IF the SQL 2008 discovery runs FIRST… for a short time OpsMgr will have two discovered instances of SQL DB engine… until the SQL 2005 engine discovery runs and removes the SQL 2005 discovered DB engine from OpsMgr.  In THIS scenario… what you will see… is that the SQL 2005 DB engine is indeed gone….  but the SQL 2005 databases which were hosted by the SQL 2005 DB engine remain.  This is BAD. 

    The reason? 

     

    The issue here is how the MP was constructed… and this could affect any multiple-version OS or application that supports an upgrade scenario.  Each version specific database engine uses “SQL.DBEngine” as its base class.  Each version specific SQL DB uses SQL.Database as its base class.

    The *relationship* of “SQL DBEngine hosts SQL Database” is placed on the base classes… not on each version-specific class.  The version-specific classes inherit these relationships from the base class.

    image

     

    Therefore, what happens is that when NO DBEngine exists – we MUST delete the associated SQL databases.  But when we discover SQL 2005 and SQL 2008 at the same time, once the SQL 2005 DB engine is deleted – we don't delete the SQL 2005 databases, because technically a SQL DB engine still exists.

    Ok – that's kind of confusing as to why this happens…. but just understand that this scenario is possible until the MP author figures out a way to better support the upgrade condition and writes that into the discovery model.

     

    This doesn't just affect SQL servers.  This has also been seen when upgrading the OS from one version to another, and keeping old Logical Disks around which were associated with the previous OS version.
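    If you suspect you are in this state, you can look for the leftover objects directly in the OpsDB.  The query below is just a sketch - it assumes the standard OpsMgr 2007 BaseManagedEntity table and the SQL MP class naming; adjust the LIKE pattern for your own scenario (for example, logical disks after an OS upgrade):

    -- Sketch: find old-version objects that are still not marked as deleted (the class name pattern is only an example)
    select FullName, DisplayName, IsDeleted
    from BaseManagedEntity with(nolock)
    where FullName like 'Microsoft.SQLServer.2005.Database%'
    and IsDeleted = 0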

     

    What can I do about this???

    That said – we can develop some best practices to keep this from happening.

     

    The best recommendation I can make is when you KNOW you are doing a major upgrade of an OS or core application like this…. uninstall the agent first, instead of using maintenance mode.  When you reinstall the agent after the upgrade, only the NEW stuff will be discovered.

    If you need to recover from this condition for an existing agent… simply delete the agent from agent managed.  This will mark all hosted objects as deleted in the database.  Then – approve it as a manually installed agent.  This will only discover existing applications/objects.  (Alternatively, you could run an uninstall/reinstall of the agent *from the console*.)  The key step is to make sure that the agent is NOT present in agent managed at some point.

    Moving the Data Warehouse Database and Reporting server to new hardware–my experience


    The time has come to move my Warehouse Database and OpsMgr Reporting Server role to a new server in my lab.  Today – both roles are installed on a single server (named OMDW).  This server is running Windows Server 2008 SP2 x86, and SQL 2008 SP1 DB engine and SQL Reporting (32bit to match the OS).  This machine is OLD, and only has 2GB of memory, so it is time to move it to a 64bit capable machine with 8GB of RAM.  The old server was really limited by the available memory, even for testing in a small lab.  As I do a lot of demo’s in this lab – I need reports to be a bit snappier.

    The server it will be moving to is running Server 2008 R2 (64bit only) and SQL 2008 SP1 (x64).  Since Operations Manager 2007 R2 does not yet support SQL 2008R2 at the time of this writing – we will stick with the same SQL version.

     

    We will be using the OpsMgr doco – from the Administrators Guide:

    http://technet.microsoft.com/en-us/library/cc540402.aspx

     

    So – I map out my plan. 

    1. I will move the warehouse database.
    2. I will test everything to ensure it is functional and working as hoped.
    3. I will move the OpsMgr Reporting role.
    4. I will test everything to ensure it is functional and working as hoped.

     

    Move the Data Warehouse DB:

    Using the TechNet documentation, I look at the high level plan:

    1. Stop Microsoft System Center Operations Manager 2007 services to prevent updates to the OperationsManagerDW database during the move.
    2. Back up the OperationsManagerDW database to preserve the data that Operations Manager has already collected from the management group.
    3. Uninstall the current Data Warehouse component, and delete the OperationsManagerDW database.
    4. Install the Reporting Data Warehouse component on the new Data Warehouse server.
    5. Restore the original OperationsManagerDW database.
    6. Configure Operations Manager to use the OperationsManagerDW database on the new Data Warehouse server.
    7. Restart Operations Manager services.

     

    Sounds easy enough.  (gulp)

     

    • I start with step 1 – stopping all RMS and MS core services.
    • I then take a fresh backup of the DW DB and master.  This is probably one of the most painful steps – as on a large warehouse – this can be a LONG time to wait while my whole management group is down.
    • I then uninstall the DW component from the old server (OMDW) per the guide.
    • I then (gasp) delete the existing OperationsManagerDW database.
    • I install the DW component on the new server (SQLDW1).
    • I delete the newly created and empty OperationsManagerDW database from SQLDW1.
    • I then need to restore the backup I just recently took of the warehouse DB to my new server.  The guide doesn’t give any guidance on these procedures – this is a SQL operation and you would use standard SQL backup/restore procedures here – nothing OpsMgr specific.  I am not a SQL guy – but I figure this out fairly easily.
    • Next up is step 8 in the online guide – “On the new Data Warehouse server, use SQL Management Studio to create a login for the System Center Data Access Service account, the Data Warehouse Action Account, and the Data Reader Account.”  Now – that’s a little bogus documentation.  The first one is simple enough – that is the “SDK” account that we used when we installed OpsMgr.  The second one though – that isn't a real account.  When we installed Reporting – we were asked for two accounts – the "reader” and “write” accounts.  The above referenced Data Warehouse Action Account is really your “write” account.  If you aren't sure – there is a Run As profile for this where you can see which credentials you used.
    • I then map my logins I created to the appropriate rights they should have per the guide.  Actually – since I created the logins with the same names – mine were already mapped!
    • I start the Data Access (SDK) service ONLY on the RMS
    • I modify the reporting server data warehouse main datasource in reporting.
    • I edit the registry on the current Reporting server (OMDW) and have to create a new registry value for DWDBInstance per the guide – since it did not exist on my server yet.  I fill it in with “SQLDW1\I01” since that is my servername\instancename
    • I edit my table in the OpsDB to point to the new Warehouse DB servername\instance (see the query sketch just after this list)
    • I edit my table in the DWDB to point to the new Warehouse DB servername\instance
    • I start up all my services.
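    For those two table edits, the guide walks you through making the change in SQL Management Studio.  If you just want to locate the values first, here is a sketch - I am assuming the usual table names (MT_DataWarehouse in the OperationsManager database, and MemberDatabase in the warehouse DB), so verify against the guide for your version.  The server name column in MT_DataWarehouse carries a GUID suffix, so a select * is the easiest way to find it:

    -- Run in the OperationsManager database (sketch - locate the MainDatabaseServerName_<guid> column to edit)
    select * from dbo.MT_DataWarehouse
    -- Run in the OperationsManagerDW database (sketch - locate the ServerName value to edit)
    select * from dbo.MemberDatabase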

    Now – I follow the guidance in the guide to check to make sure the move is a success.  Lots of issues can break this – missing a step, misconfiguring SQL rights, firewalls, etc.  When I checked mine – it was actually failing.  Reports would run – but lots of failed events on the RMS and management servers.  Turns out I accidentally missed a step – editing the DW DB table for the new name.  Once I put that in and bounced all the services again – all was well and working fine.

     

    Now – on to moving the OpsMgr Reporting role!

     

    Using the TechNet documentation, I look at the high level plan:

    1. Back up the OperationsManagerDW database.
    2. Note the accounts that are being used for the Data Warehouse Action Account and for the Data Warehouse Report Deployment Account. You will need to use the same accounts later, when you reinstall the Operations Manager reporting server.
    3. Uninstall the current Operations Manager reporting server component.
    4. Restore the original OperationsManagerDW database.
    5. If you are reinstalling the Operations Manager reporting server component on the original server, run the ResetSRS.exe tool to clean up and prepare the reporting server for the reinstallation.
    6. Reinstall the Operations Manager reporting server component.

     

    Hey – even fewer steps than moving the database! 

    ***A special note – if you have authored/uploaded CUSTOM REPORTS that are not deployed/included within a management pack – these will be LOST when you follow these steps.  Make sure you export any custom reports to RDL file format FIRST, so you can bring those back into your new reporting server.

     

    • I back up my DataWarehouse database.  This step isn't just precautionary – it is REQUIRED.  When we uninstall the reporting server from the old server – it modifies the Warehouse DB in such a way that we cannot use – and must return it to the original state before we modified anything – in preparation for the new installation of OpsMgr Reporting on the new server.
    • Once I confirm a successful backup, I uninstall OpsMgr R2 Reporting from my old reporting server.
    • Now I restore my backup of the OperationsManagerDW database I just took prior to the uninstall of OpsMgr reporting.  My initial attempts at a restore failed – because the database was in use.  I needed to kill the connections to this database which were stuck from the RMS and MS servers.
    • I am installing OpsMgr reporting on a new server, so I can skip step 4.
    • In steps 5-10, I confirm that my SQL reporting server is configured and ready to roll.  Ideally – this should have already been done BEFORE we took down reporting in the environment.  This really is a bug in the guide – you should do this FIRST – BEFORE even starting down this road.  If something was broken, we don’t want to be fixing it while reporting is down for all our users.
    • In step 11, I kick off the Reporting server role install.  Another bug in the guide found:  they tell us to configure the DataWarehouse component to “this component will not be available”.  That is incorrect.  That would ONLY be the case if we were moving the OpsMgr reporting server to a stand-alone SRS/Reporting-only server.  In my case – I am moving reporting to a server that contains the DataWarehouse component – so this should be left alone.  I then choose my SQL server name\instance, and type in the DataWarehouse write and reader accounts.  SUCCESS!!!!

    Now – I follow the guide and verify that reporting is working as designed.

    Mine (of course) was failing – I got the following error when trying to run a report:

     

    Date: 8/24/2010 5:49:27 PM
    Application: System Center Operations Manager 2007 R2
    Application Version: 6.1.7221.0
    Severity: Error
    Message: Loading reporting hierarchy failed.

    System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.10.10.12:80
       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
       at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
       --- End of inner exception stack trace ---
       at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
       at System.Net.HttpWebRequest.GetRequestStream()
       at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
       at Microsoft.EnterpriseManagement.Mom.Internal.UI.Reporting.ReportingService.ReportingService2005.ListChildren(String Item, Boolean Recursive)
       at Microsoft.EnterpriseManagement.Mom.Internal.UI.Reporting.ManagementGroupReportFolder.GetSubfolders(Boolean includeHidden)
       at Microsoft.EnterpriseManagement.Mom.Internal.UI.Reporting.WunderBar.ReportingPage.LoadReportingSubtree(TreeNode node, ManagementGroupReportFolder folder)
       at Microsoft.EnterpriseManagement.Mom.Internal.UI.Reporting.WunderBar.ReportingPage.LoadReportingTree(ManagementGroupReportFolder folder)
       at Microsoft.EnterpriseManagement.Mom.Internal.UI.Reporting.WunderBar.ReportingPage.LoadReportingTreeJob(Object sender, ConsoleJobEventArgs args)
    System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.10.10.12:80
       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
       at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)

     

    The key part of this is the failed connection to port 80 (10.10.10.12:80) above.  I forgot to open a rule in my Windows Firewall on the reporting server to allow access to port 80 for web reporting.  DOH!

    Now – over the next hour – I should see all my reports from all my MP’s trickle back into the reporting server and console.

     

    Relatively pain free.  Smile

    Moving the Operations Database–My Experience


    The time has come to move my Operations Database Server role to a new server in my lab.  Today – this is installed on a single server (named OMDB).  This server is running Windows Server 2008 SP2 x86, and SQL 2008 SP1 DB engine (32bit to match the OS).  This machine is OLD, and only has 2GB of memory, so it is time to move it to a 64bit capable machine with 4GB of RAM. 

    The server it will be moving to is running Server 2008 R2 (64bit only) and SQL 2008 SP1 (x64).  Since Operations Manager 2007 R2 does not yet support SQL 2008R2 at the time of this writing – we will stick with the same SQL version.

     

    We will be using the OpsMgr doco – from the Administrators Guide:

    http://technet.microsoft.com/en-us/library/cc540384.aspx

     

    So – I map out my plan, based on the guide from Technet:

    1. Back up the OperationsManager database.
    2. Uninstall the OperationsManager database.
    3. Delete the Operations Manager database.
    4. Restore the OperationsManager database.
    5. Update management servers with the new database server name.
    6. Update the Operations Manager database with the new database server name.
    7. Update the Operations Manager database logins on the new database server. Ensure that for the root management server, the SDK Account and the Action Account are included in the logins and that they have appropriate permissions. If reporting is installed, ensure that the Data Warehouse Action Account has appropriate permissions.
    8. Set ENABLE_BROKER if needed.
    9. Verify that the move is successful by ensuring that the console is displaying valid data.

     

    Seems easy enough.  Let’s get started.

     

    • In step 1, I install and configure SQL on the new server.  I verify I have configured this SQL server with my corporate security standards, and I have SA rights on this server.
    • In step 2 – the guide states to back up the OpsDB.  I DISAGREE with this step.  The reason for this is that step 3 tells us to then stop the OpsMgr services after the backup in step 2, and I feel this is a bad idea.  The reason is simply because in between the time of the backup, and the time we stop all the services – the Management servers are STILL WRITING to both databases.  When we restore our backup to the new server, it will be out of sync with the Warehouse database from an alert and state change event perspective, and this can wreak some havoc on alert detail reports and availability reports.  What we SHOULD do is FIRST stop all the core OpsMgr services on the RMS and all MS, and THEN take the backup/restore of the OpsDB.  This ensures our two databases stay in sync.  It looks like the guide didn’t take into consideration the existence of a warehouse DB.  Therefore – MY recommendation for step 2 will be to perform the steps in “Step 3” (stop all services), FIRST.
    • After I confirm all services are stopped – I take a full backup of the Ops DB.
    • In step 4, I uninstall the OpsDB component from the old server.  I get an error about a failed SQL script.  Ignored.  :-)
    • In step 5, I (gasp) delete the OperationsManager database.  (If you are concerned – you might consider restoring the backup to the new server first, to make sure the backup/restore works, before taking this drastic step.)
    • In step 6, I restore my backup of OperationsManager to the new server.
    • In step 7, I edit the registry of each RMS\MS server, with my new server\instance name (SQLDB1\I01).  I DO NOT do step 7f.  Step 7f would have us start up the RMS and MS services.  This SHOULD NOT be done – as there is further configuration that should be done first, editing the DB for the correct name, and establishing the correct account rights.  I recommend leaving these services stopped until this is completed.  The services will just error out until these later steps are performed.
    • In step 8 – I modify the database table per the guide.
    • In step 9 – I add my SDK account login to SQL and make sure the mappings are correct.
    • In step 10 – I add my Management Server Action Account login and set/verify permissions.
    • In step 11 – I add my Data Warehouse Action Account login and set/verify permissions.  (Hint – this is your Data Warehouse Write Account)  I wish we didn’t have so many different names for the same things.
    • Last in the guide – I set ENABLE BROKER per the instructions.
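    For the ENABLE_BROKER step, here is a quick sketch of checking and (if needed) enabling it - note that the ALTER DATABASE typically needs exclusive access to the database, so keep the OpsMgr services stopped while you do it:

    -- Check whether Service Broker is enabled for the OperationsManager database (1 = enabled)
    SELECT is_broker_enabled FROM sys.databases WHERE name = 'OperationsManager'
    -- If it returns 0, enable it (requires exclusive access to the database):
    ALTER DATABASE OperationsManager SET ENABLE_BROKER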

    ***  Note – there is a footnote added by a user in the guide to enable CLR.  This is REQUIRED… it is missing from the core guide.

    In order to support regular expressions in Operations Manager SQL queries, the development team needed to create CLR functions that use .NET's RegEx library. Operations Manager Setup configures SQL to allow execution of the CLR code. When the customer in this case moved the database from one SQL server to another they lost this setting in the OperationsManager Database.


    To resolve this issue run the following query on the OperationsManager database:
    EXEC sp_configure 'clr enabled', 1
    GO

    Let that command execute successfully then run

    RECONFIGURE
    GO

    This will correct the issue above.

     

    • NOW – we can start up our services on the RMS and MS, and check for error events and validate everything is working.

    Mine actually failed.  I forgot to open the SQL program and ports in the Windows Firewall.  I created a rule for the SQLServer.exe program, and another for UDP 1434 (for SQL browser) and all was well.

    I started my services and validated everything is working and no bad events showing up in the RMS/MS event logs.

     

    *** Note – there is an issue caused by moving the database that needs to be corrected in SQL on the new SQL server.  See:  http://blogs.technet.com/b/kevinholman/archive/2010/10/26/after-moving-your-operationsmanager-database-you-might-find-event-18054-errors-in-the-sql-server-application-log.aspx
