Early this year (2012) I started working on a presentation (it would have been my first) that I hoped to submit to UKOUG. The thrust of the presentation was to be tips on making your experiences with Oracle Support more pleasant, and on keeping your support analyst busy rather than yourself. A prospective title was “with Support like this who needs enemies” – perhaps that’s a bit strong ;-). Several things conspired to make it unlikely I would get to present it, so I faltered and things ground to a halt. After a period of inactivity I have decided to convert it into a short series of blog posts. This is the first. Part 2 is here – “Production Support Tips & Tricks #2 – SQL Trace”
This post contains some advice for collecting log data when raising SRs. It’s mostly obvious but hopefully not to all.
You already know so I’m not going to waste my breath.
Get everything packaged up, not just the trace files you think Oracle need. Avoids repeat requests.
Nah – see above
This is not related to diagnostic collection, but listener targets don’t auto-purge, so your housekeeping scripts need to call adrci to force a purge.
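As a rough illustration of that housekeeping call, here is a minimal sketch. It assumes listener ADR homes can be recognised by "/tnslsnr/" in the path printed by "show homes", and the 7 day retention (RETENTION_MINUTES) is just a value I picked for the example. Run it as the owner of the listener's ADR base.

```shell
#!/bin/sh
# Sketch only: RETENTION_MINUTES is an assumed retention period (7 days).
# adrci's "purge -age" takes its argument in minutes.
RETENTION_MINUTES=${RETENTION_MINUTES:-10080}

purge_listener_homes() {
    # "show homes" prints a header line followed by one ADR home per line.
    # Assumption: listener homes are the ones with "/tnslsnr/" in the path.
    adrci exec="show homes" | grep '/tnslsnr/' | while read -r home; do
        echo "Purging $home"
        adrci exec="set home $home; purge -age $RETENTION_MINUTES"
    done
}

# Only attempt the purge when adrci is actually on the PATH.
if command -v adrci >/dev/null 2>&1; then
    purge_listener_homes
fi
```

Something like this can be dropped into cron alongside your other log housekeeping.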
Diagcollection.sh for clusters
diagcollection.sh is a script in your CRS home which collates all CRS related log files on the current cluster node.
It’s not easy manually collecting everything Oracle Support may require. This script makes it easy.
There are several options, which you can list with the “-h” option, or you can just collect everything:
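The exact call depends on your version, so treat the following as a sketch rather than the definitive command. It assumes the 11.2 style "--collect" option (confirm with "-h" first), that the script lives in $GRID_HOME/bin, and that the archives are written to the current directory. GRID_HOME's default and OUT_DIR are made-up values for illustration. The script normally needs to be run as root.

```shell
#!/bin/sh
# Sketch only: the GRID_HOME default and OUT_DIR are assumed, illustrative values.
GRID_HOME=${GRID_HOME:-/u01/app/11.2.0/grid}
OUT_DIR=${OUT_DIR:-/tmp/crs_diag}

collect_crs_diag() {
    script="$GRID_HOME/bin/diagcollection.sh"
    if [ ! -x "$script" ]; then
        echo "diagcollection.sh not found under $GRID_HOME/bin" >&2
        return 1
    fi
    # Assumption: output lands in the current directory, so run from
    # somewhere with plenty of free space. The subshell keeps the
    # caller's working directory unchanged.
    mkdir -p "$OUT_DIR" || return 1
    ( cd "$OUT_DIR" && "$script" --collect )
}
```

Call collect_crs_diag on each node in turn, then upload whatever appears in OUT_DIR.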
Uncompressed, the resulting tar file can be very large:
-rw-r--r-- 1 grid oinstall 1.1G Feb 22 21:49 crsData_n02_20120222_2144.tar
Even compressed, the file can still be a lengthy upload to M.O.S (multiplied by the number of nodes):
-rw-r--r-- 1 grid oinstall 69M Feb 22 21:49 crsData_n02_20120222_2144.tar.gz
diagcollection.sh is just a wrapper for diagcollection.pl.
OS Watcher Black Box (OSWbb)
A quote from the user guide:
a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues
“Because every vendor wants to blame another vendor and OSWbb helps that process”
“Because every issue is the fault of the database so you need ammunition to feed to your vendor”
“insert your own cynical quote here”
Download from M.O.S – “OS Watcher Black Box User Guide [ID 301137.1]”. It is certified on AIX, Tru64, Solaris, HP-UX, Linux.
It is easy to run:
nohup ./startOSWbb.sh &
easy to stop
and easy to send
-rw-r--r-- 1 oracle oinstall 1.2M Feb 8 22:00 osw_archive_0208122216.tar.Z
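For the "stop" and "send" steps, a minimal sketch is below. It assumes OSWbb is unpacked under OSW_HOME (a made-up default) and that the collected data sits in its "archive" subdirectory. Note that it uses gzip rather than the compress format shown in the listing above.

```shell
#!/bin/sh
# Sketch only: OSW_HOME is an assumed install location.
OSW_HOME=${OSW_HOME:-$HOME/oswbb}

package_osw_archive() {
    [ -d "$OSW_HOME/archive" ] || return 1
    # Include the node name and a timestamp so uploads from several
    # nodes do not overwrite each other.
    out="osw_archive_$(uname -n)_$(date +%Y%m%d%H%M).tar.gz"
    ( cd "$OSW_HOME" && tar -cf - archive | gzip > "$out" ) || return 1
    # Print the full path of the tarball for the caller.
    echo "$OSW_HOME/$out"
}
```

Stop the collectors first with ./stopOSWbb.sh from the OSWbb directory, then call package_osw_archive and attach the resulting file to the SR.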
You can install OSWbb as a Linux service – “How To Start OSWatcher Black Box Every System Boot [ID 580513.1]” – or use any scheduling tool. Alternatively you can control it via CRS; this way it is only active when the cluster is active, which has plus and minus points. For details see M.O.S note “Making Applications Highly Available Using Oracle Clusterware [ID 1105489.1]”.
To do this you need an action script; there is a perfectly good demo one in “$GRID_HOME/crs/demo”. Alternatively the one I use for testing at home can be found here – osw.scr (use at your peril).
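To give a feel for what such an action script looks like, here is a stripped-down sketch. It is not the demo script or my osw.scr. It assumes OSWbb lives in OSW_HOME (a made-up default) and that a running collector can be spotted by a process matching "OSWatcher.sh". Clusterware calls the script with start, stop, check or clean as the first argument and expects 0 for success.

```shell
#!/bin/sh
# Sketch only: OSW_HOME is an assumed location, and matching on
# "OSWatcher.sh" assumes that is the name of the collector process.
OSW_HOME=${OSW_HOME:-/u01/app/oswbb}

osw_action() {
    case "$1" in
        start)
            [ -x "$OSW_HOME/startOSWbb.sh" ] || return 1
            # Launch in the background from the OSWbb directory.
            ( cd "$OSW_HOME" && nohup ./startOSWbb.sh >/dev/null 2>&1 & )
            ;;
        stop|clean)
            ( cd "$OSW_HOME" && ./stopOSWbb.sh >/dev/null 2>&1 )
            ;;
        check)
            # 0 tells Clusterware the resource is online, non-zero offline.
            pgrep -f OSWatcher.sh >/dev/null 2>&1
            ;;
        *)
            echo "usage: $0 {start|stop|check|clean}" >&2
            return 1
            ;;
    esac
}

# Clusterware passes the entry point name as the first argument.
if [ $# -gt 0 ]; then
    osw_action "$1"
fi
```

Test each entry point by hand as the grid owner before registering the script with crsctl.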
$ $GRID_HOME/bin/crsctl add resource osw -type ora.local_resource.type \
    -attr "AUTO_START=always,ACTION_SCRIPT=$GRID_HOME/crs/script/oswbb.scr"
$ $GRID_HOME/bin/crsctl status res osw
NAME=osw
TYPE=ora.local_resource.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on n01, ONLINE on n02
From “OS Watcher For Windows (OSWFW) User Guide [ID 433472.1]”:
OS Watcher for Windows is no longer supported.
It has been replaced by the Cluster Health Monitor.
From “Cluster Health Monitor (CHM) FAQ [ID 1328466.1]”
Is the Cluster Health Monitor replacing OSWatcher?
…there [is] some information such as top, traceroute, and netstat that the Cluster Health Monitor does not collect, so running the Cluster Health Monitor while running OSWatcher is ideal. Both tools complement each other rather than supplement…
In my opinion, another reason for still using OSWbb in spite of CHM is that CHM output is very difficult to review yourself. It is also not yet the tool of choice for many within Oracle Support. OSWbb still has a place.
Quote from traceroute Unix man page by way of caveat:
Because of the load it could impose on the network, it is unwise to use traceroute during normal operations or from automated scripts.
“OS Watcher Black Box” was originally called “OS Watcher” but was renamed due to a clash of names with other unrelated, non-Oracle tool(s).
More to follow in the future