Production Support Tips & Tricks #1 – Collecting Log Data

Early this year (2012) I started working on a presentation, my first, that I hoped to submit to UKOUG. The thrust of the presentation was to be tips on making your experiences with Oracle Support more pleasant, and on keeping your support analyst busy rather than yourself. A prospective title was “With Support like this, who needs enemies?” – perhaps that’s a bit strong ;-). Several things conspired to make it unlikely I would get to present it, so I faltered and things ground to a halt. After a period of inactivity I have decided to convert it into a short series of blog posts. This is the first. Part 2 is here – “Production Support Tips & Tricks #2 – SQL Trace”.

This post contains some advice for collecting log data when raising SRs. It’s mostly obvious but hopefully not to all.

ADR Package

What?
You already know so I’m not going to waste my breath.

Why?
Get everything packaged up, not just the trace files you think Oracle need. Avoids repeat requests.

How?
Well covered by others so I’m not going near it:
John Hallas’s quality UKOUG presentation
Uwe Hesse’s super blog entry

Example?
Nah – see above

Trivia?
Not related to diagnostic collection, but listener targets don’t auto-purge, so your housekeeping scripts need to call adrci to force a purge.
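By way of illustration, here is a minimal sketch of what such a housekeeping call might look like. It assumes adrci is on the PATH and that listener homes appear under diag/tnslsnr in the output of "show homes". The ADRCI and PURGE_AGE_MINS variables and the function names are my own inventions for the example, not Oracle settings, and the 30-day retention is arbitrary.

```shell
#!/bin/sh
# Sketch only: force an ADR purge of every listener home.
# ADRCI and PURGE_AGE_MINS are illustrative names, not Oracle settings.
ADRCI="${ADRCI:-adrci}"
PURGE_AGE_MINS="${PURGE_AGE_MINS:-43200}"   # 30 days; purge -age is in minutes

# Print the ADR homes that belong to listeners (diag/tnslsnr/...).
list_listener_homes() {
    "$ADRCI" exec="show homes" | grep 'diag/tnslsnr/'
}

# Point adrci at each listener home in turn and purge old content.
purge_listener_homes() {
    list_listener_homes | while read -r home; do
        echo "Purging $home (older than $PURGE_AGE_MINS minutes)"
        "$ADRCI" exec="set homepath $home; purge -age $PURGE_AGE_MINS"
    done
}

# Only do real work when asked to, so the functions can be sourced safely.
if [ "${1:-}" = "run" ]; then
    purge_listener_homes
fi
```

Saved under a name of your choosing and called with the single argument "run" from cron, this would purge each listener home on the node. Check the retention you actually want before using anything like it.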

Diagcollection.sh for clusters

What?
diagcollection.sh is a script in your CRS home which collates all CRS related log files on the current cluster node.

Why?
It’s not easy manually collecting everything Oracle Support may require. This script makes it easy.

How?
There are several options, which you can check with the “-h” option. Or just collect everything:

$ diagcollection.sh

Uncompressed, the resulting tar file can be very large:

-rw-r--r-- 1 grid oinstall 1.1G Feb 22 21:49 crsData_n02_20120222_2144.tar

Even compressed, the file can still be a lengthy upload to M.O.S (multiplied by the number of nodes):

-rw-r--r-- 1 grid oinstall 69M Feb 22 21:49 crsData_n02_20120222_2144.tar.gz
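Because the script only covers the current node, it has to be run once per node. The following is a rough sketch of looping over the cluster, not a tested procedure. It assumes olsnodes and diagcollection.sh both live in $GRID_HOME/bin, that the account used can ssh to each node and is allowed to run the script there, and that OUT_DIR exists on every node. The default GRID_HOME path, the variable names and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch only: run diagcollection.sh on every cluster node in turn.
GRID_HOME="${GRID_HOME:-/u01/app/11.2.0/grid}"    # hypothetical default
OLSNODES="${OLSNODES:-$GRID_HOME/bin/olsnodes}"   # prints cluster node names
REMOTE="${REMOTE:-ssh}"                           # remote shell command
OUT_DIR="${OUT_DIR:-/tmp}"                        # where the tar files land

collect_on_all_nodes() {
    for node in $("$OLSNODES"); do
        echo "Collecting on $node"
        # Run the collection from OUT_DIR on the remote node.
        "$REMOTE" "$node" "cd $OUT_DIR && $GRID_HOME/bin/diagcollection.sh" \
            || echo "Collection failed on $node" >&2
    done
}

# Only do real work when asked to, so the function can be sourced safely.
if [ "${1:-}" = "run" ]; then
    collect_on_all_nodes
fi
```

The nodes are visited one at a time on purpose: the collection is not lightweight, and a failure on one node is reported without stopping the rest.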

Trivia?
diagcollection.sh is just a wrapper for diagcollection.pl.

OS Watcher Black Box (OSWbb)

What?
A quote from the user guide:

a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues

Why?
“Because every vendor wants to blame another vendor and OSWbb helps that process”
or
“Because every issue is the fault of the database so you need ammunition to feed to your vendor”
or
“insert your own cynical quote here”

How?
Download from M.O.S – “OS Watcher Black Box User Guide [ID 301137.1]”. It is certified on AIX, Tru64, Solaris, HP-UX and Linux.

It is easy to run:

nohup ./startOSWbb.sh &

easy to stop:

./stopOSWbb.sh

and easy to send:

$ ./tarupfiles.sh
-rw-r--r-- 1 oracle oinstall 1.2M Feb  8 22:00 osw_archive_0208122216.tar.Z
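If the defaults don’t suit, startOSWbb.sh also accepts positional arguments. As I understand the user guide, the first is the snapshot interval in seconds and the second is the number of hours of archive data to keep, defaulting to 30 and 48; do check the guide for your version. The small wrapper below is only a sketch: OSW_HOME and the function name start_osw are hypothetical.

```shell
#!/bin/sh
# Sketch only: start OSWbb with an explicit interval and retention.
# OSW_HOME is a hypothetical install location.
OSW_HOME="${OSW_HOME:-/opt/oswbb}"

# usage: start_osw [interval_seconds] [hours_to_keep]
start_osw() {
    interval="${1:-30}"      # seconds between snapshots
    keep_hours="${2:-48}"    # hours of archive data retained
    (
        # Subshell so the caller's working directory is left alone.
        cd "$OSW_HOME" || exit 1
        nohup ./startOSWbb.sh "$interval" "$keep_hours" >/dev/null 2>&1 &
    )
}
```

For example, "start_osw 60 10" would ask for a snapshot every 60 seconds and keep roughly the last 10 hours.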

Example?

You can install OSWbb as a Linux service – “How To Start OSWatcher Black Box Every System Boot [ID 580513.1]” – or use any scheduling tool. Alternatively you can control it via CRS; that way it is only active when the cluster is active, which has plus and minus points. For details see M.O.S note “Making Applications Highly Available Using Oracle Clusterware [ID 1105489.1]”.

To do this you need an action script. There is a perfectly good demo one in “$GRID_HOME/crs/demo”. Alternatively, the one I use for testing at home can be found here – osw.scr (use at your peril).

$GRID_HOME/bin/crsctl add resource osw -type ora.local_resource.type \
 -attr "AUTO_START=always,ACTION_SCRIPT=$GRID_HOME/crs/script/oswbb.scr"
$ $GRID_HOME/bin/crsctl status res osw
NAME=osw
TYPE=ora.local_resource.type
TARGET=ONLINE       , ONLINE
STATE=ONLINE on n01, ONLINE on n02
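For anyone who has not seen one, an action script is just a script that Clusterware calls with start, stop, check or clean as its first argument and that answers with an exit code (0 for success or, for check, for "online"). The sketch below shows the general shape for OSWbb. It is not the osw.scr mentioned above, nor the Oracle demo script. OSW_HOME, OSW_PATTERN and the function names are assumptions, and matching on "OSWatcher" in the process list is a crude check that you may want to tighten.

```shell
#!/bin/sh
# Sketch only: a Clusterware-style action script for OSWbb.
OSW_HOME="${OSW_HOME:-/opt/oswbb}"          # hypothetical install location
OSW_PATTERN="${OSW_PATTERN:-OSWatcher}"     # text expected in the process list

# Succeeds if a process matching OSW_PATTERN is visible.
osw_running() {
    ps -ef | grep "$OSW_PATTERN" | grep -v grep >/dev/null
}

osw_action() {
    case "$1" in
        start)
            [ -x "$OSW_HOME/startOSWbb.sh" ] || return 1
            # Subshell keeps the cd local; OSWbb runs in the background.
            ( cd "$OSW_HOME" && nohup ./startOSWbb.sh >/dev/null 2>&1 & )
            ;;
        stop|clean)
            ( cd "$OSW_HOME" && ./stopOSWbb.sh >/dev/null 2>&1 )
            ;;
        check)
            osw_running
            ;;
        *)
            echo "usage: $0 {start|stop|check|clean}" >&2
            return 1
            ;;
    esac
}

# Clusterware passes the action as the first argument.
if [ $# -gt 0 ]; then
    osw_action "$1"
    exit $?
fi
```

Here clean simply reuses the stop path; a more careful script would kill any leftover processes if stopOSWbb.sh fails.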

Trivia?
From “OS Watcher For Windows (OSWFW) User Guide [ID 433472.1]”:

OS Watcher for Windows is no longer supported.
It has been replace by the Cluster Health Monitor.

From “Cluster Health Monitor (CHM) FAQ [ID 1328466.1]”:

Is the Cluster Health Monitor replacing OSWatcher?
…there [is] some information such as top, traceroute, and netstat that the Cluster Health Monitor does not collect, so running the Cluster Health Monitor while running OSWatcher is ideal. Both tools complement each other rather than supplement…

In my opinion, another reason for still using OSWbb in spite of CHM is that CHM is very difficult to review yourself. It is also not yet the tool of choice for many within Oracle Support. OSWbb still has a place.

A quote from the traceroute Unix man page, by way of caveat:

Because of the load it could impose on the network, it is unwise to use traceroute during normal operations or from automated scripts.

Hmmmmm…..

“OS Watcher Black Box” was originally called “OS Watcher” but was renamed due to a clash of names with other unrelated, non-Oracle tool(s).

More to follow in the future

Oracle’s Center of Expertise Research Articles

While in My Oracle Support the other day I stumbled across a series of white papers packed full of great information and thought I’d share them. If you have access to M.O.S then search for “Center of Expertise Research Articles” to find a whole host of articles.

The papers are quite old (over a decade in some cases), so I wouldn’t take them as a total source of truth. However, Oracle Database still has a lot of the same DNA, and papers such as “Database Writer and Buffer Management” from 1998 are still a good read.

Now I just need to find time to read some more of them.