How are Oracle background processes renamed?

We all know that when an Oracle instance starts, the background processes are all (or perhaps mostly?) copies of the same executable – $ORACLE_HOME/bin/oracle.

We can see this using the “args” and “comm” format options of the Unix “ps” command. Below we can see the name of the SMON process (“args”) and the name of the source executable (“comm”):

$ ps -p 6971 -o args,comm
COMMAND                     COMMAND
ora_smon_orcl1              oracle

This is common knowledge and when in the past I’ve wondered how this works I’ve always just accepted this Oracle magic and moved on to more important thoughts such as “what’s for lunch?”. Well today my belly is full and I have time to let such thoughts develop!

It turns out this is a quite simple trick. If you have a basic knowledge of the C programming language you’ll be familiar with the argc and argv parameters. e.g.

int main (int argc, char *argv[])

argc – the number of command line arguments
argv – an array of pointers to the command line arguments with argv[0] being the process name.

Surprisingly the values of this array are modifiable. So we can manipulate the name of a running process with a simple sprintf() command (other options are available). e.g.

sprintf(argv[0], "renamed");

My description of this is high level and I’m sure there are risks in doing this, such as overflowing argv[0] into argv[1], but by way of a simple example here is a demo C program:

#include <stdio.h>
void main (int argc, char *argv[])
  int pid;
  pid = (int) getpid ();
  printf("My name is '%s'\n",argv[0]);
  printf("Sleeping for 10 seconds (quick - 'ps -p %d -o args,comm')...\n",pid);
  sprintf(argv[0], "renamed");
  printf("My name is now '%s'\n",argv[0]);
  printf("Sleeping for 10 seconds (quick - 'ps -p %d -o args,comm')...\n",pid);

This program sleeps so I can capture process details from another session using “ps”, renames the running process and then sleeps again so I can capture more details. So let’s compile the C program and see what happens when I run it from one session and capture ps details from another session:

Session 1

$ cc newname.c -o newname
$ ./newname
My name is './newname'
Sleeping for 10 seconds (quick - 'ps -p 7930 -o args,comm')...
My name is now 'renamed'
Sleeping for 10 seconds (quick - 'ps -p 7930 -o args,comm')...

Session 2

$ ps -p 7930 -o args,comm
COMMAND                     COMMAND
./newname                   newname
$ ps -p 7930 -o args,comm
COMMAND                     COMMAND
renamed e                   newname

Pretty cool. Notice how my renamed process is now called “renamed e”. The trailing “e” is left over from the original name of “./newname” thus proving it’s not quite as simple as I suggest and that my C skills are basic. None-the-less I think this is pretty cool.

Unexpected Database Directory Permissions

In some secure environments code deployment privileges are split away from data modification ones. Below is an example user created only to execute DDL for application schema changes.

create user deploy identified by deploy;
grant create any directory to deploy;
grant create any table to deploy;
grant create session to deploy;

And here we see the account being used to make some changes to an application schema, creating a new table and a database directory.

conn deploy/deploy@pdb
create table app.secrets
( cardnum varchar2(15)
, holder varchar(40)
, balance number);
create directory sensitive_files as '/tmp/secrets';

Two weeks later my code monkey technician decides to have a cheeky look at the contents of the secrets table created previously.

select * from app.secrets
ERROR at line 1:
ORA-01031: insufficient privileges

Lovely – no access and my secrets are safe. However the technician happens to be aware of the file naming convention used by the ETL process and tries to read a dump file using UTL_FILE.

set serveroutput on
  fh utl_file.file_type;
  str varchar2(1024);
  fh := utl_file.fopen('SENSITIVE_FILES', 'secrets.dat', 'R');

      utl_file.get_line(fh, str);
    end loop;
    when no_data_found then


Oh my, the “CREATE ANY DIRECTORY” privilege also gives me read permissions on the “contents” of the directory. And…

set serveroutput on
  fh utl_file.file_type;
  fh := utl_file.fopen('SENSITIVE_FILES', 'secrets.dat', 'W');


PL/SQL procedure successfully completed.

We can modify the file too. Here are the new contents when read again using UTL_FILE.


Looking at the permissions it may not be clear how the file contents were changed, “CREATE ANY DIRECTORY” is more powerful than it looks.

  1* select * from dba_sys_privs where grantee = 'DEPLOY'
SQL&gt; /

---------- -------------------- --- ---

It’s probably an edge case but I found it surprising.

Checking Oracle Directory Object Path Validity

Bit of a mouthful that title. Today I was asked to check which, if any, directory objects had an invalid path. Normally this would be trivial but on the system I was on there were more than 90. So I knocked up a quick bit of SQL/KSH to do it for me and thought I’d share it here. I could have done something with PL/SQL but I was happy with quick and dirty.

In summary the snippet below generates shell test commands to a temporary file and then uses “sed” to chop off the first and last lines before running the resulting shell script. I just pasted this in to a putty window SQL*Plus prompt – your mileage may vary.

set define off heading off pages 0 lines 200 trimspool on feedback off echo off
spool /tmp/dirchk.tmp
select '[[ -d '||directory_path||' ]] || echo "'||directory_name||' missing ('||directory_path||')"' from dba_directories order by 1;
spool off

!ls -l /tmp/dirchk.tmp
!sed -e '1d;$d' /tmp/dirchk.tmp > /tmp/
!chmod 700 /tmp/

On Linux I could have used the sed “-i” switch the edit in place but this was AIX.

Example output:

NJTEMP missing (/home/nj)
RESPONSE_DIR missing (/mnt/app/files/response)
SCHEMA_DIR missing (/mnt/app/schema)
CONVERSION missing (/mnt/app/conversion)
FTP_DIR missing (/mnt/app/ftp_files)

UKOUG Database Server SIG (Leeds 2013)

On Thursday I attended the UKOUG Database Server SIG in Leeds. All slides have been uploaded to the UKOUG website.

It’s the first SIG I’ve attended this year and after enjoying it so much I ought to try harder to get to other events. We had Oak Table presence from David Kurtz talking all things partitioning/compression and purging, two Oracle employees discussing first support and then ZFS/NetApp (I particularly enjoyed this one) and then Edgars Rudans on his evolution of the Exadata Minimal Patching process. All of these presentations are well worth downloading and checking out.

The last presentation of the day was me. I’ve never presented before and it took a big step out of my comfort zone to get up there but I’m so glad I did. I would recommend it to anyone currently spending time in the audience thinking “I wish I had the confidence to do that”. It’s nerve racking beforehand but exhilarating afterwards.

When blogging in the past I’ve liked how it makes you think a little bit harder before pressing the publish button. I think the best thing I got out of the whole presentation process was that it made me dig even deeper to make sure I’d done my homework.

After the SIG there was a good group headed out for Leeds Oracle Beers #3 which involved local beer, good burgers and Morris Dancing, all good fun.

Advanced Compression: Visualising Insert Overhead

There is a My Oracle Support note, “Master Note for OLTP Compression [ID 1223705.1]“, which talks about the overheads of “COMPRESS FOR ALL OPERATIONS”. It’s a great note and, for a very artificial test, quotes overheads of around 25% CPU, 100% redo and 100% undo when inserting data via conventional methods. The purpose of this post is to visualise when the extra resources are used.

My test uses a SQL*Plus session that repeatedly inserts a single row into a table with “COMPRESS FOR ALL OPERATIONS” enabled. After each insert I use the SQL*Plus HOST command to capture redo and undo statistics for my session. I struggled to measure CPU for each insert. The “CPU used by this session” statistic was not granular enough for my single row inserts and the CPU tracked by SQL Trace was being rounded to 0ms or 1ms for each insert. However the execute elapsed attribute from SQL Trace did give me good granularity. Obviously SQL Trace has it’s own overheads as I’ve talked about before but I don’t think they are important in this case as I’m just after a visualisation of when extra resources are used.

The shell and SQL scripts used to capture statistics are at the foot of this post.

My SQL*Plus control session looks a little like this:

sqlplus neil
create table compteststats
(	testid		varchar2(10)
,	snapid		number
,	statname	varchar2(64)
,	statval		number);

create table compobj
(	col1 varchar2(30)
,	col2 number
,	col3 varchar2(30)
,	col4 varchar2(30)
,	col5 number
,	col6 varchar2(30))
compress for all operations;

exec dbms_application_info.set_module('TESTSESS','TESTSESS');
exec dbms_monitor.session_trace_enable

--capture starting values
host ./ TEST1

insert into compobj values (lpad('w',30,'w'),1,lpad('x',30,'x'),lpad('y',30,'y'),2,lpad('z',30,'z'));
host ./ TEST1

insert into compobj values (lpad('w',30,'w'),1,lpad('x',30,'x'),lpad('y',30,'y'),2,lpad('z',30,'z'));
host ./ TEST1

-- repeat above insert/host combination until the first block is full and we move to a second


I was then able to report redo and undo statistics using the SQL below

select snapid row#
,	max(decode(statname,'redo size',delta,0)) redo_size
,	max(decode(statname,'undo change vector size',delta,0)) undo_size
,	max(decode(statname,'HSC OLTP positive compression',delta,0)) HSC_OLTP_positive_comp
,	max(decode(statname,'HSC OLTP Compressed Blocks',statval,0)) HSC_OLTP_Compressed_Blocks
from (
	select testid,snapid,statname, statval
	,statval-lag(statval) over (partition by testid,statname order by snapid) delta
	from compteststats
	where testid = 'TEST2'
	and snapid > 0
group by snapid
order by snapid;

And I captured the elapsed time for each insert from the SQL Trace file using the grep/awk combo below. I then combined these values with the redo/undo figures in a spreadsheet.

grep "^EXEC #139771649698992" orcl1_ora_6987.trc | awk -F[,=] '{print $4}'

The chart produced from the spreadsheet looks like this, you can click to enlarge but get the gist from the small version.


The number of inserts is plotted on the X axis and the Y axis is either the value from v$sesstat or elapsed micro seconds from the trace file. The values are not important. What I found interesting was being able to visualise the life of the block. You can see the spikes in redo and undo each time the block is (re)compressed and as the block gradually fills up the compressions become more frequent until we get a lull in the compressions as we start to fill a second data block. The last spike on the chart is first compression of the second block.

Excerpts from my data is below. You can see the “HSC OLTP positive compression” has a delta of 1 for each spike demonstrating that the spike is caused by block compression.

---------- ---------- ---------- ---------------------- --------------------------
         1        604        112                      0                          0
         2        432         68                      0                          0
         3        432         68                      0                          0
         4        432         68                      0                          0
         5        432         68                      0                          0
         6        432         68                      0                          0
         7        432         68                      0                          0
         8        432         68                      0                          0
        51        432         68                      0                          0
        52        432         68                      0                          0
        53       9092       8320                      1                          1
        54        420         68                      0                          1
        55        420         68                      0                          1
        97        420         68                      0                          1
        98        420         68                      0                          1
        99       9152       8320                      1                          1
       100        420         68                      0                          1
       101        420         68                      0                          1
       613        432         68                      0                          1
       614        432         68                      0                          1
       615       9092       8320                      1                          2
       616        420         68                      0                          2
       617        420         68                      0                          2

As stated previously – here are my scripts.

sqlplus -S neil/neil <<SQLINP
@inscompstat $1


set verify off
insert into compteststats
select '&1', c.maxsnap,, s.value
from (
        select, s.value
        from v$session u, v$sesstat s, v$statname n
        where u.sid = s.sid
        and s.statistic# = n.statistic#
        and u.module = 'TESTSESS'
        and in ( 'redo size'
                ,       'undo change vector size'
                ,       'HSC OLTP positive compression'
                ,       'HSC OLTP Compressed Blocks')
        ) s
,       (select nvl(max(snapid),-1)+1 maxsnap
        from compteststats
        where testid = '&1') c

Compression Advisory: DBMS_COMPRESSION

I’ve been doing a lot of testing with COMPRESS FOR ALL OPERATIONS (OLTP compression) recently and spent a bit of time with the Compression Advisor. This post documents a few items I noticed along the way.

I picked up instructions on how to run the Compression Advisor from My Oracle Support note “Compression Advisory in 11GR2: Using DBMS_COMPRESSION [ID 762974.1]“. The note advises me to create a new tablespace and informs me that two temporary segments are created.

Please create a tablespace before running compression advisor as it creates two temporary tables and will do significant amount of I/O traffic against them.

Below is a sneak preview of the temporary segments that will be created.

create table "ACME".DBMS_TABCOMP_TEMP_UNCMP tablespace "SCRATCH" nologging
 as select /*+ DYNAMIC_SAMPLING(0) FULL("ACME"."ACCS") */ *
 from "ACME"."ACCS"  sample block( 99) mytab

create table "ACME".DBMS_TABCOMP_TEMP_CMP organization heap 
 tablespace "SCRATCH" compress for all operations nologging
 as select /*+ DYNAMIC_SAMPLING(0) */ *

DBMS_COMPRESSION runs using invoker rights so the privileges of the user I log on as are relevant.

select procedure_name, authid 
from dba_procedures 
where owner = 'SYS' 
and object_name = 'DBMS_COMPRESSION';

------------------------------ ------------

Taking all of the above into account I’ve created a new SCRATCH tablespace for my temporary segments and will run the analysis logged in as myself and not the schema user who owns the segment I want to compress. As we at ORAganism are so security minded ;-) I am not permitted to have the standard “dba” role and have a cut down one with only the permissions I need.

create tablespace scratch datafile '+data' size 50m;

create role nj_dba;
create user nj identified by nj;
grant create session, create any table, drop any table, select any table to nj_dba;
grant execute on sys.dbms_monitor to nj_dba;
grant nj_dba to nj;

Also according to note “ORA-01536: space quota exceeded for tablespace along with DBMS_COMPRESSION.GET_COMPRESSION_RATIO [ID 970324.1]” I need to give the owner of the segment being analysed a quota on my scratch tablespace.

To get rid of the errors, the user specified as the second parameter in the DBMS_COMPRESSION.GET_COMPRESSION_RATIO need to have … some QUOTA over the scratch tablespace specified by the first parameter.


alter user acme quota unlimited on scratch;

I have created my script to call the advisor, dbms_comp.sql, as below.

set serveroutput on
set feedback on
set verify off

	blkcnt_cmp BINARY_integer;
	blkcnt_uncmp BINARY_integer;
	row_cmp BINARY_integer;
	row_uncmp BINARY_integer;
	cmp_ratio number;
	comptype_str varchar2(60);
		scratchtbsname => upper('&3.')
	,	ownname => upper('&1.')
	,	tabname => upper('&2.')
	,	partname => null
	,	comptype => dbms_compression.comp_for_oltp
	,	blkcnt_cmp => blkcnt_cmp
	,	blkcnt_uncmp => blkcnt_uncmp
	,	row_cmp => row_cmp
	,	row_uncmp => row_uncmp
	,	cmp_ratio => cmp_ratio
	,	comptype_str => comptype_str
	,	subset_numrows => &4.
	DBMS_OUTPUT.PUT_LINE('Block count compressed = ' || blkcnt_cmp);
	DBMS_OUTPUT.PUT_LINE('Block count uncompressed = ' || blkcnt_uncmp);
	DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || row_cmp);
	DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || row_uncmp);
	--DBMS_OUTPUT.PUT_LINE('Compression type = ' ||comptype_str);
	DBMS_OUTPUT.PUT_LINE('Compression ratio = '||round(blkcnt_uncmp/blkcnt_cmp,1)||' to 1');
	DBMS_OUTPUT.PUT_LINE('Compression % benefit = '||round((blkcnt_uncmp-blkcnt_cmp)/blkcnt_uncmp*100,1));
	--DBMS_OUTPUT.PUT_LINE('Compression ratio org= '||cmp_ratio);
set verify on

So we should be good to go:

conn nj/nj
@orcl> @dbms_comp acme accs scratch 200000
ERROR at line 1:
ORA-20000: Compression Advisor tablespace "SCRATCH" does not exist
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9

With SQL Trace enabled I see the following in the trace file.

PARSING IN CURSOR #139655103675152 len=64 dep=1 uid=37 oct=3 lid=37 tim=1357852093619559 hv=814796271 ad='67de3b40' sqlid='g6pggnss91mgg'
SELECT count(*) FROM user_tablespaces WHERE tablespace_name= :b1
BINDS #139655103675152:

Even though the temporary segments are created in the same schema as the object being analysed my own account also needs a quota on the scratch tablespace. No segments are created in my schema but I need the quota in order to get past the check on “user_tablespaces”.

alter user nj quota unlimited on scratch;

Let’s try again:

@orcl> @dbms_comp acme accs scratch 200000
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9

I couldn’t find anything about this in M.O.S or find any evidence from SQL Trace. However the power of trial and error tracked the missing privilege down to “ANALYZE ANY” (I could have granted OEM_MONITOR to provide this privilege but we’re all about the principle of least privilege today).


Let’s try again… again:

@orcl> @dbms_comp acme accs scratch 200000
Block count compressed = 1612
Block count uncompressed = 1632
Row count per block compressed = 123
Row count per block uncompressed = 121
Compression ratio = 1 to 1
Compression % benefit = 1.2

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.82

Sadly my table is not going to benefit from compression but it’s a success nonetheless. So the full list of required privileges is:


You also need a quota for your schema user and an unnecessary quota for the executing user.

If you use VPD then don’t forget to take that into account.

And one last point, the temporary tables are created in thje target schema and the names are not session/object specific so don’t try to run Compress Advisor calls in parallel:

@orcl> @dbms_comp acme accs scratch 200000
ERROR at line 1:
ORA-00955: name is already used by an existing object
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9 Client, Password Expiry and ORA-1017

I have had an intermittent issue at work recently where I am occasionally unable to change my password when it has expired.

From my weapon of choice, DOS prompt/”sqlplus” on Windows, I enter my current password and am prompted to change it due to its expiry. I then enter a new password, one I am confident is consistent with password policy and get a message stating my username/password is incorrect. You can see example output below (and yes I am absolutely certain I entered the same password twice :) ).

u:\sqls>sqlplus neiljohnson@db10g1

SQL*Plus: Release Production on Wed Sep 19 09:36:05 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter password:
ORA-28001: the password has expired

Changing password for neiljohnson
New password:
Retype new password:
ORA-01017: invalid username/password; logon denied

Password unchanged
Enter user-name:

I was unable to find a hit in M.O.S and settled on a work-a-round of changing my password on the Unix database server, silently hoping I wasn’t going mad. After a few more occurrences I resorted to Google and via the OTN forums found an M.O.S note detailing the problem.

ORA-1017 Trying To Change Expired Password Through Programmatic Interface From To Lower Database Version [ID 1426233.1]

Below is the suggested solution from this note and I find it out of step with Oracle’s usual stance on these things:

There are no workarounds available to programatically changing the password, apart from;

a) Use an older client version, or
b) Change the user’s password via an ALTER USER statement from a different session.
c) Upgrade the database version to

I’ve always been quite proud to tell my Sysadmin brethren that Oracle clients are backward compatible (and within reason forward compatible too). What we have here is the latest and greatest of the Oracle clients,, and a statement that I read as “this is not backward compatible – tough luck!”

I tried to diagnose this further with SQL trace on the Oracle server process but I only capture the insert into sys.aud$. I also tried SQL Net client tracing and in the trace file I can see an error recorded in between the ORA-28001 and ORA-01017. I’ve highlighted it in the extracts from the trace file below.

2012-09-19 10:44:33.304271 : nsbasic_brc:00 00 24 4F 52 41 2D 32  |..$ORA-2|
2012-09-19 10:44:33.304290 : nsbasic_brc:38 30 30 31 3A 20 74 68  ||
2012-09-19 10:44:33.304309 : nsbasic_brc:65 20 70 61 73 73 77 6F  |e.passwo|
2012-09-19 10:44:33.304327 : nsbasic_brc:72 64 20 68 61 73 20 65  |rd.has.e|
2012-09-19 10:44:33.304346 : nsbasic_brc:78 70 69 72 65 64 0A     |xpired. |
2012-09-19 10:44:33.304364 : nsbasic_brc:exit: oln=0, dln=69, tot=79, rc=0
2012-09-19 10:44:33.304383 : nioqrc:exit
2012-09-19 10:45:59.358748 : nsbasic_brc:exit: oln=11, dln=1, tot=22, rc=0
2012-09-19 10:45:59.359278 : nioqrc: found a break marker...
2012-09-19 10:45:59.359797 : nioqrc: Recieve: returning error: 3111
2012-09-19 10:45:59.360333 : nioqrc:exit
2012-09-19 10:45:59.360834 : nioqrs:entry
2012-09-19 10:45:59.406650 : nsbasic_brc:00 00 33 4F 52 41 2D 30  |..3ORA-0|
2012-09-19 10:45:59.407173 : nsbasic_brc:31 30 31 37 3A 20 69 6E  ||
2012-09-19 10:45:59.407702 : nsbasic_brc:76 61 6C 69 64 20 75 73  ||
2012-09-19 10:45:59.408227 : nsbasic_brc:65 72 6E 61 6D 65 2F 70  |ername/p|
2012-09-19 10:45:59.408753 : nsbasic_brc:61 73 73 77 6F 72 64 3B  |assword;|
2012-09-19 10:45:59.409280 : nsbasic_brc:20 6C 6F 67 6F 6E 20 64  |.logon.d|
2012-09-19 10:45:59.409820 : nsbasic_brc:65 6E 69 65 64 0A        |enied.  |

It seems from the M.O.S information below that ORA-03111 is not something I’m going to get very far with, in fact it’s probably a side effect of some other piece of code failing. Either way – it’s back to the work-a-round for me.

Error:  ORA 3111 
Text:   break received on communication channel 
Cause:  A break was processed on the communications channel, but was not 
        properly handled by SQL*Net software.
        This is an internal error message not usually issued.
Action: Contact customer support.

Hopefully this post will save someone else some time if faced with the same issue, sadly I am not currently able to provide a solution.

Production Support Tips & Tricks #2 – SQL Trace

Part 1: Production Support Tip & Tricks #1 – Collecting Log Data

There’s no need to discuss SQL Trace other than the age-old concern – the overhead. Incident managers are risk averse and when you ask if you can trace a session on production you will always be asked what overhead it has and invariably you will say “erm… 5%… ish”. We can see from the Oracle documentation that Oracle’s stance on this has changed over the years:

9i docs:

“The additional overhead … is normally insignificant compared with the inherent overhead caused by the application’s inefficiency.”

11.2 docs:

“Using the SQL Trace facility can have a severe performance impact and may result in increased system overhead, excessive CPU usage, and inadequate disk space”

In this post I am going to attempt to assess the overhead of SQL Trace. My theory is that there are 2 potential factors.

  1. The overhead of the SQL Trace code. There must be extra code involved and it must take some amount of CPU to execute it.
  2. The overhead of writing trace files. Writing to a file can not be without any overhead.

My cunning plan is to test queries in 3 states:

  • With SQL Trace disabled
  • With SQL Trace enabled
  • With SQL Trace enabled but redirected to /dev/null thus removing any overhead of writing to the trace file

The queries will be tested in IO bound and CPU bound states. The tests were done on a Linux laptop running Oracle XE 11g with a single hard drive so the results are not “enterprise class” – your mileage may vary.

All of the code used can be found at these links, for brevity I have not included them inline – build-acme-schema.sql sqltrace-long1.sql sqltrace-long2.sql sqltrace-short1.sql sqltrace-short2.sql

Test #1 – IO bound queries with a long(ish) elapsed time

I say “long(ish)” as the queries only take a few seconds, the key thing is that the majority of time is spent fetching data. I flush the buffer cache before every execution to make sure the queries are IO bound. The test code can be found in sqltrace-long1.sql.

The results are:

Test \ Execution 1 2 3 4 5 6 Average % Overhead
Long IO bound no trace 5.45s 5.45s 5.52s 5.52s 5.45s 5.34s 5.45s -
Long IO bound with trace (32KB) 6.90s 6.89s 6.94s 7.09s 6.84s 7.01s 6.94s 27.31%
Long IO bound with trace (to null) 6.89s 6.82s 6.89s 6.94s 6.78s 7.12s 6.90s 26.61%

I was really surprised by this. For a small number of queries generating small trace files, only 32KB, I saw a 25% degradation in run time with SQL Trace enabled. The minimal size of the trace files did nothing to prove my theory about the overhead in writing the trace file but this was an eye opener.

Test #2 – CPU bound queries with a long(ish) elapsed time

This test used the same queries as before but I removed the flush of the buffer cache from before each query. This removed all IO waits. The test code can be found in sqltrace-long2.sql.

Test \ Execution 1 2 3 4 5 6 Average % Overhead
Long CPU bound no trace 3.79s 3.76s 3.80s 3.73s 3.71s 3.79s 3.76s -
Long CPU bound with trace (11KB) 4.90s 4.99s 5.02s 5.00s 4.89s 4.87s 4.94s 31.39%
Long CPU bound with trace (to null) 4.95s 5.00s 5.05s 5.05s 4.90s 4.86s 4.96s 32.02%

Interesting – so by running the same workload but with no disk reads to slow it down the overhead is slightly more pronounced. But infrequent long running queries are not the only fruit.

Test #3 – IO bound frequent queries with a short elapsed time

This test uses very fast queries in a tight loop. The proportion of SQL executes to row fetches is very different to the previous tests. The buffer cache is flushed before each execution. The test code can be found in sqltrace-short1.sql.

Test \ Execution 1 2 3 4 5 6 Average % Overhead
Short IO bound no trace 43.99s 43.2s 39.44s 41.81s 39.00s 40.03s 41.24s -
Short IO bound with trace (2.9MB) 41.35s 43.63s 40.70s 41.72s 43.24s 39.86s 41.75s 1.22%
Short IO bound with trace (to null) 41.12s 45.21s 41.50s 40.82s 42.44s 40.05s 41.85s 1.48%

So this more like it – only a 1.5% overhead.

Test #4 – CPU bound frequent queries with a short elapsed time

This test uses the same very fast queries in a tight loop from the previous test. The buffer cache flushes were removed and the number of iterations increased to get a longer elapsed time. The test code can be found in sqltrace-short2.sql.

Test \ Execution 1 2 3 4 5 6 Average % Overhead
Short CPU bound no trace 2.58s 2.46s 2.46s 2.63s 2.49s 2.47s 2.51s -
Short CPU bound with trace (32MB) 7.55s 7.52s 7.57s 7.64s 7.43s 7.67s 7.56s 200.72%
Short CPU bound with trace (to null) 4.45s 4.57s 4.69s 4.63s 4.54s 4.62 4.58s 82.23%

Ouch – a 200% overhead! Also the overhead of writing to the trace file was visible this time, SQL Trace overhead dropped from 200% to 80% when the trace file was /dev/null.


I am almost reluctant to summarise as these are not the most scientific of tests but the things I am taking away from this excercise are:

  • SQL workload that is CPU bound will see a dramatic degradation in elapsed time when SQL Trace is enabled – I’ll be wary of this in the future
  • SQL workload that is IO bound will see less of an impact from SQL Trace, but the impact could still be significant.
  • When trace file size is large then the speed of the trace file target filesystem is important. In these tests I used /dev/null which is of course cheating.
  • When IO response is important to a test remember to set filesystemio_option=setall. I wasted several evenings scratching my head at inconsistent, unpredictable test results.
  • Your mileage may vary

Feel free to point out mistakes in my train of thought or in the tests themselves. I’m not too happy with tests 1 & 2 – I’m not convinced they prove anything worthwhile. Tests 3 & 4 however have results I have found interesting and informative.

I think I’ll continue to tell Incident Managers that the overhead is around 5-10%, but only after I’ve given some careful thought to the criticality of the workload I want to trace.

Production Support Tips & Tricks #1 – Collecting Log Data

Early this year (2012) I started working on a presentation, it would be my first, that I hoped to submit to UKOUG. The thrust of the presentation was to be tips on making your experiences with Oracle Support more pleasant, to help keep your support analyst busy rather than yourself. A prospective title was “with Support like this who needs enemies” – perhaps that’s a big strong ;-). Several things colluded to make it unlikely I would get to present it so I faltered and things ground to a halt. After a period of inactivity I have decided to convert it into a short series of blog posts. This is the first. Part 2 is here – “Production Support Tips & Tricks #2 – SQL Trace

This post contains some advice for collecting log data when raising SRs. It’s mostly obvious but hopefully not to all.

ADR Package

You already know so I’m not going to waste my breath.

Get everything packaged up, not just the trace files you think Oracle need. Avoids repeat requests.

Well covered by others so I’m not going near it:
John Hallas quality UKOUG presentation
Uwe Hesse’s super blog entry

Nah – see above

Not related to diagnostic collection but listener targets don’t auto purge so your housekeeping scripts need to make calls to adrci to force a purge. for clusters

What? is a script in your CRS home which collates all CRS related log files on the current cluster node.

It’s not easy manually collecting everything Oracle Support may require. This script makes it easy.

Several options, you can check them with the “-h” option. Or just collect everything:


Uncompressed the resulting tar file can be very large

-rw-r--r-- 1 grid oinstall 1.1G Feb 22 21:49 crsData_n02_20120222_2144.tar

Even compressed the file can still be a lengthy upload to M.O.S (multiplied by the # of nodes)

-rw-r--r-- 1 grid oinstall 69M Feb 22 21:49 crsData_n02_20120222_2144.tar.gz

Trivia? is just a wrapper for

OS Watcher Black Box (OSWbb)

A quote from the user guide:

a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues

“Because every vendor wants to blame another vendor and OSWbb helps that process”
“Because every issue is the fault of the database so you need ammunition to feed to your vendor”
“insert your own cynical quote here”

Download from M.O.S – “OS Watcher Black Box User Guide [ID 301137.1]“. It is certified on AIX, Tru64, Solaris, HP-UX, Linux.

It is easy to run:

nohup ./ &

easy to stop


and easy to send

$ ./
-rw-r--r-- 1 oracle oinstall 1.2M Feb  8 22:00 osw_archive_0208122216.tar.Z


You can install OSWbb as a Linux service – “How To Start OSWatcher Black Box Every System Boot [ID 580513.1]” or use any scheduling tool. Alternatvely you can control it via CRS, this way it is only active when the cluster is active which has plus and minus points. For details of this see M.O.S note “Making Applications Highly Available Using Oracle Clusterware [ID 1105489.1]“.

To do it you need an action script, there is a perfectly good demo one in “$GRID_HOME/crs/demo”. Alternatively the one I use for testing at home can be found here – osw.scr (use at your peril).

$GRID_HOME/bin/crsctl add resource osw -type ora.local_resource.type \
 -attr "AUTO_START=always,ACTION_SCRIPT=$GRID_HOME/crs/script/oswbb.scr"
$  $GRID_HOME/bin/crsctl status res osw
STATE=ONLINE on n01, ONLINE on n02

From “OS Watcher For Windows (OSWFW) User Guide [ID 433472.1]“:

OS Watcher for Windows is no longer supported.
It has been replace by the Cluster Health Monitor.

From “Cluster Health Monitor (CHM) FAQ [ID 1328466.1]“

Is the Cluster Health Monitor replacing OSWatcher?
…there [is] some information such as top, traceroute, and netstat that the Cluster Health Monitor does not collect, so running the Cluster Health Monitor while running OSWatcher is ideal. Both tools complement each other rather than supplement…

In my opinion another reason for still using OSWbb in spite of CHM is that CHM is very difficult to review yourself, it is also not yet the tool of choice for many within Oracle Support. OSWbb still has a place.

Quote from traceroute Unix man page by way of caveat:

Because of the load it could impose on the network, it is unwise to use traceroute during normal operations or from automated scripts.


“OS Watcher Black Box” was originally called “OS Watcher” but was renamed due to a clash of names with other unrelated, non-Oracle tool(s).

More to follow in the future

Mining the listener log with Perl

Update – 02-MAY-2012
I have had cause to use this script again at work so thought I’d develop it further. This post has been updated to reflect the new version.

Recently at work I was engaged by one of our application support teams because of application calls failing with “ORA-00020: maximum number of processes (300) exceeded”. We quickly identified that these errors were the symptom of another problem and not a concern in themselves, the real issue was slow database calls due to an infrastructure issue. The ORA-00020 errors were because the application was spawning new connections to try to work around the slow connections. Now I don’t know much about application servers but I know this is not uncommon. The application team however could not find any indication of this in their logs so it was up to me to prove it.

As far as I am aware there are 2 options for retrospectively proving this at a granular level (please share any other options I’ve missed):

1) Query the database audit trail – DBA_AUDIT_SESSION
2) Mine the listener log

On this database no auditing was enabled so I was left with the listener log as my data source.

I knocked up a quick KSH/AWK script to get my data and then massaged it via various means. The output was quite pleasing so over the recent long UK Easter holiday I Googled some Perl commands and had some fun with it, this post is to share the script and an example chart from Microsoft Excel based on the incident I described above.

The script can be downloaded from –

Below is an example of how to call the script – it works on *nix and Windows. The “-f” parameter is used to filter log entries on the date, this example is matching any minute between “11:00″ and “12:59″. I’ve included a full usage map further down this post.

C:\>perl -excel -f "04-APR-2012 1[12]" listener.log  > llog.csv

This is the chart showing the logon spike – click to enlarge. You can see the connection spike at 12:56 of over 250 connection attempts.

Connection rate from listener log

The script allows a few other options such as allowing listener logs to be “piped” into the script to allow multiple files to be processed in one command. Usage notes below.

C:\>perl -help

    Usage: [-help] [-filter <string>] [-[no]excel] [-[no]host] <listener_log_path|->
         : -excel|noexcel       : Output dates in Excel friendly format (default is noexcel)
         : -help                : Print this help
         : -host|nohost         : Include host on program (default is to include)
         : -filter <string>     : A regexp to match beginning of log line. e.g. "23-DEC-2012 0[12]" or "..-MAR-2013"
         : listener_log_path|-  : Input either a valid listener log name or "-" to read STDIN

         # Process listener.log looking for entries on the 21st and 22nd of Sept and format dates for Excel
         C:\>perl -filter "2[12]-SEP-2011" -excel listener.log

         # Process two listener log files looking for entries on Jan 3rd between 20:00 and 21:59 and exclude the hostname
         $ cat listener.log listener_db.log | -filter "03-JAN-2012 2[01]" -nohost -

An improvement I may make in the future is to expand minutes where there are no connection attempts. I have drafted some code to fill in gaps but have kept it to myself for now as it’s too ugly to share (at the moment).

I hope someone finds this useful and if not, well I found it entertaining if nothing else :-)