How Eter Pani became Oracle Certified Master

I got my OCM. Hurray!
Now one of the most common questions people ask me is how to pass the OCM exam.
Firstly, I am very grateful to Oracle for introducing this exam. It is finally a real, hands-on exam rather than a test with a set of answers. I can see how complicated it is for the examiners to run, but it is real proof of DBA skills, not just memory skills.
Secondly, this exam does not prove exceptional knowledge of Oracle. It proves that you are fluent in all the basic skills. During the exam everything works according to the documentation. Personally I keep a collection of favourite Metalink notes and advanced instructions that I use at least once a week; you do not need anything like that on the exam.
The environment is ready for you. You do not need to reconfigure switches for RAC or install additional OS packages. What you are doing is pure Oracle database administration, nothing OS-specific. If something genuinely differs between UNIX flavours, forget about it; it should not be part of the exam.
When I go to a client site I frequently have no internet access, so I keep a local copy of the Oracle documentation. Reading the local copy is actually faster than browsing the online version. I was used to working with the local copy, and that habit was really useful during the exam.
Another habit I find useful is preparing all scripts in a text file and then copying them into the SQL*Plus or shell window. If you need to rerun a script, or slightly alter it for a different skill-set, you can reuse your own history. For example, I keep backup/restore scripts in this file.
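As a flavour of what I mean, the scratch file is nothing more than plain text that gets pasted into SQL*Plus or RMAN as needed (a sketch only; the contents are illustrative, not exam material):

-- scratch.txt : snippets kept in a plain text file and pasted in when needed
-- RMAN: backup and restore one-liners to paste into an RMAN session
--   backup database plus archivelog;
--   restore tablespace users; recover tablespace users;
-- SQL*Plus: the settings and checks I always start with
set lines 200 pages 100
column file_name format a60
select file_name, tablespace_name, bytes/1024/1024 mb from dba_data_files;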
You have two days, 14 hours in total including lunch and breaks, for seven skill-sets. No skill-set takes more than two hours. If you do not believe you can do something in less than two hours, forget about it: even if it comes up on the exam you will not be able to finish it in time. Focus on the things you can do.
The exam is based on an 11g Release 2 (11.2) database. If something differs between patch sets, again forget about it. Asking for information specific to one patch set would be unfair to people used to the base release, so such questions will not appear on the exam.
When you read through a skill-set at the beginning, read it right to the end. Mark the tasks that will need some investigation in the documentation, and mark the tasks you doubt you can solve. Estimate the time for each task. Start with the short and easy ones, and if you significantly overrun the time frame you set yourself, switch to the next task in your ordered list. If you have time you can come back and fix the issues later.
I recommend checking the whole environment 15 minutes before the end of each skill-set; there is a special button for the end state of the skill-set. Fifteen minutes should be enough to bring everything to the correct state.
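The kind of checks I mean are nothing clever; this is only a sketch of the idea, not a list of what is actually assessed:

-- is everything still up and in the state the tasks asked for?
select instance_name, status from gv$instance;
select name, open_mode, log_mode from v$database;
select count(*) from dba_objects where status = 'INVALID';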
Read the tasks carefully; they frequently include hints on how to avoid the hidden rocks in the environment, e.g. check all the options of the objects you are asked to create. If you do not follow them exactly, the resulting problems will make your life significantly harder.
Some tasks are not clear, and you can ask your proctor for clarification. But the proctor cannot always rephrase a task without violating the exam rules; if he cannot explain what a task is asking for, follow "best practice".
In general: concentrate, be careful, and get plenty of practice before the exam. I took the preparation courses, but honestly they were just a way to guarantee the time and environment for training. You can prepare by yourself if your management and family grant you the opportunity. If you do not have that generous option, apply for a preparation course; it is real value for money, very useful and informative. The course gives you experience of working on the same machines you will use in the exam; in my case the machines were physically the same, just with a different OS image. By the way, try to get used to the local keyboard of the country where you are going to take the exam. English (UK) and US keyboards are different, and that difference can be exactly the thing that consumes vital time during the exam.
Good Luck.
Eter

UKOUG Database Server SIG (Leeds 2013)

On Thursday I attended the UKOUG Database Server SIG in Leeds. All slides have been uploaded to the UKOUG website.

https://www.ukoug.org/events/ukoug-database-server-sig-meeting-may-2013/

It’s the first SIG I’ve attended this year and after enjoying it so much I ought to try harder to get to other events. We had Oak Table presence from David Kurtz talking all things partitioning/compression and purging, two Oracle employees discussing first support and then ZFS/NetApp (I particularly enjoyed this one) and then Edgars Rudans on his evolution of the Exadata Minimal Patching process. All of these presentations are well worth downloading and checking out.

The last presentation of the day was me. I've never presented before and it was a big step out of my comfort zone to get up there, but I'm so glad I did. I would recommend it to anyone currently spending time in the audience thinking "I wish I had the confidence to do that". It's nerve-racking beforehand but exhilarating afterwards.

When blogging in the past I’ve liked how it makes you think a little bit harder before pressing the publish button. I think the best thing I got out of the whole presentation process was that it made me dig even deeper to make sure I’d done my homework.

After the SIG there was a good group headed out for Leeds Oracle Beers #3 which involved local beer, good burgers and Morris Dancing, all good fun.

Column Encryption tricks

This story starts when someone asked me to look at a Transparent Data Encryption issue. We discovered that we could not do a partition exchange because the encryption keys on the involved table columns were different. Common sense says to bring them into sync, so we happily tried to implement the Oracle recommendation (Unable To Exchange A Partition With A Table If It Has An Encrypted Column [ID 958728.1]) using the following scripts.

ALTER TABLE TESTTBL MODIFY (COL1 decrypt);
ALTER TABLE TESTTBL MODIFY (COL1 encrypt using '3DES168' identified by "MyKey2013");
ALTER TABLE TESTTBL MODIFY (COL2 decrypt);
ALTER TABLE TESTTBL MODIFY (COL2 encrypt using '3DES168' identified by "MyKey2013");

But this did not change the situation. Why are the keys different, and how can we check them? Officially Oracle does not expose any information about the encryption keys, but… the following query returns it for me.

SELECT u.name OWNER, o.name TABLE_NAME, c.name COLUMN_NAME,
          case e.ENCALG when 1 then '3 Key Triple DES 168 bits key'
                        when 2 then 'AES 128 bits key'
                        when 3 then 'AES 192 bits key'
                        when 4 then 'AES 256 bits key'
                        else 'Internal Err'
          end KEY_TYPE,
          decode(bitand(c.property, 536870912), 0, 'YES', 'NO'),
          case e.INTALG when 1 then 'SHA-1'
                        when 2 then 'NOMAC'
                        else 'Internal Err'
          end KEY_ALG,
          e.COLKLC as KEY_VAL
   from sys.user$ u, sys.obj$ o, sys.col$ c, sys.enc$ e
   where e.obj#=o.obj# and o.owner#=u.user# and bitand(flags, 128)=0 and
         e.obj#=c.obj# and bitand(c.property, 67108864) = 67108864
ORDER BY 1,2,3;

Strictly speaking these are not THE keys but the keys encrypted with the master key, which I do not know; however, for our purpose of simply comparing the keys it provides the required information.

EPANI TESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141462F5955334D6532392B54453450566747626F4F7570635A51454162786335394A4A524D4E30576335366370344F6D5A364A37515365544F7A6C4B4C534C77633D
EPANI TESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141462F5955334D6532392B54453450566747626F4F7570635A51454162786335394A4A524D4E30576335366370344F6D5A364A37515365544F7A6C4B4C534C77633D
EPANI ARCHTESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D
EPANI ARCHTESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D

Now we can see that the encryption keys are still different. Let's do it one more time, running one command at a time.

ALTER TABLE TESTTBL MODIFY (COL1 decrypt);
EPANI TESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141462F5955334D6532392B54453450566747626F4F7570635A51454162786335394A4A524D4E30576335366370344F6D5A364A37515365544F7A6C4B4C534C77633D
EPANI ARCHTESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D
EPANI ARCHTESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D

We got 3 rows, so the column was decrypted successfully.
Let's run the second command from our script.

ALTER TABLE TESTTBL MODIFY (COL1 encrypt using '3DES168' identified by "MyKey2013");

The command ran successfully, but what do we have in the key table?

EPANI TESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141462F5955334D6532392B54453450566747626F4F7570635A51454162786335394A4A524D4E30576335366370344F6D5A364A37515365544F7A6C4B4C534C77633D
EPANI TESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141462F5955334D6532392B54453450566747626F4F7570635A51454162786335394A4A524D4E30576335366370344F6D5A364A37515365544F7A6C4B4C534C77633D
EPANI ARCHTESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D
EPANI ARCHTESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D

We got 4 rows, but the encryption key has not changed. At this point we can notice that all columns in the same table share the same encryption key, even those that were encrypted without the "identified by" clause.

Now we try decrypting both columns first and only then encrypting both columns back.

ALTER TABLE TESTTBL MODIFY (COL1 decrypt);
ALTER TABLE TESTTBL MODIFY (COL2 decrypt);
ALTER TABLE TESTTBL MODIFY (COL1 encrypt using '3DES168' identified by "MyKey2013");
ALTER TABLE TESTTBL MODIFY (COL2 encrypt using '3DES168' identified by "MyKey2013");

And check the state of the encryption keys in the data dictionary.

EPANI TESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414142622B726E53615843627A437379797646783333745031517833496542486D514D55765138724A4E4967525A7248534B706C735148756D454C745243597A6C6B6B3D
EPANI TESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414142622B726E53615843627A437379797646783333745031517833496542486D514D55765138724A4E4967525A7248534B706C735148756D454C745243597A6C6B6B3D
EPANI ARCHTESTTBL COL1 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D
EPANI ARCHTESTTBL COL2 3 Key Triple DES 168 bits key YES SHA-1
4177414141414141414141414141414141414141414141536576676A5A79514B544B50592F3257762F3359726A6D6A63525747634A6F4B53754645665A7139376677336C394E306D3071706C6667306B6564586233524D3D

As you can see in the result set, the columns have been re-encrypted using a new key.

Explanation: Oracle uses a single key to encrypt all columns in a table, and until you fully decrypt every column in the table, newly encrypted columns will still use the old key. I fully understand the logic behind it, but it would make more sense to raise an error when you ask for a column to be encrypted with a specific key and Oracle cannot honour it, rather than silently encrypting the column with a different key.
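Given that behaviour, the safe pattern is to decrypt every encrypted column in the table first and only then re-encrypt them, so the whole table picks up a single new key. A minimal sketch of automating that, driven by DBA_ENCRYPTED_COLUMNS (the schema, table, algorithm and passphrase are just the ones from this test case):

declare
  type name_tab is table of varchar2(128);
  l_cols name_tab;
begin
  -- remember which columns are encrypted before we start
  select column_name bulk collect into l_cols
  from   dba_encrypted_columns
  where  owner = 'EPANI'
  and    table_name = 'TESTTBL';

  -- decrypt every encrypted column first...
  for i in 1 .. l_cols.count loop
    execute immediate
      'alter table epani.testtbl modify (' || l_cols(i) || ' decrypt)';
  end loop;

  -- ...then re-encrypt them all, so the table ends up on one new key
  for i in 1 .. l_cols.count loop
    execute immediate
      'alter table epani.testtbl modify (' || l_cols(i) ||
      ' encrypt using ''3DES168'' identified by "MyKey2013")';
  end loop;
end;
/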

Advanced Compression: Visualising Insert Overhead

There is a My Oracle Support note, “Master Note for OLTP Compression [ID 1223705.1]“, which talks about the overheads of “COMPRESS FOR ALL OPERATIONS”. It’s a great note and, for a very artificial test, quotes overheads of around 25% CPU, 100% redo and 100% undo when inserting data via conventional methods. The purpose of this post is to visualise when the extra resources are used.

My test uses a SQL*Plus session that repeatedly inserts a single row into a table with "COMPRESS FOR ALL OPERATIONS" enabled. After each insert I use the SQL*Plus HOST command to capture redo and undo statistics for my session. I struggled to measure CPU for each insert. The "CPU used by this session" statistic was not granular enough for my single-row inserts, and the CPU tracked by SQL Trace was being rounded to 0ms or 1ms for each insert. However the execute elapsed attribute from SQL Trace did give me good granularity. Obviously SQL Trace has its own overheads, as I've talked about before, but I don't think they are important in this case as I'm just after a visualisation of when extra resources are used.

The shell and SQL scripts used to capture statistics are at the foot of this post.

My SQL*Plus control session looks a little like this:

sqlplus neil
create table compteststats
(	testid		varchar2(10)
,	snapid		number
,	statname	varchar2(64)
,	statval		number);

create table compobj
(	col1 varchar2(30)
,	col2 number
,	col3 varchar2(30)
,	col4 varchar2(30)
,	col5 number
,	col6 varchar2(30))
compress for all operations;

exec dbms_application_info.set_module('TESTSESS','TESTSESS');
exec dbms_monitor.session_trace_enable

--capture starting values
host ./inscompstat.sh TEST1

insert into compobj values (lpad('w',30,'w'),1,lpad('x',30,'x'),lpad('y',30,'y'),2,lpad('z',30,'z'));
host ./inscompstat.sh TEST1

insert into compobj values (lpad('w',30,'w'),1,lpad('x',30,'x'),lpad('y',30,'y'),2,lpad('z',30,'z'));
host ./inscompstat.sh TEST1

-- repeat above insert/host combination until the first block is full and we move to a second

exit

I was then able to report redo and undo statistics using the SQL below

select snapid row#
,	max(decode(statname,'redo size',delta,0)) redo_size
,	max(decode(statname,'undo change vector size',delta,0)) undo_size
,	max(decode(statname,'HSC OLTP positive compression',delta,0)) HSC_OLTP_positive_comp
,	max(decode(statname,'HSC OLTP Compressed Blocks',statval,0)) HSC_OLTP_Compressed_Blocks
from (
	select testid,snapid,statname, statval
	,statval-lag(statval) over (partition by testid,statname order by snapid) delta
	from compteststats
	where testid = 'TEST2'
	and snapid > 0
)
group by snapid
order by snapid;

And I captured the elapsed time for each insert from the SQL Trace file using the grep/awk combo below. I then combined these values with the redo/undo figures in a spreadsheet.

grep "^EXEC #139771649698992" orcl1_ora_6987.trc | awk -F[,=] '{print $4}'

The chart produced from the spreadsheet looks like this; you can click to enlarge, but you get the gist from the small version.

[Chart vis-advcomp_1: redo size, undo size and elapsed time per insert]

The number of inserts is plotted on the X axis and the Y axis is either the value from v$sesstat or elapsed microseconds from the trace file. The values are not important. What I found interesting was being able to visualise the life of the block. You can see the spikes in redo and undo each time the block is (re)compressed; as the block gradually fills up, the compressions become more frequent, until there is a lull as we start to fill a second data block. The last spike on the chart is the first compression of the second block.

Excerpts from my data are below. You can see that "HSC OLTP positive compression" has a delta of 1 for each spike, demonstrating that the spike is caused by block compression.

      ROW#  REDO_SIZE  UNDO_SIZE HSC_OLTP_POSITIVE_COMP HSC_OLTP_COMPRESSED_BLOCKS
---------- ---------- ---------- ---------------------- --------------------------
         1        604        112                      0                          0
         2        432         68                      0                          0
         3        432         68                      0                          0
         4        432         68                      0                          0
         5        432         68                      0                          0
         6        432         68                      0                          0
         7        432         68                      0                          0
         8        432         68                      0                          0
...
        51        432         68                      0                          0
        52        432         68                      0                          0
        53       9092       8320                      1                          1
        54        420         68                      0                          1
        55        420         68                      0                          1
...
        97        420         68                      0                          1
        98        420         68                      0                          1
        99       9152       8320                      1                          1
       100        420         68                      0                          1
       101        420         68                      0                          1
...
       613        432         68                      0                          1
       614        432         68                      0                          1
       615       9092       8320                      1                          2
       616        420         68                      0                          2
       617        420         68                      0                          2

As stated previously – here are my scripts.

inscompstat.sh

sqlplus -S neil/neil <<SQLINP
@inscompstat $1
exit
SQLINP

inscompstat.sql

set verify off
insert into compteststats
(testid,snapid,statname,statval)
select '&1', c.maxsnap, s.name, s.value
from (
        select n.name, s.value
        from v$session u, v$sesstat s, v$statname n
        where u.sid = s.sid
        and s.statistic# = n.statistic#
        and u.module = 'TESTSESS'
        and n.name in ( 'redo size'
                ,       'undo change vector size'
                ,       'HSC OLTP positive compression'
                ,       'HSC OLTP Compressed Blocks')
        ) s
,       (select nvl(max(snapid),-1)+1 maxsnap
        from compteststats
        where testid = '&1') c
;

Compression Advisory: DBMS_COMPRESSION

I’ve been doing a lot of testing with COMPRESS FOR ALL OPERATIONS (OLTP compression) recently and spent a bit of time with the Compression Advisor. This post documents a few items I noticed along the way.

I picked up instructions on how to run the Compression Advisor from My Oracle Support note “Compression Advisory in 11GR2: Using DBMS_COMPRESSION [ID 762974.1]“. The note advises me to create a new tablespace and informs me that two temporary segments are created.

Please create a tablespace before running compression advisor as it creates two temporary tables and will do significant amount of I/O traffic against them.

Below is a sneak preview of the temporary segments that will be created.

create table "ACME".DBMS_TABCOMP_TEMP_UNCMP tablespace "SCRATCH" nologging
 as select /*+ DYNAMIC_SAMPLING(0) FULL("ACME"."ACCS") */ *
 from "ACME"."ACCS"  sample block( 99) mytab

create table "ACME".DBMS_TABCOMP_TEMP_CMP organization heap 
 tablespace "SCRATCH" compress for all operations nologging
 as select /*+ DYNAMIC_SAMPLING(0) */ *
 from "ACME".DBMS_TABCOMP_TEMP_UNCMP mytab

DBMS_COMPRESSION runs using invoker rights so the privileges of the user I log on as are relevant.

select procedure_name, authid 
from dba_procedures 
where owner = 'SYS' 
and object_name = 'DBMS_COMPRESSION';

PROCEDURE_NAME                 AUTHID
------------------------------ ------------
INCREMENTAL_COMPRESS           CURRENT_USER
GET_COMPRESSION_TYPE           CURRENT_USER
GET_COMPRESSION_RATIO          CURRENT_USER
                               CURRENT_USER

Taking all of the above into account I've created a new SCRATCH tablespace for my temporary segments and will run the analysis logged in as myself rather than as the schema user who owns the segment I want to compress. As we at ORAganism are so security-minded ;-) I am not permitted to have the standard "dba" role and instead have a cut-down one with only the permissions I need.

create tablespace scratch datafile '+data' size 50m;

create role nj_dba;
create user nj identified by nj;
grant create session, create any table, drop any table, select any table to nj_dba;
grant execute on sys.dbms_monitor to nj_dba;
grant nj_dba to nj;

Also according to note “ORA-01536: space quota exceeded for tablespace along with DBMS_COMPRESSION.GET_COMPRESSION_RATIO [ID 970324.1]” I need to give the owner of the segment being analysed a quota on my scratch tablespace.

To get rid of the errors, the user specified as the second parameter in the DBMS_COMPRESSION.GET_COMPRESSION_RATIO need to have … some QUOTA over the scratch tablespace specified by the first parameter.

Therefore:

alter user acme quota unlimited on scratch;

I have created my script to call the advisor, dbms_comp.sql, as below.

set serveroutput on
set feedback on
set verify off

declare
	blkcnt_cmp BINARY_integer;
	blkcnt_uncmp BINARY_integer;
	row_cmp BINARY_integer;
	row_uncmp BINARY_integer;
	cmp_ratio number;
	comptype_str varchar2(60);
begin
	dbms_compression.get_compression_ratio(
		scratchtbsname => upper('&3.')
	,	ownname => upper('&1.')
	,	tabname => upper('&2.')
	,	partname => null
	,	comptype => dbms_compression.comp_for_oltp
	,	blkcnt_cmp => blkcnt_cmp
	,	blkcnt_uncmp => blkcnt_uncmp
	,	row_cmp => row_cmp
	,	row_uncmp => row_uncmp
	,	cmp_ratio => cmp_ratio
	,	comptype_str => comptype_str
	,	subset_numrows => &4.
	);
	DBMS_OUTPUT.PUT_LINE('Block count compressed = ' || blkcnt_cmp);
	DBMS_OUTPUT.PUT_LINE('Block count uncompressed = ' || blkcnt_uncmp);
	DBMS_OUTPUT.PUT_LINE('Row count per block compressed = ' || row_cmp);
	DBMS_OUTPUT.PUT_LINE('Row count per block uncompressed = ' || row_uncmp);
	--DBMS_OUTPUT.PUT_LINE('Compression type = ' ||comptype_str);
	DBMS_OUTPUT.PUT_LINE('Compression ratio = '||round(blkcnt_uncmp/blkcnt_cmp,1)||' to 1');
	DBMS_OUTPUT.PUT_LINE('Compression % benefit = '||round((blkcnt_uncmp-blkcnt_cmp)/blkcnt_uncmp*100,1));
	--DBMS_OUTPUT.PUT_LINE('Compression ratio org= '||cmp_ratio);
end;
/
set verify on

So we should be good to go:

conn nj/nj
@orcl> @dbms_comp acme accs scratch 200000
declare
*
ERROR at line 1:
ORA-20000: Compression Advisor tablespace "SCRATCH" does not exist
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9

With SQL Trace enabled I see the following in the trace file.

PARSING IN CURSOR #139655103675152 len=64 dep=1 uid=37 oct=3 lid=37 tim=1357852093619559 hv=814796271 ad='67de3b40' sqlid='g6pggnss91mgg'
SELECT count(*) FROM user_tablespaces WHERE tablespace_name= :b1
END OF STMT
...
BINDS #139655103675152:
 Bind#0
  ...
  value="SCRATCH"

Even though the temporary segments are created in the same schema as the object being analysed my own account also needs a quota on the scratch tablespace. No segments are created in my schema but I need the quota in order to get past the check on “user_tablespaces”.

alter user nj quota unlimited on scratch;

Let’s try again:

@orcl> @dbms_comp acme accs scratch 200000
declare
*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9

I couldn’t find anything about this in M.O.S or find any evidence from SQL Trace. However the power of trial and error tracked the missing privilege down to “ANALYZE ANY” (I could have granted OEM_MONITOR to provide this privilege but we’re all about the principle of least privilege today).

grant ANALYZE ANY to NJ_DBA;

Let’s try again… again:

@orcl> @dbms_comp acme accs scratch 200000
Block count compressed = 1612
Block count uncompressed = 1632
Row count per block compressed = 123
Row count per block uncompressed = 121
Compression ratio = 1 to 1
Compression % benefit = 1.2

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.82

Sadly my table is not going to benefit from compression but it’s a success nonetheless. So the full list of required privileges is:

grant CREATE SESSION to NJ_DBA;
grant ANALYZE ANY to NJ_DBA;
grant CREATE ANY TABLE to NJ_DBA;
grant DROP ANY TABLE to NJ_DBA;
grant SELECT ANY TABLE to NJ_DBA;

You also need a quota for the schema user and a (seemingly unnecessary) quota for the executing user.

If you use VPD then don't forget to take that into account; any active policies are likely to filter the rows the advisor samples.
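A quick way to check whether any policies apply to the segment being analysed (schema and table names are the ones from this example):

select object_owner, object_name, policy_name, enable
from   dba_policies
where  object_owner = 'ACME'
and    object_name  = 'ACCS';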

And one last point: the temporary tables are created in the target schema and their names are not session- or object-specific, so don't try to run Compression Advisor calls in parallel:

@orcl> @dbms_comp acme accs scratch 200000
declare
*
ERROR at line 1:
ORA-00955: name is already used by an existing object
ORA-06512: at "SYS.PRVT_COMPRESSION", line 776
ORA-06512: at "SYS.DBMS_COMPRESSION", line 214
ORA-06512: at line 9
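If a run fails like this, or is killed part-way through, the temporary tables can be left behind. A quick way to check for and clear up leftovers (table names taken from the DDL preview earlier; only drop them once you are sure no advisor run is in flight):

select owner, table_name
from   dba_tables
where  table_name like 'DBMS_TABCOMP_TEMP%';

-- then, if appropriate:
-- drop table acme.dbms_tabcomp_temp_uncmp purge;
-- drop table acme.dbms_tabcomp_temp_cmp purge;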

Dynamic temporary tablespace switching bug/feature.

Until recently I was absolutely sure that in Oracle terms USER and SCHEMA are effectively synonyms: the schema has no attributes of its own and exists simply as a container for a user's objects. I was doing some R&D on allocating different temporary tablespaces to different users in a system when I discovered that the same user can end up using different temporary tablespaces, without any configuration changes.

I have two temporary tablespaces, TEMP and TESTTEMP. TEMP is the default and is assigned to all users except TEST.

SELECT USERNAME,TEMPORARY_TABLESPACE FROM DBA_USERS WHERE USERNAME='TEST';
USERNAME TEMPORARY_TABLESPACE
TEST TESTTEMP

We have a table TESTTABLE with 100,000 rows, and I create a special SQL statement with a massive sort operation.

SELECT * FROM TEST.TESTTABLE a FULL OUTER JOIN TEST.TESTTABLE b on 
a.COL1=b.COL2 FULL OUTER JOIN TEST.TESTTABLE c on b.COL1=c.COL1 ORDER BY 
1,2,3,4,5,6,7,8,9;

By default the query uses the expected TESTTEMP tablespace:

SELECT S.sid, S.username, SUM (T.blocks) * 8192 / 1024 / 1024 mb_used, T.tablespace
FROM gv$sort_usage T, gv$session S
WHERE T.session_addr = S.saddr AND t.inst_id=S.inst_id
GROUP BY S.sid, S.username, T.tablespace
ORDER BY sid;
SID USERNAME MB_USED TABLESPACE
23 TEST 408 TESTTEMP

But when I change CURRENT_SCHEMA in the session (ALTER SESSION SET CURRENT_SCHEMA=SYSTEM;) I get different results:

SID USERNAME MB_USED TABLESPACE
23 TEST 487 TEMP

The same query, using the same objects, from the same session, without any change to the user configuration, uses a different temporary tablespace. I believe this feature can be very helpful for fine-grained control of sort operations.

A couple of bonus remarks:
a) Calling SQL from PL/SQL objects created with definer rights behaves like switching to the object owner's schema.
b) Using synonyms does not switch the temporary tablespace to that of the schema owning the referenced objects.
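A minimal sketch of the sort of test behind remark (a), assuming a definer-rights wrapper owned by SYSTEM is created just for the purpose (object names are illustrative):

-- as SYSTEM (default temporary tablespace TEMP): a definer-rights wrapper
create or replace procedure system.run_big_sort
authid definer
as
begin
  for r in (select a.col1 c1, b.col2 c2
            from   test.testtable a
            full outer join test.testtable b on a.col1 = b.col2
            order by 1, 2) loop
    null;  -- fetch every row so the sort really spills to temp
  end loop;
end;
/
grant execute on system.run_big_sort to test;

-- as TEST: the sort now shows up against TEMP in gv$sort_usage,
-- not against TEST's own TESTTEMP, in line with remark (a)
exec system.run_big_sort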

11.2.0.3 Client, Password Expiry and ORA-1017

I have had an intermittent issue at work recently where I am occasionally unable to change my password when it has expired.

From my weapon of choice, DOS prompt/”sqlplus” on Windows, I enter my current password and am prompted to change it due to its expiry. I then enter a new password, one I am confident is consistent with password policy and get a message stating my username/password is incorrect. You can see example output below (and yes I am absolutely certain I entered the same password twice :) ).

u:\sqls>sqlplus neiljohnson@db10g1

SQL*Plus: Release 11.2.0.3.0 Production on Wed Sep 19 09:36:05 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter password:
ERROR:
ORA-28001: the password has expired


Changing password for neiljohnson
New password:
Retype new password:
ERROR:
ORA-01017: invalid username/password; logon denied


Password unchanged
Enter user-name:

I was unable to find a hit in M.O.S and settled on a work-a-round of changing my password on the Unix database server, silently hoping I wasn’t going mad. After a few more occurrences I resorted to Google and via the OTN forums found an M.O.S note detailing the problem.

ORA-1017 Trying To Change Expired Password Through Programmatic Interface From 11.2.0.3 To Lower Database Version [ID 1426233.1]

Below is the suggested solution from this note and I find it out of step with Oracle’s usual stance on these things:

Solution
There are no workarounds available to programatically changing the password, apart from;

a) Use an older client version, or
b) Change the user’s password via an ALTER USER statement from a different session.
c) Upgrade the database version to 11.2.0.3
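For what it is worth, option (b) is nothing more exotic than a DBA running something like this from a session of their own (username and password are illustrative):

alter user neiljohnson identified by "Str0ngPassw0rd";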

I’ve always been quite proud to tell my Sysadmin brethren that Oracle clients are backward compatible (and within reason forward compatible too). What we have here is the latest and greatest of the Oracle clients, 11.2.0.3, and a statement that I read as “this is not backward compatible – tough luck!”

I tried to diagnose this further with SQL Trace on the Oracle server process but only captured the insert into sys.aud$. I also tried SQL*Net client tracing, and in the trace file I can see an error recorded between the ORA-28001 and the ORA-01017: it is the "returning error: 3111" line in the extracts from the trace file below.

2012-09-19 10:44:33.304271 : nsbasic_brc:00 00 24 4F 52 41 2D 32  |..$ORA-2|
2012-09-19 10:44:33.304290 : nsbasic_brc:38 30 30 31 3A 20 74 68  |8001:.th|
2012-09-19 10:44:33.304309 : nsbasic_brc:65 20 70 61 73 73 77 6F  |e.passwo|
2012-09-19 10:44:33.304327 : nsbasic_brc:72 64 20 68 61 73 20 65  |rd.has.e|
2012-09-19 10:44:33.304346 : nsbasic_brc:78 70 69 72 65 64 0A     |xpired. |
2012-09-19 10:44:33.304364 : nsbasic_brc:exit: oln=0, dln=69, tot=79, rc=0
2012-09-19 10:44:33.304383 : nioqrc:exit
...
2012-09-19 10:45:59.358748 : nsbasic_brc:exit: oln=11, dln=1, tot=22, rc=0
2012-09-19 10:45:59.359278 : nioqrc: found a break marker...
2012-09-19 10:45:59.359797 : nioqrc: Recieve: returning error: 3111
2012-09-19 10:45:59.360333 : nioqrc:exit
2012-09-19 10:45:59.360834 : nioqrs:entry
...
2012-09-19 10:45:59.406650 : nsbasic_brc:00 00 33 4F 52 41 2D 30  |..3ORA-0|
2012-09-19 10:45:59.407173 : nsbasic_brc:31 30 31 37 3A 20 69 6E  |1017:.in|
2012-09-19 10:45:59.407702 : nsbasic_brc:76 61 6C 69 64 20 75 73  |valid.us|
2012-09-19 10:45:59.408227 : nsbasic_brc:65 72 6E 61 6D 65 2F 70  |ername/p|
2012-09-19 10:45:59.408753 : nsbasic_brc:61 73 73 77 6F 72 64 3B  |assword;|
2012-09-19 10:45:59.409280 : nsbasic_brc:20 6C 6F 67 6F 6E 20 64  |.logon.d|
2012-09-19 10:45:59.409820 : nsbasic_brc:65 6E 69 65 64 0A        |enied.  |

It seems from the M.O.S information below that ORA-03111 is not something I'm going to get very far with; in fact it's probably a side effect of some other piece of code failing. Either way – it's back to the work-a-round for me.

Error:  ORA 3111 
Text:   break received on communication channel 
-------------------------------------------------------------------------------
Cause:  A break was processed on the communications channel, but was not 
        properly handled by SQL*Net software.
        This is an internal error message not usually issued.
Action: Contact customer support.

Hopefully this post will save someone else some time if faced with the same issue, sadly I am not currently able to provide a solution.

Production Support Tips & Tricks #2 – SQL Trace

Part 1: Production Support Tips & Tricks #1 – Collecting Log Data

There’s no need to discuss SQL Trace other than the age-old concern – the overhead. Incident managers are risk averse and when you ask if you can trace a session on production you will always be asked what overhead it has and invariably you will say “erm… 5%… ish”. We can see from the Oracle documentation that Oracle’s stance on this has changed over the years:

9i docs:

“The additional overhead … is normally insignificant compared with the inherent overhead caused by the application’s inefficiency.”

11.2 docs:

“Using the SQL Trace facility can have a severe performance impact and may result in increased system overhead, excessive CPU usage, and inadequate disk space”

In this post I am going to attempt to assess the overhead of SQL Trace. My theory is that there are 2 potential factors.

  1. The overhead of the SQL Trace code. There must be extra code involved and it must take some amount of CPU to execute it.
  2. The overhead of writing trace files. Writing to a file cannot be free of overhead.

My cunning plan is to test queries in 3 states:

  • With SQL Trace disabled
  • With SQL Trace enabled
  • With SQL Trace enabled but redirected to /dev/null thus removing any overhead of writing to the trace file

The queries will be tested in IO bound and CPU bound states. The tests were done on a Linux laptop running Oracle XE 11g with a single hard drive so the results are not “enterprise class” – your mileage may vary.

All of the code used can be found at these links; for brevity I have not included it inline: build-acme-schema.sql, sqltrace-long1.sql, sqltrace-long2.sql, sqltrace-short1.sql, sqltrace-short2.sql
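The linked scripts carry the full detail; purely as a flavour, enabling and disabling trace around each test follows the usual pattern (a sketch, not the actual contents of those files; for the /dev/null runs the trace file itself was pointed at /dev/null at the OS level):

-- switch SQL Trace on for the current session (wait events on, binds off)
exec dbms_session.session_trace_enable(waits => true, binds => false)

-- ... run the test queries ...

-- switch it off again
exec dbms_session.session_trace_disable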

Test #1 – IO bound queries with a long(ish) elapsed time

I say "long(ish)" as the queries only take a few seconds; the key thing is that the majority of the time is spent fetching data. I flush the buffer cache before every execution to make sure the queries are IO bound. The test code can be found in sqltrace-long1.sql.

The results are:

Test \ Execution                      1       2       3       4       5       6       Average   % Overhead
Long IO bound no trace                5.45s   5.45s   5.52s   5.52s   5.45s   5.34s   5.45s     -
Long IO bound with trace (32KB)       6.90s   6.89s   6.94s   7.09s   6.84s   7.01s   6.94s     27.31%
Long IO bound with trace (to null)    6.89s   6.82s   6.89s   6.94s   6.78s   7.12s   6.90s     26.61%

I was really surprised by this. For a small number of queries generating small trace files, only 32KB, I saw a 25% degradation in run time with SQL Trace enabled. The minimal size of the trace files did nothing to prove my theory about the overhead in writing the trace file but this was an eye opener.

Test #2 – CPU bound queries with a long(ish) elapsed time

This test used the same queries as before but I removed the flush of the buffer cache from before each query. This removed all IO waits. The test code can be found in sqltrace-long2.sql.

Test \ Execution                      1       2       3       4       5       6       Average   % Overhead
Long CPU bound no trace               3.79s   3.76s   3.80s   3.73s   3.71s   3.79s   3.76s     -
Long CPU bound with trace (11KB)      4.90s   4.99s   5.02s   5.00s   4.89s   4.87s   4.94s     31.39%
Long CPU bound with trace (to null)   4.95s   5.00s   5.05s   5.05s   4.90s   4.86s   4.96s     32.02%

Interesting – so by running the same workload but with no disk reads to slow it down the overhead is slightly more pronounced. But infrequent long running queries are not the only fruit.

Test #3 – IO bound frequent queries with a short elapsed time

This test uses very fast queries in a tight loop. The proportion of SQL executes to row fetches is very different to the previous tests. The buffer cache is flushed before each execution. The test code can be found in sqltrace-short1.sql.

Test \ Execution                      1       2       3       4       5       6       Average   % Overhead
Short IO bound no trace               43.99s  43.20s  39.44s  41.81s  39.00s  40.03s  41.24s    -
Short IO bound with trace (2.9MB)     41.35s  43.63s  40.70s  41.72s  43.24s  39.86s  41.75s    1.22%
Short IO bound with trace (to null)   41.12s  45.21s  41.50s  40.82s  42.44s  40.05s  41.85s    1.48%

So this is more like it – only around a 1.5% overhead.

Test #4 – CPU bound frequent queries with a short elapsed time

This test uses the same very fast queries in a tight loop from the previous test. The buffer cache flushes were removed and the number of iterations increased to get a longer elapsed time. The test code can be found in sqltrace-short2.sql.

Test \ Execution                      1       2       3       4       5       6       Average   % Overhead
Short CPU bound no trace              2.58s   2.46s   2.46s   2.63s   2.49s   2.47s   2.51s     -
Short CPU bound with trace (32MB)     7.55s   7.52s   7.57s   7.64s   7.43s   7.67s   7.56s     200.72%
Short CPU bound with trace (to null)  4.45s   4.57s   4.69s   4.63s   4.54s   4.62s   4.58s     82.23%

Ouch – a 200% overhead! The overhead of writing to the trace file was also visible this time: SQL Trace overhead dropped from 200% to around 80% when the trace file was /dev/null.

Summary

I am almost reluctant to summarise as these are not the most scientific of tests, but the things I am taking away from this exercise are:

  • SQL workload that is CPU bound will see a dramatic degradation in elapsed time when SQL Trace is enabled – I’ll be wary of this in the future
  • SQL workload that is IO bound will see less of an impact from SQL Trace, but the impact could still be significant.
  • When trace file size is large then the speed of the trace file target filesystem is important. In these tests I used /dev/null which is of course cheating.
  • When IO response is important to a test remember to set filesystemio_options=setall (see the sketch after this list). I wasted several evenings scratching my head at inconsistent, unpredictable test results.
  • Your mileage may vary
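On the filesystemio_options point, a sketch of setting it (it is not a dynamic parameter, so an instance restart is needed):

alter system set filesystemio_options = setall scope = spfile;
-- restart the instance for the change to take effect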

Feel free to point out mistakes in my train of thought or in the tests themselves. I’m not too happy with tests 1 & 2 – I’m not convinced they prove anything worthwhile. Tests 3 & 4 however have results I have found interesting and informative.

I think I’ll continue to tell Incident Managers that the overhead is around 5-10%, but only after I’ve given some careful thought to the criticality of the workload I want to trace.

Production Support Tips & Tricks #1 – Collecting Log Data

Early this year (2012) I started working on a presentation, my first, that I hoped to submit to UKOUG. The thrust of the presentation was to be tips on making your experiences with Oracle Support more pleasant, to help keep your support analyst busy rather than yourself. A prospective title was "with Support like this who needs enemies" – perhaps that's a bit strong ;-). Several things conspired to make it unlikely I would get to present it, so I faltered and things ground to a halt. After a period of inactivity I have decided to convert it into a short series of blog posts. This is the first. Part 2 is here – "Production Support Tips & Tricks #2 – SQL Trace"

This post contains some advice for collecting log data when raising SRs. It’s mostly obvious but hopefully not to all.

ADR Package

What?
You already know so I’m not going to waste my breath.

Why?
Get everything packaged up, not just the trace files you think Oracle needs. It avoids repeat requests.

How?
Well covered by others so I’m not going near it:
John Hallas quality UKOUG presentation
Uwe Hesse’s super blog entry

Example?
Nah – see above

Trivia
Not related to diagnostic collection, but listener targets don't auto-purge, so your housekeeping scripts need to call adrci to force a purge.

Diagcollection.sh for clusters

What?
diagcollection.sh is a script in your CRS home which collates all CRS related log files on the current cluster node.

Why?
It’s not easy manually collecting everything Oracle Support may require. This script makes it easy.

How?
There are several options; you can check them with "-h". Or just collect everything:

$ diagcollection.sh

Uncompressed, the resulting tar file can be very large:

-rw-r--r-- 1 grid oinstall 1.1G Feb 22 21:49 crsData_n02_20120222_2144.tar

Even compressed, the file can still be a lengthy upload to M.O.S (multiplied by the number of nodes):

-rw-r--r-- 1 grid oinstall 69M Feb 22 21:49 crsData_n02_20120222_2144.tar.gz

Trivia?
diagcollection.sh is just a wrapper for diagcollection.pl.

OS Watcher Black Box (OSWbb)

What?
A quote from the user guide:

a collection of UNIX shell scripts intended to collect and archive operating system and network metrics to aid support in diagnosing performance issues

Why?
“Because every vendor wants to blame another vendor and OSWbb helps that process”
or
“Because every issue is the fault of the database so you need ammunition to feed to your vendor”
or
“insert your own cynical quote here”

How?
Download from M.O.S – “OS Watcher Black Box User Guide [ID 301137.1]“. It is certified on AIX, Tru64, Solaris, HP-UX, Linux.

It is easy to run:

nohup ./startOSWbb.sh &

easy to stop

./stopOSWbb.sh

and easy to send

$ ./tarupfiles.sh
-rw-r--r-- 1 oracle oinstall 1.2M Feb  8 22:00 osw_archive_0208122216.tar.Z

Example?

You can install OSWbb as a Linux service – "How To Start OSWatcher Black Box Every System Boot [ID 580513.1]" – or use any scheduling tool. Alternatively you can control it via CRS; this way it is only active when the cluster is active, which has plus and minus points. For details see M.O.S note "Making Applications Highly Available Using Oracle Clusterware [ID 1105489.1]".

To do it you need an action script; there is a perfectly good demo one in "$GRID_HOME/crs/demo". Alternatively, the one I use for testing at home can be found here – osw.scr (use at your peril).

$GRID_HOME/bin/crsctl add resource osw -type ora.local_resource.type \
 -attr "AUTO_START=always,ACTION_SCRIPT=$GRID_HOME/crs/script/oswbb.scr"
$  $GRID_HOME/bin/crsctl status res osw
NAME=osw
TYPE=ora.local_resource.type
TARGET=ONLINE       , ONLINE
STATE=ONLINE on n01, ONLINE on n02

Trivia?
From “OS Watcher For Windows (OSWFW) User Guide [ID 433472.1]“:

OS Watcher for Windows is no longer supported.
It has been replace by the Cluster Health Monitor.

From “Cluster Health Monitor (CHM) FAQ [ID 1328466.1]“

Is the Cluster Health Monitor replacing OSWatcher?
…there [is] some information such as top, traceroute, and netstat that the Cluster Health Monitor does not collect, so running the Cluster Health Monitor while running OSWatcher is ideal. Both tools complement each other rather than supplement…

In my opinion another reason for still using OSWbb in spite of CHM is that CHM is very difficult to review yourself; it is also not yet the tool of choice for many within Oracle Support. OSWbb still has a place.

Quote from traceroute Unix man page by way of caveat:

Because of the load it could impose on the network, it is unwise to use traceroute during normal operations or from automated scripts.

Hmmmmm…..

“OS Watcher Black Box” was originally called “OS Watcher” but was renamed due to a clash of names with other unrelated, non-Oracle tool(s).

More to follow in the future

Mining the listener log with Perl

Update – 02-MAY-2012
I have had cause to use this script again at work so thought I’d develop it further. This post has been updated to reflect the new version.

Recently at work I was engaged by one of our application support teams because of application calls failing with "ORA-00020: maximum number of processes (300) exceeded". We quickly identified that these errors were the symptom of another problem and not a concern in themselves; the real issue was slow database calls due to an infrastructure problem. The ORA-00020 errors arose because the application was spawning new connections to try to work around the slow connections. Now I don't know much about application servers, but I know this is not uncommon. The application team, however, could not find any indication of this in their logs, so it was up to me to prove it.

As far as I am aware there are 2 options for retrospectively proving this at a granular level (please share any other options I’ve missed):

1) Query the database audit trail – DBA_AUDIT_SESSION
2) Mine the listener log

On this database no auditing was enabled so I was left with the listener log as my data source.
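For completeness, had session auditing been enabled, option 1 would have been something along these lines (a sketch: count logon attempts per minute from the audit trail):

select to_char(timestamp, 'DD-MON-YYYY HH24:MI') logon_minute
,      count(*)                                  logons
from   dba_audit_session
where  action_name = 'LOGON'
and    timestamp between to_date('04-APR-2012 11:00', 'DD-MON-YYYY HH24:MI')
                     and to_date('04-APR-2012 13:00', 'DD-MON-YYYY HH24:MI')
group  by to_char(timestamp, 'DD-MON-YYYY HH24:MI')
order  by 1;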

I knocked up a quick KSH/AWK script to get my data and then massaged it via various means. The output was quite pleasing, so over the recent long UK Easter holiday I Googled some Perl commands and had some fun with it; this post shares the script and an example chart from Microsoft Excel based on the incident I described above.

The script can be downloaded from – llog-miner.pl

Below is an example of how to call the script – it works on *nix and Windows. The "-f" parameter is used to filter log entries on the date; this example matches any minute between "11:00" and "12:59". I've included the full usage notes further down this post.

C:\>perl llog-miner.pl -excel -f "04-APR-2012 1[12]" listener.log  > llog.csv

This is the chart showing the logon spike – click to enlarge. You can see the spike at 12:56, with over 250 connection attempts in that minute.

[Chart: connection rate from listener log]

The script supports a few other options, such as "piping" listener logs into it so that multiple files can be processed in one command. Usage notes are below.

C:\>perl llog-miner.pl -help

    Usage: llog-miner.pl [-help] [-filter <string>] [-[no]excel] [-[no]host] <listener_log_path|->
         : -excel|noexcel       : Output dates in Excel friendly format (default is noexcel)
         : -help                : Print this help
         : -host|nohost         : Include host on program (default is to include)
         : -filter <string>     : A regexp to match beginning of log line. e.g. "23-DEC-2012 0[12]" or "..-MAR-2013"
         : listener_log_path|-  : Input either a valid listener log name or "-" to read STDIN

    Examples:
         # Process listener.log looking for entries on the 21st and 22nd of Sept and format dates for Excel
         C:\>perl llog-miner.pl -filter "2[12]-SEP-2011" -excel listener.log

         # Process two listener log files looking for entries on Jan 3rd between 20:00 and 21:59 and exclude the hostname
         $ cat listener.log listener_db.log | llog-miner.pl -filter "03-JAN-2012 2[01]" -nohost -

An improvement I may make in the future is to expand minutes where there are no connection attempts. I have drafted some code to fill in gaps but have kept it to myself for now as it’s too ugly to share (at the moment).

I hope someone finds this useful and if not, well I found it entertaining if nothing else :-)