Oracle 12c PDB snapshot copy on regular filesystem

A month ago I was at Paris Oracle meetup with 4 great presentations from 2 cool Swiss guys that I knew from their really good blogs :
– Ludovico Caldara
– Franck Pachot

After the first presentation by Ludovico Caldara about “Oracle ACFS and Oracle Database: the perfect match ?”, I almost ruined the presentation when I said that you don’t need ACFS, Direct-NFS (CloneDB) or other Copy-On-Write filesystem to do this as it is available on the latest Oracle 12c.. working on “conventional” filesystem.

So I promise to write about it.. It is a quite simple post, this is just to “advertise” the feature. I didn’t had the time to look more into it when I discover it. But I am sure, some people will look more at it.

Test : Oracle on RedHat 6.6 (filesystem ext4).
Pre-requisite : CLONEDB initialization parameter is set to TRUE

Show PDB information

---------- ---------- ---------- -------------------------------- ------------------------------ ---------- --- --------------------------------------------------------------------------- ---------- ---------- ---------- -------- ----------------------
 2 3587124922 3587124922 0ED34F67D0F81298E053ED22070A751B PDB$SEED READ ONLY NO 30-APR-15 PM +02:00 1594330 838860800 8192 ENABLED 0
 3 2362709155 2362709155 0ED37F0B74A72742E053ED22070A2DCA DB12C_PDB1 READ WRITE NO 30-APR-15 PM +02:00 1792203 2116812800 8192 ENABLED 0

Let’s use the PDB called DB12C_PDB1, and create a snapshot from it.

First get a consistent (Closed) PDB to use a source. Then re-opened it as Read-Only.

SQL> alter pluggable database exit close; 

SQL> alter pluggable database DB12C_PDB1 open read only; 

Now, let’s create the directory that will host the sparsed files for the snapshot copies PDB.

SQL> ! mkdir -p /u01/app/oracle/oradata/DB12C/DB12C_JCD1

Here we go, we create the snapshot copy !

SQL> create pluggable database DB12C_JCD1 from DB12C_PDB1 snapshot copy file_name_convert=('/u01/app/oracle/oradata/DB12C/DB12C_PDB1','/u01/app/oracle/oradata/DB12C/DB12C_JCD1');

Pluggable database created.

Then we open the target PDB.

SQL> alter pluggable database DB12C_JCD1 open;

Pluggable database altered.

SQL> alter session set container=DB12C_JCD1;

Now let’s check the datafiles on the ext4 filesystem and check the space actually used (sparse files).

SQL> @datafile %

Show Tablespace information for TBS matching %%%

 ID# filename                                                                   Tablespace              TotalMB     UsedMB     FreeMB  MaxsizeMB     Blocks    Extents Status     AutoExt       Incr
---- -------------------------------------------------------------------------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------- ----------
  81 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA01.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  82 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA02.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  83 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA03.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  84 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPINDEXES01.DBF                  APPINDEXES                  100          1         99      32767      12800          1 A          Y            65536
  85 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPINDEXES02.DBF                  APPINDEXES                  100          1         99      32767      12800          1 A          Y            65536
  74 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/sysaux01.dbf                      SYSAUX                      580        543         36      32767      74240          1 A          Y             1280
  73 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/system01.dbf                      SYSTEM                      410        405          4      32767      52480          2 A          Y             1280
  75 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/DB12C_PDB1_users01.dbf            USERS                         8          7          1      32767       1120          2 A          Y              160

13 rows selected.

Show Temporary Tablespace information for TBS matching %%%

 ID# filename                                                                         Tablespace              TotalMB     UsedMB     FreeMB  MaxsizeMB     Blocks    Extents Status     AutoExt       Incr Used
---- -------------------------------------------------------------------------------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------- ---------- ------------------------------
   4 /u01/app/oracle/oradata/DB12C/DB12C_JCD1/pdbseed_temp01DB12C.dbf        TEMP                         20         20          0      32767       2560          1 ONLINE     Y               80 |####################|

Let’s check the “actual” space used on the filesystem using “du”. You’ll notice that only a few kilobytes are used on the filesystem. Of course, it will increase as you modify your snapshot PDB !

SQL> ! du -k /u01/app/oracle/oradata/DB12C/DB12C_JCD1/*

16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA01.DBF
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA02.DBF
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPDATA03.DBF
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPINDEXES01.DBF
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/APPINDEXES02.DBF
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/DB12C_PDB1_users01.dbf
56      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/pdbseed_temp01DB12C.dbf
16      /u01/app/oracle/oradata/DB12C/DB12C_JCD1/sysaux01.dbf
172     /u01/app/oracle/oradata/DB12C/DB12C_JCD1/system01.dbf

Now let’s check the real space used on the original PDB.

SQL> alter session set container=DB12C_PDB1;

SQL> @datafile %

Show Tablespace information for TBS matching %%%

 ID# filename                                                                   Tablespace              TotalMB     UsedMB     FreeMB  MaxsizeMB     Blocks    Extents Status     AutoExt       Incr
---- -------------------------------------------------------------------------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------- ----------
  16 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA01.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  17 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA02.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  18 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA03.DBF                     APPDATA                     100          1         99      32767      12800          1 A          Y            65536
  19 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPINDEXES01.DBF                  APPINDEXES                  100          1         99      32767      12800          1 A          Y            65536
  20 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPINDEXES02.DBF                  APPINDEXES                  100          1         99      32767      12800          1 A          Y            65536
   9 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/sysaux01.dbf                      SYSAUX                      580        543         36      32767      74240          1 A          Y             1280
   8 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/system01.dbf                      SYSTEM                      410        405          4      32767      52480          2 A          Y             1280
  10 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/DB12C_PDB1_users01.dbf            USERS                         8          7          1      32767       1120          2 A          Y              160

13 rows selected.

Show Temporary Tablespace information for TBS matching %%%

 ID# filename                                                                Tablespace              TotalMB     UsedMB     FreeMB  MaxsizeMB     Blocks    Extents Status     AutoExt       Incr Used
---- ----------------------------------------------------------------------- -------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------- ---------- ------------------------------
   3 /u01/app/oracle/oradata/DB12C/DB12C_PDB1/pdbseed_temp01DB12C.dbf        TEMP                         20         20          0      32767       2560          1 OFFLINE    Y               80 |####################|

And on the filesystem.

! du -k /u01/app/oracle/oradata/DB12C/DB12C_PDB1/*
102412  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA01.DBF
102412  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA02.DBF
102412  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPDATA03.DBF
102412  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPINDEXES01.DBF
102412  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/APPINDEXES02.DBF
8968    /u01/app/oracle/oradata/DB12C/DB12C_PDB1/DB12C_PDB1_users01.dbf
1024    /u01/app/oracle/oradata/DB12C/DB12C_PDB1/pdbseed_temp01DB12C.dbf
593932  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/sysaux01.dbf
419852  /u01/app/oracle/oradata/DB12C/DB12C_PDB1/system01.dbf

That’s it for the demo, really simple. I am sure Ludovico Caldara will be eager to investigate more on the internals of this nice feature that not many people seem to know about.

Hope this helps.

Slow JDBC Connections, STRACE and Random Numbers

I was involved in an interesting problem in the office recently. The problem is already documented in other posts such as:

Oracle 11g JDBC driver hangs blocked by /dev/random – entropy pool empty
Oracle JDBC Intermittent Connection Issue
MOS note “ Hangs After Upgrade To (Doc ID 1585759.1)”

Essentially the problem is that /dev/random will block when there is insuffient entropy and JDBC uses /dev/random when negotiating a connection. I’ll not go back over information from the web pages above, please have a quick read of one of them.

What I wanted to do was demonstrate the issue using a Java program from the above MOS note and show evidence of the issue and delays. All the program does is connect to a database using a JDBC thin driver and “SELECT USER FROM dual”, I’ll run it three times as below:

time java oraConn &
time java oraConn &
time java oraConn &

and this is the output (“NJ” is my database account name):

real    0m2.001s
user    0m1.093s
sys     0m0.094s

real    0m39.051s
user    0m1.163s
sys     0m0.089s

real    0m44.316s
user    0m1.162s
sys     0m0.087s

2 seconds for the first connection and 39 and 44 seconds for the others! Not good. We can use “strace” to find out what the processes are doing during these delays.

strace -t -o runjv1.trc -f java oraConn &
strace -t -o runjv2.trc -f java oraConn &
strace -t -o runjv3.trc -f java oraConn &

This is what I see in my strace output for one of the delayed sessions, repeated timeouts, many more than displayed below:

tail -f runjv1.trc
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 520708960}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 571177374}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 621626926}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 672072270}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 722524322}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 772872653}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
2755  21:02:02 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:02:02 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3390, 823298750}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

If I scroll up to the start of the block of timeouts I see this:

2734  21:01:30 read(15,  <unfinished ...>
2755  21:01:30 <... futex resumed> )    = -1 ETIMEDOUT (Connection timed out)
2755  21:01:30 futex(0x7f1c440b1028, FUTEX_WAKE_PRIVATE, 1) = 0
2755  21:01:30 futex(0x7f1c440b1054, FUTEX_WAIT_BITSET_PRIVATE, 1, {3359, 26945244}, ffffffff) = -1 ETIMEDOUT (Connection timed out)

We see the “read” on file descriptor 15 executed by PID 2734 on line 1 above. What are we reading from when reading descriptor 15? Further up the trace file we see:

2734  21:01:30 open("/dev/random", O_RDONLY) = 15
2734  21:01:30 fstat(15, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 8), ...}) = 0
2734  21:01:30 fcntl(15, F_GETFD)       = 0
2734  21:01:30 fcntl(15, F_SETFD, FD_CLOEXEC) = 0

“/dev/random”! Nice, this agrees with the referenced posts.

We can monitor the available entropy by running this shell snippet during a re-test:

while [ 1 ];do
cat /proc/sys/kernel/random/entropy_avail
sleep 1

And this is what we see, the available entropy drops causing reads on /dev/random to block. There are a few solutions in the referenced blog and MOS notes.

Wed Aug 19 21:28:07 BST 2015
Wed Aug 19 21:28:08 BST 2015
Wed Aug 19 21:28:09 BST 2015
Wed Aug 19 21:28:10 BST 2015
Wed Aug 19 21:28:11 BST 2015
Wed Aug 19 21:28:13 BST 2015
Wed Aug 19 21:28:14 BST 2015
Wed Aug 19 21:28:15 BST 2015
Wed Aug 19 21:28:16 BST 2015
Wed Aug 19 21:28:17 BST 2015
Wed Aug 19 21:28:18 BST 2015
Wed Aug 19 21:28:19 BST 2015
Wed Aug 19 21:28:20 BST 2015
Wed Aug 19 21:28:21 BST 2015
Wed Aug 19 21:28:22 BST 2015
Wed Aug 19 21:28:23 BST 2015
Wed Aug 19 21:28:24 BST 2015
Wed Aug 19 21:28:25 BST 2015
Wed Aug 19 21:28:26 BST 2015

I fully appreciate I am going over work covered by others but I thought the “strace” analysis added to other posts. Also it doesn’t hurt to highlight this issue again as I’m really surprised this is not more widely known as the use of virtual machines is so prevalent these days. I presume connection pools hide this from us most of the time.

ORA-64359: INMEMORY clause may not be specified for virtual columns

The title above gives you immediate knowledge of the end result of this post. If you want to know the whys and wherefores then by all means read on.

I was on a call yesterday listening to a presentation on Database 12c new features. The In-Memory Column Store received a lot of interest, as did the new JSON functionality.

One person on the call asked an interesting question, it was something along the lines of:

“If we have a table containing a JSON document can we use the In-Memory Column Store to optimise reports on the JSON attributes”

After some discussion it was decided that the best way to do that is to store the required JSON attributes in dedicated regular columns and report on those. After the call I thought about this some more and wondered if we can:

a) Expose JSON attributes as virtual columns and…
b) Utilise the In-Memory Column Store to report on those virtual columns

I thought these tests were worth executing. Here they are:

a) Can we expose JSON attributes as virtual columns

In order to set up the test I create a table with a JSON column and a virtual column using the function JSON_VALUE to retrieve a scalar value for an attribute called “Name”.

(   id          NUMBER (10) NOT NULL
,   date_loaded DATE
,   doc         CLOB
    CONSTRAINT ensure_json CHECK (doc IS JSON)
,   doc_name AS (JSON_VALUE(doc,'$.Name'))

Table created.

SQL> desc json_docs
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER(10)
 DATE_LOADED                                        DATE
 DOC                                                CLOB
 DOC_NAME                                           VARCHAR2(4000)

Looks promising. Let’s add some data.

insert into json_docs (id, date_loaded, doc)
values (1,sysdate,'{"Name": "Neil", "Email": "", "Job": "DBA"}');
insert into json_docs (id, date_loaded, doc)
values (2,sysdate,'{"name": "Bob", "Email": "", "Job": "DBA"}');

set lines 120
column doc format a60
column doc_name format a10
select * from json_docs;

 ID DATE_LOAD DOC                                                          DOC_NAME
--- --------- ------------------------------------------------------------ --------
  1 13-NOV-14 {"Name": "Neil", "Email": "", "Job": "DBA"}    Neil
  2 13-NOV-14 {"name": "Bob", "Email": "", "Job": "DBA"}

And there we have it. When the attribute “Name” is present at the top level of the JSON document then we can show it as a dedicated virtual column. I’m not suggesting this is a good idea but… it works.

b) Can we utilise the In-Memory Column Store to report on virtual columns

First I’ll add more data to the JSON_DOCS table.

delete json_docs;
insert into json_docs (id,date_loaded,doc)
select rownum, sysdate, '{"Name": "'||object_name||'", "Type": "'||object_type||'"}'
from all_objects
where rownum <= 500;

And set the table to be INMEMORY, excluding the JSON document column.

SQL> show parameter inmemory_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_size                        big integer 252M


Next query the data to kick of a load to the “In-Memory Area”

select count(*) from json_docs;
column segment_name format a30
select segment_name,populate_status,bytes_not_populated from v$im_segments order by 1;

------------------------------ --------- -------------------
JSON_DOCS                      COMPLETED                   0

That looks promising. So let’s check which columns are cached.


--------------- ---------------- ---------------------
JSON_DOCS       ID               DEFAULT
JSON_DOCS       DOC              NO INMEMORY

Not the virtual column… Can we force it:

ALTER TABLE json_docs INMEMORY (date_loaded, doc_name) NO INMEMORY (doc) ;

ERROR at line 1:
ORA-64359: INMEMORY clause may not be specified for virtual columns

Nope. And probably for good reasons.


This post is part of a series on why sometimes we don’t need to pack data in, we need to spread it out. In the previous post I discussed some old-school methods to achieve this. This post talks about a technique that has been available for many Oracle versions so is also pretty old school but I thought it was worth a post all to itself. The technique uses the ALTER TABLE clause “MINIMIZE RECORDS_PER_BLOCK”.



I have included a link to Oracle documentation above but one of the first things to highlight is this feature is light on official documentation. In fact the above page and a brief reference in the SQL Language Quick Reference are just about all there is. Here is a quote from the above link, the emphasis is mine.

“The records_per_block_clause lets you specify whether Oracle Database restricts the number of records that can be stored in a block. This clause ensures that any bitmap indexes subsequently created on the table will be as compressed as possible.”

There is no example in the official documentation but there is a very nice example on Richard Foote’s blog titled “Bitmap Indexes & Minimize Records_Per_Block (Little Wonder)”, and he works for Oracle so perhaps it counts as official documentation!

In the documentation above we have a mention of “restricts the number of records that can be stored in a block” but no clear guidance that it can be used for the reason we desire, to spread out data. If I can’t find what I want in the Oracle documentation my next port of call is My Oracle Support and there is a note describing how we might use this clause for our purpose.

Oracle Performance Diagnostic Guide (OPDG) [ID 390374.1]

In essence the note describes a “hot block” issue and a solution which is to spread data out. Suggested methods are to utilise PCTFREE or the table option, MINIMIZE RECORDS_PER_BLOCK. The note also goes through steps to highlight how to achieve our goal. Which leads us on to the next section.

How To [Mis]use MINIMIZE RECORDS_PER_BLOCK To Spread Out Data

The basic set of steps when using this clause to spread out data is:

  1. Temporarily remove data from the table, typically with CTAS or Datapump/Export followed by truncate or delete
  2. Insert the desired number of dummy records any data block should hold
  3. Restrict the number of records that can be stored in any block to the maximum number currently held
  4. Delete the dummy rows
  5. Reinstate the application data

Here is an example.

In the first post in the series I introduced the PROCSTATE table with all rows of data in a single block.

select dbms_rowid.rowid_block_number(rowid) blockno
,      count(*)
from procstate
group by dbms_rowid.rowid_block_number(rowid);
---------- ----------
    239004         12

First let’s copy the data elsewhere and truncate the table.

SQL> create table procstate_store
  2  as select * from procstate;

Table created.

SQL> truncate table procstate;

Table truncated.

Next we insert a dummy row and restrict the number of rows any block will hold.

SQL> insert into procstate
  2  select * from procstate_store
  3  where rownum = 1;

1 row created.

SQL> commit;

Commit complete.

SQL> select count(*) from procstate;


SQL> alter table procstate minimize records_per_block;

Table altered.

Finally we delete the dummy row and re-insert the original data.

SQL> truncate table procstate;

Table truncated.

SQL> insert into procstate
  2  select * from procstate_store;

12 rows created.

SQL> commit;

Commit complete.

And we should have only one row per block right? Wrong!

SQL> select dbms_rowid.ROWID_BLOCK_NUMBER(rowid) blockno, count(*)
  2  from procstate group by dbms_rowid.ROWID_BLOCK_NUMBER(rowid);

---------- ----------
    278668          2
    278669          2
    278670          2
    278667          2
    278688          2
    278671          2

6 rows selected.

Two rows in each block… but that’s still a good result and using the same test as in the previous post I can see a huge reduction in contention when running my primitive test case. Original results on the left and new results with two rows per block on the right.

Contention  11g normal heap table chart-11g-mrpb

Under The Covers

In this section we’ll dig a bit deeper in to how this works and perhaps get some insight into why the example above resulted in two rows per block.

When MINIMIZE RECORDS_PER_BLOCK is used it manipulates a property in SYS.TAB$ in the SPARE1 column. This property is known as the Hakan Factor (no I don’t know why either but I do notice there is a Mr Hakan Jacobsson listed as an author of the Performance Tuning Guide and Data Warehousing Guide… who knows). Below is a query showing the Hakan Factor for a simple table stored in a tablespace using 8KB blocks.

select spare1 from$ where obj# = 18013;


The Hakan Factor is set by default for all heap tables, or more correctly, for all table segments of heap tables. Below is a table showing how it changes as block size changes. It makes sense that larger blocks can hold more rows.

----------  ------
       4KB     364
       8KB     736
      16KB    1481
      32KB    2971

After a minimize operation with only a single row in a table it would be reasonable to expect SPARE1 to be set to “1”. So let’s check the value stored after a MINIMIZE operation on the PROCSTATE table.


This is because the MINIMIZE operation sets a flag in the 16th bit of the Hakan factor. We can see this using the BITAND SQL function in a query like the one below. This query uses BITAND to check if a specific bit is set in a number. So the increasing powers of 2 have been passed in. I have then used the LEAST() or GREATEST() functions to convert the result to a “1” or “0”.

select spare1
,      least(1,BITAND(spare1,32768)) c32k
,      least(1,BITAND(spare1,16384)) c16k
,      least(1,BITAND(spare1,8192)) c8k
,      least(1,BITAND(spare1,4096)) c4k
,      least(1,BITAND(spare1,2048)) c2k
,      least(1,BITAND(spare1,1024)) c1k
,      least(1,BITAND(spare1,512)) c512
,      least(1,BITAND(spare1,256)) c256
,      least(1,BITAND(spare1,128)) c128
,      least(1,BITAND(spare1,64)) c64
,      least(1,BITAND(spare1,32)) c32
,      least(1,BITAND(spare1,16)) c16
,      least(1,BITAND(spare1,8)) c8
,      least(1,BITAND(spare1,4)) c4
,      least(1,BITAND(spare1,2)) c2
,      greatest(0,BITAND(spare1,1)) c1
where obj# = (select obj# from sys.obj$
              where name = 'PROCSTATE');

-- After MINIMIZE with 1 row in PROCSTATE                                 ( decimal 1 )
SPARE1 C32K C16K  C8K  C4K  C2K  C1K C512 C256 C128  C64  C32  C16   C8   C4   C2   C1
------ ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
 32769    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

-- After MINIMIZE with 2 rows in PROCSTATE                                ( decimal 1 )
 32769    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

-- After MINIMIZE with 3 rows in PROCSTATE                                ( decimal 2 )
 32770    1    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0

-- After MINIMIZE with 4 rows in PROCSTATE                                ( decimal 3 )
 32771    1    0    0    0    0    0    0    0    0    0    0    0    0    0    1    1

-- After MINIMIZE with 5 rows in PROCSTATE                                ( decimal 4 )
 32771    1    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0

-- etc etc

Notice the 32K flag is set for all numbers and the lower bits specify the row limit. There is an anomaly for a single row and then the values lag one behind the desired row limit. For example with three rows in the table the Hakan factor has the MINIMIZE flag in the 16th bit and binary “10” (decimal 2) stored. So with a default Hakan factor of 736 the limit is actually set at 737 rows. I can only think that the anomaly where 1 row and 2 rows have the same value stored are down to this feature being in place to optimise bitmap indexes and the difference between one and two rows is irrelevant. Or perhaps storing a zero was ugly and a compromise was made, we’ll never know.

It is worth noting at this point that SPARE1 is a multi use column and a COMPRESS operation will set a flag in the 18th bit. I have not seen a case where the 13th – 15th bits are used. Perhaps they are left for future proofing MINIMIZE from block sizes above 32KB (should Oracle ever decide to implement them). Anyway, back on track…

When “MINIMIZE RECORDS_PER_BLOCK” is executed the Oracle process full scans the table and uses the undocumented SYS_OP_RPB() function to retrieve the row number of every row within its block. e.g.

select max(sys_op_rpb(rowid)) from procstate;

The SYS_OP_RPB() function returns the same result as the documented DBMS_ROWID.ROWID_ROW_NUMBER() function. Output from a table containing four records is below:

select rownum
, dbms_rowid.rowid_relative_fno(rowid) file_no
, dbms_rowid.rowid_block_number(rowid) blk_no
, dbms_rowid.rowid_row_number(rowid) row_no
, sys_op_rpb(rowid) row_rno2
from haktest;

---------- ---------- ---------- ---------- ----------
         1         11      18676          0          0
         2         11      18676          1          1
         3         11      18676          2          2
         4         11      18676          3          3

Notice the numbering starts at 0, this explains why the value stored in TAB$.SPARE1 is one less to our eyes than the actual limit. This is true of a block dump too, the numbering of rows in a data block starts at 0.

Finally, to return a table to its default setting the ALTER TABLE NOMINIMIZE clause is used.


In summary the MINIMIZE clause is not very well documented and definitely not very well exposed in the data dictionary. It is intended for use with bitmap indexes but can be also be used to spread data out in a similar way to PCTFREE but by managing the number of rows rather than amount of free space. I have used MINIMIZE in the past but now that I am older and less excitable I would probably stick to other methods of achieving my goals (e.g. PCTFREE or partitioning).

In the next post in the series we’ll look at some newer ways we can spread out data.

Spreading Out Data – Old School Methods

In the previous post in this series we introduced a crude test case in which we had data packed in to a single block which was causing severe buffer contention. To re-cap – here is a chart of ASH data demonstrating the issue, the large grey stripe is time wasted on global cache contention.

Contention  11g normal heap table

This post is to cover some “old school” methods that can be used to spread out the data and hopefully reduce contention. I have purposefully ignored the possibility of a design flaw in the code, for the purposes of this series we are assuming the code is in good shape.

Node Affinity

Nothing to do with spreading data out but an obvious remedy for global cache contention by managing where the workload runs. I’m not going to discuss this too much as it doesn’t protect us from “buffer busy waits” within a single instance, however in some cases node affinity would be the correct solution. I have included a chart from the same test case as before but usilising only a single RAC node. You can see that the global cache contention (grey) has gone but we do still have a little “buffer busy wait” contention (red).

Contention 11g nodeaffinity


Oracle documentation: PCTFREE

This option doesn’t really need any explanation but here we create the test table with the clause “PCTFREE 99” and reload the data.

create table procstate
) pctfree 99;

And the data is now spread out.

select dbms_rowid.rowid_block_number(rowid) blockno
, count(*)
from procstate
group by dbms_rowid.rowid_block_number(rowid);

---------- ----------
    239021          2
    239040          2
    239022          2
    239020          2
    239019          2
    239023          2

The test case row size is quite small, approximately 0.5% of the free space in a block, so we are left with two rows in each block. Looking at ASH data from the test case shows that contention is much reduced, a nice solution.

Contention 11g pctfree

Single Table Hash Cluster

Oracle documentation: Single Table Hash Cluster

This option didn’t occur to me initially. I tend to avoid Hash Clusters but when I presented on this topic at UKOUG Tech13 a leading light in the Oracle Community suggested this would also be a good solution. So here it is.

The test table has been created in a Hash Cluster stating the size of a cluster of records as 8000 bytes. This is a cheat to ensure each cluster of records ends up in its own block. We are clustering on the primary key column so each block will contain only a single row.

create cluster procstate_cluster
(proc_id number(3)) 
size 8000 single table hashkeys 100;

create table procstate
cluster procstate_cluster(proc_id);

Before loading the data we already have 110 empty blocks. This is because we have configured the cluster to have 100 hash keys and, as stated above, ensured each key maps to a different block. Oracle then rounds up the number of cluster keys to the next prime number (109) and the extra block is the table header.


After loading the test data we see there is only a single row in each block.

---------- ----------
    172398          1
    172361          1
    239067          1
    172402          1
    172365          1
    239079          1
    172389          1
    172352          1
    239075          1
    172356          1
    239071          1
    172394          1

Running the test case again shows the contention has vanished completely. All sessions spend their time on CPU doing useful work.

Contention 11g Hash Cluster

Another nice if somewhat obscure solution.

Minimize Records Per Block

Speaking of obscure solutions we come to the “MINMIZE RECORDS_PER_BLOCK” clause, we’ll save that for the next installment as I have quite a lot of material to cover.

Securing Oracle DB Accounts With Default Passwords

One view I didn’t know about until recently is DBA_USERS_WITH_DEFPWD. This view appeared in 11g but it obviously passed me by. The reason it cropped up recently was a requirement to ensure that the default accounts in an Oracle database were not left with default passwords, regardless of their account status. In order to achieve this I knocked up a quick snippet of PL/SQL which could be added to automation scripts and therefore tick another box on the audit checklist. The code specifically doesn’t output the passwords to avoid leaving them in log files. I thought it was worth sharing here.

set serveroutput on
  for i in (	select 'alter user '||u.username||' identified by '
                     ||dbms_random.string('a', 10)||'_'||trunc(dbms_random.value(1,99)) cmd
                     , username
                from sys.dba_users_with_defpwd u
                where username <> 'XS$NULL')
    dbms_output.put_line('Securing '||i.username||'...');
    execute immediate i.cmd;
  end loop;

And the output

SQL> set serveroutput on
SQL> begin
  2    for i in (    select 'alter user '||u.username||' identified by '
  3                       ||dbms_random.string('a', 10)||'_'||trunc(dbms_random.value(1,99)) cmd
  4                       , username
  5                  from sys.dba_users_with_defpwd u
  6                  where username <> 'XS$NULL')
  7    loop
  8      dbms_output.put_line('Securing '||i.username||'...');
  9      execute immediate i.cmd;
 10    end loop;
 11  end;
 12  /
Securing GSMUSER...
Securing MDSYS...
Securing OLAPSYS...
Securing LBACSYS...
Securing ORDDATA...
Securing ORDSYS...
Securing DVF...
Securing SYSDG...
Securing APPQOSSYS...
Securing WMSYS...
Securing GSMCATUSER...
Securing OJVMSYS...
Securing SYSTEM...
Securing XDB...
Securing CTXSYS...
Securing ORACLE_OCM...
Securing MDDATA...
Securing ORDPLUGINS...
Securing DVSYS...
Securing DBSNMP...
Securing SYS...
Securing SYSKM...
Securing DIP...
Securing ANONYMOUS...
Securing AUDSYS...
Securing SYSBACKUP...
Securing OUTLN...

PL/SQL procedure successfully completed.

And a second time, there is nothing to do

SQL> /

PL/SQL procedure successfully completed.

The snippet could be changed to add “ACCOUNT LOCK” if required. Though beware locking SYS on and above:

ORA-28000: The Account Is Locked When Log In As SYS User Remotely While SYS User Was Locked (Doc ID 1601360.1)

Sometimes we don’t need to pack data in, we need to spread it out

I’ve gotten away with doing a presentation called “Contentious Small Tables” (download here) for the UKOUG three times now so I think it’s time to retire it from duty and serialise it here.

The planned instalments are below, I’ll change the items to links as the posts are published. The methods discussed in these posts are not exhaustive but hopefully cover most of the useful or interesting options.

  1. Sometimes we don’t need to pack data in, we need to spread it out (this post)
  2. Spreading out data – old school methods
  3. Spreading out data – with minimize records per block (also old school)
  4. Spreading out data – some partitioning methods
  5. Spreading out data – some modern methods

This post is the first instalment.

Sometimes we don’t need to pack data in, we need to spread it out

As data sets continue to grow there is more and more focus on packing data as tightly as we can. For performance reasons there are cases where we DBAs & developers need to turn this on its head and try to spread data out. An example is a small table with frequently updated/locked rows. There is no TX blocking but instead contention on the buffers containing the rows. This contention can manifest itself as “buffer busy waits”, one of the “gc” variants of “buffer busy waits” such as “gc buffer busy acquire” or “gc buffer busy release” or maybe on “latch: cache buffers chains”.

You’ll probably come at a problem of this nature from a session perspective via OEM, ASH data, v$session_event or SQL*Trace data. Taking ASH data as an example you can see below that for my exaggerated test, global cache busy waits dominate the time taken and, key for this series of posts, the “P1” and “P2” columns contain a small number of values.

column event format a35
select * from (
 select NVL(event,'CPU') event
 , count(*) waits
 , p1, p2
 from gv$active_session_history
 where sample_time between 
       to_date(to_char(sysdate,'DD-MON-YYYY')||' &from_time.','DD-MON-YYYY HH24:MI')
   and to_date(to_char(sysdate,'DD-MON-YYYY')||' &to_time.','DD-MON-YYYY HH24:MI')
 and module = 'NJTEST'
 group by NVL(event,'CPU'), p1, p2
 order by 2 desc
) where rownum <= 5;

EVENT                   WAITS    P1      P2
---------------------- ------ ----- -------
gc buffer busy acquire   1012     5  239004
gc buffer busy release    755     5  239004
gc current block busy     373     5  239004
CPU                        65     0       0

For the wait events above “P1” and “P2” have the following meanings:

select distinct parameter1, parameter2
from v$event_name
where name in ('gc buffer busy acquire'
              ,'gc buffer busy release'
              ,'gc current block busy');

-------------- --------------
file#          block#

So above we have contention on a single buffer. From ASH data we can also get the “SQL_ID” or “CURRENT_OBJ#” in order to identify the problem table. Below we show that all rows in the test table are in a single block – 239004.

select dbms_rowid.rowid_block_number(rowid) blockno
, count(*)
from procstate
group by dbms_rowid.rowid_block_number(rowid);

---------- ----------
    239004         12

After discovering this information the first port of call is to look at the SQL statement, execution plan and supporting code but assuming the application is written as best it can be then perhaps it’s time to look at the segment and data organisation.

In order to help illustrate the point of this series here is a chart showing contention in ASH data from a crude test case. The huge grey stripe being “gc” busy waits and the thin red stripe being standard “buffer busy waits”.

Contention  11g normal heap table

The next post in this series will look at some “Old School” methods of reducing this contention.

How are Oracle background processes renamed?

We all know that when an Oracle instance starts, the background processes are all (or perhaps mostly?) copies of the same executable – $ORACLE_HOME/bin/oracle.

We can see this using the “args” and “comm” format options of the Unix “ps” command. Below we can see the name of the SMON process (“args”) and the name of the source executable (“comm”):

$ ps -p 6971 -o args,comm
COMMAND                     COMMAND
ora_smon_orcl1              oracle

This is common knowledge and when in the past I’ve wondered how this works I’ve always just accepted this Oracle magic and moved on to more important thoughts such as “what’s for lunch?”. Well today my belly is full and I have time to let such thoughts develop!

It turns out this is a quite simple trick. If you have a basic knowledge of the C programming language you’ll be familiar with the argc and argv parameters. e.g.

int main (int argc, char *argv[])

argc – the number of command line arguments
argv – an array of pointers to the command line arguments with argv[0] being the process name.

Surprisingly the values of this array are modifiable. So we can manipulate the name of a running process with a simple sprintf() command (other options are available). e.g.

sprintf(argv[0], "renamed");

My description of this is high level and I’m sure there are risks in doing this, such as overflowing argv[0] into argv[1], but by way of a simple example here is a demo C program:

#include <stdio.h>
void main (int argc, char *argv[])
  int pid;
  pid = (int) getpid ();
  printf("My name is '%s'\n",argv[0]);
  printf("Sleeping for 10 seconds (quick - 'ps -p %d -o args,comm')...\n",pid);
  sprintf(argv[0], "renamed");
  printf("My name is now '%s'\n",argv[0]);
  printf("Sleeping for 10 seconds (quick - 'ps -p %d -o args,comm')...\n",pid);

This program sleeps so I can capture process details from another session using “ps”, renames the running process and then sleeps again so I can capture more details. So let’s compile the C program and see what happens when I run it from one session and capture ps details from another session:

Session 1

$ cc newname.c -o newname
$ ./newname
My name is './newname'
Sleeping for 10 seconds (quick - 'ps -p 7930 -o args,comm')...
My name is now 'renamed'
Sleeping for 10 seconds (quick - 'ps -p 7930 -o args,comm')...

Session 2

$ ps -p 7930 -o args,comm
COMMAND                     COMMAND
./newname                   newname
$ ps -p 7930 -o args,comm
COMMAND                     COMMAND
renamed e                   newname

Pretty cool. Notice how my renamed process is now called “renamed e”. The trailing “e” is left over from the original name of “./newname” thus proving it’s not quite as simple as I suggest and that my C skills are basic. None-the-less I think this is pretty cool.

How Eter Pani became Oracle Certified Master

I got my OCM. Hurray!
Now one of the common question people ask me is how to pass this OCM exam.
Firstly I am very grateful to Oracle to introduce this exam. Finally it is the real exam, but not test with set of answers. I clearly see complicatedness for examiners, but it is the real proof of DBA skills, not general memory skills.
Secondly this exam does not prove your exceptional knowledge of Oracle. It is the proof that you are fluent in all basic skills. During the exam everything works according to documentation. Personally I have collection of my favourite Metalink Notes and Advanced instructions that are used at least once a week. You do not need such things on exam.
The environment is ready. You do not need to reconfigure switches for RAC or install additional OS packages. What you are doing is only Oracle Database Administration, no OS specific. If something is really different between different UNIX OS, forget about it. It should not be part of the exam.
When I came to client site I frequently have no access to internet thus have a copy of Oracle Documentation locally. Moreover reading through local copy is actually faster then browsing through internet copy. I used to use local copy and it was really useful during exam.
Another my habit that I find useful is preparing all scripts in text file and then copy it to SQL*Plus or Shell window. If you need to rerun script or slightly alter it for a different skill-set you can reuse your one history. E.g. I store in this file backup/restore scripts.
You have 2 days, 14 hours, including lunch and breaks for 7 skill-sets. None of skill-set takes more then 2 hours. If you do not believe you can do something in less then 2 hours forget about it. Even if it would be on exam you would not be able to do it in time. Focus on things that you would be able to do.
The exam is based on 11.2g database. If something is different between patch sets again forget about it. Asking information specific for patch set is unfair to people who used to basic one, thus this question would not arrive on exam.
When you read through skill-set task at the beginning, read it up to the end. Mark for yourself tasks that would require you some investigation through the documentation. Mark for yourself tasks that you doubt to solve. Estimate time for each task. Start from the short and easy one and if you significantly overflow the time frame you set switch to the next task in your ordered list. If you have time you can came back and fix the issues later.
I recommend 15 minutes before end of skill-set to check the whole environment, there is special button for end state of the skill-set. 15 minutes should be enough to bring it to correct state.
Read tasks carefully, frequently tasks include markers how to avoid hidden rocks of the environment, e.g. check all options of the objects to create. If you would not follow it exactly the problems would make your life significantly harder.
Some tasks are not clear, you can ask your proctor for clarification. But proctor not always can rephrase task without violation of exam rules, if he could not provide explanation what is requested in a task follow “best practice”.
In general be concentrated, careful and have a lot of practice before exam. I passed preparation courses but honestly it was just way to guarantee time and environment for training. You can do preparation yourself if your management and family would grant the opportunity to do it. If you have no such generous option apply for a preparation course, it is really value for money, very useful and informative. Course provide to you experience of working on the same machines that you would use on exam. In my case the machines was really physically the same, just different OS image. BTW try to became used to local keyboards of the country where you are going to pass the exam. English and US keyboards are different and this difference can be that point which consume the vital time on exam.
Good Luck.

UKOUG Database Server SIG (Leeds 2013)

On Thursday I attended the UKOUG Database Server SIG in Leeds. All slides have been uploaded to the UKOUG website.

It’s the first SIG I’ve attended this year and after enjoying it so much I ought to try harder to get to other events. We had Oak Table presence from David Kurtz talking all things partitioning/compression and purging, two Oracle employees discussing first support and then ZFS/NetApp (I particularly enjoyed this one) and then Edgars Rudans on his evolution of the Exadata Minimal Patching process. All of these presentations are well worth downloading and checking out.

The last presentation of the day was me. I’ve never presented before and it took a big step out of my comfort zone to get up there but I’m so glad I did. I would recommend it to anyone currently spending time in the audience thinking “I wish I had the confidence to do that”. It’s nerve racking beforehand but exhilarating afterwards.

When blogging in the past I’ve liked how it makes you think a little bit harder before pressing the publish button. I think the best thing I got out of the whole presentation process was that it made me dig even deeper to make sure I’d done my homework.

After the SIG there was a good group headed out for Leeds Oracle Beers #3 which involved local beer, good burgers and Morris Dancing, all good fun.