cell smart file creation

One of the offload capabilities you get with Exadata that I’ve been appreciating recently is “cell smart file creation”. In a nutshell, the fact that formatting of blocks is offloaded to the cells rather than being performed by a process on the database server, or “compute node” if you want to use Exadata speak, makes the creation of datafiles faster.

As as quick demonstration look at the elapsed time for the creation of a 100G datafile. First with _cell_fast_file_create at the default value of TRUE and then set to FALSE in order to disable the offloading of the datafile creation:

SQL> create bigfile tablespace martin01 datafile '+DATA01' size 100G;

Tablespace created.

Elapsed: 00:01:40.39
SQL> alter session set "_cell_fast_file_create" = false;

Session altered.

Elapsed: 00:00:00.00
SQL> create bigfile tablespace martin02 datafile '+DATA01' size 100G;

Tablespace created.

Elapsed: 00:04:15.17

101 secs with offload vs. 255 secs without represents a saving of 60%. Examining the wait events associated with the 2 different datafile creations we see “cell smart file creation” for the offloaded file creation and “Data file init write” for the non-offloaded file creation.

Great, creating datafiles is faster on Exadata… But what about redo logs?

OK, I take the point that redo log files aren’t generally created with the same regularity as datafiles, but one of the things I’ve been working on recently has involved a “roll your own” DUPLICATE FROM ACTIVE DATABASE which itself involves creating a reasonable number of redo logs (4G log files, 4 redo groups, 8 threads = waiting). I don’t like waiting, especially when I’ve been asked to make something go as fast as I can, so I thought I’d have a little look and confirm that creation of redo logs is not offloaded…

Tracing the creation of 2 x 4G redo logs revealed the following:

SQL ID: 3r832sm0yj1gv Plan Hash: 0

alter database add logfile thread 1 group 99 ('+DATA01','+RECO') size 4096m
  blocksize 512

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      3.81      37.96          0          0          2           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      3.81      37.96          0          0          2           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 36

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  enq: FS - contention                            2        0.00          0.00
  control file sequential read                   58        0.05          0.13
  KSV master wait                                20        1.71          3.27
  ASM file metadata operation                     6        0.00          0.00
  kfk: async disk IO                              4        0.00          0.00
  Disk file operations I/O                        4        0.00          0.00
  Log file init write                          4381        0.13         31.35
  enq: CF - contention                            1        0.00          0.00
  control file parallel write                    14        0.01          0.03
  log file single write                           2        0.00          0.00
  ges inquiry response                            1        0.00          0.00
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************

38 seconds might not be long, but when you consider the number of redo log groups and number of redo threads it soon adds up. And when you have 7 databases to do this for, the opportunity to save 60% of the time in this phase starts to look attractive. Add to this that ALTER DATABASE OPEN RESETLOGS spends the vast majority of its time waiting on “Log file init write” and I can’t help feeling that “cell smart log creation” would be a nicer event to be waiting on :-)

I realise that it’s probably way down on the list of enhancements, but I for one would appreciate it.

A Session Busy Doing What? – Work of Course!

I was recently called out to look at a batch job exceeding it’s usual execution time. Ordinarily a quick look at the wait events would tell me what I needed to know, however due to a poor execution plan (caused by incorrect statistics) the query nicely highlighted how not all work can be classified as a wait. Below is a reproduction of the scenario I’m describing.

Create some test tables.

CREATE TABLE tab1 AS
SELECT ROWNUM rn, 'xxxxx' pad1, 'yyyyy' pad2
  FROM dual CONNECT BY LEVEL <= 25000;

CREATE TABLE tab2 AS
SELECT ROWNUM rn, 'xxxxx' pad1, 'yyyyy' pad2
  FROM dual CONNECT BY LEVEL <= 25000;

exec dbms_stats.gather_table_stats(null,'tab1');
exec dbms_stats.gather_table_stats(null,'tab2');

A test query is below, it has been hinted to ensure it repeatedly scans the same blocks in memory.

SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*)
FROM 	tab1
INNER JOIN tab2 ON tab1.rn = tab2.rn;

------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    10 |       |   477K  (6)| 01:35:30 |
|   1 |  SORT AGGREGATE     |      |     1 |    10 |       |            |          |
|*  2 |   NESTED LOOPS      |      | 24859 |   242K|   477K|   477K  (6)| 01:35:30 |
|   3 |    TABLE ACCESS FULL| TAB1 | 24918 |   121K|       |    21   (5)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| TAB2 |     1 |     5 |       |    19   (6)| 00:00:01 |
------------------------------------------------------------------------------------

First off run the query to get it parsed and charge the buffer cache and then reset the session and run it again.

-- may want to ctrl^c once this has run for a few seconds
SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*)
FROM 	tab1
INNER JOIN tab2 ON tab1.rn = tab2.rn;

-- conn fresh session
conn /

SELECT sid FROM v$mystat WHERE ROWNUM = 1;

SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*)
FROM 	tab1
INNER JOIN tab2 ON tab1.rn = tab2.rn;

From another session check what the original session is up to:

SELECT sid, status, event, state 
FROM v$session 
WHERE sid = &sid_of_other_session;

       SID STATUS   EVENT                               STATE
---------- -------- ----------------------------------- -------------------
       149 ACTIVE   SQL*Net message to client           WAITED SHORT TIME

SELECT event, time_waited 
FROM v$session_event 
WHERE sid = &sid_of_other_session;

EVENT                               TIME_WAITED
----------------------------------- -----------
SQL*Net message to client                     0
SQL*Net message from client                1111

No wait events are being recorded yet “top” shows the session to be the number 1 session on the box with 78% of CPU resources.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                            
28008 oracle    25   0  344m  20m  18m R 78.3  2.7   5:57.28 oracle      

The missing link is that an Oracle instance does not count logical IO as a wait – it is work. V$SESSTAT is where we have visibility of this work.

SELECT n.name, s.value 
FROM v$sesstat s, v$statname n
WHERE s.sid = &sid_of_other_session 
AND n.statistic# = s.statistic#
AND n.name = 'consistent gets'
ORDER BY s.value;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                     1975123