I was recently called out to look at a batch job exceeding it’s usual execution time. Ordinarily a quick look at the wait events would tell me what I needed to know, however due to a poor execution plan (caused by incorrect statistics) the query nicely highlighted how not all work can be classified as a wait. Below is a reproduction of the scenario I’m describing.
Create some test tables.
CREATE TABLE tab1 AS SELECT ROWNUM rn, 'xxxxx' pad1, 'yyyyy' pad2 FROM dual CONNECT BY LEVEL <= 25000; CREATE TABLE tab2 AS SELECT ROWNUM rn, 'xxxxx' pad1, 'yyyyy' pad2 FROM dual CONNECT BY LEVEL <= 25000; exec dbms_stats.gather_table_stats(null,'tab1'); exec dbms_stats.gather_table_stats(null,'tab2');
A test query is below, it has been hinted to ensure it repeatedly scans the same blocks in memory.
SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*) FROM tab1 INNER JOIN tab2 ON tab1.rn = tab2.rn; ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 10 | | 477K (6)| 01:35:30 | | 1 | SORT AGGREGATE | | 1 | 10 | | | | |* 2 | NESTED LOOPS | | 24859 | 242K| 477K| 477K (6)| 01:35:30 | | 3 | TABLE ACCESS FULL| TAB1 | 24918 | 121K| | 21 (5)| 00:00:01 | | 4 | TABLE ACCESS FULL| TAB2 | 1 | 5 | | 19 (6)| 00:00:01 | ------------------------------------------------------------------------------------
First off run the query to get it parsed and charge the buffer cache and then reset the session and run it again.
-- may want to ctrl^c once this has run for a few seconds SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*) FROM tab1 INNER JOIN tab2 ON tab1.rn = tab2.rn; -- conn fresh session conn / SELECT sid FROM v$mystat WHERE ROWNUM = 1; SELECT /*+ USE_NL(tab1 tab2) ORDERED */ COUNT(*) FROM tab1 INNER JOIN tab2 ON tab1.rn = tab2.rn;
From another session check what the original session is up to:
SELECT sid, status, event, state FROM v$session WHERE sid = &sid_of_other_session; SID STATUS EVENT STATE ---------- -------- ----------------------------------- ------------------- 149 ACTIVE SQL*Net message to client WAITED SHORT TIME SELECT event, time_waited FROM v$session_event WHERE sid = &sid_of_other_session; EVENT TIME_WAITED ----------------------------------- ----------- SQL*Net message to client 0 SQL*Net message from client 1111
No wait events are being recorded yet “top” shows the session to be the number 1 session on the box with 78% of CPU resources.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28008 oracle 25 0 344m 20m 18m R 78.3 2.7 5:57.28 oracle
The missing link is that an Oracle instance does not count logical IO as a wait – it is work. V$SESSTAT is where we have visibility of this work.
SELECT n.name, s.value FROM v$sesstat s, v$statname n WHERE s.sid = &sid_of_other_session AND n.statistic# = s.statistic# AND n.name = 'consistent gets' ORDER BY s.value; NAME VALUE ---------------------------------------------------------------- ---------- consistent gets 1975123