Here is a quote from the Oracle Documentation about V$UNDOSTAT.MAXQUERYLEN:
MAXQUERYLEN NUMBER Identifies the length of the longest query (in seconds) executed in the instance during the period. You can use this statistic to estimate the proper setting of the UNDO_RETENTION initialization parameter. The length of a query is measured from the cursor open time to the last fetch/execute time of the cursor. Only the length of those cursors that have been fetched/executed during the period are reflected in the view.
And here is a quote from My Oracle Support note “V$UNDOSTAT MAXQUERYLEN Suddenly Appears With a High Value For a Simple Query [ID 1307600.1]”
the MAXQUERYLEN value is defined as “the length of a query measured from the cursor open time to the last fetch/execute time of the cursor”. Hence if the cursor is suspended (e.g. dbms_lock.sleep) or the session is switched out (session switching) then the MAXQUERYLEN time will continue to increase but will not be show as no current execution/fetch is occurring on the cursor.
…
This is expected behaviour
Nothing wrong with any of this but it is all relevant to the test that follows. In order to set up the test I change my local database to have a small undo tablespace and generate some workload. I’ve collapsed the code in case you want to jump straight to the story, in summary it generates some transactions and random queries with a duration between 1 and 10 minutes.
alter system set undo_tablespace = undo scope=both sid='*';
create undo tablespace undo_small
datafile 'C:\ORACLEXE\ORADATA\XE\UNDO_SMALL.DBF' size 30m reuse autoextend on maxsize 500m;
alter system set undo_tablespace = undo_small scope=both sid='*';
create or replace function slow_query(i_seconds_to_wait in number)
return number
as
l_n number;
l_pad2 bigtab.pad2%TYPE;
cursor c_bigtab is
select pad2 from bigtab;
begin
open c_bigtab;
fetch c_bigtab into l_pad2;
sys.dbms_lock.sleep(i_seconds_to_wait);
fetch c_bigtab into l_pad2;
close c_bigtab;
return i_seconds_to_wait;
end slow_query;
/
-- session #1 - generate trxns
declare
L_rno number;
L_i number;
L_pad1 bigtab.pad1%TYPE;
begin
while true loop
L_i := L_i + 1;
L_rno := round(dbms_random.value(1,1000000));
update bigtab
set pad1 = pad1 where rno = L_rno;
commit;
dbms_lock.sleep(0.01);
end loop;
end;
/
-- session #2 - generate queries of random length betwen 1 min and 10 mins
declare
l_sleep number;
l_slept number;
begin
while true loop
l_sleep := round(dbms_random.value(60,600));
select slow_query(l_sleep) into l_slept from dual;
end loop;
end;
/
Output from V$UNDOSTAT after the background workload has been running for a while is below along with the current size of the undo tablespace.
END_TIME_CHAR TXNCOUNT MAXQUERYLEN MAXQUERYID ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION
-------------------- ---------- ----------- ------------- ---------- ------------- ----------- -------------------
28-SEP-2011 19:45:50 38118 584 200k2np23z57q 160 912 496 1425
28-SEP-2011 19:55:50 38404 452 200k2np23z57q 160 1120 1280 1292
28-SEP-2011 20:05:50 38507 479 200k2np23z57q 160 1704 1464 1320
28-SEP-2011 20:15:50 38340 383 200k2np23z57q 160 2656 496 1224
FILE_ID FILE_NAME MB
---------- --------------------------------------------- ----------
5 C:\ORACLEXE\ORADATA\XE\UNDO_SMALL.DBF 30
Now a simple query is executed from SQL Developer and the first set of rows is fetched. V$SESSION output below.
SQL> select username,program,status,state,event,sql_id,seconds_in_wait siw
2 from v$session
3 where username = 'ADHOC';
USERNAME PROGRAM STATUS STATE EVENT SQL_ID SIW
-------- --------------- -------- ------------------- --------------------------- ------------- ----------
ADHOC SQL Developer INACTIVE WAITING SQL*Net message from client 6fwqzurbc8y7k 7
SQL> select sql_text from v$sql where sql_id = '6fwqzurbc8y7k';
SQL_TEXT
-----------------------------------------------------------------------
select * from bigtab
Another quote from the Oracle Documentation
SQL_ID VARCHAR2(13) SQL identifier of the SQL statement that is currently being executed
You can see from the V$SESSION output above that we have a SQL ID and therefore must be executing some SQL. But we also have a session status of “INACTIVE” and are waiting on event “SQL*Net message from client”. That is a nice way of spotting sessions that are similar to the focus of this post.
Around 15 minutes later the session is still waiting on the user to act and V$UNDOSTAT looks consistent with my steady state.
USERNAME PROGRAM STATUS STATE EVENT SQL_ID SIW
-------- --------------- -------- ------------------- --------------------------- ------------- ----------
ADHOC SQL Developer INACTIVE WAITING SQL*Net message from client 6fwqzurbc8y7k 912
END_TIME_CHAR TXNCOUNT MAXQUERYLEN MAXQUERYID ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION
-------------------- ---------- ----------- ------------- ---------- ------------- ----------- -------------------
28-SEP-2011 19:45:50 38118 584 200k2np23z57q 160 912 496 1425
28-SEP-2011 19:55:50 38404 452 200k2np23z57q 160 1120 1280 1292
28-SEP-2011 20:05:50 38507 479 200k2np23z57q 160 1704 1464 1320
28-SEP-2011 20:15:50 38340 383 200k2np23z57q 160 2656 496 1224
28-SEP-2011 20:25:50 38557 985 200k2np23z57q 160 2592 816 1825
28-SEP-2011 20:31:28 20492 866 200k2np23z57q 160 2592 816 1706
We now fetch the next set of records and check V$SESSION and V$UNDOSTAT again.
USERNAME PROGRAM STATUS STATE EVENT SQL_ID SIW
-------- --------------- -------- ------------------- --------------------------- ------------- ----------
ADHOC SQL Developer INACTIVE WAITING SQL*Net message from client 6fwqzurbc8y7k 35
END_TIME_CHAR TXNCOUNT MAXQUERYLEN MAXQUERYID ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION
-------------------- ---------- ----------- ------------- ---------- ------------- ----------- -------------------
28-SEP-2011 19:45:50 38118 584 200k2np23z57q 160 912 496 1425
28-SEP-2011 19:55:50 38404 452 200k2np23z57q 160 1120 1280 1292
28-SEP-2011 20:05:50 38507 479 200k2np23z57q 160 1704 1464 1320
28-SEP-2011 20:15:50 38340 383 200k2np23z57q 160 2656 496 1224
28-SEP-2011 20:25:50 38557 985 200k2np23z57q 160 2592 816 1825
28-SEP-2011 20:32:49 26370 992 6fwqzurbc8y7k 160 2592 816 1766
V$UNDOSTAT.MAXQUERYID is now that of the SQL Developer session. What if an extra 10 minutes had elapsed?
10 minutes later…
END_TIME_CHAR TXNCOUNT MAXQUERYLEN MAXQUERYID ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION
-------------------- ---------- ----------- ------------- ---------- ------------- ----------- -------------------
28-SEP-2011 19:45:50 38118 584 200k2np23z57q 160 912 496 1425
28-SEP-2011 19:55:50 38404 452 200k2np23z57q 160 1120 1280 1292
28-SEP-2011 20:05:50 38507 479 200k2np23z57q 160 1704 1464 1320
28-SEP-2011 20:15:50 38340 383 200k2np23z57q 160 2656 496 1224
28-SEP-2011 20:25:50 38557 985 200k2np23z57q 160 2592 816 1825
28-SEP-2011 20:35:50 41787 992 6fwqzurbc8y7k 160 3440 688 1833
28-SEP-2011 20:41:31 20088 1293 6fwqzurbc8y7k 160 3440 688 2134
The TUNED_UNDORETENTION is starting to climb but the number of expired blocks is unchanged because our UNDO tablespace is extending – it is now at 42M.
FILE_ID FILE_NAME MB
---------- --------------------------------------------- ----------
5 C:\ORACLEXE\ORADATA\XE\UNDO_SMALL.DBF 42
So let’s prevent it from extending and wait another 10 minutes before fetching the next set of rows.
alter database datafile 5 autoextend off;
END_TIME_CHAR TXNCOUNT MAXQUERYLEN MAXQUERYID ACTIVEBLKS UNEXPIREDBLKS EXPIREDBLKS TUNED_UNDORETENTION
-------------------- ---------- ----------- ------------- ---------- ------------- ----------- -------------------
28-SEP-2011 20:05:50 38507 479 200k2np23z57q 160 1704 1464 1320
28-SEP-2011 20:15:50 38340 383 200k2np23z57q 160 2656 496 1224
28-SEP-2011 20:25:50 38557 985 200k2np23z57q 160 2592 816 1825
28-SEP-2011 20:35:50 41787 992 6fwqzurbc8y7k 160 3440 688 1833
28-SEP-2011 20:45:50 41534 1594 6fwqzurbc8y7k 160 4608 576 2752
28-SEP-2011 20:55:50 44669 2196 6fwqzurbc8y7k 160 5200 264 2766
We can see that the number of expired blocks is falling and the number of un-expired blocks is rising. 10 minutes later I click to fetch the next set of records and…
ORA-01555: snapshot too old: rollback segment number 17 with name "_SYSSMU17$" too small
01555. 00000 - "snapshot too old: rollback segment number %s with name \"%s\" too small"
*Cause: rollback records needed by a reader for consistent read are
overwritten by other writers
*Action: If in Automatic Undo Management mode, increase undo_retention
setting. Otherwise, use larger rollback segments
That was only after 35-40 minutes – well within reach of a normal trip to the canteen (admittedly my undo tablespace is sized modestly in order to help me prove a point).
So, here, in reverse order, are my top 3 undo related threats when giving read only access via a tool such as SQL Developer or Toad to application support staff (even though they have the best of intentions).
#3 Undo tablespace pressure.
Unbeknownst to the ad-hoc user he can quite easily cause pressure on the undo tablespace causing an AUTOEXTEND file to grow purely to satisfy their next page of data.
#2 Undo tablespace alerts
For undo tablespaces with no AUTOEXTEND files Grid Control or some other monitoring tool will alert as the number of unexpired blocks starts to fall causing unnecessary (albeit low priority) work for your first line DBAs.
#1 ORA-1555
For undo tablespaces with no AUTOEXTEND files and either a sufficiently high transaction rate or sufficiently high gap between page 1 and page X of our ad-hoc user’s query we may find that the undo required to construct page X has been overwritten. The user gets ORA-1555, thinks “no harm done” and re-starts the query. However the alert log now contains something like below and an alert is fired shortly afterwards.
Wed Sep 28 21:14:48 2011
ORA-01555 caused by SQL statement below (SQL ID: 6fwqzurbc8y7k, Query Duration=3538 sec, SCN: 0x0000.001aad4e):
If the hour is late and the company is pretty twitchy then this could involve someone getting out of bed to investigate a potentially failed batch run.
Have I missed any undo related threats?