DBMS_STATS.SET_TABLE_STATS and statistic staleness

Recently at work there was a discussion regarding setting optimiser statistics using DBMS_STATS SET_*_STATISTICS procedures. The question was asked:

If we set the statistics ourselves, won’t the automatic statistics job overwrite them the following night?

Now I was confident that this was not the case but, having never tested this explicitly, thought it better to get my facts straight.

First some quotes from the Oracle Documentation

Setting Statistics

You can set table, column, index, and system statistics using the SET_*_STATISTICS procedures. Setting statistics in this manner is not recommended, because inaccurate or inconsistent statistics can lead to poor performance.

Determining Stale Statistics

Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table and whether the table has been truncated since the last time statistics were gathered. You can access information about changes of tables in the USER_TAB_MODIFICATIONS view. Following a data-modification, there may be a few minutes delay while Oracle Database propagates the information to this view. Use the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to immediately reflect the outstanding monitored information kept in the memory.

The GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS procedures gather new statistics for tables with stale statistics when the OPTIONS parameter is set to GATHER STALE or GATHER AUTO. If a monitored table has been modified more than 10%, then these statistics are considered stale and gathered again.
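The 10% test can be approximated by joining USER_TAB_MODIFICATIONS to USER_TAB_STATISTICS (and from 11g onwards the STALE_STATS column of the *_TAB_STATISTICS views reports the database’s own verdict directly). The query below is a rough sketch only, assuming the monitoring information has already been flushed; the real algorithm also takes truncates into account.

select s.partition_name
,      s.num_rows
,      nvl(m.inserts,0) + nvl(m.updates,0) + nvl(m.deletes,0) changes
,      case when nvl(m.inserts,0) + nvl(m.updates,0) + nvl(m.deletes,0)
                 > s.num_rows * 0.1
       then 'STALE?' else 'OK' end assessment
from   user_tab_statistics s
left join user_tab_modifications m
on     m.table_name = s.table_name
and    nvl(m.partition_name,'~') = nvl(s.partition_name,'~')
where  s.table_name = 'PART_TAB';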

The case under discussion was that old favourite – new partitions added to existing tables. The test that follows creates a simple table with four partitions. Three of the partitions are mature, each containing 10,000 rows. The fourth partition is quite new and only contains 1,000 rows.

drop table part_tab;
create table part_tab
(	id	number
,	yrmon	number(6)
,	padcol	varchar2(100))
partition by list (yrmon)
(partition p201101 values (201101)
,partition p201102 values (201102)
,partition p201103 values (201103)
,partition p201104 values (201104));

insert into part_tab
select rownum,201101,lpad('x',100,'y')
from dual connect by rownum <= 10000;
insert into part_tab
select rownum,201102,lpad('x',100,'y')
from dual connect by rownum <= 10000;
insert into part_tab
select rownum,201103,lpad('x',100,'y')
from dual connect by rownum <= 10000;
insert into part_tab
select rownum,201104,lpad('x',100,'y')
from dual connect by rownum <= 1000;

-- gather initial statistics (this step is implied by the
-- LAST_ANALYZED values in the output below)
exec dbms_stats.gather_table_stats(null,'PART_TAB')

select PARTITION_NAME,NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED
from user_tab_statistics
where table_name = 'PART_TAB';

PARTITION_NAME                   NUM_ROWS     BLOCKS AVG_ROW_LEN LAST_ANALYZED
------------------------------ ---------- ---------- ----------- -----------------
                                    31000        568         110 05-MAR-2011 21:24
P201101                             10000        174         110 05-MAR-2011 21:24
P201102                             10000        174         110 05-MAR-2011 21:24
P201103                             10000        174         110 05-MAR-2011 21:24
P201104                              1000         46         110 05-MAR-2011 21:24

The statistics on the fourth partition are now faked using DBMS_STATS.SET_TABLE_STATS to match the other partitions.

-- fake statistics in partition 201104
begin
 dbms_stats.set_table_stats(null
 , 'part_tab'
 , 'P201104'
 , numrows=>10000
 , numblks=>174
 , avgrlen=>110);
end;
/

PARTITION_NAME                   NUM_ROWS     BLOCKS AVG_ROW_LEN LAST_ANALYZED
------------------------------ ---------- ---------- ----------- -----------------
                                    31000        568         110 05-MAR-2011 21:24
P201101                             10000        174         110 05-MAR-2011 21:24
P201102                             10000        174         110 05-MAR-2011 21:24
P201103                             10000        174         110 05-MAR-2011 21:24
P201104                             10000        174         110 05-MAR-2011 21:24

Now we update the partitions in a variety of ways as described by the comments in the code below.

-- partition "P201101": update 11% of rows  (> staleness threshold) and commit
update part_tab
set padcol = lpad('x',100,'z')
where yrmon = 201101 and rownum <= 1100;
commit;

-- partition "P201102": update 11% of rows  (> staleness threshold) and rollback
update part_tab
set padcol = lpad('x',100,'z')
where yrmon = 201102 and rownum <= 1100;
rollback;

-- partition "P201103": update 9% of rows  (< staleness threshold) and commit
update part_tab
set padcol = lpad('x',100,'z')
where yrmon = 201103 and rownum <= 900;
commit;

-- partition "P201104": update 9% of rows according to optimiser statistics
--  (< staleness threshold) but 90% of actual rows (> staleness threshold)
update part_tab
set padcol = lpad('x',100,'z')
where yrmon = 201104 and rownum <= 900;
commit;

exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

select PARTITION_NAME,INSERTS,UPDATES,DELETES 
from USER_TAB_MODIFICATIONS
where table_name = 'PART_TAB';

PARTITION_NAME                    INSERTS    UPDATES    DELETES
------------------------------ ---------- ---------- ----------
                                        0       4000          0
P201101                                 0       1100          0
P201102                                 0       1100          0
P201103                                 0        900          0
P201104                                 0        900          0

From the output above we can see that the rolled-back modifications on partition “P201102” still increment the counters in USER_TAB_MODIFICATIONS. Interesting.

Now we gather optimiser statistics. If DBMS_STATS respects the faked statistics then partition P201104’s statistics should remain unchanged. If DBMS_STATS can see through our fakery then partition P201104’s statistics should be recollected.

Watch to see which partitions’ LAST_ANALYZED value changes from “21:24” to “21:26”.

exec dbms_stats.GATHER_DATABASE_STATS(options=>'gather auto')

select PARTITION_NAME,NUM_ROWS,BLOCKS,AVG_ROW_LEN,LAST_ANALYZED
from user_tab_statistics
where table_name = 'PART_TAB';

PARTITION_NAME                   NUM_ROWS     BLOCKS AVG_ROW_LEN LAST_ANALYZED
------------------------------ ---------- ---------- ----------- -----------------
                                    31000        568         110 05-MAR-2011 21:26
P201101                             10000        174         110 05-MAR-2011 21:26
P201102                             10000        174         110 05-MAR-2011 21:26
P201103                             10000        174         110 05-MAR-2011 21:24
P201104                             10000        174         110 05-MAR-2011 21:24

As expected, USER_TAB_STATISTICS is the truth as far as DBMS_STATS is concerned. The partitions with 11% modifications have new statistics collected; the partitions with 9% modifications keep their existing statistics, including partition “P201104”.

Nothing new or surprising here, but I now have the hard facts to put in front of my colleague, remaining mindful of the warning “Setting statistics in this manner is not recommended” at all times :)


4 thoughts on “DBMS_STATS.SET_TABLE_STATS and statistic staleness”

  1. Hi Stefan,

    To quote Mr Kyte – “it depends” :)

    The reason this cropped up at work was to do with copying statistics from monthly partitions to the new month. I didn’t want to lock the statistics as the new month could end up much larger, or at least look different in some way. But I wanted to prove that the fixed statistics were taken as gospel when working out whether the number of modifications meant the stats were stale.
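
    For the copying itself (11g onwards), DBMS_STATS.COPY_TABLE_STATS does much the same job in one call; the partition names below are the ones from the test above:

    begin
      dbms_stats.copy_table_stats(
        ownname     => user
      , tabname     => 'PART_TAB'
      , srcpartname => 'P201103'
      , dstpartname => 'P201104');
    end;
    /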

    So if I copied stats for 1 million rows then after the first 100k insertions I’d get a more accurate collection. What I would avoid is that infinity shift between 0 rows and the first X thousand. That infinity shift is a dark place where Cartesian joins roam and ugly nested loops look sexy.

    Neil

    PS In truth, yes, you are right. You could pick some good numbers and lock them and life would be good. But then I’d have had nothing to test.
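
    Locking, had I gone that way, is a one-liner (and there is a partition-level variant, LOCK_PARTITION_STATS, too):

    exec dbms_stats.lock_table_stats(user,'PART_TAB')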

    PPS Thanks for stopping by

  2. Hi Neil,

    Thank you for the clear and concise explanation. Like Tom Kyte, you didn’t just spout off, but did some considerate testing to find out. Love that! Thanks for answering this question.

    Jack
