Where is METHOD_OPT SIZE REPEAT’s repeat value stored?

In my last post When does METHOD_OPT SIZE REPEAT not obey we saw how SIZE REPEAT sometimes appears to know best and creates a FREQUENCY histogram with a different number of buckets than previously specified.

The mechanics of this became clearer while pondering another question that I often ask myself.

How does METHOD_OPT SIZE REPEAT remember the value last specified in order to repeat it?

My primitive testing leads me to believe that the key value is stored in column “SYS.HIST_HEAD$.BUCKET_CNT”. The test that follows shows my working.

The table created below has two columns with HEIGHT BALANCED histograms, one with no histogram and one with a FREQUENCY BASED histogram. The column names should help in following the test.

drop  table tab1 ;
create table tab1 
(	colskew1 number
,	colskew2 number
,	colnoskew number
,	colfreq number

insert into tab1
select case when rownum <= 500 then rownum else 0 end
,      case when rownum <= 500 then rownum else 0 end
,      rownum
,      case when rownum <= 50 then rownum else 0 end
from   all_objects 
where  rownum <= 4000;

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colskew1 size 150');
exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colskew2 size 254');
exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colnoskew size 1');
exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colfreq size 254');

column column_name format a12
select c.column_name,t.num_rows, c.num_distinct,c.num_buckets,c.histogram 
from user_tab_columns c, user_tables t
where t.table_name = 'TAB1' and c.table_name = t.table_name order by 1;

------------ ---------- ------------ ----------- ---------------
COLSKEW1           4000          501         150 HEIGHT BALANCED
COLSKEW2           4000          501         254 HEIGHT BALANCED
COLNOSKEW          4000         4000           1 NONE
COLFREQ            4000           51          51 FREQUENCY

Now if we check the value stored in “SYS.HIST_HEAD$.BUCKET_CNT” we see the following.

select c.name,c.intcol#,h.bucket_cnt
from sys.col$ c , sys.hist_head$ h
where c.obj# = (select object_id from dba_objects where object_name ='TAB1')
and h.obj# = c.obj#
and h.intcol# = c.intcol#;

NAME                              INTCOL# BUCKET_CNT
------------------------------ ---------- ----------
COLSKEW1                                1        150
COLSKEW2                                2        254
COLNOSKEW                               3          1
COLFREQ                                 4       4000

So for HEIGHT BALANCED histograms (including an absence of a histogram) BUCKET_CNT holds the number of buckets specified when creating the histogram. The FREQUENCY BASED histogram however stores a value equal to the row count. This is different to the NUM_BUCKETS value displayed when querying USER_TAB_COLUMNS.

Now a statistics collection with a METHOD_OPT of SIZE REPEAT simply needs to use a value similar to “MIN(BUCKET_CNT,254)” as its target bucket count.

This was backed up (if not exactly confirmed) by then checking the definition of the SYS view USER_TAB_COLS. The extract below is for the NUM_BUCKETS column:

       case when nvl(h.distcnt,0) = 0 then h.distcnt
            when h.row_cnt = 0 then 1
            when (h.bucket_cnt > 255
                  (h.bucket_cnt > h.distcnt
                   and h.row_cnt = h.distcnt
                   and h.density*h.bucket_cnt < 1))
                then h.row_cnt
            else h.bucket_cnt

We can see that for a BUCKET_CNT > 255 the value used for NUM_BUCKETS is switched for HIST_HEAD$.ROW_CNT – this is the number of records in USER_HISTOGRAMS for the column, i.e. for FREQUENCY BASED histograms the number of buckets actually stored.

I’m not sure I’ve demonstrated this very well. In essence for SIZE REPEAT:

  • FREQUENCY BASED histograms disregard any value previously specified and pick up a bucket count of the number of distinct rows in the row sample (if NUM_DISTINCT is <= 254 of course).
  • HEIGHT BALANCED histograms can always reuse the value stored in HIST_HEAD$.BUCKET_CNT which will only be between 1 and 254.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s