In my last post When does METHOD_OPT SIZE REPEAT not obey we saw how SIZE REPEAT sometimes appears to know best and creates a FREQUENCY histogram with a different number of buckets than previously specified.
The mechanics of this became clearer while pondering another question that I often ask myself.
How does METHOD_OPT SIZE REPEAT remember the value last specified in order to repeat it?
My primitive testing leads me to believe that the key value is stored in column “SYS.HIST_HEAD$.BUCKET_CNT”. The test that follows shows my working.
The table created below has two columns with HEIGHT BALANCED histograms, one with no histogram and one with a FREQUENCY BASED histogram. The column names should help in following the test.
drop table tab1 ; create table tab1 ( colskew1 number , colskew2 number , colnoskew number , colfreq number ); insert into tab1 select case when rownum <= 500 then rownum else 0 end , case when rownum <= 500 then rownum else 0 end , rownum , case when rownum <= 50 then rownum else 0 end from all_objects where rownum <= 4000; commit; exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colskew1 size 150'); exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colskew2 size 254'); exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colnoskew size 1'); exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colfreq size 254'); column column_name format a12 select c.column_name,t.num_rows, c.num_distinct,c.num_buckets,c.histogram from user_tab_columns c, user_tables t where t.table_name = 'TAB1' and c.table_name = t.table_name order by 1; COLUMN_NAME NUM_ROWS NUM_DISTINCT NUM_BUCKETS HISTOGRAM ------------ ---------- ------------ ----------- --------------- COLSKEW1 4000 501 150 HEIGHT BALANCED COLSKEW2 4000 501 254 HEIGHT BALANCED COLNOSKEW 4000 4000 1 NONE COLFREQ 4000 51 51 FREQUENCY
Now if we check the value stored in “SYS.HIST_HEAD$.BUCKET_CNT” we see the following.
select c.name,c.intcol#,h.bucket_cnt from sys.col$ c , sys.hist_head$ h where c.obj# = (select object_id from dba_objects where object_name ='TAB1') and h.obj# = c.obj# and h.intcol# = c.intcol#; NAME INTCOL# BUCKET_CNT ------------------------------ ---------- ---------- COLSKEW1 1 150 COLSKEW2 2 254 COLNOSKEW 3 1 COLFREQ 4 4000
So for HEIGHT BALANCED histograms (including an absence of a histogram) BUCKET_CNT holds the number of buckets specified when creating the histogram. The FREQUENCY BASED histogram however stores a value equal to the row count. This is different to the NUM_BUCKETS value displayed when querying USER_TAB_COLUMNS.
Now a statistics collection with a METHOD_OPT of SIZE REPEAT simply needs to use a value similar to “MIN(BUCKET_CNT,254)” as its target bucket count.
This was backed up (if not exactly confirmed) by then checking the definition of the SYS view USER_TAB_COLS. The extract below is for the NUM_BUCKETS column:
case when nvl(h.distcnt,0) = 0 then h.distcnt when h.row_cnt = 0 then 1 when (h.bucket_cnt > 255 or (h.bucket_cnt > h.distcnt and h.row_cnt = h.distcnt and h.density*h.bucket_cnt < 1)) then h.row_cnt else h.bucket_cnt end
We can see that for a BUCKET_CNT > 255 the value used for NUM_BUCKETS is switched for HIST_HEAD$.ROW_CNT – this is the number of records in USER_HISTOGRAMS for the column, i.e. for FREQUENCY BASED histograms the number of buckets actually stored.
I’m not sure I’ve demonstrated this very well. In essence for SIZE REPEAT:
- FREQUENCY BASED histograms disregard any value previously specified and pick up a bucket count of the number of distinct rows in the row sample (if NUM_DISTINCT is <= 254 of course).
- HEIGHT BALANCED histograms can always reuse the value stored in HIST_HEAD$.BUCKET_CNT which will only be between 1 and 254.