I have an example here of a case where a DBMS_STATS METHOD_OPT parameter setting of REPEAT does not do as expected (by me at least).

We have a table with 3 columns. The table is loaded with 1000 rows, all columns in all rows are set to 1.

drop table tab1 ; create table tab1 ( colsz1 number , colsz3 number , colsz254 number ); insert into tab1 select 1,1,1 from all_objects where rownum <= 1000; commit;

We now gather statistics on each column with varying histogram sizes and check the resulting histogram settings for each one. The column names will help keep track of which is which.

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz1 size 1'); exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz3 size 3'); exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz254 size 254'); column column_name format a12 select column_name,num_distinct,num_buckets,histogram from dba_tab_columns where table_name = 'TAB1' order by 1;

Below we can observe that “SIZE 1″ did not create a histogram and “SIZE 3″ & “SIZE 254″ created frequency histograms, each with 1 bucket. All as expected.

COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM ------------ ------------ ----------- --------------- COLSZ1 1 1 NONE COLSZ254 1 1 FREQUENCY COLSZ3 1 1 FREQUENCY

Now we add 3 new values to each column in the table giving us 4 distinct values in total.

insert into tab1 values (2,2,2); insert into tab1 values (3,3,3); insert into tab1 values (4,4,4); commit;

If we gather statistics for all of the columns using METHOD_OPT size REPEAT what do you expect to happen? My expectation is that:

- COLSZ1 will have 1 bucket
- COLSZ3 will have 3 buckets and become a HEIGHT BALANCED histogram because we have more values that the number of buckets
- COLSZ254 will have 4 buckets and remain a FREQUENCY histogram

Let’s see…

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for all columns size repeat'); COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM ------------ ------------ ----------- --------------- COLSZ1 4 1 NONE COLSZ254 4 4 FREQUENCY COLSZ3 4 4 FREQUENCY

I was wrong! Size REPEAT ignored me (or perhaps corrected me) and created a fourth bucket. What if I don’t use size REPEAT and force a size of 3…

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz3 size 3'); COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM ------------ ------------ ----------- --------------- COLSZ1 4 1 NONE COLSZ254 4 4 FREQUENCY COLSZ3 4 3 HEIGHT BALANCED

Now that’s what I expected. So now the histogram is HEIGHT BALANCED let’s try size REPEAT again.

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for all columns size repeat'); COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM ------------ ------------ ----------- --------------- COLSZ1 4 1 NONE COLSZ254 4 4 FREQUENCY COLSZ3 4 3 HEIGHT BALANCED

Interesting. So this time size REPEAT did not introduce a fourth bucket.

At first this seemed to be significant and I was searching for a case where I could catch it out. However it looks to me that the DBMS_STATS code is behaving sensibly:

- If we have a frequency histogram and a new value appears then DBMS_STATS will add an extra bucket if possible. In the example above I specifically asked for three buckets when there was only one value, when a fourth distinct value was added DBMS_STATS was smart enough to add another bucket.
- If we have a HEIGHT BALANCED histogram and another distinct value is added to the column then DBMS_STATS assumes there was a good reason for specifying fewer buckets than values and leaves it well alone.

The title of this post was “When does METHOD_OPT size REPEAT not obey?”. The answer of course is “When it knows better!”

The implementation of this turns out to be quite simple and will be demonstrated in my next post.

Pingback: Where is METHOD_OPT SIZE REPEAT’s repeat value stored? « ORAganism