When does METHOD_OPT size REPEAT not obey?

I have an example here of a case where a DBMS_STATS METHOD_OPT parameter setting of REPEAT does not do as expected (by me at least).

We have a table with 3 columns. The table is loaded with 1000 rows, all columns in all rows are set to 1.

drop  table tab1 ;
create table tab1 
(	colsz1 number
,	colsz3 number
,	colsz254 number
);
insert into tab1
select 1,1,1 from all_objects where rownum <= 1000;
commit;

We now gather statistics on each column with varying histogram sizes and check the resulting histogram settings for each one. The column names will help keep track of which is which.

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz1 size 1');
exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz3 size 3');
exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz254 size 254');

column column_name format a12
select column_name,num_distinct,num_buckets,histogram 
from dba_tab_columns 
where table_name = 'TAB1' order by 1;

Below we can observe that “SIZE 1″ did not create a histogram and “SIZE 3″ & “SIZE 254″ created frequency histograms, each with 1 bucket. All as expected.

COLUMN_NAME  NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------ ------------ ----------- ---------------
COLSZ1                  1           1 NONE
COLSZ254                1           1 FREQUENCY
COLSZ3                  1           1 FREQUENCY

Now we add 3 new values to each column in the table giving us 4 distinct values in total.

insert into tab1 values (2,2,2);
insert into tab1 values (3,3,3);
insert into tab1 values (4,4,4);
commit;

If we gather statistics for all of the columns using METHOD_OPT size REPEAT what do you expect to happen? My expectation is that:

  • COLSZ1 will have 1 bucket
  • COLSZ3 will have 3 buckets and become a HEIGHT BALANCED histogram because we have more values that the number of buckets
  • COLSZ254 will have 4 buckets and remain a FREQUENCY histogram

Let’s see…

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for all columns size repeat');

COLUMN_NAME  NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------ ------------ ----------- ---------------
COLSZ1                  4           1 NONE
COLSZ254                4           4 FREQUENCY
COLSZ3                  4           4 FREQUENCY

I was wrong! Size REPEAT ignored me (or perhaps corrected me) and created a fourth bucket. What if I don’t use size REPEAT and force a size of 3…

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for columns colsz3 size 3');

COLUMN_NAME  NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------ ------------ ----------- ---------------
COLSZ1                  4           1 NONE
COLSZ254                4           4 FREQUENCY
COLSZ3                  4           3 HEIGHT BALANCED

Now that’s what I expected. So now the histogram is HEIGHT BALANCED let’s try size REPEAT again.

exec dbms_stats.gather_table_stats(null,'tab1',estimate_percent=>100,method_opt=>'for all columns size repeat');

COLUMN_NAME  NUM_DISTINCT NUM_BUCKETS HISTOGRAM
------------ ------------ ----------- ---------------
COLSZ1                  4           1 NONE
COLSZ254                4           4 FREQUENCY
COLSZ3                  4           3 HEIGHT BALANCED

Interesting. So this time size REPEAT did not introduce a fourth bucket.

At first this seemed to be significant and I was searching for a case where I could catch it out. However it looks to me that the DBMS_STATS code is behaving sensibly:

  • If we have a frequency histogram and a new value appears then DBMS_STATS will add an extra bucket if possible. In the example above I specifically asked for three buckets when there was only one value, when a fourth distinct value was added DBMS_STATS was smart enough to add another bucket.
  • If we have a HEIGHT BALANCED histogram and another distinct value is added to the column then DBMS_STATS assumes there was a good reason for specifying fewer buckets than values and leaves it well alone.

The title of this post was “When does METHOD_OPT size REPEAT not obey?”. The answer of course is “When it knows better!”

The implementation of this turns out to be quite simple and will be demonstrated in my next post.

About these ads

One thought on “When does METHOD_OPT size REPEAT not obey?

  1. Pingback: Where is METHOD_OPT SIZE REPEAT’s repeat value stored? « ORAganism

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s