Oracle Dictionary fragmentation

The purpose of this post is to highlight the performance degradation caused by dictionary index fragmentation. It is something Oracle does not widely advertise, but it follows from the physical structure of the database.

Oracle databases age, and that age mainly comes from the number of DDL operations performed against the database. DDL operations modify the dictionary and introduce fragmentation into its tables and indexes.

I have put together a small test case:

-- CREATE TABLE
DROP TABLE list_customers
/
CREATE TABLE list_customers
   ( customer_id             NUMBER(6)
   , cust_first_name         VARCHAR2(20)
   , cust_last_name          VARCHAR2(20)
   , nls_territory           VARCHAR2(30)
   , cust_email              VARCHAR2(30))
   PARTITION BY LIST (nls_territory) (
   PARTITION asia VALUES ('CHINA', 'THAILAND'),
   PARTITION europe VALUES ('GERMANY', 'ITALY', 'SWITZERLAND'),
   PARTITION west VALUES ('AMERICA'),
   PARTITION east VALUES ('INDIA'))
/
-- ADD a partition with SQL trace enabled (event 10046 level 4 includes bind values)
ALTER SESSION SET EVENTS '10046 trace name context forever, level 4'
/
ALTER TABLE list_customers ADD PARTITION south VALUES ('ANTARCTICA')
/
EXIT
-- DROP the partition, again with SQL trace enabled
ALTER SESSION SET EVENTS '10046 trace name context forever, level 4'
/
ALTER TABLE list_customers DROP PARTITION south
/
EXIT

It is an oversimplified method, with no dependencies on the object and no object statistics, but it already creates two sizeable trace files.
In summary, the ADD PARTITION operation inserts rows into the dictionary tables OBJ$, DEFERRED_STG$ and TABPART$, and the DROP PARTITION operation deletes rows from the same tables.
As is well known, tables with an insert-delete workload develop a high level of fragmentation on their indexed columns.

Running a standard index sizing report against the three tables mentioned above shows that, for some of the indexes, the estimated number of leaf blocks is dramatically smaller than the actual number.

INDEX                               Estimated (leaf blocks)   Actual (leaf blocks)
SYS.TABPART$.I_TABPART_OBJ$         47                        279
SYS.OBJ$.I_OBJ3                     4                         20
SYS.OBJ$.I_OBJ4                     422                       1475
SYS.OBJ$.I_OBJ1                     422                       969
SYS.TABPART$.I_TABPART_BOPART$      68                        125
SYS.DEFERRED_STG$.I_DEFERRED_STG1   30                        53
SYS.OBJ$.I_OBJ5                     1269                      1728
SYS.OBJ$.I_OBJ2                     1269                      1726
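
The report itself is not shown here, but a rough feel for the same comparison can be had from the data dictionary. This is my own convenience query, not the report above, and it assumes dictionary statistics are reasonably fresh; it simply puts the leaf block count recorded by statistics next to the blocks allocated to each index segment:

SELECT i.owner,
       i.index_name,
       i.leaf_blocks                                 AS stats_leaf_blocks,
       s.blocks                                      AS allocated_blocks,
       ROUND(s.blocks / NULLIF(i.leaf_blocks, 0), 2) AS alloc_to_stats_ratio
FROM   dba_indexes i
       JOIN dba_segments s
         ON  s.owner        = i.owner
         AND s.segment_name = i.index_name
WHERE  i.owner = 'SYS'
AND    i.table_name IN ('OBJ$', 'TABPART$', 'DEFERRED_STG$')
ORDER  BY alloc_to_stats_ratio DESC;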

If you try to rebuild these indexes in the usual way, you get the Oracle error

ORA-00701: object necessary for warmstarting database cannot be altered

which effectively blocks any attempt to fix the fragmentation.

Index fragmentation primarily affects index FULL and RANGE scan operations, but not UNIQUE index scans. A UNIQUE scan is only affected once the index grows by an additional level.
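
To see whether that second point already applies, the BLEVEL column of DBA_INDEXES shows whether any of these dictionary indexes has grown an extra level (a quick check of my own, not part of the report above):

SELECT owner, index_name, blevel, leaf_blocks
FROM   dba_indexes
WHERE  owner = 'SYS'
AND    table_name IN ('OBJ$', 'TABPART$', 'DEFERRED_STG$')
ORDER  BY blevel DESC, leaf_blocks DESC;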

The numbers in the table above are not dramatic, but it looks like there is already a measurable performance impact on common database operations, such as name resolution.

In the long term I think every database with a high number of structural modifications has to go through a full application data migration once every 5-15 years.

UKOUG Database Server SIG (Leeds 2013)

On Thursday I attended the UKOUG Database Server SIG in Leeds. All slides have been uploaded to the UKOUG website.

https://www.ukoug.org/events/ukoug-database-server-sig-meeting-may-2013/

It’s the first SIG I’ve attended this year and after enjoying it so much I ought to try harder to get to other events. We had Oak Table presence from David Kurtz talking all things partitioning/compression and purging, two Oracle employees discussing first support and then ZFS/NetApp (I particularly enjoyed this one) and then Edgars Rudans on his evolution of the Exadata Minimal Patching process. All of these presentations are well worth downloading and checking out.

The last presentation of the day was me. I’ve never presented before and it took a big step out of my comfort zone to get up there, but I’m so glad I did. I would recommend it to anyone currently sitting in the audience thinking “I wish I had the confidence to do that”. It’s nerve-racking beforehand but exhilarating afterwards.

When blogging in the past I’ve liked how it makes you think a little bit harder before pressing the publish button. I think the best thing I got out of the whole presentation process was that it made me dig even deeper to make sure I’d done my homework.

After the SIG there was a good group headed out for Leeds Oracle Beers #3 which involved local beer, good burgers and Morris Dancing, all good fun.

Monitoring real-time apply progress

In many cases monitoring Data Guard, at least in Maximum Performance mode, comes down to checking the transport and apply lag to make sure the standby database is not “too far” behind the primary, and quite often this is sufficient for day-to-day operation. By checking v$archived_log we can easily get information about the most recently archived and applied log sequences:

SQL> select sequence#,ARCHIVED,APPLIED from v$archived_log order by sequence#;
SEQUENCE# ARCHIVED  APPLIED
---------- --------- ---------
… output truncated …
       208 YES       YES
       209 YES       YES
       210 YES       YES
       211 YES       IN-MEMORY

If we use real-time apply, changes are applied to the standby database as soon as they are written to the standby redo logs, which should keep the standby as up to date as possible without going to Maximum Availability or Maximum Protection.

However, if our primary database fails, how can we check what the last change transmitted from the primary was, and whether all transmitted changes were actually applied, before we activate the standby?
Of course we can trust that Oracle will apply all the transmitted redo when the standby is activated, but without knowing what the last transmitted change was we have no way of verifying it. It is also possible that the failover decision depends on how far behind the primary the standby was at the time of the failure, or on how much data can potentially be lost.

One way of finding the information we need is by checking the v$standby_log and v$recovery_progress views.
v$standby_log displays information about the standby redo logs. Its LAST_CHANGE# and LAST_TIME columns can be used to find the last change transmitted from the primary.

SQL> select GROUP#,THREAD#,SEQUENCE#,STATUS,LAST_CHANGE#,LAST_TIME from v$standby_log;

    GROUP#    THREAD#  SEQUENCE# STATUS           LAST_CHANGE# LAST_TIME
---------- ---------- ---------- ---------- ------------------ -------------------
         4          1        213 ACTIVE                1699715 2013-04-18 20:26:15
         5          1          0 UNASSIGNED
         6          1          0 UNASSIGNED
         7          1          0 UNASSIGNED

SQL>

v$recovery_progress can be used to monitor database recovery operations and gives us access to items such as Last Applied Redo, Active Apply Rate, Average Apply Rate, Apply Time per Log and a few more. The item we are interested in is “Last Applied Redo”, specifically the value of its TIMESTAMP column, which should be very close to, or match, the value of the LAST_TIME column from the v$standby_log view.

SQL> select START_TIME,TYPE, ITEM,UNITS,SOFAR,TIMESTAMP 
  2  from v$recovery_progress where ITEM='Last Applied Redo';
START_TIME          TYPE             ITEM                UNITS     TIMESTAMP
------------------- ---------------- ------------------- --------- -------------------
2013-04-17 22:46:26 Media Recovery   Last Applied Redo   SCN+Time  2013-04-18 20:26:16

SQL>
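
To finish, a convenience query of my own (not an official Data Guard report) that puts the last change received into the standby redo logs next to the last change applied; once redo apply has caught up, the two timestamps should be very close:

select (select max(last_time)
          from v$standby_log
         where last_time is not null) as last_received_time,
       (select max(timestamp)
          from v$recovery_progress
         where item = 'Last Applied Redo') as last_applied_time
  from dual;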

NOLOGGING in numbers

Hi All,
I have made a small investigation into redo generation. From the early days of my career I remembered that NOLOGGING operations are very effective for performance, but I had never tried to quantify the effect.
Every application's tables can theoretically be split into 4 groups (I use my own names for them, but hopefully they make sense):
1) Regular tables – contain valuable information that needs to be stored for legal and functional purposes. Stored as normal tables.
2) Staging tables – contain process-lifetime-specific information that is easily re-creatable. Used for transferring information between sessions and for report generation. Stored as regular tables or materialized views.
3) Session temporary tables – contain process-lifetime-specific information that is easily re-creatable. Used for reporting. Stored as GLOBAL TEMPORARY tables with ON COMMIT PRESERVE ROWS.
4) Transaction temporary tables – contain process-lifetime-specific information that is easily re-creatable. Used for processing optimisation. Stored as GLOBAL TEMPORARY tables with ON COMMIT DELETE ROWS (a minimal sketch of both follows this list).
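A minimal sketch of groups 3 and 4; the table names are made up purely for illustration:

CREATE GLOBAL TEMPORARY TABLE session_scratch
   ( customer_id NUMBER )
   ON COMMIT PRESERVE ROWS;   -- group 3: rows survive until the session ends

CREATE GLOBAL TEMPORARY TABLE txn_scratch
   ( customer_id NUMBER )
   ON COMMIT DELETE ROWS;     -- group 4: rows disappear at commit
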
By default all 4 groups generate redo log records, which can consume a significant amount of resources. The redo information is valuable if:
1) we support a standby database;
2) the information inside the tables is valuable and has to be safe in case of a database crash.
To make the standby or backup completely usable after a NOLOGGING statement is run, a mechanism other than database recovery must be used to get or create current copies of the affected blocks: you have to drop and recreate the object with the invalidated blocks, or truncate it, using the program that maintains the object. Thus an extra step has to be introduced into the switchover/failover process.
Again, based on my understanding, the only business requirement for logging is to protect the data in the “Regular tables” group. The safety of the data in the other groups is not as important.
The only DML operation that can be optimised in terms of redo generation is INSERT with the APPEND hint (MERGE is effectively a presentation layer on top of INSERT, so it can be treated the same way). The APPEND hint, when it works, has one negative side effect: the newly inserted data is not available until the end of the transaction, because any attempt to read it raises the following error:
ORA-12838: cannot read/modify an object after modifying it in parallel
This is linked to the fact that Oracle cannot build a consistent image of a block for which there is no undo information. It makes using the hint on global temporary tables with ON COMMIT DELETE ROWS pointless: you can insert the data, but you can never read it before it is deleted at commit.
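For illustration, using the hypothetical txn_scratch table from the sketch above:

INSERT /*+ APPEND */ INTO txn_scratch
SELECT level FROM dual CONNECT BY level <= 10;

SELECT COUNT(*) FROM txn_scratch;   -- raises ORA-12838 within the same transaction

COMMIT;                             -- and ON COMMIT DELETE ROWS empties the table here anyway
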
Another fact I have to highlight is that UPDATE and DELETE always generate redo. Thus, if a table is updated intensively, the gains will be minimal. Avoiding these operations on temporary tables is another skill developers have to get used to for optimal application performance.
There are 5 settings that affect segment logging: database LOGGING, database FORCE LOGGING, tablespace LOGGING, tablespace FORCE LOGGING (which can be switched on for tablespaces holding “Regular tables” and off for tablespaces holding “Staging tables”) and table LOGGING. Global temporary tables are always in NOLOGGING mode, so for the “Session temporary tables” and “Transaction temporary tables” groups we can assume all of these settings are effectively NO. Production databases should always run in protected mode, so database LOGGING should always be YES; the value NO is outside the scope of this investigation.
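A hedged sketch of where each of these settings lives; the tablespace and table names below are illustrative, not from my test system:

-- Database level
SELECT log_mode, force_logging FROM v$database;

-- Tablespace level
SELECT tablespace_name, logging, force_logging FROM dba_tablespaces;
ALTER TABLESPACE staging_ts NOLOGGING;

-- Table level
SELECT owner, table_name, logging FROM dba_tables WHERE owner = 'TEST';
ALTER TABLE test.staging_table NOLOGGING;
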
To test, I created the table TEST.BIGTABLE (column1 NUMBER) with 39,999,960 rows, plus a few target tables to run INSERT AS SELECT statements against the BIGTABLE dataset. The results are below.
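
A hedged sketch of how figures like the ones below can be captured; the target table name is made up, and the “redo size” value is read from the AUTOTRACE statistics mentioned in the P.S.:

SET AUTOTRACE TRACEONLY STATISTICS

INSERT /*+ APPEND */ INTO test.target_copy SELECT * FROM test.bigtable;   -- direct-path load
COMMIT;

TRUNCATE TABLE test.target_copy;

INSERT INTO test.target_copy SELECT * FROM test.bigtable;                 -- conventional load
COMMIT;

SET AUTOTRACE OFF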

Regular table

Setting / measurement                         Run 1     Run 2     Run 3     Run 4     Run 5     Run 6     Run 7
TABLE LOGGING                                 *         *         N         Y         N         Y         Y
TABLESPACE LOGGING                            *         *         Y         N         N         Y         Y
TABLESPACE FORCE LOGGING                      *         Y         N         N         N         N         N
DATABASE LOGGING                              Y         Y         Y         Y         Y         Y         Y
DATABASE FORCE LOGGING                        Y         N         N         N         N         N         N
Amount of redo for INSERT APPEND              501000K   501000K   457K      501000K   456K      501000K   501000K
Amount of redo for standard INSERT AS SELECT  501000K   501000K   501000K   501000K   501000K   501000K   501000K

Amount of redo for temporary tables

                           Standard INSERT AS SELECT   INSERT APPEND
Transaction Temp Table     110K                        0.3K
Session Temp Table         110K                        0.3K

I hope all of the above makes sense and can be put to good use.

P.S. The “redo size” values were taken from AUTOTRACE statistics.

OUGN Spring Seminar 2013

I arrived back in the UK yesterday after my second time at the Oracle User Group Norway’s Spring Seminar. I had a great time and even those that suffered with sea-sickness enjoyed themselves when they weren’t praying to the porcelain god. It was definitely a rougher sea on the first night this year compared to last, but lucky for me I was pretty much unaffected. However, the “Martin Cluster” suffered some major node failures with outages from Bach and Widlake.

The first day of the conference is on land in Oslo, and some guy called Justin Bieber did a really good job of making sure that hotels in Oslo were in demand. I heard reports that his fans booked rooms in multiple hotels in the hope that one of them would be the hotel Justin was staying in… Madness, and an inconvenience for some of the conference attendees.

On day 1 Martin Bach and I ran a workshop on client connectivity to RAC databases under the banner of “RAC Attack II”. We covered Fast Connection Failover (FCF) for both Java and C# clients with particular focus on the bugs and gotchas that await those attempting to use the feature. On day 2 I did a presentation entitled “How Virtualisation Changed My Life” that aims to encourage attendees to make active use of free virtualisation products on their own hardware in order to increase their knowledge and hands-on experience with the technology they work with or want to work with.

Outside of my speaking commitments I attended some great sessions and the following is a selection of my notes:

“Happiness is a state change” – Cary Millsap. Without the context of the rest of the keynote presentation (“Learning about Life through Business and Software”) this quotation might not make much sense. The point Cary is making is that it is development and progression that we humans find rewarding, rather than our state at a specific point in time.

e-Vita employees Cato Aune and Jon Petter Hjulstad co-presented a session on “Weblogic 12c – experiences”. My only exposure to Weblogic is when installing or managing Oracle Enterprise Manager and Oracle Identity Management products, neither of which uses/supports Weblogic 12c at this time, but I wanted to hear about what the latest Weblogic will surely bring my way in due course.

Joel Goodman gave a very good presentation on “RAC Global Resource Management Concepts” revealing the complexity of what goes on under the covers of your RAC database. Unfortunately the slides are not available even to conference attendees.

Connor McDonald‘s “Odds & Ends” was very enjoyable and it’s definitely worth grabbing the slides. My notes include:

  1. Use of oradebug suspend/resume as an alternative to killing a resource hungry session is an appealing idea
  2. I wasn’t aware of the use of “#” to run a SQL*Plus command midway through typing a SQL statement
  3. Making use of “set errorlogging on” isn’t something I currently do, but will look at
  4. The unsupported, but interesting “overlaps” clause in SQL is worth being aware of and Connor provides an associated MOS note ID in the slides

Frits Hoogland gave 3 presentations during the conference. Unfortunately the first (“Exadata OLTP”) was at the same time as mine. Fortunately I saw the other 2: “About multiblock reads” and “Advanced Profiling of Oracle Using Function Calls—A Hacking Session”. These work very well together and the hacking session was the highlight of the conference for me. There were no slides, so you can’t download them, but Frits has documented what he covers in “Profile of Oracle Using Function Calls (PDF)”. Notes from the sessions include:

  1. Frits prefers to set db_file_multiblock_read_count manually rather than unset or setting to zero
  2. The “physical reads” statistic in autotrace output is the number of blocks read, not the number of IOs, which is a mistake he sees others making
  3. Direct path reads don’t stop at extent boundaries and a single request can read multiple [contiguous] extents
  4. Use perf to break out what CPU is being used for

Kai Yu presented “Optimizing OLTP Oracle Database Performance using PCIe SSD”. He shared his experiences and covered the use cases for this type of storage in an Oracle database infrastructure. Very significant performance improvements are available, but as always it depends on your implementation/workload.

Bjoern Rost’s “The ins and outs of Total Recall” covered his experiences using Total Recall aka Flashback Data Archive (FBA). Does it really need 2 names? He showed how it had been used for what I understood to be a slowly changing dimension use case without the need to change existing parts of the application. They had been bitten by the change covered by MOS Note “Initial Extent Size of a Partition Changed To 8MB From 64KB [ID 1295484.1]”. The most interesting part of the presentation was the detailed coverage of DISSOCIATE_FBA, so grab the slides if you use FBA. It’s also worth noting that Total Recall/Flashback Data Archive is included in Advanced Compression, so you might find you have the option of using it without specifically purchasing it.

Cary Millsap’s “Millsap’s Grand Unified Theory of ‘Tuning’” emphasised the point that end user experience is what really matters and covered what tools are appropriate in specific phases of performance analysis.

If the agenda for next year is anything like this year then it’s definitely worth considering a trip to Oslo for a boat ride to Kiel and back.

A massive thank you to OUGN for putting on the seminar, accepting my presentations, excellent organisation and fantastic hospitality.

OSWatcher Startup/Restart On Exadata

When the question of what starts OSWatcher (OSW) on Exadata was raised at a client site I thought I’d take a quick look. It took me a little longer than I expected to work out the detail and therefore it seems worth sharing.

If you’re simply looking to change the “snapshot interval”, “archive retention” or “compression command” then /opt/oracle.cellos/validations/init.d/oswatcher is what you need to modify: you’ll find a line with ./startOSW.sh X Y Z, where X is the snapshot interval, Y is the archive retention and Z is the compression command used to compress the output files.

If you’re curious to know the details of what starts and restarts OSWatcher then read on.

The following is applicable to the X2-2 I regularly get my hands on, which is running 11.2.2.4.2. I don’t know if things change with later versions, so apologies if this isn’t applicable to your Exadata environment.

Startup of OSWatcher on boot is indirectly handled by /etc/init.d/rc.local, which includes:

########### BEGIN DO NOT REMOVE Added by Oracle Exadata ###########
if [ -x /etc/rc.d/rc.Oracle.Exadata ]; then
  . /etc/rc.d/rc.Oracle.Exadata
fi
########### END DO NOT REMOVE Added by Oracle Exadata ###########

/etc/rc.d/rc.Oracle.Exadata includes:

# Perform validations step
/opt/oracle.cellos/vldrun -all

The main purpose of /opt/oracle.cellos/vldrun and the Perl script /opt/oracle.cellos/validations/bin/vldrun.pl appears to be ensuring configuration changes are made on initial boot and after upgrades, although I haven’t looked into all the detail yet. The part of /opt/oracle.cellos/vldrun that is relevant in the context of starting OSWatcher on every boot is:

$VLDRUN_PL -quiet "$@"

This executes /opt/oracle.cellos/validations/bin/vldrun.pl with the -quiet and -all arguments (-all being what was passed to /opt/oracle.cellos/vldrun).

The “quiet” argument is pretty obvious and a little reading reveals that “all” simply means that all scripts in /opt/oracle.cellos/validations/init.d/ should be executed.

So off to /opt/oracle.cellos/validations/init.d/ we go:

[root@my-host ~]# ls -1 /opt/oracle.cellos/validations/init.d/
beginfirstboot
biosbootorder
cellpreconfig
checkconfigs
checkdeveachboot
checklsi
diskhealth
ipmisettings
misceachboot
misczeroboot
oswatcher
postinstall
sosreport
syscheck
[root@my-host ~]#

… and in oswatcher, as already mentioned in the second paragraph of the post, you’ll find ./startOSW.sh X Y Z, where X is the snapshot interval, Y is the archive retention and Z is the compression command used to compress the output files.

OK, so that’s what starts OSWatcher on boot, but you should also know that OSWatcher is restarted daily by /etc/cron.daily/cellos, which includes:

/opt/oracle.cellos/validations/bin/vldrun.pl -script oswatcher > /dev/null 2>&1

The only bit of all this that doesn’t really sit right with me is that OSWatcher is included with “validations”. That doesn’t seem like an appropriate description to me.

Trivial as it may be, I hope that later versions of the Exadata software move from what is described above to the “service” based approach used on non-Exadata platforms and documented in How To Start OSWatcher Black Box Every System Boot [ID 580513.1]. This feels like a much more standard approach and allows control of the service using the /sbin/service and /sbin/chkconfig commands.

Rename LVM Volume Group Holding Root File System Volume

I’m not sure exactly when this change happened, but in Oracle [Enterprise] Linux 5 days a default installation would result in the root file system being created as an LVM logical volume (LV) named LogVol00 in a volume group (VG) named VolGroup00. I must confess that I wasn’t paying too much attention to LV and VG default names in the days I was playing with OL5 a lot, but that’s partly because there was nothing to drag my attention to them.

Along comes Oracle Linux 6 and off I go creating VMs, cloning them and then really not liking the fact that the default VG created during the installation, and holding the LV for the root file system, is named vg_<hostname> where <hostname> is the hostname of the original VM I installed. If I clone a VM the next thing I do is change the hostname, which means that I’m left with an inconsistent and somewhat confusing VG name. I think I screwed up one VM before realising that it wasn’t just simply a case of renaming the VG and updating /etc/fstab. I asked a friend who does much more Linux admin than I do how to achieve what I wanted and didn’t take it any further when he said words to the effect of, “Yeah, it’s more complicated than that.”

Update [5th August 2014]

It turns out that renaming the volume group that holds the logical volume the root file system is on is not as complex as I had previously thought. Comments from Brian suggest that there is no need to recreate the initramfs and that it can be done without booting into rescue mode. I’ve just tested Brian’s suggestions and he’s right. It is as simple as:

  1. Rename Volume Group
  2. Update /etc/fstab
  3. Update /boot/grub/grub.conf
  4. Reboot

Brian – Thanks a lot for your comments and pointing out unnecessary steps.

This update makes the rest of the post mostly useless, but I’ll leave it all there for context.

End of update [5th August 2014]

Fairly recently I walked into the same situation again, only this time I decided that I wasn’t going to take “more complicated” for an answer :-). I searched, found a few articles that seemed to have logic in their approach and figured I had nothing to lose. I also thought there were some redundant steps in the posts I was following, hence feeling it’s worth blogging my slimmed down approach.

Well, that’s enough preamble. Here are the details:

1) Boot Into Rescue Mode

For me booting from CD/iso was the easiest way to get into “rescue mode”. To do this select “Rescue installed system” when the welcome screen is presented during the boot process. You will then be prompted with:

Choose a Language – You know better than I what’s the best choice for you. Select it and then OK.

Keyboard Type – Again, pick what you think best matches your keyboard. Then select OK.

Rescue Method – Select “Local CD/DVD”, then OK.

Setup Networking – Select “No”

Rescue – Select “Continue”

Rescue (message about your system being mounted under /mnt/sysimage and use of chroot) – OK.

Rescue (another message about having mounted your system under /mnt/sysimage) – OK.

First Aid Kit quickstart menu – Select “shell  Start shell”, then OK.

The above will get you to a prompt so you can actually do what you came here for!

2) Rename Volume Group

The LVM commands you issue are the same as usual, only they need to be prefixed with lvm. I suggest listing the VGs to be sure the state of the system is as you expect, and using more is a good idea as you don’t have a scrollbar, i.e.:

lvm vgdisplay | more

Once you’re happy, rename the VG as below:

lvm vgrename <original> <desired>

You should get a success message after this command.

3) Update All References

Change the root directory to that of your installed system using chroot:

chroot /mnt/sysimage

The following files need to be modified to replace references to the old VG name with the new VG name:

  • /etc/fstab
  • /boot/grub/grub.conf

There will be multiple references per line in grub.conf, so a bit of “global replace” is in order.

4) Create New Ramdisk Image

Run the following command to make a new initial ramdisk image

mkinitrd --force /boot/initramfs-<kernel version>.img <kernel version>

Note that the force option is only required because there is already an existing image with the same name. You could use a different name if you want, but you’d need to add an appropriate entry to the grub.conf to reflect this.

If the above command completes without error messages and you didn’t make any errors in the editing of the files earlier then you should be all set… Only one way to find out!

5) Reboot Machine

Exit out of the chroot environment (type “exit”).

Exit out of the shell to return to the “First Aid Kit quickstart menu” (type “exit”).

First Aid Kit quickstart menu – Select “reboot Reboot”, then OK.

At the welcome screen select “Boot from local drive”.

If all goes well then remember to remove the CD/iso from your [virtual] CD drive.

References

The 2 articles that helped me with this are the following, so thanks to the authors:

http://technoconfessions.blogspot.co.uk/2010/05/renaming-volume-group-that-your-root.html
https://we.riseup.net/debian/renaming-a-lvm-root-volume-group

Update [27th January 2014]

I have just noticed that the default volume group name when installing Oracle Linux 6.5 has changed from “vg_<hostname>” to “VolGroup”.