What’s in a Name?

I’ve been meaning to write this up for a while now, but just haven’t found the time. Anyway, this is a little “gotcha” for those installing 11.2 Grid Infrastructure that care about consistency of naming… Maybe you don’t? Maybe I shouldn’t?

While building a four-node RAC system, I got to the point in the installation instructions that says:

You must run the root.sh script on the first node and wait for it to finish. If your cluster has four or more nodes, then root.sh can be run concurrently on all nodes but the first and last. As with the first node, the root.sh script on the last node must be run separately.

So, I merrily run root.sh and afterwards find that my ASM instances are named in a way I didn’t like or expect. My 4 servers were named: ora11-2-1, ora11-2-2, ora11-2-3, ora11-2-4; and I ended up with ASM instances: +ASM1, +ASM2, +ASM3, +ASM4. All as you’d expect. However, +ASM2 was running on ora11-2-3 and +ASM3 was running on ora11-2-2!

Q1: Does it really matter?

A1: No. At least I can’t see a reason why it would matter, but if you can think of any then please comment.

Q2: Did I want to understand why it happened and how to avoid it?

A2: Of course.

So, a little digging and experimentation later I found what I believe to be the cause of the “problem”. In the rootcrs_`hostname`.log files I found the start time and the point where the ASM instance is created.

Note: There wasn’t anything specifically stating that the ASM instance was being created, but while running root.sh during later tests I watched for the creation of the ASM record in /etc/oratab and correlated that with the log file.
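For anyone wanting to repeat that correlation, the check amounts to polling /etc/oratab for a +ASM entry while root.sh runs and noting when it appears. A minimal sketch of the parsing side, using a made-up sample oratab (the path and home below are stand-ins, not taken from my system):

```shell
# Build a sample oratab to parse (stand-in for the real /etc/oratab).
cat > /tmp/oratab.sample <<'EOF'
# Entries are of the form $ORACLE_SID:$ORACLE_HOME:<N|Y>
+ASM2:/u01/app/grid:N
EOF

# Report the ASM SID. In a live test this would sit in a loop with a
# sleep, recording the time at which the +ASM entry first shows up.
awk -F: '/^\+ASM/ {print $1}' /tmp/oratab.sample
```

On the sample above this prints `+ASM2`.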

Start of the root.sh on nodes 2 and 3:

[root@ora11-2-2 ~]# grep "The configuration" $ORACLE_HOME/cfgtoollogs/crsconfig/rootcrs_ora11-2-*.log
2011-01-08 00:48:48: The configuration parameter file /u01/app/ is valid

[root@ora11-2-3 ~]# grep "The configuration" $ORACLE_HOME/cfgtoollogs/crsconfig/rootcrs_ora11-2-*.log
2011-01-08 00:48:54: The configuration parameter file /u01/app/ is valid

Creation of ASM instance on nodes 2 and 3:

[root@ora11-2-2 ~]# grep "Start of resource \"ora.cluster_interconnect.haip\" Succeeded" $ORACLE_HOME/cfgtoollogs/crsconfig/rootcrs_ora11-2-*.log
2011-01-08 00:56:50: Start of resource "ora.cluster_interconnect.haip" Succeeded

[root@ora11-2-3 ~]# grep "Start of resource \"ora.cluster_interconnect.haip\" Succeeded" $ORACLE_HOME/cfgtoollogs/crsconfig/rootcrs_ora11-2-*.log
2011-01-08 00:56:34: Start of resource "ora.cluster_interconnect.haip" Succeeded

The key thing to note is the times. root.sh started on ora11-2-2 before it started on ora11-2-3, but for whatever reason it reached the creation of the ASM instance on ora11-2-3 before it did on ora11-2-2. As the instance numbers appear to be handed out in order of creation, ora11-2-3 picked up +ASM2 and ora11-2-2 was left with +ASM3.
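Lining the timestamps up makes the crossover easier to see. A quick sort over the values pulled from the grep output above (hostnames and times are exactly those from the logs) shows the order in which each node reached each stage:

```shell
# Start times vs HAIP/ASM-creation times from the rootcrs logs above,
# sorted by timestamp (field 3).
printf '%s\n' \
  'ora11-2-2 start 00:48:48' \
  'ora11-2-3 start 00:48:54' \
  'ora11-2-2 haip 00:56:50' \
  'ora11-2-3 haip 00:56:34' |
sort -k3
```

The sorted output shows ora11-2-2 starting first but ora11-2-3 reaching the HAIP/ASM-creation stage first.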

I found it impossible to leave the system with the naming mismatch, so I used rootcrs.pl to deconfigure Clusterware and re-ran root.sh, this time allowing it to finish on each node before starting the next. I ended up with the ASM instance names matching the hostnames and got on with creating databases.
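For reference, the recovery amounted to deconfiguring Clusterware and then re-running root.sh serially. A hedged sketch of the sequence is below: the GRID_HOME path is an assumption, and the commands are wrapped in an echoing stand-in so the ordering can be shown without touching a real cluster. On 11.2 the deconfigure step is `rootcrs.pl -deconfig -force`, run as root on each node.

```shell
# GRID_HOME is an assumed location; substitute your Grid Infrastructure home.
GRID_HOME=/u01/app/grid

# Stand-in runner: echoes instead of executing, so the sequence is visible.
run() { echo "would run: $*"; }

# Deconfigure Clusterware on each node (as root)...
run "$GRID_HOME/crs/install/rootcrs.pl" -deconfig -force

# ...then re-run root.sh on each node IN TURN, waiting for it to finish
# before starting the next, so the ASM instance numbers follow the hostnames.
run "$GRID_HOME/root.sh"
```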

I haven’t tested this or dug deep enough into the code to be 100% sure of the above explanation, so if anyone has alternative suggestions then please share them.