Extended RAC Cluster

Introduction

I’ve just finished building an extended RAC cluster on Oracle VM following the instructions written by Jakub Wartak. I can’t claim it was plain sailing, so I’m listing the issues I encountered here in the hope that it helps someone else.

Before I start with the issues I want to thank Jakub for making his article available. After first seeing it about 6 or 7 months ago I wanted to get some kit to play with… It took a while for me to decide on what to order and there were other distractions to attend to, but I ordered the following a couple of weeks ago.

  • Asus V3-M3N8200 AM2 Barebone
  • AMD Phenom X4 9350e
  • Kingston DDR2 800MHz/PC2-6400 HyperX Memory (8GB)
  • Western Digital WD5000AAKS 500GB SATA II x 3

My management box for Oracle VM Manager and NFS for 3rd voting disk is an old Compaq EVO D510 SFF (2.0GHz, 512MB RAM, 80G HDD) – It’s worth noting that Oracle state that Oracle VM Manager 2.1.2 requires 2GB RAM, but I’ve managed with 512MB.

A question I asked myself before completing the installation(s), and something that I’ve been asked by a couple of colleagues, is, “Is it possible to install Oracle VM Server and not use Oracle VM Manager?” The answer seems to be a definite YES.

Oracle VM Manager Installation

I am running Oracle Enterprise Linux 5 and Oracle VM Manager installed with no issues. The only slight gotcha was the installer complaining about insufficient swap space. I hit this the second time I was installing Oracle VM Manager, and on investigation it was due to swap space actually being in use. I shut a few things down and ran a quick “swapoff -a; swapon -a”.

Oracle VM Server Installation

This is where the majority of my time has been spent. The first issue I hit was the installer not being able to see my 3 SATA disks. After a fair amount of frustration and reading I discovered the Linux boot option of *pci=nomsi*. This, combined with setting my BIOS to treat the disks as *AHCI* rather than SATA, resolved the issue.
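For anyone hitting the same problem, the option goes on the Linux kernel line. It can be typed at the installer boot prompt, and made permanent afterwards by adding it to the dom0 kernel (module) line in GRUB. A sketch only, assuming the GRUB legacy layout Oracle VM Server uses — the exact paths and version strings will differ on your system:

```text
# At the installer boot prompt:
linux pci=nomsi

# /boot/grub/menu.lst after installation (illustrative entry only):
title Oracle VM Server
        root (hd0,0)
        kernel /xen.gz console=tty0
        module /vmlinuz-2.6.18-8.el5xen ro root=LABEL=/ pci=nomsi
        module /initrd-2.6.18-8.el5xen.img
```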

The next problem was stopping my machine (all brand new kit) from rebooting. I was probably a bit slow to work this one out, but it turns out that one of my four 2GB RAM sticks was bad, and as soon as I pinpointed the problem and just stuck to 6GB I could move on. Based on this experience and discussions with my sysadmin colleagues, I’d recommend running memtest86 from the Oracle VM Server installation CD on your machine before attempting the installation.

Well, not quite. There was one more issue holding me back from the stuff I really wanted to be doing. I don’t know if this can be explained by different versions of the Oracle VM templates or Oracle VM Server, but it turns out that I was hitting bug 223947. The symptoms were the messages below showing up on the console for my VM Server.

raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921017 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921019 11
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521988084 8
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521993716 8
[similar lines repeated many times]

Maybe the version of the VM template that Jakub used did not use LVM? Anyway, moving to a disk configuration that relied on RAID 1 and RAID 5 got me around this issue.
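The same layout can be expressed with mdadm if you are configuring the arrays by hand rather than through the installer. A sketch only — the device names are hypothetical, the commands need root, and --create destroys any existing data on the partitions:

```shell
# RAID 1 mirror across two partitions (hypothetical devices):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# RAID 5 across three partitions:
mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2

# Watch the initial sync and confirm the arrays come up clean:
cat /proc/mdstat
```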

Creating the Openfilers

When I attempted to start up the Openfilers for the first time I received an error relating to the bridge sanbr0.

Error: Device 1 (vif) could not be connected. Could not find bridge device sanbr0

To resolve this I just skipped ahead to the section that sets up the bridges and ran those commands earlier than specified.

brctl addbr sanbr0
ip link set dev sanbr0 up
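Note that a bridge created this way disappears at the next reboot. To have it recreated automatically you can put the two commands above in /etc/rc.local, or define the bridge as a network script — a sketch, assuming the standard EL5-style ifcfg syntax:

```text
# /etc/sysconfig/network-scripts/ifcfg-sanbr0 (sketch)
DEVICE=sanbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
```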

Once I’d got the Openfiler VMs running and followed the instructions Jakub provided for configuration, I experienced a peculiar issue with the web page. When logging in I was not being taken to the “Administration Section”, but instead to “Home”. On the Home page there is a link, “administer the storage device from here.”, which when clicked took me back to the Home page. I did a bit of searching and found a post on the Openfiler forum. This didn’t really give me much to go on, but I left everything running whilst I went to work and returned to find the same problem… I then tried restarting Firefox and hey presto, it worked. I don’t have a good answer to why it worked other than something cache-related.

Configuration of Oracle Enterprise Linux VMs

Use of quotation marks in echo “MTU=9000” >> /etc/sysconfig/network-scripts/ifcfg-eth1 (etc) caused an issue when restarting the network service and the quotation marks needed to be removed from the file.

Update: The above issue is due to copy & paste from HTML to shell – the double quotation marks in the HTML are not translated to “simple” double quotation marks in shell as shown below (thanks Jakub):

     [vnull@xeno ~]$ echo “MTU=9000” | cat -v
     M-bM-^@M-^\MTU=9000M-bM-^@M-^]
     [vnull@xeno ~]$ echo "MTU=9000" | cat -v
     MTU=9000
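If a file has already been written with the curly quotation marks, it can be repaired in place rather than re-typed. A small sketch of the idea using sed (GNU sed assumed); the commented-out in-place command at the end is what you would actually run against the ifcfg file:

```shell
# The string the browser's curly quotes produce:
bad='“MTU=9000”'

# Replace each curly double quote with a plain ASCII one:
fixed=$(printf '%s\n' "$bad" | sed 's/“/"/g; s/”/"/g')
printf '%s\n' "$fixed"    # -> "MTU=9000"

# To repair an already-written config file in place:
# sed -i 's/“/"/g; s/”/"/g' /etc/sysconfig/network-scripts/ifcfg-eth1
```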

My Oracle Enterprise Linux VMs did not have a /dev/hdd, so I ran fdisk -l to discover /dev/xvdb, which can also be seen in the vm.cfg file. I assume that this has changed in the VM templates since Jakub downloaded his.

The iSCSI disks were presented differently than described in the article, which I believe is a result of something not going to plan in the /etc/udev/scripts/iscsidev.sh script, but I don’t know this for sure. I became aware of the problem when running the script to partition the iSCSI disks, as errors were generated. fdisk -l showed me disks sda – sdf, so I just created partitions on these and have used them directly without any problems to date (it’s only been 3 days). The output below might be helpful in working out what has gone wrong.
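Before partitioning anything it is worth confirming which /dev/sd* devices really are the iSCSI LUNs, since the sd names are assigned in discovery order. With open-iscsi the session detail output lists the attached disks — illustrative only; it needs root and an active session:

```shell
# Map iSCSI targets to their sd* device names:
iscsiadm -m session -P 3 | grep -iE 'target|attached scsi disk'
```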

[root@erac1 ~]# ls -l /dev/iscsi/
total 0
drwxr-xr-x 2 root root 380 Mar 1 19:09 lun
[root@erac1 ~]# ls -l /dev/iscsi/lun
total 0
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part -> ../../../sdf
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part0 -> ../../../sg0
lrwxrwxrwx 1 root root 13 Mar 1 19:09 part1 -> ../../../sde1
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part10 -> ../../../ram10
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part11 -> ../../../ram11
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part12 -> ../../../ram12
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part13 -> ../../../ram13
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part14 -> ../../../ram14
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part15 -> ../../../ram15
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part2 -> ../../../sg2
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part3 -> ../../../sg3
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part4 -> ../../../sg4
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part5 -> ../../../sg5
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part6 -> ../../../ram6
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part7 -> ../../../ram7
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part8 -> ../../../ram8
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part9 -> ../../../ram9

From looking at the script and later in the instructions I would have expected the lun directory to have a digit at the end. As this isn’t currently causing me any issues I’ve not looked into it further.
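I haven’t debugged the script, but naming schemes like this typically build the directory and part names from the host:channel:id:lun string udev passes in, so if that parsing goes wrong the trailing lun digit would go missing. The shell parameter expansions usually involved can be checked in isolation — illustrative variable names; the real script may differ:

```shell
# udev hands the script a SCSI address such as "3:0:0:1" (host:channel:id:lun).
BUS="3:0:0:1"
HOST=${BUS%%:*}    # strip from the first ':' onwards -> "3"
LUN=${BUS##*:}     # strip up to the last ':'         -> "1"
echo "host=${HOST} lun=${LUN}"    # -> host=3 lun=1
```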

Third Voting Disk

During installation of Oracle Clusterware I received an error when specifying 3 locations for my voting disks.

The location /votedisk/third_votedisk.crs, entered for the Additional Cluster Synchronization Services (CSS) voting disk is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster.

I continued the installation with only one voting disk and went back after the installation to work out what the issue was. It turned out to be a permissions problem and I needed to modify the options in /etc/exports as shown below.

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=500)

to

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=501)
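Remember that editing /etc/exports doesn’t change a running NFS server by itself; the exports need re-reading. On the management box, as root, something like:

```shell
exportfs -ra    # re-export everything in /etc/exports
exportfs -v     # verify /votedisk now shows anongid=501
```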

The permissions of the third_votedisk.crs file also required changing to match the “anon” settings, which in my case due to differing UID and GID values on the Oracle VM Manager box meant setting the following permissions.

[martin@ora-vmm ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 martin dba 335544320 Mar 1 20:04 /votedisk/third_votedisk.crs

The important thing is not what the permissions show as locally, but how they appear on the RAC nodes, i.e.:

[oracle@erac1 ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 oracle oinstall 335544320 Mar 1 2009 /votedisk/third_votedisk.crs

I assume that the group read permission could be safely removed if deemed desirable from a security point of view.
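For completeness, the ownership change on the NFS server side amounts to matching the file to the anonuid/anongid of the export — a sketch, run as root, with the numeric IDs taken from my /etc/exports line:

```shell
# Make the file match the export's anonuid/anongid (500/501 in my case):
chown 500:501 /votedisk/third_votedisk.crs
# Drop world access; use chmod 600 instead to remove group read as well:
chmod 640 /votedisk/third_votedisk.crs
```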

Enterprise Manager

Near the end of the database installation I received an error regarding Enterprise Manager, which I don’t recall the details of, but I can access the Enterprise Manager console and things seem to work so far. I’ll update the post if I discover any issues.