ORAganism

RAC/Exadata Shell Aliases

Posted in Oracle by Martin Nash on May 27, 2012

I’ve been working on Oracle RAC systems for a while now and the subject of how best to setup shell environment variables and aliases seems to be something that there is a fair range of opinion on.

  • Do you have entries in oratab for both the database and the instance on that server?
  • Do you use oraenv or set the variables required for each environment via another method?
  • If you use oraenv, do you customise it?
  • How do you switch ORACLE_SID between the db_name and instance_name, if you do?

Please feel free to answer these questions via a comment below, however, the main point of this post is to share a few aliases that I’m finding very useful at the moment, and to encourage you to share aliases you use in your RAC environments. Actually, the non-RAC specific, Oracle related, aliases you create would be interesting to me too, but more on that later…

As I mentioned in a much earlier post I feel that in the world of RAC databases there is scope for an environment variable of ORACLE_DB or ORA_DB_NAME… But, I also feel that there’s a lot to be said for not introducing more variables then is necessary. To this end I have taken to creating the following aliases in RAC environments that I’m working in[1].

alias inst='ORACLE_SID=${ORACLE_SID}$($(grep "^+ASM[1-8]" /etc/oratab | cut -d":" -f2)/bin/olsnodes -l -n | cut -f2)'

alias db='ORACLE_SID=${ORACLE_SID%$($(grep "^+ASM[1-8]" /etc/oratab | cut -d":" -f2)/bin/olsnodes -l -n | cut -f2)}'

These aliases allow me to quickly switch ORACLE_SID from being set to the relevant value for the database to it being set to the relevant value for the instance on the given host. Obviously it doesn’t stop you modifying the value of ORACLE_SID to something that is not valid, but at least it is easy to undo a mistake you might have made.

He’s a quick example:

[ORCL@rac-02 ~]$ inst
[ORCL2@rac-02 ~]$ inst
[ORCL22@rac-02 ~]$ inst
[ORCL222@rac-02 ~]$ inst
[ORCL2222@rac-02 ~]$ db
[ORCL222@rac-02 ~]$ db
[ORCL22@rac-02 ~]$ db
[ORCL2@rac-02 ~]$ db
[ORCL@rac-02 ~]$ 

And, for a little more alias sharing… In my time working on Exadata environments I’ve found myself using dcli a lot and created the aliases below to save me typing on the basis that I, almost always, either want to use dcli to communicate with all “compute nodes” or all “storage cells” so why not simplify the process and shorten the commands:

alias ddcli='dcli -g ~/dbs_group -l oracle'

alias cdcli='dcli -g ~/cell_group -l cellmonitor'

I’d be keen to get a page going for useful aliases so we can all benefit from the ideas of each other… If you’ve got something to share please comment and if the collection builds up we’ll start a page for alias sharing.
__________
[1] – If I’m working in an environment where a shared “oracle” account is in use then I define my aliases in a separate file and source this when I login so that my preferences are not inflicted on others :-)

UK OUG conference: RAC & HA special interest group round table

Posted in Oracle by Ben Thompson on December 1, 2010

The RAC and HA roundtable gives users a chance to share experience and gotchas. Often the discussion points that generate most interest are when something unusual or unexplained has happened, with the participants attempting to drill down to where the problem was by firing questions at the DBA telling the tale. The discussion this time there was such a discussion around a RAC stretch cluster where the storage connection had been deliberately broken during a test (but not the RAC interconnect between the two sites). The end result of this was a failure of the cluster but after a restart, the database had been rolled back to a point in time several minutes before the failure. This is discussed here: http://davidburnham.wordpress.com/category/oracle-rac-stretched-cluster-testing/

Some participants suggested that Active Data Guard was a better option than a stretch RAC cluster with one saying that Active Data Guard should be the default position with RAC only deployed if Active DataGuard didn’t fit the bill. Unfortunately as a presentation on DataGuard was taking place at the same time, most people using it were presumably elsewhere.

Another discussion point was whether RAC actually reduces the volume of outages, ensuring high availability. Participants generally agreed that it is great for installing rolling updates and so on without disruption – in other words, planned maintenance work – and dealing with some other modes of failure. But sometimes the added complexity could cause unplanned outages that may never have happened on a single instance system. One participant argued that some of the outages cited should not happen, since they were down to human error.

Talk moved on to whether an Oracle Exadata box represents good value for money. One participant mentioned that he could easily build a similar set up from commodity hardware for a fraction of the cost, but it would not include a high-speed interconnect or the Exadata software. He cited some data-warehouse type queries that were sped up by an order of magnitude when implemented using Exadata. This had transformed the attitude to these queries in the company concerned: because they were quick to run, users were more likely to use them. So the answer is that it depends.

A somewhat inconclusive discussion took place on how best to deploy ASM when there are multiple databases: should all databases share the same ASM instance or diskgroup? One participant suggested that ASM was effectively a single point of failure which could cause all databases to fail in the event on a software bug. Another responded that he had been using ASM for four years without any such disruption. I guess ASM should just be viewed as another filesystem for this point of view and what are the chances of GPFS or even raw devices failing?

The number of RAC nodes in a cluster was another talking point. With a two node cluster, in the event of a failure you are left with just a single node which ought to be able to manage the full system load. Many participants seemed to prefer a three or four node cluster as 4 to 3 or 3 to 2 isn’t as dramatic a change as 2 to 1.

So, it was an interesting discussion and definitely a refreshing change from sitting passively through presentations.

Tagged with: , ,

Changing SYS Password in RAC Databases

Posted in Oracle by Martin Nash on May 22, 2010

The post title should really be “Changing SYS Password in RAC Databases when a password reuse limit is in place for SYS and how this has changed from 10.2 to 11.1″, but that would be a bit of a mouthful.

This is the post I was originally working on when I got side tracked by the change in behaviour of profiles assigned to SYS as detailed in “SYS and Password Resource Limits“. At the time I started this I wasn’t aware of the change and was just planning to use SYS passwords as an example of why you should listen to Tom Kyte when he says “… sys is magical, sys is special, sys is governed by a set of rules different from the rest of the world.” He’s right (of course) and the magical powers of SYS are subject to change!

The post is split into pre-11g and 11g or later. We can only assume that the behaviour won’t change back any time soon.

I’ll start with a disclaimer stating that I try to use the “password” command in SQL*Plus wherever possible to avoid displaying passwords in clear text on my screen and I would also need to have a very good reason to put passwords on the command line (OS or SQL*Plus), so the approach used in the examples is purely to make them clear.

Pre-11g

The example shows:

  1. Change password on node 1
  2. Connect to node 1 using new password successfully
  3. Validate instance name
  4. Connect to node 2 using new password unsuccessfully
  5. Connect to node 2 using original password successfully
  6. Attempt to change password on node 2 to new password fails as password reuse is restricted in profile assigned to SYS
  7. Copy and rename password file from node 1 to node 2 (command not shown, but I use scp)
  8. Connect to both nodes using same password successfully

SYS@orcl1> alter user sys identified by temp_pass1;

User altered.

SYS@orcl1> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
@> connect sys@orcl2/<original password> as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2> alter user sys identified by temp_pass1;
alter user sys identified by temp_pass1
*
ERROR at line 1:
ORA-28007: the password cannot be reused


SYS@orcl2>

Need to copy password file from node 1 and rename at this point (or change profile/resource limit)…

SYS@orcl2> connect sys@kn10st1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@kn10st2/temp_pass1 as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2>

11g or later

The example shows:

  1. Change password on node 1
  2. Connect to node 1 using new password successfully
  3. Validate instance name
  4. Connect to node 2 using new password unsuccessfully
  5. Connect to node 2 using original password successfully
  6. Change password on node 2 to new password successful even though password reuse is restricted in profile assigned to SYS
  7. Connect to both nodes using same password successfully

SYS@orcl1> alter user sys identified by temp_pass1;

User altered.

SYS@orcl1> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
@> connect sys@orcl2/<original password> as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2> alter user sys identified by temp_pass1;

User altered.

SYS@orcl2> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2>

The key point is that the password for SYS is instance specific. I don’t really understand why Oracle have not implemented something that updates the SYS password file on all nodes of a clustered database, but maybe some reason why this would not be desirable behaviour. If there is then I can’t see it. A friend has suggested that one solution to this would be to use symbolic links for each instance to point to a common password file (on shared storage).

Tagged with: , ,

Extended RAC Cluster

Posted in Oracle by Martin Nash on March 3, 2009

Introduction

I’ve just finished building an extended RAC cluster on Oracle VM following the instructions written by Jakub Wartak. I can’t claim it was plan sailing, so I’m listing the issues I encountered here in the hope that it helps someone else.

Before I start with the issues I want to thank Jakub for making his article available. After first seeing it about 6 or 7 months ago I wanted to get some kit to play with… It took a while for me to decide on what to order and there were other distractions to attend to, but I ordered the following a couple of weeks ago.

  • Asus V3-M3N8200 AM2 Barebone
  • AMD Phenom X4 9350e
  • Kingston DDR2 800MHz/PC2-6400 HyperX Memory (8GB)
  • Western Digital WD5000AAKS 500GB SATA II x 3

My management box for Oracle VM Manager and NFS for 3rd voting disk is an old Compaq EVO D510 SFF (2.0GHz, 512MB RAM, 80G HDD) – It’s worth noting that Oracle state that Oracle VM Manager 2.1.2 requires 2GB RAM, but I’ve managed with 512MB.

A question I asked myself before completing the installation(s) and something that I’ve been asked by a couple of colleages is, “Is it possible to install Oracle VM Server and not use Oracle VM Manager?” The answer to which seems like a definite YES.

Oracle VM Manager Installation

I am running Oracle Enterprise Linux 5 and the Oracle VM Manager installed with no issues. The only slight gotcha was the installer complaining about insufficient swap space. This was something I hit the second time I was installing Oracle VM Manager and on investigation was due to swap space being allocated. I shut a few things down and ran a quick “swapoff -a; swapon -a”.

Oracle VM Server Installation

This is where the majority of my time has been spent. The first issue I hit was the installer not being able to see my 3 SATA disks. After a fair amount of frustration and reading I discovered the Linux boot option of *pci=nomsi*. This combined with setting my BIOS to treat the disks at *AHCI* rather than SATA resolved this issue.

The next problem was stopping my machine (all brand new kit) from rebooting. I was probably a bit slow to work this one out, but it turns out that one of my four 2GB RAM sticks was bad and as soon as I pin pointed the problem and just stuck to 6GB I could move on. Based on this experience discussion with my Sys Admin colleagues I’d recommend running memtest86 from the Oracle VM Server installation CD on your machine before attempting the installation.

Well not quite. There was one more issue holding me back from the stuff I really wanted to be doing. I don’t know if this can be explained by different version of Oracle VM templates or Oracle VM Server, but it turns out that I was hitting bug 223947. The symptons were the below messages showing up on the console for my VM Server.

raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921017 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921020 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921019 11
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521988084 8
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521993716 8
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5

Maybe the version of the VM template that Jakub used did not use LVM? Anyway moving to a disk configuration that relied on RAID 1 and RAID 5 got me around this issue.

Creating the Openfilers

When I attempted to start up the Openfilers for the first time I received an error relating to the bridge sanbr0.

Error: Device 1 (vif) could not be connected. Could not find bridge device sanbr0

To reslove this I just skipped on a bit to the section that sets up the bridges and run those commands earlier that specified.

brctl addbr sanbr0
ip link set dev sanbr0 up

Once I’d got the Openfiler VMs running and followed the instructions Jakub provided for configuration I experienced a peculiar issue with the webpage. When logging in I was not being taken to the “Administration Section”, but instead to “Home”. On the Home page there is a link “administer the storage device from here.”, which when clicked took me back to the Home page. I did a bit of searching and found a post on the Openfiler forum. This didn’t really give me much to go on, but I left everything running whilst I went to work and returned to find the same problem… I then tried restarting Firefox and hey presto, it worked. I don’t have a good answer to why it worked other then something cache related.

Configuration of Oracle Enterprise Linux VMs

Use of quotation marks in echo “MTU=9000” >> /etc/sysconfig/network-scripts/ifcfg-eth1 (etc) caused an issue when restarting the network service and the quotation marks needed to be removed from the file.

Update: The above issue is due to copy & paste from HTML to shell – the double quotation marks in the HTML are not translated to “simple” double quotation marks in shell as shown below (thanks Jakub):

     [vnull@xeno ~]$ echo “MTU=9000” | cat -v
     M-bM-^@M-^\MTU=9000M-bM-^@M-^]
     [vnull@xeno ~]$ echo "MTU=9000" | cat -v
     MTU=9000

My Oracle Enterprise Linux VMs did not have a /dev/hdd, so a ran fdisk -l to discover /dev/xvdb, which can also be seen in the vm.cfg file. I assume that this has changed in the VM templates since Jakub downloaded his.

The iSCSI disks were presented differently than described the article, which I believe is a result of something not going to plan in the /etc/udev/scripts/iscsidev.sh script, but I don’t know this for sure. I became aware of the problem when running the script to partition the iSCSI disk as errors were generated. fdisk -l showed me disks sda – sdf, so I just created partitions on these and used them directly with any problems to date (it’s only been 3 days). The output below might be helpful in working out what has gone wrong.

[root@erac1 ~]# ls -l /dev/iscsi/
total 0
drwxr-xr-x 2 root root 380 Mar 1 19:09 lun
[root@erac1 ~]# ls -l /dev/iscsi/lun
total 0
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part -> ../../../sdf
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part0 -> ../../../sg0
lrwxrwxrwx 1 root root 13 Mar 1 19:09 part1 -> ../../../sde1
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part10 -> ../../../ram10
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part11 -> ../../../ram11
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part12 -> ../../../ram12
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part13 -> ../../../ram13
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part14 -> ../../../ram14
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part15 -> ../../../ram15
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part2 -> ../../../sg2
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part3 -> ../../../sg3
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part4 -> ../../../sg4
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part5 -> ../../../sg5
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part6 -> ../../../ram6
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part7 -> ../../../ram7
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part8 -> ../../../ram8
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part9 -> ../../../ram9

From looking at the script and later in the instructions I would have expected a the lun directory to had a digit at the end. As this isn’t currently causing me any issues I’ve not looked into it further.

Third Voting Disk

During installation of Oracle Clusterware I received an error when specifying 3 locations for my voting disks.

The location /votedisk/third_votedisk.crs, entered for the Additional Cluster Synchronization Services (CSS) voting disk is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster.

I continued the installation with only one voting disk and went back after the installation to work out what the issue was. It turned out to be a permissions problem and I needed to modify the options in /etc/exports as show below.

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=500)

to

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=501)

The permissions of the third_votedisk.crs file also required changing to match the “anon” settings, which in my case due to differing UID and GID values on the Oracle VM Manager box meant setting the following permissions.

[martin@ora-vmm ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 martin dba 335544320 Mar 1 20:04 /votedisk/third_votedisk.crs

The important thing is not what the permissions show as locally, but how they appear on the RAC nodes, i.e.:

[oracle@erac1 ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 oracle oinstall 335544320 Mar 1 2009 /votedisk/third_votedisk.crs

I assume that the group read permission could be safely removed if deemed desirable from a security point of view.

Enterprise Manager

Near the end of the database installation I received an error regarding Enterprise Manager, which I don’t recall the details of, but I can access the Enterprise Manager console and things seems to work so far. I’ll update the post if I discover any issues.

Follow

Get every new post delivered to your Inbox.

Join 264 other followers