Protect single instance with GI (without RAC-One-Node)

This is my first post since I met Martin Nash in London two years ago at a storage seminar with James Morle. He invited me onto the ORAganism blog, and since then I have wanted to start blogging but never took the time to do so.
I spent a year working as a consultant for a company called Digora, where I had a great time and learnt a lot from all the different consultants. Laurent Leturgez made me feel guilty for not blogging (laurent-leturgez.com)!

Anyway, I recently discovered that it is now really simple to protect a single instance with a “cold failover” technique through the Grid Infrastructure. Before this technique appeared, it was not so easy. There is an article on the Oracle blogs describing an equivalent approach:
https://blogs.oracle.com/xpsoluxdb/entry/clusterware_11gr2_setting_up_an_activepassive_failover_configuration

Compared to that post, the new technique could hardly be easier; it comes down to a couple of commands:

- Check the current properties of your database - dbtest here
 crsctl status resource ora.dbtest.db -p
- Change the placement to FAVORED for your database
 crsctl modify resource ora.dbtest.db -attr "PLACEMENT=favored"
- Change the list of HOSTs that can receive the instance of your database (here host1 or host2)
 crsctl modify resource ora.dbtest.db -attr "HOSTING_MEMBERS=host1 host2"
- Check back the new properties of your database - dbtest here
 crsctl status resource ora.dbtest.db -p
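Put together, the four steps can be wrapped in a tiny helper. The sketch below only prints the crsctl commands (dry-run) so they can be reviewed before running anything; the database name and host list are placeholders for your environment.

```shell
#!/bin/bash
# Sketch: print the crsctl commands needed to enable cold failover for a
# single-instance database. Dry-run only (the commands are echoed, not
# executed); "dbtest", "host1" and "host2" are placeholders.
build_failover_cmds() {
  local db="$1"; shift
  local hosts="$*"
  echo "crsctl status resource ora.${db}.db -p"
  echo "crsctl modify resource ora.${db}.db -attr \"PLACEMENT=favored\""
  echo "crsctl modify resource ora.${db}.db -attr \"HOSTING_MEMBERS=${hosts}\""
  echo "crsctl status resource ora.${db}.db -p"
}

build_failover_cmds dbtest host1 host2
```

Remove the echoes (or pipe the output to a shell) once you are happy the resource name and host list are right.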

If you want client connections to handle the failover automatically and the SCAN listener to be aware of the host change, you need to modify the remote_listener parameter so that it points to the SCAN address of your GI setup:

sqlplus / as sysdba 
alter system set remote_listener="host-scan:1521" scope=both;

That’s it !!

You now have a single-instance database that is protected against hardware failure. So the next question is: why use this technique when you can use RAC One Node, and what are the differences?

- Reason – Cost
There is no additional cost and no extra licence compared to the RAC One Node solution; moreover, you can use this technique with a Standard Edition database.

- Difference 1 – No RAC feature

Compared to the RAC One Node solution, the database cannot fail over smoothly to the other host (online relocation). With RAC One Node, if you decide in advance that you want to relocate to another host (host2), you can use the following command:

srvctl relocate database -d dbtest -n host2 -v

When you launch this command while there are active transactions on the first host, the original instance stays up and a second instance starts on the target host. Both instances are therefore up at the same time, which relies on RAC functionality; that is why a RAC One Node configuration has two redo log threads and two undo tablespaces.

If you want to learn more about RAC One Node, Marcin Przepiorowski has a series on the subject (http://oracleprof.blogspot.fr/2009/12/oracle-rac-one-node-part-1.html).

If you try to use the relocate command with this “poor man’s” failover solution, you’ll get these errors:
   PRCD-1027 : Failed to retrieve database orcltest
   PRCD-1146 : Database orcltest is not a RAC One Node database

- Difference 2 – No native support in Cloud Control 12c

If you want to add this database to your beloved Cloud Control, you will need a small extra script for failover so that CC12c knows it has to connect to the new host; an example is shown below. I have written a script which handles this automatically (ask me for it if needed).

- Step 1 : create blackout (recommended but not compulsory)

emcli create_blackout -name="relocating active passive targets" -add_targets=${DB_NAME}_${INSTANCE_NAME}:oracle_database -schedule="frequency:once;duration:0:30" -reason="DB relocate"

- Step 2 : relocate the target from host1 to host2

emcli relocate_targets -src_agent=${SERVER_SOURCE}:3872 -dest_agent=${SERVER_DEST}:3872 -target_name=${DB_NAME}_${INSTANCE_NAME} -target_type=oracle_database -copy_from_src -force=yes -changed_param=MachineName:${VIP_NAME}

- Step 3 : stop blackout

emcli stop_blackout -name="relocating active passive targets"
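The three emcli steps above can be strung together into a single relocation helper. This is only a sketch: every argument is a placeholder for your environment (agent port 3872 included), and the commands are echoed rather than executed so you can review them first.

```shell
#!/bin/bash
# Sketch: the blackout / relocate_targets / stop_blackout sequence as
# one helper. Dry-run: commands are printed, not executed. All names
# (database, instance, hosts, VIP, agent port) are placeholders.
relocate_cc12c_target() {
  local db_name="$1" inst="$2" src="$3" dest="$4" vip="$5"
  local target="${db_name}_${inst}"
  echo "emcli create_blackout -name=\"relocating active passive targets\" -add_targets=${target}:oracle_database -schedule=\"frequency:once;duration:0:30\" -reason=\"DB relocate\""
  echo "emcli relocate_targets -src_agent=${src}:3872 -dest_agent=${dest}:3872 -target_name=${target} -target_type=oracle_database -copy_from_src -force=yes -changed_param=MachineName:${vip}"
  echo "emcli stop_blackout -name=\"relocating active passive targets\""
}

relocate_cc12c_target dbtest dbtest host1 host2 dbtest-vip
```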

- Difference 3 - No “Omotion like” relocation

In the beginning, RAC One Node was named “Omotion”, and it could be used almost like “vMotion” in vSphere. In the Grid Infrastructure configuration, you can only fail over by shutting down the host, or by shutting the instance down on host1 and starting it again on the second host:

-- From Host 1
srvctl stop database -d dbtest
-- From Host 2
srvctl start database -d dbtest -n host2
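Those two commands are easily wrapped into a small “manual relocate” helper. A dry-run sketch (database and host names are placeholders):

```shell
#!/bin/bash
# Sketch: the only "relocation" available with this setup is a stop on
# one node followed by a start on the other. Dry-run: the srvctl
# commands are printed for review, not executed.
manual_relocate_cmds() {
  local db="$1" target="$2"
  echo "srvctl stop database -d ${db}"
  echo "srvctl start database -d ${db} -n ${target}"
}

manual_relocate_cmds dbtest host2
```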

You are now ready to test this out! It is amazing how Oracle handles this configuration: even if you delete the spfile from host1, the GI will recreate it automatically.

Hope this helps. I did not find any blog article covering this type of configuration, so consider it my first contribution to the Oracle community, from which I have learned so much.

Deterministic Service Failover

Until some testing about six months ago I was under the impression that if a database service was defined as “preferred” on one instance of a RAC database and “available” on N other instances (where N is greater than 1), and the preferred instance was killed, then the instance the service moves to was not deterministic, i.e., Oracle picks one of the available instances to start the service on in an unpredictable way. I cannot work out what gave me this impression, so I will just have to put it down to an assumption I made at some point in the past and never validated. Conversations with other DBAs suggest I was not alone.

Anyway, I now believe that the failover is deterministic, as demonstrated below.

Creating some test services

$ srvctl add service -d DB_TST -s DB_TST_SVC1 -r "DB_TST1" -a "DB_TST2,DB_TST3,DB_TST4"
$ srvctl add service -d DB_TST -s DB_TST_SVC2 -r "DB_TST1" -a "DB_TST3,DB_TST4,DB_TST2"
$ srvctl add service -d DB_TST -s DB_TST_SVC3 -r "DB_TST1" -a "DB_TST4,DB_TST2,DB_TST3"

Verifying the state of the services

$ srvctl status service -d DB_TST
Service DB_TST_SVC1 is running on instance(s) DB_TST1
Service DB_TST_SVC2 is running on instance(s) DB_TST1
Service DB_TST_SVC3 is running on instance(s) DB_TST1

Killing instance 1

At this point I killed instance 1 using “shutdown abort” in SQL*Plus.

Checking the status of the services

$ srvctl status service -d DB_TST
Service DB_TST_SVC1 is running on instance(s) DB_TST2
Service DB_TST_SVC2 is running on instance(s) DB_TST3
Service DB_TST_SVC3 is running on instance(s) DB_TST4

As you can see, each service failed over to an instance based on the order in which the instances are defined in its “available” list. I did a few more tests to convince myself that this wasn’t just a one-off, but I won’t fill this post with the output.
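If you want to confirm the configured order before killing anything, “srvctl config service” shows the preferred and available instance lists for a service. A small dry-run loop over the test services above (the commands are echoed so the sketch runs anywhere):

```shell
#!/bin/bash
# Sketch: print the srvctl commands that display the preferred and
# available instance lists for each service. Dry-run; drop the echo
# to run them against a real cluster.
config_cmds() {
  local db="$1"; shift
  local svc
  for svc in "$@"; do
    echo "srvctl config service -d ${db} -s ${svc}"
  done
}

config_cmds DB_TST DB_TST_SVC1 DB_TST_SVC2 DB_TST_SVC3
```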

If you’re reading this and thinking, “Yes, that was obvious.” Sorry!

During the testing I ran into something else that appears very trivial, but is worth being aware of: when is “shutdown abort” not an immediate abort?

My Fight with FCF

I’m at the end of what feels like a long battle to get Fast Connection Failover (FCF) working with Universal Connection Pool (UCP). It should have been easy, or so I thought…

The most notable reference I’m aware of for setting up FCF with UCP outside of the Oracle documentation is this blog post by Martin Bach and its take 2. I have to thank Martin for publishing his work as without it I wouldn’t have been in a position to carry out my own testing. However, as I’ve discussed directly with him, I don’t think Martin’s implementation is complete for a couple of reasons that I’ll cover below.

Getting Started

I followed Martin’s blog posts…

When I’d got as far as the end of the first post I had a web page looking like this:

Hitting reload would show the instance name in “Hello World from …” changing every so often as the connection returned from the pool happened to switch instance.

I then moved on to following the second of Martin’s posts and ended up with this:

So far, so good.

Breaking It

Next I tinkered with the configuration a bit and changed the following from the configuration provided by Martin:

  • minPoolSize
  • maxPoolSize
  • inactiveConnectionTimeout

I’d been listening when members of the Oracle Real World Performance Group said that setting minPoolSize and maxPoolSize to different values was not a great idea, so I set them both to the same number in order to keep the pool size as steady as possible. I set inactiveConnectionTimeout to 0 (zero), as closing inactive connections and then creating new ones would be an unnecessary overhead on the database.

After making the changes I started playing around with stopping and starting services on individual instances…

Nothing happened! My connections did not failover! Fast Connection Failover was broken!

I changed the servlet to include some more details about the pool as shown below:

The most important thing to note in the screenshot is that “ONS Configuration” is not populated with a value. This could be as a result of my code not doing what I wanted or simply because getONSConfiguration was returning null. At this point you’ll have to trust me that it’s the latter, but this will be demonstrated shortly.

Digging Into The Detail

The key line in the log output is:

oracle.ucp.jdbc.oracle.OracleJDBCConnectionPool.setONSConfiguration onsConfigStr:

But, if you don’t know that then you might not be searching for it; and when logging is set to ALL you get a lot to look at.

I’ll add a couple of other extracts from the log files I generated during my testing (when things didn’t work) that may help others get here:

1) Exception in thread “main” oracle.ons.SubscriptionException: Subscription request timed out after 30000 millseconds. Possible causes: OPMN may not be running, you may have an OPMN running in an alternate ORACLE_HOME using duplicate port values, or OPMN may be misconfigured.

To verify that this is not actually the problem, you can check that ONS is up via “srvctl status nodeapps”, or you can use onsctl (as the owner of the GI installation) in the form of “onsctl ping” or “onsctl debug”, depending on how deep you need or want to dig. I found being able to telnet to the remote ONS port on each of the database servers the most convincing way to reassure myself that ONS really was up and that there was nothing blocking communication between my client and the database servers.
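The telnet-style check is easy to script with bash’s built-in /dev/tcp. This is a sketch: rac-01 and rac-02 are the host names from my test environment, and 6200 is assumed to be the remote ONS port (check your GI configuration for the real value).

```shell
#!/bin/bash
# Sketch: TCP reachability check for the remote ONS port on each
# database server, using bash's /dev/tcp. Hosts and port (6200) are
# placeholders for your environment.
ons_port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

for host in rac-01 rac-02; do
  if ons_port_open "$host" 6200; then
    echo "ONS reachable on ${host}:6200"
  else
    echo "ONS NOT reachable on ${host}:6200"
  fi
done
```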

2) oracle.ons.ONSException: Unable to open config file

There’s not much to say about this other than when you’re using “remote ONS configuration” there is no ons.config file and if it is being looked for then it implies the remote configuration has not been detected. I also observed “ERROR: config file (ons.config) could not be found.” in the Tomcat console window.

[Note] – I saw the messages above in different environments: the first was when I had a full client install with $ORACLE_HOME/opmn/ons.config present on the client, but not modified from the installation; and the second was when I’d installed the instant client and $ORACLE_HOME/opmn did not exist.

False Hope

During my investigation I found myself reading this thread on forum.springsource.org and changed my context.xml to have onsConfigStr in place of ONSConfiguration. I could see the logic given the log output I’d seen (and shown above), but unfortunately this didn’t get things working.

Bingo

It was only when I found my way to Constant Field Values in Oracle Universal Connection Pool for JDBC Java API Reference that I could see what was going wrong.

onsConfiguration

I updated my context.xml changing ONSConfiguration to onsConfiguration as well as adding the missing “nodes=” to the start of the string before compiling and deploying the application yet again.

Lo and behold, I found the following in my trace file:

2012-08-05T02:31:46.494+0100 UCP FINEST seq-227,thread-53 oracle.ucp.jdbc.oracle.OracleJDBCConnectionPool.setONSConfiguration onsConfigStr: nodes=rac-01:6200,rac-02:6200
2012-08-05T02:31:46.494+0100 UCP FINEST seq-228,thread-53 oracle.ucp.jdbc.oracle.RACManagerImpl.setONSConfiguration onsConfigStr: nodes=rac-01:6200,rac-02:6200

The output in the servlet also provided proof that my ONS configuration was finally being picked up…

At this point I got stuck into testing the closing of inactive/returned connections when a service is stopped. It worked and it worked very quickly. However, during my testing I discovered, or rather uncovered, a few things worth noting:

Case of Service Name

This is actually covered in MOS note Fast Connection Failover Does Not Work With Upper Case Database Service Name [ID 1457806.1], but what caught me out was something I was already aware of yet initially overlooked: when you remove a service using “srvctl remove service …” it does not clean up the service in the database. Therefore, when you recreate the service with a name in a different case, the case of the service name in the database will not be updated unless you manually remove the service from the database using DBMS_SERVICE.DELETE_SERVICE.
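For the manual clean-up, the call is DBMS_SERVICE.DELETE_SERVICE. A dry-run sketch that just prints the SQL rather than piping it into sqlplus; the service name is a placeholder, and the service must be stopped before it can be deleted.

```shell
#!/bin/bash
# Sketch: generate the SQL needed to remove a stale service definition
# from the database after "srvctl remove service". Dry-run: the SQL is
# printed, not executed; OLTP_FCF is a placeholder service name.
delete_service_sql() {
  local svc="$1"
  cat <<SQL
-- run as a DBA user; the service must not be running
exec DBMS_SERVICE.DELETE_SERVICE('${svc}')
SQL
}

delete_service_sql "OLTP_FCF"
```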

EZConnect

In my experience you should not use EZConnect for the URL, otherwise your connections may not move as expected, with messages like these in the trace file:

2012-08-05T02:44:53.444+0100 UCP FINEST seq-1328,thread-59 oracle.ucp.jdbc.oracle.RACManagerImpl.processUpEvent eventType: database/event/service, status = up
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1329,thread-59 oracle.ucp.jdbc.oracle.RACManagerImpl.processServiceUpEvent initialSize: 0, maxSize: 10, instanceName: orcl2, dbUniqueName: orcl
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1330,thread-59 oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfo. DbUniqNameKey: orcl, InstNameKey: orcl2, HostNameKey: rac-02
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1331,thread-59 oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.setNamedInstanceUrl connect data keyword not found
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1332,thread-59 oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.setNamedInstanceUrl instance=orcl2, namedInstanceUrl=jdbc:oracle:thin:@rac-scan.orasavon.com/oltp_fcf
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1333,thread-59 oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.getConnectionToNamedInstance URL invalid for connecting to named instance
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1334,thread-59 oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.markUpInstanceForUpEvent namedInstanceConn is null
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1335,thread-59 oracle.ucp.jdbc.oracle.RACManagerImpl.processServiceUpEvent first up instance, no new connections to obtain
2012-08-05T02:44:53.444+0100 UCP FINEST seq-1336,thread-59 oracle.ucp.jdbc.oracle.RACManagerImpl.processUpEvent2ndPhase Fast Connection Failover succeeded
2012-08-05T02:44:53.444+0100 UCP FINE seq-1337,thread-59 oracle.ucp.jdbc.oracle.ONSDatabaseEventHandlerTask.run event triggered: Service name: oltp_fcf, Instance name: orcl2, Unique name: orcl, Host name: rac-02, Status: up, Cardinality: 2, Reason: user, Event type: database/event/service

Initially I just moved away from using EZConnect, but after some more detailed testing I established that this only happens if there have never been any connections to the instance that has just come up. In other words, the problem does not occur if an instance is shut down and then brought back during the life of the connection pool, but if the pool is started without a particular instance offering a service and the service is then started on that instance, you’ll get the error.

initialPoolSize

MOS note Universal Connection Pool For Jdbc Doesn’T Work Properly Using Jndi [ID 1307692.1] covers the problem with this setting not taking effect, but I wasn’t aware of that until I had spent a fair amount of time trying to work out why my testing with JNDI was not behaving in the same way as when I used the code provided by Oracle in MOS note How to Verify Universal Connection Pool (UCP) / Fast Connection Failover (FCF) Setup [ID 1064652.1]. The latter is an excellent reference when trying to get FCF with UCP working, but it does not cover JNDI, whereas Martin Bach’s blog posts do.

You may have noticed that the “Available Connections” displayed in all the screenshots is 1. This is because of this bug. During my testing I only got the available connections to increase above 1 either by using inactiveConnectionTimeout, as this appears to cause the connection pool to be reestablished based on minPoolSize, or by running multiple simulated users hitting the webpage at the same time to increase the demand for connections.

MOS ID 1307692.1 offers both a patch and a workaround for the bug.

Summary

If you want to be sure that your FCF configuration is working correctly, I recommend not setting inactiveConnectionTimeout. It is a very easy way to create the illusion that the pool is being redistributed as a result of FCF, when the connections are actually being redistributed due to connect-time load balancing and the fact that an instance that does not offer a service will not receive any connection requests for that service.

Tracing/logging is your friend. Refer to Martin’s “take 2” blog post and MOS note How to Verify Universal Connection Pool (UCP) / Fast Connection Failover (FCF) Setup [ID 1064652.1] for details.

Finally, another MOS note that was useful along the way is How to Log or Trace ONS in RAC 11g Release 2 [ID 1270841.1].

If you want the Java source code for my final Hello.java (which borrows a lot from both the Tomcat sample and Martin Bach’s blog posts) then it’s available for download here.

RAC/Exadata Shell Aliases

I’ve been working on Oracle RAC systems for a while now, and the subject of how best to set up shell environment variables and aliases seems to be something on which there is a fair range of opinion.

  • Do you have entries in oratab for both the database and the instance on that server?
  • Do you use oraenv or set the variables required for each environment via another method?
  • If you use oraenv, do you customise it?
  • How do you switch ORACLE_SID between the db_name and instance_name, if you do?

Please feel free to answer these questions via a comment below, however, the main point of this post is to share a few aliases that I’m finding very useful at the moment, and to encourage you to share aliases you use in your RAC environments. Actually, the non-RAC specific, Oracle related, aliases you create would be interesting to me too, but more on that later…

As I mentioned in a much earlier post, I feel that in the world of RAC databases there is scope for an environment variable of ORACLE_DB or ORA_DB_NAME… But I also feel that there’s a lot to be said for not introducing more variables than is necessary. To this end I have taken to creating the following aliases in RAC environments that I’m working in[1].

alias inst='ORACLE_SID=${ORACLE_SID}$($(grep "^+ASM[1-8]" /etc/oratab | cut -d":" -f2)/bin/olsnodes -l -n | cut -f2)'

alias db='ORACLE_SID=${ORACLE_SID%$($(grep "^+ASM[1-8]" /etc/oratab | cut -d":" -f2)/bin/olsnodes -l -n | cut -f2)}'

These aliases allow me to quickly switch ORACLE_SID from being set to the relevant value for the database to it being set to the relevant value for the instance on the given host. Obviously it doesn’t stop you modifying the value of ORACLE_SID to something that is not valid, but at least it is easy to undo a mistake you might have made.
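The heart of both aliases is plain bash parameter expansion: inst appends the node number reported by olsnodes, and db strips it again with the ${var%suffix} form. A self-contained illustration with the node number hard-coded (in the real aliases it comes from “olsnodes -l -n”):

```shell
#!/bin/bash
# Self-contained illustration of the expansions the aliases rely on.
# NODE_NUM is hard-coded here so the snippet runs anywhere; in the
# aliases it is derived from "olsnodes -l -n".
NODE_NUM=2

ORACLE_SID="ORCL"
ORACLE_SID="${ORACLE_SID}${NODE_NUM}"   # inst: append node number -> ORCL2
echo "$ORACLE_SID"
ORACLE_SID="${ORACLE_SID%$NODE_NUM}"    # db: strip it again -> ORCL
echo "$ORACLE_SID"
```

Note that ${var%pattern} removes the shortest matching suffix, which is why repeated use of the aliases stacks and unstacks the node number as shown in the example below.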

Here’s a quick example:

[ORCL@rac-02 ~]$ inst
[ORCL2@rac-02 ~]$ inst
[ORCL22@rac-02 ~]$ inst
[ORCL222@rac-02 ~]$ inst
[ORCL2222@rac-02 ~]$ db
[ORCL222@rac-02 ~]$ db
[ORCL22@rac-02 ~]$ db
[ORCL2@rac-02 ~]$ db
[ORCL@rac-02 ~]$ 

And, for a little more alias sharing… In my time working on Exadata environments I’ve found myself using dcli a lot, and I created the aliases below to save typing, on the basis that I almost always want to use dcli to communicate with either all “compute nodes” or all “storage cells”, so why not simplify the process and shorten the commands?

alias ddcli='dcli -g ~/dbs_group -l oracle'

alias cdcli='dcli -g ~/cell_group -l cellmonitor'

I’d be keen to get a page going for useful aliases so we can all benefit from the ideas of each other… If you’ve got something to share please comment and if the collection builds up we’ll start a page for alias sharing.
__________
[1] – If I’m working in an environment where a shared “oracle” account is in use then I define my aliases in a separate file and source this when I login so that my preferences are not inflicted on others :-)

UK OUG conference: RAC & HA special interest group round table

The RAC and HA roundtable gives users a chance to share experiences and gotchas. Often the discussion points that generate most interest arise when something unusual or unexplained has happened, with the participants attempting to drill down to where the problem was by firing questions at the DBA telling the tale. This time there was such a discussion around a RAC stretch cluster where the storage connection had been deliberately broken during a test (but not the RAC interconnect between the two sites). The end result was a failure of the cluster, but after a restart the database had been rolled back to a point in time several minutes before the failure. This is discussed here: http://davidburnham.wordpress.com/category/oracle-rac-stretched-cluster-testing/

Some participants suggested that Active Data Guard was a better option than a stretch RAC cluster, with one saying that Active Data Guard should be the default position, with RAC only deployed if Active Data Guard didn’t fit the bill. Unfortunately, as a presentation on Data Guard was taking place at the same time, most people using it were presumably elsewhere.

Another discussion point was whether RAC actually reduces the volume of outages, ensuring high availability. Participants generally agreed that it is great for installing rolling updates and so on without disruption – in other words, planned maintenance work – and dealing with some other modes of failure. But sometimes the added complexity could cause unplanned outages that may never have happened on a single instance system. One participant argued that some of the outages cited should not happen, since they were down to human error.

Talk moved on to whether an Oracle Exadata box represents good value for money. One participant mentioned that he could easily build a similar set up from commodity hardware for a fraction of the cost, but it would not include a high-speed interconnect or the Exadata software. He cited some data-warehouse type queries that were sped up by an order of magnitude when implemented using Exadata. This had transformed the attitude to these queries in the company concerned: because they were quick to run, users were more likely to use them. So the answer is that it depends.

A somewhat inconclusive discussion took place on how best to deploy ASM when there are multiple databases: should all databases share the same ASM instance or diskgroup? One participant suggested that ASM was effectively a single point of failure which could cause all databases to fail in the event of a software bug. Another responded that he had been using ASM for four years without any such disruption. I guess ASM should just be viewed as another filesystem from this point of view, and what are the chances of GPFS or even raw devices failing?

The number of RAC nodes in a cluster was another talking point. With a two node cluster, in the event of a failure you are left with just a single node which ought to be able to manage the full system load. Many participants seemed to prefer a three or four node cluster as 4 to 3 or 3 to 2 isn’t as dramatic a change as 2 to 1.

So, it was an interesting discussion and definitely a refreshing change from sitting passively through presentations.

Changing SYS Password in RAC Databases

The post title should really be “Changing SYS Password in RAC Databases when a password reuse limit is in place for SYS and how this has changed from 10.2 to 11.1”, but that would be a bit of a mouthful.

This is the post I was originally working on when I got sidetracked by the change in behaviour of profiles assigned to SYS, as detailed in “SYS and Password Resource Limits”. At the time I started this I wasn’t aware of the change and was just planning to use SYS passwords as an example of why you should listen to Tom Kyte when he says “… sys is magical, sys is special, sys is governed by a set of rules different from the rest of the world.” He’s right (of course), and the magical powers of SYS are subject to change!

The post is split into pre-11g and 11g or later. We can only assume that the behaviour won’t change back any time soon.

I’ll start with a disclaimer stating that I try to use the “password” command in SQL*Plus wherever possible to avoid displaying passwords in clear text on my screen and I would also need to have a very good reason to put passwords on the command line (OS or SQL*Plus), so the approach used in the examples is purely to make them clear.

Pre-11g

The example shows:

  1. Change password on node 1
  2. Connect to node 1 using new password successfully
  3. Validate instance name
  4. Connect to node 2 using new password unsuccessfully
  5. Connect to node 2 using original password successfully
  6. Attempt to change password on node 2 to new password fails as password reuse is restricted in profile assigned to SYS
  7. Copy and rename password file from node 1 to node 2 (command not shown, but I use scp)
  8. Connect to both nodes using same password successfully
SYS@orcl1> alter user sys identified by temp_pass1;

User altered.

SYS@orcl1> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
@> connect sys@orcl2/<original password> as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2> alter user sys identified by temp_pass1;
alter user sys identified by temp_pass1
*
ERROR at line 1:
ORA-28007: the password cannot be reused


SYS@orcl2>

Need to copy password file from node 1 and rename at this point (or change profile/resource limit)…

SYS@orcl2> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2>

11g or later

The example shows:

  1. Change password on node 1
  2. Connect to node 1 using new password successfully
  3. Validate instance name
  4. Connect to node 2 using new password unsuccessfully
  5. Connect to node 2 using original password successfully
  6. Change password on node 2 to new password successful even though password reuse is restricted in profile assigned to SYS
  7. Connect to both nodes using same password successfully
SYS@orcl1> alter user sys identified by temp_pass1;

User altered.

SYS@orcl1> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied

Warning: You are no longer connected to ORACLE.
@> connect sys@orcl2/<original password> as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2> alter user sys identified by temp_pass1;

User altered.

SYS@orcl2> connect sys@orcl1/temp_pass1 as sysdba
Connected.
SYS@orcl1> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl1

SYS@orcl1> connect sys@orcl2/temp_pass1 as sysdba
Connected.
SYS@orcl2> select instance_name from v$instance;

INSTANCE_NAME
----------------
orcl2

SYS@orcl2>

The key point is that the password file for SYS is instance specific. I don’t really understand why Oracle has not implemented something that updates the SYS password file on all nodes of a clustered database; perhaps there is a reason why this would not be desirable behaviour, but if there is, I can’t see it. A friend suggested that one solution would be to use symbolic links for each instance pointing to a common password file (on shared storage).
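In the meantime, the copy-and-rename has to be done by hand or scripted. A dry-run sketch that prints the scp commands for each remaining node; the database name, Oracle home and host names are all placeholders, and the orapw file naming assumes the default orapw<SID> convention.

```shell
#!/bin/bash
# Sketch: after changing the SYS password on node 1, copy its password
# file to the remaining nodes, renaming it for each instance number.
# Dry-run: commands are echoed for review. All names are placeholders.
sync_pwfile_cmds() {
  local db="$1" home="$2"; shift 2
  local n=2 host
  for host in "$@"; do
    echo "scp ${home}/dbs/orapw${db}1 ${host}:${home}/dbs/orapw${db}${n}"
    n=$((n + 1))
  done
}

sync_pwfile_cmds orcl /u01/app/oracle/product/10.2.0/db_1 node2 node3
```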

Extended RAC Cluster

Introduction

I’ve just finished building an extended RAC cluster on Oracle VM following the instructions written by Jakub Wartak. I can’t claim it was plain sailing, so I’m listing the issues I encountered here in the hope that it helps someone else.

Before I start with the issues I want to thank Jakub for making his article available. After first seeing it about 6 or 7 months ago I wanted to get some kit to play with… It took a while to decide what to order, and there were other distractions to attend to, but I ordered the following a couple of weeks ago.

  • Asus V3-M3N8200 AM2 Barebone
  • AMD Phenom X4 9350e
  • Kingston DDR2 800MHz/PC2-6400 HyperX Memory (8GB)
  • Western Digital WD5000AAKS 500GB SATA II x 3

My management box for Oracle VM Manager and NFS for 3rd voting disk is an old Compaq EVO D510 SFF (2.0GHz, 512MB RAM, 80G HDD) – It’s worth noting that Oracle state that Oracle VM Manager 2.1.2 requires 2GB RAM, but I’ve managed with 512MB.

A question I asked myself before completing the installation(s), and something that I’ve been asked by a couple of colleagues, is: “Is it possible to install Oracle VM Server and not use Oracle VM Manager?” The answer seems to be a definite YES.

Oracle VM Manager Installation

I am running Oracle Enterprise Linux 5 and Oracle VM Manager installed with no issues. The only slight gotcha was the installer complaining about insufficient swap space. This was something I hit the second time I was installing Oracle VM Manager, and on investigation it was due to swap space already being allocated. I shut a few things down and ran a quick “swapoff -a; swapon -a”.

Oracle VM Server Installation

This is where the majority of my time was spent. The first issue I hit was the installer not being able to see my 3 SATA disks. After a fair amount of frustration and reading I discovered the Linux boot option *pci=nomsi*. This, combined with setting my BIOS to treat the disks as *AHCI* rather than SATA, resolved the issue.

The next problem was stopping my machine (all brand new kit) from rebooting. I was probably a bit slow to work this one out, but it turns out that one of my four 2GB RAM sticks was bad, and as soon as I pinpointed the problem and stuck to 6GB I could move on. Based on this experience, and discussions with my sysadmin colleagues, I’d recommend running memtest86 from the Oracle VM Server installation CD on your machine before attempting the installation.

Well, not quite. There was one more issue holding me back from the stuff I really wanted to be doing. I don’t know if this can be explained by different versions of the Oracle VM templates or Oracle VM Server, but it turns out that I was hitting bug 223947. The symptoms were the messages below showing up on the console of my VM Server.

raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2490174971 4
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921017 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921022 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921020 5
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521921019 11
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521988084 8
raid0_make_request bug: can't convert block across chunks or bigger than 256k 2521993716 8
[identical lines, repeated many times over, trimmed]

Maybe the version of the VM template that Jakub used did not use LVM? Anyway, moving to a disk configuration that relied on RAID 1 and RAID 5 got me around this issue.

Creating the Openfilers

When I attempted to start up the Openfilers for the first time I received an error relating to the bridge sanbr0.

Error: Device 1 (vif) could not be connected. Could not find bridge device sanbr0

To resolve this I just skipped ahead to the section that sets up the bridges and ran those commands earlier than specified.

brctl addbr sanbr0
ip link set dev sanbr0 up

Once I’d got the Openfiler VMs running and followed the instructions Jakub provided for configuration, I experienced a peculiar issue with the web page. When logging in I was not taken to the “Administration Section”, but instead to “Home”. On the Home page there is a link, “administer the storage device from here.”, which when clicked took me back to the Home page. I did a bit of searching and found a post on the Openfiler forum. This didn’t really give me much to go on, but I left everything running whilst I went to work and returned to find the same problem… I then tried restarting Firefox and hey presto, it worked. I don’t have a better answer for why that worked than something cache related.

Configuration of Oracle Enterprise Linux VMs

Use of quotation marks in echo “MTU=9000” >> /etc/sysconfig/network-scripts/ifcfg-eth1 (etc.) caused an issue when restarting the network service, and the quotation marks needed to be removed from the file.

Update: The above issue is due to copy & paste from HTML to shell – the double quotation marks in the HTML are not translated to “simple” double quotation marks in shell as shown below (thanks Jakub):

     [vnull@xeno ~]$ echo “MTU=9000” | cat -v
     M-bM-^@M-^\MTU=9000M-bM-^@M-^]
     [vnull@xeno ~]$ echo "MTU=9000" | cat -v
     MTU=9000
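If the curly quotes have already made it into an ifcfg file, they can be stripped after the fact with sed. A minimal sketch, run here against a temporary copy rather than the real /etc/sysconfig/network-scripts/ifcfg-eth1 so nothing below is system-specific:

```shell
# Create a scratch file containing the bad line as the copy/paste produces it
cfg=$(mktemp)
printf '%s\n' '“MTU=9000”' > "$cfg"

# Delete the UTF-8 left and right double quotation marks in place
sed -i 's/“//g; s/”//g' "$cfg"

cat "$cfg"    # MTU=9000
```

On a real system you would point sed at the ifcfg file itself and then restart the network service.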

My Oracle Enterprise Linux VMs did not have a /dev/hdd, so I ran fdisk -l and discovered /dev/xvdb, which can also be seen in the vm.cfg file. I assume that this has changed in the VM templates since Jakub downloaded his.

The iSCSI disks were presented differently than described in the article, which I believe is a result of something not going to plan in the /etc/udev/scripts/iscsidev.sh script, but I don’t know this for sure. I became aware of the problem when running the script to partition the iSCSI disks, as errors were generated. fdisk -l showed me disks sda – sdf, so I just created partitions on these and used them directly, without any problems to date (it’s only been 3 days). The output below might be helpful in working out what has gone wrong.

[root@erac1 ~]# ls -l /dev/iscsi/
total 0
drwxr-xr-x 2 root root 380 Mar 1 19:09 lun
[root@erac1 ~]# ls -l /dev/iscsi/lun
total 0
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part -> ../../../sdf
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part0 -> ../../../sg0
lrwxrwxrwx 1 root root 13 Mar 1 19:09 part1 -> ../../../sde1
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part10 -> ../../../ram10
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part11 -> ../../../ram11
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part12 -> ../../../ram12
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part13 -> ../../../ram13
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part14 -> ../../../ram14
lrwxrwxrwx 1 root root 14 Mar 1 19:08 part15 -> ../../../ram15
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part2 -> ../../../sg2
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part3 -> ../../../sg3
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part4 -> ../../../sg4
lrwxrwxrwx 1 root root 12 Mar 1 19:09 part5 -> ../../../sg5
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part6 -> ../../../ram6
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part7 -> ../../../ram7
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part8 -> ../../../ram8
lrwxrwxrwx 1 root root 13 Mar 1 19:08 part9 -> ../../../ram9

From looking at the script, and later in the instructions, I would have expected the lun directory to have a digit at the end. As this isn’t currently causing me any issues I’ve not looked into it further.
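Partitioning six disks by hand gets tedious, so the step above can be scripted. The sketch below defaults to a dry run that only prints what it would do; the fdisk answer sequence (n, p, 1, two defaults, w) creates a single primary partition spanning each disk, and the device list is the one fdisk -l reported on my node – adjust both to taste:

```shell
DRY_RUN=1    # unset this to actually run fdisk (destructive!)
would=""
for disk in /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf; do
  if [ -n "$DRY_RUN" ]; then
    would="$would $disk"
    echo "would create one primary partition on $disk"
  else
    # n = new, p = primary, 1 = partition number, two empty answers
    # accept the default first/last cylinders, w = write table and exit
    printf 'n\np\n1\n\n\nw\n' | fdisk "$disk"
  fi
done
```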

Third Voting Disk

During installation of Oracle Clusterware I received an error when specifying 3 locations for my voting disks.

The location /votedisk/third_votedisk.crs, entered for the Additional Cluster Synchronization Services (CSS) voting disk is not shared across all the nodes in the cluster. Specify a shared raw partition or cluster file system file that is visible by the same name on all nodes of the cluster.

I continued the installation with only one voting disk and went back after the installation to work out what the issue was. It turned out to be a permissions problem, and I needed to modify the options in /etc/exports as shown below.

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=500)

to

/votedisk *(rw,sync,all_squash,anonuid=500,anongid=501)
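Applying that edit boils down to a one-line sed followed by a re-export. The sketch below runs the substitution against a scratch copy so it can be tried safely; on the real NFS server the file would be /etc/exports and you would finish with exportfs -ra:

```shell
# Work on a scratch copy of the exports entry
exports=$(mktemp)
echo '/votedisk *(rw,sync,all_squash,anonuid=500,anongid=500)' > "$exports"

# Switch the anonymous GID from 500 to 501 to match the RAC nodes
sed -i 's/anongid=500/anongid=501/' "$exports"

cat "$exports"
# On the real server, re-export afterwards with: exportfs -ra
```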

The permissions of the third_votedisk.crs file also required changing to match the “anon” settings, which in my case, due to differing UID and GID values on the Oracle VM Manager box, meant setting the following permissions.

[martin@ora-vmm ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 martin dba 335544320 Mar 1 20:04 /votedisk/third_votedisk.crs

The important thing is not what the permissions show locally, but how they appear on the RAC nodes, i.e.:

[oracle@erac1 ~]$ ls -l /votedisk/third_votedisk.crs
-rw-r----- 1 oracle oinstall 335544320 Mar 1 2009 /votedisk/third_votedisk.crs

I assume that the group read permission could be safely removed if deemed desirable from a security point of view.
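Setting the file up amounts to a chown/chmod pair. The sketch below demonstrates the mode on a scratch file; the chown line is commented out because the user/group names and the path are the ones from my VM Manager box and will differ on yours:

```shell
f=$(mktemp)          # stand-in for /votedisk/third_votedisk.crs
chmod 640 "$f"       # -rw-r----- as in the listing above
stat -c '%a' "$f"    # prints 640
# On the VM Manager box itself (names from my setup, needs root):
# chown martin:dba /votedisk/third_votedisk.crs
```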

Enterprise Manager

Near the end of the database installation I received an error regarding Enterprise Manager, which I don’t recall the details of, but I can access the Enterprise Manager console and things seem to work so far. I’ll update the post if I discover any issues.