INS-30502: No ASM disk group found

I’ve been mostly looking busy Informix instances lately so it was nice to have a 12c installation to do at work today. I quickly reacquainted myself of some of my personal rules of Oracle problem-solving:

  1. There is a lot of rubbish posted in online Oracle forums.
  2. Using MOS is generally a better bet.

However, both of these can be lazy ways of solving problems and neither provided the answer to the problem I had. There’s often no substitute for working through a problem yourself.

My 12c instance is a stand-alone one including Grid Infrastructure, ASM and role separation. This means installing Grid Infrastructure as user grid and the database software as user oracle. Having successfully installed Grid Infrastructure and browsed around my working ASM instance using SQL*Plus, I was slightly perturbed to get this error during the installation of the database software when it came to configuring the storage to be used:

INS-30502: No ASM disk group found

In the installActions log I had:

SEVERE: [FATAL] [INS-30502] No ASM disk group found.
CAUSE: There were no disk groups managed by the ASM instance +ASM.
ACTION: Use Automatic Storage Management Configuration Assistant to add disk groups.

But this wasn't true!

[grid@oraserver ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM
The Oracle base has been set to /u01/app/grid
[grid@oraserver ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.1.0 Production on Tue Sep 10 11:28:49 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option

SQL> select name, state from v$asm_diskgroup;

NAME STATE
------------------------------ -----------
FRA MOUNTED
DATA MOUNTED

So what was going on? I vageuly remembered that there were some old bugs related to role-separation so I checked the ownership of key files and the group memberships of the oracle and grid users. Everything was ok!

There is a MOS note, 1069517.1, that offers some suggestions but none of the solutions fit, in particular the recommendation to check the permissions on $ORACLE_HOME/bin/oracle. Such a file doesn't exist yet because I haven't installed any database software!

Despite this, the problem did smell of a permissions issue. As the installer runs as user oracle, I decided to have a poke around as this user and found this whilst connected to my ASM instance:

oracle@oraserver ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Tue Sep 10 10:47:55 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option

SQL> select * from v$asm_diskgroup;

no rows selected

Is this the same ASM instance as the one above with its two disk groups? Isn't it only with user_ views or VPD where the output depends on the user?

Here the documentation comes to the rescue:

While an ASM instance is initialized, ASM discovers and examines the contents of all of the disks that are in the paths that you designated with values in the ASM_DISKSTRING initialization parameter. Disk discovery also occurs when you:

  • Run the ALTER DISKGROUP...ADD DISK and ALTER DISKGROUP...RESIZE DISK commands
  • Query the V$ASM_DISKGROUP and V$ASM_DISK views

So the v$asm_diskgroup view actually gets the OS to do something under the covers.

At which point the problem became somewhat obvious:

SQL> show parameter diskstring

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring string /dev/asm/

SQL> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Automatic Storage Management option

On this system, /dev/asm contains the block devices set up in udev. A permission check revealed all:

[oracle@oraserver ~]$ cd /dev/asm
-bash: cd: /dev/asm: Permission denied

The directory could only be viewed by the grid user and members of the asmadmin group. A quick chmod 775 on the directory resolved the problem.

DBD::Oracle module for Perl

This is a shameless post as my friend, Pawel Krol, pointed this out to me so I must give all credit to him. As I am a big fan of the Perl-DBI I thought this was too good to share.

Oracle 11.2.0.1 on Linux and possibly earlier versions is shipped with Perl environment complete with Perl-DBI and the “database dependent” Oracle::DBD module. I would expect it is included in other Oracle releases too, e.g. for AIX. I think Perl is a really useful skill for any DBA working on a Unix-like platform, maybe even Windows as well, and it’s even better when you can embed SQL statements in your code instead of calling SQL*Plus and trying to parse the output. Perl also gives you the ability to use bind variables properly and re-use statement handles, something you would need to write PL/SQL for, if using SQL*Plus.

There are two parts to the Perl database interface:

  1. The database-independent part, the Perl-DBI, which is commonly included in many Linux distributions.
  2. The database-dependent part, in Oracle’s case DBD::Oracle. This is rarely installed by default in a Linux distribution; you will usually just find the MySQL-specific module and one for handling CSV files.

Of course you could download and compile DBD::Oracle yourself but this makes life much easier in environments you don’t have control over as you can use it by just setting a few environment variables.

So how do you use it? There are cleverer ways, I know Pawel has one, but I just have a quick bash script I source into my environment:

#!/bin/bash

if [ -n $ORACLE_HOME ]; then
echo "Setting up Perl environment. ORACLE_HOME set to $ORACLE_HOME"
export ORACLEPERL=$ORACLE_HOME/perl
export PERL5LIB=$PERL5LIB:$ORACLEPERL/lib/5.10.0:$ORACLEPERL/lib/site_perl/5.10.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib32:$ORACLE_HOME/lib
export LIBPATH=$ORACLE_HOME/lib32
export PATH=$ORACLEPERL/bin:$PATH
fi

And then I can run my test script which just selects the current date from the dual table:

#!/u01/app/oracle/product/11.2.0/db_1/perl/bin/perl

use strict;
use warnings;
use DBI;

&main;

sub main () {

my %params = (
'database' => 'ORCL',
);

my $dbh = &dbconnect(\%params);

my ($sysdate) = $$dbh->selectrow_array("SELECT sysdate FROM dual");
print "Current date is $sysdate\n\n";

&dbdisconnect($dbh);

}

sub dbconnect () {

my $params = shift;
my $dbh = DBI->connect("dbi:Oracle:$$params{'database'}", 'username', 'password') or die ("$DBI::errstr\n\n");
return (\$dbh);
}

sub dbdisconnect () {

my $dbh = shift;
$$dbh->disconnect();
}

The world of Perl is now hopefully your oyster and there is plenty of documentation here.

Ulimits and ORA-04030

Oracle database processes are to many systems administrators just another service that runs on the machines they look after. Understandably sys admins like to be able to put some controls in to prevent these processes from consuming too many CPU resources, filling up the disks or hogging all the memory on the system. This can be done on UNIX and Linux systems with ulimits which control the resources available to your shell and the programs it starts.

You can check your ulimits that currently apply to you by running the command ulimit -a. On Linux it looks like this and it's similar with AIX:

[ben@linux ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 32768
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 23551
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) 23551
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

You'll note that I ran the above commands as my own user. Oracle makes some recommendations for the settings for the oracle user, generally found in the release notes or prerequisites for your platform. Because Oracle makes no recommendations about shell limits on a DBA account, the limits applying to you personally will often be much lower than those applying to the Oracle user.

So when might this be a problem? Let's take how you start up your Oracle service. Here are two ways it's possible to do it:

  • Using srvctl, e.g. srvctl start database -d dbname
  • Using SQL*Plus, e.g. sqlplus / as sysdba and then issue command startup.

The problem comes with the second method. When invoking Oracle using SQL*Plus, the shell limits that apply to all the Oracle processes are inherited from your shell. Therefore you need to make sure you're logged in as the Oracle user. There are some subtleties around this: what happens if you use sudo, e.g. sudo -s -u oracle to invoke an Oracle user's shell? On some systems you might find that your shell limits don't change and you're still stuck with the lower limits of your own shell: you can check by running ulimit -a. You may be setting yourself up for ORA-04030 errors; limiting the PGA memory area allowed for sorting data; and imposing other operating system limits.

This problem doesn't exist if you invoke sudo su - oracle to change to the oracle user.

What happens if Oracle is already running and you're not sure if it was started correctly? With Linux it is quite easy to find out, provided you have the right privileges. First get the process id of your smon process, in this case 4057:

[ben@linux ~]$ ps -ef | grep smon
root      3790     1  1 Feb05 ?        08:04:14 /u01/app/grid/11203/home_1/bin/osysmond.bin
grid      4057     1  0 Feb05 ?        00:02:02 asm_smon_+ASM1
oracle    4495     1  0 Feb05 ?        00:04:50 ora_smon_beverley1
ben       7468  7438  0 11:17 pts/0    00:00:00 grep smon

And then take a look in /proc/4057/limits:

[ben@linux 4057]$ sudo cat limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            33554432             unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             23551                23551                processes 
Max open files            65536                65536                files     
Max locked memory         unlimited            unlimited            bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23551                23551                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

You can compare the above with the ulimits set for the oracle user.

With AIX it's harder since there is no /proc filesystem. You have to run a debugger against the process. There's a nice blog post on one method here, although it is not suitable for a production system because it interrupts the Oracle process and kills it afterwards.

So is that everything? Unfortunately not, ulimits don't just affect the server processes like smon; they also affect server processes for connections using SQL*Plus. Below are two server processes running, each serving a connection from SQL*Plus, one for my ben user and another for the oracle user:

SQL> select spid, username, program from v$process where spid in (9351, 9798);

SPID			 USERNAME	 PROGRAM
------------------------ --------------- ------------------------------------------------
9798			 oracle 	 oracle@o112-c2n1.ora.pwk (TNS V1-V3)
9351			 ben		 oracle@o112-c2n1.ora.pwk (TNS V1-V3)

The below shows these processes:

SQL> !ps -ef | egrep '(9350|9351|9797|9798)'
ben       9350  7438  0 13:08 pts/0    00:00:00 sqlplus
oracle    9351  9350  0 13:09 ?        00:00:00 oraclebeverley1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle    9797  9762  0 13:34 pts/0    00:00:00 sqlplus
oracle    9798  9797  0 13:34 ?        00:00:00 oraclebeverley1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

As I am running Linux, I can easily compare the ulimits applying to processes 9351 and 9798:

[ben@linux 9351]$ sudo cat limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             131072               131072               bytes     
Max stack size            33554432             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             23551                23551                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23551                23551                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

[ben@linux 9798]$ sudo cat limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             131072               131072               bytes     
Max stack size            33554432             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             131072               131072               processes 
Max open files            131072               131072               files     
Max locked memory         51200000000          51200000000          bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23551                23551                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

The differences are highlighted.

All this brings me onto the ORA-04030 problem, which can be related to the value of max data size. This parameter limits the amount of PGA memory that your session can use. Checking the ulimits is quick and easy, at least on Linux, so if you encounter such a problem this is one place you can easily check. It may save you from having to experiment with undocumented Oracle parameters like _pga_max_size which may not be the cause.

UK OUG conference: RAC & HA special interest group round table

The RAC and HA roundtable gives users a chance to share experience and gotchas. Often the discussion points that generate most interest are when something unusual or unexplained has happened, with the participants attempting to drill down to where the problem was by firing questions at the DBA telling the tale. The discussion this time there was such a discussion around a RAC stretch cluster where the storage connection had been deliberately broken during a test (but not the RAC interconnect between the two sites). The end result of this was a failure of the cluster but after a restart, the database had been rolled back to a point in time several minutes before the failure. This is discussed here: http://davidburnham.wordpress.com/category/oracle-rac-stretched-cluster-testing/

Some participants suggested that Active Data Guard was a better option than a stretch RAC cluster with one saying that Active Data Guard should be the default position with RAC only deployed if Active DataGuard didn’t fit the bill. Unfortunately as a presentation on DataGuard was taking place at the same time, most people using it were presumably elsewhere.

Another discussion point was whether RAC actually reduces the volume of outages, ensuring high availability. Participants generally agreed that it is great for installing rolling updates and so on without disruption – in other words, planned maintenance work – and dealing with some other modes of failure. But sometimes the added complexity could cause unplanned outages that may never have happened on a single instance system. One participant argued that some of the outages cited should not happen, since they were down to human error.

Talk moved on to whether an Oracle Exadata box represents good value for money. One participant mentioned that he could easily build a similar set up from commodity hardware for a fraction of the cost, but it would not include a high-speed interconnect or the Exadata software. He cited some data-warehouse type queries that were sped up by an order of magnitude when implemented using Exadata. This had transformed the attitude to these queries in the company concerned: because they were quick to run, users were more likely to use them. So the answer is that it depends.

A somewhat inconclusive discussion took place on how best to deploy ASM when there are multiple databases: should all databases share the same ASM instance or diskgroup? One participant suggested that ASM was effectively a single point of failure which could cause all databases to fail in the event on a software bug. Another responded that he had been using ASM for four years without any such disruption. I guess ASM should just be viewed as another filesystem for this point of view and what are the chances of GPFS or even raw devices failing?

The number of RAC nodes in a cluster was another talking point. With a two node cluster, in the event of a failure you are left with just a single node which ought to be able to manage the full system load. Many participants seemed to prefer a three or four node cluster as 4 to 3 or 3 to 2 isn’t as dramatic a change as 2 to 1.

So, it was an interesting discussion and definitely a refreshing change from sitting passively through presentations.