Resumable Gotcha

It’s been a while since I’ve had time to write anything on here due to be deeply absorbed in the world of Exadata… 🙂

Hopefully, now I have my home lab environment running again, I’ll be able to test and document some of the things that have caught my attention over the past 4 months.

So, picking the low hanging fruit…

Some time ago I was contacted to look at a “missing data file” problem reported by the migration team I have been working with. The following words jumped out at me “ORA-01157: cannot identify/lock data file 203 – see DBWR trace file.” This looked interesting. The next step was to find out what file was missing. The full error message was provided and it turned out to be a “temp” rather than “data” file that was missing, and on checking myself, the file reported as missing in the error message was present… I needed to understand what activity was taking place and hitting the error.

The migration team were at a phase where indexes were being build after moving data around, and making use of ALTER SESSION ENABLE RESUMABLE so that running out of temp space wouldn’t give them an immediate error, and would be resumable. The database is pretty big, especially for something that is described as OLTP, and the index they were attempting to build was on the largest table and estimated to end up at over 1TB. From speaking with the 1st line DBA that was directly supporting the migration activity, he had been repeatedly adding 32GB temp files to the temporary tablespace every time a session suspended, but whenever he added a new temp file the index build would error and the migration team would need to kick it off again… After hitting this a number of times the DBA decided it was worth getting someone else to look at the problem as well.

I confirmed that all temp files were visible from all 8 instances and asked the migration team to try again. After seeing the error reproduced it occurred to me to ask the DBA which instance he was adding the new temp file on. It was just a guess at this stage, but it seemed that adding the temp file on another instance might create a situation where the database (as a whole) is aware that there is addition storage available, but that not all the instances can see the file as soon as it is added. My suggestion was that they run the index build again and this time add a new temp file on the same instance that has the suspended session. It worked. But, I wanted to dig into it…

The weeks went by and due to the pressure of the project, and a lack of home lab environment, there was no opportunity to set up a test case…. Well, until tonight:

Creating a test user/environment…

SYSTEM@rac1> create temporary tablespace test_temp tempfile '+DATA' size 5M;

Tablespace created.

SYSTEM@rac1> create user martin identified by martin quota 50M on users temporary tablespace test_temp;

User created.

SYSTEM@rac1> grant create session, create table, resumable to martin;

Grant succeeded.

Causing a session to suspend…

SYSTEM@rac1> conn martin/martin
Connected.
MARTIN@rac1> alter session enable resumable;

Session altered.

MARTIN@rac1> create table t as select a.* from all_tables a, all_tables;

Validating the state of the session and adding a tempfile to the tablespace from another instance…

SYSTEM@rac2> select inst_id, event, state from gv$session where username = 'MARTIN';

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 statement suspended, wait error to be cleared  WAITING

SYSTEM@rac2> select error_msg from dba_resumable;

ERROR_MSG
------------------------------------------------------------------------
ORA-01652: unable to extend temp segment by 128 in tablespace TEST_TEMP

SYSTEM@rac2> alter tablespace test_temp add tempfile '+DATA' size 5M;

Tablespace altered.

SYSTEM@rac2>

The create table session errors…

MARTIN@rac1> create table t as select a.* from all_tables a, all_tables b;
create table t as select a.* from all_tables a, all_tables b
                                  *
ERROR at line 1:
ORA-01157: cannot identify/lock data file 203 - see DBWR trace file
ORA-01110: data file 203: '+DATA/rac/tempfile/test_temp.275.752627885'

MARTIN@rac1>

If the temp file is added on the same instance that the session is suspended on then the session resumes successfully…

One of the 5M temp files for TEST_TEMP was dropped before each test and the time is included in SQL*Plus output to show the order of commands in different sessions – hopefully it’s clear

00:24:04 MARTIN@rac1> create table t as select a.* from all_tables a, all_tables b;

Table created.

00:26:31 MARTIN@rac1>

While the above command was running the session suspended and an additional tempfile was added on the instance with the suspended session…

00:24:52 SYSTEM@rac1> select inst_id, event, state from gv$session where username = 'MARTIN';

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 statement suspended, wait error to be cleared  WAITING

00:25:30 SYSTEM@rac1> alter tablespace test_temp add tempfile '+DATA' size 5M;

Tablespace altered.

00:26:06 SYSTEM@rac1> select inst_id, event, state from gv$session where username = 'MARTIN';

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 direct path read temp                          WAITING

00:26:24 SYSTEM@rac1> /

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 direct path read temp                          WAITED KNOWN TIME

00:26:28 SYSTEM@rac1> /

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 SQL*Net message from client                    WAITING

00:26:32 SYSTEM@rac1>

What interested me next was: would I get the same problem increasing an already existing temp file?

00:35:15 MARTIN@rac1> create table t as select a.* from all_tables a, all_tables b;

Table created.

00:39:43 MARTIN@rac1>

While the above command was running the session suspended and the size of the existing tempfile was increased…

00:38:50 SYSTEM@rac3> select inst_id, event, state from gv$session where username = 'MARTIN';

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 statement suspended, wait error to be cleared  WAITING

00:39:02 SYSTEM@rac3> alter database tempfile '+DATA/rac/tempfile/test_temp.274.752632545' resize 10M;

Database altered.

00:39:19 SYSTEM@rac3> select inst_id, event, state from gv$session where username = 'MARTIN';

INST_ID EVENT                                          STATE
------- ---------------------------------------------- -------------------
      1 direct path read temp                          WAITED KNOWN TIME

00:39:47 SYSTEM@rac3>

… At the point the original problem was hit I’d suggested a move to a BIGFILE temporary tablespace so that additional temp files would not be needed. The above testing confirms that this would allow the tablespace to be increased on any instance. I also proposed using AUTOEXTENSIBLE as the index needed to be build and there was no other activity on the system at the time, so it seemed worth establishing how much temp was needed rather than have more space allocated repeatedly until we got to the same point 🙂

I want to dig deeper into the internals of how a suspended session is resumed, but that can wait for another night…

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s