Sunday, January 18, 2009

v$result_cache_memory or what is the newest method to hang stuff up?

Sometimes I get quite amused by the "side effects" you can discover while playing with features.

Prerequisites

First, we need to place something relatively big into result cache memory. The easiest way to archive this is start like this:
SQL> alter system set result_cache_max_size=64m;

System altered.

SQL> alter system set result_cache_max_result=100;

System altered.
The first parameter controls how much memory out of your shared pool Oracle can dedicate for result cache. The second parameter tells how much memory, in percent, can be dedicated for a single resultset. Default value is 5 and I've set it to 100 so we don't have to dedicate a lot of memory to demonstrate my point.

All we have to do now is populate the result cache memory:
SQL> create table t as
2 select level n, rpad('*', 100, '*') v
3 from dual
4 connect by level <= 500000;

Table created

SQL> begin
2 for cur in (select /*+ result_cache*/ * from t)
3 loop
4 null;
5 end loop;
6 end;
7 /

PL/SQL procedure successfully completed
What's next?

Let's check how much result cache memory is allocated:
SQL> set timing on
SQL> select count(*) from v$result_cache_memory;

COUNT(*)
----------
65536

Executed in 55.984 seconds
That's almost a minute to get the results, yes.

So what?

There are (nasty) consequences.

It's time to buy Core i7

Let's start by taking a look at the trace file of our session:
SQL ID: 0zudkuq64dkwx
Plan Hash: 600544352
select count(*)
from
v$result_cache_memory


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 54.65 56.96 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 54.65 56.96 0 0 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38

Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)
65536 FIXED TABLE FULL X$QESRCMEM (cr=0 pr=0 pw=0 time=0 us cost=0 size=13 card=1)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
SQL*Net message from client 2 0.00 0.01
First of all, all the time was spent actively consuming CPU. Selecting from X$QESRCMEM became a pretty expensive thing (in fact, at first I confused such a huge delay with a session being simply hang while spinning on the CPU).

Can't use result cache

Any session which attempts to use result cache will be forced to wait until our first session completes:
SQL ID: bzf4nvdav1m2j
Plan Hash: 1601196873
SELECT /*+ result_cache*/ *
FROM
T


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 5001 0.53 53.71 0 0 0 500000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 5003 0.53 53.71 0 0 0 500000

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 38 (recursive depth: 1)

Rows Row Source Operation
------- ---------------------------------------------------
500000 RESULT CACHE 49yjv3h37cjhz39gkd90p4413p (cr=0 pr=0 pw=0 time=0 us)
0 TABLE ACCESS FULL T (cr=0 pr=0 pw=0 time=0 us cost=2127 size=30436770 card=468258)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
latch free 1 53.20 53.20
And the latch name is...
WAIT #2: nam='latch free' ela= 53205337 address=1610838648 number=377 tries=0 obj#=-1 tim=1232316789354155

SQL> select name from v$latch where latch#=377;

NAME
----------------------------------------------------------------
Result Cache: Latch
Surprised? I'm not. Don't expect me to start complaining about the absence of shared mode gets again.

The Real Deal or result_cache_mode = force

Things gets really nasty if you set result_cache_mode = force which has a potential outcome of rendering your entire system unusable for some extended period of time. Because even the simplest queries like select * from dual will hang, escalating situation to a point where your normal users will be unable to even connect to a database, let alone executing the queries. For example, trying to connect with SQL*Plus results in the following:
SQL ID: d6vwqbw6r2ffk
Plan Hash: 1388734953
SELECT USER
FROM
DUAL


call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.01 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 43.40 0 0 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.01 43.41 0 0 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 38

Rows Row Source Operation
------- ---------------------------------------------------
1 RESULT CACHE 4c48ztv5ztq25dkd78360h2uxb (cr=0 pr=0 pw=0 time=0 us)
1 FAST DUAL (cr=0 pr=0 pw=0 time=0 us cost=2 size=0 card=1)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
latch free 1 43.39 43.39
SQL*Net message from client 2 0.00 0.00
I don't have to mention the latch name again.

Real World?

What are the chances of meeting the above situation on some real system? First of all, having a system with more than a 100GB SGA is not uncommon these day. The thing gets worse the more memory is being occupied by a single result. Having a lot of small results seems to be not a problem. If we take 5% as a default threshold to be eligible to be placed into result cache memory, facing the above situation will require 1280M for result cache memory. That looks like quite a big number, however, on the scale of hundred gigs SGA it's no longer seems to be improbable. Aggregating a multi gigabytes of data into 64MB result doesn't sound insane either (at least not to me, you may have a different opinion here).

Plus it takes someone to query v$result_cache_memory. So just don't do it. There are other views (like v$result_cache_objects) which you can use to at least get a part of that info, alas at lower granularity.

Environment

The above tests were verified on Oracle 11GR1 running on Windows XP x64 (native) and Linux x86-64 under VMware (with XP x64 acting as a host OS).

No comments:

Post a Comment