Thursday, July 26, 2012

Exporting DBFS via NFS

Anybody who has thought about exporting DBFS via NFS has probably stumbled upon the fact that Oracle says it cannot be done:
DBFS does not support exporting NFS or SAMBA exports
What's wrong with DBFS?

There is nothing wrong with DBFS itself. The problem originated from the fact that FUSE did not have the proper interfaces implemented to support exporting by the kernel. Newer versions of the Linux kernel fully support exporting. I know that OEL 6.x works for sure, as I have done DBFS exports myself through both NFS and Samba. The minimum kernel version commonly cited across the internet seems to be 2.6.27, but I haven't had a chance to check whether that's true.
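For reference, when exporting a FUSE filesystem such as a DBFS mount through the kernel NFS server, you typically have to assign an explicit fsid in /etc/exports, because FUSE filesystems lack the stable device numbers NFS normally relies on. A minimal sketch (the mount point, client subnet and fsid value below are made up for illustration):

```
# /etc/exports -- export the DBFS mount point to a client subnet
# fsid= is required for FUSE-backed filesystems
/mnt/dbfs  192.168.1.0/24(rw,sync,fsid=873)
```

A similarly formatted exports file can, if memory serves, also be fed to UNFS3 through its -e option on older kernels.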

Older Kernels

The fact of the matter is -- it was always possible to export FUSE via NFS. You just had to use a user-mode NFS server, like UNFS3. I did that too and know that it works regardless of the kernel version you're running. Unfortunately, projects like UNFS3 are lagging both in development and feature-wise. But they do work if you have to have NFS exports and are stuck with an older Linux kernel.

Enkitec Extreme Exadata Expo

I will be hanging around at E4; it's going to be a really cool and geeky event. See you all there!

Tuesday, July 24, 2012

Oracle GoldenGate Integrated Capture

Oracle GoldenGate 11.2 release notes contain an interesting new feature:
Extract can now be used in integrated capture mode with an Oracle database. Extract integrates with an Oracle database log mining server to receive change data from that server in the form of logical change records (LCR).
All of that rings too many bells, so I decided to find out what exactly has happened. This feature requires database patches to be installed (described in Note:1411356.1).

Stack dumps

A stack dump alone already reveals a lot of interesting information (I've left only the relevant pieces in place):
...
#10 0x00002b08f2ba21b7 in knxoutReceiveLCR () from /u01/app/oracle/ggs/libclntsh.so.11.1
#11 0x00002b08f2ae1048 in OCIXStreamOutLCRReceive () from /u01/app/oracle/ggs/libclntsh.so.11.1
#12 0x0000000000721a96 in IXAsyncReader::ProcessBatchNonCallbackArray() ()
#13 0x0000000000722dbc in IXAsyncReader::ReaderThread(void*) ()
#14 0x0000003ce8a0673d in start_thread () from /lib64/libpthread.so.0
...
The first point of entry into the libclntsh.so.11.1 library is nothing else but OCIXStreamOutLCRReceive. This is the function used to receive an LCR (Logical Change Record) from an XStream outbound server. Let's confirm that we have an outbound server in our database:
SQL> select server_name, capture_name from dba_xstream_outbound;
 
SERVER_NAME                    CAPTURE_NAME
------------------------------ ------------------------------
OGG$TEST_EXT                   OGG$CAP_TEST_EXT
And the Oracle Streams capture process itself:
SQL> select capture_name, rule_set_name, purpose
  2   from dba_capture;
 
CAPTURE_NAME                   RULE_SET_NAME                  PURPOSE
------------------------------ ------------------------------ -------------------
OGG$CAP_TEST_EXT               OGG$TEST_EXT_CAPTURE_I         GoldenGate Capture
And we can see all the familiar Oracle Streams rules:
SQL> select rsr.rule_name, r.rule_condition
        from dba_rule_set_rules rsr, dba_rules r
        where rsr.rule_set_owner = r.rule_owner
                and rsr.rule_name = r.rule_name
                and rsr.rule_set_name='OGG$TEST_EXT_CAPTURE_I';

RULE_NAME       RULE_CONDITION
--------------- ----------------------------------------------------------------------
TEST21          (((:ddl.get_source_database_name() = 'TEST' )) and (:ddl.get_source_da
                tabase_name() != '$'))

GGS_TRACE20     (((:dml.get_object_owner() = 'GGEXT' and :dml.get_object_name() = 'GGS
                _TRACE')) and :dml.get_source_database_name() = 'TEST' )

TEST19          ((:dml.get_object_owner() = 'TEST') and :dml.get_source_database_name(
                ) = 'TEST' )
These rules were generated based on the parameter file I have for my Extract.

XStream

The Oracle GoldenGate Extract process has become nothing else but an XStream client which receives LCRs from a Streams capture process and converts them into GoldenGate trail files. Anybody who was worried about the future of Oracle Streams because GoldenGate was supposed to be the new strategic direction (and so on and so forth) can stop worrying now. Oracle GoldenGate is Oracle Streams. The only piece left is the trail files, and I wouldn't be surprised if those eventually go away as well.

Oracle Streams has won the tech battle.

Wednesday, July 18, 2012

V$CELL_CONFIG

If you have ever wondered how to convert the XML information in v$cell_config on Exadata database servers into relational rows and columns, so you can get a nice view of the cell configuration without going onto the cell itself, here is a nice piece of SQL which does the job. I'm using celldisks as an example and selecting only a few columns so that the output fits nicely on the screen:
SQL> select cellname,
  2    name,
  3    deviceName,
  4    diskType,
  5    round(freeSpace/power(1024,3), 2) freeSpace,
  6    round(disk_size/power(1024,3), 2) disk_size
  7   from (
  8    select cellname, XMLTYPE.createXML(confval) confval
  9     from v$cell_config
 10     where conftype='CELLDISKS'
 11      and cellname='192.168.12.3'
 12    ) v,
 13    xmltable('/cli-output/celldisk' passing v.confval
 14     columns
 15      name varchar(15) path 'name',
 16      creationtime varchar(25) path 'creationTime',
 17      deviceName varchar(9) path 'deviceName',
 18      devicePartition varchar2(10) path 'devicePartition',
 19      diskType varchar2(9) path 'diskType',
 20      errorCount number path 'errorCount',
 21      freeSpace number path 'freeSpace',
 22      id varchar2(50) path 'id',
 23      interleaving varchar(10) path 'interleaving',
 24      lun varchar2(5) path 'lun',
 25      raidLevel number path 'raidLevel',
 26      disk_size number path 'size',
 27      status varchar2(10) path 'status'
 28   );
 
CELLNAME        NAME            DEVICENAME DISKTYPE   FREESPACE  DISK_SIZE
--------------- --------------- ---------- --------- ---------- ----------
192.168.12.3    CD_00_enkcel01  /dev/sda   HardDisk           0    1832.59
192.168.12.3    CD_01_enkcel01  /dev/sdb   HardDisk           0    1832.59
192.168.12.3    CD_02_enkcel01  /dev/sdc   HardDisk           0     1861.7
192.168.12.3    CD_04_enkcel01  /dev/sdl   HardDisk           0     1861.7
192.168.12.3    CD_05_enkcel01  /dev/sde   HardDisk           0     1861.7
192.168.12.3    CD_06_enkcel01  /dev/sdf   HardDisk           0     1861.7
192.168.12.3    CD_07_enkcel01  /dev/sdg   HardDisk           0     1861.7
192.168.12.3    CD_08_enkcel01  /dev/sdh   HardDisk           0     1861.7
192.168.12.3    CD_09_enkcel01  /dev/sdi   HardDisk           0     1861.7
192.168.12.3    CD_10_enkcel01  /dev/sdj   HardDisk           0     1861.7
192.168.12.3    CD_11_enkcel01  /dev/sdk   HardDisk           0     1861.7
192.168.12.3    FD_00_enkcel01  /dev/sds   FlashDisk          0      22.88
192.168.12.3    FD_01_enkcel01  /dev/sdr   FlashDisk          0      22.88
192.168.12.3    FD_02_enkcel01  /dev/sdt   FlashDisk          0      22.88
192.168.12.3    FD_03_enkcel01  /dev/sdu   FlashDisk          0      22.88
192.168.12.3    FD_04_enkcel01  /dev/sdaa  FlashDisk          0      22.88
192.168.12.3    FD_05_enkcel01  /dev/sdz   FlashDisk          0      22.88
192.168.12.3    FD_06_enkcel01  /dev/sdab  FlashDisk          0      22.88
192.168.12.3    FD_07_enkcel01  /dev/sdac  FlashDisk          0      22.88
192.168.12.3    FD_08_enkcel01  /dev/sdn   FlashDisk          0      22.88
192.168.12.3    FD_09_enkcel01  /dev/sdo   FlashDisk          0      22.88
192.168.12.3    FD_10_enkcel01  /dev/sdp   FlashDisk          0      22.88
192.168.12.3    FD_11_enkcel01  /dev/sdq   FlashDisk          0      22.88
192.168.12.3    FD_12_enkcel01  /dev/sdv   FlashDisk          0      22.88
192.168.12.3    FD_13_enkcel01  /dev/sdw   FlashDisk          0      22.88
192.168.12.3    FD_14_enkcel01  /dev/sdx   FlashDisk          0      22.88
192.168.12.3    FD_15_enkcel01  /dev/sdy   FlashDisk          0      22.88
 
27 rows selected
Of course, the same approach can be applied to get other information out (like Grid Disks, etc.).
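For example, a Grid Disk variant might look something like the sketch below. I'm inferring the XML element names from the cellcli attribute names ('griddisk', 'asmDiskGroupName' and friends), so treat this as a starting point to adjust against the actual XML on your cell software version:

```sql
select cellname,
  name,
  asmDiskGroupName,
  round(disk_size/power(1024,3), 2) disk_size,
  status
 from (
  select cellname, XMLTYPE.createXML(confval) confval
   from v$cell_config
   where conftype='GRIDDISKS'
 ) v,
 xmltable('/cli-output/griddisk' passing v.confval
  columns
   name varchar2(30) path 'name',
   asmDiskGroupName varchar2(30) path 'asmDiskGroupName',
   disk_size number path 'size',
   status varchar2(10) path 'status'
 );
```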

Tuesday, July 17, 2012

Displaying ASM Partner Disks

Here is a quick SQL statement which I sometimes use to show people the disks and their respective partners (grouped by failure group) inside an ASM disk group whenever I need to explain both of these concepts.

An example output from a quarter rack Exadata with a normal redundancy disk group:
SQL> column p format a80
SQL> variable group_number number
SQL> exec :group_number := 1;

PL/SQL procedure successfully completed.

SQL> select d||' => '||listagg(p, ',') within group (order by p) p
from (
select ad1.failgroup||'('||to_char(ad1.disk_number, 'fm000')||')' d,
 ad2.failgroup||'('||listagg(to_char(p.number_kfdpartner, 'fm000'), ',') within group (order by ad1.disk_number)||')' p
 from v$asm_disk ad1, x$kfdpartner p, v$asm_disk ad2
 where ad1.disk_number = p.disk
  and p.number_kfdpartner=ad2.disk_number
  and ad1.group_number = p.grp
  and ad2.group_number = p.grp
  and p.grp = :group_number
 group by ad1.failgroup, ad1.disk_number, ad2.failgroup
) group by d
order by d;

P
--------------------------------------------------------------------------------
EXAPCEL01(12) => EXAPCEL02(26,30,31,35),EXAPCEL03(06,08,09,11)
EXAPCEL01(13) => EXAPCEL02(30,31,34,35),EXAPCEL03(03,07,09,11)
EXAPCEL01(14) => EXAPCEL02(24,28,29,34),EXAPCEL03(05,09,10,11)
EXAPCEL01(15) => EXAPCEL02(28,29,33,34),EXAPCEL03(04,05,08,11)
EXAPCEL01(16) => EXAPCEL02(26,27,33,35),EXAPCEL03(02,06,07,09)
EXAPCEL01(17) => EXAPCEL02(25,32,33,35),EXAPCEL03(00,01,03,07)
EXAPCEL01(18) => EXAPCEL02(24,27,32,34),EXAPCEL03(00,03,08,10)
EXAPCEL01(19) => EXAPCEL02(25,26,32,33),EXAPCEL03(01,02,07,10)
EXAPCEL01(20) => EXAPCEL02(28,30,31,32),EXAPCEL03(04,05,06,10)
EXAPCEL01(21) => EXAPCEL02(24,27,29,31),EXAPCEL03(00,01,04,08)
EXAPCEL01(22) => EXAPCEL02(25,26,27,30),EXAPCEL03(02,03,04,06)
EXAPCEL01(23) => EXAPCEL02(24,25,28,29),EXAPCEL03(00,01,02,05)
EXAPCEL02(24) => EXAPCEL01(14,18,21,23),EXAPCEL03(05,07,09,11)
EXAPCEL02(25) => EXAPCEL01(17,19,22,23),EXAPCEL03(02,06,09,11)
EXAPCEL02(26) => EXAPCEL01(12,16,19,22),EXAPCEL03(04,08,10,11)
EXAPCEL02(27) => EXAPCEL01(16,18,21,22),EXAPCEL03(01,04,08,11)
EXAPCEL02(28) => EXAPCEL01(14,15,20,23),EXAPCEL03(00,06,08,09)
EXAPCEL02(29) => EXAPCEL01(14,15,21,23),EXAPCEL03(01,02,03,05)
EXAPCEL02(30) => EXAPCEL01(12,13,20,22),EXAPCEL03(04,05,08,10)
EXAPCEL02(31) => EXAPCEL01(12,13,20,21),EXAPCEL03(00,01,07,10)
EXAPCEL02(32) => EXAPCEL01(17,18,19,20),EXAPCEL03(00,06,07,10)
EXAPCEL02(33) => EXAPCEL01(15,16,17,19),EXAPCEL03(02,03,05,09)
EXAPCEL02(34) => EXAPCEL01(13,14,15,18),EXAPCEL03(02,03,04,07)
EXAPCEL02(35) => EXAPCEL01(12,13,16,17),EXAPCEL03(00,01,03,06)
EXAPCEL03(00) => EXAPCEL01(17,18,21,23),EXAPCEL02(28,31,32,35)
EXAPCEL03(01) => EXAPCEL01(17,19,21,23),EXAPCEL02(27,29,31,35)
EXAPCEL03(02) => EXAPCEL01(16,19,22,23),EXAPCEL02(25,29,33,34)
EXAPCEL03(03) => EXAPCEL01(13,17,18,22),EXAPCEL02(29,33,34,35)
EXAPCEL03(04) => EXAPCEL01(15,20,21,22),EXAPCEL02(26,27,30,34)
EXAPCEL03(05) => EXAPCEL01(14,15,20,23),EXAPCEL02(24,29,30,33)
EXAPCEL03(06) => EXAPCEL01(12,16,20,22),EXAPCEL02(25,28,32,35)
EXAPCEL03(07) => EXAPCEL01(13,16,17,19),EXAPCEL02(24,31,32,34)
EXAPCEL03(08) => EXAPCEL01(12,15,18,21),EXAPCEL02(26,27,28,30)
EXAPCEL03(09) => EXAPCEL01(12,13,14,16),EXAPCEL02(24,25,28,33)
EXAPCEL03(10) => EXAPCEL01(14,18,19,20),EXAPCEL02(26,30,31,32)
EXAPCEL03(11) => EXAPCEL01(12,13,14,15),EXAPCEL02(24,25,26,27)

Friday, July 06, 2012

ASM normal redundancy with high protection template

One of the ways you can control how many mirror copies ASM keeps on disk is via diskgroup templates.

This presents an interesting question -- can you turn your normal redundancy disk group into a high redundancy one by using a diskgroup template with the high protection attribute, thus telling ASM to use triple mirroring for the relevant files?

I'll start by creating a diskgroup:
SQL> create diskgroup data normal redundancy
  2     disk '/dev/sdb1', '/dev/sdc1', '/dev/sdd1';

Diskgroup created.
I'm not explicitly specifying any failgroups, so each disk will end up in its own failgroup and triple mirroring will be able to spread extents across all three disks. Let's add a high protection template:
SQL> alter diskgroup data add template mirror_high attributes (high coarse);

Diskgroup altered.
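Before using the template we can check how it was registered. If I remember correctly, v$asm_template exposes a REDUNDANCY column which should show HIGH for our template (versus MIRROR for the defaults in a normal redundancy group); a quick sketch:

```sql
-- Confirm the redundancy and striping attributes of the new template
select name, redundancy, stripe
 from v$asm_template
 where name = 'MIRROR_HIGH';
```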
Now let's create two tablespaces, one using the default mirroring template (two-way) and one using mirror_high (three-way) template:
SQL> create tablespace mirror_normal datafile '+data' size 256m;

Tablespace created.

SQL> create tablespace mirror_high datafile '+data(mirror_high)' size 256m;

Tablespace created.
We can confirm how many mirror extents our datafiles have:
SQL> select     a.name,
        case LXN_KFFXP
                when 0 then 'primary'
                when 1 then 'secondary'
                when 2 then 'third'
        end, count(*)
        from X$KFFXP, v$asm_alias a
        where a.file_number=NUMBER_KFFXP
        group by a.name, LXN_KFFXP
        order by 1, 2;

NAME                           CASELXN_K   COUNT(*)
------------------------------ --------- ----------
MIRROR_HIGH.257.787940101      primary          258
MIRROR_HIGH.257.787940101      secondary        258
MIRROR_HIGH.257.787940101      third            258
MIRROR_NORMAL.256.787940093    primary          258
MIRROR_NORMAL.256.787940093    secondary        258
MIRROR_NORMAL.256.787940093    third              1

6 rows selected.
Indeed, all extents got triple mirrored for the MIRROR_HIGH tablespace and only double mirrored for MIRROR_NORMAL (with the exception of one extent, which probably holds some triple-mirrored metadata).

Now, because all the data essentially got triple mirrored for our MIRROR_HIGH tablespace, it ought to survive a double disk failure, right? Let's give it a try:
SQL> alter diskgroup data dismount;

Diskgroup altered.
... then I'm going to change the permissions and make two disks inaccessible to ASM:
[oracle@ora11gr2 ~]$ chmod 000 /dev/sdc1
[oracle@ora11gr2 ~]$ chmod 000 /dev/sdd1
Let's see if I can mount it back:
SQL> alter diskgroup data mount force;
alter diskgroup data mount force
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
...looks like the answer is no. ASM still sees that the remaining disk is missing its required partners and does not allow the disk group to be mounted. How many extent copies the datafile had becomes a moot point.

The conclusion is that triple mirroring inside a normal redundancy disk group can help you in certain cases (like physical corruption of both the primary and secondary extents, where ASM can use the third copy to read the data) but it's not a substitute for a high redundancy disk group.

Update a little bit later: if anyone is thinking about doing it the other way around, remember that the mirroring attribute works only for normal redundancy disk groups.

Thursday, July 05, 2012

Serial direct path reads in 11GR2 and Exadata environments

Serial direct path reads were first introduced in Oracle 11G and were noticed by many customers (in both positive and negative ways) who upgraded from earlier releases. Back then I did a quick write-up on the subject to try to tackle some of the key variables which affect the behavior.

To recap, the following were observed for 11G:
  • Serial direct path reads start at _small_table_threshold*5.
  • Serial direct path reads stop when 50% of the table blocks are cached.
  • Serial direct path reads stop when 25% of the table blocks are dirty.
Since then, many people have noticed that 11GR2 seems to be more aggressive with these thresholds, so I decided to re-run the tests and see what has changed. I'm also going to run the same tests on Exadata to see whether it changes anything compared to a normal 11GR2 database.

To refresh your memory, here is how the methodology works (all the code is available in the 11G post).

Start threshold

The function grows a segment in a loop, performing a full table scan and using the physical reads direct statistic as an indicator. On every iteration the buffer cache is flushed to make sure cached blocks do not affect the numbers.

Stop cached threshold

A loop is used to cache more and more blocks from the table, followed by a full table scan. Again, the physical reads direct statistic is used to see when serial direct path reads no longer happen. The buffer cache is flushed at the beginning of the test.

Stop dirty threshold

Similar to the above but blocks are updated instead of just being cached.

Results

                  Non-Exadata              Exadata
 Start threshold  _small_table_threshold   _small_table_threshold
 Stop cached      50%                      50%
 Stop dirty       25%                      25%

Summary

The 11GR2 start threshold is indeed more aggressive: serial direct path reads now start at _small_table_threshold (i.e. five times lower than in 11G). There seems to be no change to the rest of the numbers, and the behavior appears to be the same on both Exadata and non-Exadata systems.

The bottom line is that 11GR2 starts doing serial direct path reads earlier and stops doing them at the same points as 11G did.
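To make the observed rules concrete, they can be sketched as a small function. This is just a model of the thresholds measured above, not Oracle's actual code, and all the names and inputs are made up for illustration:

```python
def serial_direct_path(blocks, cached, dirty, small_table_threshold,
                       release="11gR2"):
    """Model of the observed serial direct path read thresholds.

    blocks - number of blocks in the segment
    cached - number of the segment's blocks in the buffer cache
    dirty  - number of the segment's blocks dirty in the cache
    """
    # 11G starts at _small_table_threshold * 5, 11GR2 at the threshold itself
    start = small_table_threshold * (1 if release == "11gR2" else 5)
    if blocks <= start:
        return False  # segment too small, buffered reads
    if cached >= blocks * 0.5:
        return False  # stop: 50% of the blocks are cached
    if dirty >= blocks * 0.25:
        return False  # stop: 25% of the blocks are dirty
    return True

# A 1000-block table with a (hypothetical) threshold of 300 blocks:
print(serial_direct_path(1000, 0, 0, 300, "11g"))    # 1000 <= 5 * 300
print(serial_direct_path(1000, 0, 0, 300, "11gR2"))  # 1000 > 300
```

The same call illustrates the summary: the 11GR2 rule kicks in five times earlier, while the stop conditions are unchanged.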

All testing was done on 11.2.0.3.