Tuesday, April 05, 2011

Oracle GoldenGate and undocumented OCI redo log APIs

Last time I did some quick write up on the new APIs appeared in Oracle GoldenGate 11G which can be used against Oracle 10.2.0.5 and 11.2.0.2 to improve how GG handles redo on ASM.

Some of the items still required follow up, namely:
  • What are the names of these new API functions?
  • What kind of security privileges do you need to call these?
I was especially interested in the last item since the ability to read Oracle's redo log gives you extremely powerful access to database information.

New OCI APIs

Discovering these is pretty straightforward, all we need to do is a stack dump on a running GG Extract process:
[oracle@quadro.com ~]$ pstack 4390
...
#10 0x00002acc23443ef0 in knxOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#11 0x00002acc232a9b3b in OCIPOGGRedoLogRead () from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#12 0x00000000008fa92c in ASMReader2::readFile(unsigned int, unsigned int, unsigned int*, char*, char*) ()
...
Once we know the name of the read function (OCIPOGGRedoLogRead) we can find the rest of them:
[oracle@quadro.com ~]$ nm /u01/app/oracle/ggs_11g/libclntsh.so.11.1 | grep OCIPOGG
00000000010e3954 T OCIPOGGRedoLogClose
00000000010e3c90 T OCIPOGGRedoLogOpen
00000000010e3a2a T OCIPOGGRedoLogRead
As you can see, these functions are indeed exposed directly through libclntsh library. We can even see the Extract process "in action" calling these functions with the help of Linux's gdb:
[oracle@quadro.com ~]$ ps -aef | grep extract
oracle    4390  4386  0 19:36 ?        00:00:00 /u01/app/oracle/ggs_11g/extract PARAMFILE /u01/app/oracle/ggs_11g/dirprm/11g_ext.prm REPORTFILE /u01/app/oracle/ggs_11g/dirrpt/11G_EXT.rpt PROCESSID 11G_EXT USESUBDIRS
oracle    4985  4114  0 19:51 pts/2    00:00:00 grep extract
[oracle@quadro.com ~]$ gdb /u01/app/oracle/ggs_11g/extract -p 4390
...
(gdb) break OCIPOGGRedoLogClose
Breakpoint 1 at 0x2acc232a9958
(gdb) break OCIPOGGRedoLogOpen
Breakpoint 2 at 0x2acc232a9c94
(gdb) break OCIPOGGRedoLogRead
Breakpoint 3 at 0x2acc232a9a2e
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00002acc232a9958 
2       breakpoint     keep y   0x00002acc232a9c94 
3       breakpoint     keep y   0x00002acc232a9a2e 
After we've sat up our breakpoints we can let the Extract run and hit one of these:
(gdb) continue
Continuing.

Breakpoint 3, 0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1

(gdb) bt
#0  0x00002acc232a9a2e in OCIPOGGRedoLogRead ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa92c in ASMReader2::readFile (this=0xa408100,
    offset=, bytes_to_read=1024000,
    bytes_read=0x7fffb8c73ad4, buffer=0xabc0000 "\001\"", errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:798
...
On top of the stack we can see GoldenGate making a call to OCIPOGGRedoLogRead. We can also get a sense of parameters by looking at the calling function (ASMReader2::readFile). The function specifies how many bytes it would like to read (bytes_to_read=1024000), a pointer to a variable which will hold the actual amount read (bytes_read=0x7fffb8c73ad4), a pointer to a buffer to store the returned redo data (buffer=0xabc0000) and the error text, if any (errtext=0xc457c0).

Now it is time to see the other two functions usage. What I did is switched the redo logs in my database, disabled breakpoint number 3 (the one which points to OCIPOGGRedoLogRead) and let the Extract process continue:
(gdb) disable 3

(gdb) continue
Continuing.

Breakpoint 2, 0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
(gdb) bt
#0  0x00002acc232a9c94 in OCIPOGGRedoLogOpen ()
   from /u01/app/oracle/ggs_11g/libclntsh.so.11.1
#1  0x00000000008fa859 in ASMReader2::openFile (this=0xa408100,
    filename=0xa408350 "+DATA/ora11gr2/onlinelog/group_3.258.743164441",
    lblksize=, blksize=0x0, tot_blks=0x0,
    filetype=, errtext=0xc457c0 "")
    at /home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34086]/perforce/src/app/er/redo/oracle/asm.c:762
...
Interestingly enough we never made a call to OCIPOGGRedoLogClose. Maybe I should open an Oracle SR and complain :)

As before, we can get a glimpse of parameters which include redo log file name as well as bunch of output parameters specifying the block size and so on (which looks somewhat similar to what you get from dbms_diskgroup.open call).

Security

Looking at how all this stuff is being exposed through OCI, I became even more eager to find out what kind of privileges do you require on the database side in order to be able to use these. The first step was to try and run the Extract process with stripped out privileges which thankfully resulted in the following error:
2011-04-05 20:10:36  ERROR   OGG-00446  Opening ASM file +DATA/ora11gr2/onlinelog/group_3.258.743164441 in DBLOGREADER mode: (1031) ORA-01031: insufficient privilegesNot able to establish initial position for sequence 398, rba 1040.
At least there is something but what exactly is it? GoldenGate's documentation (should I add "as usual?") doesn't mention anything in that regards so I had to figure that one out on my own.

After some back and forth I was able to eventually discover the privilege which makes it all possible -- it's select any transaction. Alas it is not documented to play this role but now you know anyway.