Thursday, August 06, 2009

How to install Oracle Grid Control Agents on a Windows failover cluster with no downtime

Metalink Note:464191.1 describes the steps required to configure an Oracle Grid Control agent in a Windows failover cluster environment. Unfortunately, as part of the configuration, the cluster disk containing the virtual agent's state information has to be moved to the node where the agent is being deployed.

The agent state directory has to "follow" the virtual agent when a failover occurs, hence the requirement for it to be on a cluster disk resource. And since the cluster disk resource is visible on the active node only, you cannot deploy a virtual agent on any of the passive nodes without first moving the group that contains the disk with the state information.

This is not a big deal when you're installing on a brand new or development system, but what if you have to deal with a production cluster, where any downtime associated with moving the group across the nodes is best avoided?

Let's say you have an Oracle FailSafe configuration and you intend to use Oracle Grid Control to monitor your Oracle database. In this case your virtual agent will be a part of the same cluster group where your Oracle database is. Failing over your database across all the nodes for the sake of deploying a virtual agent may not necessarily be what you want to do.

Of course, the easy workaround is to add another disk (LUN) to the cluster, use it to deploy the agents and, once the deployment is done, add it to the group where your database is. But what if you do not have any spare disks and have to share the same cluster disk with your Oracle database?

I did a bit of research on this problem and, as it turned out, there is a really simple workaround which may come in handy if you are faced with the same problem.

I'll use the following configuration as an example:
ORA01A -- first (active) node.
ORA01B -- second (passive) node.
ORA01V -- Oracle Database VIP.
c:\oracle\product\10.2.0\agent10g -- Oracle Grid Control agent home.

Let's say that each system has a local drive C: and the deployment will be done on a cluster drive D:.

Deploy virtual agent on the active node

This is where you follow exactly what the Metalink note tells you to do:
C:\>emctl deploy agent -n OracleAgentORA01V d:\agent10g ORA01V:1830 ORA01A:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : d:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "d:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS

The above will create a virtual agent service named OracleAgentORA01V which will be "bound" to the ORA01V virtual IP and use d:\agent10g as the location for the virtual agent's state files. Note that I'm using port 1830 since port 3872 is used by the "real" agent. If you want the virtual and real agents to share the same port, you can specify AgentListenOnAllNICs=FALSE in the emd.properties file of every agent in the cluster; this stops the agents from trying to listen on all network adapters on the node.
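
For reference, here is a minimal sketch of that setting. I'm assuming the file is sysman\config\emd.properties under each agent's home or state directory (for the virtual agent in this example that would be d:\agent10g), and that the agent has to be restarted to pick the change up; check both against your own install.

```properties
# Bind the agent only to the address it was deployed with,
# rather than to every network adapter on the node.
AgentListenOnAllNICs=FALSE
```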

Secure the agent if your OMS is running in secure mode:
C:\>d:\agent10g/bin/emctl secure agent
Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
Agent is already stopped... Done.
Securing agent... Started.
Enter Agent Registration Password :
Securing agent... Successful.

Deploy virtual agent on the passive node

The same deployment command won't work on a passive node simply because drive D: is not there. As a workaround that lets us create the virtual agent service on the passive node, we will use the local drive C: for the initial deployment:
C:\>emctl deploy agent -n OracleAgentORA01V c:\agent10g ORA01V:1830 ORA01B:1830
Creating shared install...
Source location: C:\oracle\product\10.2.0\agent10g
Destination (shared install) : c:\agent10g
DeployMode : agent

Creating directories...
Creating targets.xml...
Creating emctl control program...
Creating emtgtctl control program...
Setting log and trace files locations for Agent ...
Secure agent found. New agent should be configured for secure mode

Source Agent operating in secure mode.
Run "c:\agent10g/bin/emctl secure agent" to secure agent
Service "OracleAgentORA01V" create SUCCESS

However, this is not what we want: all virtual agents should share the same state directory on the cluster drive D:. To fix the location of the agent state directory, launch regedit.exe and navigate to the registry key
HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V
Under that key you'll find the EMSTATE value set to c:\agent10g. Change it to d:\agent10g. You can then remove the original c:\agent10g folder as well.
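
If you would rather script the change than click through regedit, the same edit can be sketched with reg.exe. This assumes EMSTATE is a plain string (REG_SZ) value, which I have not confirmed; run the query first and use whatever type it reports.

```bat
rem Show the current EMSTATE value and its type.
reg query "HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V" /v EMSTATE

rem Point the state directory at the cluster drive.
rem REG_SZ is an assumption; use the type reported by the query above.
reg add "HKLM\SOFTWARE\ORACLE\SYSMAN\OracleAgentORA01V" /v EMSTATE /t REG_SZ /d d:\agent10g /f
```

Run this on the passive node only; the active node was deployed straight to d:\agent10g and needs no change.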

Done! I found that this virtual agent becomes fully operational once the group fails over to the passive node (don't forget to create a cluster resource for the virtual agent) and that it uses the shared state directory.
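
As for the cluster resource, the rough sketch below shows how it could be created from the command line with cluster.exe instead of Cluster Administrator. The group name "ORA01 Group" and the dependency names "Disk D:" and "ORA01V" are hypothetical placeholders; substitute the names your cluster actually uses. In an Oracle FailSafe setup you may prefer to manage the group through the FailSafe tools instead.

```bat
rem Create a Generic Service resource in the database's group
rem ("ORA01 Group" is a placeholder name).
cluster resource "OracleAgentORA01V" /create /group:"ORA01 Group" /type:"Generic Service"

rem Tell the resource which Windows service it controls.
cluster resource "OracleAgentORA01V" /priv ServiceName=OracleAgentORA01V

rem The agent needs the state disk and the virtual address before it starts
rem (both resource names are placeholders).
cluster resource "OracleAgentORA01V" /adddep:"Disk D:"
cluster resource "OracleAgentORA01V" /adddep:"ORA01V"

rem Bring the resource online on the node that currently owns the group.
cluster resource "OracleAgentORA01V" /online
```

The two dependencies reflect the point made earlier: the service can only start where the state directory on D: and the virtual address are available.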