racgvip was modified as below to dump the values of _O1 and _O2:
    ...
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    logx "--------------> by dunal: _O1: $_O1"
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    do
        if [ -n "$tmpIP" ]
        then
            logx "About to execute command: $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW "
            $PING -S $tmpIP $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        else
            logx "About to execute command: $PING $PING_TIMEOUT $DEFAULTGW"
            $PING $PING_TIMEOUT $DEFAULTGW > /dev/null 2>&1
        fi
        _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
        logx "--------------> by dunal: _O2: $_O2"
    ...

As seen above, the logx "--------------> by dunal: ..." lines were added to the script. Don't do this unless you are sure about what you are doing. After restarting the VIP, the values of _O1 and _O2 are dumped in the logs.

Failed Node:

    ...
    Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O1: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:49 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:50 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] About to execute command: /usr/sbin/ping -S 10.46.180.52 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:51 GMT+02:00 2009 [ 413770 ] --------------> by dunal: _O2: -
    2009-03-18 20:58:52.212: [ RACG][1] [360462][1][ora.akyorap2.vip]: Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] IsIfAlive: RX packets checked if=en1 failed
    Wed Mar 18 20:58:52 GMT+02:00 2009 [ 413770 ] Interface en1 checked failed (host =akyorap2)
    ...

As seen above, both values are '-', which is not a valid packet count. Because _O1 and _O2 are identical, the script concludes that the RX packet count did not change and reports the interface check as failed.
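To make the failure mode concrete, here is a minimal sketch of the comparison the script applies to the two readings. The function name rx_changed is hypothetical and is not part of racgvip; the real script also retries inside the CHECK_TIMES loop, which is left out here.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of racgvip: mimics the plain string
# comparison applied to the counter read before and after the ping.
rx_changed() {
    # $1 = reading before the ping, $2 = reading after the ping
    if [ "$1" != "$2" ]
    then
        echo "OK"
    else
        echo "failed"
    fi
}

rx_changed "-" "-"        # readings taken from the failed node's log
rx_changed 17297 17298    # readings taken from the successful node's log
```

Since the comparison is a string test, two '-' placeholders look exactly like a counter that never moved.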
Successful Node:

    Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O1: 17297
    2009-03-18 20:58:55.793: [ RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] About to execute command: /usr/sbin/ping -S 10.46.180.51 -c 1 -w 1 10.46.180.1
    Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] --------------> by dunal: _O2: 17298
    2009-03-18 20:58:55.793: [ RACG][1] [397546][1][ora.akyorap2.vip]: Wed Mar 18 20:58:55 GMT+02:00 2009 [ 405728 ] IsIfAlive: RX packets checked if=en1 OK

_O1 and _O2 are different. That means the RX packet count changed and the interface is up.

netstat Output on Failed Node:

    /usr/bin/netstat -f inet -n -I en1 | /usr/bin/awk "{ if (/^en1/) {print $5; exit}}"
    en1 1500 link#3 0.21.5e.34.55.bc - 34601 0 16269 3 0

Column 5 is '-'. This is wrong and caused the problem.

netstat Output on Successful Node:

    en1 1500 link#3 0.21.5e.34.57.fe 29223 0 10609 3 0

Column 5 is 29223, which is the expected number.

Headers of netstat on Failed Node:

    #/usr/bin/netstat -f inet -n -I en1
    Name  Mtu   Network    Address           ZoneID  Ipkts  Ierrs  Opkts  Oerrs  Coll
    en1   1500  link#3     0.21.5e.34.55.bc  -       35645  0      16801  3      0
    en1   1500  10.46.180  10.46.180.52      -       35645  0      16801  3      0

Headers of netstat on Successful Node:

    #/usr/bin/netstat -f inet -n -I en1
    Name  Mtu   Network    Address           ZoneID  Ipkts  Ierrs  Opkts  Oerrs  Coll
    en1   1500  link#3     0.21.5e.34.57.fe          29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.51              29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.53              29743  0      10762  3      0
    en1   1500  10.46.180  10.46.180.54              29743  0      10762  3      0

The difference is the ZoneID column: it contains '-' on the failed node and is blank on the successful node. This looked like a network configuration problem, so the issue was left open pending an update from the network administrators.

The Network Administrator said it was an AIX bug:
However, this fix changes the ZoneID value from blank to '-'. After the fix was applied, no VIP could be started. No solution was found on Metalink.
It looks like an inconsistency between Oracle and AIX 6.1.
Workaround: The netstat column number that racgvip captures must be changed from 5 to 6.

Original lines for _O1:

    ...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    ...

Modified line for _O1:

    ...
    tmpIP=`$LSATTR -El ${_IF} -a netaddr | $AWK '{print $2}'`
    # get RX packets numbers
    _O1=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
    x=$CHECK_TIMES
    while [ $x -gt 0 ]
    ...

Original lines for _O2:

    ...
    fi
    _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$5; exit}}"`
    if [ "$_O1" != "$_O2" ]
    then
        # RX packets numbers changed
    ...

Modified line for _O2:

    ...
    fi
    _O2=`$NETSTAT -n -I $_IF | $AWK "{ if (/^$_IF/) {print \\$6; exit}}"`
    if [ "$_O1" != "$_O2" ]
    then
        # RX packets numbers changed
    ...

After that, the VIPs could be started on the correct nodes:

    ./crs_stat -t
    Name           Type         Target  State   Host
    ------------------------------------------------------------
    ora....ap1.gsd application  ONLINE  ONLINE  akyorap1
    ora....ap1.ons application  ONLINE  ONLINE  akyorap1
    ora....ap1.vip application  ONLINE  ONLINE  akyorap1
    ora....ap2.gsd application  ONLINE  ONLINE  akyorap2
    ora....ap2.ons application  ONLINE  ONLINE  akyorap2
    ora....ap2.vip application  ONLINE  ONLINE  akyorap2

Note: Don't edit Oracle scripts unless you know what you're doing.
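A fixed column number only fits one of the two netstat layouts, so a node where ZoneID is still blank would need 5 while a patched node needs 6. As a sketch only (this is not Oracle's fix and it was not tested inside racgvip), the counter could be read by counting from the right instead: in the headers shown in this post, Ipkts is always followed by exactly four columns (Ierrs, Opkts, Oerrs, Coll). The function names below are hypothetical, and the sample rows are copied from the netstat output in this post.

```shell
#!/bin/sh
# Sample rows copied from the netstat output shown in this post.
ROW_DASH='en1 1500 link#3 0.21.5e.34.55.bc - 35645 0 16801 3 0'
ROW_BLANK='en1 1500 link#3 0.21.5e.34.57.fe 29743 0 10762 3 0'

# Hypothetical helper (NOT in racgvip). Assumption: Ipkts is always
# followed by Ierrs, Opkts, Oerrs and Coll, so it is the fifth field
# from the right whether or not ZoneID is filled in.
ipkts_from_right() {
    echo "$1" | awk '{ if (/^en1/) { print $(NF-4); exit } }'
}

# Fixed-position lookup, as in the original script, for comparison.
ipkts_fixed() {
    echo "$1" | awk '{ if (/^en1/) { print $5; exit } }'
}

ipkts_fixed "$ROW_DASH"        # picks up the ZoneID placeholder
ipkts_from_right "$ROW_DASH"   # picks up Ipkts on the patched layout
ipkts_from_right "$ROW_BLANK"  # picks up Ipkts on the unpatched layout
```

This relies on the trailing columns always being populated, which holds for the output captured here but is an assumption for other AIX levels.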
Here is the related excerpt from racgvip:
According to the code above, it does the following: