Computing the Offset of Corrupted ASM Block:
SQL> select GROUP_NUMBER,NAME,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;

GROUP_NUMBER NAME                      ALLOCATION_UNIT_SIZE
------------ ------------------------- --------------------
           1 DATA                                   1048576

SQL> select GROUP_NUMBER, DISK_NUMBER, name, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER NAME                      PATH
------------ ----------- ------------------------- --------------------
           1           0 DATA_0000                 /u01/oradata/oravol1
           1           1 DATA_0001                 /u01/oradata/oravol2
           1           2 DATA_0002                 /u01/oradata/oravol3
           1           3 DATA_0003                 /u01/oradata/oravol4
           1           4 DATA_0004                 /u01/oradata/oravol5

SQL> select BLOCK_SIZE from v$asm_file where FILE_NUMBER=516;

BLOCK_SIZE
----------
       512

SQL> select DISK_KFFXP, AU_KFFXP from x$kffxp where XNUM_KFFXP=24 and group_kffxp=1 and NUMBER_KFFXP=516;

DISK_KFFXP   AU_KFFXP
---------- ----------
         1      60884

Disk#1 : /u01/oradata/oravol2

Interpreting the truss Output of ARCH:
fd#261 is /u01/oradata/oravol2 for ARCH.

Reading Offsets by ARCH:

bash-3.00$ grep "pread(261" arc0.truss.log
26085: pread(261, 0xFFFFFD7FFC32DE00, 131072, 0xEDE600000) = 131072
26085: pread(261, 0xFFFFFD7FFC21CE00, 131072, 0xEDE620000) = 131072
26085: pread(261, 0xFFFFFD7FFC10BE00, 131072, 0xEDE640000) = 131072
26085: pread(261, 0xFFFFFD7FFBE2DE00, 131072, 0xEDE660000) = 131072
26085: pread(261, 0xFFFFFD7FFBA2DE00, 131072, 0xEDE680000) = 131072
26085: pread(261, 0xFFFFFD7FFB42DE00, 131072, 0xEDE6A0000) = 131072
26085: pread(261, 0xFFFFFD7FFB53DE00, 131072, 0xEDE6C0000) = 131072
26085: pread(261, 0xFFFFFD7FFB64DE00, 131072, 0xEDE6E0000) = 131072
26085: pread(261, 0xFFFFFD7FFADCDE00, 131072, 0xEDE700000) = 131072
26085: pread(261, 0xFFFFFD7FFAE6DE00, 131072, 0xEDE800000) = 131072
26085: pread(261, 0xFFFFFD7FFAEDDE00, 131072, 0xEDE720000) = 131072
26085: pread(261, 0xFFFFFD7FFAF7DE00, 131072, 0xEDE820000) = 131072
26085: pread(261, 0xFFFFFD7FFC2CDE00, 131072, 0xEDE740000) = 131072
26085: pread(261, 0xFFFFFD7FFC36DE00, 131072, 0xEDE840000) = 131072
26085: pread(261, 0xFFFFFD7FFC1BCE00, 131072, 0xEDE760000) = 131072
26085: pread(261, 0xFFFFFD7FFC25CE00, 131072, 0xEDE860000) = 131072
26085: pread(261, 0xFFFFFD7FFC0ABE00, 131072, 0xEDE780000) = 131072
26085: pread(261, 0xFFFFFD7FFC14BE00, 131072, 0xEDE880000) = 131072
26085: pread(261, 0xFFFFFD7FFBDCDE00, 131072, 0xEDE7A0000) = 131072
26085: pread(261, 0xFFFFFD7FFBE6DE00, 131072, 0xEDE8A0000) = 131072
26085: pread(261, 0xFFFFFD7FFB9CDE00, 131072, 0xEDE7C0000) = 131072
26085: pread(261, 0xFFFFFD7FFBA6DE00, 131072, 0xEDE8C0000) = 131072
26085: pread(261, 0xFFFFFD7FFB3CDE00, 131072, 0xEDE7E0000) = 131072
26085: pread(261, 0xFFFFFD7FFB46DE00, 131072, 0xEDE8E0000) = 131072
26085: pread(261, 0xFFFFFD7FFB51DE00, 131072, 0xEDE900000) = 131072
26085: pread(261, 0xFFFFFD7FFB62DE00, 131072, 0xEDE920000) = 131072
26085: pread(261, 0xFFFFFD7FFAE0DE00, 131072, 0xEDE940000) = 131072
26085: pread(261, 0xFFFFFD7FFAF1DE00, 131072, 0xEDE960000) = 131072
26085: pread(261, 0xFFFFFD7FFC30DE00, 131072, 0xEDE980000) = 131072
26085: pread(261, 0xFFFFFD7FFC1FCE00, 131072, 0xEDE9A0000) = 131072
26085: pread(261, 0xFFFFFD7FFC0EBE00, 131072, 0xEDE9C0000) = 131072
26085: pread(261, 0xFFFFFD7FFBE0DE00, 131072, 0xEDE9E0000) = 131072
26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085: pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085: pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085: pread(261, 0xFFFFFD7FFC53BE00, 16384, 0xEDDED4000) = 16384
bash-3.00$

As seen above, the offsets starting with 0xEDE and 0xEDD5 are greater than the corrupted offset of 0xEDD4DFA00, so they are out of scope. The reads that remain to be examined are the ones in the 0xEDD4 range:

26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
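The corrupted offset 0xEDD4DFA00 used above can be reproduced from the query results. The following is a minimal Python sketch, not part of the original analysis. It assumes that file blocks map linearly into the allocation unit of their extent, which is what the numbers in this post imply (50941 // 2048 = 24 matches XNUM_KFFXP=24). The function names are hypothetical.

```python
AU_SIZE = 1048576     # ALLOCATION_UNIT_SIZE from v$asm_diskgroup
BLOCK_SIZE = 512      # BLOCK_SIZE from v$asm_file for file 516
AU_NUMBER = 60884     # AU_KFFXP returned by x$kffxp for extent 24 on disk 1
BAD_BLOCK = 50941     # block number reported by ORA-00353


def block_offset(au_number, block, au_size=AU_SIZE, block_size=BLOCK_SIZE):
    """Byte offset of a file block on the disk.

    Assumption: blocks are laid out linearly inside the extent's AU.
    """
    blocks_per_au = au_size // block_size
    return au_number * au_size + (block % blocks_per_au) * block_size


def reads_covering(offset, reads):
    """Return the (start, length) reads whose byte range contains offset."""
    return [(start, length) for start, length in reads
            if start <= offset < start + length]


# (offset, length) of the ARCH reads that fall inside AU 60884,
# copied from the grep output above.
READS_IN_AU = [
    (0xEDD400000, 512),
    (0xEDD400200, 130560),
    (0xEDD420000, 512),
]

bad_offset = block_offset(AU_NUMBER, BAD_BLOCK)
print("extent      :", BAD_BLOCK // (AU_SIZE // BLOCK_SIZE))
print("byte offset :", hex(bad_offset))
print("dd iseek    :", bad_offset // BLOCK_SIZE)
print("covered by  :", reads_covering(bad_offset, READS_IN_AU))
```

Under that assumption the sketch yields the same offset and the same dd block index (124692221) that appear in this post, and none of the three reads covers the corrupted offset.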
ARCH did not read the corrupted block #50941, yet it reported an error.

dd Output of the Corrupted Block:

ASM corrupted block offset in 512-byte blocks: 63842417152/512=124692221

bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124692221 count=1|od -x
0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304 <blockNo>
0000020 3838 322e 3731 312e 3431 7807 0a6c 111e
0000040 2230 3001 002c 0605 3131 3730 3130 3306

0x0000f0fd is not 50941, so the block is corrupted. The reason why ARCH did not read this block is hidden in the error message:

ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 1

It says "near".

Finding the Other Corrupted Block:
dd Outputs on pread() of ARCH:
As seen above, the block numbers increase from 0xC000 to 0xC0FF. But in the last call, the block number jumped to 0xC800.

truss Output of ARCH for block# 0xC800

26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4 <blockNo>
26085: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
26085: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n
26085: 07\f %1F01 0 ,\00505 6 2 0 5 1\b a d a m k a c i0E 1 9 5 . 2 4 4
26085: . 6 2 . 1 4 507 x l\n07\f % "01 0 ,\00505 6 2 0 5 1\b a d a m k
26085: a c i\f 7 8 . 1 9 0 . 6 8 . 1 707 x l\n07\f % #01 0 ,\00502 - 1
26085: 05 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f % .02 - 2
26085: ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07
26085: \f &0102 - 2 ,\00505 6 1 1 4 105 1 9 5 5 60E 1 9 5 . 2 4 4 . 6 2
26085: . 1 4 707 x l\n07\f &\r01 0 ,\00505 6 1 1 4 105 1 9 5 5 6\f 8 8
26085: . 2 3 4 . 5 . 2 3 107 x l\n07\f &0F01 0 ,\00502 - 105 K A Y A 2
26085: 0E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f &1002 - 2 ,\00506 1 1
26085: 1 0 1 605 O K A Y A\f 8 5 . 1 0 8 . 8 7 . 5 007 x l\n07\f & !01
26085: 0 ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n
26085: 07\f & "02 - 2 ,\00505 4 1 9 3 806 6 4 3 2 5 5\r 8 8 . 2 2 5 . 1
26085: 2 0 . 5 307 x l\n07\f & +01 0 ,\00505 5 3 0 5 506 0 9 1 2 1 90E

Then, the following messages were written to the trace file:

26085: write(2, " * * * 2 0 0 9 - 0 3 -".., 27) = 27
26085: write(2, "\n", 1) = 1
26085: write(2, " ", 1) = 1
26085: write(2, "\n", 1) = 1
26085: write(2, " C o r r u p t r e d o".., 51) = 51
26085: write(2, "\n", 1) = 1
26085: write(2, " F l a g : 0 x 3 0 F".., 80) = 80
26085: write(2, "\n", 1) = 1
26085: write(2, " - - - - - D u m p o".., 39) = 39
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 4 6 3 8 3 0 2 0 3 0".., 64) = 64 <blockNoPiece0>
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 0 4 3 3 c 5 c 3 0".., 64) = 64 <blockNoPiece1>
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 5 0 3 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 2 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 2 3 9 3 5 3 2 2 0 0 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 1 3 0 3 0 2 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 1 3 0 3 0 5 c 3 2".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 4 3 0 3 0 5 c 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 2 0 3 0 5 c 3 2 3 0 3 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 2 0 3 6".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 0 4 3 3 c 2 0 3 7".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 a 3 5 3 2 3 9 3 0 2 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 5 c 3 0 5 c 3 0 3 0 3 0".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 4 2 5 4 2 0 4 4 0 a 4 9".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 2 0 3 8 2 0 3 0 2 0 3 5".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " 3 0 3 5 3 0 3 6 5 c 3 8".., 64) = 64
26085: write(2, "\n", 1) = 1
26085: write(2, " R e r e a d i n g l o".., 78) = 78
26085: write(2, "\n", 1) = 1

Rereading the block fails like this.
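The jump to 0xC800 can also be checked arithmetically: given the disk offset of a pread(), one can work out which redo block number should sit there and compare it with the number found in the block header. The sketch below is an illustration only. It assumes the same linear block layout inside the allocation unit that the rest of this post relies on, and the function name is hypothetical.

```python
AU_SIZE = 1048576     # ALLOCATION_UNIT_SIZE of disk group DATA
BLOCK_SIZE = 512      # redo block size from v$asm_file
EXTENT = 24           # XNUM_KFFXP of the extent under inspection
AU_NUMBER = 60884     # AU_KFFXP of that extent on disk 1


def expected_block(disk_offset):
    """Redo block number expected at a disk offset inside AU 60884.

    Assumption: extent 24 holds file blocks 24*2048 .. 25*2048-1 in order.
    """
    blocks_per_au = AU_SIZE // BLOCK_SIZE
    au_start = AU_NUMBER * AU_SIZE
    if not au_start <= disk_offset < au_start + AU_SIZE:
        raise ValueError("offset is outside the allocation unit")
    return EXTENT * blocks_per_au + (disk_offset - au_start) // BLOCK_SIZE


observed = 0xC800                       # block number seen in the truss dump
expected = expected_block(0xEDD420000)  # offset of the 512-byte pread()

print("first block of the AU :", hex(expected_block(0xEDD400000)))
print("expected at 0xEDD420000:", hex(expected))
print("observed               :", hex(observed))
print("difference in blocks   :", observed - expected)
```

With these inputs the first block of the AU comes out as 0xC000 and the block expected at 0xEDD420000 is 0xC100, the direct successor of 0xC0FF, whereas the header read by ARCH carries 0xC800.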
There are 2 problems:
Checking Missing IO of LGWR from truss Output:
bash-3.00$ grep Err lgwr.truss.log|grep pwrite
bash-3.00$ grep Err lgwr.truss.log|grep pread
bash-3.00$

No missing IO.

Checking IO buffers of LGWR:

fd#260 is /u01/oradata/oravol2 for LGWR.

The last write to the block:

25925: pwrite(260, 0x380D78400, 76288, 0xEDD420000) = 76288
25925: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4 <blockNo>
25925: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
25925: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n

As seen above, the contents of the redo buffer are corrupted: the block number is 0xC800. But this LGWR had generated a correct archive log:

bash-3.00$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=256 count=1|od -x
1+0 records in
1+0 records out
0000000 2201 0000 0100 0000 0019 0000 8000 d162 <blockNo>
0000020 3534 332e 2e33 3032 0733 6b78 0904 3c0c
0000040 0114 2c30 0500 3205 3031 3631 6905 6e69

0x0100 = 256, which is the correct block number.

This looks like a configuration issue or a bug on the OS/storage side.
This issue covers redo corruption only. However, the database also encounters corruption in UNDO, INDEX, TABLE, and CONTROL FILES. The root cause is the same: Similar to

This issue will be updated when a comment is sent by the OS vendor.

The operating system was reinstalled by the vendor. The problem has not occurred since.
As seen above, the last successful sequence before the corruption is 25.
Header of Archive Log:
The block number is 0x0000c6fd (bytes swapped since the platform is little endian). Since 50941=0x0000c6fd, the block number in the archive log is correct. That means LGWR had successfully written the correct redo before the log switch.
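The byte swapping mentioned above can be done mechanically. The following Python sketch turns the first line of an `od -x` dump taken on a little-endian host back into raw bytes and reads the block number from it. It is an illustration under assumptions: the block number is taken from bytes 4-7, as the <blockNo> markers in this post suggest, and reading bytes 8-11 as the log sequence number is my own inference from the archive log header showing 0x19 = 25. Both function names are hypothetical.

```python
import struct


def od_words_to_bytes(line):
    """Rebuild raw bytes from one `od -x` line printed on a little-endian host.

    od -x prints 16-bit words in host order, so each word is packed back
    as a little-endian unsigned short. The first column is the offset.
    """
    words = line.split()[1:]
    return b"".join(struct.pack("<H", int(word, 16)) for word in words)


def redo_header_fields(raw):
    """Return (block_number, sequence) from the start of a redo block.

    Assumed layout: bytes 4-7 block number, bytes 8-11 sequence,
    both little-endian 32-bit integers.
    """
    return struct.unpack_from("<II", raw, 4)


# First od -x line of the block read from the ASM disk with dd.
on_disk = "0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304"
# First od -x line of block 256 of the archive log with sequence 25.
archived = "0000000 2201 0000 0100 0000 0019 0000 8000 d162"

for label, line in (("ASM disk", on_disk), ("archive log", archived)):
    block_no, sequence = redo_header_fields(od_words_to_bytes(line))
    print(f"{label}: block {block_no} ({block_no:#010x}), sequence {sequence}")
```

For the block read from the ASM disk this gives block number 0x0000f0fd (61693) rather than 50941, and for the archive log block it gives 256, matching the `skip=256` used in the dd command.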