<< Back to previous view

[QA-47] ORA-00354 ORA-00353 ORA-00312: Redolog Block Corruption Created: 10/Mar/09  Updated: 10/Apr/09

Status: Closed
Project: Questions & Answers
Fix Version/s: None

Type: Oracle - Operating System Priority: Major
Reporter: ubTools Support Assignee: ubTools Support
Resolution: Answered Votes: 0

Product Version: Oracle 10.2.0.4 SE,RAC
Operating System: Solaris
Operating System Version: 10

 Description   
Problem:

Import causes instance to be hang. During import only one instance is open.

imp system/manager file=../yedek/gedik_full.dmp log=../yedek/gedik_full_imp3.log full=y FEEDBACK=1000000
buffer=10000000 RESUMABLE=y RESUMABLE_TIMEOUT=72000

Diagnostic Data for Oracle:

Alert Log:

Mon Mar  9 19:38:45 2009
ARC0: Log corruption near block 50941 change 9160702125 time ?
Mon Mar  9 19:38:45 2009
Errors in file /u01/app/oracle/admin/ORCL/bdump/orcl1_arc0_26085.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 1
9:38:35
ORA-00312: online log 1 thread 1: '+DATA/orcl/onlinelog/group_1.516.680795507'
ARC0: All Archive destinations made inactive due to error 354
Mon Mar  9 19:38:45 2009
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/u01/app/oracle/p
roduct/10.2.0/dbs/arch/1_27_681074311.dbf' (error 354)
 (ORCL1)
Committing creation of archivelog '/u01/app/oracle/product/10.2.0/dbs/arch/1_2
7_681074311.dbf' (error 354)
ARCH: Archival stopped, error occurred. Will continue retrying

Archive Log Trace:

Corrupt redo block 50941 detected: bad block number
Flag: 0x30 Format: 0x38 Block: 0x20302030 Seq: 0x5c305c79 Beg: 0x3030 Cks:0x5
c31
----- Dump of Corrupt Redo Buffer -----
5c463830203020305c305c795c3130305c3230305c305c305c305c3020665c30
3030433c5c305c345c305c305c305c305c305c3035320a303a35323920202009
5c305c5030305c303033203120305c32303920383022203520305c315c353034
5c305c3020305c3020725c305c3820373035317231305c315c305c330a305c30
3239353220093a35203843203030433f20372034203530395c3230225c393830
30313030203020315c3630795c3431305c3430305c463830203020305c305c79
5c3130305c32303035320a303a3532395c2020095c305c305c305c30433c2066
5c3430305c305c305c305c305c305c3020305c305c305c5030305c3030292031
20305c32303920383022203520305c310a3320363239353220093a355c305c20
5c305c30203620303038203331725c455c625c365c3331305c305c3020665c30
3030433c20372034203530395c3230225c3938305c3530302030203035320a79
3a3532393020200931305c365c305c3038305c6230305c4641305c3431305c35
5c305c3030305c3020305c3542305c3f5c305c3031305c3030305c3131305c31
425420440a4920353239353220093a35314345205c305c395c3130305c323062
20382030203530395c313022203530305c305c345c305c302035303020372034
303530365c3831315c3230304646463035320a463a3532394320200938412032
Rereading log member '+DATA/orcl/onlinelog/group_1.516.680795507' (corruption
)
...
Corrupt redo block 50941 detected: bad block number
Flag: 0x0 Format: 0x0 Block: 0x00000000 Seq: 0x00000000 Beg: 0x0 Cks:0x0
----- Dump of Corrupt Redo Buffer -----
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
Rereading log member '+DATA/orcl/onlinelog/group_1.516.680795507' (corruption
)
...
Corrupt redo block 50941 detected: bad block number
Flag: 0x30 Format: 0x38 Block: 0x20302030 Seq: 0x5c305c79 Beg: 0x3030 Cks:0x5
c31
----- Dump of Corrupt Redo Buffer -----
5c463830203020305c305c795c3130305c3230305c305c305c305c3020665c30
3030433c5c305c345c305c305c305c305c305c3035320a303a35323920202009
5c305c5030305c303033203120305c32303920383022203520305c315c353034
5c305c3020305c3020725c305c3820373035317231305c315c305c330a305c30
3239353220093a35203843203030433f20372034203530395c3230225c393830
30313030203020315c3630795c3431305c3430305c463830203020305c305c79
5c3130305c32303035320a303a3532395c2020095c305c305c305c30433c2066
5c3430305c305c305c305c305c305c3020305c305c305c5030305c3030292031
20305c32303920383022203520305c310a3320363239353220093a355c305c20
5c305c30203620303038203331725c455c625c365c3331305c305c3020665c30
3030433c20372034203530395c3230225c3938305c3530302030203035320a79
3a3532393020200931305c365c305c3038305c6230305c4641305c3431305c35
5c305c3030305c3020305c3542305c3f5c305c3031305c3030305c3131305c31
425420440a4920353239353220093a35314345205c305c395c3130305c323062
20382030203530395c313022203530305c305c345c305c302035303020372034
303530365c3831315c3230304646463035320a463a3532394320200938412032
*** 2009-03-10 03:55:10.757 62692 kcrr.c

As seen above, even if the database hangs, the contents of redo buffer dump change.

Diagnostic Data for Solaris:

Soft Link Mapping to Raw Devices:

oravol1: disk@g600a0b80005a81660000074949959b42:b,raw
oravol2: disk@g600a0b80005a816600000742499595ea:b,raw
oravol3: disk@g600a0b80005a8166000007444995971e:b,raw
oravol4: disk@g600a0b80005a8c9f000004f049959717:b,raw
oravol5: disk@g600a0b80005a8c9f000004f249959991:b,raw

Open File Descriptors of ARCH process:

bash-3.00$ ps -ef|grep arc0
  oracle 19941 14227   0 04:10:31 pts/12      0:00 grep arc0
  oracle 26085     1   0 19:25:05 ?           0:29 ora_arc0_ORCL1

bash-3.00$ ls -ltr /proc/26085/path
...
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 261
 -> /devices/scsi_vhci/disk@g600a0b80005a816600000742499595ea:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 260
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f049959717:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 259
 -> /devices/scsi_vhci/disk@g600a0b80005a81660000074949959b42:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 257
 -> /devices/scsi_vhci/disk@g600a0b80005a8166000007444995971e:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:25 256
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f249959991:b,raw
...
bash-3.00$

Gathering truss output for ARCH:

truss -fae -w 261,260,259,257,256 -r 261,260,259,257,256 -o arc0.truss.log -p 26085

The command above will trace system calls with pread()/pwrite() IO buffer dumping for fd of 261,260,259,257,256.

Open File Descriptors of LGWR process:

bash-3.00$ ps -ef|grep lgwr
  oracle 28447     1   0   Mar 04 ?           0:17 asm_lgwr_+ASM1
  oracle 25925     1   0 19:24:49 ?           0:38 ora_lgwr_ORCL1
  oracle 26468 14227   0 04:21:02 pts/12      0:00 grep lgwr

bash-3.00$ ls -ltr /proc/25925/path
...
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 260
 -> /devices/scsi_vhci/disk@g600a0b80005a816600000742499595ea:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 259
 -> /devices/scsi_vhci/disk@g600a0b80005a81660000074949959b42:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 258
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f049959717:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 257
 -> /devices/scsi_vhci/disk@g600a0b80005a8166000007444995971e:b,raw
lrwxrwxrwx   1 oracle   oinstall       0 Mar  9 19:24 256
 -> /devices/scsi_vhci/disk@g600a0b80005a8c9f000004f249959991:b,raw
...

Gathering truss output for ARCH:

bash-3.00$ truss -fae -w 260,259,258,257,256 -r 260,259,258,257,256 -o lgwr.truss.log -p 25925 &

The command above will trace system calls with pread()/pwrite() IO buffer dumping for fd of 260,259,258,257,256.



 Comments   
Comment by ubTools Support [ 10/Mar/09 03:49 AM ]
Last Successful Log Switch:
Beginning log switch checkpoint up to RBA [0x19.2.10], SCN: 9160700232
Mon Mar  9 19:38:21 2009
Thread 1 advanced to log sequence 25 (LGWR switch)
  Current log# 1 seq# 25 mem# 0: +DATA/orcl/onlinelog/group_1.516.680795507
Thread 1 cannot allocate new log, sequence 26
Checkpoint not complete
  Current log# 1 seq# 25 mem# 0: +DATA/orcl/onlinelog/group_1.516.680795507
Mon Mar  9 19:38:28 2009
Completed checkpoint up to RBA [0x19.2.10], SCN: 9160700232

As seen above, the last successful sequence before the corruption is 25.

Header of Archive Log:

(root@gdksun1:bin)$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf
 bs=512 skip=50941 count=1|od -x

0000000 2201 0000   c6fd 0000      0019 0000 81d8 54c6
                    <blockNo>
0000020 2e32 3134 362e 0736 6b78 1207 2f0f 0212
0000040 332d 002c 0505 3831 3834 0532 7567 6469
0000060 0c65 3838 322e 3433 382e 2e38 3138 7807
0000100 076b 0f12 172f 3001 002c 0505 3032 3834
0000120 0739 7362 7361 6369 0e69 3538 312e 3530
0000140 312e 3535 322e 3233 7807 076b 0f12 172f

The block number is 0x0000c6fd (bytes swapped since the platform is little endian). Since 50941=0x0000c6fd, block number in archive log is correct. That means, LGWR had successfuly written the correct redo before the log switch.

Comment by ubTools Support [ 10/Mar/09 03:59 AM ]
Computing the Offset of Corrupted ASM Block:

SQL> select GROUP_NUMBER,NAME,ALLOCATION_UNIT_SIZE from v$asm_diskgroup;

GROUP_NUMBER NAME                      ALLOCATION_UNIT_SIZE
------------ ------------------------- --------------------
           1 DATA                                   1048576

SQL>  select  GROUP_NUMBER,  DISK_NUMBER, name, path
           from v$asm_disk;

GROUP_NUMBER DISK_NUMBER NAME                      PATH
------------ ----------- ------------------------- --------------------
           1           0 DATA_0000                 /u01/oradata/oravol1
           1           1 DATA_0001                 /u01/oradata/oravol2
           1           2 DATA_0002                 /u01/oradata/oravol3
           1           3 DATA_0003                 /u01/oradata/oravol4
           1           4 DATA_0004                 /u01/oradata/oravol5
  • ASM File Name: +DATA/orcl/onlinelog/group_1.516.680795507
  • ASM File#.........: 516
  • Corrupted Block#...: 50941
  • File Block Size:
SQL> select BLOCK_SIZE from  v$asm_file where FILE_NUMBER=516;

BLOCK_SIZE
----------
       512
  • Blocks per ASM Extent: 1048576/512=2048
  • ASM Extent#......: 50941/2048 = 24 (rounded down)
  • Block# in ASM Extent...: 50941 - 24*2048 = 1789
  • Disk# and ASM Extent Offset:
SQL> select DISK_KFFXP,  AU_KFFXP from x$kffxp
     where XNUM_KFFXP=24 and group_kffxp=1 and  NUMBER_KFFXP=516;

DISK_KFFXP   AU_KFFXP
---------- ----------
         1      60884

Disk#1 : /u01/oradata/oravol2
ASM Extent Offset...: 60884*1048576 = 63841501184 --> 0xEDD400000
ASM Corrupted Block Offset.....: 63841501184+1789*512 = 63842417152 --> 0xEDD4DFA00

Comment by ubTools Support [ 10/Mar/09 04:47 AM ]
Interpreting the truss Output of ARCH:

fd#261 is /u01/oradata/oravol2 for ARCH.

Reading Offsets by ARCH:

bash-3.00$ grep "pread(261" arc0.truss.log
26085:  pread(261, 0xFFFFFD7FFC32DE00, 131072, 0xEDE600000) = 131072
26085:  pread(261, 0xFFFFFD7FFC21CE00, 131072, 0xEDE620000) = 131072
26085:  pread(261, 0xFFFFFD7FFC10BE00, 131072, 0xEDE640000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE2DE00, 131072, 0xEDE660000) = 131072
26085:  pread(261, 0xFFFFFD7FFBA2DE00, 131072, 0xEDE680000) = 131072
26085:  pread(261, 0xFFFFFD7FFB42DE00, 131072, 0xEDE6A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB53DE00, 131072, 0xEDE6C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB64DE00, 131072, 0xEDE6E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFADCDE00, 131072, 0xEDE700000) = 131072
26085:  pread(261, 0xFFFFFD7FFAE6DE00, 131072, 0xEDE800000) = 131072
26085:  pread(261, 0xFFFFFD7FFAEDDE00, 131072, 0xEDE720000) = 131072
26085:  pread(261, 0xFFFFFD7FFAF7DE00, 131072, 0xEDE820000) = 131072
26085:  pread(261, 0xFFFFFD7FFC2CDE00, 131072, 0xEDE740000) = 131072
26085:  pread(261, 0xFFFFFD7FFC36DE00, 131072, 0xEDE840000) = 131072
26085:  pread(261, 0xFFFFFD7FFC1BCE00, 131072, 0xEDE760000) = 131072
26085:  pread(261, 0xFFFFFD7FFC25CE00, 131072, 0xEDE860000) = 131072
26085:  pread(261, 0xFFFFFD7FFC0ABE00, 131072, 0xEDE780000) = 131072
26085:  pread(261, 0xFFFFFD7FFC14BE00, 131072, 0xEDE880000) = 131072
26085:  pread(261, 0xFFFFFD7FFBDCDE00, 131072, 0xEDE7A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE6DE00, 131072, 0xEDE8A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB9CDE00, 131072, 0xEDE7C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBA6DE00, 131072, 0xEDE8C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB3CDE00, 131072, 0xEDE7E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB46DE00, 131072, 0xEDE8E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFB51DE00, 131072, 0xEDE900000) = 131072
26085:  pread(261, 0xFFFFFD7FFB62DE00, 131072, 0xEDE920000) = 131072
26085:  pread(261, 0xFFFFFD7FFAE0DE00, 131072, 0xEDE940000) = 131072
26085:  pread(261, 0xFFFFFD7FFAF1DE00, 131072, 0xEDE960000) = 131072
26085:  pread(261, 0xFFFFFD7FFC30DE00, 131072, 0xEDE980000) = 131072
26085:  pread(261, 0xFFFFFD7FFC1FCE00, 131072, 0xEDE9A0000) = 131072
26085:  pread(261, 0xFFFFFD7FFC0EBE00, 131072, 0xEDE9C0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBE0DE00, 131072, 0xEDE9E0000) = 131072
26085:  pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
26085:  pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085:  pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:  pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
26085:  pread(261, 0xFFFFFD7FFBA4DE00, 131072, 0xEDD500000) = 131072
26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:  pread(261, 0xFFFFFD7FFC53BE00, 16384, 0xEDDED4000) = 16384
bash-3.00$

As seen above, offsets starting with 0xEDE and 0xEDD5 are greater than our corrupted offset of 0xEDD4DFA00. So, They are out of the scope.

The followings should be examined:

  • 26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
    • This is the ASM Extent Offset. In other words, it's the base offset. (0xEDD400000+512)<0xEDD4DFA00. So, it doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • (0xEDD400200+130560)<0xEDD4DFA00. It doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • (0xEDD420000+512)<0xEDD4DFA00. It doesn't read the corrupted block.
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • Same as before.
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • Same as before.

ARCH did not read the corrupted block#50941. But, it reported an error.

dd Output of the Corrupted Block:

ASM Corrupted Block Offset in 512 byte block: 63842417152/512=124692221

bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124692221 count=1|od -x

0000000 2201 0000 f0fd 0000 001b 0000 80d8 2304
                  <blockNo>
0000020 3838 322e 3731 312e 3431 7807 0a6c 111e
0000040 2230 3001 002c 0605 3131 3730 3130 3306

0x0000f0fd is not 50941. So, it's corrupted.

The reason why ARCH did not read this block is hidden in the error messages:

ORA-00353: log corruption near block 50941 change 9160702125 time 03/09/2009 1

It says near.

Comment by ubTools Support [ 10/Mar/09 06:16 AM ]
Finding the Other Corrupted Block:

dd Outputs on pread() of ARCH:

  • 26085: pread(261, 0xFFFFFD7FFBEADE00, 512, 0xEDD400000) = 512
    • Offset: 0xEDD400000 = 63841501184
    • Offset in 512 byte block: 63841501184/512=124690432
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690432 count=1|od -x
      
      0000000 2201 0000 c000 0000 001b 0000 8000 621d
                        <blockNo>
      ...
      
  • 26085: pread(261, 0xFFFFFD7FFB9AE000, 130560, 0xEDD400200) = 130560
    • First Block Offset: 0xEDD400200 = 63841501696
    • First Block Offset in 512 byte block: 63841501696/512=124690433 (next block of previous block)
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690433 count=1|od -x
      
      0000000 2201 0000 c001 0000 001b 0000 8124 5172
                        <blockNo>
      ..
      
    • Last Block Offset: 0xEDD400200 + 130560-512= 63841631744
    • First Block Offset in 512 byte block: 63841631744/512=124690687
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690687 count=1|od -x
      
      0000000 2201 0000 c0ff 0000 001b 0000 8018 4635
                        <blockNo>
      ..
      
  • 26085: pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
    • Offset: 0xEDD420000 = 63841632256
    • Offset in 512 byte block: 63841632256/512 = 124690688
      bash-3.00$ dd if=/u01/oradata/oravol2 bs=512 iseek=124690688 count=1|od -x
      
      0000000 2201 0000 c800 0000 001b 0000 805c 2d48
                        <blockNo>
      ..
      

As seen above, the block numbers increase from 0xC000 to 0xC0FF. But, in the last call, it jumped to 0xC800.

truss Output of ARCH for block# 0xC800

26085:  pread(261, 0xFFFFFD7FFBAADE00, 512, 0xEDD420000) = 512
26085:    01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
                 <blockNo>
26085:     1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
26085:    0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n
26085:    07\f %1F01 0 ,\00505 6 2 0 5 1\b a d a m k a c i0E 1 9 5 . 2 4 4
26085:     . 6 2 . 1 4 507 x l\n07\f % "01 0 ,\00505 6 2 0 5 1\b a d a m k
26085:     a c i\f 7 8 . 1 9 0 . 6 8 . 1 707 x l\n07\f % #01 0 ,\00502 - 1
26085:    05 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f % .02 - 2
26085:     ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07
26085:    \f &0102 - 2 ,\00505 6 1 1 4 105 1 9 5 5 60E 1 9 5 . 2 4 4 . 6 2
26085:     . 1 4 707 x l\n07\f &\r01 0 ,\00505 6 1 1 4 105 1 9 5 5 6\f 8 8
26085:     . 2 3 4 . 5 . 2 3 107 x l\n07\f &0F01 0 ,\00502 - 105 K A Y A 2
26085:    0E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n07\f &1002 - 2 ,\00506 1 1
26085:     1 0 1 605 O K A Y A\f 8 5 . 1 0 8 . 8 7 . 5 007 x l\n07\f & !01
26085:     0 ,\00502 - 105 K A Y A 20E 1 9 5 . 2 4 4 . 6 2 . 1 4 507 x l\n
26085:    07\f & "02 - 2 ,\00505 4 1 9 3 806 6 4 3 2 5 5\r 8 8 . 2 2 5 . 1
26085:     2 0 . 5 307 x l\n07\f & +01 0 ,\00505 5 3 0 5 506 0 9 1 2 1 90E

Then, the following messages were written to the trace file:

26085:  write(2, " * * *   2 0 0 9 - 0 3 -".., 27)      = 27
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, "  ", 1)                               = 1
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " C o r r u p t   r e d o".., 51)      = 51
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " F l a g :   0 x 3 0   F".., 80)      = 80
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " - - - - -   D u m p   o".., 39)      = 39
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 4 6 3 8 3 0 2 0 3 0".., 64)      = 64
                                   <blockNoPiece0>
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 0 4 3 3 c 5 c 3 0".., 64)      = 64
           <blockNoPiece1>
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 5 0 3 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 2 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 2 3 9 3 5 3 2 2 0 0 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 1 3 0 3 0 2 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 1 3 0 3 0 5 c 3 2".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 4 3 0 3 0 5 c 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 2 0 3 0 5 c 3 2 3 0 3 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 2 0 3 6".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 0 4 3 3 c 2 0 3 7".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 a 3 5 3 2 3 9 3 0 2 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 5 c 3 0 5 c 3 0 3 0 3 0".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 4 2 5 4 2 0 4 4 0 a 4 9".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 2 0 3 8 2 0 3 0 2 0 3 5".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " 3 0 3 5 3 0 3 6 5 c 3 8".., 64)      = 64
26085:  write(2, "\n", 1)                               = 1
26085:  write(2, " R e r e a d i n g   l o".., 78)      = 78
26085:  write(2, "\n", 1)                               = 1

Rereading the block fails like this.

There are 2 problems:

  • Redo block# jumped to 0xC800 from 0xC0FF. So, On-Disk image is corrupted.
  • On-Memory image of block is different than On-Disk image.
Comment by ubTools Support [ 10/Mar/09 10:19 AM ]
Checking missing IO of LGWR from truss Output :
bash-3.00$ grep Err lgwr.truss.log|grep pwrite
bash-3.00$ grep Err lgwr.truss.log|grep pread
bash-3.00$

No missing IO.

Checking IO buffers of LGWR:

fd#260 is /u01/oradata/oravol2 for LGWR.
Offset: 0xEDD420000.

The Last write to block:

25925: pwrite(260, 0x380D78400, 76288, 0xEDD420000) = 76288
25925: 01 "\0\0\0C8\0\01B\0\0\0 \80 H -\00505 4 1 4 5 0\v 6 6 6 6 6 6 4
                 <blockNo>
25925: 1 4 5 00F 2 1 2 . 1 5 6 . 2 3 0 . 2 1 807 x l\n07\f %1F01 0 ,\0
25925: 0505 3 5 6 0 705 3 8 0 3 50E 8 8 . 2 4 1 . 1 3 6 . 2 2 007 x l\n

As seen above, the contents of redo buffer is corrupted. The block number is 0xC800.

But, this LGWR had generated correct archivelog:

bash-3.00$ dd if=/u01/app/oracle/product/10.2.0/dbs/arch/1_25_681074311.dbf bs=512 skip=256 count=1|od -x
1+0 records in
1+0 records out
0000000 2201 0000 0100 0000 0019 0000 8000 d162
                  <blockNo>
0000020 3534 332e 2e33 3032 0733 6b78 0904 3c0c
0000040 0114 2c30 0500 3205 3031 3631 6905 6e69

0x0100 = 256, which is the correct block number.

Comment by ubTools Support [ 10/Mar/09 10:35 AM ]
Looks like a configuration issue or a bug in OS/STORAGE side.

This issue handles redo corruption only. But, the database encounters the corruptions on UNDO,INDEX,TABLE, CONTROL FILES, too. But, the root cause is same:
The On-Disk image of the block and its On-Memory image are not same.

Similar to QA-37.

This issue will be updated when a comment is sent by the OS vendor.

Comment by ubTools Support [ 10/Apr/09 01:13 PM ]
Operating System reinstalled by the vendor. Then problem has not occured.
Generated at Fri Apr 26 02:34:11 UTC 2024 using JIRA Standard Edition, Version: 3.12.3-#302.