Bug #7341


Pool halt due to checksum error after reboot

Added by Arne Jansen over 5 years ago. Updated over 5 years ago.

zfs - Zettabyte File System

We encountered a suspended pool after reboot.

Short summary: the persistent L2ARC provided an old checksum for an L0 space map block.

Blocked thread:

> ffffff007bf93c40::findstack -v
stack pointer for thread ffffff007bf93c40: ffffff007bf936c0
[ ffffff007bf936c0 _resume_from_idle+0xf4() ]
  ffffff007bf936f0 swtch+0x141()
  ffffff007bf93730 cv_wait+0x70()
  ffffff007bf93770 zio_wait+0x5b()
  ffffff007bf937e0 dbuf_read+0x2c0()
  ffffff007bf93820 dmu_buf_will_dirty+0x5c()
  ffffff007bf938d0 dmu_write+0xc1()
  ffffff007bf939a0 space_map_write+0x306()
  ffffff007bf93a40 metaslab_sync+0x10b()
  ffffff007bf93aa0 vdev_sync+0x7f()
  ffffff007bf93b70 spa_sync+0x2c3()
  ffffff007bf93c20 txg_sync_thread+0x21f()
  ffffff007bf93c30 thread_start+8()


> ffffff15b59b9c40::zio -r
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff15b59b9c40        NULL  DONE             ffffff007bf93c40 -            
 ffffff1521f00aa0       READ  FAILED           -                -            

The read failed due to a checksum error.
Expected checksum: 0x95506ab4f8, 0xd90d777c8bfd, 0xb94217234c8434, 0x74f15af293dd3aab
Actual checksum: 0x9601f5ebc2, 0xd9512bebc64e, 0xb94db62f79cd2b, 0x74f260b9c17568c5

The checksum algorithm used is Fletcher_4. In Fletcher_4, the first word is simply the sum of all 32-bit words of the data block. The first word of the actual checksum is slightly larger than that of the expected one.
Because space maps are mostly append-only, I removed the last 2 entries from the space map block and computed the checksum again. It then matched the expected one.
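The first-word argument above can be illustrated with a minimal Python sketch. It is not the ZFS implementation: the function name `fletcher4`, the toy block contents, and the choice to model "removing" entries as zeroing the tail of a fixed-size block are all assumptions made for illustration, as is the little-endian word order (as on x86). The only values taken from this report are the two first checksum words.

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators are 64-bit and wrap around


def fletcher4(data: bytes):
    """Fletcher-4 style checksum over little-endian 32-bit words (illustrative)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64  # first word: plain sum of all 32-bit words
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


# Toy stand-in for a space map block: the "new" version has two extra
# 64-bit entries (four 32-bit words) where the "old" version has zeros.
old_block = struct.pack("<8I", 1, 2, 3, 4, 0, 0, 0, 0)
new_block = struct.pack("<8I", 1, 2, 3, 4, 5, 6, 7, 8)
appended_sum = 5 + 6 + 7 + 8

first_word_delta = fletcher4(new_block)[0] - fletcher4(old_block)[0]
print(first_word_delta == appended_sum)

# First checksum words quoted in this report.
EXPECTED_A = 0x95506AB4F8
ACTUAL_A = 0x9601F5EBC2
report_delta = ACTUAL_A - EXPECTED_A
print(hex(report_delta))
```

Under these assumptions, the difference between the two first words equals the sum of the 32-bit words that were appended, which is why a slightly larger first word hints at a block that differs only by a few trailing entries.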
Here is an excerpt from the space map:

action 1 syncpass 1 txg 22990083
offset a02bc49 type 1 run 6
offset a02bc6d type 1 run 6
offset a02f963 type 1 run 6
offset a02f96f type 1 run 24
offset a02f99f type 1 run 6
offset b9b0209 type 1 run 384
offset b9bdb89 type 1 run 384
offset b9bde89 type 1 run 2304
offset ba12309 type 1 run 1152
offset ba2e509 type 1 run 3456
offset ba2f409 type 1 run 2304
action 1 syncpass 2 txg 22995718
offset 43a1099 type 1 run 6
offset 43a10a5 type 1 run 6
offset 43a110b type 1 run 6
offset 43a1141 type 1 run 6
offset 6dbf64b type 1 run 6
offset 6ddcd69 type 1 run 12
action 1 syncpass 2 txg 22999492
offset c7aec8 type 1 run 3
offset 23e29e1 type 1 run 3
action 1 syncpass 1 txg 22999548
offset 105eecb type 1 run 6
offset 105eee3 type 1 run 768
offset 105f1e9 type 1 run 384
offset 105f36f type 1 run 384
offset 105f4f5 type 1 run 768
offset 105f7fb type 1 run 3846
action 1 syncpass 2 txg 22999548
offset 3a67427 type 1 run 3

So the L1-block (the blkptr-array) contains the checksum of an older version of the L0-block. Interestingly enough, zdb doesn't see a checksum error. Dumping the L1-block with zdb gives a block that is identical to the one in memory, except for that single checksum.

Our code base contains the persistent L2ARC patch, pulled from the Nexenta repo shortly before it was upstreamed. The L2ARC contains the L1-block, but not the L0-block. It looks like the old version of the L1-block somehow made it to disk.

To confirm that the problem is really related to persistent l2arc we removed and re-added the cache devices. With the next reboot, the problem was gone.

Another clue to the problem might be that the space map was modified twice during that txg: it has entries for both syncpass 1 and syncpass 2.
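As a rough aid for spotting such transactions, the sketch below scans the header lines of a space map dump and reports every txg that has entries from more than one sync pass. The helper name `multi_pass_txgs` is hypothetical, and the input format is assumed to match the excerpt quoted in this report; only the header lines from that excerpt are reproduced here.

```python
import re
from collections import defaultdict

# Matches header lines of the form used in the excerpt in this report.
HEADER = re.compile(r"action (\d+) syncpass (\d+) txg (\d+)")


def multi_pass_txgs(dump: str):
    """Return {txg: sorted sync passes} for txgs seen in more than one pass."""
    passes = defaultdict(set)
    for line in dump.splitlines():
        match = HEADER.match(line.strip())
        if match:
            syncpass = int(match.group(2))
            txg = int(match.group(3))
            passes[txg].add(syncpass)
    return {txg: sorted(p) for txg, p in passes.items() if len(p) > 1}


# Header lines copied from the space map excerpt in this report.
EXCERPT_HEADERS = """\
action 1 syncpass 1 txg 22990083
action 1 syncpass 2 txg 22995718
action 1 syncpass 2 txg 22999492
action 1 syncpass 1 txg 22999548
action 1 syncpass 2 txg 22999548
"""

print(multi_pass_txgs(EXCERPT_HEADERS))  # {22999548: [1, 2]}
```

For the excerpt, only txg 22999548 shows up in both sync passes, which is the txg of the last entries in the block.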

Related issues

Related to illumos gate - Feature #3525: Persistent L2ARC (Closed, Jason King, 2013-02-04)

