Bug #6729

incremental replication stream of a fs tree with lots of snapshots trips assert in zfs recv

Added by Lauri Tirkkonen about 5 years ago. Updated about 2 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: zfs - Zettabyte File System
Start date: 2016-03-07
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

The assertion in question is:

Assertion failed: ilen <= SPA_MAXBLOCKSIZE, file ../common/libzfs_sendrecv.c, line 1955, function recv_read

ilen comes from the dmu_replay_record_t's drr_payloadlen member:

> $c
libc.so.1`_lwp_kill+7(1, 6, 80467a8, fee91cdd, fef63000, 8046800)
libc.so.1`raise+0x22(6, 0, 80467e8, fee6edb9)
libc.so.1`abort+0xf3(8046800, 8046800, 6c, fed732cc, 65737341, 6f697472)
libc.so.1`_assert(fedabf16, fedabefa, 7a3, feda93ad)
libzfs.so.1`recv_read+0x3d(80a4548, 0, 80a8008, 1278128, 0, 8047380)
libzfs.so.1`recv_read_nvlist+0x4a(80a4548, 0, 1278128, 804721c, 0, 8047380)
libzfs.so.1`zfs_receive_package+0x100(80a4548, 0, 8047da0, 8047bdc, 80478d8, 8047380)
libzfs.so.1`zfs_receive_impl+0x527(80a4548, 8047da0, 0, 8047bdc, 0, 0)
libzfs.so.1`zfs_receive+0xc3(80a4548, 8047da0, 80a1f88, 8047bdc, 0, 0)
zfs_do_receive+0x3dd(3, 8047cac, 80768a0, 801, 0, 3)
main+0x22c(fee10180, fef6d6a8, 8047c9c, 80557f7, 4, 8047ca8)
_start+0x83(4, 8047d94, 8047d98, 8047d9d, 8047da0, 0)
> 80478d8::print dmu_replay_record_t
{
    drr_type = 0 (DRR_BEGIN)
    drr_payloadlen = 0x1278128
[ ... ]
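
As a quick check of the numbers in the mdb output above, the following sketch compares the reported drr_payloadlen with the limit in the assertion. It assumes SPA_MAXBLOCKSIZE is 1 << 24 (16 MiB), which is not stated in this ticket; the helper name exceeds_limit is made up for illustration.

```python
# Compare the payload length from the mdb output with SPA_MAXBLOCKSIZE.
# Assumption: SPA_MAXBLOCKSHIFT is 24, so SPA_MAXBLOCKSIZE is 16 MiB.
SPA_MAXBLOCKSHIFT = 24
SPA_MAXBLOCKSIZE = 1 << SPA_MAXBLOCKSHIFT

# drr_payloadlen exactly as printed by "::print dmu_replay_record_t".
drr_payloadlen = 0x1278128


def exceeds_limit(ilen, limit=SPA_MAXBLOCKSIZE):
    """Return True when a read of ilen bytes would fail ilen <= limit."""
    return ilen > limit


excess = drr_payloadlen - SPA_MAXBLOCKSIZE
print(f"payload: {drr_payloadlen} bytes ({drr_payloadlen / 2**20:.2f} MiB)")
print(f"limit:   {SPA_MAXBLOCKSIZE} bytes")
print(f"over by: {excess} bytes, assert trips: {exceeds_limit(drr_payloadlen)}")
```

Under that assumption the payload is about 18.5 MiB, roughly 2.5 MiB over the limit, which matches the failed assertion.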

zstreamdump reveals that all snapshot names of the source datasets are
present in the stream, which I suspect is what makes the record larger than
recv expects. Here's one way to repro: create a bunch of filesystems and
recursively snapshot them many times using long snapshot names:

# zfs create rpool/foo
# for i in $(seq 1 160); do zfs create rpool/foo/$i; done
# for i in $(seq 1 1000); do zfs snapshot -r rpool/foo@abcdefghijklmnopqrstuvwxyz$i; done

Then generate an incremental replication stream between any two snapshots just
created, and attempt to receive it (doesn't matter where, and receive -n is
fine):

# zfs send -Ri abcdefghijklmnopqrstuvwxyz999 rpool/foo@abcdefghijklmnopqrstuvwxyz1000 > /var/tmp/foo.stream
# zfs recv -n rpool/foo < /var/tmp/foo.stream
Assertion failed: ilen <= SPA_MAXBLOCKSIZE, file ../common/libzfs_sendrecv.c, line 1955, function recv_read
zsh: IOT instruction (core dumped)  zfs recv -n rpool/foo < /var/tmp/foo.stream
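
To get a feel for why the repro produces such a large header, here is a rough lower-bound estimate of the snapshot-name bytes alone. It assumes each snapshot name of each of the 161 datasets (rpool/foo plus its 160 children) appears once in the stream's nvlist; it does not model nvlist encoding overhead or any other per-snapshot data, so the real payload is expected to be larger.

```python
# Lower-bound estimate of snapshot-name bytes in the repro's stream header.
# Assumption: every snapshot name of every dataset appears once in the nvlist.
PREFIX = "abcdefghijklmnopqrstuvwxyz"
NUM_CHILDREN = 160     # rpool/foo/1 .. rpool/foo/160
NUM_SNAPSHOTS = 1000   # created with "zfs snapshot -r"


def name_bytes_per_dataset(prefix, count):
    """Total characters in the snapshot names prefix1 .. prefix<count>."""
    return sum(len(f"{prefix}{i}") for i in range(1, count + 1))


datasets = NUM_CHILDREN + 1  # the children plus rpool/foo itself
per_dataset = name_bytes_per_dataset(PREFIX, NUM_SNAPSHOTS)
total = datasets * per_dataset
print(f"{datasets} datasets x {per_dataset} name bytes = {total} bytes")
```

The names alone come to roughly 4.4 MiB, so per-entry overhead only has to multiply that a few times to pass a 16 MiB limit.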
#1

Updated by Chip Schweiss over 2 years ago

Anyone looking at this?

With Channel Programs in use, this bug is becoming more frequent.

Here's a core dump from my latest encounter:

ftp://ftp.nrg.wustl.edu/pub/zfs/core_zfs_recv.gz

#2

Updated by Jason King 5 months ago

It appears OpenZFS recently fixed this in commit 7a6c12fd6a756af5a2f664c0a6a292d22fbb2487:

Don't assert on nvlists larger than SPA_MAXBLOCKSIZE
Originally we asserted that all reads are less than SPA_MAXBLOCKSIZE
However, nvlists are not ZFS records, and are not limited to
SPA_MAXBLOCKSIZE.

Add a new environment variable, ZFS_SENDRECV_MAX_NVLIST, to allow the
user to specify the maximum size of the nvlist that can be sent or
received.
Default value: 4 * SPA_MAXBLOCKSIZE (64 MB)

Modify libzfs send routines to return a useful error if the send stream
will generate an nvlist that is beyond the maximum size.

Modify libzfs recv routines to add an explicit error message if the
nvlist is too large, rather than abort()ing.

Change the assert() to only trigger on data records

Reviewed-by: Paul Dagnelie <>
Reviewed-by: Kjeld Schouten <>
Reviewed-by: Brian Behlendorf <>
Reviewed-by: Matthew Ahrens <>
Signed-off-by: Allan Jude <>
Closes #9616

Looking at the change, it should be fairly straightforward to port this over.
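
The behavior described in that commit message can be modeled roughly as follows. This is only an illustrative sketch, not the libzfs code: the function names, the parsing rules for the variable, and the error text are all hypothetical, and SPA_MAXBLOCKSIZE is assumed to be 16 MiB. Only the variable name ZFS_SENDRECV_MAX_NVLIST and the 4 * SPA_MAXBLOCKSIZE default come from the commit message.

```python
import os

SPA_MAXBLOCKSIZE = 1 << 24                 # assumed 16 MiB
DEFAULT_MAX_NVLIST = 4 * SPA_MAXBLOCKSIZE  # 64 MB default per the commit message


def max_nvlist_size(environ=os.environ):
    """Return the nvlist limit, honoring ZFS_SENDRECV_MAX_NVLIST if set.

    The parsing here (decimal or 0x-prefixed integer) is a guess, not
    necessarily what libzfs accepts.
    """
    raw = environ.get("ZFS_SENDRECV_MAX_NVLIST")
    if raw is None:
        return DEFAULT_MAX_NVLIST
    value = int(raw, 0)  # raises ValueError on malformed input
    if value <= 0:
        raise ValueError(f"invalid ZFS_SENDRECV_MAX_NVLIST: {raw!r}")
    return value


def check_nvlist_len(ilen, environ=os.environ):
    """Return None if ilen is acceptable, else an error message.

    This mirrors the idea of reporting an error instead of abort()ing.
    """
    limit = max_nvlist_size(environ)
    if ilen > limit:
        return f"nvlist of {ilen} bytes exceeds limit of {limit} bytes"
    return None


# The payload from this ticket fits under the default limit ...
print(check_nvlist_len(0x1278128, {}))
# ... but is rejected with a message if the limit is lowered to 16 MiB.
print(check_nvlist_len(0x1278128, {"ZFS_SENDRECV_MAX_NVLIST": "16777216"}))
```

With the default limit, the 0x1278128-byte nvlist from the original report would be accepted rather than tripping the assertion.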

#3

Updated by Electric Monk 4 months ago

  • Gerrit CR set to 1106
#4

Updated by Jason King 4 months ago

A followup change should also be included:

commit 908d43d0a9f736af62c0f4b179950bb1262dfd7d
Author: Allan Jude <allanjude@freebsd.org>
Date:   Fri Sep 18 13:23:29 2020 -0400

    libzfs: Don't leak buf if nvlist is too large

    Resolves FreeBSD Coverity defect:
    CID 1432398:  Resource leaks  (RESOURCE_LEAK)

    libzfs: don't leak hdl if there is an error reading env var

    Resolves FreeBSD Coverity defect:
    CID 1432395:  Resource leaks  (RESOURCE_LEAK)

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Signed-off-by: Allan Jude <allanjude@freebsd.org>
    Closes #10882
#5

Updated by Jason King 4 months ago

I tested this by running the zfs test suite. The only failures were tests with known failures.

#6

Updated by Jason King about 2 months ago

Additionally, I was able to reproduce the assertion failure using the steps documented in this ticket. When booted on a BE with the change applied, the zfs recv command succeeds instead of aborting. First, on a BE without the change:

root@pi:~# zpool create testpool c3t0d0
root@pi:~# zfs create testpool/foo
root@pi:~# for f in $(seq 1 160); do zfs create testpool/foo/$f; done
root@pi:~# for f in $(seq 1 1000); do zfs snapshot -r testpool/foo@abcdefghijklmnopqrstuvwxyz${f}; done
root@pi:~# zfs send -Ri abcdefghijklmnopqrstuvwxyz999 testpool/foo@abcdefghijklmnopqrstuvwxyz1000 > /ws/foo.zfs
root@pi:~# zfs recv -n testpool/foo < /ws/foo.zfs
Assertion failed: ilen <= SPA_MAXBLOCKSIZE, file ../common/libzfs_sendrecv.c, line 2201, function recv_read
Abort (core dumped)

On a BE w/ the change:

root@pi:/ws# zfs recv -n testpool/foo < foo.zfs
root@pi:/ws# echo $?
0

#7

Updated by Electric Monk about 2 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit c5286370b84c690a18e8100a5237a1000d7e29c6

commit  c5286370b84c690a18e8100a5237a1000d7e29c6
Author: Allan Jude <allan@klarasystems.com>
Date:   2021-02-19T15:11:37.000Z

    6729 incremental replication stream of a fs tree with lots of snapshots trips assert in zfs recv
    Portions contributed by: Jason King <jason.king@joyent.com>
    Reviewed by: Paul Dagnelie <pcd@delphix.com>
    Reviewed by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
    Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed by: Matthew Ahrens <mahrens@delphix.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
