Bug #11856
closedchown can trigger vmdump when running recentish zfs
100%
Description
SmartOS 20191002T172656Z
I was using this script to reset some ACLs that got messed up, I can repeatedly trigger the dump by running just the chown command at the beginning of the script. I was doing this inside a zone.
## archive SHARE=/archive # set owner /bin/chown -Rf nobody:media_rw ${SHARE} /bin/chmod 0070 ${SHARE} /bin/chmod g+s ${SHARE} /bin/find ${SHARE}/ -type d -mindepth 1 -maxdepth 3 \ -not -path "${SHARE}/.\$EXTEND*" \ -not -path "${SHARE}/.zfs*" \ -exec /bin/chmod g+s {} \+ # cleanup existing acl /bin/chmod A- ${SHARE} /bin/chmod A3- ${SHARE} /bin/chmod A2- ${SHARE} /bin/chmod A1- ${SHARE} # set base acl /bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE} /bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE} /bin/chmod A+group@:r-----a-R-c--s:f------:allow ${SHARE} /bin/chmod A+group@:r-x---a-R-c--s:-d-----:allow ${SHARE} # set extended acl /bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE} /bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE} ## music for SHARE in /archive/movies /archive/music; do # cleanup existing acl /bin/chmod A- ${SHARE} /bin/chmod A3- ${SHARE} /bin/chmod A2- ${SHARE} /bin/chmod A1- ${SHARE} # set base acl /bin/chmod A0=everyone@:------a-R-c--s:fd-----:allow ${SHARE} /bin/chmod A+owner@:------a-R-c--s:fd-----:allow ${SHARE} /bin/chmod A+group@:rw-pdDaARWcCos:f------:allow ${SHARE} /bin/chmod A+group@:rwxpdDaARWcCos:-d-----:allow ${SHARE} # set extended acl /bin/chmod A+group:media_ro:r-----a-R-c--s:f------:allow ${SHARE} /bin/chmod A+group:media_ro:r-x---a-R-c--s:-d-----:allow ${SHARE} # propogate acl /bin/mkdir ${SHARE}/.zfs_acl_dir /bin/find ${SHARE}/ -type d -mindepth 1 \ -not -path "${SHARE}/.\$EXTEND*" \ -not -path "${SHARE}/.zfs*" \ -exec cpacl ${SHARE}/.zfs_acl_dir {} \+ /bin/rmdir ${SHARE}/.zfs_acl_dir /bin/touch ${SHARE}/.zfs_acl_file /bin/find ${SHARE}/ -type f -mindepth 1 \ -not -path "${SHARE}/.\$EXTEND*" \ -not -path "${SHARE}/.zfs*" \ -exec cpacl ${SHARE}/.zfs_acl_file {} \+ /bin/rm ${SHARE}/.zfs_acl_file done
The following is sufficient to trigger the dump:
/bin/chown -Rf nobody:media_rw /archive
It looks to be a null deref
savecore: 2019-10-20T20:29:25.766090+00:00 carbon savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe00f9157ec0 addr=1 occurred in module "<unknown>" due to a NULL pointer dereference System dump time: Sun Oct 20 20:19:06 2019
And the full stack I extracted, uploading the whole vmdump too but that will take about a full day. And might not be needed, I'm bad with mdb but I have the stack
[root@carbon /var/crash/volatile]# mdb -k unix.1 vmcore.1 Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas sata random cpc logindmux ptm sppp nfs ] > $c 1() sa_build_layouts+0x23d(fffffeb3ac722790, fffffeb31570b0c0, e, fffffeb3ac6e62c0) sa_modify_attrs+0x2e8(fffffeb3ac722790, 1, 3, 1, 8, 47) sa_attr_op+0xf3(fffffeb3ac722790, fffffe00f9158440, 8, 1, fffffeb3ac6e62c0) sa_bulk_update_impl+0x6d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0) sa_bulk_update+0x4d(fffffeb3ac722790, fffffe00f9158440, 8, fffffeb3ac6e62c0) zfs_setattr_dir+0x24f(fffffeb3ac71cb28) zfs_setattr+0x1ad6(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0) fop_setattr+0x91(fffffeb3a0e5b380, fffffe00f9158d90, 0, fffffeb3508420c8, 0) lo_setattr+0x1b(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0) fop_setattr+0x91(fffffeb3ac71ad40, fffffe00f9158d90, 0, fffffeb3508420c8, 0) fsetattrat+0x147(ffd19553, fed14042, 0, fffffe00f9158d90) fchownat+0xc8(ffd19553, fed14042, ea61, 8a3, 0) chown+0x1f(fed14042, ea61, 8a3) _sys_sysenter_post_swapgs+0x159()
Related issues
Updated by Jorge Schrauwen over 2 years ago
[root@carbon ~]# zpool get all archive NAME PROPERTY VALUE SOURCE archive size 43.5T - archive capacity 43% - archive altroot - default archive health ONLINE - archive guid 15638511114199084855 default archive version - default archive bootfs - default archive delegation on default archive autoreplace off default archive cachefile - default archive failmode wait default archive listsnapshots off default archive autoexpand on local archive dedupditto 0 default archive dedupratio 1.00x - archive free 24.6T - archive allocated 18.9T - archive readonly off - archive comment archival storage local archive expandsize - - archive freeing 0 default archive fragmentation 0% - archive leaked 0 default archive bootsize - default archive checkpoint - - archive multihost off default archive ashift 0 default archive autotrim off default archive feature@async_destroy enabled local archive feature@empty_bpobj active local archive feature@lz4_compress active local archive feature@multi_vdev_crash_dump enabled local archive feature@spacemap_histogram active local archive feature@enabled_txg active local archive feature@hole_birth active local archive feature@extensible_dataset active local archive feature@embedded_data active local archive feature@bookmarks enabled local archive feature@filesystem_limits enabled local archive feature@large_blocks active local archive feature@large_dnode enabled local archive feature@sha512 enabled local archive feature@skein enabled local archive feature@edonr enabled local archive feature@device_removal enabled local archive feature@obsolete_counts enabled local archive feature@zpool_checkpoint enabled local archive feature@spacemap_v2 active local archive feature@allocation_classes enabled local archive feature@resilver_defer disabled local archive feature@encryption disabled local archive feature@bookmark_v2 disabled local archive feature@userobj_accounting disabled local archive feature@project_quota disabled local archive feature@log_spacemap disabled local
The output from zpool get all to see all the feature flags.
Updated by Jorge Schrauwen over 2 years ago
Dump up on the usual spot, called vmdump.ISSUENUM.gz
Updated by Gergő Mihály Doma over 2 years ago
- Related to Bug #11479: zfs project support added
Updated by Gergő Mihály Doma over 2 years ago
It looks to be a null deref
No, it's a "Dereferencing Past-the-End Pointer" type of error.
The problem is in the zfs_setattr_dir() function:bulk[]
array can hold only 4 elements, but count
variable (which is later used as an index of bulk[]
) can be incremented beyond this limit (see the SA_ADD_BULK_ATTR() preprocessor macro) with every iteration of the while
loop (see line 2818), because count
never set back to 0.
- The less clever one - reset
count
to zero at the end of while loop:... 2885 next: 2886 if (zp) { 2887 VN_RELE(ZTOV(zp)); 2888 zp = NULL; 2889 zfs_dirent_unlock(dl); 2890 } 2891 zap_cursor_advance(&zc); 2892 count = 0; 2893 } ...
- Better solution - move the
count
variable definition into the scope of thewhile
loop:... 2811 znode_t *zp = NULL; 2812 dmu_tx_t *tx = NULL; 2813 sa_bulk_attr_t bulk[4]; 2814 int err; 2815 2816 zap_cursor_init(&zc, os, dzp->z_id); 2817 while ((err = zap_cursor_retrieve(&zc, &zap)) == 0) { 2818 int count = 0; 2819 2820 if (zap.za_integer_length != 8 || zap.za_num_integers != 1) { 2821 err = ENXIO; 2822 break; 2823 } ...
Of course, this is just a quick hack, and it would be good to improve the zfs tests for this case, use ASSERT() or VERIFY() to check boundaries, etc.
Updated by Gergő Mihály Doma over 2 years ago
Same issue in ZoL: #8601 - Avoid stack overwrite in zfs_setattr_dir() .
Updated by Jerry Jelinek over 2 years ago
- Assignee set to Jerry Jelinek
I see the ZoL fix has been merged. It's ZoL commit 8cb34421e. I'll work on getting it ported over.
Updated by Jerry Jelinek over 2 years ago
To test this, I did a zfs test run (which won't hit the bug, but ensures no regressions) and I manually did a recursive chown on a large/deep directory tree (a clone of illumos-gate) which ran fine without any panic.
Updated by Jerry Jelinek over 2 years ago
- Subject changed from chown can trigger vmdump when running recentish zfs to chown can trigger vmdump when running recentish zfs
Updated by Electric Monk over 2 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 5a120e272991505eb171d0469f79d937cced483a
commit 5a120e272991505eb171d0469f79d937cced483a Author: Tim Chase <tim@chase2k.com> Date: 2019-10-25T21:43:34.000Z 11856 chown can trigger vmdump when running recentish zfs Portions contributed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed by: Chris Dunlop <chris@onthe.net.au> Reviewed by: Tom Caputi <tcaputi@datto.com> Reviewed by: Andy Stormont <AStormont@racktopsystems.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> Reviewed by: Matthias Scheler <matthias.scheler@wdc.com> Reviewed by: Gergo Doma <domag02@gmail.com> Approved by: Dan McDonald <danmcd@joyent.com>
Updated by Jorge Schrauwen over 2 years ago
Did this make it into illumos-joyent yet?
I check the commit and it was, but when I build a PI from smartos-live/illumos-joyent from about 4 hours ago, I still trigger this crash.
[root@atlas ~]# bash /data/docs/acl/archive.sh panic[cpu7]/thread=fffffeb1f8bd47c0: BAD TRAP: type=e (#pf Page fault) rp=fffffe00f8325ec0 addr=1 occurred in module "<unknown>" due to a NULL pointer dereference chown: #pf Page fault Bad kernel fault at addr=0x1 pid=16756, pc=0x1, sp=0xfffffe00f8325fb8, eflags=0x10202 cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 1626f8<smep,osxsav,pcide,vmxe,xmme,fxsr,pge,mce,pae,pse,de> cr2: 1 cr3: 105efd7000 cr8: 0 rdi: fffffe00f8325fd0 rsi: fffffe00f8325fcc rdx: 10 rcx: 1 r8: 8 r9: fffffeb3565b6650 rax: 0 rbx: fffffeb3565b6650 rbp: fffffe00f8326030 r10: 0 r11: 10 r12: 0 r13: 10 r14: 1 r15: 8 fsb: 0 gsb: fffffeb1cdd09000 ds: 4b es: 4b fs: 0 gs: 1c3 trp: e err: 10 rip: 1 cs: 30 rfl: 10202 rsp: fffffe00f8325fb8 ss: 38 fffffe00f8325dc0 unix:die+c6 () fffffe00f8325eb0 unix:trap+11fd () fffffe00f8325ec0 unix:cmntrap+e6 () fffffe00f8326030 1 () fffffe00f8326160 zfs:sa_build_layouts+23d () fffffe00f8326260 zfs:sa_modify_attrs+2e8 () fffffe00f83262f0 zfs:sa_attr_op+f3 () fffffe00f8326360 zfs:sa_bulk_update_impl+6d () fffffe00f83263c0 zfs:sa_bulk_update+4d () fffffe00f8326640 zfs:zfs_setattr_dir+24f () fffffe00f8326bd0 zfs:zfs_setattr+1ad6 () fffffe00f8326c40 genunix:fop_setattr+91 () fffffe00f8326c80 lofs:lo_setattr+1b () fffffe00f8326cf0 genunix:fop_setattr+91 () fffffe00f8326d70 genunix:fsetattrat+147 () fffffe00f8326e80 genunix:fchownat+c8 () fffffe00f8326eb0 genunix:chown+1f () fffffe00f8326f10 unix:brand_sys_sysenter+1df () dumping to /dev/zvol/dsk/zones/dump, offset 65536, content: kernel
Updated by Jorge Schrauwen over 2 years ago
Doesn't look like the change is present yet in illumos-joyent, i'm a bit confused becasue it does look like they merged from illumos-gate since the commit went in.
Updated by Jorge Schrauwen over 2 years ago
After todays merge from upstream, the dump now no longer happens <3