Bug #14131 (open)

zfs crypto calls kmem_alloc while holding ARC locks needed by pageout

Added by Alex Wilson over 1 year ago. Updated 4 months ago.

Status: New
Priority: Normal
Assignee: -
Difficulty: Medium

Description

Had a machine with encrypted ZFS datasets hit the pageout deadman:

pageout_deadman: stuck pushing the same page for 90 seconds (freemem is 196557)

The thread running pageout is stuck under zil_commit_waiter, and the txg_sync_thread is stuck waiting for an ARC buf hash lock:

> fffffbe361bd3c20::findstack -v
stack pointer for thread fffffbe361bd3c20 (pageout/1): fffffbe361bd3570
[ fffffbe361bd3570 _resume_from_idle+0x12b() ]
  fffffbe361bd35a0 swtch+0x133()
  fffffbe361bd35e0 cv_wait+0x68(fffffdf790ea6f48, fffffdf790ea6f50)
  fffffbe361bd3620 zil_commit_waiter+0x86(fffffdf4bed943c0, fffffdf790ea6f48)
  fffffbe361bd3650 zil_commit_impl+0x3c(fffffdf4bed943c0, 1)
  fffffbe361bd3690 zil_commit+0x3e(fffffdf4bed943c0, 1)
  fffffbe361bd3720 zvol_strategy+0x381(fffffe3a1f0a8e80)
  fffffbe361bd3750 bdev_strategy+0x64(fffffe3a1f0a8e80)
  fffffbe361bd37b0 spec_startio+0x8e(fffffdf4c2b22680, fffffbe228234f60, 1032075000, e000, 8500)
  fffffbe361bd3810 spec_pageio+0x3a(fffffdf4c2b22680, fffffbe228234f60, 1032075000, e000, 8500, fffffdf4496e7db0, 0)
  fffffbe361bd38a0 fop_pageio+0x5e(fffffdf4c2b22680, fffffbe228234f60, 1032075000, e000, 8500, fffffdf4496e7db0, 0)
  fffffbe361bd3980 swap_putapage+0x1f4(fffffdf4c231ee80, fffffbe228234f60, fffffbe361bd39b8, fffffbe361bd39c0, 8400, fffffdf4496e7db0)
  fffffbe361bd3a30 swap_putpage+0x2d1(fffffdf4c231ee80, 1fffffc457ff8000, 1000, 8400, fffffdf4496e7db0, 0)
  fffffbe361bd3ab0 fop_putpage+0x56(fffffdf4c231ee80, 1fffffc457ff8000, 1000, 8400, fffffdf4496e7db0, 0)
  fffffbe361bd3b10 pageout+0x27d()
  fffffbe361bd3b20 thread_start+0xb()

> fffffbe36252fc20::findstack -v
stack pointer for thread fffffbe36252fc20 (txg_sync_thread()): fffffbe36252f170
[ fffffbe36252f170 _resume_from_idle+0x12b() ]
  fffffbe36252f1a0 swtch+0x133()
  fffffbe36252f240 turnstile_block+0x25b(fffffdf794015520, 0, fffffffffbd8d670, fffffffffbc1b600, 0, 0)
  fffffbe36252f2b0 mutex_vector_enter+0x358(fffffffffbd8d670)
  fffffbe36252f350 arc_release+0x6b(fffffe22e4b5ea10, fffffe2040471c68)
  fffffbe36252f370 dbuf_release_bp+0x15(fffffe2040471c68)
  fffffbe36252f4c0 dbuf_write+0xab(fffffe257ac20480, fffffe22e4b5ea10, fffffeffc8742780)
  fffffbe36252f5a0 dbuf_sync_leaf+0x23b(fffffe257ac20480, fffffeffc8742780)
  fffffbe36252f5f0 dbuf_sync_list+0xb5(fffffe2566f70108, 0, fffffeffc8742780)
  fffffbe36252f640 dbuf_sync_indirect+0xb7(fffffe2566f70040, fffffeffc8742780)
  fffffbe36252f690 dbuf_sync_list+0x90(fffffeffadc3fe48, 1, fffffeffc8742780)
  fffffbe36252f6e0 dbuf_sync_indirect+0xb7(fffffeffadc3fd80, fffffeffc8742780)
  fffffbe36252f730 dbuf_sync_list+0x90(ffffff4959063408, 2, fffffeffc8742780)
  fffffbe36252f780 dbuf_sync_indirect+0xb7(ffffff4959063340, fffffeffc8742780)
  fffffbe36252f7d0 dbuf_sync_list+0x90(fffffdf4ec6126c0, 3, fffffeffc8742780)
  fffffbe36252f850 dnode_sync+0x327(fffffdf4ec612558, fffffeffc8742780)
  fffffbe36252f990 dmu_objset_sync+0x16f(fffffdf4bea82080, fffffdf7b669f110, fffffeffc8742780)
  fffffbe36252f9d0 dsl_pool_sync_mos+0x42(fffffdf4bedc8280, fffffeffc8742780)
  fffffbe36252fa80 dsl_pool_sync+0x3cb(fffffdf4bedc8280, 31c8373)
  fffffbe36252fb00 spa_sync_iterate_to_convergence+0xd0(fffffdf4bd143000, fffffe2c30954a40)
  fffffbe36252fb60 spa_sync+0x2f6(fffffdf4bd143000, 31c8373)
  fffffbe36252fc00 txg_sync_thread+0x1fd(fffffdf4bedc8280)
  fffffbe36252fc10 thread_start+0xb()

Another ZIO worker is stuck waiting on this same lock (fffffffffbd8d670):

> fffffbe360e54c20::findstack -v
stack pointer for thread fffffbe360e54c20 (zpool-rpool/123): fffffbe360e547b0
[ fffffbe360e547b0 _resume_from_idle+0x12b() ]
  fffffbe360e547e0 swtch+0x133()
  fffffbe360e54880 turnstile_block+0x25b(fffffdf794015520, 0, fffffffffbd8d670, fffffffffbc1b600, 0, 0)
  fffffbe360e548f0 mutex_vector_enter+0x358(fffffffffbd8d670)
  fffffbe360e54930 buf_hash_insert+0x4b(fffffe00dd16d3c8, fffffbe360e54950)
  fffffbe360e54990 arc_write_done+0xec(fffffea504aa8f20)
  fffffbe360e54a30 zio_done+0x402(fffffea504aa8f20)
  fffffbe360e54a60 zio_execute+0xa7(fffffea504aa8f20)
  fffffbe360e54b10 taskq_thread+0x2cd(fffffdf4bd2ce4d8)
  fffffbe360e54b20 thread_start+0xb()

Who's holding it?

> fffffffffbd8d670::mutex
            ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
fffffffffbd8d670 adapt fffffbe364cc4c20      -      -     yes

> fffffbe364cc4c20::findstack -v
stack pointer for thread fffffbe364cc4c20 (receive_writer_thread()): fffffbe364cc3750
[ fffffbe364cc3750 _resume_from_idle+0x12b() ]
  fffffbe364cc3780 swtch+0x133()
  fffffbe364cc37c0 cv_wait+0x68(fffffffffbc4161c, fffffffffbc41610)
  fffffbe364cc3810 page_create_throttle+0x17c(1, 3)
  fffffbe364cc38f0 page_create_va+0x598(fffffffffbd08920, ffffffced511d000, 1000, 13, fffffbe364cc3900, ffffffced511d000)
  fffffbe364cc3980 segkmem_page_create+0x97(ffffffced511d000, 1000, 0, fffffffffbd08920)
  fffffbe364cc3a20 segkmem_xalloc+0x13f(fffffdf400e1f000, 0, 1000, 0, 0, fffffffffb8a84e0, fffffffffbd08920)
  fffffbe364cc3a90 segkmem_alloc_vn+0x3b(fffffdf400e1f000, 1000, 0, fffffffffbd08920)
  fffffbe364cc3ac0 segkmem_alloc+0x17(fffffdf400e1f000, 1000, 0)
  fffffbe364cc3bd0 vmem_xalloc+0x629(fffffdf400e20000, 1000, 1000, 0, 0, 0, 0, 0)
  fffffbe364cc3c40 vmem_alloc+0x190(fffffdf400e20000, 1000, 0)
  fffffbe364cc3cd0 kmem_slab_create+0x7c(fffffdf400f0a008, 0)
  fffffbe364cc3d30 kmem_slab_alloc+0x10b(fffffdf400f0a008, 0)
  fffffbe364cc3d90 kmem_cache_alloc+0x15b(fffffdf400f0a008, 0)
  fffffbe364cc3dd0 kmem_alloc+0x4b(dc0, 0)
  fffffbe364cc3e40 ccm_init_ctx+0x131(fffffbe364cc3ef0, fffffbe364cc4240, 0, 0, 10, fffffffff84fad30, fffffffff84fb050)
  fffffbe364cc3ed0 aes_common_init_ctx+0x2a0(fffffbe364cc3ef0, fffffe045f86fe00, fffffbe364cc4080, fffffe094ef54e88, 0, 0)
  fffffbe364cc4020 aes_decrypt_atomic+0x104(0, 0, fffffbe364cc4080, fffffe094ef54e88, fffffbe364cc4210, fffffbe364cc41e0, fffffe045f86fe00, 0)
  fffffbe364cc4190 crypto_decrypt+0x29b(fffffbe364cc41c8, fffffbe364cc4210, fffffe094ef54e88, fffffdf83e2f5b88, fffffbe364cc41e0, 0)
  fffffbe364cc4340 zio_do_crypt_uio+0x1ff(0, 5, fffffe094ef54e88, fffffdf83e2f5b88, fffffdf4f9447258, dc0, fffffbe364cc43f8, fffffbe364cc4428, fffffdf6e58d9000, fffffbe300002e20)
  fffffbe364cc44e0 zio_do_crypt_data+0x200(0, fffffe094ef54de0, a, 0, fffffdf4f9447250, fffffdf4f9447258, fffffdf4f9447264, ffffffff00004000, fffffdf59371a000, fffffdf5943f2000, fffffbe364cc4644)
  fffffbe364cc45e0 spa_do_crypt_abd+0x250(0, fffffdf4bd143000, fffffbe364cc4808, a, 0, 0, fffffdf4f9447250, fffffdf4f9447258, fffffdf4f9447264, fffffe0100004000, fffffe294e969b00, fffffe0ce4be7d40, fffffbe364cc4644)
  fffffbe364cc46a0 arc_hdr_decrypt+0xc6(fffffdf4f9447188, fffffdf4bd143000, fffffbe364cc4808)
  fffffbe364cc4710 arc_fill_hdr_crypt+0xb4(fffffdf4f9447188, fffffffffbd8d670, fffffdf4bd143000, fffffbe364cc4808, 0)
  fffffbe364cc4790 arc_buf_fill+0x1f5(fffffe05bf202cd8, fffffdf4bd143000, fffffbe364cc4808, 10)
  fffffbe364cc47e0 arc_untransform+0x36(fffffe05bf202cd8, fffffdf4bd143000, fffffbe364cc4808, 1)
  fffffbe364cc4870 dbuf_read_verify_dnode_crypt+0x105(fffffdf6bc403c48, 9)
  fffffbe364cc4930 dbuf_read_impl+0x41d(fffffdf6bc403c48, 0, 9)
  fffffbe364cc49d0 dbuf_read+0xc9(fffffdf6bc403c48, 0, 9)
  fffffbe364cc4a40 dmu_bonus_hold_by_dnode+0x8d(ffffffced84692b8, fffffffff7a6ad50, fffffbe364cc4a98, 1)
  fffffbe364cc4b80 receive_object+0x590(fffffbe3661925c0, ffffff497f9f2388, ffffff1be9a7ed00)
  fffffbe364cc4bc0 receive_process_record+0x86(fffffbe3661925c0, ffffff497f9f2380)
  fffffbe364cc4c00 receive_writer_thread+0x7d(fffffbe3661925c0)
  fffffbe364cc4c10 thread_start+0xb()

It seems that arc_fill_hdr_crypt takes an ARC hash_lock and then calls into code that will eventually do a kmem_alloc. The thread isn't marked T_PUSHPAGE, so in low-memory conditions that allocation can end up waiting for pageout to free pages, while pageout in turn needs this same hash lock: a deadlock.
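
To make the cycle concrete, here's a condensed C sketch of the pattern, assembled from the stacks above. This is not the actual illumos source and the real signatures differ; it only illustrates the ordering:

    /*
     * Sketch only: the ARC buf hash lock is held across a call chain
     * that bottoms out in a sleeping kernel allocation.
     */
    static void
    arc_fill_hdr_crypt_sketch(arc_buf_hdr_t *hdr, kmutex_t *hash_lock,
        spa_t *spa, const zbookmark_phys_t *zb)
    {
            mutex_enter(hash_lock);         /* ARC buf hash lock */

            /*
             * arc_hdr_decrypt() -> spa_do_crypt_abd() ->
             * zio_do_crypt_data() -> crypto_decrypt() -> ccm_init_ctx(),
             * which does roughly:
             *
             *         pt_buf = kmem_alloc(data_len, KM_SLEEP);
             *
             * KM_SLEEP without T_PUSHPAGE means this thread can park in
             * page_create_throttle() until pageout frees pages, but
             * pageout is itself blocked behind this same hash lock.
             */
            arc_hdr_decrypt(hdr, spa, zb);

            mutex_exit(hash_lock);
    }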

Either this logic needs to mark its thread T_PUSHPAGE, or it should allocate all of this state up front, before it takes the lock. If we go with the former approach, should we be marking the thread everywhere the ARC takes a hash_lock?
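
A minimal sketch of the first approach, assuming the page allocator honors T_PUSHPAGE on this path (which is the flag's stated purpose); the placement and surrounding names are illustrative only:

    /*
     * Hypothetical: flag the thread T_PUSHPAGE around the locked
     * decrypt so KM_SLEEP allocations on this path can dip into the
     * page reserve rather than wait on pageout. Only clear the flag
     * if we set it, in case the caller already runs with it.
     */
    boolean_t set_pushpage = !(curthread->t_flag & T_PUSHPAGE);

    if (set_pushpage)
            curthread->t_flag |= T_PUSHPAGE;

    mutex_enter(hash_lock);
    arc_hdr_decrypt(hdr, spa, zb);  /* may kmem_alloc(..., KM_SLEEP) */
    mutex_exit(hash_lock);

    if (set_pushpage)
            curthread->t_flag &= ~T_PUSHPAGE;

The latter approach would instead hoist the allocation that ccm_init_ctx() performs (and anything else the decrypt path allocates) in front of the mutex_enter().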

#1

Updated by Jason King 5 months ago

Part of this is that, IIUC, one of the relevant specs calls for AEAD mechanisms to produce no output until all of the input has been validated. I'm not sure what the rationale for that behavior was, other than ensuring users can only use plaintext once it has been validated, by never letting them 'see' it before then.

As a result, what's happening here is that the code allocates memory to hold the entire plaintext before returning any of it to the caller.

If we can relax that behavior, that should let us avoid the offending allocation. It would be nice regardless: currently, encrypting or decrypting a 1 GB chunk of data with aes-{ccm,gcm} means allocating another 1 GB of kernel memory for the output. Given that this is crypto related, we obviously need to be very sure of the implications before doing so, but it would be good to get a conclusion on the record either way.
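
For reference, a sketch of the buffering in question, modeled on the illumos CCM decrypt init (simplified, not verbatim):

    /*
     * Modeled on ccm_init_ctx() (simplified): on decrypt, the context
     * allocates a buffer for the entire plaintext so that none of it
     * is released to the caller until the MAC has been verified. This
     * is the kmem_alloc() at the top of the stuck stack above, and
     * kmflag is KM_SLEEP on this path.
     */
    if (!encrypt) {
            ccm_ctx->ccm_pt_buf = kmem_alloc(ccm_ctx->ccm_data_len, kmflag);
            if (ccm_ctx->ccm_pt_buf == NULL)
                    return (CRYPTO_HOST_MEMORY);
    }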

#2

Updated by Bill Sommerfeld 4 months ago

Jason King wrote:
> I'm not sure what the rationale for that behavior was, other than ensuring users can only use plaintext once it has been validated, by never letting them 'see' it before then.

It's also conservative crypto design for other reasons: sloppiness about revealing information derived from a failed validation has enabled attacks on the underlying key management, algorithms, and/or modes. For instance, revealing plaintext from a botched decryption can enable adaptive chosen-ciphertext attacks (see https://en.wikipedia.org/wiki/Adaptive_chosen-ciphertext_attack); padding oracle attacks (https://en.wikipedia.org/wiki/Padding_oracle_attack) are another example of the same failure.
