Bug #8493
closedkmem_move taskq appears to be inducing significant system latency
100%
Description
Note this is a change that logically combines Joyent's OS-4162 and OS-4949. The analysis from both tickets is included here.
One of the Manta storage systems in us-east got into a state today where the console hung after typing the username, ssh to the box was hanging, and sdc-oneachnode timed out trying to find the node. However, the system wasn't totally wedged: the marlin agent was making good forward progress (accepting tasks in as little as 80ms). I also had a shell on the system from which I was able to run several commands, including mpstat, which showed the box was largely idle:
[root@3d1e2fc2 (storage) ~]$ mpstat 1 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 7899 0 1628 3466 138 1048 17 132 619 0 5466 8 22 0 70 1 3786 0 1441 1246 44 941 13 96 610 0 3867 5 14 0 81 2 2318 0 910 908 25 653 9 70 525 0 2547 4 10 0 86 3 1106 0 739 886 42 800 9 55 512 0 2576 4 7 0 88 4 995 0 557 763 101 543 8 41 448 0 1874 3 6 0 91 5 836 0 610 841 101 760 8 41 488 0 2409 4 5 0 90 6 942 0 505 679 88 485 7 33 429 0 1722 3 5 0 92 7 731 0 557 815 106 721 8 36 466 0 2256 4 5 0 91 8 9648 0 1712 1295 88 835 12 71 505 0 6283 9 17 0 74 9 4117 0 1434 1174 103 692 9 59 554 0 3693 6 11 0 84 10 2465 0 861 843 83 480 6 51 486 0 2334 3 7 0 89 11 1009 0 537 2044 1202 734 10 69 652 0 2115 4 7 0 89 12 781 0 455 616 40 418 6 37 407 0 1438 3 4 0 93 13 664 0 472 875 195 648 7 39 452 0 1958 4 5 0 91 14 690 0 403 544 26 380 6 32 393 0 1285 2 4 0 93 15 571 0 469 1151 469 634 6 37 440 0 1824 4 5 0 91 16 717 0 416 486 24 372 7 31 391 0 1165 2 4 0 94 17 467 0 369 413 19 289 6 26 363 0 829 2 3 0 96 18 385 0 385 511 29 512 7 29 402 0 1351 2 3 0 94 19 507 0 323 414 16 368 7 24 361 0 988 2 3 0 95 20 463 0 430 588 47 605 7 29 407 0 1671 3 4 0 93 21 708 0 356 470 29 369 6 24 364 0 1208 2 3 0 95 22 601 0 477 615 35 637 7 29 409 0 1881 3 4 0 93 23 865 0 375 479 28 384 6 25 369 0 1326 2 3 0 94 24 547 0 346 487 20 375 5 34 400 0 1173 2 4 0 94 25 420 0 357 492 61 288 5 30 372 0 854 2 3 0 95 26 360 0 364 540 35 471 5 34 403 0 1241 3 4 0 94 27 426 0 292 441 16 307 5 30 358 0 891 2 3 0 95 28 401 0 365 609 35 583 6 34 416 0 1535 3 4 0 93 29 533 0 331 496 32 345 5 30 375 0 1041 2 4 0 94 30 496 0 480 639 36 614 6 34 422 0 1682 4 4 0 92 31 639 0 336 497 20 364 5 30 368 0 1158 2 4 0 94 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 0 2147 0 188 2421 164 338 0 58 88 0 938 1 11 0 88 1 2245 0 144 210 61 418 2 81 148 0 680 2 8 0 90 2 56 0 22 278 81 370 0 35 23 0 423 2 5 0 93 3 2346 0 20 175 24 259 2 58 90 0 829 5 9 0 86 4 1329 0 81 558 125 317 2 29 15 0 2014 1 5 0 93 5 535 0 34 393 60 251 2 47 110 0 361 1 8 0 92 6 302 0 184 670 50 195 0 27 36 0 740 1 9 0 89 7 947 0 128 194 50 200 3 50 195 1 452 2 14 0 84 8 10809 0 203 817 33 266 23 67 72 0 4475 31 15 0 53 9 4898 0 139 2471 88 263 29 57 230 0 4158 41 15 0 44 10 4048 0 149 299 46 147 16 38 139 0 2624 22 11 0 67 11 2291 0 96 1634 357 394 1 75 176 1 1789 4 11 0 85 12 707 0 32 160 62 244 0 41 31 0 448 2 3 0 95 13 47 0 0 483 197 187 1 38 130 1 300 4 6 0 90 14 46 0 2 137 43 273 0 27 19 0 543 2 1 0 97 15 35 0 11 228 109 207 0 33 311 0 254 2 7 0 90 16 0 0 31 2567 106 0 0 0 2007 0 0 0 100 0 0 17 544 0 1672 764 42 254 3 39 135 0 542 2 11 0 87 18 629 0 44 152 26 239 1 53 244 1 988 3 15 0 81 19 3438 0 122 107 22 145 0 31 44 1 1242 3 6 0 91 20 837 0 50 140 34 292 1 52 170 0 901 3 6 0 91 21 1504 0 43 237 19 187 0 18 20 0 1723 4 5 0 91 22 4205 0 100 144 38 336 3 46 39 0 1354 4 10 0 86 23 892 0 55 195 64 251 1 22 26 1 1293 1 3 0 96 24 25 0 0 120 45 141 0 26 34 0 132 1 2 0 97 25 22 0 2 100 23 102 0 22 86 0 77 0 5 0 94 26 0 0 0 73 20 70 0 16 63 0 19 0 2 0 98 27 4 0 6 100 23 150 0 28 82 0 59 0 5 0 95 28 2666 0 6755 130 24 100 7 21 38 0 897 6 11 0 84 29 28 0 1 108 33 146 0 28 129 0 131 0 3 0 96 30 5 0 0 466 30 112 1 25 96 0 45 0 4 0 96 31 9 0 1 99 18 90 0 20 80 0 40 0 5 0 95
and zpool iostat:
root@3d1e2fc2 (storage) ~]$ zpool iostat 1 capacity operations bandwidth pool alloc free read write read write ---------- ----- ----- ----- ----- ----- ----- zones 75.9T 14.1T 16 713 643K 6.23M zones 75.9T 14.1T 29 14 199K 83.7K zones 75.9T 14.1T 1.22K 12 30.5M 120K zones 75.9T 14.1T 1.28K 17 32.0M 144K ^C
but when I went to run iostat(1M), that command hung, and I couldn't get a shell prompt back with any of CTRL-C, CTRL-Z, or CTRL-/:
[root@3d1e2fc2 (storage) ~]$ iostat 1 ^Z ^C ^C^C^C^Z^Z^Z^Z^Z ^\ ^\
But jobs were hung because zone resets weren't completing. Most of the 112 outstanding resets appeared to be stuck on the "zoneadm halt" step. Eventually we had to send an NMI to restore service.
My stuck iostat turned out to be here:
> ::pgrep iostat | ::walk thread | ::stacks THREAD STATE SOBJ COUNT ffffff43a719e760 SLEEP CV 1 swtch+0x141 cv_wait+0x70 ndi_devi_enter+0x7f devi_config_one+0x3be ndi_devi_config_one+0xa8 dv_find+0x233 devfs_lookup+0x37 fop_lookup+0xa2 lookuppnvp+0x1f6 prof_lookup_globaldev+0x105 prof_make_names+0x99 prof_filldir+0xe8 sdev_readdir+0xd8 fop_readdir+0x6b getdents64+0xf5 _sys_sysenter_post_swapgs+0x153
listing /dev inside the zone:
> ffffff43a719e760::findstack -v stack pointer for thread ffffff43a719e760: ffffff01fbd3f4b0 [ ffffff01fbd3f4b0 _resume_from_idle+0xf4() ] ffffff01fbd3f4e0 swtch+0x141() ffffff01fbd3f520 cv_wait+0x70(ffffff42e23136ac, ffffff42e23135c0) ffffff01fbd3f560 ndi_devi_enter+0x7f(ffffff42e2313558, ffffff01fbd3f5f8) ffffff01fbd3f660 devi_config_one+0x3be(ffffff42e2313558, ffffff01fbd3f960, ffffff01fbd3f738, 4000, 0) ffffff01fbd3f6d0 ndi_devi_config_one+0xa8(ffffff42e2313558, ffffff01fbd3f960, ffffff01fbd3f738, 4000) ffffff01fbd3f7b0 dv_find+0x233(ffffff42eca24d18, ffffff01fbd3f960, ffffff01fbd3faa8 , ffffff01fbd3fbc0, ffffff42e2b61840, ffffff42e2691db0, ffffff0100000000) ffffff01fbd3f800 devfs_lookup+0x37(ffffff42ece1be80, ffffff01fbd3f960, ffffff01fbd3faa8, ffffff01fbd3fbc0, 0, ffffff42e2b61840, ffffff42e2691db0, 0, 0, 0 ) ffffff01fbd3f8b0 fop_lookup+0xa2(ffffff42ece1be80, ffffff01fbd3f960, ffffff01fbd3faa8, ffffff01fbd3fbc0, 0, ffffff42e2b61840, ffffff42e2691db0, 0, 0, 0 ) ffffff01fbd3fb10 lookuppnvp+0x1f6(ffffff01fbd3fbc0, 0, 1, 0, ffffff01fbd3fbf0, ffffff42e2b61840, ffffff42e2b61440, ffffff42e2691db0) ffffff01fbd3fc50 prof_lookup_globaldev+0x105(ffffff43c793f398, ffffff42eca23dd8, ffffff43a2274e58, ffffff43a2274e58) ffffff01fbd3fcb0 prof_make_names+0x99(ffffff43c793f398) ffffff01fbd3fcf0 prof_filldir+0xe8(ffffff43c793f398) ffffff01fbd3fd60 sdev_readdir+0xd8(ffffff43cb24d080, ffffff01fbd3fe10, ffffff47e3216120, ffffff01fbd3fe5c, 0, 0) ffffff01fbd3fdf0 fop_readdir+0x6b(ffffff43cb24d080, ffffff01fbd3fe10, ffffff47e3216120, ffffff01fbd3fe5c, 0, 0) ffffff01fbd3feb0 getdents64+0xf5(4, fedf4000, 2000) ffffff01fbd3ff10 _sys_sysenter_post_swapgs+0x153() > ffffff43cb24d080::print vnode_t v_path v_path = 0xffffff43aa0a1210 "/zones/3d1e2fc2-5b9b-4c03-9560-821845899d38/root/dev"
That's blocked on this thread:
> ffffff42e2313558::print struct dev_info devi_flags devi_flags = 0x3 > ffffff42e2313558::print struct dev_info devi_busy_thread devi_busy_thread = 0xffffff01eb66cc40 > 0xffffff01eb66cc40::findstack stack pointer for thread ffffff01eb66cc40: ffffff01eb66c7d0 [ ffffff01eb66c7d0 _resume_from_idle+0xf4() ] ffffff01eb66c800 swtch+0x141() ffffff01eb66c840 cv_wait+0x70() ffffff01eb66c880 taskq_wait+0x43() ffffff01eb66c8c0 kmem_cache_destroy+0x64() ffffff01eb66c8d0 aggr_port_fini+0x10() ffffff01eb66c8f0 aggr_detach+0x48() ffffff01eb66c960 devi_detach+0xa7() ffffff01eb66c9a0 detach_node+0x6c() ffffff01eb66ca10 i_ndi_unconfig_node+0xa2() ffffff01eb66ca40 i_ddi_detachchild+0x33() ffffff01eb66cab0 devi_detach_node+0x81() ffffff01eb66cb40 unconfig_immediate_children+0x18e() ffffff01eb66cbd0 devi_unconfig_common+0x107() ffffff01eb66cc20 mt_config_thread+0x167() ffffff01eb66cc30 thread_start+8()
which is blocked on this taskq:
> 0xffffff01eb66cc40::findstack -v stack pointer for thread ffffff01eb66cc40: ffffff01eb66c7d0 [ ffffff01eb66c7d0 _resume_from_idle+0xf4() ] ffffff01eb66c800 swtch+0x141() ffffff01eb66c840 cv_wait+0x70(ffffff42e2686e72, ffffff42e2686e60) ffffff01eb66c880 taskq_wait+0x43(ffffff42e2686e40) ffffff01eb66c8c0 kmem_cache_destroy+0x64(ffffff45c58cc008) ffffff01eb66c8d0 aggr_port_fini+0x10() ffffff01eb66c8f0 aggr_detach+0x48(ffffff42e2aeb560, 0) ffffff01eb66c960 devi_detach+0xa7(ffffff42e2aeb560, 0) ffffff01eb66c9a0 detach_node+0x6c(ffffff42e2aeb560, 2000) ffffff01eb66ca10 i_ndi_unconfig_node+0xa2(ffffff42e2aeb560, 4, 2000) ffffff01eb66ca40 i_ddi_detachchild+0x33(ffffff42e2aeb560, 2000) ffffff01eb66cab0 devi_detach_node+0x81(ffffff42e2aeb560, 2000) ffffff01eb66cb40 unconfig_immediate_children+0x18e(ffffff42e2313558, 0, 2000, ffffffff) ffffff01eb66cbd0 devi_unconfig_common+0x107(ffffff42e2313558, 0, 2000, ffffffff, 0 ) ffffff01eb66cc20 mt_config_thread+0x167(ffffff44642f99d8) ffffff01eb66cc30 thread_start+8() > ffffff42e2686e40::taskq -t ADDR NAME ACT/THDS Q'ED MAXQ INST ffffff42e2686e40 kmem_move_taskq 1/ 1 3816702 4858690 0 THREAD STATE SOBJ COUNT ffffff01e8011c40 ONPROC <NONE> 1 0xf5 kmem_hunt_mag+0x40 kmem_hunt_mags+0x95 kmem_move_buffer+0x1ae taskq_thread+0x2d0 thread_start+8
For reference, there are 107 "zoneadm halt" processes:
> ::pgrep -x zoneadm | ::print proc_t p_user.u_psargs ! grep -c 'halt"' 107
the other four are booting. They presumably correspond to these stacks:
> ::pgrep -x zoneadm | ::walk thread | ::stacks THREAD STATE SOBJ COUNT ffffff42ecf61000 SLEEP SHUTTLE 107 swtch_to+0xb6 shuttle_resume+0x2af door_call+0x336 doorfs32+0xa7 _sys_sysenter_post_swapgs+0x153 ffffff43bd130760 SLEEP CV 4 swtch+0x141 cv_wait+0x70 ndi_devi_enter+0x7f devi_config_one+0x3be ndi_devi_config_one+0xa8 dv_find+0x233 devfs_lookup+0x37 fop_lookup+0xa2 lookuppnvp+0x1f6 lookuppnatcred+0x15e lookupnameatcred+0xad lookupnameat+0x39 ldi_vp_from_name+0x12f ldi_open_by_name+0x6d ip_create_helper_stream+0x88 udp_ioctl+0xd2 so_ioctl+0x1a9 socket_ioctl+0x27 socket_vop_ioctl+0x29 fop_ioctl+0x55 ioctl+0x9b _sys_sysenter_post_swapgs+0x153
Let's take a bit of a deeper look at everything:
We have a taskq that things are getting backed up behind:
> ffffff42e2686e40::taskq ADDR NAME ACT/THDS Q'ED MAXQ INST ffffff42e2686e40 kmem_move_taskq 1/ 1 3816702 4858690 0
The reason that we're getting backed up behind them is because of
kmem_cache_destroy() doing a taskq_wait(). That taskq only has one
thread, let's find it:
> ffffff42e2686e40::print taskq_t tq_nthreads tq_nthreads = 0x1 > ffffff42e2686e40::print taskq_t tq_thr tq_thr = { tq_thr._tq_thread = 0xffffff01e8011c40 tq_thr._tq_threadlist = 0xffffff01e8011c40 } > 0xffffff01e8011c40::findstack -v stack pointer for thread ffffff01e8011c40: ffffff01e800d860 ffffff01e8011950 0xf5() ffffff01e8011aa0 kmem_hunt_mag+0x40(ffffff42ed500008, fffffff9fa14f300, 5f, ffffff481427c280, ffffff60974c32e0) ffffff01e8011b10 kmem_hunt_mags+0x95(ffffff42ed500008, ffffff481427c280) ffffff01e8011b60 kmem_move_buffer+0x1ae(ffffffe679ce3920) ffffff01e8011c20 taskq_thread+0x2d0(ffffff42e2686e40) ffffff01e8011c30 thread_start+8() > ::cpuinfo ! grep ffffff01e8011c40 28 ffffff430d546580 1f 0 0 60 no no t-3176 ffffff01e8011c40 sched > ffffff42ed500008::print kmem_cache_t cache_name cache_name = [ "dnode_t" ] > ffffff42ed500008::kmem_cache ADDR NAME FLAG CFLAG BUFSIZE BUFTOTL ffffff42ed500008 dnode_t 1200 000000 640 22732050
And wow, does that cache have a lot of buffers.
So, let's dig into a bit more and figure out what it means to hunt all the magazines for a given buffer. Let's take a look at what this looks like. Let's start with the number of full magazines:
> ffffff42ed500008::print kmem_cache_t cache_full.ml_list | ::list kmem_magazine_t mag_next ! cat > /files rmustacc@manta # wc -l files 76379 files > ffffff42ed500008::print kmem_cache_t cache_magtype->mt_magsize cache_magtype->mt_magsize = 0x5f > 0x5f * 0t76379=D 7256005
So, we have over 7 million entries here. We also need to look at all of the per-CPU magazines. However, they have at most 2 * CPUS magazines to worry about, so a drop in the bucket really.
What we don't know is how often we have to do this. This appears to happen each time that a move callback returns KMEM_CBRC_DONT_KNOW. There are a bunch of reasons that might cause that, but it's not clear exactly why that happens. We should be able to put together a D script that proves a bit of this theory. But if so, then we're going to have to better understand this.
Looking at a live system, we were able to gather a bunch of interesting data with DTrace:
[rmustacchi@MS10210 (us-east-1) ~]$ dtrace -n 'fbt::kmem_hunt_mag:entry{ self->t = timestamp; }' -n 'fbt::kmem_hunt_mag:return/self->t/{ @ = quantize(timestamp - self->t); self->t = 0; }' -n 'tick-30s { printa(@); exit(0); }' dtrace: description 'fbt::kmem_hunt_mag:entry' matched 1 probe dtrace: description 'fbt::kmem_hunt_mag:return' matched 1 probe dtrace: description 'tick-30s ' matched 1 probe ^[CPU ID FUNCTION:NAME 1 10924 :tick-30s value ------------- Distribution ------------- count 256 | 0 512 |@@@@@@@@@@@@@@@@@@@@@@ 7292794 1024 |@@@@@@@@@@@@@@@@@ 5691453 2048 |@ 269332 4096 | 1035 8192 | 1163 16384 | 2773 32768 | 80 65536 | 61 131072 | 9 262144 | 2 524288 | 0 1048576 | 1 2097152 | 0 [rmustacchi@MS10210 (us-east-1) ~]$ dtrace -n 'fbt::dnode_move:return,fbt::zfs_znode_move:return{ @[probefunc,arg1] = count(); }' -n 'tick-10s{ printa(@); exit(0); }' dtrace: description 'fbt::dnode_move:return,fbt::zfs_znode_move:return' matched 2 probes dtrace: description 'tick-10s' matched 1 probe CPU ID FUNCTION:NAME 7 78436 :tick-10s dnode_move 0 138 dnode_move 4 604 [rmustacchi@MS10210 (us-east-1) ~]$ dtrace -n 'fbt::kmem_hunt_mags:entry{ self->t = timestamp; }' -n 'fbt::kmem_hunt_mags:return/self->t/{ @ = quantize(timestamp - self->t); self->t = 0; }' -n 'tick-30s { printa(@); exit(0); }' dtrace: description 'fbt::kmem_hunt_mags:entry' matched 1 probe dtrace: description 'fbt::kmem_hunt_mags:return' matched 1 probe dtrace: description 'tick-30s ' matched 1 probe CPU ID FUNCTION:NAME 8 10924 :tick-30s value ------------- Distribution ------------- count 4096 | 0 8192 | 1 16384 | 0 32768 | 3 65536 | 0 131072 | 4 262144 | 17 524288 |@ 23 1048576 |@@ 55 2097152 |@@ 73 4194304 |@@@@ 139 8388608 |@@@@@@@@@ 308 16777216 |@@@@@@@@@@@@@ 446 33554432 |@@@@@@@@@ 318 67108864 | 0 $ dtrace -n 'fbt::kmem_move_buffer:entry{ self->t = timestamp; }' -n 'fbt::kmem_hunt_mags:return/self->t/{ @ = quantize(timestamp - self->t); self->t = 0; }' -n 'tick-30s { printa(@); exit(0); }' -n 'fbt::dnode_move:return,fbt::zfs_znode_move:return{ @counts[probefunc,arg1] = count(); }' dtrace: description 'fbt::kmem_move_buffer:entry' matched 1 probe dtrace: description 'fbt::kmem_hunt_mags:return' matched 1 probe dtrace: description 'tick-30s ' matched 1 probe dtrace: description 'fbt::dnode_move:return,fbt::zfs_znode_move:return' matched 2 probes CPU ID FUNCTION:NAME 8 10924 :tick-30s value ------------- Distribution ------------- count 32768 | 0 65536 | 2 131072 | 5 262144 | 12 524288 |@ 35 1048576 |@@ 57 2097152 |@@ 79 4194304 |@@@ 123 8388608 |@@@@@@@@@ 318 16777216 |@@@@@@@@@@@@@ 463 33554432 |@@@@@@@@@ 312 67108864 | 0 dnode_move 0 248 dnode_move 4 1407 [rmustacchi@MS10210 (us-east-1) ~]$ pfexec dtrace -n 'fbt::kmem_move_buffer:entry{ self->t = timestamp; }' -n 'fbt::kmem_hunt_mags:return/self->t/{ @ = quantize(timestamp - self->t); self->t = 0; }' -n 'tick-30s { printa(@); exit(0); }' -n 'fbt::dnode_move:return,fbt::zfs_znode_move:return{ @counts[probefunc,arg1] = count(); }' dtrace: description 'fbt::kmem_move_buffer:entry' matched 1 probe dtrace: description 'fbt::kmem_hunt_mags:return' matched 1 probe dtrace: description 'tick-30s ' matched 1 probe dtrace: description 'fbt::dnode_move:return,fbt::zfs_znode_move:return' matched 2 probes CPU ID FUNCTION:NAME 8 10924 :tick-30s value ------------- Distribution ------------- count 16384 | 0 32768 | 1 65536 | 5 131072 | 6 262144 |@ 20 524288 |@ 29 1048576 |@@ 61 2097152 |@@@ 96 4194304 |@@@ 121 8388608 |@@@@@@@@@@ 352 16777216 |@@@@@@@@@@@@ 444 33554432 |@@@@@@@@@ 310 67108864 | 0 dnode_move 0 221 dnode_move 4 1446 [rmustacchi@MS10210 (us-east-1) ~]$ dtrace -q -n 'fbt::kmem_move_buffer:return{ @["proc"] = count(); }' -n 'fbt::kmem_move_begin:entry{ self->t = 1; }' -n 'fbt::kmem_move_begin:return{ self->t = 0; }' -n 'fbt::taskq_dispatch:return/self->t && arg1 == 0/{ @["queue"] = count(); }' -n 'tick-1s{ printa(@); trunc(@); }' proc 58 proc 50 proc 70 proc 49 proc 56 proc 52 proc 52 proc 54 proc 56 proc 46 proc 65 proc 55 proc 60 proc 59 proc 48 proc 63 proc 59 proc 57 proc 46 proc 56 proc 55 proc 76 proc 72 proc 73 proc 78 proc 65 proc 54
Based on this data we can learn several things. First and foremost is that we seem to be making forward progress, albeit it at a slow rate. We see some latency outliers in terms of how long things are taking, though this may be accounted for by interrupts. Based on this, it seems that the real problem is that we're blocking up everything on work that is progress, but unfortunately scales with memory.
We've had a bunch of different issues with Manta in us-east that I increasingly think are falling out from latency induced by the kmem_move taskq, though the data is still very circumstantial. More investigation is needed.
In the past few weeks, we've seen a bunch of the following events:
- Manta jobs experiencing upload/download failures reported as either timeouts or "socket hang up". We know from experience that this can result from a couple of minutes of idleness on the client socket, which in turn can result from shrimps being unresponsive.
- Marlin jobsupervisors reporting agent timeouts. When we go examine the agents (usually a few minutes later), they seem to be running fine.
- Lackey timeouts resulting in zones becoming disabled.
In all of the cases for which we have data on taskq length and have checked it (which is probably a modest percentage), these events were strongly correlated with kmem_move taskq activity.
I believe this is having significant impact on our ability to complete daily metering, as these spurious agent timeouts trigger retries of multi-hour tasks, causing metering jobs to take a very long time.
We hit this massively again this morning. In the crash dump, we have 233 threads with the same stack, attempting to open a file, blocked on a lock. If we follow the lock holders, we wind up at a thread that's holding a lock (that ultimately hundreds of other threads are blocked on) that's doing a cv_wait() for ARC eviction:
[root@RM08218 (us-east-3) /var/crash/volatile]# mdb 13 mdb: warning: dump is from SunOS 5.11 joyent_20150729T153032Z; dcmds and macros may not match kernel implementation Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs mm sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ipc ] > ::status debugging crash dump vmcore.13 (64-bit) from RM08218 operating system: 5.11 joyent_20150729T153032Z (i86pc) image uuid: (not set) panic message: BAD TRAP: type=6 (#ud Invalid opcode) rp=ffffff01e80dda00 addr=f dump content: kernel pages only > ::stacks -m zfs ! head -80 THREAD STATE SOBJ COUNT ffffff43ed87c4c0 SLEEP MUTEX 233 swtch+0x141 turnstile_block+0x21a mutex_vector_enter+0x3a3 zfs_zget+0x51 zfs_root+0x78 fshead_root+0x9e fsop_root+0x2d traverse+0x87 lookuppnvp+0x22b lookuppnatcred+0x15e lookupnameatcred+0xad lookupnameat+0x39 vn_openat+0x315 copen+0x1fc openat64+0x2a open64+0x25 _sys_sysenter_post_swapgs+0x153 ffffff76d86a3520 SLEEP MUTEX 58 swtch+0x141 turnstile_block+0x21a mutex_vector_enter+0x3a3 zfs_zget+0x51 zfs_root+0x78 fsop_root+0x2d traverse+0x87 lookuppnvp+0x22b lookuppnatcred+0x15e lookupnameatcred+0xad lookupnameat+0x39 cstatat_getvp+0x107 cstatat+0x6f fstatat+0x42 stat+0x22 ffffff4442f797e0 SLEEP CV 35 swtch+0x141 cv_wait+0x70 txg_wait_open+0x83 dmu_tx_wait+0x16f dmu_tx_assign+0x55 zfs_write+0x4a6 fop_write+0x8b write+0x250 write32+0x1e _sys_sysenter_post_swapgs+0x153 ffffff44694fd4e0 SLEEP RWLOCK 8 swtch+0x141 turnstile_block+0x21a rw_enter_sleep+0x236 dmu_zfetch_find+0x56 dmu_zfetch+0xe0 dbuf_read+0x309 dnode_hold_impl+0xde dnode_hold+0x28 dmu_read+0x53 zfs_fillpage+0x106 zfs_getpage+0x227 fop_getpage+0x7e segvn_fault+0xdfa as_fault+0x37a pagefault+0x96 trap+0x30c ffffff45ad4534a0 SLEEP CV 7 swtch+0x141 cv_wait+0x70 txg_wait_synced+0x83 zil_create+0x1c5 zil_commit_writer+0x1a8 zil_commit+0x9e zfs_fsync+0xba fop_fsync+0x44 fifo_fsync+0x8d fifo_inactive+0x90 fop_inactive+0x76 vn_rele+0x82 > ffffff43ed87c4c0::findstack -v stack pointer for thread ffffff43ed87c4c0: ffffff01eb3493c0 [ ffffff01eb3493c0 _resume_from_idle+0xf4() ] ffffff01eb3493f0 swtch+0x141() ffffff01eb349490 turnstile_block+0x21a(ffffff43be94e198, 0, ffffff431e6ffca8, fffffffffbc07aa0, 0, 0) ffffff01eb349500 mutex_vector_enter+0x3a3(ffffff431e6ffca8) ffffff01eb3495a0 zfs_zget+0x51(ffffff431e6ff800, 4, ffffff01eb3495c8) ffffff01eb349600 zfs_root+0x78(ffffff433a1411d0, ffffff01eb3496b8) ffffff01eb349660 fshead_root+0x9e(ffffff433a1411d0, ffffff01eb3496b8) ffffff01eb3496a0 fsop_root+0x2d(ffffff433a1411d0, ffffff01eb3496b8) ffffff01eb3496f0 traverse+0x87(ffffff01eb3498d8) ffffff01eb349940 lookuppnvp+0x22b(ffffff01eb349a60, 0, 1, 0, ffffff01eb349c78, ffffff42e2b70840, ffffff42e2b70840, ffffff61b99444c0) ffffff01eb3499e0 lookuppnatcred+0x15e(ffffff01eb349a60, 0, 1, 0, ffffff01eb349c78, 0, ffffff61b99444c0) ffffff01eb349ae0 lookupnameatcred+0xad(8d89d88, 0, 1, 0, ffffff01eb349c78, 0, ffffff61b99444c0) ffffff01eb349b30 lookupnameat+0x39(8d89d88, 0, 1, 0, ffffff01eb349c78, 0) ffffff01eb349ce0 vn_openat+0x315(8d89d88, 0, 2001, 1b6, ffffff01eb349df0, 0, ffffff0100000012, 0, ffffff010000000c) ffffff01eb349e50 copen+0x1fc(ffd19553, 8d89d88, 2001, 1b6) ffffff01eb349e80 openat64+0x2a(ffd19553, 8d89d88, 0, 1b6) ffffff01eb349eb0 open64+0x25(8d89d88, 0, 1b6) ffffff01eb349f10 _sys_sysenter_post_swapgs+0x153() > ffffff431e6ffca8::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffff431e6ffca8 adapt ffffff449dcf6120 - - yes > ffffff449dcf6120::findstack -v stack pointer for thread ffffff449dcf6120: ffffff020172c0a0 [ ffffff020172c0a0 _resume_from_idle+0xf4() ] ffffff020172c0d0 swtch+0x141() ffffff020172c170 turnstile_block+0x21a(0, 0, ffffff7baacf3c30, fffffffffbc07aa0, 0 , 0) ffffff020172c1e0 mutex_vector_enter+0x3a3(ffffff7baacf3c30) ffffff020172c250 dmu_zfetch_find+0x128(ffffffe4a773cab0, ffffff020172c270, 1) ffffff020172c320 dmu_zfetch+0xe0(ffffffe4a773cab0, 0, 4000, 1) ffffff020172c390 dbuf_read+0x309(ffffffdf01430130, 0, 2) ffffff020172c440 dnode_hold_impl+0xde(ffffff58b741a180, 4, 1, fffffffff7d7acc1, ffffff020172c488) ffffff020172c470 dnode_hold+0x28(ffffff58b741a180, 4, fffffffff7d7acc1, ffffff020172c488) ffffff020172c4d0 dmu_bonus_hold+0x37(ffffff58b741a180, 4, 0, ffffff020172c548) ffffff020172c500 sa_buf_hold+0x1d(ffffff58b741a180, 4, 0, ffffff020172c548) ffffff020172c5a0 zfs_zget+0x64(ffffff431e6ff800, 4, ffffff020172c5c8) ffffff020172c600 zfs_root+0x78(ffffff433a1411d0, ffffff020172c6b8) ffffff020172c660 fshead_root+0x9e(ffffff433a1411d0, ffffff020172c6b8) ffffff020172c6a0 fsop_root+0x2d(ffffff433a1411d0, ffffff020172c6b8) ffffff020172c6f0 traverse+0x87(ffffff020172c8d8) ffffff020172c940 lookuppnvp+0x22b(ffffff020172ca60, 0, 1, 0, ffffff020172cc78, ffffff42e2b70840, ffffff42e2b70840, ffffff612163c018) ffffff020172c9e0 lookuppnatcred+0x15e(ffffff020172ca60, 0, 1, 0, ffffff020172cc78, 0, ffffff612163c018) ffffff020172cae0 lookupnameatcred+0xad(8d89208, 0, 1, 0, ffffff020172cc78, 0, ffffff612163c018) ffffff020172cb30 lookupnameat+0x39(8d89208, 0, 1, 0, ffffff020172cc78, 0) ffffff020172cce0 vn_openat+0x315(8d89208, 0, 2001, 1b6, ffffff020172cdf0, 0, ffffff0200000012, 0, ffffff0200000010) ffffff020172ce50 copen+0x1fc(ffd19553, 8d89208, 2001, 1b6) ffffff020172ce80 openat64+0x2a(ffd19553, 8d89208, 0, 1b6) ffffff020172ceb0 open64+0x25(8d89208, 0, 1b6) ffffff020172cf10 _sys_sysenter_post_swapgs+0x153() > ffffff7baacf3c30::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffff7baacf3c30 adapt ffffff9a0e2b9b80 - - yes > ffffff9a0e2b9b80::findstack -v stack pointer for thread ffffff9a0e2b9b80: ffffff020c1acc80 [ ffffff020c1acc80 _resume_from_idle+0xf4() ] ffffff020c1accb0 swtch+0x141() ffffff020c1acd50 turnstile_block+0x21a(ffffff4397233cf0, 0, ffffffffc001dc28, fffffffffbc07aa0, 0, 0) ffffff020c1acdc0 mutex_vector_enter+0x3a3(ffffffffc001dc28) ffffff020c1ace30 dbuf_find+0x62(ffffff58b741a180, 0, 0, 8) ffffff020c1acfb0 dbuf_prefetch+0xc3(ffffffe4a773c870, 0, 8, 2, 8) ffffff020c1ad000 dmu_zfetch_fetch+0x5d(ffffffe4a773c870, 8, 1) ffffff020c1ad040 dmu_zfetch_dofetch+0xbb(ffffffe4a773cab0, ffffff7baacf3c00) ffffff020c1ad0b0 dmu_zfetch_find+0x3b2(ffffffe4a773cab0, ffffff020c1ad0d0, 1) ffffff020c1ad180 dmu_zfetch+0xe0(ffffffe4a773cab0, 8000, 4000, 1) ffffff020c1ad1f0 dbuf_read+0x309(ffffff4636790ef8, 0, 2) ffffff020c1ad2a0 dnode_hold_impl+0xde(ffffff58b741a180, 59, 1, fffffffff7d7acc1, ffffff020c1ad2e8) ffffff020c1ad2d0 dnode_hold+0x28(ffffff58b741a180, 59, fffffffff7d7acc1, ffffff020c1ad2e8) ffffff020c1ad330 dmu_bonus_hold+0x37(ffffff58b741a180, 59, 0, ffffff020c1ad3a8) ffffff020c1ad360 sa_buf_hold+0x1d(ffffff58b741a180, 59, 0, ffffff020c1ad3a8) ffffff020c1ad400 zfs_zget+0x64(ffffff431e6ff800, 59, ffffff020c1ad530) ffffff020c1ad4e0 zfs_dirent_lock+0x516(ffffff020c1ad538, ffffff71ee818218, ffffff020c1ad790, ffffff020c1ad530, 6, 0, 0) ffffff020c1ad5a0 zfs_dirlook+0x94(ffffff71ee818218, ffffff020c1ad790, ffffff020c1ad8d8, 0, 0, 0) ffffff020c1ad640 zfs_lookup+0x3da(ffffff49858d6040, ffffff020c1ad790, ffffff020c1ad8d8, ffffff020c1ada60, 0, ffffff7887271400, ffffff43c8d76978, 0, 0, 0 ) ffffff020c1ad6f0 fop_lookup+0xa2(ffffff49858d6040, ffffff020c1ad790, ffffff020c1ad8d8, ffffff020c1ada60, 0, ffffff7887271400, ffffff43c8d76978, 0, 0, 0 ) ffffff020c1ad940 lookuppnvp+0x1fe(ffffff020c1ada60, 0, 1, 0, ffffff020c1adc78, ffffff7887271400, ffffff7887271400, ffffff43c8d76978) ffffff020c1ad9e0 lookuppnatcred+0x15e(ffffff020c1ada60, 0, 1, 0, ffffff020c1adc78, 0, ffffff43c8d76978) ffffff020c1adae0 lookupnameatcred+0xad(80598d9, 0, 1, 0, ffffff020c1adc78, 0, ffffff43c8d76978) ffffff020c1adb30 lookupnameat+0x39(80598d9, 0, 1, 0, ffffff020c1adc78, 0) ffffff020c1adce0 vn_openat+0x315(80598d9, 0, 1, 1b6, ffffff020c1addf0, 0, ffffff0200000012, 0, ffffff0200000000) ffffff020c1ade50 copen+0x1fc(ffd19553, 80598d9, 1, 1b6) ffffff020c1ade80 openat32+0x27(ffd19553, 80598d9, 0, 1b6) ffffff020c1adeb0 open32+0x25(80598d9, 0, 1b6) ffffff020c1adf10 _sys_sysenter_post_swapgs+0x153() > ffffffffc001dc28::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffffffc001dc28 adapt ffffff45ad3fab00 - - yes > ffffff45ad3fab00::findstack -v stack pointer for thread ffffff45ad3fab00: ffffff0209210790 [ ffffff0209210790 _resume_from_idle+0xf4() ] ffffff02092107c0 swtch+0x141() ffffff0209210860 turnstile_block+0x21a(0, 0, ffffff99e6bc2f40, fffffffffbc07aa0, 0 , 0) ffffff02092108d0 mutex_vector_enter+0x3a3(ffffff99e6bc2f40) ffffff0209210940 dbuf_find+0xac(ffffff59fe188040, 1006, 0, 0) ffffff02092109d0 dbuf_hold_impl+0x6b(ffffff4379e6ed48, 0, 0, 0, 0, fffffffff7d7ac30 , ffffff02092109f8) ffffff0209210a20 dbuf_hold_level+0x31(ffffff4379e6ed48, 0, 0, fffffffff7d7ac30) ffffff0209210a50 dbuf_hold+0x21(ffffff4379e6ed48, 0, fffffffff7d7ac30) ffffff0209210af0 dmu_buf_hold_array_by_dnode+0x121(ffffff4379e6ed48, 0, 2, 1, fffffffff7d7ac30, ffffff0209210b3c, ffffff0209210b30, ffffff6600000000) ffffff0209210b90 dmu_read_uio_dnode+0x5a(ffffff4379e6ed48, ffffff0209210de0, 2) ffffff0209210be0 dmu_read_uio_dbuf+0x48(ffffff445a6d3680, ffffff0209210de0, 2) ffffff0209210c80 zfs_read+0x1a3(ffffff65e72fe240, ffffff0209210de0, 0, ffffff66ff690590, 0) ffffff0209210d20 fop_read+0x8b(ffffff65e72fe240, ffffff0209210de0, 0, ffffff66ff690590, 0) ffffff0209210e80 read+0x2a7(44, 92650c0, 2) ffffff0209210eb0 read32+0x1e(44, 92650c0, 2) ffffff0209210f10 _sys_sysenter_post_swapgs+0x153() > ffffff99e6bc2f40::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffff99e6bc2f40 adapt ffffff82208b3820 - - yes > ffffff82208b3820::findstack -v stack pointer for thread ffffff82208b3820: ffffff0203b817c0 [ ffffff0203b817c0 _resume_from_idle+0xf4() ] ffffff0203b817f0 swtch+0x141() ffffff0203b81890 turnstile_block+0x21a(ffffff56a3dfd0e8, 0, ffffffffc0019bb0, fffffffffbc07aa0, 0, 0) ffffff0203b81900 mutex_vector_enter+0x3a3(ffffffffc0019bb0) ffffff0203b81940 arc_buf_add_ref+0x6a(ffffff939f5679a0, ffffff99e6bc2ee8) ffffff0203b819d0 dbuf_hold_impl+0xa2(ffffff4379e6ed48, 0, 0, 0, 0, fffffffff7d7ac30 , ffffff0203b819f8) ffffff0203b81a20 dbuf_hold_level+0x31(ffffff4379e6ed48, 0, 0, fffffffff7d7ac30) ffffff0203b81a50 dbuf_hold+0x21(ffffff4379e6ed48, 0, fffffffff7d7ac30) ffffff0203b81af0 dmu_buf_hold_array_by_dnode+0x121(ffffff4379e6ed48, 0, 2, 1, fffffffff7d7ac30, ffffff0203b81b3c, ffffff0203b81b30, ffffff6600000000) ffffff0203b81b90 dmu_read_uio_dnode+0x5a(ffffff4379e6ed48, ffffff0203b81de0, 2) ffffff0203b81be0 dmu_read_uio_dbuf+0x48(ffffff445a6d3680, ffffff0203b81de0, 2) ffffff0203b81c80 zfs_read+0x1a3(ffffff65e72fe240, ffffff0203b81de0, 0, ffffff612163c018, 0) ffffff0203b81d20 fop_read+0x8b(ffffff65e72fe240, ffffff0203b81de0, 0, ffffff612163c018, 0) ffffff0203b81e80 read+0x2a7(46, 92670d0, 2) ffffff0203b81eb0 read32+0x1e(46, 92670d0, 2) ffffff0203b81f10 _sys_sysenter_post_swapgs+0x153() > ffffffffc0019bb0::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffffffc0019bb0 adapt ffffff8655614180 - - yes > ffffff8655614180::findstack -v stack pointer for thread ffffff8655614180: ffffff01e9f9b3c0 [ ffffff01e9f9b3c0 _resume_from_idle+0xf4() ] ffffff01e9f9b3f0 swtch+0x141() ffffff01e9f9b430 cv_wait+0x70(ffffffffc0019318, ffffffffc0019308) ffffff01e9f9b480 arc_get_data_buf+0x19f(ffffff49f91edf70) ffffff01e9f9b4c0 arc_buf_clone+0x69(ffffff54ac838148) ffffff01e9f9b5b0 arc_read+0x904(ffffffdecebff040, ffffff4322e8d000, ffffff01e9f9b620, fffffffff7cc12c0, ffffff9d4198b780, 2, ffffff0100000180, ffffff01e9f9b6c8, ffffff01e9f9b6a0) ffffff01e9f9b730 dbuf_prefetch+0x313(fffffff45618c798, 0, 302, 2, 8) ffffff01e9f9b780 dmu_zfetch_fetch+0x5d(fffffff45618c798, 2fb, 9) ffffff01e9f9b7c0 dmu_zfetch_dofetch+0xbb(fffffff45618c9d8, ffffff69233afbf0) ffffff01e9f9b830 dmu_zfetch_find+0x3b2(fffffff45618c9d8, ffffff01e9f9b850, 20) ffffff01e9f9b900 dmu_zfetch+0xe0(fffffff45618c9d8, 4100000, 20000, 20) ffffff01e9f9b970 dbuf_read+0x242(ffffffee25149920, ffffffe76e13b020, 16) ffffff01e9f9ba10 dmu_buf_hold_array_by_dnode+0x146(fffffff45618c798, 4100000, 5000 , 1, fffffffff7d7ac30, ffffff01e9f9ba5c, ffffff01e9f9ba50, ffffff4500000000) ffffff01e9f9bab0 dmu_read_uio_dnode+0x5a(fffffff45618c798, ffffff01e9f9bde0, 5000) ffffff01e9f9bb00 dmu_read_uio_dbuf+0x48(ffffff55c6ff8870, ffffff01e9f9bde0, 5000) ffffff01e9f9bba0 zfs_read+0x1a3(ffffff498b4d6600, ffffff01e9f9bde0, 0, ffffff4384b5a720, 0) ffffff01e9f9bc40 fop_read+0x8b(ffffff498b4d6600, ffffff01e9f9bde0, 0, ffffff4384b5a720, 0) ffffff01e9f9bc80 hyprlofs_read+0x34(ffffff54e4ddf900, ffffff01e9f9bde0, 0, ffffff4384b5a720, 0) ffffff01e9f9bd20 fop_read+0x8b(ffffff54e4ddf900, ffffff01e9f9bde0, 0, ffffff4384b5a720, 0) ffffff01e9f9be80 read+0x2a7(0, 8146000, 8000) ffffff01e9f9beb0 read32+0x1e(0, 8146000, 8000) ffffff01e9f9bf10 _sys_sysenter_post_swapgs+0x153()
The ARC reclaim thread is here:
> ::stacks -c arc_reclaim_thread THREAD STATE SOBJ COUNT ffffff01e9bfdc40 SLEEP CV 1 swtch+0x141 cv_wait+0x70 taskq_wait+0x43 kmem_cache_reap_now+0x3c arc_kmem_reap_now+0x53 arc_reclaim_thread+0x21a thread_start+8 > ffffff01e9bfdc40::findstack -v stack pointer for thread ffffff01e9bfdc40: ffffff01e9bfdaa0 [ ffffff01e9bfdaa0 _resume_from_idle+0xf4() ] ffffff01e9bfdad0 swtch+0x141() ffffff01e9bfdb10 cv_wait+0x70(ffffff42e268ad5a, ffffff42e268ad48) ffffff01e9bfdb50 taskq_wait+0x43(ffffff42e268ad28) ffffff01e9bfdb80 kmem_cache_reap_now+0x3c(ffffff42eceec008) ffffff01e9bfdbb0 arc_kmem_reap_now+0x53() ffffff01e9bfdc20 arc_reclaim_thread+0x21a() ffffff01e9bfdc30 thread_start+8()
So it's blocked on that taskq thread, which is here:
> ffffff42e268ad28::taskq -t ADDR NAME ACT/THDS Q'ED MAXQ INST ffffff42e268ad28 kmem_taskq 1/ 1 213 768 0 THREAD STATE SOBJ COUNT ffffff01e8017c40 SLEEP MUTEX 1 swtch+0x141 turnstile_block+0x21a mutex_vector_enter+0x3a3 kmem_depot_alloc+0x88 kmem_depot_ws_reap+0x70 kmem_cache_reap+0x77 taskq_thread+0x2d0 thread_start+8
Given that the repo lock is also taken by the kmem_move taskq, things are starting to make a little sense. That said, the last link in the chain is missing. That thread is blocked on a mutex, but we can't get the stack for the thread holding it:
> ffffff01e8017c40::findstack -v stack pointer for thread ffffff01e8017c40: ffffff01e8017960 [ ffffff01e8017960 _resume_from_idle+0xf4() ] ffffff01e8017990 swtch+0x141() ffffff01e8017a30 turnstile_block+0x21a(0, 0, ffffff42ed50b1d0, fffffffffbc07aa0, 0 , 0) ffffff01e8017aa0 mutex_vector_enter+0x3a3(ffffff42ed50b1d0) ffffff01e8017ae0 kmem_depot_alloc+0x88(ffffff42ed50b008, ffffff42ed50b1e0) ffffff01e8017b20 kmem_depot_ws_reap+0x70(ffffff42ed50b008) ffffff01e8017b60 kmem_cache_reap+0x77(ffffff42ed50b008) ffffff01e8017c20 taskq_thread+0x2d0(ffffff42e268ad28) ffffff01e8017c30 thread_start+8() > ffffff42ed50b1d0::mutex ADDR TYPE HELD MINSPL OLDSPL WAITERS ffffff42ed50b1d0 adapt ffffff01e8011c40 - - yes > ffffff01e8011c40::findstack -v stack pointer for thread ffffff01e8011c40: ffffff01e8011270 ffffff01e8011c40 0xffffff01e8d3ac40()
The depot lock is held, which points to this code in kmem_hunt_mags:
/* Hunt the depot. */ mutex_enter(&cp->cache_depot_lock); n = cp->cache_magtype->mt_magsize; for (m = cp->cache_full.ml_list; m != NULL; m = m->mag_next) { if (kmem_hunt_mag(cp, m, n, buf, tbuf) != NULL) { mutex_exit(&cp->cache_depot_lock); return (buf); } } mutex_exit(&cp->cache_depot_lock);
How many full magazines do we have, anyway?
> *dnode_cache::print kmem_cache_t cache_full | ::list kmem_magazine_t mag_next ! wc -l 32847
Yes, you read correctly – there are 32,847 full magazines. Multiply by the magazine size (95) and you have 3.1M buffers that are actually entirely free but can't be freed because the reaping thread can't get the lock because we're too busy trying to defragment a cache by freeing buffers. (!!!)
There are many ways to fix this, but I think the simplest is to have the move thread should basically not consult the magazines at all: a move is kicked by a defrag which itself is kicked by a cache reap; full magazines will be reaped by the reap thread, and buffers that are present in those magazines will be returned to their slabs anyway. Looking at the move stats, on this machine we returned KMEM_CBRC_YES 96.1% of the time and KMEM_CBRC_DONT_KNOW 1.7% of the time; doing invasive, futile work for 1.7% of the cases is fool-hardy – and that's before considering the iteration over 3.1M buffers.
For whatever it's worth, on the 302 dumps in which we had non-zero defrags, here is the percentage of KMEM_CBRC_DONT_KNOW, with output sorted by the number of callbacks:
DUMP CALLBACKS %DONTKNOW d349015bda171b2acea65aa615f91505 276183824 4.49 320a8c18e0c49818e36635c4d5781204 168332641 3.17 5982fc0918e1288240b45fbcb94b02e2 146940234 4.78 675bdb31abbc8bb7965186893319ebe9 112053071 5.22 66a73fc1530cdc09f97c14aab4d94e52 77218093 4.34 33aad01524eba0f50ab2b64588e8526f 47942462 4.32 3b095192a87588bf71e67d99b1f578c6 29365662 2.53 1620b89dc746f3591e44e2d85543019c 26463768 3.62 9cfc5a874b08a9d1c350da24e59b27d3 21269506 3.55 a1da033680d0a3faaad277b16b5e1a1a 16373257 2.77 e2a2472769e7b818214ae5e2f138a37b 15448841 0.83 f538fc3a9fa68068177a9d0075108d8c 12787953 8.29 3696f4e29c9df9a56ff13353bb4bbba9 12558408 2.55 f1a311f9cbd5f45187c8b882e40a0e24 12437696 4.24 b7c157e1960b82b5e2ff4ce019b385d0 11243756 3.27 af3aaaacbc4dc3cd13f3aa380f16acb0 10864791 13.84 2b65cc2e0dc53ee5d9bb140824dfdf9a 10653476 15.91 17c5298dc62f5587ebd6741d7ce0f00a 10627062 17.36 4a059401d804191d1762186ac3fa5309 10094462 11.25 fd0e8b2651cb5a74614eb631631ff6b7 7690316 0.38 dab916689e68b3f0721bad890e632bc9 7056250 4.18 47a9cde7f15f0ad7dfcdaf8fb30deb0a 6859051 0.44 c85a5b60cbb4add4f4d31de774683155 6435861 1.33 6a837e222e5e4b969bae9a5f6d42183a 6428267 4.20 c3af7251dcfbd7f03b81761bdf7b0752 6377188 1.91 2137d1afcc3f7d379f188a9a04b87ed8 6209598 9.57 28f53b45363776e842a9134e9b542d5e 6174478 4.29 dd7afe8aceaf3d9ad6c8ab580df429d4 5697137 1.24 56b940609ad27f93b2875f1eb5c4582e 5618080 0.35 a5d6ecdf263d3fc84f716d0aee9f7370 5190131 0.36 0c38102577f4214187e8fad3375d7910 3919920 8.88 d287364d8d813b06a88aa5d884e805ae 3517997 3.05 dba6fae8acba5a3fa696ba0ca405a4d9 3120740 4.06 c81a7a50e4447aed770d4cf8110dfcf0 2791764 20.08 2260f5619f1e4898f17df6f1a964c437 2785140 7.92 7f674cc24240c99e3b0468cfc43500fb 2579713 0.90 4056895771e0dbdd19b9ec5902fc85c6 2381808 46.21 af49944a7d785246d93d1a8298c8c568 2253549 6.02 517142bc8fdabce8913b288beaa70784 2236139 9.23 5ae902d321ca7589317856406be0def1 2217412 3.50 ff80de3303b73886ae6fac7891416eaf 2199565 14.57 728d99184eef0631377838bcb5e68c0f 2189680 4.59 e41857a5d584d27830ea5ea9079072c3 2123205 9.21 796991e0f6e98c86feb7347e9b92d090 2113917 2.33 1d20ebaec27b18a5ebe7e24ef9941917 2103142 5.21 15063536f5ffe881c62c478543433f01 2053119 7.64 e96b97e7b653343644e706898d674b49 1907295 2.57 7afc7ba397fa55adea65d9724b686f00 1765535 70.01 58bc62c33399dfbdbb6bb6716446af86 1539337 3.18 6e98e9fb67204e907a7af487380b8630 1533618 3.25 2ade48bfd97d432c8b7da39e9b0cf96d 1500285 2.30 5648ae09e7aea4db0d517a03660a09cf 1476190 27.55 a4da13e1eee343332041d2939be16cb8 1398312 0.38 f2d0a0716785848ac93aa2d9e0a18b64 1317226 2.56 d4611b2ae8bdba4a15e5ecb742c1b62f 1297909 23.02 9439f722d877b81b7d1e0eab29c447c3 1278185 1.68 67e3138bd0f2b6a6d56cf0f5fe9b2c2b 1266340 89.14 c5476e01217acdbecd2886fe6c7bab5d 1149864 0.42 e3522e9f2d16e0d929b83eb828fe5058 1122116 4.63 6bd211e8a40871e914d0499c588049fa 1116914 5.87 2bc211e7238c99dc4b25a9f509e27aa6 1089444 1.92 634ff17a17b32defe3eb7a4d19238522 1062046 1.09 e2a0fa4bb35fc1e214cff09ce62ec298 978204 0.98 16914eaa2f09a300238b04ed71cf65e3 972559 2.94 f1dec9cacb2e282b2f470c15ddc7ebef 965830 3.76 7717d20f34e960d18594dcd96d033641 951935 2.95 27c987c208854fd2cad2a5a3cad2ad7b 915685 2.02 19edd31834b60943a742783025060eab 914643 5.16 f54b4c166222561209b7770c729b08e2 896811 3.03 b757b101bb0c92519125ab62bff54909 884683 1.23 52cefe6966b2fe2e4c3d8e224674288a 878024 2.12 94bf58e51a6db75738f6b6fad3703d26 877734 1.74 dd933c1c57754bf2580fe42c04bd13b8 861923 1.15 2c159a6a17d9bde9e064045f49b5aad5 847944 0.10 cc41a90e066967fe67293cf454694782 797258 9.20 12a5b3edcd5f8785987aef20779d7d32 779537 46.19 b7773ba947d7bc90369038b46a5c01ad 729475 0.79 08bfea3be455022f9cdce432e1799836 721142 1.92 05cc8345b4f1a32d7845e66b90e530f5 711761 6.53 eb0dcad9fb24f4e225741ea98eae875e 695876 1.09 e964c560929bb543f9b87ee9349fc1ac 667077 12.52 b6ab26a1b245f9a9d29188482f187992 659511 7.09 b80c1a837b2faf8ae44b5e77dda9831e 597245 57.17 1bcc39bca886ebb3430ab3e99e3fb748 583637 14.19 28c4e03367c24b9cddd605a3851e0c6e 582447 4.82 6ebceaaf64d3abae27fa1336dd7fb403 569617 1.52 d14afdede9dfacb727d0f11780155211 560861 13.76 7cb15276785d29a5e1cb447c3bcf4e2b 485817 1.81 09b065cd258b0cb8743e570eda2c994a 484606 12.61 ebad5eed8b84dc4f1a4d107395d669d0 481531 4.74 0ea89577ba16992dc74d86d112589e0d 460968 5.21 2539a9a0167535a24f2a07918c3294a2 446319 1.70 12675849fcb76f0eeebd5db87dca28dc 442619 3.77 147b25e20727229fb28add66452b71d2 434441 1.44 36c735d11bbfeffee4d5187d8213b6a9 427376 2.50 c7c9e8ff39564838369f108c5b1df61a 392947 14.88 c5c740f1cd78419e362e413249b30b30 386811 10.17 52e402c09ebbcc9ff87120d21d43edaa 346112 29.72 192ffb93b289822191132d6a4522863b 337730 2.94 85ccd7baba57ea83e6009c90c587fca8 336446 12.77 db56b200e3d22424019fc2415bfcdeb3 328158 32.74 6cd121d4398c168dd30861ae6b2f827f 323312 11.77 80bbd905c8d638b44e01a2a649a4d375 317025 1.03 617e3fbe573626c4ab2577feab1f8e40 314942 2.93 4293ce4aedfa7f70a13b269b4b48021f 305313 2.29 a754657af6298ec0fa61d857d18ede70 302530 1.66 382daf07d730af46ca140b270886af86 297567 4.38 5abb75bd5b220ccd51992b583499e915 283893 1.67 1c9656276f57ba215ef04b3848d157cf 281979 11.49 7c430480c4be364d1061b5c8dfe76ee2 275551 0.54 963e41a8e430bec225dffeddad854bc8 230655 1.32 3e226f111555e9ea6766ff413aa4580f 226435 0.64 1994c984275701f25725fd89e95656c2 223092 6.33 2dabc55e9ae2a28028f329734926fefc 210021 7.01 f4d475043737761927aa3622746c19de c 193139 3.19 5718a7031b2a2fc301ba34e6d82ac5f2 181303 28.75 e1a69f199f11517f6dd08b50cbad09c1 174660 0.16 daca2f4a868acfdcc3a433e5a533d894 157049 1.38 7b41a33ec268004676c41f3cdfda368d 152321 0.66 0adc11edc9f6b7e00ae15e394229fb14 146385 3.74 d56fc2699574aa210cf23479a839b517 143939 6.92 5df31abcaba65483c3477e84bd81a27b 143156 34.25 e95e51af387f0f34aa867baaf4c7812c 143048 6.11 ad1c54e746675d2ee7754426beb21ff3 141196 4.15 79ffe2b444df9ecb98cbce5057f218b7 139633 2.97 40f58b694a1815dd25607f005698f6e3 135483 1.39 18e56102c306041c0a6bc373fdde5980 134396 13.71 e05ce0ff0050cf43ce5c8a0d14c2b9f9 133468 2.89 5dfc850920ea0fb999de7412d1d22a3f 129186 4.48 36213877a483cf69b4688d68e9ba0346 126542 0.89 8c8bb6cdfbc5dc79c8c74b346ca545da 126248 0.39 f7dc51ff5872e8e5e8ccdf95a460cbf9 119762 0.33 0965928ad80c38b7918a4a84c420f722 111950 1.72 b20a6bcd022ce276eb477786e6a2fb9b 102467 2.97 a78d7f920cdf0410bbb5573e45a77916 98046 0.66 a0b124a308064f52466d1f55533e9ca4 96050 2.34 16ee4c6796c70a9033f9d3682f695561 94795 0.75 8f356abd183f92e36f0435df43d59402 94199 2.56 8a0927e6af5b5c2934355a0b2df4f7a7 93435 0.31 75ce9d4e4d2c4686151373073ca52e85 90631 0.83 c58438544b22393786f10a2b6f87e4d5 90346 0.62 5516e397942ea2dde25a28a50a56d83d 89445 1.02 f963d97832f8868c3ba76eb62b3cf2f3 82630 3.78 4848390c464ad577b882aca35ca97109 82218 7.47 4ad4b191fa3493929db292d4125eb870 81959 26.53 827794964e30dcd462582f0a8fd9386f 81174 59.82 54e7933b887771e26afc50dca3f8cef7 79395 1.79 93d70fcda75be4bf02e1d2db56ef4c34 76825 1.64 a288eda3ac0bbe1624e37eebfc8296a3 76750 0.41 d666942527a60b3c451f5822a2cd2f9d 73194 17.72 cb3f0700749e05af398cbeb7f3e3d832 72638 0.63 ddb14a7b16725488c2bc88f465e008f4 71841 0.55 b6cabb3d1259456dec825d38e8dafd0c 71463 0.23 09a6f062f310f483b486f6b448f3a790 64131 0.49 d900d0921bd37624903067b8e282e383 62788 2.27 4f47e5304597137a9d1b54cc6387e6fb 61581 0.99 d84c44bd7b9040c6864392ba72080994 59516 5.73 d091866bc5a8696e5566b6081bc95cc2 59002 1.41 1e543d224f0735fdffa9ba8ce24373a2 58005 12.86 78368e88c2d3602590dc1ed224262b02 52819 0.96 061cd10f29b1029a9a169ced3d4c05d0 52596 0.39 a1469aae93df428a5e25e0c90aa25d1d 49979 2.18 bd90c48b0ba49141e06ecd76d379d7b5 49063 0.62 4de012679309c685bceb7a3c8b1379b3 48882 2.66 4ae326b34da46edc498420ee301cf2de 47411 3.72 ef06807bd47e203aac31997ce49b6bc1 46553 31.22 88eaf4bd1c1cb528588cdb3475ba4668 46090 4.51 e1852ac40ab37cd42bdb275a76d1ba92 46021 6.49 97c4b24d051495d572a4bbf3d03e61c0 45640 4.03 708f69b48013a42e3e72e0694bdd0bba 45440 0.45 21fcc0e6f479f457046d93a19fec0abc 44039 14.40 196067f765ec8b877ed3f7b7bb03013e 42188 5.58 47e17a1fac5fba55d63b3c4fc4b83968 40036 0.37 422e623d91c5846a05815de14afcfdee 35707 1.40 ab36d383c473c6291ef8c84b0c5d3cbe 35198 13.07 09528f87d42419a4bf623288802cc95f 34982 0.40 a3e72020fb344b8b9d94a71aeb32d56b 32243 20.27 1f043226cdf32a7456208a65243c4677 31928 1.78 cf9f34d562fed1b63c32fe3e2a887057 31474 1.46 068f7128e3cbcf8bde52964cb6d86ff4 30927 17.36 b4f68c7971099db65047774134c80a2f 30873 9.35 244461bfe11b6f7c43aacc018959aab7 30835 2.08 07e5ec2dd4a57a735e2c5bef25c150b7 30655 4.50 1dcbd97db3f40874b28dd90aafcc17fb 30617 0.84 7340733a5fcfae49244177abc74a3fea 28625 0.16 df694f9bd93066367f57508f7431f65c 27268 32.39 1b9bdb7b315c84e344d37366557b896b 27256 21.55 7ec67ee8dbea6391e28eb2ed62d85628 24225 0.66 b939b19ccb279ab74e794ef8a0f5748e 22945 0.34 dde1e5c33dbbbcc35a981d6b0059bfd8 21691 2.78 06361bfefc7e2ec2c7a6d136e47c9275 21195 36.71 6bf32500fae459f6c74fc9ab39baa894 19927 0.37 21321fb7c5a924580a0d610000515e09 19812 0.64 68627711e3b7021206edbc4de62cf2c2 18612 84.19 885bf97a5d0297b16298e611469d1cec 18468 0.03 9afb6a10ac410241e329028ee510a6dc 17752 5.78 7e455494504669c6bea531eef0c7d9b7 17223 1.13 0ebb88a740b1da69f7d234f5de0689f8 16958 1.12 b9dc37e5ffeb6266c315f8bee948e8b5 16481 0.29 2f5a121623535338fd026c68a14307aa c 13068 6.06 9d8829b1bb675e601a881e1830f7b3fe 13008 0.66 c6b35ff55f1ceffd4f5f28a1eb994658 12499 0.22 e0904c9b401f7c84330dd35ef5cc33e1 12488 16.00 c628f52e11b54ae0bdf3f477e31ca888 11896 2.09 456ee1f30b86f7c63bd44fc41a34a67a 11460 12.16 f2243c050c252cf4607c0295b31ac5f9 11447 3.04 849e6f93177decde216e5dd925cc73a9 11044 3.59 188f642acdf1e131fb99f00cf52a565a 10499 2.28 02ff7b9cc9fca87adf25239503e8bd92 10438 2.69 a9552973544aa60e4ec72153efe6df9b 10238 9.54 11e6aa49ec320c49ea149e8bb4554961 9566 3.10 4e95de2ccbb0f7d46aef42638c11afdc 9364 3.20 b4ec5232c9a3b3928fc32636126676a3 9081 2.11 55eee843ac19d92ba3254a500dc9dd79 8846 0.61 e3b86ba8f8d6d1151eb96faffd876466 8788 4.35 eaa98455fd2741106e3b9e693614fe4d 8182 8.23 c33f8d5c2e7963c470d13f55e2018b14 7988 1.51 51549b64fabfe0623d8435ee251ed077 6955 1.37 bbcd436ad366ee139205af6a919cf379 6156 0.91 1db5ea3d58098d73a225d3213fca2fec 5763 10.29 f94afdf3e9c95b1c97f1324212e416d1 5651 0.92 8e03ee719208d74fb5d204e33d38c60d 5511 2.09 64d86358b87b18707bae4dfba5e49975 5508 1.65 ddc92a67f2ed6511c72b2d918e0e6f97 5294 2.19 d4ba85da1c16b3d76e44ea1c0c623314 5105 0.86 1f4ecc595b236ef0fa07c5aad08d14e2 4904 1.63 45a970c8dc458bc6b7b98b8eb698a991 4869 5.26 06baf056dae7c720f239e7f3f607643f 4854 0.87 ef516b334b8fb6b4004fd85143e939ad 4733 2.26 0fcfcbc3ea4ce8c3244d41fdc38d7153 4587 48.59 441dca8a871572071dd67ab53d5ba183 4328 1.66 e6a70546af87d8ef341848bf4e4863f6 4210 1.31 dbad7c09387b7c08995408dbcaea127d 4177 10.63 4a58986bda0f492da42d33ea3f9b76e7 4087 15.15 dafa15928dd4696c118c88f872fc3330 3961 1.16 8176f59815d5729f627e84b59d0f4f3e 3960 1.59 cdf5b88d646f5bc457e528488b6251da 3803 0.24 af60eecc1507b1eb6b24ce96392a4ff5 3794 0.47 816bb95555b84c8dae7df3a9bbe1104f 3751 0.56 2e5676ad1605cfca840310cd1cc939a6 3641 3.38 f9bdc18f104d3aa89f12892e408ed3f5 3620 1.93 04b9236661f8fb254b456e65e2069a16 3612 2.85 8de38da49bb9983c68aa9f046c301ec0 3527 0.40 cb20d8f377b33630aa7bc97227d3f999 3453 0.78 2ee01c8007421c64b48ead3109bd2a7b 3446 1.54 66ecafa97cdb4cb32d4f1876b05e52a0 3141 1.27 8b8f934fa46102e78ba748a4e35036a7 2773 30.83 533fdc8d35cc5ed88fdc15ae69f44b93 2490 4.46 065a2ebd18b71d6266a581419569bbfe 2428 0.62 7ecc12f230687f047e948898e9d49d20 2159 1.67 23c80a2f4c1fe51d700bc3cf02199634 2085 2.69 5e0523645ae5d5145d47f80b0717dd5b 2072 10.71 913f3690d9712b55bfef9e3101baad26 1934 1.03 9970072dc2546b0996eebef79ccf981d 1925 10.18 572f8fe3b308fb391a33cdcdeb12bf1b 1832 0.98 dd7bc5ad198f889d825d14459d6d6cab 1729 1.10 da6ba5f37a54705aaf2a1e14fa01272b 1571 4.01 26d780566735c56e630eadebaafe996b 1561 39.78 816a019da081de42e34168af23818e20 1537 1.63 d9243db86ad0bd2e8e12fdc6f0f7c72d 1476 7.45 1e47a72856ecd9bf6a74c6ffeaa39835 1307 1.68 1d1f11c552a23f26f547b190435f6457 1231 38.67 68b4f4bae6c21ac7c9059b8e35487320 1187 2.61 2887c6815bb943948d46aecb237acf38 1150 2.17 1f2058ecb343f3decf0bd1d7aecc2879 1071 2.33 d00610cc6684863be1c43a7d92c93886 1064 33.93 2b9e6d2616f5cdf5272f34ea8eb97e02 1002 20.46 5dfd0ba129bf934ba6bdd030bcead4cc 860 7.09 bcff2cb9411467a59eabb02dfac2863d 801 4.00 c983a739044c2a6fd96555a74c53065e 787 4.70 cc53963d4f74847de6d7fbe0c1722b42 762 1.31 c0a6248f3529dcc05ed8f17d493a89a2 738 0.27 c06916affdb467583731b26d9e0c1c55 728 0.27 65930c21ee3707b990d844032821309d 702 2.42 e4dea78e44542e1fa83cb622cf07c38d 682 0.00 645f44b8c04b762cdde55a2b128bc1d7 673 0.15 749ad6df5ea6ebf6b6497dcb148d4c79 647 3.55 8693b5cb1b5bb13ad349a16702a5be67 637 3.92 9554ecd3905d6bb8ea6693234db33dfd 533 14.26 d79f4eeee7d6f968752d3439ae4c7a11 506 0.59 f04e819d5d905ae616eadc4451e7db31 405 4.20 53ba0d29ab41bdf27b5320b961c929b9 403 0.25 30f417e7c78295bcf45f6f4b899c0f5c 402 4.48 64da01e8a7efb9e0d443ade6f732cd30 362 20.17 d5bd0063dbc1ae7dadee9b01289af004 339 3.54 14c440a88fc03e01f24d22628ebb06af 336 7.74 e5621dee75d6c8c1e56a8035c69afbdc 308 3.90 e4b2a7169bdd52a650a8a29d0ed3733c 306 24.51 2f0a0dc635184f146910224c1f2ac4a2 299 6.35 6230856547207b31376cdb49d551c43e 280 1.07 c62a183eeb40b29919ae5a89f63559ef 226 24.34 94885d274d289277105f57ddd6c371e1 218 10.09 b111db893e23470878a929d2b41e7fbf 190 14.74 9f645b226c8d735a1497312ebd3136e0 147 12.24 7b3ab93aa642d947ec1ba607a056bcc9 119 45.38 73ff09f73bace332f50645ce3e29ce88 72 15.28 079f15401fe71d1f3f494e572123e99e 56 21.43 f80e66528be4fa36e719e9b1645e02a0 30 46.67 d480f15bef5426ebec387a244502cf9b 24 20.83 aadc0a80223becf17e1a400991977bc3 24 4.17 b506552d1450174f7b020c88f77a1228 10 0.00 48d3c93279adc23d749f17ac0bb7d76b 10 0.00
Updated by Electric Monk over 4 years ago
- Status changed from New to Closed
git commit d7db73d165ff4802d277d375e93877471457fed7
commit d7db73d165ff4802d277d375e93877471457fed7 Author: Bryan Cantrill <bryan@joyent.com> Date: 2018-01-23T21:44:09.000Z 8493 kmem_move taskq appears to be inducing significant system latency Reviewed by: Robert Mustacchi <rm@joyent.com> Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: Paul Dagnelie <pcd@delphix.com> Approved by: Richard Lowe <richlowe@richlowe.net>