Bug #9695
Slow crash dumps, significantly slower than live core
Status: Closed (100% done)
Description
The table below shows comparative data for an approx. 10.5G crash dump, with various dump devices (all zvols). The compression ratio for this scenario is 5.6, so about 2G of data is written to the dump device. The IO size is 128k.
driver | S/P | dump time (s) | IORate (MB/s) | device write (ms) | write (ns/pg) | pages/sec |
---|---|---|---|---|---|---|
mr_sas | S | 192 | 13.04 | 9.59 | 56572 | 15090 |
mpt_sas | S | 42 | 111.95 | 1.12 | 5951 | 66940 |
mr_sas | P | 148 | 13.03 | 9.59 | 53310 | 18797 |
mpt_sas | P | 17 | 114.19 | 1.09 | 6113 | 163469 |
sata | S | 357 | 6.53 | 19.14 | 107711 | 7802 |
mpt_sas | S | 73 | 119.33 | 1.05 | 6102 | 38227 |
sata | P | 301 | 6.53 | 19.14 | 107807 | 9263 |
mpt_sas | P | 18 | 113.78 | 1.1 | 6343 | 154874 |
What we can see from the above is that the time for each write to the dump device (each IO is 128 KB ≈ 0.125 MB, so this is 125 / IORate in ms; see the quick check after the table) is approximately:
driver | device write (ms) |
---|---|
mpt_sas | 1 |
mr_sas | 10 |
sata | 20 |
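As a sanity check on that arithmetic (a standalone sketch, not dump code), dividing 125 by the IORate in MB/s reproduces the device write times from the first table:

```c
#include <stdio.h>

/*
 * Back-of-the-envelope check of the per-write latencies above.
 * Each dump IO is 128 KB = 0.125 MB, so at R MB/s one write takes
 * 0.125 / R seconds, i.e. 125 / R milliseconds.
 */
int
main(void)
{
	struct { const char *drv; double mbs; } s[] = {
		{ "mpt_sas", 111.95 },	/* -> ~1.1 ms */
		{ "mr_sas", 13.04 },	/* -> ~9.6 ms */
		{ "sata", 6.53 },	/* -> ~19.1 ms */
	};

	for (int i = 0; i < 3; i++)
		printf("%-8s %6.2f MB/s -> %5.2f ms/write\n",
		    s[i].drv, s[i].mbs, 125.0 / s[i].mbs);
	return (0);
}
```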
What this shows is that the observed dump performance issues occur with the SATA and MR_SAS drivers. This is essentially specific to dump, which uses the SYNC/POLLED write path. Tests with savecore -L have not shown such differences (SATA is still about 2x slower than SAS there, but that is probably closer to what we'd expect).
The conclusion is that the SYNC/POLLED write path (as used by dump) in SATA and MR_SAS is too slow.
The table also shows the comparison between serial and parallel dump times. The IORate is unaffected, as expected, but in cases with a reasonable IORate (mpt_sas) the compression parallelism provides a significant benefit (e.g. 42s serial vs 17s parallel).
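A rough model of why that is (illustrative only: the constants are back-solved from the mpt_sas rows above, and the helper-thread count and the fully-overlapped assumption are mine, not taken from dumpsys()):

```c
#include <stdio.h>

/*
 * Toy model of serial vs. parallel dump throughput, using the
 * mpt_sas numbers from the table above.  Serial: every page pays
 * compress + write.  Parallel: helper threads compress while the
 * dump thread writes, so the per-page cost is roughly
 * max(compress / nhelpers, write).
 */
int
main(void)
{
	double write_ns = 5951.0;		/* mpt_sas, serial row */
	double total_ns = 1e9 / 66940.0;	/* ns/page at 66940 pg/s */
	double compress_ns = total_ns - write_ns;	/* ~9000 ns/page */
	int nhelpers = 4;			/* assumed helper count */

	double par_ns = compress_ns / nhelpers > write_ns ?
	    compress_ns / nhelpers : write_ns;

	printf("serial:   %.0f pages/sec\n", 1e9 / total_ns);
	printf("parallel: %.0f pages/sec (IO-bound)\n", 1e9 / par_ns);
	return (0);
}
```

This predicts roughly 168000 pages/sec for the parallel case, close to the 163469 observed, and shows the parallel dump becoming IO-bound: once compression is spread across helpers, the device write time dominates.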
Investigating mr_sas:
It turns out that the mr_sas driver checks for polled write completion only after a 10ms wait, which explains why each write appears to take 10ms. By reducing this to 1ms (which is what is used if the command is issued via bolt_issue_cmd_in_poll_mode), dump times can be reduced significantly: the 192s (serial) / 148s (parallel) times above drop to 39s and 15s respectively.
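The pattern looks roughly like the following (a minimal sketch with hypothetical names, not the actual mr_sas code): a polled-completion loop that sleeps a fixed interval between status checks puts a hard floor of one interval under every write, so shrinking the interval shrinks the apparent device write time.

```c
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/errno.h>

#define	WAIT_US		1000		/* poll interval: was 10000 (10ms) */
#define	TIMEOUT_US	(60 * 1000000)	/* overall timeout (assumed value) */

struct cmd;				/* hypothetical command handle */
extern boolean_t cmd_done(struct cmd *);	/* hypothetical status check */

static int
poll_for_completion(struct cmd *cp)
{
	int waited = 0;

	while (!cmd_done(cp)) {
		if (waited >= TIMEOUT_US)
			return (ETIMEDOUT);
		/* every write pays at least one full interval here */
		drv_usecwait(WAIT_US);
		waited += WAIT_US;
	}
	return (0);
}
```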
Investigating sata:
There is a similar issue in the ahci SATA driver: its poll wait time is also 10ms. Reducing this to 1ms improves the IORate by approximately a factor of 10. However, in the SATA case the IORate was about 20x lower, and dump writes still take twice as long as SAS, because the structure of this driver causes the drv_usecwait() delay to be incurred twice on every call.
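Schematically (again with hypothetical names, not the actual ahci code), the issue path waits once unconditionally and the completion loop waits again before its first status check, so even an instantly-completing command pays two intervals: 2 x 10ms = 20ms before the change, 2 x 1ms = 2ms after it, i.e. still about twice the SAS time.

```c
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

#define	POLL_US	1000	/* poll interval: was 10000 (10ms) */

struct port;					/* hypothetical */
struct cmd;					/* hypothetical */
extern void start_cmd(struct port *, struct cmd *);	/* hypothetical issue */
extern boolean_t cmd_done(struct port *, struct cmd *);	/* hypothetical check */

static void
polled_write(struct port *pp, struct cmd *cp)
{
	start_cmd(pp, cp);
	drv_usecwait(POLL_US);		/* wait #1: unconditional, after issue */

	do {
		drv_usecwait(POLL_US);	/* wait #2: before the first check */
	} while (!cmd_done(pp, cp));
}
```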
Updated by Electric Monk over 4 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
git commit d8873b3136f0985c9e33ff8801644e1b4253b36c

Author: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Date: 2018-08-07T19:46:09.000Z

9695 Slow crash dumps, significantly slower than live core
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Richard Lowe <richlowe@richlowe.net>