Integer overflow in txg_delay()
The function txg_delay() is used to delay txg (transaction group) threads in ZFS.
The timeout value for this function is calculated using:
int timeout = ddi_get_lbolt() + ticks;
Later, the actual wait is performed:
while (ddi_get_lbolt() < timeout &&
    tx->tx_syncing_txg < txg-1 && !txg_stalled(dp))
        (void) cv_timedwait(&tx->tx_quiesce_more_cv, &tx->tx_sync_lock,
            timeout - ddi_get_lbolt());
The ddi_get_lbolt() function returns the current uptime in clock ticks and is typed as clock_t.
The clock_t type on 64-bit architectures is int64_t.
The "timeout" variable is a 32-bit int, so it overflows once the tick count no longer fits in 31 bits. How soon that happens depends on the tick frequency (e.g. at 1000 ticks per second it overflows after 24.855 days of uptime). From then on the expression "ddi_get_lbolt() < timeout" is always false, so txg threads are no longer delayed at all. This leads to a slowdown in ZFS writes.
The attached patch declares timeout as clock_t to match the return type of ddi_get_lbolt().
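The effect of the narrowing can be reproduced outside the kernel. The following is a minimal userland sketch, not the ZFS code: tick64_t, should_wait_buggy(), should_wait_fixed() and days_until_int32_overflow() are hypothetical names introduced for illustration, and the sketch assumes a compiler (such as GCC or Clang) that reduces out-of-range values modulo 2^32 when converting to a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 64-bit clock_t; the name is hypothetical. */
typedef int64_t tick64_t;

/*
 * Buggy pattern: the 64-bit sum is stored in a 32-bit int.
 * Out-of-range narrowing is implementation-defined in C; GCC and
 * Clang reduce the value modulo 2^32, which is assumed here.
 */
static int
should_wait_buggy(tick64_t now, tick64_t ticks)
{
	int32_t timeout = (int32_t)(now + ticks);

	return (now < timeout);
}

/* Fixed pattern: the deadline keeps the full 64-bit width. */
static int
should_wait_fixed(tick64_t now, tick64_t ticks)
{
	tick64_t timeout = now + ticks;

	return (now < timeout);
}

/* Days of uptime until a tick counter exceeds INT32_MAX at hz ticks/s. */
static double
days_until_int32_overflow(long hz)
{
	assert(hz > 0);
	return (2147483648.0 / (double)hz / 86400.0);
}
```

With a small tick count both variants report that the caller should keep waiting. Once now + ticks exceeds INT32_MAX, the buggy variant sees a wrapped deadline that is smaller than now and returns 0 immediately, while the fixed variant still returns 1.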
Updated by Gordon Ross almost 8 years ago
- Status changed from New to Resolved
- Assignee set to Martin Matuška
changeset: 13487:78d9278724d7
user: Martin Matuska <mm@FreeBSD.org>
date: Tue Oct 18 18:08:05 2011 -0700
description:
1313 Integer overflow in txg_delay()
Reviewed by: Matthew Ahrens <firstname.lastname@example.org>
Reviewed by: Dan McDonald <email@example.com>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>