Feature #7090


zfs should improve allocation order and throttle allocations

Added by George Wilson over 7 years ago. Updated almost 7 years ago.

zfs - Zettabyte File System
Write I/Os are issued in block order, but the ZIO pipeline drives them asynchronously through the allocation stage, which can result in blocks being allocated out of order. It would be nice to preserve as much of the logical order as possible.

In addition, allocations are scattered evenly across all top-level vdevs, but not all top-level vdevs are created equal. The pipeline should be able to detect devices that are more capable of handling allocations and allocate more blocks to those devices. This allows for dynamic allocation distribution when devices are imbalanced, as fuller devices tend to be slower than empty devices.

The change includes a new pool-wide allocation queue which would throttle and order allocations in the ZIO pipeline. The queue would be ordered by issued time and offset and would provide an initial amount of allocation work to each top-level vdev. The allocation logic uses a reservation system to reserve allocations that will be performed by the allocator. Once an allocation is successfully completed, it is scheduled on a given top-level vdev. Each top-level vdev maintains a maximum number of allocations that it can handle (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels * mg_alloc_queue_depth) are distributed across the top-level vdevs' metaslab groups, round-robin across all eligible metaslab groups, to spread the work. As top-levels complete their work, they receive additional work from the pool-wide allocation queue until the queue is emptied.
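
The throttling scheme described above can be pictured with a small toy model. This is only a sketch, not the code from the commit: the class names MetaslabGroup and AllocThrottle, the methods, and the FIFO pending queue are all hypothetical, and only the per-group limit is meant to mirror mg_alloc_queue_depth. It assumes each group holds one reservation per in-flight allocation and that a rotor walks the groups round-robin.

```python
from collections import deque


class MetaslabGroup:
    """Toy stand-in for one top-level vdev's metaslab group (hypothetical)."""

    def __init__(self, name, alloc_queue_depth):
        self.name = name
        self.alloc_queue_depth = alloc_queue_depth  # models mg_alloc_queue_depth
        self.in_flight = 0                          # reservations currently held

    def try_reserve(self):
        """Take one reservation if this group is below its queue depth."""
        if self.in_flight < self.alloc_queue_depth:
            self.in_flight += 1
            return True
        return False

    def release(self):
        """Give back one reservation when an allocation's work completes."""
        assert self.in_flight > 0
        self.in_flight -= 1


class AllocThrottle:
    """Toy pool-wide allocation queue feeding groups round-robin."""

    def __init__(self, groups):
        self.groups = list(groups)
        self.pending = deque()   # writes waiting for a reservation, in issue order
        self.rotor = 0           # index of the next group to try
        self.dispatched = []     # (write, group name) pairs, in dispatch order

    def submit(self, write):
        self.pending.append(write)
        self._dispatch()

    def complete(self, group):
        # A finished allocation frees a slot, so more queued work can flow.
        group.release()
        self._dispatch()

    def _next_eligible_group(self):
        # Walk the groups starting at the rotor; skip any that are full.
        n = len(self.groups)
        for step in range(n):
            idx = (self.rotor + step) % n
            group = self.groups[idx]
            if group.try_reserve():
                self.rotor = (idx + 1) % n
                return group
        return None

    def _dispatch(self):
        while self.pending:
            group = self._next_eligible_group()
            if group is None:
                return  # every group is at its limit: throttle
            self.dispatched.append((self.pending.popleft(), group.name))


fast = MetaslabGroup("A", alloc_queue_depth=2)
slow = MetaslabGroup("B", alloc_queue_depth=1)
throttle = AllocThrottle([fast, slow])
for i in range(5):
    throttle.submit("io%d" % i)
print(throttle.dispatched)      # three writes dispatched, two still queued
throttle.complete(slow)         # B finishes one, so the next queued write goes to B
print(throttle.dispatched[-1])
```

In this model the total number of outstanding reservations is bounded by the sum of the per-group depths, and a group that completes work sooner is handed queued writes sooner, which is the intent of the dynamic distribution described in the ticket.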

Updated by Electric Monk about 7 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit 0f7643c7376dd69a08acbfc9d1d7d548b10c846a

commit  0f7643c7376dd69a08acbfc9d1d7d548b10c846a
Author: George Wilson <>
Date:   2016-08-02T16:44:16.000Z

    7090 zfs should improve allocation order and throttle allocations
    Reviewed by: Adam Leventhal <>
    Reviewed by: Alex Reece <>
    Reviewed by: Christopher Siden <>
    Reviewed by: Dan Kimmel <>
    Reviewed by: Matthew Ahrens <>
    Reviewed by: Paul Dagnelie <>
    Reviewed by: Prakash Surya <>
    Reviewed by: Sebastien Roy <>
    Approved by: Robert Mustacchi <>

Updated by Alexander Motin almost 7 years ago

"The queue would be ordered by issued time and offset" is not true, since io_offset field in logical ZIOs is always zero, so this code orders writes only by issue time.

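The point of the comment above can be shown with a small sketch. It assumes the queue compares entries by issue time first and offset second, as the description states; QueuedWrite, alloc_order_key, and queue_order are hypothetical names for illustration, not the structures used in the commit. When every offset is zero, the second key never breaks a tie, so the order is the same as sorting by issue time alone.

```python
from collections import namedtuple

# Hypothetical stand-in for a queued logical write; not the real zio_t.
QueuedWrite = namedtuple("QueuedWrite", ["label", "issue_time", "io_offset"])


def alloc_order_key(write):
    """Sort key: issue time first, then offset as a tie-breaker."""
    return (write.issue_time, write.io_offset)


def queue_order(writes):
    """Return the labels in the order the toy queue would serve them."""
    return [w.label for w in sorted(writes, key=alloc_order_key)]


# Offsets are always zero, as the comment says is the case for logical ZIOs.
zero_offsets = [
    QueuedWrite("c", 20, 0),
    QueuedWrite("a", 10, 0),
    QueuedWrite("b", 10, 0),
]
by_time_only = [w.label for w in sorted(zero_offsets, key=lambda w: w.issue_time)]
print(queue_order(zero_offsets) == by_time_only)  # True: offset adds nothing

# With distinct offsets, the second key would reorder same-time writes.
real_offsets = [QueuedWrite("a", 10, 8192), QueuedWrite("b", 10, 4096)]
print(queue_order(real_offsets))  # ['b', 'a']
```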

