Bug #7367

blkdev: support block size larger than 512

Added by Youzhong Yang about 3 years ago. Updated 11 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
driver - device drivers
Start date:
2016-09-09
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

It seems the current blkdev driver assumes a block size of 512 bytes, even though the correct shift value is available as (bd_t*)->d_blkshift.

We tried a patch that calculates the block number and the number of blocks from the d_blkshift value; it worked for a zpool full of NVMe SSDs formatted with a 4K block size.

However, if a 4K device is used as a zpool cache drive, it does not work, because ZFS does not respect the drive's ashift value and sends I/O requests whose sizes are multiples of 512 bytes.
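As a rough illustration of the calculation described above (this is a hypothetical sketch, not the actual C driver code; the variable names are illustrative), the block number and block count can be derived from the device's block-shift value rather than a hard-coded 512, and the same shift also shows why an n*512-byte request fails alignment on a 4K device:

```shell
#!/bin/bash
# Hypothetical sketch of the shift-based calculation: derive the
# starting block number and block count from the device's block-shift
# value instead of assuming 512-byte blocks.

blkshift=12                       # log2 of the block size; 12 -> 4K
blksz=$((1 << blkshift))

byte_off=8192                     # example transfer: offset and length
nbytes=4097

blkno=$(( byte_off >> blkshift ))                # starting block number
nblks=$(( (nbytes + blksz - 1) >> blkshift ))    # count, rounded up

# A request is block-aligned only if both its offset and its length are
# multiples of the block size; a request of n*512 bytes from ZFS fails
# this check on a 4K device.
mask=$(( blksz - 1 ))
if (( ((byte_off | nbytes) & mask) == 0 )); then
    aligned=yes
else
    aligned=no
fi

echo "blkno=$blkno nblks=$nblks blksz=$blksz aligned=$aligned"
```

The rounding-up of nblks ensures a partial trailing block is still transferred; the alignment check is what a 512-byte-granular request from ZFS cannot pass when blkshift is 12.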


Files

rdisk.c (1.24 KB), Youzhong Yang, 2017-01-26 07:53 PM

History

#1

Updated by Youzhong Yang over 2 years ago

We've tested the patch by using the following scripts:

# cat send-recv
#!/bin/bash

action=$1

if [ "$action" == "send" ]; then
   snapshot=$2
   host=$3
   port=$4
   loop=$5
   while true; do
      zfs send -R $snapshot | mbuffer -O ${host}:${port} -m 4g -s 2048k
      if [ "$loop" != "1" ]; then
         break
      fi
      echo sleeping 120 seconds ....
      sleep 120
   done
elif [ "$action" == "recv" ]; then
   port=$2
   zfs=$3
   loop=$4
   while true; do
      if [ "$loop" == "1" ]; then
         zfs destroy -r -R $zfs
      fi
      mbuffer -I ${port} -m 4g -s 2048k | zfs recv -Fv $zfs
      if [ "$loop" != "1" ]; then
         break
      fi
      sleep 5
   done
else
   echo "Usage:"
   echo "    $0 send snapshot host port loop"
   echo "    $0 recv port zfs loop"
   echo ""
fi

which does a zfs send/recv of a few hundred GiB of data. After the zfs recv completed, we ran zpool scrub and it reported no errors.

# cat run-dds
#!/bin/bash
for i in `seq 1 1000`; do
    dd if=/dev/zero of=file00 bs=1M count=102400 oflag=sync &
    dd if=/dev/zero of=file01 bs=1M count=102400 oflag=sync &
    dd if=/dev/zero of=file02 bs=1M count=102400 oflag=sync &
    dd if=/dev/zero of=file03 bs=1M count=102400 oflag=sync &
    wait
    rm file00 file01 file02 file03
done

which we used for NVMe benchmarking.

The patch has been tested for both 512B block size and 4K block size.

#2

Updated by Youzhong Yang over 2 years ago

Additional info - two systems were used for the testing:

Baseboard: Supermicro X10DRU-i+ (System SYS-1028U-TN10RT+)
Memory: 768G
CPU: 2 x Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz
NVMe SSDs: 4 x Intel DC P3700 2TB (SSDPE2MD020T4)

Baseboard: Supermicro X10DRU-i+ (System SYS-2028U-TN24R4T+)
Memory: 768G
CPU: 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
NVMe SSDs: 2 x Intel DC P3700 800GB (SSDPE2MD800G4)
NVMe SSDs: 22 x Intel DC P3600 400GB (SSDPE2ME400G4)

#3

Updated by Electric Monk over 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit d2c5b266b717b923ea0e28b925ddb8e66dd98b42

commit  d2c5b266b717b923ea0e28b925ddb8e66dd98b42
Author: Youzhong Yang <yyang@mathworks.com>
Date:   2017-01-25T22:09:30.000Z

    7367 blkdev: support block size larger than 512
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
    Approved by: Dan McDonald <danmcd@omniti.com>

#4

Updated by Youzhong Yang over 2 years ago

Commit d2c5b266 was backed out.

New webrev here for review:
http://cr.illumos.org/~webrev/yyang/7367-1/

A test program is attached.

#5

Updated by Electric Monk over 2 years ago

git commit c0591a0ce5e26f7f32f7f6e8ae0ca4193cd2e50e

commit  c0591a0ce5e26f7f32f7f6e8ae0ca4193cd2e50e
Author: Youzhong Yang <yyang@mathworks.com>
Date:   2017-02-27T13:17:14.000Z

    7367 blkdev: support block size larger than 512
    Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
    Reviewed by: Robert Mustacchi <rm@joyent.com>
    Reviewed by: Garrett D'Amore <garrett@damore.org>
    Approved by: Dan McDonald <danmcd@omniti.com>

#6

Updated by Electric Monk over 2 years ago

git commit 04904ca2a4492f1b3e2ec393f82d81a9a1c9611e

commit  04904ca2a4492f1b3e2ec393f82d81a9a1c9611e
Author: Dan McDonald <danmcd@omniti.com>
Date:   2017-01-26T17:02:38.000Z

    backout: 7367 blkdev: support block size larger than 512
        (Needs more work.)
    This reverts commit d2c5b266b717b923ea0e28b925ddb8e66dd98b42.

#7

Updated by Joshua M. Clulow 11 months ago

Note that the automated commit notifications appear out of order in the issue comments. Despite appearances, the change was integrated, backed out, and then finally re-integrated, as reflected in the Date: headers above.
