Bug #16597

open

pkt_resid value is incorrect for SATA drives when truncation occurs

Added by Jason King 11 days ago.

Status: New
Priority: Normal
Assignee: -
Category: kernel
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

A number of SCSI commands (generally those retrieving information from the device vs. commands doing data I/O) include an ALLOCATION LENGTH field in their CDB. Per SPC-5 4.2.5.6,

The ALLOCATION LENGTH field specifies the maximum number of bytes or blocks that an application client has allocated in the Data-In Buffer.

That is, it is the limit on the amount of data to return. For such commands, the response generally includes a fixed-size header that indicates the total amount of data available to read (to allow the caller to discover the necessary buffer size). SPC-5 also indicates that when more data is available than ALLOCATION LENGTH, the output should not be modified in any way -- i.e. the output is just truncated to ALLOCATION LENGTH (presumably because a caller can always use a buffer at least as large as the fixed-size portion of the response to discover the correct size needed and retry).

Within the SATL layer, a number of commands calculate pkt_resid (how much data was not transferred) as alc_len > count ? alc_len - count : 0; where alc_len is the value of the ALLOCATION LENGTH field and count is the amount of data that has been copied to the output buffer. This doesn't seem quite correct. It can be a bit confusing, but there are four lengths that are relevant here:
  • The total amount of data available to return (irrespective of any other limits on size)
  • ALLOCATION LENGTH
  • The actual buffer size (bp.b_bcount)
  • The amount of data returned
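To make the concern concrete, here is a minimal sketch of the expression quoted above, wrapped in a hypothetical helper (the name resid_current is an assumption for illustration and is not taken from the SATL source). It only mirrors the arithmetic; it does not reproduce any driver code.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the quoted expression:
 *   alc_len > count ? alc_len - count : 0
 * alc_len is the ALLOCATION LENGTH from the CDB and count is the
 * number of bytes copied to the output buffer.
 */
size_t
resid_current(size_t alc_len, size_t count)
{
	return (alc_len > count ? alc_len - count : 0);
}
```

For example, if a device has 96 bytes available and the caller supplies ALLOCATION LENGTH 255 with a buffer of at least 255 bytes, all 96 bytes are copied and nothing is truncated, yet this expression yields 255 - 96 = 159 rather than 0.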

We do not treat the condition bp.b_bcount < ALLOCATION LENGTH as an error, and generally bp.b_bcount > ALLOCATION LENGTH isn't a problem (as the results will include how much of the resulting buffer contains valid data). The amount of data that is actually written out is the minimum of (data available, ALLOCATION LENGTH, bp.b_bcount), so we never exceed ALLOCATION LENGTH, nor will we overflow the supplied buffer for the results.
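The rule for how much data is written out can be sketched as follows. The function name xfer_len and its parameter names are hypothetical, chosen only for illustration; bcount stands in for the buffer size (bp.b_bcount).

```c
#include <stddef.h>

#ifndef MIN
#define	MIN(a, b)	((a) < (b) ? (a) : (b))
#endif

/*
 * Hypothetical helper: bytes actually written is the smallest of the
 * data available, ALLOCATION LENGTH, and the buffer size.
 */
size_t
xfer_len(size_t avail, size_t alc_len, size_t bcount)
{
	return (MIN(avail, MIN(alc_len, bcount)));
}
```

With 96 bytes available, xfer_len(96, 255, 255) is 96 (all data fits), xfer_len(96, 36, 255) is 36 (truncated by ALLOCATION LENGTH), and xfer_len(96, 255, 16) is 16 (truncated by the buffer).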

When the amount of data written is equal to the total amount of data available, or if the amount of data written is equal to ALLOCATION LENGTH, pkt_resid should be 0. The only time pkt_resid should be non-zero is if the result has been truncated because bp.b_bcount is less than ALLOCATION LENGTH. In this situation pkt_resid should be the difference between MIN(ALLOCATION LENGTH, total data available) (the amount of data the command was expected to return) and bp.b_bcount.
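One possible reading of the rule described above, as a sketch rather than a proposed patch: the helper name resid_expected and its parameters are hypothetical, and bcount again stands in for bp.b_bcount.

```c
#include <stddef.h>

#ifndef MIN
#define	MIN(a, b)	((a) < (b) ? (a) : (b))
#endif

/*
 * Hypothetical helper: the command is expected to return
 * MIN(ALLOCATION LENGTH, data available) bytes. The residual is
 * non-zero only when the buffer is too small to hold that amount.
 */
size_t
resid_expected(size_t avail, size_t alc_len, size_t bcount)
{
	size_t expected = MIN(alc_len, avail);

	return (expected > bcount ? expected - bcount : 0);
}
```

With 96 bytes available, resid_expected(96, 255, 255) and resid_expected(96, 36, 255) are both 0, while a 16-byte buffer gives resid_expected(96, 255, 16) = 80 and resid_expected(96, 36, 16) = 20.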

Currently, this doesn't seem to be causing a problem. It was discovered while root-causing problems with the sg3 utils; however, it appears the sg3 utils only care that pkt_resid is less than ALLOCATION LENGTH, and other things don't appear to care about pkt_resid at all. The issue with sg3 utils is fixed by #16594, where the value of ALLOCATION LENGTH was incorrectly extracted from the CDB, resulting in invalid values being used (which would produce what should be impossible pkt_resid values).

With the fix for #16594, we can still return incorrect pkt_resid values, though they are now at least sane, and they don't seem to cause anything to fail (so this is probably not as critical).


Related issues

Related to illumos gate - Bug #16594: SATL confuses lengths resulting in bad resid values (Closed, Jason King)

Actions #1

Updated by Jason King 11 days ago

  • Related to Bug #16594: SATL confuses lengths resulting in bad resid values added