Bug #968

fct driver sets incorrect fc-ct revision

Added by thomas joy over 8 years ago. Updated almost 8 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: kernel
Start date: 2011-04-27
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

I have been attempting to use OpenIndiana as a Fibre Channel target using COMSTAR. When using fabric point-to-point with a Brocade SilkWorm 3800 (larger version of the 3200) with the latest firmware, connections can be flaky.

My initiator is an LP952, attempting to boot Linux from SAN. The connection lasts long enough to load a bootloader, but returns a data error after that. This operation works perfectly on a point-to-point connection between the two HBAs, without using the switch.

I believe fct is passing an invalid CT revision to the switch, causing the instability.

I've found this message on the mailing list from Allen Andrews at Emulex:
http://mail.opensolaris.org/pipermail/storage-discuss/2010-January/007930.html

When COMSTAR issues CT requests to the switch it is now setting the CT Revision level in the request to 2. I believe it used to use revision level 1.

Also, I've tracked down a document from t10.org that seems authoritative in making the CT revision field obsolete. Note that the author states that the CT field must be set to 0x01.
ftp://ftp.t10.org/t11/document.05/05-053v1.pdf

Here's the relevant section of fct.h. As you can see, x_payload[0] is set to 0x02, which I believe is incorrect. It should be 0x01.

retrieved from http://src.illumos.org/source/xref/illumos-opengrok/usr/src/uts/common/sys/fct.h

/*
 * Fill CTIU preample
 */
#ifdef lint
#define FCT_FILL_CTIU_PREAMPLE(x_payload, x_ctop) _NOTE(EMPTY)
#else
#define FCT_FILL_CTIU_PREAMPLE(x_payload, x_ctop) \
	do { \
		x_payload[0] = 0x02; \
		x_payload[4] = 0xFC; \
		x_payload[5] = 0x02; \
		x_payload[8] = 0xFF & (x_ctop >> 8); \
		x_payload[9] = 0xFF & (x_ctop); \
	} while (0)
#endif
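For illustration, here is a minimal sketch of what the proposed change could look like, with byte 0 (the CT revision) set to 0x01. This is not the committed diff: the macro name FCT_FILL_CTIU_PREAMBLE, the helper build_ct_preamble, and the 16-byte length constant are assumptions made for the example. The comments on the byte offsets reflect my reading of the CT_IU preamble layout and should be checked against the FC-GS specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed CT_IU preamble length in bytes; verify against FC-GS. */
#define CTIU_PREAMBLE_LEN 16

/*
 * Hypothetical corrected macro (name and layout notes are assumptions):
 *   byte 0     CT revision, 0x01 instead of 0x02
 *   byte 4     GS_Type, unchanged from the original (0xFC)
 *   byte 5     GS_Subtype, unchanged from the original (0x02)
 *   bytes 8-9  CT operation code, big-endian
 */
#define FCT_FILL_CTIU_PREAMBLE(x_payload, x_ctop) \
	do { \
		(x_payload)[0] = 0x01; \
		(x_payload)[4] = 0xFC; \
		(x_payload)[5] = 0x02; \
		(x_payload)[8] = 0xFF & ((x_ctop) >> 8); \
		(x_payload)[9] = 0xFF & (x_ctop); \
	} while (0)

/* Hypothetical helper: zero the buffer, then fill in the preamble. */
void
build_ct_preamble(uint8_t *payload, uint16_t ct_op)
{
	assert(payload != NULL);
	memset(payload, 0, CTIU_PREAMBLE_LEN);
	FCT_FILL_CTIU_PREAMBLE(payload, ct_op);
}
```

Calling build_ct_preamble(buf, 0x1234) on a 16-byte buffer (0x1234 being an arbitrary operation code) leaves buf[0] at 0x01 and stores 0x12 and 0x34 in buf[8] and buf[9]. The macro arguments are parenthesized here so that expressions can be passed safely; the original macro does not do this.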

History

#1

Updated by thomas joy over 8 years ago

I've patched the file and recompiled using illumos-gate on my local OI box. Linux boots fine from SAN now, tested multiple times with the same initiator.

webrev of the change:
http://cr.illumos.org/view/mwxjxrcc/

#2

Updated by Dan McDonald almost 8 years ago

  • Assignee set to Dan McDonald
  • % Done changed from 0 to 50
  • Difficulty set to Medium
  • Tags set to needs-triage

A Nexenta engineer who needs to be placed on here will fix this. The only concern I ever had about this was that some busted implementations might expect it to be set to 0x2. The Nexenta engineer in question understands these protocols better than I do, and I hope he'll have something to report shortly.

#3

Updated by Dan McDonald almost 8 years ago

Turns out the word "preamble" was also spelled wrong. I now also have a webrev for you: http://www.kebe.com/~danmcd/webrevs/968/

Any testing volunteers would be vastly appreciated. We are testing this fix for NexentaStor internally, whose "fct" is identical to Illumos, but for proper testing, someone using Illumos per se should also test this fix. Additionally, this fix should be tested with known-to-work-already configurations, in case changing the definition from 0x2 to 0x1 makes previously-working clients stop working.

#4

Updated by Mathieu Simon almost 8 years ago

Dan has given instructions to change the value at runtime with:
echo "fct_create_solct+0x5e2/W0x100c641" | mdb -kw

Configuration:
  • Target: QLogic QLE2462, OpenIndiana dev 151
  • Initiator: QLogic QLA2460
  • Switch: Brocade SilkWorm 200E 4Gbit FC, running FabricOS 6.2.2e
  • Notes: No MPIO; the initiator's BIOS was simply configured to boot off the OI target.
  • Booted OS: Debian Squeeze amd64 (Linux 2.6.32, grub 1.98+patches)

Result: Debian was installed with 0x2 and booted fine. After the switch to 0x1, the initiator was rebooted several times without issues. So far, no regressions due to this change.

#5

Updated by Dan McDonald almost 8 years ago

Mat Simon wrote:

Dan has given instructions to change the value at runtime with:
echo "fct_create_solct+0x5e2/W0x100c641" | mdb -kw

NOTE: The previous value at this location should have been 0x200c641. This corresponds to the version change, and when I compiled before-and-after versions of "fct", it was the only difference in their respective disassemblies.
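To see why only the leading hex digits change, it helps to lay out the 32-bit word the way mdb's /W write stores it on little-endian x86. The sketch below uses two hypothetical helpers, le32_bytes and le32_diff_mask, which are not part of mdb or fct. It shows that 0x200c641 and 0x100c641 differ only in the last byte in memory. That is consistent with the word covering a byte-store instruction whose immediate operand is the revision value, but the disassembly is not shown in this thread, so treat that reading as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 32-bit value into the byte order it has in little-endian memory. */
void
le32_bytes(uint32_t value, uint8_t out[4])
{
	assert(out != NULL);
	for (int i = 0; i < 4; i++)
		out[i] = (uint8_t)(value >> (8 * i));
}

/* Return a bitmask with bit i set when in-memory byte i differs. */
unsigned
le32_diff_mask(uint32_t a, uint32_t b)
{
	uint8_t ab[4], bb[4];
	unsigned mask = 0;

	le32_bytes(a, ab);
	le32_bytes(b, bb);
	for (int i = 0; i < 4; i++) {
		if (ab[i] != bb[i])
			mask |= 1u << i;
	}
	return (mask);
}
```

With these helpers, le32_bytes(0x0100c641, buf) yields the bytes 41 c6 00 01 in memory order, and le32_diff_mask(0x0200c641, 0x0100c641) returns 0x8, meaning only the fourth byte changed.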

#6

Updated by Mike La Spina almost 8 years ago

Using revision 0x01 on FOS 5.1 and 6.2 (200E) (2009F16) shows no signs of regression, and the Brocade FOS logs are clean. No errors on any system after 36 hours of real usage.

LGTM

#7

Updated by Mathieu Simon almost 8 years ago

Another test with the same initiator-target combination, but this time with a McDATA Sphereon 4500, M-EOSc 9.9.9: no errors after a couple of reboots. BTW: the Brocade showed no errors after some 24h of normal (albeit low-load) usage.

I'll give RHEL/CentOS a try, since they still use legacy GRUB like we do in illumos. (Current Debian uses GRUB 2.)

#8

Updated by Rich Lowe almost 8 years ago

  • Status changed from New to Resolved
  • % Done changed from 50 to 100
  • Tags deleted (needs-triage)

Resolved in r13579 commit:eb88ab5c1435
