Bug #349

closed

hang during network boot (circular kcf dependency)

Added by Garrett D'Amore over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: High
Category: kernel
Start date: 2010-10-15
% Done: 100%

Description

Bryan Cantrill reported an interesting deadlock/hang in kcf during netboot:

Has anyone been able to get illumos to net boot? We're seeing a hang
on boot (after/while plumbing the dld streams module), and it would be
very helpful to know if someone has gotten this to work (or is trying
to, for that matter)...

- Bryan

Just to follow up on this: as it turns out, this is an odd bug due to
a circular dependency between the kcf and swrand drivers. When we
boot over the network (or more accurately, from a ramdisk that has
itself been loaded by pxegrub or gPXE), we load kcf as one of dev/ip's
dependencies; kcf's _init, in turn, calls kcf_rnd_schedule_timeout(),
which calls crypto_mech2id_common(), which does an explicit modload()
of "crypto/swrand". The problem is that crypto/swrand has a
dependency on kcf -- which is busy (recall that we got here by trying
to load it), and this code path doesn't detect circular dependencies.

When one does not boot over the network, we load zfs before kcf or
swrand -- and zfs has an explicit dependency on swrand which then
pulls in kcf. Of course, this code path still calls
crypto_mech2id_common(), but because this loads a module by name, it
detects the circular dependency and kicks out.

This bug should be fixed (there are several ways to fix it), but we
worked around it by adding a "forceload: crypto/swrand" directive to
/etc/system.

- Bryan
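
For readers unfamiliar with this failure mode, the circular load described above can be illustrated with a toy loader. This is only a sketch and not the illumos module loader: the struct, the state names, the result codes, and load_module() are all hypothetical, and the init-time modload() of crypto/swrand is modeled here as an ordinary dependency edge from kcf to swrand. The point it shows is that a loader which marks a module as "loading" can report a cycle when it meets that module again, rather than waiting on a module that is busy loading itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module states for this sketch; not the illumos modctl states. */
enum mod_state { MOD_UNLOADED, MOD_LOADING, MOD_LOADED };

/* Result codes used by this sketch only. */
enum load_result { LOAD_OK = 0, LOAD_CYCLE = -1 };

#define MAX_DEPS 4

struct module {
    const char *name;
    enum mod_state state;
    struct module *deps[MAX_DEPS];  /* first ndeps entries are valid */
    size_t ndeps;
};

/*
 * Load a module and, recursively, everything it depends on.
 * A module that is still MOD_LOADING when we reach it again is on the
 * current load path, so we report a cycle instead of waiting for it.
 */
static enum load_result
load_module(struct module *m)
{
    assert(m != NULL);

    if (m->state == MOD_LOADED)
        return LOAD_OK;
    if (m->state == MOD_LOADING)
        return LOAD_CYCLE;  /* waiting here would never finish */

    m->state = MOD_LOADING;
    for (size_t i = 0; i < m->ndeps; i++) {
        if (load_module(m->deps[i]) != LOAD_OK) {
            m->state = MOD_UNLOADED;  /* roll back so a retry is possible */
            return LOAD_CYCLE;
        }
    }
    m->state = MOD_LOADED;
    return LOAD_OK;
}
```

In this model, wiring ip -> kcf -> swrand -> kcf and calling load_module() on ip returns LOAD_CYCLE and leaves all three modules unloaded. A loader without the MOD_LOADING check would instead block on kcf, which is the shape of the hang reported here.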