Bug #1906

pkcs11_tpm will hang indefinitely if localhost:30003 is reachable

Added by Rich Lowe almost 8 years ago. Updated over 3 years ago.

Status: New
Priority: High
Assignee: -
Category: lib - userland libraries
Start date: 2011-12-19
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

You can make any consumer of crypto hang indefinitely by doing this:

nc -l 30003

This is because pkcs11_tpm is nowhere near careful enough: regardless of whether that port actually speaks the protocol it expects, it will keep reading from it until the read times out. (I haven't checked whether it retries on timeout, but even doing it once is bad enough.)
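To illustrate the failure mode, here is a minimal Python sketch, not the libtspi code. It stands up a listener that never answers (the role `nc -l 30003` plays above) and shows a client that bounds its wait with a socket timeout. The function name `read_reply`, the request bytes, and the use of an ephemeral port instead of 30003 are all assumptions made for the illustration.

```python
import socket

def read_reply(host, port, timeout):
    """Connect, send a request, and wait at most `timeout` seconds for a reply.

    Returns the reply bytes, or None if the peer stays silent.
    """
    # The timeout passed here also applies to the later recv() call.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Placeholder payload; this is NOT the real tcsd wire protocol.
        sock.sendall(b"hypothetical-request")
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None

# A listener that never answers, standing in for `nc -l 30003`.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # ephemeral port keeps the sketch harmless
listener.listen(1)
port = listener.getsockname()[1]

reply = read_reply("127.0.0.1", port, timeout=0.5)
listener.close()
print("reply:", reply)
```

With `timeout=None` the same call would block for as long as the listener stays up, which is the hang reported here.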

Discovered by OI, in production.

History

#1

Updated by Wyllys Ingersoll almost 8 years ago

pkcs11_tpm is not the problem. The root of the problem is the TSS library - libtspi.so.1.1.3 - which comes from the TrouSerS package. All commands to the TPM go through the 'tcsd' service, which listens on port 30003 by default. The library has no idea if the listener on the other end is the actual tcsd service or not.

TPM stuff should not be enabled in the zone anyway - remove the pkcs11_tpm provider from the pkcs11.conf configuration:
  # cryptoadm uninstall provider=/usr/lib/security/\$ISA/pkcs11_tpm.so.1

Also, to mitigate the tcsd issue with netcat you could change the default port in /etc/security/tcsd.conf to be a privileged port so that it cannot so easily be DoS-ed by non-privileged users.

I suggest filing a bug with the TrouSerS developers - http://sourceforge.net/projects/trousers/

#2

Updated by Dan McDonald almost 8 years ago

everywhere(src/lib)[0]% pkg search /usr/lib/libtspi.so.1.1.3 
INDEX      ACTION VALUE                    PACKAGE
path       file   usr/lib/libtspi.so.1.1.3 pkg:/library/security/trousers@0.3.4-0.151.1
everywhere(src/lib)[0]%    

Trousers is not in illumos-gate. I've also noticed it's updated to 0.3.7 now; perhaps that revision has fixed the problems?

The hackaround of removing the pkcs11_tpm provider is probably the best defense in the short-term. Not sure how many folks are actually using their TPM right now anyway.

Rich --> Can you pstack(1) the hung crypto-using process when this occurs? I'm not sure if the hang is in illumos-gate code or in Trousers.

#3

Updated by Wyllys Ingersoll almost 8 years ago

I'm almost certain it is NOT fixed in the latest trousers package; I doubt that they are even aware that it is a problem.

#4

Updated by Albert Lee almost 8 years ago

Wyllys Ingersoll wrote:

I'm almost certain it is NOT fixed in the latest trousers package; I doubt that they are even aware that it is a problem.

Thanks. It's downright stupid if they're not using a Unix domain socket by default.

#5

Updated by Albert Lee almost 8 years ago

Typical stack, courtesy of darrenr:
3409: /usr/bin/openssl x509 -noout -hash -in CA-certs/IGC_A.pem
fed335e5 connect (4, 80466b0, 10, 1)
fe7dca03 connect (4, 80466b0, 10, fe93f417) + 23
fe93f480 send_init (80d1b20, 0, 80d8150, fe93f16a) + b8
fe93f22d sendTCSDPacket (80d1b20, 0, 0, fe93afb2) + d1
fe93afe0 RPC_OpenContext_TP (80d1b20, 8046768, 804676c, 8046764) + 3c
fe93b12d RPC_OpenContext (c000000c, 80d7a40, 1, fe93a93e) + 4d
fe93a9d1 Tspi_Context_Connect (c000000c, 0, c0000004, fe9bde0a) + a1
fe9bde34 open_tss_context (804681c, 0, 4c, fe9a8d6c) + 38
fe9a8da8 C_GetInfo (8046950, fec17ff0, 8046948, fec15e5d) + 58
fec16051 pkcs11_slot_mapping (80cdc98, 0, 80469f8, fec112a7) + 205
fec112d3 C_Initialize (0, feebd494, 8046a88, fee4796f) + fb
fee479ba pk11_library_init (80cdab8, feeea000, 8046aa8, fee4732a) + ae
fee47342 bind_pk11 (80cdab8, feeea000, 8046ac8, fee47562) + 2e
fee47571 engine_pk11 (8046af0, fefc06a4, fefa0818, 8047150, 80c8c18, feeea000) + 25
fee475d8 ENGINE_load_pk11 (8046f98) + 40
fee44a6c ENGINE_load_builtin_engines (feffb388, feffbafc, 8046b28, fefd191e, 3, 3) + 18
080622f5 main (6, 8046fc8, 8046fe4, 8046fbc) + 1ad
08061fad _start (6, 80471b8, 80471c9, 80471ce, 80471d5, 80471db) + 7d

#6

Updated by Dan X over 7 years ago

FYI, the current version of Trousers in the Userland gate uses a UNIX Domain socket by default.

This has not been fixed in the upstream source from IBM (it still uses TCP sockets only).
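For comparison, a rough Python sketch of why a Unix domain socket helps: the endpoint is a filesystem path, so whoever controls the directory controls who may bind there, and a client gets an immediate error when nothing is serving it. The helper `connect_unix`, the temporary directory, and the socket name `tcsd.sock` are hypothetical and do not reflect where any Trousers build actually places its socket.

```python
import os
import socket
import tempfile

def connect_unix(path, timeout=1.0):
    """Return a connected AF_UNIX socket, or None if nothing is serving `path`."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return sock
    except OSError:
        sock.close()
        return None

# Directory mode 0700: only the owner can create a socket inside it,
# so another user cannot squat on the service's endpoint.
workdir = tempfile.mkdtemp()
os.chmod(workdir, 0o700)
path = os.path.join(workdir, "tcsd.sock")   # hypothetical socket name

before = connect_unix(path)    # nothing bound yet: fails immediately

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)
after = connect_unix(path)     # the real service is now listening

print(before is None, after is not None)

# Clean up the sketch's temporary files.
if after is not None:
    after.close()
server.close()
os.unlink(path)
os.rmdir(workdir)
```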

#8

Updated by Dan McDonald over 5 years ago

A summary:

The reason Illumos 1906 is a problem is because an illumos-gate library (pkcs11_tpm) depends on an external service (Trousers, aka. tcsd and libtspi, but libtspi directly) whose implementation sucks. Strictly speaking, this is NOT an illumos-gate problem, but it sure feels like one. It is a distro problem, because Trousers isn't part of illumos-gate.

There are three approaches to fixing this problem.

1.) Break the dependency on Trousers. In a deployed system, this is as simple as commenting out the pkcs11_tpm line in /etc/crypto/pkcs11.conf. In the short term, we will be doing this for OmniOS very soon. SmartOS already does this by NOT shipping pkcs11_tpm AND (naturally) commenting out the line in pkcs11.conf. This fragments distros, but allows those who really want TPM-via-trousers to keep it.

2.) Improve detection of whether or not Trousers is running. The problem with solutions of this flavor is that we need Trousers to present a better, unified-across-all-distros method of detection. For example, OmniOS's Trousers doesn't use SMF (which ruled out a solution based on SMF's smf_get_state()). Right now, the failure detection is effectively a failed connect(localhost:30003), which causes all the pain in the first place!

3.) Fix Trousers, and get all Trousers-using distros to use it. Oracle fixed this (see above), and it's in their open-source, so we can use it. The problem is, again, one of coordination. OTOH, no illumos-gate code would have to change.
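Approach 1 can be sketched as a small Python text transformation. This is only an illustration: it assumes pkcs11.conf lists one provider per line and treats a leading `#` as a comment, and the sample contents and the helper name `disable_provider` are hypothetical. On a real system, check the file's actual layout before editing it.

```python
def disable_provider(conf_text, provider="pkcs11_tpm"):
    """Comment out every active (non-blank, non-comment) line naming `provider`."""
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        active = bool(stripped) and not stripped.startswith("#")
        if active and provider in line:
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical file contents, for illustration only.
sample = (
    "# sample provider list\n"
    "/usr/lib/security/$ISA/pkcs11_kernel.so\n"
    "/usr/lib/security/$ISA/pkcs11_softtoken.so\n"
    "/usr/lib/security/$ISA/pkcs11_tpm.so\n"
)

result = disable_provider(sample)
print(result, end="")
```

Running the function a second time on its own output changes nothing, since lines that are already commented out are skipped.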

#9

Updated by Bob Lu over 3 years ago

Dan McDonald wrote:

A summary:

The reason Illumos 1906 is a problem is because an illumos-gate library (pkcs11_tpm) depends on an external service (Trousers, aka. tcsd and libtspi, but libtspi directly) whose implementation sucks. Strictly speaking, this is NOT an illumos-gate problem, but it sure feels like one. It is a distro problem, because Trousers isn't part of illumos-gate.

There are three approaches to fixing this problem.

1.) Break the dependency on Trousers. In a deployed system, this is as simple as commenting out the pkcs11_tpm line in /etc/crypto/pkcs11.conf. In the short term, we will be doing this for OmniOS very soon. SmartOS already does this by NOT shipping pkcs11_tpm AND (naturally) commenting out the line in pkcs11.conf. This fragments distros, but allows those who really want TPM-via-trousers to keep it.

2.) Improve detection of whether or not Trousers is running. The problem with solutions of this flavor is that we need Trousers to present a better, unified-across-all-distros method of detection. For example, OmniOS's Trousers doesn't use SMF (which ruled out a solution based on SMF's smf_get_state()). Right now, the failure detection is effectively a failed connect(localhost:30003), which causes all the pain in the first place!

3.) Fix Trousers, and get all Trousers-using distros to use it. Oracle fixed this (see above), and it's in their open-source, so we can use it. The problem is, again, one of coordination. OTOH, no illumos-gate code would have to change.

Dear Dan,

We found that with all three of the solutions you provided, sshd still hangs when the pool is in a suspended state.
Port 30003 will be used by the quota process. Do we have other solutions for this case? Maybe this is a corner case.

#10

Updated by Dan McDonald over 3 years ago

This is the first mention of a "quota process". I'm not sure what you're talking about.
