Project

General

Profile

Actions

Bug #4909

closed

fork during pkcs11 operation can deadlock child process

Added by Eric Sproul about 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2014-06-04
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

Repeatedly see this on an OmniOS r151006 system, running Oracle Java 7 and Jenkins. After some period of run time, a Jenkins SCM polling operation will get stuck and be unrecoverable until the whole app is shut down.

The stuck process has a stack like:

44001:  java -Dsvnkit.http.sslProtocols=SSLv3 -Xmx512m -jar /jenkins/app/jenki
 feea0f69 lwp_park (0, 0, 0)
 fee9aed5 mutex_lock_impl (d3919120, 0, 7601, 518941d, 5b, d3883f84) + 291
 fee9b7ef mutex_lock (d3919120) + 19
 d390531d pkcs11_close_common (d391911c, d3919120, 0, d38f1e00) + 21
 d39053c2 pkcs11_close_urandom (d38f2260, d03a0fb0, d1d83d1c, d388c779, d38ef3c4, 0) + 26
 d388c80c finalize_common (1) + a1
 d388c923 softtoken_fini (feffb0a4, d1d83d88, fefd213b, feffc8c0, 0, d3920818) + 4b
 d38d238a _fini    (feffc8c0, 0, d3920818, f, d3980660, fee1d23c) + 1a
 fefd213b call_fini (feffc8c0, d03a0fb0, d3980c18, 0) + b3
 fefd8727 remove_hdl (d3920cc0, d3980c18, 0, fefd4225, fef14000, d2adea40) + 744
 fefd42cf dlclose_core (d3920cc0, d3980c18, feffc8c0, fef181c0, 0, fefceb85) + b8
 fefd431f dlclose_intn (d3920cc0, d3980c18, fefd4391, d3920cc0) + 25
 fefd4405 dlclose_check (d3920cc0, d3980c18, d1d83ef8, d395c000, 8686ca0, 8686ca8) + 92
 fefd444d dlclose  (d3920cc0, fedb0080, d1d83f28, 0) + 3d
 d39443ee pkcs11_slottable_delete (d395c45c, 0, 18, d395c000, fd960680, fedb0080) + b5
 d394034d finalize_common (0, fedb0080, d1d83f88, d388c761, d38ef3c4, fef14000) + 10f
 d39403ac pkcs11_fini (fef14000, fd960640, d1d83f98, fee1d524, d2adea40, fef181c0) + 57
 d39403cd pkcs11_fork_child_fini (d2adea40, fef181c0, d1d83fd8, fee942c9, fef18240, d2adebcc) + b
 fee1d524 _postfork_child_handler (fef18240, d2adebcc, fed933b0, fef14000, fef14824, fef18240) + 23
 fee942c9 forkx    (0, 85593b0, c, 700, 0) + 19c
 fee94373 fork     (d2adea40, fef14824, d1d84048, 0, fef14824, b0487e8) + 1e
 fd85d75b startChild (93cb130, d1d8418c, b0487e8, b048b68) + 2b
 fd85de66 Java_java_lang_UNIXProcess_forkAndExec (93cb130, d1d8418c, 1, d1d84184, d1d84180, d1d8417c) + 2da
 fa60b158 * java/lang/UNIXProcess.forkAndExec(I[B[B[BI[BI[B[IZ)I+0
 fad50e0c * *java/lang/UNIXProcess.<init>([B[BI[BI[B[IZ)V [compiled] +29 (line 271)
 fad50e0c * *java/lang/ProcessImpl.start([Ljava/lang/String;Ljava/util/Map;Ljava/lang/String;[Ljava/lang/ProcessBuilder$Redirect;Z)Ljava/lang/Process;+434 (line 207)

With Dan McDonald's help, I captured the act when it happened (forking while pkcs11/libcryptoutil holds urandom_mutex) using the attached D script, which showed me this:

CPU     ID                    FUNCTION:NAME
 11  12265                    forksys:entry Holy shit, busted!

 18  10243 mutex_unlock_queue:mutex-release
              libc.so.1`mutex_unlock_queue+0x2f
              libc.so.1`mutex_unlock+0x11b
              libcryptoutil.so.1`pkcs11_read_common+0x3b
              libcryptoutil.so.1`pkcs11_read_urandom+0x2c
              libcryptoutil.so.1`pkcs11_get_urandom+0x26
              pkcs11_softtoken.so.1`C_GenerateRandom+0xb9
              libpkcs11.so.1`C_GenerateRandom+0x70
              libj2pkcs11.so`Java_sun_security_pkcs11_wrapper_PKCS11_C_1GenerateRandom+0x76
              0xfa60b158
              0xfa6035fb
              0xfab8f350

The D script takes two arguments:

./fork-while-mutex.d <pid> 0x<mutex_addr>

where "mutex_addr" is the hex addr of the mutex in question, which I obtained with:
echo urandom_mutex=J | mdb -p <pid>


Files

fork-while-mutex.d (369 Bytes) fork-while-mutex.d D script to catch forking while holding a mutex Eric Sproul, 2014-06-04 07:42 PM

Related issues

Is duplicate of illumos gate - Bug #1667: pkcs11 may deadlock when multi-threaded consumers forkClosed2011-10-19

Actions
Actions #1

Updated by Rich Lowe about 9 years ago

  • Status changed from New to Closed

Dup of #1667

Actions

Also available in: Atom PDF