Bug #10449
openwsdiff hangs forever after an exception is thrown
0%
Description
I was running a wsdiff
between two workspaces where the comparison of a particular file induced an encoding error. That error was thrown but not caught in the worker thread. The process wrote no further output to the result file, and appeared to be basically idle.
Before hanging, this was the output:
18:34:51 + /opt/onbld/bin/wsdiff -r artefacts/wsdiff.txt previous/proto/root_i386 current/proto/root_i386 18:37:48 platform/i86xpv/kernel/amd64/unix 18:38:27 platform/i86pc/kernel/amd64/unix 18:41:54 etc/motd 18:45:40 usr/sbin/iasl 18:46:14 usr/sbin/acpixtract 18:46:15 usr/sbin/acpidump 19:38:36 usr/lib/spell/hlistb 19:38:36 Exception in thread Thread-2: 19:38:36 Traceback (most recent call last): 19:38:36 File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner 19:38:36 self.run() 19:38:36 File "/opt/onbld/bin/wsdiff", line 1233, in run 19:38:36 compareOneFile(base, ptch, False) 19:38:36 File "/opt/onbld/bin/wsdiff", line 1277, in compareOneFile 19:38:36 return compareBasic(base, ptch, quiet, fileType) 19:38:36 File "/opt/onbld/bin/wsdiff", line 1112, in compareBasic 19:38:36 difference(fileName, fileType, diffs) 19:38:36 File "/opt/onbld/bin/wsdiff", line 195, in difference 19:38:36 log_difference(f, dtype, diffs) 19:38:36 File "/opt/onbld/bin/wsdiff", line 212, in log_difference 19:38:36 print(diffs[:diffs_sz_thresh], file=log) 19:38:36 UnicodeEncodeError: 'ascii' codec can't encode characters in position 64-66: ordinal not in range(128) 19:38:36 19:47:31 usr/share/lib/java/javadoc/dtrace/api/org/opensolaris/os/dtrace/class-use/StackFrame.html 19:47:31 usr/share/lib/java/javadoc/dtrace/api/org/opensolaris/os/dtrace/class-use/ProcessEvent.html 19:47:31 usr/share/lib/java/javadoc/dtrace/api/org/opensolaris/os/dtrace/class-use/ErrorEvent.html
The stuck wsdiff
process has three worker threads (from the original total of 4, for a 4 CPU VM?) remaining, all of which are waiting to acquire a Python-level lock:
root@jenkins-oi0:~# pstack 16110 16110: /usr/bin/python2.7 /opt/onbld/bin/wsdiff -r artefacts/wsdiff.txt previ --------------------- thread# 1 / lwp# 1 --------------------- fed265d5 pollsys (8045590, 0, 8045628, 0) fecbad39 pselect (0, fed9f020, fed9f020, fed9f020, 8045628, 0) + 232 fecbb03b select (0, 0, 0, 0, 8045688) + 8e fe951d8c time_sleep (0, fe3dd02c, 0, feeba79b) + ac feeb9f84 PyEval_EvalFrameEx (1ff, 3cfa8ed9, fef55000, feebdbcd) + 6024 feebdbcd PyEval_EvalCodeEx (fe8a0430, fe8a0430, fedf97ab, feeb9c9f, fea11410, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fed95804, ffffffff, 8045958, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (fed95000, fed97870, 8045a08, feebdf0e, fea11458, feb97714) + 7bd feebdf0e PyEval_EvalCode (fea11458, feb97714, feb97714, 83ddbc0, fed95000, fed97870) + 2e feee2373 PyRun_FileExFlags (fed97870, 8045dc3, 101, feb97714, feb97714, 1) + 73 feee3905 PyRun_SimpleFileExFlags (fed97870, 8045dc3, 1, 8045b3c) + d5 feee400d PyRun_AnyFileExFlags (fed97870, 8045dc3, 1, 8045b3c, fef693c0, 0) + 6d feefc494 Py_Main (6, 8045c6c, 8045c6c, 8051044) + dd4 08051057 main (8045c0c, feda2308, 8045c48, 8050f18, 6, 8045c6c) + 27 08050f18 _start_crt (6, 8045c6c, f7d9bdb0, 0, 0, 0) + 97 08050dea _start (6, 8045db0, 8045dc3, 8045dd9, 8045ddc, 8045df1) + 1a --------------------- thread# 2 / lwp# 2 --------------------- fed21ec9 lwp_park (0, 0, 0) fed1575e sema_wait (8429fe0) + 19 fed07ecd sem_wait (8429fe0) + 35 feef7dd9 PyThread_acquire_lock (8429fe0, 1, fdf5f178, feefe23d, feb9b140, fe8f10e0) + 99 feefe250 lock_PyThread_acquire_lock (feb9b140, feb6102c, 0, fed95000) + 50 feeb9f84 PyEval_EvalFrameEx (fed9a340, fed95000, fdf5f2b8, feebdbcd) + 6024 feebdbcd PyEval_EvalCodeEx (1ff, 641dd4a6, feb62020, feeb9c9f, febe39b0, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (24, 4, fdf5f488, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 7747505b, fdf5f4b8, feeb9c9f, febe34e8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (c, 1, fdf5f628, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 72d3e7f1, fdf5f658, feeb9c9f, fea11140, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fed9a340, fed95000, fdf5f798, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 2e0f5864, feb62020, feeb9c9f, fea113c8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fe8f0d10, fe8a82d4, fdf5f938, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (fe3dd82c, fe3ec36c, fdf5f998, feeb9c9f, fea112a8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fe87a98c, fe88172c, fdf5fb68, feeb9ee9, fe3ec36c, 0) + 5d3f feeb9ee9 PyEval_EvalFrameEx (0, 0, 0, feebdbcd) + 5f89 feebdbcd PyEval_EvalCodeEx (0, 0, 0, fee2e785, fe91fde8, fe9178ac) + 7bd fee2e785 function_call (fe8816bc, fe9165ec, 0, fee64ee2, fe9165f8, 0) + 75 fedf8fde PyObject_Call (fe8816bc, fe9165ec, 0, 0, 0, fe3dd82c) + 4e fee0b4a8 instancemethod_call (fe898cac, feb6102c, 0, fed07ecd, 8459da0, fef55000) + a8 fedf8fde PyObject_Call (fe898cac, feb6102c, 0, feef7d4a, fef55000, 842c728) + 4e feeb391c PyEval_CallObjectWithKeywords (fe898cac, feb6102c, 0, feefe65f) + 4c feefe68b t_bootstrap (83ddfd0, 0, 0, 0) + 4b fed21cdb _thrp_setup (fdf70240) + 88 fed21e70 _lwp_start (fdf70240, 0, 0, 0, 0, 0) --------------------- thread# 4 / lwp# 4 --------------------- fed21ec9 lwp_park (0, 0, 0) fed1575e sema_wait (8429fe0) + 19 fed07ecd sem_wait (8429fe0) + 35 feef7dd9 PyThread_acquire_lock (8429fe0, 1, fdb21178, feefe23d, feb9b140, fe8f10e0) + 99 feefe250 lock_PyThread_acquire_lock (feb9b140, feb6102c, 0, fed95000) + 50 feeb9f84 PyEval_EvalFrameEx (fed95000, 0, 0, feebdbcd) + 6024 feebdbcd PyEval_EvalCodeEx (1ff, 641dd4a6, feb62020, feeb9c9f, febe39b0, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (24, 4, fdb21488, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 7747505b, fdb214b8, feeb9c9f, febe34e8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (c, 1, fdb21628, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 72d3e7f1, fdb21658, feeb9c9f, fea11140, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fed9a340, fed95000, fdb21798, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 2e0f5864, feb62020, feeb9c9f, fea113c8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fe8f0d10, fe8a8acc, fdb21938, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (fe3ddcec, fe3d433c, fdb21998, feeb9c9f, fea112a8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fd26616c, fe88172c, fdb21b68, feeb9ee9, fe3d433c, 0) + 5d3f feeb9ee9 PyEval_EvalFrameEx (0, 0, 0, feebdbcd) + 5f89 feebdbcd PyEval_EvalCodeEx (0, 0, 0, fee2e785, fe91fde8, fe9178ac) + 7bd fee2e785 function_call (fe8816bc, fe3dd90c, 0, fee64ee2, fe3dd918, 0) + 75 fedf8fde PyObject_Call (fe8816bc, fe3dd90c, 0, 0, 0, fe3ddcec) + 4e fee0b4a8 instancemethod_call (fd94f414, feb6102c, 0, fed07ecd, 8459da0, fef55000) + a8 fedf8fde PyObject_Call (fd94f414, feb6102c, 0, feef7d4a, fef55000, 840d1d8) + 4e feeb391c PyEval_CallObjectWithKeywords (fd94f414, feb6102c, 0, feefe65f) + 4c feefe68b t_bootstrap (83de010, 0, 0, 0) + 4b fed21cdb _thrp_setup (fdf71240) + 88 fed21e70 _lwp_start (fdf71240, 0, 0, 0, 0, 0) --------------------- thread# 5 / lwp# 5 --------------------- fed21ec9 lwp_park (0, 0, 0) fed1575e sema_wait (8429fe0) + 19 fed07ecd sem_wait (8429fe0) + 35 feef7dd9 PyThread_acquire_lock (8429fe0, 1, fd54e178, feefe23d, feb9b140, fe8f10e0) + 99 feefe250 lock_PyThread_acquire_lock (feb9b140, feb6102c, 0, fed95000) + 50 feeb9f84 PyEval_EvalFrameEx (fed9a340, fed95000, fd54e2b8, feebdbcd) + 6024 feebdbcd PyEval_EvalCodeEx (1ff, 641dd4a6, feb62020, feeb9c9f, febe39b0, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (24, 4, fd54e488, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 7747505b, fd54e4b8, feeb9c9f, febe34e8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (c, 1, fd54e628, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 72d3e7f1, fd54e658, feeb9c9f, fea11140, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fed9a340, fed95000, fd54e798, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (1ff, 2e0f5864, feb62020, feeb9c9f, fea113c8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fe8f0d10, fdb4646c, fd54e938, feebdbcd) + 5d3f feebdbcd PyEval_EvalCodeEx (fe3dddac, fe3d44c4, fd54e998, feeb9c9f, fea112a8, feb97714) + 7bd feeb9c9f PyEval_EvalFrameEx (fd26658c, fe88172c, fd54eb68, feeb9ee9, fe3d44c4, 0) + 5d3f feeb9ee9 PyEval_EvalFrameEx (0, 0, 0, feebdbcd) + 5f89 feebdbcd PyEval_EvalCodeEx (0, 0, 0, fee2e785, fe91fde8, fe9178ac) + 7bd fee2e785 function_call (fe8816bc, fe8a9c0c, 0, fee64ee2, fe8a9c18, 0) + 75 fedf8fde PyObject_Call (fe8816bc, fe8a9c0c, 0, 0, 0, fe3dddac) + 4e fee0b4a8 instancemethod_call (fd94f43c, feb6102c, 0, fed07ecd, 8459da0, fef55000) + a8 fedf8fde PyObject_Call (fd94f43c, feb6102c, 0, feef7d4a, fef55000, 845fb58) + 4e feeb391c PyEval_CallObjectWithKeywords (fd94f43c, feb6102c, 0, feefe65f) + 4c feefe68b t_bootstrap (83de030, 0, 0, 0) + 4b fed21cdb _thrp_setup (fdf71a40) + 88 fed21e70 _lwp_start (fdf71a40, 0, 0, 0, 0, 0)
The stuck process appears to be sleeping for one second in a loop:
root@jenkins-oi0:~# truss -D -p 16110 /1: 0.1207 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /5: lwp_park(0x00000000, 0) (sleeping...) /4: lwp_park(0x00000000, 0) (sleeping...) /2: lwp_park(0x00000000, 0) (sleeping...) /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0011 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0006 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0006 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: 1.0004 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0006 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0006 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0004 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0005 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0 /1: pollsys(0x08045590, 0, 0x08045628, 0x00000000) (sleeping...) /1: 1.0006 pollsys(0x08045590, 0, 0x08045628, 0x00000000) = 0
Which I suspect aligns with this code:
... # Wait for the threads to finish and do cleanup if interrupted debug("Waiting for the threads to finish") while True: if not True in [thread.isAlive() for thread in mythreads]: break else: # Some threads are still going time.sleep(1) ...
Looking at the log_difference
function, where the original exception occurred, it's not difficult to see what's happened:
197 # 198 # Do the actual logging of the difference to the results file 199 # 200 def log_difference(f, dtype, diffs) : 201 202 if logging : 203 log_lock.acquire() 204 print(f, file=log) 205 print("NOTE: " + dtype + " difference detected.", file=log) 206 207 difflen = len(diffs) 208 if difflen > 0 : 209 print('', file=log) 210 211 if not vdiffs and difflen > diffs_sz_thresh : 212 print(diffs[:diffs_sz_thresh], file=log) 213 print("... truncated due to length: " + 214 "use -v to override ...", file=log) 215 else : 216 print(diffs, file=log) 217 print('\n', file=log) 218 log.flush() 219 log_lock.release()
We're logging to a result file, so we take log_lock
on 203. When we attempt to print the differences on line 212, an encoding exception is thrown. The exception is not caught within the function, or indeed anywhere in the worker thread, so the thread exits at this point while holding log_lock
. The other three threads get to work on at most one more item, which they print to stdout
(using output_lock
, not held) and then when they go to write differences to the file, they hang forever -- this explains the three additional HTML files mentioned before the process came to rest.
There are quite a lot of places where we use the same pattern for different locks. The whole program should exit in error if an exception is not handled before unwinding to the processing loop in workerThread()
.
Related issues
Updated by Joshua M. Clulow over 3 years ago
- Related to Bug #10448: wsdiff explodes on encoding error added