Bug #13700
Closed
pollhead_delete trips over bad pointer
100%
Description
I get the following panic when killing hanging PostgreSQL processes from the postgres-HEAD (upcoming 14) test suite:
panic[cpu3]/thread=fffffec85ed5b860: BAD TRAP: type=d (#gp General protection) rp=fffffe0116d67a70 addr=fffffec8f1f90a60

postgres: #gp General protection
addr=0xfffffec8f1f90a60
pid=857, pc=0xfffffffffba35475, sp=0xfffffe0116d67b60, eflags=0x10203
cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 26f8<vmxe,xmme,fxsr,pge,mce,pae,pse,de>
cr2: 80da0d0 cr3: 2474c58000 cr8: 0

rdi: 6f7078012f746f6f rsi: fffffec86ba571a8 rdx: fffffec85ed5b860
rcx: 100 r8: a5 r9: 0
rax: 0 rbx: fffffec86ba571a8 rbp: fffffe0116d67b90
r10: fffffffffb8762dc r11: fffffec85ed5b860 r12: fffffffffbcce900
r13: fffffec857cd9248 r14: fffffec87699e000 r15: fffffec8f1f90a60
fsb: 0 gsb: fffffec83097f000 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: d err: 0 rip: fffffffffba35475
cs: 30 rfl: 10203 rsp: fffffe0116d67b60
ss: 38

fffffe0116d67970 unix:real_mode_stop_cpu_stage2_end+c40d ()
fffffe0116d67a60 unix:trap+c72 ()
fffffe0116d67a70 unix:cmntrap+e9 ()
fffffe0116d67b90 genunix:pollhead_delete+55 ()
fffffe0116d67bf0 poll:dpclose+81 ()
fffffe0116d67c20 genunix:dev_close+27 ()
fffffe0116d67c70 specfs:device_close+c0 ()
fffffe0116d67cf0 specfs:spec_close+19d ()
fffffe0116d67d70 genunix:fop_close+66 ()
fffffe0116d67db0 genunix:closef+5e ()
fffffe0116d67df0 genunix:closeall+57 ()
fffffe0116d67e80 genunix:proc_exit+429 ()
fffffe0116d67ea0 genunix:exit+b ()
fffffe0116d67ec0 genunix:rexit+15 ()
fffffe0116d67f10 unix:brand_sys_sysenter+1d2 ()

dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel

> ::status
debugging crash dump vmcore.3 (64-bit) from sin
operating system: 5.11 illumos-ffe7853a48 (i86pc)
build version: heads/master-0-gffe7853a48-dirty
image uuid: c609e9f2-75b0-cd19-b04b-add90c63e31d
panic message: BAD TRAP: type=d (#gp General protection) rp=fffffe0116d67a70 addr=fffffec8f1f90a60
dump content: kernel pages only
> ::stackregs
fffffe0116d67b90 pollhead_delete+0x55(fffffec857cd9248, fffffec86ba571a8)
fffffe0116d67bf0 dpclose+0x81(a500000064, 3, 2, fffffec870da9cd0)
fffffe0116d67c20 dev_close+0x27(a500000064, 3, 2, fffffec870da9cd0)
fffffe0116d67c70 device_close+0xc0(fffffec8be0a0d40, 3, fffffec870da9cd0)
fffffe0116d67cf0 spec_close+0x19d(fffffec8be0a0d40, 3, 1, 10, fffffec870da9cd0, 0)
fffffe0116d67d70 fop_close+0x66(fffffec8be0a0d40, 3, 1, 10, fffffec870da9cd0, 0)
fffffe0116d67db0 closef+0x5e(fffffec8dd0f5a40)
fffffe0116d67df0 closeall+0x57(fffffec86ff88fc0)
fffffe0116d67e80 proc_exit+0x429(1, 2)
fffffe0116d67ea0 exit+0xb(1, 2)
fffffe0116d67ec0 rexit+0x15(2)
fffffe0116d67f10 _sys_sysenter_post_swapgs+0x14f()
I have hit this crash twice already, both times at the same event. Unfortunately the following output is from the previous panic, but it should be the same:
elmer@ailbhein:~$ cat typescript Script started on March 18, 2021 at 03:30:08 PM CET elmer@ailbhein:~$ ps -fu elmer UID PID PPID C STIME TTY TIME CMD elmer 838 239 0 Mar 16 ? 0:00 psql -X -a -q -d regression -v HIDE_TABLEAM=on elmer 27983 27980 0 15:30:13 pts/3 0:00 ps -fu elmer elmer 373 364 0 Mar 16 ? 0:00 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 364 239 0 Mar 16 ? 0:06 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 857 364 0 Mar 16 ? 0:00 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 20166 20165 0 Mar 16 ? 0:00 /usr/bin/perl ./run_branches.pl --run-all --verbose elmer 372 364 0 Mar 16 ? 0:11 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 26210 26209 0 Mar 16 ? 0:00 gmake NO_LOCALE=1 check elmer 369 364 0 Mar 16 ? 0:01 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 370 364 0 Mar 16 ? 0:01 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 238 26210 0 Mar 16 ? 0:00 /bin/sh -c PATH="/export/home/elmer/c12/buildroot/HEAD/pgsql.build/tmp_install/ elmer 26209 20326 0 Mar 16 ? 0:00 sh -c { cd pgsql.build/src/test/regress && gmake NO_LOCALE=1 check; echo $? > / elmer 27073 27069 0 14:17:08 pts/2 0:00 -ksh elmer 368 364 0 Mar 16 ? 0:00 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 20326 20166 0 Mar 16 ? 0:00 /usr/perl5/5.22/bin/perl ./run_build.pl --config ./build-farm.conf --verbose HE elmer 20165 9097 0 Mar 16 ? 0:00 sh -c ( cd /export/home/elmer/c12 && PATH=/usr/gnu/bin:$PATH ./run_branches.pl elmer 371 364 0 Mar 16 ? 0:04 postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ elmer 239 238 0 Mar 16 ? 0:00 ../../../src/test/regress/pg_regress --temp-instance=./tmp_check --inputdir=. 
- elmer 27979 27978 0 15:30:08 pts/2 0:00 script elmer 27980 27979 0 15:30:08 pts/3 0:00 /bin/ksh -i elmer 27978 27073 0 15:30:08 pts/2 0:00 script elmer@ailbhein:~$ pgrep psql 838 elmer@ailbhein:~$ pstack 838 838: psql -X -a -q -d regression -v HIDE_TABLEAM=on fe81c5d5 pollsys (8047160, 1, 0, 0) fe7a747d poll (8047160, 1, ffffffff) + 61 fef2f0d7 pqSocketPoll (6, 1, 0, ffffffff) + ac fef2efb9 pqSocketCheck (80fea38, 1, 0, ffffffff) + ae fef2ee73 pqWaitTimed (1, 0, 80fea38, ffffffff) + 23 fef2ee48 pqWait (1, 0, 80fea38, fef29329) + 23 fef2939d PQgetResult (80fea38, fef6d124, 8047368, fef29a6a) + 81 fef29acd PQexecFinish (80fea38, 8107d50, 80473a8, fef29788) + 6f fef297d1 PQexec (80fea38, 8107d50, 25, 8128f90) + 55 08067ca7 SendQuery (8107d50, 80fdbe0, 8047430, 807b4a2) + 3df 0807b626 MainLoop (80fc1c0, 8, 80479d4, 80b088d) + 9ce 0806383e process_file (0, 0, 8047968, 808bbf4) + 182 0808be74 main (804796c, fe8985c8, 80479a8, 805d31b) + 9c2 0805d31b _start_crt (8, 80479d4, fefd0c6f, 0, 0, 0) + 9a 0805d1ea _start (8, 8047af8, 8047afd, 8047b00, 8047b03, 8047b06) + 1a elmer@ailbhein:~$ pfiles 838 838: psql -X -a -q -d regression -v HIDE_TABLEAM=on Current rlimit: 65536 file descriptors 0: S_IFREG mode:0644 dev:272,65872 ino:508845 uid:5432 gid:5432 size:3356 O_RDONLY|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/sql/regproc.sql offset:3356 1: S_IFREG mode:0644 dev:272,65872 ino:531226 uid:5432 gid:5432 size:1868 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/results/regproc.out offset:1868 2: S_IFREG mode:0644 dev:272,65872 ino:531226 uid:5432 gid:5432 size:1868 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/results/regproc.out offset:1868 3: S_IFREG mode:0644 dev:272,65872 ino:528826 uid:5432 gid:5432 size:84 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/regression.out 
offset:84 4: S_IFREG mode:0644 dev:272,65872 ino:509211 uid:5432 gid:5432 size:4944 O_RDONLY|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/parallel_schedule offset:4944 5: S_IFDOOR mode:0444 dev:537,0 ino:192 uid:0 gid:0 rdev:539,0 O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[8540] 6: S_IFSOCK mode:0666 dev:546,0 ino:16958 uid:0 gid:0 rdev:0,0 O_RDWR|O_NONBLOCK FD_CLOEXEC SOCK_STREAM SO_SNDBUF(16384),SO_RCVBUF(5120) sockname: AF_UNIX peername: AF_UNIX /tmp/pg_regress-oKa4Da/.s.PGSQL.5678 peer: postgres[364] zone: ailbhein[4] elmer@ailbhein:~$ pstack 364 364: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c5d5 pollsys (8043720, 1, 80437e8, 0) fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047a9d, 8047aa0, 8047af4, 8047af7) + 1a elmer@ailbhein:~$ pfiles 364 364: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test Current rlimit: 65536 file descriptors 0: S_IFCHR mode:0666 dev:536,5 ino:201223015 uid:0 gid:3 rdev:134,2 O_RDONLY|O_LARGEFILE /dev/null offset:0 1: S_IFREG mode:0644 dev:272,65872 ino:528229 uid:5432 gid:5432 size:2344339 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/log/postmaster.log offset:2344339 2: S_IFREG mode:0644 dev:272,65872 ino:528229 uid:5432 gid:5432 size:2344339 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/log/postmaster.log offset:2344339 3: S_IFREG mode:0644 dev:272,65872 ino:528826 uid:5432 gid:5432 size:84 O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE 
/export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/regression.out offset:84 4: S_IFIFO mode:0000 dev:533,0 ino:60317 uid:5432 gid:5432 rdev:0,0 O_RDWR|O_NONBLOCK 5: S_IFIFO mode:0000 dev:533,0 ino:60317 uid:5432 gid:5432 rdev:0,0 O_RDWR 6: S_IFSOCK mode:0666 dev:546,0 ino:19348 uid:0 gid:0 rdev:0,0 O_RDWR SOCK_STREAM SO_SNDBUF(16384),SO_RCVBUF(5120) sockname: AF_UNIX /tmp/pg_regress-oKa4Da/.s.PGSQL.5678 7: S_IFDOOR mode:0444 dev:537,0 ino:192 uid:0 gid:0 rdev:539,0 O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[8540] 8: S_IFSOCK mode:0666 dev:546,0 ino:25650 uid:0 gid:0 rdev:0,0 O_RDWR|O_NONBLOCK SOCK_DGRAM SO_DGRAM_ERRIND,SO_SNDBUF(57344),SO_RCVBUF(102400) sockname: AF_INET6 ::1 port: 59134 peername: AF_INET6 ::1 port: 59134 elmer@ailbhein:~$ pargs 364 pargs: Couldn't determine locale of target process. pargs: Some strings may not be displayed properly. 364: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/ argv[0]: postgres argv[1]: -D argv[2]: /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data argv[3]: -F argv[4]: -c argv[5]: listen_addresses= argv[6]: -k argv[7]: /tmp/pg_regress-oKa4Da elmer@ailbhein:~$ # all postgres server stacks follow elmer@ailbhein:~$ for p in $(pgrep postgres); do pstack $p; done 373: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (a, d001, 8043010) 0853c949 WaitEventSetWaitBlock (895cfd8, 2bf20, 80430d0, 1) + 26 0853c804 WaitEventSetWait (895cfd8, 2bf20, 80430d0, 1, 5000006, 0) + 12e 0853bd8c WaitLatch (fe5816dc, 29, 2bf20, 5000006) + a6 084db16b ApplyLauncherMain (0, 895d190, 8043378, 84ac205) + 295 084ac238 StartBackgroundWorker (0, 80434cc, 80433c8, 0, 8957d20, fe98b000) + 225 084c0837 do_start_bgworker (897ffc8, 0, fe91bb35, fe8f1b69) + 1d5 084c0c24 maybe_start_bgworkers (3, 0, 16f, 0, fe8efa50, fe98b000) + 1a2 084bd638 reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 39c fe917d15 __sighndlr (12, 0, 80434cc, 
84bd29c, fe8524f8, fe990f80) + 15 fe90b832 call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a 364: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c5d5 pollsys (8043720, 1, 80437e8, 0) fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047a9d, 8047aa0, 8047af4, 8047af7) + 1a 857: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (b, d001, 8042ee0) 0853c949 WaitEventSetWaitBlock (895cfd8, ffffffff, 8042fa0, 1) + 26 0853c804 WaitEventSetWait (895cfd8, ffffffff, 8042fa0, 1, 8000007, 0) + 12e 0853bd8c WaitLatch (fe57c804, 21, 0, 8000007) + a6 08396b39 gather_readnext (89f9e7c, 0, 8043028, 870560b) + 1dd 08396890 gather_getnext (89f9e7c, 89fbc5c, 4, 83db151) + 48 0839679f ExecGather (89f9e7c, 89fa764, 8043088, 837d68e) + 1f9 0837d6be ExecProcNodeFirst (89f9e7c, 80430a8, 81f578b, 1) + 3b 08373a16 ExecProcNode (89f9e7c, 0, 80430f8, 8375e97) + 2c 08375ec1 ExecutePlan (89f9d5c, 89f9e7c, 1, 1, 1, 0) + a3 08373fc7 standard_ExecutorRun (898105c, 1, 0, 0, 1, 
8962b8c) + 1c1 08373e00 ExecutorRun (898105c, 1, 0, 0, 1, 8047920) + 5c 085701b6 PortalRunSelect (89b1164, 1, 0, 8962d14) + ed 0856fede PortalRun (89b1164, 7fffffff, 1, 1, 8962d14, 8962d14) + 1ea 0856a3bf exec_simple_query (8961034, 0, 80437a8, 856e3c3) + 406 0856e41f PostgresMain (1, 80437c4, 895ec20, 898a8ec, 8047920, 885dd07) + 7cc 084bf972 ExitPostmaster (8980a00, 8980a00, 8043808, 84bf2fc, 3, 89595e8) 084bf322 BackendStartup (8980a00, 89595e8, 0, 84bbafd) + 1bf 084bb8b6 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + 1de 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a 372: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (5, d001, 8042ee0) 0853c949 WaitEventSetWaitBlock (8962c2c, ffffffff, 8042fac, 1) + 26 0853c804 WaitEventSetWait (8962c2c, ffffffff, 8042fac, 1, 5000007, 8042fbc) + 12e 084b5cae PgstatCollectorMain (0, 0, 80433e8, 84aff95, fe98b000, fe812a40) + 4eb 084affa1 pgstat_start (3, 0, 16f, 0, fe8efa50, fe98b000) + cf 084bd62e reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 392 fe917d15 __sighndlr (12, 0, 80434cc, 84bd29c, fe8524f8, fe990f80) + 15 fe90b832 call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 
8047f7e, 8047f7e) + 1a 369: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (a, d001, 8041c00) 0853c949 WaitEventSetWaitBlock (895cfd8, 2710, 8041cc0, 1) + 26 0853c804 WaitEventSetWait (895cfd8, 2710, 8041cc0, 1, 5000002, 0) + 12e 0853bd8c WaitLatch (fe58377c, 29, 2710, 5000002) + a6 084acf0d BackgroundWriterMain (3, 89595e8, 8043358, 870440c, 8960fa0, ffffffff) + 341 0821c6cb AuxiliaryProcessMain (2, 804339c, 80433d8, 84bfe70, 4, 0) + 568 084bfeb8 StartChildProcess (3, 80433f4, 40, 11) + cb 084bd528 reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 28c fe917d15 __sighndlr (12, 0, 80434cc, 84bd29c, fe8524f8, fe990f80) + 15 fe90b832 call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a 370: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (a, d001, 8043020) 0853c949 WaitEventSetWaitBlock (895cfd8, 1388, 80430e0, 1) + 26 0853c804 WaitEventSetWait (895cfd8, 1388, 80430e0, 1, 500000c, 0) + 12e 0853bd8c WaitLatch (fe583a34, 29, 1388, 500000c) + a6 084c2cb4 WalWriterMain (3, 89595e8, 8043358, 870440c, 8960fa0, ffffffff) + 2a0 0821c6da AuxiliaryProcessMain (2, 804339c, 80433d8, 84bfe70, 4, 0) + 577 084bfeb8 StartChildProcess (6, 80433f4, 40, 11) + cb 084bd543 reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 2a7 fe917d15 __sighndlr (12, 0, 80434cc, 84bd29c, fe8524f8, fe990f80) + 15 fe90b832 
call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a 368: postgres -D /export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (a, d001, 8043000) 0853c949 WaitEventSetWaitBlock (895cfd8, 493e0, 80430c0, 1) + 26 0853c804 WaitEventSetWait (895cfd8, 493e0, 80430c0, 1, 5000004, 0) + 12e 0853bd8c WaitLatch (fe5834c4, 29, 493e0, 5000004) + a6 084ad5fb CheckpointerMain (3, 89595e8, 8043358, 870440c, 8960fa0, ffffffff) + 662 0821c6d0 AuxiliaryProcessMain (2, 804339c, 80433d8, 84bfe70, 4, 0) + 56d 084bfeb8 StartChildProcess (5, 80433f4, 40, 11) + cb 084bd50d reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 271 fe917d15 __sighndlr (12, 0, 80434cc, 84bd29c, fe8524f8, fe990f80) + 15 fe90b832 call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a 371: postgres -D 
/export/home/elmer/c12/buildroot/HEAD/pgsql.build/src/test fe91c345 ioctl (a, d001, 8043090) 0853c949 WaitEventSetWaitBlock (895cfd8, 749d, 8043150, 1) + 26 0853c804 WaitEventSetWait (895cfd8, 749d, 8043150, 1, 5000001, 0) + 12e 0853bd8c WaitLatch (fe58011c, 29, 749d, 5000001) + a6 084a7985 AutoVacLauncherMain (0, 0, 80433e8, 84a753a, fe98b000, fe812a40) + 42d 084a7553 StartAutoVacLauncher (3, 0, 16f, 0, fe8efa50, fe98b000) + 7e 084bd570 reaper (12, 0, 80434cc, fe98b000, fe812a40, fe98b000) + 2d4 fe917d15 __sighndlr (12, 0, 80434cc, 84bd29c, fe8524f8, fe990f80) + 15 fe90b832 call_user_handler (12, 0, 80434cc) + 1e9 fe90ba40 sigacthandler (12, 0, 80434cc) + e4 --- called from signal handler with signal 18 (SIGCLD) --- fe91c5d5 __pollsys (8043720, 1, 80437e8, 0, 6, 40) + 15 fe8ac772 pselect (7, 804382c, fe995240, fe995240, 80437e8, 0) + 272 fe8acb08 select (7, 804382c, 0, 0, 804582c) + 89 084bb7b3 ServerLoop (fef70530, fef70530, 8962330, 1, 8962330, 0) + db 084bb259 PostmasterMain (8, 895cba0, 8047908, 83e08d2, 8047970, 89312e0) + 10c8 083e0a99 startup_hacks (804790c, fe9985c8, 8047948, 811f57b) 0811f57b _start_crt (8, 8047970, fefd0c6f, 0, 0, 0) + 9a 0811f44a _start (8, 8047a94, 8047f7e, 8047f7e, 8047f7e, 8047f7e) + 1a elmer@ailbhein:~$ script done on March 18, 2021 at 03:31:31 PM CET
I still have the crash dump and am willing to provide it.
How to reproduce:
Set up a PostgreSQL buildfarm member using this config:
https://www.eenfach.de/~olbohlen/build-farm.conf
Then run it from cron with this crontab entry:
35 0,3,6,9,12,18,22 * * * ( cd /export/home/elmer/c12 && PATH=/usr/gnu/bin:$PATH ./run_branches.pl --run-all --verbose >/export/home/elmer/c12/cron.out 2>&1 )
It will hang on the regression tests for postgresql-HEAD, 32-bit build.
Updated by Olaf Bohlen over 2 years ago
I uploaded a vmdump here: https://www.eenfach.de/~olbohlen/share/vmdump.3
compressed 6.1G, uncompressed 18.1G
The download requires a username and password; send me a /msg on freenode (Agnar) or an email at olbohlen@eenfach.de
Updated by Olaf Bohlen over 2 years ago
- File typescript added
- File typescript.mdb-k added
So, I hit this again. This time I have updated outputs (note: the c12 directory indicates the -m32 build, c12x is the 64-bit build).
This time I have also included kernel stack traces for all relevant processes.
(see attached files)
Updated by Olaf Bohlen over 2 years ago
And of course I have the matching crash dump; let me know if someone is interested in it.
Updated by Dan McDonald almost 2 years ago
Sorry for not seeing this earlier.
In your initial posting, register `rdi` has some STRANGE contents. This suggests to me that you may be seeing a use-after-free bug.
Have you reproduced this using a DEBUG kernel by any chance? DEBUG kernels turn on the full kmem debugging AND all of the ASSERTs fire. A dump from a DEBUG kernel would be most useful, I think.
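To make the `rdi` observation concrete, here is a small sketch that takes the value from the panic message and checks two things: whether it is a canonical x86-64 address, and what its bytes look like as text. It assumes 48-bit (4-level paging) canonical addressing, and the helper names are made up for illustration. It does not claim that `rdi` was the pointer being dereferenced at `pollhead_delete+55`; it only shows why the value does not look like a kernel pointer.

```python
# Register value copied from the panic message in the initial report.
RDI = 0x6f7078012f746f6f


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit canonical form)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


def as_memory_bytes(value: int) -> bytes:
    """Bytes in the order they would sit in little-endian memory."""
    return value.to_bytes(8, "little")


def printable(raw: bytes) -> str:
    """Render bytes as ASCII, replacing non-printable ones with '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    print(f"canonical: {is_canonical(RDI)}")
    print(f"as text:   {printable(as_memory_bytes(RDI))!r}")
```

The value is not canonical, so any dereference of it would raise a #gp, which is consistent with the trap type in the panic. Seven of its eight bytes are printable ASCII, which is what one would expect if string data had been written over memory that once held a pointer.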
Updated by Olaf Bohlen almost 2 years ago
No, unfortunately it's not a debug kernel. If I remember correctly, Andy Fiddaman had more insight into this issue; we chatted about it.
Also I think Robert should know about this one.
Updated by Dan McDonald about 1 year ago
We have OS-5886 in SmartOS that might be this same problem. https://smartos.org/bugview/OS-5886
Updated by Patrick Mooney about 1 year ago
- Subject changed from illumos-ffe7853a48 panics on genunix:pollhead_delete+55 for posgresql-HEAD buildfarm tests to pollhead_delete trips over bad pointer
Updated by Patrick Mooney about 1 year ago
I've confirmed that the program and D script listed in OS-5886 are still able to trigger the race on current bits (OmniOSCE r42) when `kmem_flags` is set to `0xf`, like on DEBUG.
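For readers unfamiliar with `kmem_flags`, the sketch below decodes `0xf` into its individual debugging bits. The bit names and values are written from memory of illumos' `sys/kmem_impl.h`, so treat the table as an assumption and check it against the source tree; `decode_kmem_flags` is a hypothetical helper, not part of any tool.

```python
# Flag bits as I recall them from illumos' sys/kmem_impl.h. This table is an
# assumption: verify it against the source tree you are running.
KMF_BITS = {
    0x1: "KMF_AUDIT",      # record allocation/free transactions
    0x2: "KMF_DEADBEEF",   # fill freed buffers with 0xdeadbeef
    0x4: "KMF_REDZONE",    # guard word after each buffer
    0x8: "KMF_CONTENTS",   # log contents of freed buffers
}


def decode_kmem_flags(flags: int) -> list[str]:
    """Return the names of the known bits set in flags, lowest bit first."""
    names = [name for bit, name in sorted(KMF_BITS.items()) if flags & bit]
    unknown = flags & ~sum(KMF_BITS)  # sum of the keys is the known-bit mask
    if unknown:
        names.append(f"unknown:{unknown:#x}")
    return names


if __name__ == "__main__":
    print(decode_kmem_flags(0xf))
```

On a non-DEBUG kernel, the usual way to get the same setting is a `set kmem_flags=0xf` line in `/etc/system` followed by a reboot. The `KMF_DEADBEEF` bit is the one that matters most for a suspected use-after-free, since a stale pointer then reads a recognizable fill pattern instead of whatever data happens to reuse the buffer.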
Updated by Patrick Mooney about 1 year ago
Since epoll is backed by `/dev/poll`, I ran the epoll test suite on bits before and after the fix was applied to check for a change in behavior. The results were the same. Similarly, the `os-test` suite results were the same before and after.
I used some simplified bug reproducer programs (including the one from OS-5886) to check that at least the known pathological cases were now safe.
For further smoke testing, I ran the tokio test suite, which heavily relies on our epoll emulation for its event handling. The results were the same before and after.
Updated by Patrick Mooney about 1 year ago
Per rm's suggestion, I circled back to test a few more things:
I booted up a system featuring an ipmi device to confirm that `ipmitool`, which appears to poll on the ipmi device, continues to work after the change. (This was checked via `ipmitool sensor` and `ipmitool sel`.)
Regarding event ports, I ran the libuv test suite (in v1.41, prior to its switch to using epoll on illumos), and confirmed that the results were the same before and after the change.
For ctfs, I restarted the `fmd` service while tracing `pollhead_clean`, to see that it had, in fact, cleaned up after itself and successfully restarted. I repeated the same tracing while running `sleep` under `ctrun` in order to confirm similar activity.
Updated by Electric Monk about 1 year ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit 2c76d75129011c98e79463bb84917b828f922a11
commit 2c76d75129011c98e79463bb84917b828f922a11
Author: Patrick Mooney <pmooney@pfmooney.com>
Date: 2022-08-04T15:59:49.000Z

13700 pollhead_delete trips over bad pointer
Reviewed by: Dan McDonald <danmcd@mnx.io>
Reviewed by: Andy Fiddaman <andy@omnios.org>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Robert Mustacchi <rm@fingolfin.org>
Updated by Patrick Mooney about 1 year ago
- Related to Bug #14892: pollhead lifetime too short in signalfd added