Bug #12306
closedXPG4v2 slave pty behaviour should generally be disabled
100%
Description
As part of the work switching OmniOS userland to build with gcc9, and also while working on gcc9 patches for a proposed compiler to build a future illumos-gate, we have hit several problems around the fact that the behaviour of slave PTYs changes as soon as either xpg4-values.o or xpg6-values.o is linked into a program.
The default behaviour in gcc8 and gcc9 is to link in one of these at all times - xpg4 when building with the C90 standard, and xpg6 otherwise.
Both of these values files set the global __xpg4
variable to 1, which modifies the behaviour of libc.
- The
ptem
,ldterm
andttcompat
modules are automatically pushed (Note 1); - A flag is set in
ptem
that slightly modifies the behaviour. In particular, this causes empty mblks to be sent up the stream.
The second action means that a read(2)
call can legitimately return 0. Most applications (including some in gate like zlogin
) treat this as an EOF marker and fail in some way.
I tested this using this simple program:
#include <unistd.h> int main() { write(1, NULL, 0); return 0; }
When accessing a system via an OpenSSH daemon which has xpgX-values linked, running this immediately kills the terminal and truss
shows read() returning 0.
Rebuilding OpenSSH without xpgX-values does not exhibit this problem.
Following discussion on IRC, it appears that this modified behaviour was added to Solaris specifically to resolve an XPG test suite failure (since Solaris/illumos PTYs are STREAMs based, there are extra requirements in the standard; I don't have the detail).
Since most applications do not expect this (and often break), the proposal is to enable this only if an application is built in strict XPG4v2 compliance mode -i.e. if defined(_XPG4_2) && !defined(__EXTENSIONS__)
Note 1: This often caused duplicate modules to be pushed onto slave PTY streams but was resolved in 9042 multiples of tty streams modules cause weirdness
Files
Related issues
Updated by Andy Fiddaman almost 4 years ago
- Related to Bug #9042: multiples of tty streams modules cause weirdness added
Updated by Andy Fiddaman almost 4 years ago
- Related to Bug #12057: Writing part of the string to stderr makes zlogin exit added
Updated by Andy Fiddaman almost 4 years ago
I have posted a review for this change at: https://code.illumos.org/c/illumos-gate/+/377
Testing notes:¶
Original behaviour¶
This is the behaviour of illumos before the change.
I compiled a simple test program (attached - itest.c) in three different ways as follows;
omnios% gcc -std=gnu89 -o itest89 itest.c omnios% nm itest89 | grep values- | awk -F"|" '{print $8}' | paste - - values-Xa.c omnios% gcc -o itest11 itest.c omnios% nm itest11 | grep values- | awk -F"|" '{print $8}' | paste - - values-Xa.c values-xpg6.c omnios% gcc -D_XPG4_2 -D_STRICT_SYMBOLS -o itestxpg itest.c omnios% nm itestxpg | grep values- | awk -F"|" '{print $8}' | paste - - values-Xa.c values-xpg6.c
As expected given the configuration of the default compiler in OmniOS, the second and third files link against values-xpg6 which enables the modified behaviour for slave PTYs.
Running the test programs and entering nul
at the terminal prompt it provides (elided from the output below) gives:
omnios% ./itest89 ptem is NOT already pushed omnios% ./itest11 ptem module already present Received empty block <-- this being the problem that's affecting a lot of third-party software omnios% ./itestxpg ptem module already present Received empty block
I also ran a simple benchmark, opening and closing /dev/null
100,000 times since I expected that linking xpg6-values would add measurable overhead to every open(2)
call; it does and these figures were reproducible on my test system:
omnios% ptime ./bench89 (values-Xa) real 0.676696399 user 0.086918768 sys 0.589299880 omnios% ptime ./bench11 (values-Xa values-xpg6) real 1.048770457 user 0.183738201 sys 0.864532242
I also checked the flags on the ptem module pushed onto the pty that I was using for my login:
omnios% nm /usr/sbin/sshd | grep values- | awk -F"|" '{print $8}' | paste - - values-Xa.c values-xpg6.c omnios% ptree $$ 397 /usr/sbin/sshd 985 /usr/sbin/sshd -R 987 /usr/sbin/sshd -R 988 -zsh 6425 ptree 988 omnios% pfexec mdb -k > 0t988::pid2proc | ::pfiles FD TYPE VNODE INFO 0 CHR fffffe16ec9c1940 /dev/pts/1 1 CHR fffffe16ec9c1940 /dev/pts/1 2 CHR fffffe16ec9c1940 /dev/pts/1 10 CHR fffffe16ec9c1940 /dev/pts/1 11 DOOR fffffe16efeef500 /var/run/name_service_door [door to 'nscd' (proc=fffffe16eb404010)] > fffffe16ec9c1940::print vnode_t v_stream | ::stream +-----------------------+-----------------------+ | ^ v | +-----------------------+-----------------------+ | 0xfffffe16edbf3e18 | 0xfffffe16edbf3d20 | | ptem | ptem | | | | | cnt = 0t0 | cnt = 0t0 | | flg = 0x40000822 | flg = 0x40000832 | +-----------------------+-----------------------+ > 0xfffffe16edbf3e18::print queue_t q_ptr | ::print struct ptem state state = 0x4 #define IS_PTSTTY 0x4 /* is x/open terminal */
and, finally I ran a simple program that just does write(1, NULL, 0);
, and confirmed that this causes the terminal to hang.
NB: many real-world things cause these hangs. Building any go
software seems to be a particular trigger!
New behaviour¶
Using the same three builds of the test program (recompiled on the updated system so that the _XPG4_2 flag is honoured) I now see that the modified behaviour only occurs when built with this flag:
omnios% ./itest89 ptem is NOT already pushed omnios% ./itest11 ptem is NOT already pushed omnios% ./itestxpg ptem module already present Received empty block
As this change also moves the extra processing for open(2)
into the kernel, and is able to short-circuit the checks on whether it should be applied by limiting it to STREAMS and then only those for which unlockpt(3c)
has been called, the extra overhead that comes from linking with values-xpg6 is also gone:
omnios% ptime ./bench89 (values-Xa) real 0.680232059 user 0.088197490 sys 0.591368248 omnios% ptime ./bench11 (values-Xa values-xpg6) real 0.680204383 user 0.088167635 sys 0.591349650
The program that does write(1, NULL, 0);
no longer causes a terminal hang, and nor does building go software.
The IS_PTSTTY flag is no-longer set on the ptem module pushed on my login PTY (the sshd has not been recompiled)
omnios% nm /usr/sbin/sshd | grep values- | awk -F"|" '{print $8}' | paste - - values-Xa.c values-xpg6.c omnios% ptree $$ 434 /usr/sbin/sshd 1268 /usr/sbin/sshd -R 1270 /usr/sbin/sshd -R 1271 -zsh 11661 ptree 1271 omnios% pfexec mdb -k > 0t1271::pid2proc | ::pfiles FD TYPE VNODE INFO 0 CHR fffffe16f4abfe40 /dev/pts/2 1 CHR fffffe16f4abfe40 /dev/pts/2 2 CHR fffffe16f4abfe40 /dev/pts/2 10 CHR fffffe16f4abfe40 /dev/pts/2 11 DOOR fffffe16f0528600 /var/run/name_service_door [door to 'nscd' (proc=fffffe16ec696040)] > fffffe16f4abfe40::print vnode_t v_stream | ::stream +-----------------------+-----------------------+ | ^ v | +-----------------------+-----------------------+ | 0xfffffe16fbe238e8 | 0xfffffe16fbe237f0 | | ptem | ptem | | | | | cnt = 0t0 | cnt = 0t0 | | flg = 0x40000822 | flg = 0x40000832 | +-----------------------+-----------------------+ > 0xfffffe16fbe238e8::print queue_t q_ptr | ::print struct ptem state state = 0I have done additional testing with:
- screen
- tmux
- midnight commander
to confirm that they are no longer affected by the `write(1, NULL, 0);` program, and do not exit when handling signals like SIGINTR.
Updated by Electric Monk over 3 years ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 7d8deab2c421c563ab11a55e623ed48109e237af
commit 7d8deab2c421c563ab11a55e623ed48109e237af Author: Andy Fiddaman <omnios@citrus-it.co.uk> Date: 2020-03-05T19:45:24.000Z 12306 XPG4v2 slave pty behaviour should generally be disabled Reviewed by: Robert Mustacchi <rm@fingolfin.org> Reviewed by: John Levon <john.levon@joyent.com> Approved by: Dan McDonald <danmcd@joyent.com>
Updated by Andy Fiddaman over 3 years ago
- Related to Bug #12779: mdb str_flags is missing STRXPG4TTY added