Bug #12306

XPG4v2 slave pty behaviour should generally be disabled

Added by Andy Fiddaman 5 days ago. Updated 3 days ago.

Status: In Progress
Priority: Normal
Assignee:
Category: lib - userland libraries
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:

Description

As part of the work switching OmniOS userland to build with gcc9, and also while working on gcc9 patches for a proposed compiler to build a future illumos-gate, we have hit several problems around the fact that the behaviour of slave PTYs changes as soon as either xpg4-values.o or xpg6-values.o is linked into a program.

The default behaviour in gcc8 and gcc9 is to link in one of these at all times - xpg4 when building with the C90 standard, and xpg6 otherwise.

Both of these values files set the global __xpg4 variable to 1, which modifies the behaviour of libc.

In particular, this causes the following additional actions when a slave PTY is opened through the open(2) call:
  • The ptem, ldterm and ttcompat modules are automatically pushed (Note 1);
  • A flag is set in ptem that slightly modifies the behaviour. In particular, this causes empty mblks to be sent up the stream.

The second action means that a read(2) call can legitimately return 0. Most applications (including some in gate like zlogin) treat this as an EOF marker and fail in some way.
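To illustrate why a zero-length read is a problem, here is a minimal sketch of the kind of read loop many applications use. It is not taken from zlogin or any other program named here, and count_until_zero_read() is a hypothetical name. The loop stops as soon as read(2) returns 0, so an empty message on a still-open PTY would end it just as a real EOF would.

```c
#include <unistd.h>

/*
 * Typical consumer loop: keep reading until read(2) returns 0, which is
 * assumed to mean end-of-file. Returns the number of bytes consumed, or
 * -1 if read(2) reported an error.
 * count_until_zero_read() is a hypothetical name for illustration only.
 */
long
count_until_zero_read(int fd)
{
	char buf[512];
	long total = 0;
	ssize_t n;

	while ((n = read(fd, buf, sizeof (buf))) > 0)
		total += n;

	/* n == 0 is treated as EOF here; the loop cannot tell it apart
	 * from a zero-length message delivered on an open stream. */
	return (n < 0 ? -1 : total);
}
```

On an ordinary pipe the loop ends only once the writer has closed its end; with the modified PTY behaviour it would presumably end on the first empty block instead.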

I tested this using this simple program:

#include <unistd.h>

int main(void) {
    /* Zero-length write to stdout (fd 1). */
    write(1, NULL, 0);
    return 0;
}

When accessing a system via an OpenSSH daemon which has xpgX-values linked, running this immediately kills the terminal and truss shows read() returning 0.
Rebuilding OpenSSH without xpgX-values does not exhibit this problem.

Following discussion on IRC, it appears that this modified behaviour was added to Solaris specifically to resolve an XPG test suite failure (since Solaris/illumos PTYs are STREAMS-based, there are extra requirements in the standard; I don't have the detail).

Since most applications do not expect this (and often break), the proposal is to enable this only if an application is built in strict XPG4v2 compliance mode, i.e.
if defined(_XPG4_2) && !defined(__EXTENSIONS__)
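As a rough illustration of that condition, the sketch below evaluates it at compile time. It only shows how the two macros combine; it is not the change under review, and strict_xpg4v2() is a hypothetical helper name.

```c
/*
 * Returns 1 when the translation unit was built in strict XPG4v2 mode
 * as described in the proposal, and 0 otherwise.
 * strict_xpg4v2() is a hypothetical name used only for illustration.
 */
int
strict_xpg4v2(void)
{
#if defined(_XPG4_2) && !defined(__EXTENSIONS__)
	return (1);	/* strict mode: modified slave PTY behaviour wanted */
#else
	return (0);	/* default build: ordinary slave PTY behaviour */
#endif
}
```

Building with -D_XPG4_2 and without __EXTENSIONS__ defined should select the first branch; any other combination selects the second.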

Note 1: This often caused duplicate modules to be pushed onto slave PTY streams, but that was resolved in #9042 (multiples of tty streams modules cause weirdness).


Files

itest.c (2.18 KB) itest.c pty test program Andy Fiddaman, 2020-02-16 11:23 AM

Related issues

Related to illumos gate - Bug #9042: multiples of tty streams modules cause weirdness (Closed, 2018-02-05)
Related to illumos gate - Bug #12057: Writing part of the string to stderr makes zlogin exit (Closed)

History

#1

Updated by Andy Fiddaman 5 days ago

  • Related to Bug #9042: multiples of tty streams modules cause weirdness added

#2

Updated by Andy Fiddaman 5 days ago

  • Related to Bug #12057: Writing part of the string to stderr makes zlogin exit added

#3

Updated by Andy Fiddaman 3 days ago

I have posted a review for this change at: https://code.illumos.org/c/illumos-gate/+/377

Testing notes:

Original behaviour

This is the behaviour of illumos before the change.

I compiled a simple test program (attached - itest.c) in three different ways, as follows:

omnios% gcc -std=gnu89 -o itest89 itest.c
omnios% nm itest89 | grep values- | awk -F"|" '{print $8}' | paste - -
values-Xa.c

omnios% gcc -o itest11 itest.c
omnios% nm itest11 | grep values- | awk -F"|" '{print $8}' | paste - -
values-Xa.c     values-xpg6.c

omnios% gcc -D_XPG4_2 -o itestxpg itest.c
omnios% nm itestxpg | grep values- | awk -F"|" '{print $8}' | paste - -
values-Xa.c     values-xpg6.c

As expected given the configuration of the default compiler in OmniOS, the second and third files link against values-xpg6 which enables the modified behaviour for slave PTYs.

Running the test programs and entering nul at the terminal prompt it provides (elided from the output below) gives:

omnios% ./itest89
ptem is NOT already pushed

omnios% ./itest11
ptem module already present
Received empty block            <-- this being the problem that's affecting a lot of third-party software

omnios% ./itestxpg
ptem module already present
Received empty block

I also ran a simple benchmark that opens and closes /dev/null 100,000 times, since I expected that linking xpg6-values would add measurable overhead to every open(2) call. It does, and these figures were reproducible on my test system:

omnios% ptime ./bench89                 (values-Xa)
real        0.676696399
user        0.086918768
sys         0.589299880

omnios% ptime ./bench11                 (values-Xa values-xpg6)
real        1.048770457
user        0.183738201
sys         0.864532242
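
The benchmark source is not attached to this issue. A minimal sketch of what such a program could look like is shown here, assuming a plain open(2)/close(2) loop; open_close_loop() is a hypothetical name and the real bench89/bench11 programs may differ.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Open and close `path` `iterations` times and return the number of
 * open(2) calls that succeeded.
 * open_close_loop() is a hypothetical name for illustration only.
 */
int
open_close_loop(const char *path, int iterations)
{
	int ok = 0;

	for (int i = 0; i < iterations; i++) {
		int fd = open(path, O_RDONLY);

		if (fd == -1)
			continue;	/* count only successful opens */
		ok++;
		(void) close(fd);
	}
	return (ok);
}
```

Calling open_close_loop("/dev/null", 100000) from main() and running the result under ptime would give timings comparable in shape to those above.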

I also checked the flags on the ptem module pushed onto the pty that I was using for my login:

omnios% nm /usr/sbin/sshd | grep values- | awk -F"|" '{print $8}' | paste - -
values-Xa.c     values-xpg6.c

omnios% ptree $$
397    /usr/sbin/sshd
  985    /usr/sbin/sshd -R
    987    /usr/sbin/sshd -R
      988    -zsh
        6425   ptree 988
omnios% pfexec mdb -k
> 0t988::pid2proc | ::pfiles
FD   TYPE            VNODE INFO
   0  CHR fffffe16ec9c1940 /dev/pts/1
   1  CHR fffffe16ec9c1940 /dev/pts/1
   2  CHR fffffe16ec9c1940 /dev/pts/1
  10  CHR fffffe16ec9c1940 /dev/pts/1
  11 DOOR fffffe16efeef500 /var/run/name_service_door [door to 'nscd' (proc=fffffe16eb404010)]
> fffffe16ec9c1940::print vnode_t v_stream | ::stream

+-----------------------+-----------------------+
            |                       ^
            v                       |
+-----------------------+-----------------------+
| 0xfffffe16edbf3e18    | 0xfffffe16edbf3d20    |
| ptem                  | ptem                  |
|                       |                       |
| cnt = 0t0             | cnt = 0t0             |
| flg = 0x40000822      | flg = 0x40000832      |
+-----------------------+-----------------------+

> 0xfffffe16edbf3e18::print queue_t q_ptr | ::print struct ptem state
state = 0x4

#define IS_PTSTTY       0x4     /* is x/open terminal */

Finally, I ran a simple program that just does write(1, NULL, 0); and confirmed that this causes the terminal to hang.

NB: many real-world programs trigger these hangs. Building any Go software seems to be a particular trigger!

New behaviour

Using the same three builds of the test program (recompiled on the updated system so that the _XPG4_2 flag is honoured) I now see that the modified behaviour only occurs when built with this flag:

omnios% ./itest89
ptem is NOT already pushed

omnios% ./itest11
ptem is NOT already pushed

omnios% ./itestxpg
ptem module already present
Received empty block

Truss shows that open is being called with the expected arguments in each case:

omnios% truss -t open ./itest89 </dev/null 2>&1 | grep pts
open("/dev/pts/3", O_RDWR)                      = 5
omnios% truss -t open ./itest11 </dev/null 2>&1 | grep pts
open("/dev/pts/3", O_RDWR)                      = 5
omnios% truss -t open ./itestxpg </dev/null 2>&1 | grep pts
open("/dev/pts/3", O_RDWR|O_XPG4OPEN)           = 5

This change also moves the extra processing for open(2) into the kernel, where the check on whether it should be applied can short-circuit: it is limited to STREAMS, and then only to those for which unlockpt(3C) has been called. As a result, the extra overhead that comes from linking with values-xpg6 is also gone:

omnios% ptime ./bench89                 (values-Xa)
real        0.680232059
user        0.088197490
sys         0.591368248

omnios% ptime ./bench11                 (values-Xa values-xpg6)
real        0.680204383
user        0.088167635
sys         0.591349650
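
The ordering of those checks can be sketched as follows. This is not the kernel code from the review: the structure, the flag value and all names (DEMO_XPG4OPEN, struct demo_file, demo_wants_xpg4_setup) are hypothetical, and testing the open flag first is an assumption based on the truss output above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for O_XPG4OPEN; the real value is not given here. */
#define	DEMO_XPG4OPEN	0x1

/* Hypothetical summary of the properties the checks look at. */
struct demo_file {
	bool is_stream;		/* file is backed by a STREAM */
	bool pts_unlocked;	/* unlockpt(3C) has been called for it */
};

/*
 * Decide whether the extra XPG4v2 slave PTY processing applies.
 * Cheap tests come first so that an ordinary open(2), such as one on
 * /dev/null, returns after a single comparison.
 */
bool
demo_wants_xpg4_setup(int oflags, const struct demo_file *f)
{
	if ((oflags & DEMO_XPG4OPEN) == 0)
		return (false);	/* caller did not ask for XPG4v2 semantics */
	if (!f->is_stream)
		return (false);	/* not a STREAMS device */
	return (f->pts_unlocked);
}
```

With an arrangement like this, a program that never passes the flag pays almost nothing, which would be consistent with the identical bench89 and bench11 timings.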

The program that does write(1, NULL, 0); no longer causes a terminal hang, nor does building Go software.

The IS_PTSTTY flag is no longer set on the ptem module pushed on my login PTY (sshd has not been recompiled):

omnios% nm /usr/sbin/sshd | grep values- | awk -F"|" '{print $8}' | paste - -
values-Xa.c     values-xpg6.c

omnios% ptree $$
434    /usr/sbin/sshd
  1268   /usr/sbin/sshd -R
    1270   /usr/sbin/sshd -R
      1271   -zsh
        11661  ptree 1271
omnios% pfexec mdb -k
> 0t1271::pid2proc | ::pfiles
FD   TYPE            VNODE INFO
   0  CHR fffffe16f4abfe40 /dev/pts/2
   1  CHR fffffe16f4abfe40 /dev/pts/2
   2  CHR fffffe16f4abfe40 /dev/pts/2
  10  CHR fffffe16f4abfe40 /dev/pts/2
  11 DOOR fffffe16f0528600 /var/run/name_service_door [door to 'nscd' (proc=fffffe16ec696040)]
> fffffe16f4abfe40::print vnode_t v_stream | ::stream

+-----------------------+-----------------------+
            |                       ^
            v                       |
+-----------------------+-----------------------+
| 0xfffffe16fbe238e8    | 0xfffffe16fbe237f0    |
| ptem                  | ptem                  |
|                       |                       |
| cnt = 0t0             | cnt = 0t0             |
| flg = 0x40000822      | flg = 0x40000832      |
+-----------------------+-----------------------+

> 0xfffffe16fbe238e8::print queue_t q_ptr | ::print struct ptem state
state = 0

I have done additional testing with:
  • screen
  • tmux
  • midnight commander

to confirm that they are no longer affected by the write(1, NULL, 0); program, and do not exit when handling signals like SIGINT.

#4

Updated by Andy Fiddaman 3 days ago
