Project

General

Profile

Actions

Bug #13839

closed

sockfs should improve its inode generation

Added by Jason King 5 months ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

I had a system where netstat -u was showing two different processes listening on port 80. While one of these processes was expected to do so, the other was not (thus surprise). Running pfiles against each process however did not show the unexpected process listening on port 80. Looking closer, it turned out both listening sockets had the same major/minor number and the same inode value:

# ls -li /proc/254/fd/19 /proc/233/fd/6
     40812 s---------   0 root     root           0 Jan  1  1970 /proc/233/fd/6
     40812 s---------   0 root     root           0 Jan  1  1970 /proc/254/fd/19
# pfiles 254
...
  19: S_IFSOCK mode:0666 dev:563,0 ino:40812 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET6 ::  port: 80
...
# pfiles 233
...
   6: S_IFSOCK mode:0666 dev:563,0 ino:40812 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET x.x.x.x  port: 2380
...

Looking closer, using mdb, I verified that each file_t and vnode_t were different:

>  0t233::pid2proc | ::pfiles -f ! awk 'NR==1 { print } $2 == 6 { print }'
            FILE   FD     FLAG            VNODE     OFFSET             CRED  CNT
fffffe16f7d93750    6       83 fffffe17091bf580          0 fffffe16e440b178    1
>  0t254::pid2proc | ::pfiles -f ! awk 'NT==1 { print} $2 == 19 { print }'
fffffe17076e28e0   19       83 fffffe1707254540          0 fffffe16e440b008    1

Then looking at how inode values are assigned to sockfs vnodes, in prgetfdinfo(), we see

    if (VOP_GETATTR(vp, &vattr, 0, file_cred, NULL) == 0) {
          fdinfo->pr_major = getmajor(vattr.va_fsid);
          fdinfo->pr_minor = getminor(vattr.va_fsid);
          fdinfo->pr_rmajor = getmajor(vattr.va_rdev);
          fdinfo->pr_rminor = getminor(vattr.va_rdev);
          fdinfo->pr_ino = (ino64_t)vattr.va_nodeid;

We then see the following in socket_vop_getattr():

     /*
       * Calculate the amount of bitshift to a sonode pointer which will
       * still keep it unique.  See below.
       */
      if (sonode_shift == 0)
          sonode_shift = highbit(sizeof (struct sonode));
      ASSERT(sonode_shift > 0);

      so = VTOSO(vp);
...
     /*
       * If the va_nodeid is > MAX_USHORT, then i386 stats might fail.
       * So we shift down the sonode pointer to try and get the most
       * uniqueness into 16-bits.
       */
      vap->va_nodeid = ((ino_t)so >> sonode_shift) & 0xFFFF;
      vap->va_nlink = 0;
      vap->va_size = 0;

Looking more in mdb:

> sonode_shift.18423/d
sonode_shift.18423:
sonode_shift.18423:             10
> fffffe1707254540::print vnode_t v_data
v_data = 0xfffffe17027db010
> fffffe17091bf580::print vnode_t v_data
v_data = 0xfffffe17027db290

Right shifting those values 10 bits yields the same lower 16-bits, causing the same inode value.

We should utilize a better strategy for generating inode numbers for sockfs vnodes.

Actions #1

Updated by Andy Fiddaman 5 months ago

At first glance, the strategy seems like it might be reasonable if we need to generate a 16-bit ID, but it is shifting by one bit too many.

> ::sizeof struct sonode
sizeof (struct sonode) = 0x290
> ::sizeof struct sonode | =j
                1010010000
                | |  |
                | |  +------ bit 4 mask 0x010
                | +--------- bit 7 mask 0x080
                +----------- bit 9 mask 0x200

> 0xfffffe17027db010 >> 9 =x
                3ed8
> 0xfffffe17027db290 >> 9 =x
                3ed9
Actions #2

Updated by Jason King 5 months ago

My guess is that the original author probably forgot that highbit() returns values 1-32, and 0 for no bits set, so the result is off by one (to use as a shift value).

However I'm not sure that the 16-bit inode requirement has been true since at least the mid-90s, and likely systems like a SysV vestige (and likely was never a Solaris limitation). I suspect at minimum using 32-bits should be fine.

Actions #3

Updated by Electric Monk 4 months ago

  • Gerrit CR set to 1561
Actions #4

Updated by Jason King 4 months ago

To test, I tried (using mdb) to identify a number of sockets that without the fix, would generate identical inode values:

> ::walk socket_cache | ::print struct sonode so_vnode | ::eval .>>0t10&ffff=J ! sort | uniq -d
                26c3
                488
                5695
                a9c8
                bb67
                e81a
                e974
                fc1e
                fc35
                fd68
                fdf2

(the idea being to take the address of the socket's vnode_t, shift right 10 bits, then sort the addresses and print the duplicate values).

Picking one of the values (fc1e) -- at the time of testing #13867 hadn't been integrated (it was the testing of this issue that led to the discovery #13867 ), so a duplicated value that represented two TCP sockets was chosen.

Using mdb, the expected inode values (after the fix) were obtained:

>  ::walk socket_cache s| ::print struct sonode so_vnode | ::eval .>>0t10&ffff=J | ::grep .==fc1e | ::eval <s=J | ::eval .>>0t9&ffffffff=U
                193198170
                193198171
>

Then their existence was confirmed via a search in /proc:

# find /proc -type s \( -inum 193198170 -o -inum 193198171 \)
/proc/284/fd/17
/proc/284/fd/18

Examining the process with pfiles confirmed they were two different sockets listening on two different ports:

# pfiles 284
...
  17: S_IFSOCK mode:0666 dev:563,0 ino:193198171 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET6 ::  port: 8443
  18: S_IFSOCK mode:0666 dev:563,0 ino:193198170 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET6 ::  port: 443
...

Finally, running netstat -aun confirmed that the output was now correct:

# netstat -aun | grep 443
      *.8443               *.*            root        284 xxxxxxx             0      0 400000      0 LISTEN
      *.443                *.*            root        284 xxxxxxx             0      0 400000      0 LISTEN
Actions #5

Updated by Electric Monk 4 months ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit e7434800e5d3d9760f83932621a63a9b102dee76

commit  e7434800e5d3d9760f83932621a63a9b102dee76
Author: Jason King <jason.brian.king@gmail.com>
Date:   2021-06-24T20:29:15.000Z

    13839 sockfs should improve its inode generation
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Approved by: Joshua M. Clulow <josh@sysmgr.org>

Actions

Also available in: Atom PDF