Project

General

Profile

Actions

Bug #13839

open

sockfs should improve its inode generation

Added by Jason King 16 days ago. Updated 16 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
kernel
Start date:
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

I had a system where netstat -u was showing two different processes listening on port 80. While one of these processes was expected to do so, the other was not (thus surprise). Running pfiles against each process however did not show the unexpected process listening on port 80. Looking closer, it turned out both listening sockets had the same major/minor number and the same inode value:

# ls -li /proc/254/fd/19 /proc/233/fd/6
     40812 s---------   0 root     root           0 Jan  1  1970 /proc/233/fd/6
     40812 s---------   0 root     root           0 Jan  1  1970 /proc/254/fd/19
# pfiles 254
...
  19: S_IFSOCK mode:0666 dev:563,0 ino:40812 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET6 ::  port: 80
...
# pfiles 233
...
   6: S_IFSOCK mode:0666 dev:563,0 ino:40812 uid:0 gid:0 rdev:0,0
      O_RDWR|O_NONBLOCK FD_CLOEXEC
        SOCK_STREAM
        SO_REUSEADDR,SO_SNDBUF(400000),SO_RCVBUF(400000)
        sockname: AF_INET x.x.x.x  port: 2380
...

Looking closer, using mdb, I verified that each file_t and vnode_t were different:

>  0t233::pid2proc | ::pfiles -f ! awk 'NR==1 { print } $2 == 6 { print }'
            FILE   FD     FLAG            VNODE     OFFSET             CRED  CNT
fffffe16f7d93750    6       83 fffffe17091bf580          0 fffffe16e440b178    1
>  0t254::pid2proc | ::pfiles -f ! awk 'NT==1 { print} $2 == 19 { print }'
fffffe17076e28e0   19       83 fffffe1707254540          0 fffffe16e440b008    1

Then looking at how inode values are assigned to sockfs vnodes, in prgetfdinfo(), we see

    if (VOP_GETATTR(vp, &vattr, 0, file_cred, NULL) == 0) {
          fdinfo->pr_major = getmajor(vattr.va_fsid);
          fdinfo->pr_minor = getminor(vattr.va_fsid);
          fdinfo->pr_rmajor = getmajor(vattr.va_rdev);
          fdinfo->pr_rminor = getminor(vattr.va_rdev);
          fdinfo->pr_ino = (ino64_t)vattr.va_nodeid;

We then see the following in socket_vop_getattr():

     /*
       * Calculate the amount of bitshift to a sonode pointer which will
       * still keep it unique.  See below.
       */
      if (sonode_shift == 0)
          sonode_shift = highbit(sizeof (struct sonode));
      ASSERT(sonode_shift > 0);

      so = VTOSO(vp);
...
     /*
       * If the va_nodeid is > MAX_USHORT, then i386 stats might fail.
       * So we shift down the sonode pointer to try and get the most
       * uniqueness into 16-bits.
       */
      vap->va_nodeid = ((ino_t)so >> sonode_shift) & 0xFFFF;
      vap->va_nlink = 0;
      vap->va_size = 0;

Looking more in mdb:

> sonode_shift.18423/d
sonode_shift.18423:
sonode_shift.18423:             10
> fffffe1707254540::print vnode_t v_data
v_data = 0xfffffe17027db010
> fffffe17091bf580::print vnode_t v_data
v_data = 0xfffffe17027db290

Right shifting those values 10 bits yields the same lower 16-bits, causing the same inode value.

We should utilize a better strategy for generating inode numbers for sockfs vnodes.

Actions #1

Updated by Andy Fiddaman 16 days ago

At first glance, the strategy seems like it might be reasonable if we need to generate a 16-bit ID, but it is shifting by one bit too many.

> ::sizeof struct sonode
sizeof (struct sonode) = 0x290
> ::sizeof struct sonode | =j
                1010010000
                | |  |
                | |  +------ bit 4 mask 0x010
                | +--------- bit 7 mask 0x080
                +----------- bit 9 mask 0x200

> 0xfffffe17027db010 >> 9 =x
                3ed8
> 0xfffffe17027db290 >> 9 =x
                3ed9
Actions #2

Updated by Jason King 16 days ago

My guess is that the original author probably forgot that highbit() returns values 1-32, and 0 for no bits set, so the result is off by one (to use as a shift value).

However I'm not sure that the 16-bit inode requirement has been true since at least the mid-90s, and likely systems like a SysV vestige (and likely was never a Solaris limitation). I suspect at minimum using 32-bits should be fine.

Actions

Also available in: Atom PDF