Bug #9052

nss_files exec attr search leaks memory on dlclose

Added by Robert Mustacchi over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Category: lib - userland libraries
Start date: 2018-02-06
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

This was initially discovered when several pfexecd processes had grown rather large, to the point of being 2-4 GiB in size. I started this investigation by looking at two core files from such processes that had been collected over the years. All of the memory was in the heap, which was interesting. Unsure of where else to start, I took a shot in the dark and ran the strings program on the cores.

What I discovered was that the 4 GiB core contained over 200,000 copies of /etc/security/exec_attr.

To make forward progress, I wrote the following D script to gather data if this was hit again:

/*
 * Try and track down where pfexecd is leaking and what's going on.
 */

pid$target::malloc:entry,
pid$target::free:entry,
pid$target::mmap:entry,
pid$target::munmap:entry,
pid$target::realloc:entry
{
        @[probefunc, ustack()] = count();
}

pid$target::callback_pfexec:entry,
pid$target::callback_forced_privs:entry,
pid$target::callback_user_privs:entry
{
        @pfexec[probefunc] = count();
}

tick-5m
{
        printf("Allocations\n");
        printa(@);
        printf("Callback counts\n");
        printa(@pfexec);
        exit(0);
}

Based on this, we were able to get a bunch of interesting data. Processing the data provided in SWSUP-946 turned up a few interesting things. The first thing I noticed was that ld.so.1 and nss_files.so were performing the allocations. This told me that we weren't talking to nscd and were instead performing the lookups locally.

I then compared the allocations and frees:

$ awk '$0 ~ /^  [a-z]/ { CURRENT=$1; } $1 ~ /^[0-9][0-9]*$/ { counts[CURRENT] += $1; } END { for (name in counts) { print name, counts[name]; } }' < leak.out 
malloc 305
free 295
munmap 20

The munmap calls were uninteresting, because they were related to shared libraries. However, the difference between the malloc and free counts was interesting. The next thing I tried was to line up the malloc and free stacks with each other. Most of those coming from pfexecd and from libld.so.1 matched up. That left the following:

  free                                              
              libc.so.1`free
              0xfe913733
              libc.so.1`_nss_src_state_destr+0x41
              libc.so.1`_nss_db_state_destr+0x65
              libnsl.so.1`_getexecprof+0x30b
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              0xfe913733
              libc.so.1`_nss_src_state_destr+0x41
              libc.so.1`_nss_db_state_destr+0x65
              libnsl.so.1`_getexecprof+0x321
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              libc.so.1`fclose+0xc4
              0xfe913691
              0xfe913a55
              0xfe916a24
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ad
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              0xfe917175
              0xfe9171bd
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              0xfe9136ab
              0xfe913a55
              0xfe916a24
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ad
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              0xfe9136ab
              0xfe91708c
              0xfe9171a6
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10

  malloc                                            
              libc.so.1`malloc
              0xfe913819
              0xfe916a24
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ad
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              libc.so.1`calloc+0x59
              0xfe916b77
              0xfe9171a6
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              libc.so.1`strdup+0x2e
              0xfe91711a
              0xfe9171bd
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              libc.so.1`_findbuf+0x10b
              libc.so.1`_filbuf+0x51
              0xfe9135cd
              0xfe91387e
              0xfe916a24
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ad
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              0xfe916cda
              0xfe9171a6
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              0xfe9134e4
              0xfe916b2e
              libc.so.1`nss_get_backend_u+0x101
              libc.so.1`nss_search+0xf9
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  malloc                                            
              libc.so.1`malloc
              0xfe9134e4
              0xfe9169f2
              libc.so.1`nss_get_backend_u+0x101
              libc.so.1`nss_search+0xf9
              libnsl.so.1`_getexecprof+0x2ad
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               10
  free                                              
              libc.so.1`free
              0xfe9136ab
              0xfe91708c
              0xfe91713f
              0xfe9171bd
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               20

  malloc                                            
              libc.so.1`malloc
              libc.so.1`calloc+0x59
              0xfe916b77
              0xfe91713f
              0xfe9171bd
              libc.so.1`nss_search+0x1df
              libnsl.so.1`_getexecprof+0x2ef
              libsecdb.so.1`getexecprof+0xb8
              pfexecd`callback_forced_privs+0x41
              pfexecd`callback+0x4f
              libc.so.1`__door_return+0x4b
               20
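
As a sanity check, the counts on the remaining stacks can be totalled per function. The following is a minimal sketch, not something that was run as part of this investigation: the function name net_allocs is hypothetical, and it assumes the listing is in the same DTrace aggregation layout that the earlier awk one-liner consumed.

```shell
# Print (malloc count - free count) for DTrace aggregation output on stdin.
# Expected layout: the probe function indented by two spaces, the stack
# frames indented further, then a line holding only the count.
net_allocs() {
    awk '
        /^  [a-z]/ { current = $1 }                  # "  malloc" or "  free"
        NF == 1 && $1 ~ /^[0-9][0-9]*$/ { counts[current] += $1 }
        END { print counts["malloc"] - counts["free"] }
    '
}

# Hypothetical usage, with the stacks above saved to a file:
#   net_allocs < remaining.out
```

On the stacks listed above this works out to 90 mallocs against 80 frees, the same imbalance of 10 as in the overall tally.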

From here, I wasn't 100% sure where to go next. I took a guess about what the common module would be here and compared that with a local reproduction. This pointed me to the exec attribute related fetching in nss_files.so, in particular the function _exec_files_XY_all. There are two allocations that happen explicitly in this function. One is associated with the backend; however, the other is much more nefarious.

The _exec_files_XY_all function has a static char * local variable. The fact that it's static is rather pertinent. What ends up happening is that _exec_files_XY_all uses that local variable to cache the current copy of the /etc/security/exec_attr file. However, because it is trying to cache this and keep it around, nothing ever makes sure that it gets freed.

So with this in mind, here's what I believed actually happened in this system:

  • We get a callback_forced_privs() request to pfexecd
  • pfexecd tries to ask nss (through libsecdb) for the exec_attr
  • Because nscd isn't loaded, nss_files.so gets dlopened
  • nss_files.so performs the operation, loading a copy of /etc/security/exec_attr into the function static variable and finishes
  • libc dlcloses the nss_files.so module
  • The copy of exec_attr in nss has been leaked

This corresponds to the evidence in the core file where we have hundreds of thousands of copies of this one file resident in the core.

Thus, the fundamental issue is that we have function-scoped static data that cannot be freed. I suspect this was never noticed because nscd is usually enabled by default and would never unmap things, so there the cached copy simply stays around.

There are still a few mysteries I'm trying to unravel. It's not 100% clear why I don't trigger this myself in a zone. But I think this at least explains what's going on and why we don't normally notice this with nscd, and it matches the data we found in the cores.

To test this I set up a few shells in advance. Importantly, all the actions listed below should be taken as the root user. Using pfexec to elevate privileges based on a profile will load the nss_files backend because we go through a different path.

To reproduce this:

  1. svcadm restart name-service-cache
  2. svcadm restart pfexec
  3. Use pfexec normally once, for example pfexec ls, to get a basic set of libraries loaded.
  4. In an existing root shell, run svcadm disable name-service-cache.
  5. Take a pmap via pmap $(pgrep pfexecd) and then from another root shell run su - bin.
  6. With each iteration, check the size of the heap. There will be some growth the first time this is run because nscd is disabled. On subsequent runs the heap keeps growing unless the fix is in place.
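
The heap check in step 6 can be made less error-prone with a small helper. The following is a minimal sketch and not part of the original test procedure: the function name heap_kib is hypothetical, and it assumes that pmap labels the heap mapping "[ heap ]" and reports sizes with a K suffix.

```shell
# Sum the size, in KiB, of every "[ heap ]" mapping in pmap output on stdin.
# Assumes the second column is the mapping size, e.g. "1024K".
heap_kib() {
    awk '
        /\[ heap \]/ { sub(/K$/, "", $2); total += $2 }
        END { print total + 0 }
    '
}

# Hypothetical usage, as root, with nscd already disabled:
#   before=$(pmap $(pgrep pfexecd) | heap_kib)
#   (run "su - bin" from another root shell)
#   after=$(pmap $(pgrep pfexecd) | heap_kib)
#   echo "heap grew by $((after - before)) KiB"
```

Comparing the value before and after each su - bin shows whether the heap is still growing.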

What I found was that with a fixed nss library in place, the heap no longer grew. Note that you won't see the leak if nss_files is already mapped in, because then the dlopen/dlclose won't do the full relocation dance. I also tested this by setting up coal and making sure that the primary user (manatee) came up correctly.

History

#1

Updated by Electric Monk over 1 year ago

  • Status changed from New to Closed

git commit d1e6ffe20e2ccf6a38cfbe4d1bc3a20b92d37fd1

commit  d1e6ffe20e2ccf6a38cfbe4d1bc3a20b92d37fd1
Author: Robert Mustacchi <rm@joyent.com>
Date:   2018-02-20T20:00:37.000Z

    9052 nss_files exec attr search leaks memory on dlclose
    Reviewed by: Tim Kordas <tim.kordas@joyent.com>
    Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
    Reviewed by: Dan McDonald <danmcd@joyent.com>
    Reviewed by: Richard Lowe <richlowe@richlowe.net>
    Approved by: Gordon Ross <gordon.ross@nexenta.com>
