Project

General

Profile

Actions

Bug #5533

open

LOADPRIVDATA macro could return NULL on memory stress

Added by Serghei Samsi over 8 years ago. Updated over 7 years ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
lib - userland libraries
Start date:
2015-01-13
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage
Gerrit CR:
External Bug:

Description

During memory stress condition,
cron service is dumping core multiple times.

mdb ./cron.webmail_mtc.20110.1413125400.core 
Loading modules: [ libc.so.1 ld.so.1 ]
> $C
fb5ac7d8 libc_hwcap1.so.1`ucred_size+0x1b(8d0bad0, f6173000, fb5ac7f8, f6088733, 8d0bad0, f5fd2000)
fb5ac7e8 libc_hwcap1.so.1`_ucred_alloc+0x19(8d0bad0, f5fd2000, fb5ac828, f5fae7ba, ffffffff, f5f90018)
fb5ac7f8 libc_hwcap1.so.1`ucred_get+0x19(ffffffff, f5f90018, f60d8df5, f5fae79f)
fb5ac828 libbsm.so.1`adt_init+0x8e(8d0bad0, 1, 8ce83b8, f5facfd5)
fb5ac868 libbsm.so.1`adt_start_session+0x62(fb5ac8b8, 0, 1, f5d21a18)
fb5b2178 pam_unix_cred.so.1`pam_sm_setcred+0x211(8d0a828, 1, 0, 0)
fb5b21e8 libpam.so.1`run_stack+0x5e5(8d0a828, 0, 1, 1, 10, 2)
fb5b2218 libpam.so.1`pam_setcred+0x31(8d0a828, 1, 806b430, 806e04c)
fb5b2238 set_user_cred+0xa1(8ce6098, 0, fb5b4338, 8056fda)
fb5b4338 ex+0x3d6(8cea720, fb5b4380, fb5b4428, 8053ab7)
fb5b4428 main+0x3a6(1, fb5b4460, fb5b4468, fb5b441c)
fb5b4454 _start+0x7d(1, fb5b4524, 0, fb5b4533, fb5b454a, fb5b4551)

> ucred_size+0x1b::dis
libc_hwcap1.so.1`ucred_size:    pushl  %ebp
libc_hwcap1.so.1`ucred_size+1:  movl   %esp,%ebp
libc_hwcap1.so.1`ucred_size+3:  pushl  %ebx
libc_hwcap1.so.1`ucred_size+4:  subl   $0x4,%esp
libc_hwcap1.so.1`ucred_size+7:  andl   $0xfffffff0,%esp
libc_hwcap1.so.1`ucred_size+0xa:call   +0x0     <libc_hwcap1.so.1`ucred_size+0xf>
libc_hwcap1.so.1`ucred_size+0xf:popl   %ebx
libc_hwcap1.so.1`ucred_size+0x10:       addl   $0xea386,%ebx
libc_hwcap1.so.1`ucred_size+0x16:       call   -0xa2c8  <libc_hwcap1.so.1`__priv_getdata>
libc_hwcap1.so.1`ucred_size+0x1b:       movl   0xc(%eax),%eax
libc_hwcap1.so.1`ucred_size+0x1e:       leal   -0x4(%ebp),%esp
libc_hwcap1.so.1`ucred_size+0x21:       popl   %ebx
libc_hwcap1.so.1`ucred_size+0x22:       leave  
libc_hwcap1.so.1`ucred_size+0x23:       ret    
0xf6088c8f:                     nop    
0xf6088c90:                     nop    
0xf6088c91:                     nop    
0xf6088c92:                     nop    
0xf6088c93:                     nop    
0xf6088c94:                     nop    
> $r
%cs = 0x0043            %eax = 0x00000000 
%ds = 0x004b            %ebx = 0xf6173000 
%ss = 0x004b            %ecx = 0xf6174ce0 libc_hwcap1.so.1`pd_lock
%es = 0x004b            %edx = 0x00000000 
%fs = 0x0000            %esi = 0x08d0bad0 
%gs = 0x01c3            %edi = 0x00000001 

 %eip = 0xf6088c86 libc_hwcap1.so.1`ucred_size+0x1b
 %ebp = 0xfb5ac7d8
%kesp = 0x00000000

%eflags = 0x00010286
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=<of,df,IF,tf,SF,zf,af,PF,cf>

   %esp = 0xfb5ac7d0
%trapno = 0xe
   %err = 0x4

Problem piece of code is here (file /illumos-gate/usr/src/lib/libc/port/gen/ucred.c):

369size_t
370ucred_size(void)
371{
372    priv_data_t *d;
373
374    LOADPRIVDATA(d);
375
376    return (d->pd_ucredsize);
377}
378

We should add NULL test for priv_data_t pointer after line 376 like that:

376 return ((d != NULL) ? d->pd_ucredsize : 0);

Then ucred_alloc() will fail as expected (due to malloc(0)).

Another place in the same file:

213const priv_set_t *
214ucred_getprivset(const ucred_t *uc, priv_ptype_t set)
215{
216    /* LINTED: alignment */
217    const prpriv_t *pr = UCPRIV(uc);
218    int pset = priv_getsetbyname(set);
219    priv_data_t *d;
220
221    if (pr == NULL || pset == -1) {
222        errno = EINVAL;
223        return (NULL);
224    }
225
226    LOADPRIVDATA(d);
227
228    return ((const priv_set_t *)
229        &pr->pr_sets[d->pd_pinfo->priv_setsize * pset]);
230}

It seems that all places with LOADPRIVDATA macro should be fixed too.


Files

Actions #1

Updated by Serghei Samsi over 8 years ago

  • Assignee changed from Serghei Samsi to Marcel Telka
Actions #2

Updated by Marcel Telka over 8 years ago

  • Category set to lib - userland libraries
  • Status changed from New to In Progress
Actions #3

Updated by Marcel Telka over 8 years ago

The similar issue is with GETPRIVDATA macro too.

Actions #4

Updated by Marcel Telka over 8 years ago

Usually, the LOADPRIVDATA/GETPRIVDATA macros are used directly or indirectly in various API functions. There are two sorts of LOADPRIVDATA/GETPRIVDATA users:

I) those who could fail, because they have defined the failure modes of operation, for example getppriv(2)/setppriv(2)

To fix these, we will just fail in a case either the LOADPRIVDATA or GETPRIVDATA fails.

II) those who cannot fail, for example priv_union(3c)

In this case the LOADPRIVDATA/GETPRIVDATA calls cannot fail. The LOADPRIVDATA/GETPRIVDATA implementation is simple: in a case the privdata global variable is not set, the implementation attempts to initialize it. Once the privdata is set for the first time, it cannot change, so all subsequent LOADPRIVDATA/GETPRIVDATA calls will just succeed.

To make sure the LOADPRIVDATA/GETPRIVDATA calls won't fail for these cases we will just preload the privdata global variable during the application start in libc_init().

Actions #5

Updated by Marcel Telka over 8 years ago

Testing of the fix

First, I checked that the fix works as expected and the privdata global
variable is always set during the application start. To confirm this I used
the following script named privtest.mdb:

#!/bin/sh

cat <<EOF | mdb $1
::status
::bp main
:r
privdata::print
EOF

Here is an example of results on a system without the fix:

$ ./privtest.mdb /bin/ls
debugging executable file (32-bit)
file: /usr/bin/ls
status: idle
mdb: stop at main
mdb: target stopped at:
main:           leal   0x4(%esp),%ecx
0
$

Here is the output on a system with the fix:

$ ./privtest.mdb /bin/ls
debugging executable file (32-bit)
file: /usr/bin/ls
status: idle
mdb: stop at main
mdb: target stopped at:
main:           leal   0x4(%esp),%ecx
0xfee00108
$

Next, I ran the onu-ed system for a while to make sure that everything is
working stable and there are no random core files generated due to the code
added to libc_init().

Finally, I tried to exhaust the memory to see how an application start would
fail. I ran the following 'executor' application:

#include <unistd.h>

int
main(void)
{
    sleep(10);

    execlp("/bin/bash", "bash", (char *)0);

    return 0;
}

During the sleep(10) period I started this 'allocator':

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
    int s = 1024*1024;

    for (;;) {
        void *p = malloc(s);

        if (p == NULL && s == 1024*1024) {
            s = 1024;
            continue;
        }

        if (p == NULL) {
            printf("malloc failed\\\\n");
            sleep(10);
        }
    }

    return 0;
}

And this happened:

# ./executor &
[1] 103755
# ./allocator 
*** libc thread failure: cannot preload the priv data
malloc failed
^C
[1]+  Abort                   (core dumped) ./executor
# echo ::status | mdb core
debugging core file of bash (32-bit) from openindiana
file: /usr/bin/bash
initial argv: bash
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=103755 uid=0 code=-1
panic message: *** libc thread failure: cannot preload the priv data
#

So it worked as expected.

Actions #6

Updated by Marcel Telka over 8 years ago

  • Status changed from In Progress to Pending RTI
Actions #7

Updated by Gordon Ross over 8 years ago

  • Status changed from Pending RTI to In Progress

Kicked back (from "pending rti" to "in progress") for questions and possibly more work.

I have a general design question about this fix.

Would it be feasible to change the implementation so that LOADPRIVDATA will never fail?
For example, if we know the maximum size of the data that would be loaded by this, perhaps we could initialize it to point to storage that's reserved at compile time? (i.e. BSS space, or initialized data if we know what the content should be at compile time).
That's a solution I've seen elsewhere for problems of this sort, i.e. when you can't load locale data, you're left with the compiled-in C locale defaults. I'm thinking of something like that.

Could something like that work here?

Actions #8

Updated by Marcel Telka over 7 years ago

For reference I attached my version of the fix.

Actions

Also available in: Atom PDF