Bug #4246

kstat read improperly returned ENOMEM

Added by Robert Mustacchi almost 6 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: kernel
Start date: 2013-10-20
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

Summarized from the Joyent bug report:

This code takes the result of kstat_read(), looks at the "data" field, and pulls out the "zonename". But "data" is undefined. That's odd. Well, there's this code in node-kstat's "read" function:

Handle<Value>
KStatReader::read(kstat_t *ksp)
{
        Handle<Object> rval = Object::New();
        Handle<Object> data;

        rval->Set(String::New("class"), String::New(ksp->ks_class));
        rval->Set(String::New("module"), String::New(ksp->ks_module));
        rval->Set(String::New("name"), String::New(ksp->ks_name));
        rval->Set(String::New("instance"), Integer::New(ksp->ks_instance));

        if (kstat_read(ksr_ctl, ksp, NULL) == -1) {
                /*
                 * It is deeply annoying, but some kstats can return errors
                 * under otherwise routine conditions.  (ACPI is one
                 * offender; there are surely others.)  To prevent these
                 * fouled kstats from completely ruining our day, we assign
                 * an "error" member to the return value that consists of
                 * the strerror().
                 */
                rval->Set(String::New("error"), String::New(strerror(errno)));
                return (rval);
        }

We can check core.node.44430 on RM08218 to see if we hit this case. Indeed, this shows us the set of possible objects that could represent kstat read results:

::findjsobjects -p instance
a4e93511
8598301d
85575481
85983291
8511e795
81975c9d

and this one's our winner:

> 85575481::jsprint
{
    name: "z6385_net0",
    error: "Not enough space",
    module: "link",
    class: "net",
    instance: 0,
}

According to usr/src/lib/libc/port/gen/errlist, "Not enough space" is errno 12, or ENOMEM. kstat_read() is documented to return ENOMEM when:

     ENOMEM
                  Insufficient storage space is available.

It's unclear why this would have happened. There certainly haven't been any anonymous memory allocation failures in the global zone:

# kstat -m memory_cap -i 0
module: memory_cap                      instance: 0     
name:   global                          class:    zone_memory_cap
    anon_alloc_fail                 0
    anonpgin                        0
    crtime                          0
    execpgin                        32440
    fspgin                          287917
    n_pf_throttle                   0
    n_pf_throttle_usec              0
    nover                           0
    pagedout                        0
    pgpgin                          320357
    physcap                         18446744073709551615
    rss                             0
    snaptime                        2939894.787492560
    swap                            1955274752
    swapcap                         18446744073709551615
    zonename                        global

I'm not sure what we can do about this in Marlin. We're hitting this when we gather the baseline kstat values before dispatching a task to a zone for the first time in this zone's lifetime. If this condition is extremely transient, we could retry the kstat_read() a few times (possibly even with a timeout). It's unclear to me how likely this is to work. The alternative is to fail the task, which is not retryable in this context, and so sucks for the user.
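For illustration only, the retry idea could be sketched roughly as below. The names read_with_retry, read_fn and flaky_read are hypothetical and not part of Marlin, node-kstat or libkstat; a real caller would pass a small wrapper around kstat_read(). The sketch retries only on ENOMEM and gives up after a bounded number of attempts. It makes no claim that retrying actually clears the condition.

```c
#include <errno.h>
#include <time.h>

/*
 * Hypothetical callback type: returns 0 on success, -1 with errno set on
 * failure, mirroring how the excerpt above treats kstat_read().
 */
typedef int (*read_fn)(void *arg);

/*
 * Call fn up to max_attempts times, retrying only while it fails with
 * ENOMEM and sleeping delay_ms between attempts. Returns 0 on success,
 * or -1 with errno left over from the last failed attempt.
 */
static int
read_with_retry(read_fn fn, void *arg, int max_attempts, long delay_ms)
{
	struct timespec ts;
	int i;

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;

	for (i = 0; i < max_attempts; i++) {
		if (fn(arg) == 0)
			return (0);
		if (errno != ENOMEM)
			return (-1);	/* not the suspected transient case */
		if (i + 1 < max_attempts)
			(void) nanosleep(&ts, NULL);
	}
	return (-1);
}

/*
 * Stand-in reader used only for illustration: fails with ENOMEM until
 * the counter pointed to by arg reaches zero.
 */
static int
flaky_read(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = ENOMEM;
		return (-1);
	}
	return (0);
}
```

If every attempt fails, the caller still has to fail the task, so this only narrows the window rather than removing it.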

--

This is much simpler than we thought, and the clue is in the output from the very first D script. This output is from the third one, but it's the same as the first one:

 11    144            read_kstat_data:entry entry: pid 55897
user ksp = kstat32_t {
    hrtime_t ks_crtime = 0x292ae73c9fd858
    caddr32_t ks_next = 0x9630290
    kid32_t ks_kid = 0x74dd3b
    char [31] ks_module = [ "link" ]
    uint8_t ks_resv = 0
    int32_t ks_instance = 0
    char [31] ks_name = [ "z9363_net0" ]
    uint8_t ks_type = 0x1
    char [31] ks_class = [ "net" ]
    uint8_t ks_flags = 0x21
    caddr32_t ks_data = 0x9f1f488
    uint32_t ks_ndata = 0x16
    size32_t ks_data_size = 0x420
    hrtime_t ks_snaptime = 0x292ae73c9fd858
    int32_t _ks_update = 0
    caddr32_t _ks_private = 0
    int32_t _ks_snapshot = 0
    caddr32_t _ks_lock = 0
}
 11    145           read_kstat_data:return return: pid 55897
user ksp = kstat32_t {
    hrtime_t ks_crtime = 0x292ae73c9fd858
    caddr32_t ks_next = 0x9630290
    kid32_t ks_kid = 0x74dd3b
    char [31] ks_module = [ "link" ]
    uint8_t ks_resv = 0
    int32_t ks_instance = 0
    char [31] ks_name = [ "z9363_net0" ]
    uint8_t ks_type = 0x1
    char [31] ks_class = [ "net" ]
    uint8_t ks_flags = 0x1
    caddr32_t ks_data = 0x9f1f488
    uint32_t ks_ndata = 0x16
    size32_t ks_data_size = 0x445
    hrtime_t ks_snaptime = 0x292c55a57d94e6
    int32_t _ks_update = 0
    caddr32_t _ks_private = 0
    int32_t _ks_snapshot = 0
    caddr32_t _ks_lock = 0
}
 11    145           read_kstat_data:return 2013 Oct  2 00:16:05: kstat_read() returned ENOMEM! pid = 55897, psargs = /opt/smartdc/agents/lib/node_modules/marlin/build/node/bin/node /opt/smartdc/ag

Note that on the way into read_kstat_data(), ks_flags is 0x21, which has KSTAT_FLAG_INVALID (0x20) set. This flag is set when the kstat is allocated, cleared on kstat_install(), and set again in kstat_delete(). It's supposed to indicate when a kstat is not visible to userland. However, this userland program has such a kstat_t, which means it was returned by the kstat_read() of kstat 0. Indeed, header_kstat_snapshot() does not look at this flag at all, nor does libkstat. Also of note, read_kstat_data() does check KSTAT_FLAG_INVALID, but on the actual kstat, not the one that the user passed in.
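As a small aid to reading the ks_flags values in the probe output, the bit test can be sketched as follows. The two flag values are what I understand <sys/kstat.h> to define and are redefined here only so the sketch compiles without that header; kstat_header_is_invalid is a hypothetical helper, not an existing libkstat function.

```c
#include <stdint.h>

/* Assumed to match illumos <sys/kstat.h>; redefined for a standalone sketch. */
#define	KSTAT_FLAG_VIRTUAL	0x01
#define	KSTAT_FLAG_INVALID	0x20

/*
 * Hypothetical helper: returns 1 if the header's flags mark the kstat as
 * not (or no longer) visible to userland, 0 otherwise.
 */
static int
kstat_header_is_invalid(uint8_t ks_flags)
{
	return ((ks_flags & KSTAT_FLAG_INVALID) != 0);
}
```

Under these assumed values, the entry value 0x21 is KSTAT_FLAG_VIRTUAL | KSTAT_FLAG_INVALID, and the return value 0x1 is KSTAT_FLAG_VIRTUAL alone.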

So it looks like what happened here is that this program did a kstat_chain_update() which read kstat 0 during the small window in dls_stat_create() where the size is 0x420. It read a kstat header with size 0x420 and flag KSTAT_FLAG_INVALID. dls_stat_create() finished creating the kstat with the correct size and cleared the INVALID flag. Then the program tried to read the kstat, but got ENOMEM because the size was too small.
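The sequence can be modelled in a few lines. This is a simplified sketch, not the kernel's read_kstat_data(): struct model_kstat and model_read are hypothetical, and the size comparison is an assumption inferred from the documented ENOMEM meaning and from the probe output, where the header comes back with ks_data_size 0x445 and ks_flags 0x1 even though the call failed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the fields of interest in kstat_t. */
struct model_kstat {
	unsigned char	ks_flags;
	size_t		ks_data_size;
};

/*
 * Model of a read: the caller's header carries the buffer size it learned
 * from the chain. The live header is always copied back out, and the read
 * fails with ENOMEM if the caller's buffer was smaller than the live data.
 */
static int
model_read(struct model_kstat *user, const struct model_kstat *live)
{
	int too_small = (user->ks_data_size < live->ks_data_size);

	user->ks_flags = live->ks_flags;
	user->ks_data_size = live->ks_data_size;
	return (too_small ? ENOMEM : 0);
}
```

With a stale header of { 0x21, 0x420 } and a live kstat of { 0x01, 0x445 }, the model returns ENOMEM and leaves the caller holding the updated size and flags, matching the entry and return probes.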

The cleanest fix would seem to be to have header_kstat_update() and header_kstat_snapshot() skip kstats with this flag set.
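In a simplified model, that fix amounts to filtering on the flag while building the header snapshot. The sketch below uses hypothetical names (struct model_hdr, snapshot_headers, MODEL_FLAG_INVALID) and a plain array in place of the kernel's kstat chain, so it shows the shape of the idea rather than the actual change.

```c
#include <stddef.h>

/* Assumed to be the same bit as KSTAT_FLAG_INVALID. */
#define	MODEL_FLAG_INVALID	0x20

/* Hypothetical, cut-down kstat header. */
struct model_hdr {
	int		kid;
	unsigned char	flags;
};

/*
 * Copy headers from src to dst, skipping any still marked invalid.
 * Returns the number of headers copied (never more than dst_len).
 */
static size_t
snapshot_headers(const struct model_hdr *src, size_t n,
    struct model_hdr *dst, size_t dst_len)
{
	size_t i, out = 0;

	for (i = 0; i < n && out < dst_len; i++) {
		if (src[i].flags & MODEL_FLAG_INVALID)
			continue;	/* not yet installed, or being deleted */
		dst[out++] = src[i];
	}
	return (out);
}
```

A consumer that never sees the half-constructed header would pick the kstat up on a later chain update, once it has its final size.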

History

#1

Updated by Robert Mustacchi over 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 90 to 100

Resolved in dc9df4786b08572d6032efbd47f727287e691656.

