Project

General

Profile

Actions

Bug #13926

closed

core files can fail to dump leading large sections

Added by Robert Mustacchi 3 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Category:
kernel
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

While working on #13925 I noticed something weird when taking a core file (via pkill -ABRT) of a rust program. In this case I was running the debug build of the dropshot basic example. There were two section headers that notably differed from their gcore brethren:

$ elfdump -c core.dropshot.abrt 

Section Header[1]:  sh_name: .symtab
    sh_addr:      0x400000            sh_flags:   0
    sh_size:      0                   sh_type:    [ SHT_SYMTAB ]
    sh_offset:    0                   sh_entsize: 0x18 (0 entries)
    sh_link:      2                   sh_info:    38512
    sh_addralign: 0x8               

Section Header[2]:  sh_name: .strtab
    sh_addr:      0x400000            sh_flags:   [ SHF_STRINGS ]
    sh_size:      0                   sh_type:    [ SHT_STRTAB ]
    sh_offset:    0                   sh_entsize: 0
    sh_link:      0                   sh_info:    0
    sh_addralign: 0x1               

If you look at the version from gcore it looks more like the following:

Section Header[1]:  sh_name: .symtab
    sh_addr:      0x400000            sh_flags:   0
    sh_size:      0x206dc0            sh_type:    [ SHT_SYMTAB ]
    sh_offset:    0x1c3f37c           sh_entsize: 0x18 (88552 entries)
    sh_link:      2                   sh_info:    38512
    sh_addralign: 0x8               

Section Header[2]:  sh_name: .strtab
    sh_addr:      0x400000            sh_flags:   [ SHF_STRINGS ]
    sh_size:      0x5c29ed            sh_type:    [ SHT_STRTAB ]
    sh_offset:    0x1e4613c           sh_entsize: 0
    sh_link:      0                   sh_info:    0
    sh_addralign: 0x1               

Staring at the code and doing a bunch of experiments, I hypothesized that this would happen if we failed to do something correctly in copy_scn(). I was able to prove this hypothesis with the following bit of DTrace:

$ pfexec dtrace -n 'fbt::copy_scn:entry{ self->t = 1; } ' -n 'fbt::copy_scn:return{self->t = 0; }' -n 'fbt::vn_rdwr:entry/self->t/{ self->res = args[9]; self->len = args[3]; }' -n 'fbt::vn_rdwr:return/self->t/{ @["mismatch", *self->res >= self->len] = count();  @["vn", arg1] = count(); }' -n 'fbt::core_write:return/self->t/{ @["core", arg1] = count(); }'
dtrace: description 'fbt::copy_scn:entry' matched 2 probes
dtrace: description 'fbt::copy_scn:return' matched 2 probes
dtrace: description 'fbt::vn_rdwr:entry' matched 1 probe
dtrace: description 'fbt::vn_rdwr:return' matched 1 probe
dtrace: description 'fbt::core_write:return' matched 1 probe
^C

  mismatch                                                  1                2
  core                                                              0               24
  mismatch                                                  0               48
  vn                                                                0               50

Further, I briefly stopped in kmdb and found that we had a case where the buf and size arguments to copy_scn() were in fact both NULL and zero. This explains why the copy_scn() call failed and the sections were null. Of course, the question is why did this happen. In this case, the answer is somewhat interesting. In process_scns() we always initialize the buffer and size of it to NULL/0 respectively. As we encounter section headers that we want to save that are larger than the current size, but critically less than elf_datasz_max, we will then reallocate the size of this buffer. Ultimately what happened here is that the first two sections are in fact actually longer than the 1 MiB limit that we have. This means that we won't allocate the buffer and it is in fact not sized correctly.

We should do two things here:

1. Fix this so we correctly dump this.
2. Add a section header equivalent to the program header flag PF_SUNW_FAILURE which could be used to speed this up. Perhaps called SHF_ILLUMOS_FAILURE or equivalent.

Actions #1

Updated by Robert Mustacchi 2 months ago

  • % Done changed from 0 to 80
Actions #2

Updated by Robert Mustacchi 2 months ago

I ended up implementing the first part here, but decided to pass on implementing the failure indicator for the moment. We do write the errno in the first four bytes at least. To test this, I started by taking the program in question, the dropshot basic example on a debug build and took a gcore and followed that with a kill -ABRT.

I then went to analyze the two cores. The key here is that because gcore didn't change with this (though was tested with #13927), the expectation is that things generally shouldn't have changed.I looked at a few different things here:

1. I first did an nm | sort. I found that we had the identical symbol tables. This was not the case previously and was a major improvement as the original program symbol table was what we failed to dump.
2. I compared the pmap output from the two cores and found they were identical.
3. I compared the full elfdump output between the two. There were more differences here primarily because the order of the symbol table and corresponding string tables were different in the two core generation tools because gcore leverages libproc's mapping order logic while the kernel goes in the program header order. I believe that the testing via nm verifies that while the order was different we actually had the same symbol (and also string) tables. Notably practically all the other sections were the same with a minor case of an alignment difference on one of them.
4. I spot checked gcore and sending SIGABRT to a few other processes to verify that there was nothing different there due to this change and did not find anything.
5. I used elfdump -I 5 -w to extract the heap from both cores and verified that they were identical. I also verified that when all program headers were dumped they were the same size; however, they understandably did not have exactly the same content because of signals killing it in the other case.
6. I also verified that there were no memory leaks due to the testing with #13927.

Actions #3

Updated by Electric Monk about 2 months ago

  • Status changed from New to Closed
  • % Done changed from 80 to 100

git commit 5d228828cbfb65f9632a1eedca4291380fca8303

commit  5d228828cbfb65f9632a1eedca4291380fca8303
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2021-08-03T14:36:50.000Z

    13926 core files can fail to dump leading large sections
    13927 core dump of PROT_NONE segment leads to confusing behavior
    Reviewed by: Jason King <jason.brian.king@gmail.com>
    Reviewed by: Patrick Mooney <pmooney@pfmooney.com>
    Reviewed by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
    Approved by: Dan McDonald <danmcd@joyent.com>

Actions

Also available in: Atom PDF