Feature #14712

Replace find_elf with something faster

Added by Jason King about 1 month ago. Updated 7 days ago.

Status: Closed
Priority: Normal
Assignee:
Category: tools - gate/build tools
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

While #13248 and #13982 improved the speed of find_elf.pl, it still ends up being an exercise in how fast the system can fork() + exec() because of its need to call elfedit on potentially every file it encounters. The -f and/or -s flags can improve this somewhat, but it would be better to convert this into an executable that uses libelf to gather the necessary data.
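
To illustrate why inspecting files in-process is cheaper than spawning elfedit for each one, here is a minimal sketch in C. It is not the code from this change and it does not use libelf: it only reads the first four bytes of a file and compares them with the ELF magic (0x7f 'E' 'L' 'F'), which costs one open and one read rather than a fork() + exec(). The name looks_like_elf is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* The four-byte ELF identification magic: 0x7f 'E' 'L' 'F'. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/*
 * Hypothetical helper (not from the actual find_elf change).
 * Returns 1 if the file at 'path' starts with the ELF magic,
 * 0 if it does not (including files shorter than four bytes),
 * and -1 if the file cannot be opened.
 */
int
looks_like_elf(const char *path)
{
    unsigned char buf[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return (-1);

    n = fread(buf, 1, sizeof (buf), fp);
    (void) fclose(fp);

    if (n != sizeof (buf))
        return (0);

    return (memcmp(buf, elf_magic, sizeof (elf_magic)) == 0 ? 1 : 0);
}
```

A real replacement would go on to read the ELF header and dynamic section (the job libelf does) to collect the class, type and version information; the magic check above only shows how non-ELF files can be rejected without starting another process.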

Actions #1

Updated by Electric Monk about 1 month ago

  • Gerrit CR set to 2163
Actions #2

Updated by Jason King 10 days ago

To test this, a nightly build was run with the -t option to use the usr/src/tools code (i.e. the new find_elf). The ELF-data.i386/object_list was compared to a nightly build of master (i.e. no change).

There were two main differences in the find_elf output:

- In the first instance, the perl script picked a different 'original' object for files that were hard linked. For example, the perl find_elf chose usr/bin/rev as the 'object' and bin/bg, bin/cd, etc. as aliases, while the C version chose usr/bin/alias as the 'object' while bin/rev, bin/bg, etc. were aliases.
- In the second instance (related to the above one), the perl script did not always detect that a file was actually a hard link of an existing file it had processed, so the perl script would output what was in essence the same file multiple times (with different paths). For example, the perl script (at least in the VM I tested with) did not identify kernel/crypto/amd64/md5 and kernel/misc/amd64/md5 as the same file (even though ls -i verified they in fact had the same inode value).
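
The hard link detection discussed above comes down to comparing the device and inode numbers of each file against those already seen. The following is a minimal sketch under stated assumptions, not the code from this change: the names seen_file_t and classify_path are hypothetical, the lookup is a simple linked list rather than whatever structure the real tool uses, and the first path encountered for an inode is treated as the 'object' with later paths as aliases.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* One entry per distinct (device, inode) pair seen so far. */
typedef struct seen_file {
    dev_t sf_dev;
    ino_t sf_ino;
    char *sf_path;              /* first path seen for this inode */
    struct seen_file *sf_next;
} seen_file_t;

/*
 * Hypothetical helper (not from the actual find_elf change).
 * Returns 0 if 'path' is the first name seen for its inode (an 'object'),
 * 1 if the inode was already recorded under another name (an alias),
 * and -1 on stat() or allocation failure. On success, *primary points
 * at the path first recorded for that inode.
 *
 * stat() follows symbolic links; a tree walker that must not follow
 * them would use lstat() instead.
 */
int
classify_path(seen_file_t **seen, const char *path, const char **primary)
{
    struct stat st;
    seen_file_t *sf;
    size_t len;

    if (stat(path, &st) != 0)
        return (-1);

    /* Two names refer to the same file iff device and inode both match. */
    for (sf = *seen; sf != NULL; sf = sf->sf_next) {
        if (sf->sf_dev == st.st_dev && sf->sf_ino == st.st_ino) {
            *primary = sf->sf_path;
            return (1);
        }
    }

    sf = malloc(sizeof (*sf));
    if (sf == NULL)
        return (-1);
    len = strlen(path) + 1;
    sf->sf_path = malloc(len);
    if (sf->sf_path == NULL) {
        free(sf);
        return (-1);
    }
    memcpy(sf->sf_path, path, len);
    sf->sf_dev = st.st_dev;
    sf->sf_ino = st.st_ino;
    sf->sf_next = *seen;
    *seen = sf;
    *primary = sf->sf_path;
    return (0);
}
```

With a scheme like this, which name ends up as the 'object' depends only on the order in which paths are visited, so a fixed traversal order gives a repeatable result.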

The net result of these differences was some amount of build noise when doing builds against the same workspace across this change. When the runtime.attr files (created by check_rtime) are diffed by nightly across the change, the files that the perl script did not identify as aliases will show up as 'missing' in the runtime.attr based on the C find_elf output (as check_rtime skips aliases so a given file is only processed once).

It is probably worth noting that the perl version stored this info in a hash, so its output was technically non-deterministic, and differences such as the above were always possible (though they likely required the addition or removal of a hard link to alter the way the data was hashed by perl). Regardless, the changes seem reasonable, and don't reflect a material change in the information they represent (and of course the noise is only present when transitioning between the perl and C versions if one does not clean out the workspace between builds).

As an additional test, find_elf output was also run against various arbitrary parts of a live system (e.g. /usr) and the output compared. The differences are what would be expected as explained by the above behavior.

Actions #3

Updated by Jason King 8 days ago

Since the question might come up: a very simple comparison, with the perl and C versions each run against the same proto area, yielded the following run times:

The perl version:
real 1m20.860s
user 1m7.127s
sys 4m7.592s

The C version:
real 0m10.784s
user 0m1.245s
sys 0m9.175s

Both were run a few times to ensure as much information was cached in RAM as possible (so this represents a best case for both versions). Running both versions multiple times yielded similar results (though I didn't average them together to get a mean or variance). The difference (roughly 7.5x less wall-clock time) seemed more than 'good enough'.

Actions #4

Updated by Electric Monk 7 days ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  d203f83453f8fc67147d6661b4c202c221ed0411
Author: Jason King <jason.brian.king@gmail.com>
Date:   2022-06-24T20:36:58.000Z

    14712 Replace find_elf with something faster
    Reviewed by: Peter Tribble <peter.tribble@gmail.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Approved by: Dan McDonald <danmcd@mnx.io>
