Bug #13982
closed
excessive threads slows down find_elf
100%
Description
I was recently on a system with 256 threads and found that find_elf was taking quite a long time, on the order of 10-15 minutes. I was able to reproduce this and found that setting DMAKE_MAX_JOBS in both cases resulted in sub-optimal time, even on a system that had a single socket and faster, less prototype CPUs. Note that because a lot of the testing here was done over and over on the same workspace, most of the data and related was cached in the ARC, and what wasn't was on either a SATA or NVMe device, depending on the system. However, the microstate time suggests that we were not really
For example, here's the ptime -m output on the larger system:
real 11:38.453445556 user 1:16.246767121 sys 19:35.662120867 trap 0.027121963 tflt 0.000067207 dflt 0.041581569 kflt 0.000000000 lock 5:55:42.802274076 slp 2:31.110492619 lat 47.070167751 stop 43:19:01.297988822
I did a series of experiments on this system and on a less prototype system (which had DMAKE_MAX_JOBS set to 34) and found that in all cases limiting the number of threads actually improved things. In comparison, we see about 10x faster performance with 8 threads:
real 1:06.239679833 user 1:11.588261203 sys 3:48.228552512 trap 0.027870508 tflt 0.007836830 dflt 0.169006699 kflt 0.001164796 lock 1:17.338284433 slp 2:11.645642483 lat 3.808370936 stop 5:32.783047233
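As a rough check on the "about 10x" figure, the two ptime -m outputs above can be compared field by field. The following is a minimal Python sketch; the helper names ptime_seconds and parse_ptime are made up for this illustration and are not part of any illumos tool. It assumes ptime durations take the form [[h:]m:]s.fraction, as they do in the output quoted here.

```python
def ptime_seconds(field):
    """Convert a ptime duration of the form [[h:]m:]s.fraction to seconds."""
    total = 0.0
    for part in field.split(":"):
        total = total * 60 + float(part)
    return total


def parse_ptime(line):
    """Turn 'name value name value ...' into {name: seconds}."""
    tokens = line.split()
    return {name: ptime_seconds(value)
            for name, value in zip(tokens[0::2], tokens[1::2])}


# Selected fields copied from the two runs quoted above.
many_threads = parse_ptime(
    "real 11:38.453445556 sys 19:35.662120867 "
    "lock 5:55:42.802274076 stop 43:19:01.297988822")
eight_threads = parse_ptime(
    "real 1:06.239679833 sys 3:48.228552512 "
    "lock 1:17.338284433 stop 5:32.783047233")

# Ratio of the large-thread-count run to the 8-thread run, per field.
ratios = {name: many_threads[name] / eight_threads[name]
          for name in many_threads}
for name, ratio in ratios.items():
    print(f"{name:5s} {ratio:8.1f}x")
```

With these figures the wall-clock ratio comes out a little above 10, while the lock and stop ratios are far larger, which is where the bulk of the difference lies.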
The reason there's a dramatic difference in stop time here is that fork1() must still actually get all threads serialized. When looking at the fork rate with DTrace (e.g. instrumenting cfork with a predicate), in the large thread case I saw a consistent 12 forks per second. Note that there is other work going on with each fork; however, when set to 8 jobs, the rate was between 95 and 120 forks per second on this prototype, lower frequency system. This makes sense, as there are far fewer threads to try to coordinate and stop, reducing the overall cost even though we're not actually duplicating them all via forkall().
For context, here is the approximate ptime -m output from several different chosen values:
1 thread:
real 2:57.265522813 user 47.469055181 sys 1:57.157531583 trap 0.005804202 tflt 0.000000000 dflt 0.000173237 kflt 0.000000000 lock 0.013249417 slp 5:35.036242456 lat 1.343701798 stop 24.036933125
4 threads:
real 1:09.665161596 user 58.943810439 sys 3:02.334504686 trap 0.019857111 tflt 0.001991922 dflt 0.045399272 kflt 0.000873948 lock 1.524442200 slp 3:09.884952247 lat 2.037143747 stop 2:23.712300256
16 threads:
real 1:33.188876060 user 1:15.710282746 sys 4:25.749822644 trap 0.029206395 tflt 0.003432280 dflt 0.140559800 kflt 0.000824983 lock 5:54.414623916 slp 2:06.719068022 lat 5.540346113 stop 16:56.440859479
32 threads:
real 2:15.436185160 user 1:16.567693330 sys 5:24.169648195 trap 0.035824669 tflt 0.000909151 dflt 0.082744993 kflt 0.000640469 lock 16:05.650229058 slp 1:57.733760604 lat 8.507864634 stop 54:01.475709686
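To make the trend across these runs easier to see, the real, lock, and stop fields can be converted to seconds and lined up by thread count. This is a minimal Python sketch: the to_seconds helper is hypothetical, and including the 8-thread run quoted earlier assumes it was taken on the same system and workspace as the runs listed here.

```python
def to_seconds(field):
    """Convert a ptime duration of the form [[h:]m:]s.fraction to seconds."""
    total = 0.0
    for part in field.split(":"):
        total = total * 60 + float(part)
    return total


# thread count -> (real, lock, stop), copied from the ptime -m output.
# The 8-thread entry comes from the earlier run (assumed comparable).
runs = {
    1: ("2:57.265522813", "0.013249417", "24.036933125"),
    4: ("1:09.665161596", "1.524442200", "2:23.712300256"),
    8: ("1:06.239679833", "1:17.338284433", "5:32.783047233"),
    16: ("1:33.188876060", "5:54.414623916", "16:56.440859479"),
    32: ("2:15.436185160", "16:05.650229058", "54:01.475709686"),
}

# Convert every field to seconds so the runs can be compared numerically.
seconds = {n: tuple(to_seconds(f) for f in fields)
           for n, fields in runs.items()}

print(f"{'threads':>7} {'real':>9} {'lock':>9} {'stop':>9}")
for n in sorted(seconds):
    real, lock, stop = seconds[n]
    print(f"{n:>7} {real:>9.1f} {lock:>9.1f} {stop:>9.1f}")

# Thread count with the shortest wall-clock time among these samples.
best = min(seconds, key=lambda n: seconds[n][0])
print("fastest sampled thread count:", best)
```

Among these samples, lock and stop time climb steadily with the thread count, while wall-clock time bottoms out and then rises again.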
Based on all this, I believe we should take a prudent approach and set a maximum of 8 threads for the time being.
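The proposed cap amounts to clamping whatever job count the environment requests. The sketch below shows that idea in Python; it is not the change that was committed. The function name thread_limit, the fallback of 1 when DMAKE_MAX_JOBS is unset or not a number, and the environ parameter are all assumptions made for illustration.

```python
import os

# Cap proposed in this issue.
MAX_THREADS = 8


def thread_limit(environ=None, cap=MAX_THREADS, default=1):
    """Return the number of worker threads to use.

    Honors DMAKE_MAX_JOBS when it is a valid integer, but never goes
    above `cap` or below 1. The `default` fallback is an assumption.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("DMAKE_MAX_JOBS")
    try:
        requested = int(raw) if raw is not None else default
    except ValueError:
        requested = default
    return max(1, min(requested, cap))


if __name__ == "__main__":
    print("threads to use:", thread_limit())
```

For example, thread_limit({"DMAKE_MAX_JOBS": "256"}) yields 8, while a requested value of 4 is passed through unchanged.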
Related issues
Updated by Robert Mustacchi about 2 years ago
- Related to Feature #13248: parallelise the quest for elves added
Updated by Robert Mustacchi about 2 years ago
- Subject changed from excesive threads slows down find_elf to excessive threads slows down find_elf
Updated by Robert Mustacchi about 2 years ago
I tested this in an environment where DMAKE_MAX_JOBS is set higher than the thread count, verifying with pflags that we had limited the size of the thread pool correctly, both during the build and when run manually.
Updated by Electric Monk about 2 years ago
- Status changed from New to Closed
- % Done changed from 80 to 100
git commit 1d806c5f41e9c01235a8d6889ebc483b3de3cf4c
commit 1d806c5f41e9c01235a8d6889ebc483b3de3cf4c
Author: Robert Mustacchi <rm@fingolfin.org>
Date: 2021-07-30T20:28:41.000Z

    13982 excessive threads slows down find_elf
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Reviewed by: Andy Fiddaman <andy@omnios.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>