Feature #14417
Disable gcc's reorder-blocks-and-partition optimisation
Status: Closed
% Done: 100%
Description
From the gcc manual at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
-freorder-blocks-and-partition In addition to reordering basic blocks in the compiled function, in order to reduce number of taken branches, partitions hot and cold basic blocks into separate sections of the assembly and .o files, to improve paging and cache locality performance. This optimization is automatically turned off in the presence of exception handling, for linkonce sections, for functions with a user-defined section attribute and on any architecture that does not support named sections. Enabled for x86 at levels -O2, -O3.
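As a rough illustration of the kind of code this option acts on, here is a minimal C sketch. It is not taken from illumos-gate: report_failure() and sum_checked() are hypothetical names, and example.c is an assumed file name. Whether gcc actually emits a sum_checked.cold symbol for it depends on the gcc version and its internal heuristics, so treat it as something to experiment with rather than a guaranteed reproduction.

```c
#include <stddef.h>

/*
 * Hypothetical example, not from illumos-gate. The function names are
 * invented for illustration only.
 */

/* Marked cold so that gcc treats any path reaching it as unlikely. */
__attribute__((cold, noinline))
static int
report_failure(int code)
{
	return (-code);
}

/*
 * Sums the first n elements of buf. The NULL check is annotated as
 * unlikely, which makes its body a candidate for the cold partition.
 */
int
sum_checked(const int *buf, size_t n)
{
	int total = 0;

	if (__builtin_expect(buf == NULL, 0))
		return (report_failure(1));

	for (size_t i = 0; i < n; i++)
		total += buf[i];

	return (total);
}

/*
 * To compare symbol tables with and without the optimisation
 * (example.c is an assumed file name):
 *   gcc -O2 -c example.c && nm example.o
 *   gcc -O2 -fno-reorder-blocks-and-partition -c example.c && nm example.o
 */
```

The second compile line uses the negated form of the option, which is the standard gcc way to switch off an -f optimisation that a -O level would otherwise enable.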
When building illumos-gate with gcc-10, this optimisation creates additional cold symbols in objects.
For example, in genunix:
$ nm /kernel/amd64/genunix | fgrep .cold
[464]   |  198772|  18|FUNC |LOCL |0 |1 |cred2ucaud.cold
[537]   |  232595|  20|FUNC |LOCL |0 |1 |cyclic_omni_stop.cold
[2333]  | 1044975|  10|FUNC |LOCL |0 |1 |devmap_device.cold
[1407]  |  634691|  10|FUNC |LOCL |0 |1 |free_action.cold
[1403]  |  632088|  10|FUNC |LOCL |0 |1 |free_mod.cold
[1063]  |  479491|  10|FUNC |LOCL |0 |1 |freeproc.cold
[3354]  | 1561509|  10|FUNC |LOCL |0 |1 |getflabel_cipso.cold
[1516]  |  666274|  10|FUNC |LOCL |0 |1 |kmem_error.cold
[2930]  | 1394313|  10|FUNC |LOCL |0 |1 |mdi_phci_unregister.cold
[1841]  |  822193|  10|FUNC |LOCL |0 |1 |netstack_shared_kstat_remove.cold
[1840]  |  821890|  10|FUNC |LOCL |0 |1 |netstack_shared_zone_remove.cold
[3178]  | 1502483|  10|FUNC |LOCL |0 |1 |pm_return_lock.cold
[3145]  | 1479103|   2|FUNC |LOCL |0 |1 |pm_set_device_threshold.cold
[1405]  |  633595|   9|FUNC |LOCL |0 |1 |ref_action.cold
[2795]  | 1308164|  10|FUNC |LOCL |0 |1 |rele_dm.cold
[1406]  |  633815|  19|FUNC |LOCL |0 |1 |unref_action.cold
[1401]  |  630780|  10|FUNC |LOCL |0 |1 |unref_mod.cold
gcc7 does not do this, although the optimisation is similarly enabled for -O2 and above. In total, 352 .cold symbols are introduced in proto objects with gcc10.
Updated by Andy Fiddaman over 1 year ago
A wsdiff (with gcc7 primary) is unchanged with this patch, modulo objects containing DWARF data, which have changed due to the different gcc options.
With a gcc10 primary, .cold symbols are no longer created.
Updated by Rich Lowe over 1 year ago
Out of interest, are these aliases or does it actually split the code and jump/call into them?
Updated by Andy Fiddaman over 1 year ago
Rich Lowe wrote in #note-3:
Out of interest, are these aliases or does it actually split the code and jump/call into them?
Neither?
The original function seems unchanged but there is a new .cold function that doesn't look like it's called from anywhere.
Here's an example, where pcishpc_create_controller() is the same with and without the optimisation that creates the .cold block. It very probably shouldn't be creating this new section at all without some profiling options.
disassembly for root.gcc10/kernel/misc/amd64/pcie

pcishpc_create_controller.cold()
    pcishpc_create_controller.cold:       bf a8 01 00 00     movl   $0x1a8,%edi
    pcishpc_create_controller.cold+0x5:   e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0xa>
    pcishpc_create_controller.cold+0xa:   48 8d b8 18 01 00  leaq   0x118(%rax),%rdi
                                          00
    pcishpc_create_controller.cold+0x11:  48 89 c3           movq   %rax,%rbx
    pcishpc_create_controller.cold+0x14:  4c 89 20           movq   %r12,(%rax)
    pcishpc_create_controller.cold+0x17:  31 c9              xorl   %ecx,%ecx
    pcishpc_create_controller.cold+0x19:  31 f6              xorl   %esi,%esi
    pcishpc_create_controller.cold+0x1b:  ba 01 00 00 00     movl   $0x1,%edx
    pcishpc_create_controller.cold+0x20:  e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0x25>
    pcishpc_create_controller.cold+0x25:  48 8d 7b 08        leaq   0x8(%rbx),%rdi
    pcishpc_create_controller.cold+0x29:  31 c9              xorl   %ecx,%ecx
    pcishpc_create_controller.cold+0x2b:  ba 04 00 00 00     movl   $0x4,%edx
    pcishpc_create_controller.cold+0x30:  31 f6              xorl   %esi,%esi
    pcishpc_create_controller.cold+0x32:  e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0x37>
    pcishpc_create_controller.cold+0x37:  c7 43 10 01 00 00  movl   $0x1,0x10(%rbx)
                                          00
    pcishpc_create_controller.cold+0x3e:  c7 04 25 84 00 00  movl   $0x0,0x84
                                          00 00 00 00 00
    pcishpc_create_controller.cold+0x49:  0f 0b              ud2
    pcishpc_create_controller.cold+0x4b:  4c 89 2c 25 88 00  movq   %r13,0x88
                                          00 00
    pcishpc_create_controller.cold+0x53:  0f 0b              ud2
Updated by Andy Fiddaman over 1 year ago
A great many of these .cold functions are 10 bytes and look like this:
disassembly for root.gcc10/kernel/amd64/genunix

free_action.cold()
    free_action.cold:      48 8b 04 25 00 00  movq   0x0,%rax
                           00 00
    free_action.cold+0x8:  0f 0b              ud2
Updated by Andy Fiddaman over 1 year ago
- Related to Feature #14421: use GCC 10 as default primary compiler added
Updated by Electric Monk over 1 year ago
- Status changed from In Progress to Closed
- % Done changed from 0 to 100
git commit 49a4bc70cea348bb81385206be344cf03f3d2b7c
commit 49a4bc70cea348bb81385206be344cf03f3d2b7c
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date: 2022-01-24T10:25:07.000Z

    14417 Disable gcc's reorder-blocks-and-partition optimisation
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>