Feature #14417

closed

Disable gcc's reorder-blocks-and-partition optimisation

Added by Andy Fiddaman over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
% Done: 100%
Difficulty: Bite-size

Description

From the gcc manual at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

-freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function, in order to reduce number of taken branches, partitions hot and cold basic blocks into separate sections of the assembly and .o files, to improve paging and cache locality performance.

This optimization is automatically turned off in the presence of exception handling, for linkonce sections, for functions with a user-defined section attribute and on any architecture that does not support named sections.

Enabled for x86 at levels -O2, -O3.
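To make the manual's description concrete, here is a minimal sketch of the kind of C function this optimisation targets: one hot path and one rarely taken error path. The names checked_double() and report_failure() are hypothetical and do not come from illumos-gate. Whether a particular gcc version actually emits a checked_double.cold symbol for code this small is not guaranteed; it is only an illustration of where the hot/cold split can occur.

```c
#include <stdio.h>

/*
 * Hypothetical error reporter. The cold attribute tells gcc that calls
 * to this function are unlikely, so paths leading to it are treated as
 * cold.
 */
__attribute__((cold, noinline))
static int
report_failure(const char *what, int value)
{
	(void) fprintf(stderr, "%s: bad value %d\n", what, value);
	return (-1);
}

/*
 * Hypothetical function with a hot path (the multiplication) and a
 * cold path (the error report). __builtin_expect marks the error
 * condition as unlikely.
 */
__attribute__((noinline))
int
checked_double(int value)
{
	if (__builtin_expect(value < 0, 0))
		return (report_failure("checked_double", value));
	return (value * 2);
}
```

Assuming the file is saved as example.c (a hypothetical name), the effect can be compared by building it with "gcc -O2 -c example.c" and again with "gcc -O2 -fno-reorder-blocks-and-partition -c example.c", then inspecting each object with nm.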

When building illumos-gate with gcc-10, this optimisation creates additional cold symbols in objects.

For example, in genunix:

$ nm /kernel/amd64/genunix | fgrep .cold
[464]   |              198772|                  18|FUNC |LOCL |0    |1      |cred2ucaud.cold
[537]   |              232595|                  20|FUNC |LOCL |0    |1      |cyclic_omni_stop.cold
[2333]  |             1044975|                  10|FUNC |LOCL |0    |1      |devmap_device.cold
[1407]  |              634691|                  10|FUNC |LOCL |0    |1      |free_action.cold
[1403]  |              632088|                  10|FUNC |LOCL |0    |1      |free_mod.cold
[1063]  |              479491|                  10|FUNC |LOCL |0    |1      |freeproc.cold
[3354]  |             1561509|                  10|FUNC |LOCL |0    |1      |getflabel_cipso.cold
[1516]  |              666274|                  10|FUNC |LOCL |0    |1      |kmem_error.cold
[2930]  |             1394313|                  10|FUNC |LOCL |0    |1      |mdi_phci_unregister.cold
[1841]  |              822193|                  10|FUNC |LOCL |0    |1      |netstack_shared_kstat_remove.cold
[1840]  |              821890|                  10|FUNC |LOCL |0    |1      |netstack_shared_zone_remove.cold
[3178]  |             1502483|                  10|FUNC |LOCL |0    |1      |pm_return_lock.cold
[3145]  |             1479103|                   2|FUNC |LOCL |0    |1      |pm_set_device_threshold.cold
[1405]  |              633595|                   9|FUNC |LOCL |0    |1      |ref_action.cold
[2795]  |             1308164|                  10|FUNC |LOCL |0    |1      |rele_dm.cold
[1406]  |              633815|                  19|FUNC |LOCL |0    |1      |unref_action.cold
[1401]  |              630780|                  10|FUNC |LOCL |0    |1      |unref_mod.cold

gcc7 does not do this, although the optimisation is similarly enabled for -O2 and above. In total, 352 .cold symbols are introduced in the proto objects with gcc10.
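A count like this can be reproduced by filtering nm output on the symbol name. The following is a minimal sketch in C, assuming the pipe-delimited nm format shown above, where the name is the last field. The function names is_cold_symbol() and count_cold_symbols() are hypothetical, and the substring match mirrors what "fgrep .cold" does, restricted to the name field.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if the symbol name in one line of pipe-delimited nm output
 * contains ".cold", otherwise 0. The name is taken to be the text after
 * the last '|'. Lines without a '|' (headers, blank lines) never match.
 */
int
is_cold_symbol(const char *line)
{
	const char *sep = strrchr(line, '|');

	if (sep == NULL)
		return (0);
	return (strstr(sep + 1, ".cold") != NULL);
}

/* Count the lines in an array of nm output lines that name a cold symbol. */
size_t
count_cold_symbols(const char *const *lines, size_t nlines)
{
	size_t i, count = 0;

	for (i = 0; i < nlines; i++)
		count += (size_t)is_cold_symbol(lines[i]);
	return (count);
}
```

In practice the shell pipeline used above is simpler; the sketch only spells out the matching rule.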


Related issues

Related to illumos gate - Feature #14421: use GCC 10 as default primary compiler (Closed)

Actions #1

Updated by Electric Monk over 1 year ago

  • Gerrit CR set to 1963
Actions #2

Updated by Andy Fiddaman over 1 year ago

A wsdiff (with gcc7 primary) is unchanged with this patch, except that objects which contain DWARF data have changed due to the different gcc options.

With a gcc10 primary, .cold symbols are no longer created.

Actions #3

Updated by Rich Lowe over 1 year ago

Out of interest, are these aliases or does it actually split the code and jump/call into them?

Actions #4

Updated by Andy Fiddaman over 1 year ago

Rich Lowe wrote in #note-3:

Out of interest, are these aliases or does it actually split the code and jump/call into them?

Neither?

The original function seems unchanged but there is a new .cold function that doesn't look like it's called from anywhere.

Here's an example, where pcishpc_create_controller() is the same with and without the optimisation that creates the .cold block. It very probably shouldn't be creating this new section at all without some profiling options.

disassembly for root.gcc10/kernel/misc/amd64/pcie

pcishpc_create_controller.cold()
    pcishpc_create_controller.cold:      bf a8 01 00 00     movl   $0x1a8,%edi
    pcishpc_create_controller.cold+0x5:  e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0xa>
    pcishpc_create_controller.cold+0xa:  48 8d b8 18 01 00  leaq   0x118(%rax),%rdi
                                         00
    pcishpc_create_controller.cold+0x11: 48 89 c3           movq   %rax,%rbx
    pcishpc_create_controller.cold+0x14: 4c 89 20           movq   %r12,(%rax)
    pcishpc_create_controller.cold+0x17: 31 c9              xorl   %ecx,%ecx
    pcishpc_create_controller.cold+0x19: 31 f6              xorl   %esi,%esi
    pcishpc_create_controller.cold+0x1b: ba 01 00 00 00     movl   $0x1,%edx
    pcishpc_create_controller.cold+0x20: e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0x25>
    pcishpc_create_controller.cold+0x25: 48 8d 7b 08        leaq   0x8(%rbx),%rdi
    pcishpc_create_controller.cold+0x29: 31 c9              xorl   %ecx,%ecx
    pcishpc_create_controller.cold+0x2b: ba 04 00 00 00     movl   $0x4,%edx
    pcishpc_create_controller.cold+0x30: 31 f6              xorl   %esi,%esi
    pcishpc_create_controller.cold+0x32: e8 00 00 00 00     call   +0x0 <pcishpc_create_controller.cold+0x37>
    pcishpc_create_controller.cold+0x37: c7 43 10 01 00 00  movl   $0x1,0x10(%rbx)
                                         00
    pcishpc_create_controller.cold+0x3e: c7 04 25 84 00 00  movl   $0x0,0x84
                                         00 00 00 00 00
    pcishpc_create_controller.cold+0x49: 0f 0b              ud2
    pcishpc_create_controller.cold+0x4b: 4c 89 2c 25 88 00  movq   %r13,0x88
                                         00 00
    pcishpc_create_controller.cold+0x53: 0f 0b              ud2
Actions #5

Updated by Andy Fiddaman over 1 year ago

A great many of these .cold functions are 10 bytes and look like this:

disassembly for root.gcc10/kernel/amd64/genunix

free_action.cold()
    free_action.cold:     48 8b 04 25 00 00  movq   0x0,%rax
                          00 00
    free_action.cold+0x8: 0f 0b              ud2
Actions #6

Updated by Andy Fiddaman over 1 year ago

  • Related to Feature #14421: use GCC 10 as default primary compiler added
Actions #7

Updated by Electric Monk over 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

git commit 49a4bc70cea348bb81385206be344cf03f3d2b7c

commit  49a4bc70cea348bb81385206be344cf03f3d2b7c
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
Date:   2022-01-24T10:25:07.000Z

    14417 Disable gcc's reorder-blocks-and-partition optimisation
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>
