Bug #13727

closed

rust demangler loops when characters trail terminating E

Added by Jason King about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: lib - userland libraries
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

A user attempted to run elfdump -C on libxul.so (a Firefox library), and it hung in rust_demangle():

$ pstack 5186
5186:   /usr/bin/amd64/elfdump -C -s libxul.so
00007fffaf397b9d sv_init_sv (7fffbfff6480, 7fffbfff6460) + 1d
00007fffaf3a1cdd rust_demangle (7fff96a14c45, 4b, 7fffaf3b4f50) + 16d
00007fffaf3a196e sysdemangle (7fff96a14c45, 0, 0) + ee
00007fffaf3383d8 conv_demangle_name () + 38
00007fffaf3371c4 Elf64_demangle_name (7fff96a14c45) + 44
00007fffaf336068 Elf64_syms_table_entry (0, 1, 7fffbfff682e, 6, 3e, 7fff95e916a8, 0, 0, ...) + d8
00000000004205fe output_symbol () + 33e
0000000000422ae0 symbols64 () + 160
0000000000427e18 regular64 () + 878
000000000040dee5 decide () + 65
000000000040e751 main () + 331
000000000040d217 _start_crt () + 87
000000000040d178 _start () + 18

The actual symbol being demangled was _ZN7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE -- a valid C++ mangled symbol.

Enabling demangle debugging provides some insight:

LIBDEMANGLE: name = '_ZN7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: lang = auto (0)
LIBDEMANGLE: detected language is rust
LIBDEMANGLE: checking for '_ZN' or '__ZN' in '_ZN7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: parsed prefix; remaining='7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: rustdem_parse_num: str='7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: rustdem_parse_num: num=7
LIBDEMANGLE: rustdem_parse_name_segment: segment='mozilla'
LIBDEMANGLE: rustdem_parse_name_segment: consumed '7mozill'
LIBDEMANGLE: rustdem_parse_num: str='3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: rustdem_parse_num: num=3
LIBDEMANGLE: rustdem_parse_name_segment: segment='dom'
LIBDEMANGLE: rustdem_parse_name_segment: consumed '3do'
LIBDEMANGLE: rustdem_parse_num: str='13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: rustdem_parse_num: num=13
LIBDEMANGLE: rustdem_parse_name_segment: segment='BrowserParent'
LIBDEMANGLE: rustdem_parse_name_segment: consumed '13BrowserPare'
LIBDEMANGLE: rustdem_parse_num: str='22RecvUpdateContentCacheERKNS_12ContentCacheE'
LIBDEMANGLE: rustdem_parse_num: num=22
LIBDEMANGLE: rustdem_parse_name_segment: segment='RecvUpdateContentCache'
LIBDEMANGLE: rustdem_parse_name_segment: consumed '22RecvUpdateContentCac'

Ignoring the truncated 'consumed' value (merely a bug in the debug statement, which truncates its output), the demangler parses '22RecvUpdateContentCache' and then goes into a loop.

Since this isn't a valid mangled rust symbol, the rust demangling code should return an error (and not loop forever). Unfortunately, the rust demangling code sees the terminating E after 22RecvUpdateContentCache and stops parsing the name, but rust_demangle() keeps trying to demangle because there are more characters remaining in the name.

With a bit more history with legacy mangled rust symbols, we can improve the demangling code further (since the mangling format is not actually documented, there is still some guesswork here, but guesswork with, I think, a reasonable basis). A rust mangled symbol seems to largely follow the C++ 'nested name' production: N(<len><name>{len})+[<len>h<hex>]E. The main differences are that the rust version does not allow as many non-terminal symbols (e.g. <CV-qualifier> or <ref-qualifier>) as C++ does, and that it has an optional disambiguating hash value as its last component (I say optional because it's not clear when the compiler will or will not generate the hash).

The current code allows for the possibility of multiple concatenated nested names, but based on current experience, this does not seem to be something that the rust compiler will generate (re-examining the rustc-demangle crate also suggests this is the case). Each mangled symbol appears to consist of a single nested name (with an arbitrary number of components/unqualified names). We can tighten the rust parsing to reflect this. Additionally, once the parser reaches the terminal E in a nested name, any remaining characters (such as in the above example) should cause a demangling failure, as the name cannot be a valid rust symbol.
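The tightened rule can be illustrated with a minimal sketch. This is not the libdemangle code: it is a Python stand-in, the names parse_legacy_rust and split_hash are hypothetical, and it assumes the hash component is 'h' followed by lowercase hex digits. It also skips everything else a real demangler does (such as decoding escape sequences inside components). It accepts exactly one nested name and fails as soon as anything follows the terminating E.

```python
DIGITS = "0123456789"
HEX = "0123456789abcdef"  # assumption: the hash uses lowercase hex digits


def parse_legacy_rust(name):
    """Return the path components of `name`, or None if it does not match
    _ZN(<len><name>)+E with nothing after the terminating E."""
    for prefix in ("__ZN", "_ZN"):
        if name.startswith(prefix):
            rest = name[len(prefix):]
            break
    else:
        return None  # no recognised prefix

    segments = []
    pos = 0
    while pos < len(rest) and rest[pos] in DIGITS:
        start = pos
        while pos < len(rest) and rest[pos] in DIGITS:
            pos += 1
        length = int(rest[start:pos])
        # A zero length or a length running past the end cannot be valid.
        if length == 0 or pos + length > len(rest):
            return None
        segments.append(rest[pos:pos + length])
        pos += length

    # Exactly one terminating E, and no characters after it.
    if not segments or rest[pos:] != "E":
        return None
    return segments


def split_hash(segments):
    """Separate an optional trailing h<hex> component from the path."""
    last = segments[-1]
    if len(last) > 1 and last[0] == "h" and all(c in HEX for c in last[1:]):
        return segments[:-1], last[1:]
    return segments, None


cxx = "_ZN7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE"
print(parse_legacy_rust(cxx))                 # None: 'RKNS_...' trails the E
print(parse_legacy_rust("_ZN7mozilla3domE"))  # ['mozilla', 'dom']
```

With this shape the symbol from the report is rejected in a single pass instead of being re-examined, because the check on the remaining characters happens once, after the component loop has ended.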

We can also adjust the logic when demangling a symbol without an explicit language given: try to demangle as rust first, and if that fails, try C++ (and if that fails, fail the request). Any C++ mangled name simple enough to pass as a rust symbol should produce the same result with either demangler, and more complicated C++ symbols (as is the case with the symbol that triggered the issue) should fail the rust demangling, after which we can fall back to the C++ demangling.
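A rough sketch of that ordering, again in Python rather than the library's C. Everything here is hypothetical: demangle_auto, toy_rust and toy_cpp are illustrative names, toy_rust is a cut-down strict parser, and toy_cpp is only a placeholder that stands in for a real C++ demangler so the fallback path can be exercised.

```python
import re


def demangle_auto(name, demanglers):
    """Try each (language, function) pair in order; return the first
    success as (language, result), or None if every demangler fails."""
    for lang, demangle in demanglers:
        result = demangle(name)
        if result is not None:
            return lang, result
    return None


def toy_rust(name):
    """Stand-in strict rust parser: one nested name, nothing after E."""
    if not name.startswith("_ZN"):
        return None
    rest, parts = name[3:], []
    while True:
        m = re.match(r"[0-9]+", rest)
        if m is None:
            break
        n = int(m.group())
        seg = rest[m.end():m.end() + n]
        if n == 0 or len(seg) != n:
            return None
        parts.append(seg)
        rest = rest[m.end() + n:]
    return "::".join(parts) if parts and rest == "E" else None


def toy_cpp(name):
    """Placeholder only: a real C++ demangler would go here."""
    return "<c++ demangling of %s>" % name if name.startswith("_Z") else None


ORDER = [("rust", toy_rust), ("c++", toy_cpp)]

cxx = "_ZN7mozilla3dom13BrowserParent22RecvUpdateContentCacheERKNS_12ContentCacheE"
print(demangle_auto("_ZN7mozilla3domE", ORDER))  # ('rust', 'mozilla::dom')
print(demangle_auto(cxx, ORDER)[0])              # c++
print(demangle_auto("main", ORDER))              # None
```

The ordering only works if the first demangler fails cleanly on input it cannot handle, which is why the strict handling of characters after the terminating E matters for the fallback.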

Actions #1

Updated by Jason King about 2 years ago

  • Description updated (diff)
Actions #2

Updated by Electric Monk about 2 years ago

  • Gerrit CR set to 1425
Actions #3

Updated by Jason King about 2 years ago

To test, I ran all of the existing demangler tests, as well as the added test case that originally triggered the bug. All now pass.

Actions #4

Updated by Electric Monk about 2 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

git commit f5ac85908213ce2217329b835bf3c91f1c04b793

commit  f5ac85908213ce2217329b835bf3c91f1c04b793
Author: Jason King <jason.brian.king@gmail.com>
Date:   2021-04-20T18:46:51.000Z

    13727 rust demangler loops when characters trail terminating E
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Andy Fiddaman <Andy@omniosce.org>
    Approved by: Robert Mustacchi <rm@fingolfin.org>
