Bug #4111

wcwidth() still not right for some characters

Added by Garrett D'Amore over 7 years ago. Updated about 7 years ago.

locale - data and messages


Yuri Pankov reported that some characters, such as れ (U+308C), are still not handled properly in the terminal, even with the fix for 2408 applied. It turns out that every even-numbered Unicode code point between U+3080 and U+308C is also wrong. (And probably a great many others besides!)
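
For reference, the width one would expect for these code points can be estimated from the Unicode East Asian Width property. The following is only a rough sketch in Python, not the libc implementation: `expected_width` is a hypothetical helper, and the rule "Wide or Fullwidth means two columns" is a common convention assumed here rather than something stated in this report.

```python
import unicodedata


def expected_width(ch: str) -> int:
    """Approximate the column width of one character (hypothetical helper).

    Assumption: combining marks take 0 columns, East Asian Wide ("W") and
    Fullwidth ("F") characters take 2, everything else takes 1.
    """
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


# Even-numbered code points U+3080..U+308C, the range named in the report.
for cp in range(0x3080, 0x308D, 2):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: width {expected_width(ch)}")
```

Under this convention every code point in the loop is a Hiragana letter and should occupy two columns, which is what a correct wcwidth() would be expected to report in a UTF-8 locale.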

Analysis shows that my changes for 2408 had a bug in the yacc grammar, which caused certain characters to be dropped from consideration. It's quite amazing, since all of my trial characters were handled properly; perhaps I unknowingly tested only characters on odd lines. (I chose a fairly random sample, but by no means an exhaustive one.)

The fix is fortunately trivial (though it took a while to figure out!).

As a bonus, this fix (which I'll post momentarily) shrinks the file size from about 250K per locale to about 94K, because we get much more use of overlapping ctype data. This should also make ctype lookups for extended characters faster, since the binary search only needs to consider less than half the original amount of data (so about 1-2 fewer probes per search, on average).
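
The size and lookup estimate can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `(start, end, width)` range table, the helpers `coalesce`, `lookup` and `probes_saved`, and the idea that table entries scale with file size. It is not the actual locale file format. The point is only that once the missing even code points are restored, adjacent entries with identical attributes can merge into one range, and a smaller table needs fewer binary-search probes.

```python
import bisect
import math


def coalesce(widths):
    """Merge consecutive code points with equal width into (start, end, width)."""
    ranges = []
    for cp, w in sorted(widths.items()):
        if ranges and ranges[-1][1] + 1 == cp and ranges[-1][2] == w:
            ranges[-1] = (ranges[-1][0], cp, w)
        else:
            ranges.append((cp, cp, w))
    return ranges


def lookup(ranges, cp, default=1):
    """Binary-search the range table; fall back to `default` if not covered."""
    starts = [r[0] for r in ranges]
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


def probes_saved(old_size, new_size):
    """Binary search depth grows as log2(n), so the saving is log2 of the ratio."""
    return math.log2(old_size / new_size)


# Hypothetical "before": only the odd code points made it into the table.
before = coalesce({cp: 2 for cp in range(0x3081, 0x308D, 2)})
# Hypothetical "after": every code point is present, so the entries merge.
after = coalesce({cp: 2 for cp in range(0x3080, 0x308D)})

print(len(before), "ranges before,", len(after), "range after")
print("width of U+308C before:", lookup(before, 0x308C))
print("width of U+308C after:", lookup(after, 0x308C))
print(f"probes saved going from 250K to 94K: {probes_saved(250, 94):.2f}")
```

With the sizes quoted above, the ratio 250/94 is a little over 2.6, so the saving works out to between one and two probes, in line with the estimate in the report.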


Updated by Garrett D'Amore over 7 years ago

  • Tags deleted (needs-triage)

Updated by Garrett D'Amore over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

commit 33cf071bf2879e2693faa899bc1c16776dcbf897
Author: Garrett D'Amore <>
Date: Wed Sep 4 14:48:51 2013 -0700

4111 wcwidth() still not right for some characters
Reviewed by: Yuri Pankov <>
Reviewed by: Gordon Ross <>
Approved by: Dan McDonald <>

Updated by Zhiwen Zheng about 7 years ago

What files should I replace on a running OpenIndiana system to resolve this problem?
