Bug #4111
closed
wcwidth() still not right for some characters
100%
Description
Yuri Pankov reported that some characters, such as れ (U+308C), are still not handled properly in the terminal, even with the fix for 2408 applied. It turns out that every even-numbered Unicode code point between U+3080 and U+308C is also wrong. (And probably very many others besides!)
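To make the affected range concrete, here is a minimal sketch that lists the even-numbered code points from U+3080 through U+308C and the column width a terminal would normally be expected to use for each. It is an assumption of this sketch that the Unicode East Asian Width property (via Python's `unicodedata`) is a reasonable stand-in for the expected `wcwidth()` result; it does not read the libc locale tables, and the helper name `expected_columns` is hypothetical.

```python
import unicodedata


def expected_columns(ch):
    """Rough expected terminal width for one character.

    Assumption: Wide ("W") and Fullwidth ("F") characters take 2 columns,
    everything else takes 1. Combining and control characters are ignored
    here, so this is not a full wcwidth() replacement.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


# Even-numbered code points between U+3080 and U+308C, inclusive.
even_points = range(0x3080, 0x308C + 1, 2)

for cp in even_points:
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)}: {expected_columns(ch)} columns")
```

All seven characters in this range are Hiragana letters, so a correct `wcwidth()` should report them as double-width.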
Analysis shows that my changes for 2408 had a bug in the yacc grammar, which led certain characters to be stripped from consideration. It's quite amazing, since all of my trial characters were handled properly; perhaps I unknowingly only tested characters on odd lines. (I chose a fairly random sample, but by no means an exhaustive one.)
The fix is fortunately trivial (though it took a while to figure out!)
As a bonus, this fix (which I'll post momentarily) shrinks the file size from about 250K per locale to about 94K, because we get much more use of overlapping ctype data. This should also result in faster lookups of ctype data for extended characters, since the binary search only needs to consider less than half the original data size (so about 1-2 fewer probes per binary search, on average).
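The "1-2 fewer probes" estimate can be checked with a quick back-of-the-envelope calculation. This sketch assumes the number of searchable entries scales roughly linearly with the per-locale file size, which is a simplification; the real file also contains data that is not part of the searched table.

```python
import math

# Approximate per-locale file sizes from the report, in kilobytes.
old_kb = 250
new_kb = 94

# Assumption: entry count is proportional to file size, so the ratio of
# sizes approximates the ratio of table lengths.
shrink_ratio = old_kb / new_kb

# A binary search over n entries takes about log2(n) probes, so shrinking
# the table by a factor r saves about log2(r) probes per lookup.
saved_probes = math.log2(shrink_ratio)

print(f"table is about {shrink_ratio:.2f}x smaller")
print(f"about {saved_probes:.1f} fewer probes per lookup")
```

Under that assumption the saving comes out between one and two probes per lookup, which matches the estimate above.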
Updated by Garrett D'Amore almost 10 years ago
webrev here: http://cr.illumos.org/~gdamore/localedef2/
Updated by Garrett D'Amore almost 10 years ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
commit 33cf071bf2879e2693faa899bc1c16776dcbf897
Author: Garrett D'Amore <garrett@dey-sys.com>
Date: Wed Sep 4 14:48:51 2013 -0700
4111 wcwidth() still not right for some characters
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Dan McDonald <danmcd@nexenta.com>
Updated by Zhiwen Zheng over 9 years ago
What files should I replace on a running OpenIndiana system to resolve this problem?