Bug #3154
Nonconforming tolower and toupper with UTF-8 locales
Added by Bernardo Signori over 8 years ago.
Updated over 8 years ago.
Category: cmd - userland programs
Description
With any UTF-8 locale, tolower and toupper return null when passed an argument that is an invalid UTF-8 character. My understanding of the C standard is that any "arguments in the domain" that have no corresponding lowercase (or uppercase) letter should be returned unchanged.
Files
test.c (395 Bytes): test case. Bernardo Signori, 2012-09-05 02:06 PM
test2.c (525 Bytes): better test case. Bernardo Signori, 2012-09-05 02:32 PM
"an invalid UTF-8 character" != "have no corresponding lowercase (or uppercase) letter"
Indeed.
debian:yuri:~$ uname -a
Linux debian 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
debian:yuri:~$ ./test
tolower test PASS
toupper test PASS
phoenix:yuri:~$ uname -a
FreeBSD phoenix.xvoid.org 10.0-CURRENT FreeBSD 10.0-CURRENT #0 r240078: Tue Sep 4 07:38:03 MSK 2012 yuri@phoenix.xvoid.org:/usr/obj/data/src/freebsd/head/sys/PHOENIX amd64
phoenix:yuri:~$ ./test
tolower test PASS
toupper test PASS
sirius:yuri:~$ uname -a
SunOS sirius 5.11 tzdata2012e i86pc i386 i86pc
sirius:yuri:~$ ./test
tolower test FAIL
toupper test FAIL
Updated the test to make sure the locale changed, otherwise the tests may pass because the locale is still C.
Seems to be a bug in localedef.
Could it be that ctypes does not contain all values < _CACHED_RUNES, so that
the identity map is incomplete?
for (ctn = avl_first(&ctypes); ctn; ctn = AVL_NEXT(&ctypes, ctn)) {
- Status changed from New to In Progress
- % Done changed from 0 to 70
- Category set to cmd - userland programs
- Status changed from In Progress to Resolved
- % Done changed from 70 to 100
- Tags deleted (needs-triage)
Resolved in r13844 commit:90142813f484