Bug #992
closed
towlower/towupper are broken
Description
$ cat tow-test.c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

int
main(void)
{
	wint_t rusP = 0x41f;	/* CYRILLIC CAPITAL LETTER PE */
	wint_t rusp = 0x43f;	/* CYRILLIC SMALL LETTER PE */

	setlocale(LC_ALL, "");
	printf("towlower: %lc -> %lc\n", rusP, towlower(rusP));
	printf("towupper: %lc -> %lc\n", rusp, towupper(rusp));
	return (0);
}

Illumos:
$ LC_ALL=en_US.UTF-8 ./tow-test
towlower: П -> П
towupper: п -> п
$ LC_ALL=ru_RU.UTF-8 ./tow-test
towlower: П -> П
towupper: п -> П

Solaris 11:
$ LC_ALL=en_US.UTF-8 ./tow-test
towlower: П -> п
towupper: п -> П
$ LC_ALL=ru_RU.UTF-8 ./tow-test
towlower: П -> п
towupper: п -> П

FreeBSD 8:
$ LC_ALL=en_US.UTF-8 ./tow-test
towlower: П -> п
towupper: п -> П
$ LC_ALL=ru_RU.UTF-8 ./tow-test
towlower: П -> п
towupper: п -> П

The letter is Cyrillic 'pe' (just an example; the problem isn't limited to it).
I'm not sure whether towlower/towupper themselves are at fault or the problem lies deeper. I'm setting Garrett as assignee, since he was working on it.
Updated by Yuri Pankov almost 12 years ago
Looks like our maplower is badly broken. Here is part of a dump of both maps (notice that our maplower lists lowercase characters where it should list capitals, and the mappings themselves are missing):
printf("magic=%s encoding=%s\n", _CurrentRuneLocale->__magic,
    _CurrentRuneLocale->__encoding);

rr = &_CurrentRuneLocale->__maplower_ext;
base = rr->__ranges;
printf("maplower: ranges=%d\n", rr->__nranges);
for (lim = 0; lim != rr->__nranges; lim++) {
	re = base + lim;
	printf("range=%d rangelen=%d chars=%lc-%lc -> maps=%lc-%lc\n",
	    (int)lim, (int)(re->__max - re->__min + 1),
	    re->__min, re->__max,
	    re->__map, re->__map + re->__max - re->__min);
}
printf("\n");

rr = &_CurrentRuneLocale->__mapupper_ext;
base = rr->__ranges;
printf("mapupper: ranges=%d\n", rr->__nranges);
for (lim = 0; lim != rr->__nranges; lim++) {
	re = base + lim;
	printf("range=%d rangelen=%d chars=%lc-%lc -> maps=%lc-%lc\n",
	    (int)lim, (int)(re->__max - re->__min + 1),
	    re->__min, re->__max,
	    re->__map, re->__map + re->__max - re->__min);
}
printf("\n");
(The fragment above was added to lib/libc/port/locale/towlower.c to produce the dumps.)
Illumos:
magic=RuneMagiUTF-8 encoding=UTF-8
maplower: ranges=393
range=0 rangelen=1 chars=ā-ā -> maps=^@-^@
range=1 rangelen=1 chars=ă-ă -> maps=^@-^@
range=2 rangelen=1 chars=ą-ą -> maps=^@-^@
range=3 rangelen=1 chars=ć-ć -> maps=^@-^@
range=4 rangelen=1 chars=ĉ-ĉ -> maps=^@-^@
range=5 rangelen=1 chars=ċ-ċ -> maps=^@-^@
range=6 rangelen=1 chars=č-č -> maps=^@-^@
range=7 rangelen=1 chars=ď-ď -> maps=^@-^@
range=8 rangelen=1 chars=đ-đ -> maps=^@-^@
range=9 rangelen=1 chars=ē-ē -> maps=^@-^@
range=10 rangelen=1 chars=ĕ-ĕ -> maps=^@-^@
...

mapupper: ranges=351
range=0 rangelen=1 chars=ā-ā -> maps=Ā-Ā
range=1 rangelen=1 chars=ă-ă -> maps=Ă-Ă
range=2 rangelen=1 chars=ą-ą -> maps=Ą-Ą
range=3 rangelen=1 chars=ć-ć -> maps=Ć-Ć
range=4 rangelen=1 chars=ĉ-ĉ -> maps=Ĉ-Ĉ
range=5 rangelen=1 chars=ċ-ċ -> maps=Ċ-Ċ
range=6 rangelen=1 chars=č-č -> maps=Č-Č
range=7 rangelen=1 chars=ď-ď -> maps=Ď-Ď
range=8 rangelen=1 chars=đ-đ -> maps=Đ-Đ
range=9 rangelen=1 chars=ē-ē -> maps=Ē-Ē
range=10 rangelen=1 chars=ĕ-ĕ -> maps=Ĕ-Ĕ
...
FreeBSD (useful, as our code is mostly identical for this stuff):
magic=RuneMagiUTF-8 encoding=UTF-8
maplower: ranges=410
range=0 rangelen=1 chars=Ā-Ā -> maps=ā-ā
range=1 rangelen=1 chars=Ă-Ă -> maps=ă-ă
range=2 rangelen=1 chars=Ą-Ą -> maps=ą-ą
range=3 rangelen=1 chars=Ć-Ć -> maps=ć-ć
range=4 rangelen=1 chars=Ĉ-Ĉ -> maps=ĉ-ĉ
range=5 rangelen=1 chars=Ċ-Ċ -> maps=ċ-ċ
range=6 rangelen=1 chars=Č-Č -> maps=č-č
range=7 rangelen=1 chars=Ď-Ď -> maps=ď-ď
range=8 rangelen=1 chars=Đ-Đ -> maps=đ-đ
range=9 rangelen=1 chars=Ē-Ē -> maps=ē-ē
range=10 rangelen=1 chars=Ĕ-Ĕ -> maps=ĕ-ĕ
...

mapupper: ranges=418
range=0 rangelen=1 chars=ā-ā -> maps=Ā-Ā
range=1 rangelen=1 chars=ă-ă -> maps=Ă-Ă
range=2 rangelen=1 chars=ą-ą -> maps=Ą-Ą
range=3 rangelen=1 chars=ć-ć -> maps=Ć-Ć
range=4 rangelen=1 chars=ĉ-ĉ -> maps=Ĉ-Ĉ
range=5 rangelen=1 chars=ċ-ċ -> maps=Ċ-Ċ
range=6 rangelen=1 chars=č-č -> maps=Č-Č
range=7 rangelen=1 chars=ď-ď -> maps=Ď-Ď
range=8 rangelen=1 chars=đ-đ -> maps=Đ-Đ
range=9 rangelen=1 chars=ē-ē -> maps=Ē-Ē
range=10 rangelen=1 chars=ĕ-ĕ -> maps=Ĕ-Ĕ
...
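To make the dumps easier to read: each range covers the characters min..max, and the dump code prints the mapped end of the range as map + (max - min). A minimal sketch of the matching lookup follows, assuming a character inside a range maps to map + (c - min) and anything outside every range is returned unchanged. The struct, the function name range_map and both sample tables are hypothetical stand-ins, not the real libc definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for one entry of an extended case-mapping table. */
struct case_range {
	wint_t min;	/* first character covered by the range */
	wint_t max;	/* last character covered by the range */
	wint_t map;	/* value that min maps to */
};

/*
 * Binary-search a table sorted by min. Return the mapped value,
 * or c itself when no range covers it.
 */
wint_t
range_map(const struct case_range *tab, size_t n, wint_t c)
{
	size_t lo = 0, hi = n;

	assert(tab != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (c < tab[mid].min)
			hi = mid;
		else if (c > tab[mid].max)
			lo = mid + 1;
		else
			return (tab[mid].map + (c - tab[mid].min));
	}
	return (c);
}

/* Sample data: a healthy table and one shaped like the broken dump. */
const struct case_range good_lower[] = {
	{ 0x0100, 0x0100, 0x0101 },	/* Ā -> ā */
	{ 0x0410, 0x042f, 0x0430 },	/* Cyrillic capitals -> smalls */
};
const struct case_range bad_lower[] = {
	{ 0x0101, 0x0101, 0 },		/* ā -> NUL, as in our dump */
};
```

With the healthy table, range_map(good_lower, 2, 0x41f) yields 0x43f (П to п). With the broken table no range covers П, so it comes back unchanged, which is consistent with the towlower output in the description.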
Updated by Yuri Pankov almost 12 years ago
What we need to do here: fix __maplower_ext, and make *.UTF-8/LC_CTYPE/LCL_DATA contain mappings for all locales, not just the current one. No other OS makes case mappings and character types depend on the current locale.
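One way to check that kind of locale independence is to run the same case-pair test under several locales and compare the answers. The helper below is only a sketch: case_pair_ok is a hypothetical name, and it relies on nothing beyond setlocale, towlower and towupper. It returns -1 when the requested locale cannot be set.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: switch LC_CTYPE to the named locale and check
 * that upper and lower map to each other.
 * Returns 1 if both directions agree, 0 if not, -1 if the locale
 * could not be set.
 */
int
case_pair_ok(const char *locale, wint_t upper, wint_t lower)
{
	assert(locale != NULL);
	if (setlocale(LC_CTYPE, locale) == NULL)
		return (-1);
	return (towlower(upper) == lower && towupper(lower) == upper);
}
```

For example, case_pair_ok("en_US.UTF-8", 0x41f, 0x43f) and case_pair_ok("ru_RU.UTF-8", 0x41f, 0x43f) should both return 1 on a system that behaves like the Solaris 11 and FreeBSD 8 runs in the description, assuming both locales are installed.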
Updated by Yuri Pankov almost 12 years ago
Fixing the first appears to be easy; there's a nasty typo in cmd/localedef/ctype.c:
diff -r 4f5bb85e2547 usr/src/cmd/localedef/ctype.c
--- a/usr/src/cmd/localedef/ctype.c Fri May 06 07:32:53 2011 -0700
+++ b/usr/src/cmd/localedef/ctype.c Sun May 08 11:21:21 2011 +0400
@@ -321,8 +321,8 @@
 			ct[rl.runetype_ext_nranges - 1].map = ctn->ctype;
 			last_ct = ctn;
 		}
-		if (ctn->toupper == 0) {
-			last_up = NULL;
+		if (ctn->tolower == 0) {
+			last_lo = NULL;
 		} else if ((last_lo != NULL) &&
 		    (last_lo->tolower + 1 == ctn->tolower)) {
 			lo[rl.maplower_ext_nranges-1].max = wc;
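The effect of the typo can be sketched with a simplified version of this loop. The structures and the function name build_lower_ranges below are hypothetical, the final "start a new range" branch is an assumption (it is not part of the quoted hunk), and the extra check that the characters themselves are adjacent is added for the sketch only. With the test on toupper, a capital letter (no toupper mapping) is skipped, while a small letter (no tolower mapping) falls through and is recorded with a map of 0, which matches the shape of our maplower dump.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical, simplified versions of the localedef structures. */
struct ctype_node {
	wint_t wc;	/* the character itself */
	wint_t toupper;	/* uppercase mapping, 0 if none */
	wint_t tolower;	/* lowercase mapping, 0 if none */
};

struct case_range {
	wint_t min, max, map;
};

/*
 * Compact the tolower mappings of nodes[] (sorted by wc) into ranges.
 * with_typo != 0 reproduces the pre-fix test on toupper.
 * Returns the number of ranges written to out[].
 */
size_t
build_lower_ranges(const struct ctype_node *nodes, size_t n,
    struct case_range *out, size_t cap, int with_typo)
{
	const struct ctype_node *last_lo = NULL;
	size_t nranges = 0;

	for (size_t i = 0; i < n; i++) {
		const struct ctype_node *ctn = &nodes[i];
		wint_t probe = with_typo ? ctn->toupper : ctn->tolower;

		if (probe == 0) {
			/* The pre-fix code reset last_up here, not last_lo. */
			if (!with_typo)
				last_lo = NULL;
		} else if (last_lo != NULL &&
		    last_lo->wc + 1 == ctn->wc &&	/* sketch-only check */
		    last_lo->tolower + 1 == ctn->tolower) {
			/* Extend the current run by one character. */
			out[nranges - 1].max = ctn->wc;
			last_lo = ctn;
		} else {
			/* Start a new run (assumed branch). */
			assert(nranges < cap);
			out[nranges].min = out[nranges].max = ctn->wc;
			out[nranges].map = ctn->tolower;
			nranges++;
			last_lo = ctn;
		}
	}
	return (nranges);
}
```

Feeding it the pair Ā (0x100, tolower 0x101) and ā (0x101, toupper 0x100) gives one range Ā -> ā with the fix, and one range ā -> 0 with the typo enabled.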
Updated by Yuri Pankov almost 12 years ago
- Assignee changed from Garrett D'Amore to Yuri Pankov
Updated by Gordon Ross over 11 years ago
- Status changed from New to Resolved
changeset:   13399:a1d28d03839f
tag:         tip
user:        Yuri Pankov <yuri.pankov@gmail.com>
date:        Thu May 12 03:21:34 2011 +0400
description:
992 towlower/towupper are broken
Reviewed by: Garrett D'Amore <garrett@nexenta.com>
Approved by: Gordon Ross <gwr@nexenta.com>