Bug #1214
Status: closed
towlower/towupper still missing some mappings
0%
Description
While reviewing the fix for issue 992
https://www.illumos.org/issues/992
I compared the compiled upper/lower mappings created by
that fix to the list of characters shown in this file:
http://unicode.org/Public/UNIDATA/CaseFolding.txt
and there appear to be several missing.
Just spot checking, I noticed these missing:
COPTIC CAPITAL * | COPTIC SMALL *
DESERET CAPITAL * | DESERET SMALL *
GLAGOLITIC CAPITAL * | GLAGOLITIC SMALL *
Here are some handy descriptions of these:
http://www.fileformat.info/info/unicode/block/coptic/list.htm
http://www.fileformat.info/info/unicode/block/deseret/list.htm
http://www.fileformat.info/info/unicode/block/glagolitic/list.htm
It has been pointed out that case folding is a different thing from toupper/tolower mappings, and I know that. However, they are very similar (case folding is a superset of toupper/tolower) and the above data file is a very handy place to see what all the toupper/tolower mappings are for Unicode.
None of the above are likely to be noticed, but nonetheless, it would be nice if someone could (a) do a more careful comparison of the case folding data with our toupper/tolower results, i.e. with a test program, and then (b) add locale data files to fill in the gaps listed above, plus any others found via comparison.
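A comparison along the lines of (a) could look like the sketch below. It is only a rough outline under a few assumptions: CaseFolding.txt has been downloaded locally and is passed as the first argument, the process locale is a UTF-8 one, and wint_t fits in a C int on the platform. The helper names (parse_line, find_mismatches, libc_towlower) are made up for illustration. Because case folding and tolower are not the same thing, every reported line is a candidate to look at, not a confirmed bug; some lowercase characters, for example, fold to a different character and will show up as false positives.

```python
import ctypes
import ctypes.util
import locale
import sys


def parse_line(line):
    """Parse one CaseFolding.txt line into (code, status, mapping), or None."""
    body = line.split("#", 1)[0].strip()
    if not body:
        return None  # blank line or comment-only line
    fields = [f.strip() for f in body.split(";")]
    if len(fields) < 3:
        return None
    code = int(fields[0], 16)
    status = fields[1]
    mapping = [int(cp, 16) for cp in fields[2].split()]
    return code, status, mapping


def find_mismatches(lines, tolower):
    """Return (code, expected, actual) where tolower disagrees with the folding.

    Only status C (common) and S (simple) entries are one-to-one mappings;
    F (full) and T (Turkic) entries are skipped.
    """
    missing = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        code, status, mapping = parsed
        if status not in ("C", "S") or len(mapping) != 1:
            continue
        actual = tolower(code)
        if actual != mapping[0]:
            missing.append((code, mapping[0], actual))
    return missing


def libc_towlower():
    """Wrap the C library's towlower(); assumes wint_t fits in a C int."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.towlower.argtypes = [ctypes.c_int]
    libc.towlower.restype = ctypes.c_int
    return libc.towlower


if __name__ == "__main__":
    # Use the locale from the environment, e.g. LC_ALL=en_US.UTF-8.
    locale.setlocale(locale.LC_ALL, "")
    with open(sys.argv[1], encoding="utf-8") as f:
        for code, expected, actual in find_mismatches(f, libc_towlower()):
            print(f"U+{code:04X}: expected U+{expected:04X}, got U+{actual:04X}")
```

The towlower wrapper is passed in as a parameter so the comparison logic can be exercised with a stand-in function, without depending on the installed locale data.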
Updated by Gordon Ross almost 12 years ago
- Status changed from Resolved to New
I copied 992 to create this. I guess that made it "resolved", which was unintentional.
Updated by Garrett D'Amore almost 12 years ago
Looking at this, I suspect that the problem is that we simply lack CLDR input files for these regions. So they aren't included.
It really is very unfortunate that these mappings were not part of the localedef input files from CLDR.
Updated by Garrett D'Amore almost 12 years ago
- Status changed from New to Closed
So, having looked over these, the only ones missing are for alphabets that are not currently in use. I'm not, for example, interested in figuring out how to support Egyptian or the experimental Deseret alphabet. I'm going to punt on this -- because if these alphabets were important, then there would be CLDR data for them and we could import them correctly. As there isn't, they don't matter.
We won't fix. So closing.