Bug #1214

closed

towlower/towupper still missing some mappings

Added by Gordon Ross over 10 years ago. Updated over 10 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: locale - data and messages
Start date: 2011-05-04
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:

Description

While reviewing the fix for issue 992
https://www.illumos.org/issues/992
I compared the compiled upper/lower mappings created by
that fix to the list of characters shown in this file:
http://unicode.org/Public/UNIDATA/CaseFolding.txt
and there appear to be several missing.

Just spot checking, I noticed these missing:

COPTIC CAPITAL * | COPTIC SMALL *
DESERET CAPITAL * | DESERET SMALL *
GLAGOLITIC CAPITAL * | GLAGOLITIC SMALL *

Here are some handy descriptions of these:
http://www.fileformat.info/info/unicode/block/coptic/list.htm
http://www.fileformat.info/info/unicode/block/deseret/list.htm
http://www.fileformat.info/info/unicode/block/glagolitic/list.htm

It has been pointed out that case folding is a different thing from toupper/tolower mappings, and I know that. However, they are very similar (case folding is a superset of toupper/tolower) and the above data file is a very handy place to see what all the toupper/tolower mappings are for Unicode.

None of the above are likely to be noticed, but it would nonetheless be nice if someone could (a) do a more careful comparison of the case folding data with our toupper/tolower results, e.g. with a test program, and then (b) add locale data files to fill in the gaps listed above, plus any others found via that comparison.
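For step (a), a test program could look roughly like the sketch below. It is only an illustration, not something that exists in the gate: the function names (parse_casefold_line, towlower_agrees, report_gaps) are made up here, and it assumes that wchar_t values in the active locale are Unicode code points and that wchar_t is wide enough for non-BMP characters such as Deseret. It only looks at the simple one-to-one entries (status C and S) in CaseFolding.txt. Because case folding is not identical to tolower, anything it prints is a candidate gap to review, not a confirmed bug.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Parse one CaseFolding.txt line such as
 *   "0041; C; 0061; # LATIN CAPITAL LETTER A"
 * Returns 1 and fills *from and *to for simple one-to-one entries
 * (status C or S). Returns 0 for comments, blank lines, and the
 * F (full) and T (Turkic) entries, which towlower() cannot express.
 */
static int
parse_casefold_line(const char *line, unsigned long *from, unsigned long *to)
{
	char status;

	if (sscanf(line, "%lx; %c; %lx", from, &status, to) != 3)
		return (0);
	return (status == 'C' || status == 'S');
}

/*
 * Returns 1 if towlower() in the current locale maps 'from' to 'to'.
 * Assumption: wchar_t values are Unicode code points.
 */
static int
towlower_agrees(unsigned long from, unsigned long to)
{
	return (towlower((wint_t)from) == (wint_t)to);
}

/*
 * Hypothetical driver: read CaseFolding.txt from fp, print every
 * simple folding that towlower() does not reproduce, and return
 * how many were printed.
 */
static int
report_gaps(FILE *fp)
{
	char line[256];
	unsigned long from, to;
	int gaps = 0;

	assert(fp != NULL);
	while (fgets(line, sizeof (line), fp) != NULL) {
		if (!parse_casefold_line(line, &from, &to))
			continue;
		if (!towlower_agrees(from, to)) {
			(void) printf("U+%04lX -> U+%04lX\n", from, to);
			gaps++;
		}
	}
	return (gaps);
}
```

To use it, a main() would call setlocale(LC_CTYPE, "") (from <locale.h>) with a UTF-8 locale selected, then call report_gaps() on an open copy of CaseFolding.txt. A similar pass using towupper() on the folded value would cover the other direction, with the same caveat about expected differences.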

Actions #1

Updated by Gordon Ross over 10 years ago

  • Status changed from Resolved to New

I copied 992 to create this. I guess that made it "resolved", which was unintentional.

Actions #2

Updated by Garrett D'Amore over 10 years ago

Looking at this, I suspect that the problem is that we simply lack CLDR input files for these regions. So they aren't included.

It really is very unfortunate that these mappings were not part of the localedef input files from CLDR.

Actions #3

Updated by Garrett D'Amore over 10 years ago

  • Status changed from New to Closed

So, having looked over these, the only ones missing are for alphabets that are not currently in use. I'm not, for example, interested in figuring out how to support Egyptian or the experimental Deseret alphabet. I'm going to punt on this -- because if these alphabets were important, then there would be CLDR data for them and we could import them correctly. As there isn't, they don't matter.

We won't fix. So closing.
