Bug #5089

open

iconv doesn't support conversion from/to WCHAR_T

Added by Alexander Pyhalov about 8 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Category: lib - userland libraries
Start date: 2014-08-14
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage
Gerrit CR:
External Bug:

Description

illumos (and OI) doesn't provide iconv conversion tables to or from WCHAR_T.

Test example (extracted from a glib/gtkmm test case). It works under FreeBSD with libiconv-1.14, but fails under current (2014-08-14) OI /hipster.

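    #include <stdio.h>
    #include <iconv.h>
    #include <errno.h>

    /* Declared before main() so the call is not an implicit declaration. */
    int try_conversion (const char *to_codeset,
                        const char *from_codeset,
                        iconv_t    *cd);

    int main(void)
    {
      iconv_t cd;
      if (try_conversion("WCHAR_T", "UTF-8", &cd)) {
        puts("not supported");
      }
      return 0;
    }

    /* Returns 1 if iconv_open() rejects the pair with EINVAL, 0 otherwise. */
    int try_conversion (const char *to_codeset,
                        const char *from_codeset,
                        iconv_t    *cd)
    {
      *cd = iconv_open (to_codeset, from_codeset);

      if (*cd == (iconv_t)-1 && errno == EINVAL)
        return 1;
      else
        return 0;
    }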
Actions #1

Updated by Robert Mustacchi 2 months ago

  • Category set to lib - userland libraries
  • Assignee set to Robert Mustacchi

Implementing this is a little tricky. The problem is that the assumption that a wchar_t is simply a UTF-32 value (i.e. just a strict Unicode code point) does not hold here. However, a consumer that uses this is either passing in wchar_t values or getting wchar_t values out. So for us to implement this, there are a few steps that we need to do:

1. Map the current locale's LC_CTYPE to a known module type and make sure we can open that.
2. Take care of transforming an input wchar_t to the normal byte stream expected (wcrtomb).
3. Perform the normal iconv transformation at this point.
4. Go through and if needed, transform the output byte stream into its wchar_t representation (mbrtowc).
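For illustration only, steps 1 to 3 for the wchar_t-input direction could be sketched in userland roughly as follows. This is not the planned module: `wcs_to_codeset` is a hypothetical helper name, `nl_langinfo(CODESET)` stands in for the LC_CTYPE-to-module mapping, the POSIX `char **` prototype of `iconv()` is assumed, and stateful encodings would need more care than shown here. The output direction (step 4) would mirror this with mbrtowc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc interface): convert a NUL-terminated
 * wide string to to_codeset by way of the current locale's multibyte
 * encoding. Returns the number of bytes written to out (no terminator
 * is added), or (size_t)-1 on any failure.
 */
size_t
wcs_to_codeset(const char *to_codeset, const wchar_t *in,
    char *out, size_t outlen)
{
    assert(to_codeset != NULL && in != NULL && out != NULL);

    /* Step 1: name the codeset behind LC_CTYPE and open a module for it. */
    iconv_t cd = iconv_open(to_codeset, nl_langinfo(CODESET));
    if (cd == (iconv_t)-1)
        return ((size_t)-1);

    mbstate_t st;
    memset(&st, 0, sizeof (st));
    char *outp = out;
    size_t outleft = outlen;
    int failed = 0;

    for (; *in != L'\0' && !failed; in++) {
        /* Step 2: wchar_t -> the locale's multibyte byte sequence. */
        char mb[MB_LEN_MAX];
        size_t n = wcrtomb(mb, *in, &st);
        if (n == (size_t)-1) {
            failed = 1;
            break;
        }

        /* Step 3: the normal iconv transformation of those bytes. */
        char *inp = mb;
        size_t inleft = n;
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            failed = 1;
    }

    /* Flush any pending shift state of the target encoding. */
    if (!failed && iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        failed = 1;

    (void) iconv_close(cd);
    return (failed ? (size_t)-1 : outlen - outleft);
}
```

A caller would use it as `wcs_to_codeset("UTF-8", L"abc", buf, sizeof (buf))` after setting up its locale. The sketch reads the locale at call time, not at open time, so it does not address the snapshot question.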

It is possible that for some locales we can skip some of these steps, but it's probably not worth special-casing those. One note is that because this needs a locale, we will snapshot the locale in question as part of opening the iconv module. This is not well-defined behavior, but it seems the most prudent choice. Effectively, whatever the current locale of the calling thread is at the time of iconv_open() is what will be used for the lifetime of the conversion descriptor. Any other behavior seems fraught with peril and would make it harder for consumers to handle this.
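One way to picture the snapshot, using only the POSIX.1-2008 locale interfaces, is the sketch below. The `wc_snapshot_*` names and the struct are hypothetical and exist only for illustration. Nothing here is taken from the actual implementation, which may well store the locale differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical per-descriptor state for a WCHAR_T conversion. */
typedef struct wc_snapshot {
    locale_t  ws_loc;    /* private copy of the opener's locale */
    mbstate_t ws_state;  /* conversion state for wcrtomb() */
} wc_snapshot_t;

/* Capture the calling thread's current locale, as iconv_open() would. */
int
wc_snapshot_open(wc_snapshot_t *ws)
{
    assert(ws != NULL);
    /* uselocale((locale_t)0) queries the locale without changing it. */
    ws->ws_loc = duplocale(uselocale((locale_t)0));
    if (ws->ws_loc == (locale_t)0)
        return (-1);
    memset(&ws->ws_state, 0, sizeof (ws->ws_state));
    return (0);
}

/*
 * Convert one wide character using the snapshotted locale, regardless
 * of what the caller's locale is now. mb must hold MB_LEN_MAX bytes.
 */
size_t
wc_snapshot_wcrtomb(wc_snapshot_t *ws, char *mb, wchar_t wc)
{
    locale_t prev = uselocale(ws->ws_loc);
    size_t n = wcrtomb(mb, wc, &ws->ws_state);
    (void) uselocale(prev);
    return (n);
}

/* Release the snapshot, as iconv_close() would. */
void
wc_snapshot_close(wc_snapshot_t *ws)
{
    freelocale(ws->ws_loc);
}
```

Because the copy is made with duplocale(), a later setlocale() or uselocale() call in the consumer does not change how an already-open descriptor interprets wchar_t values.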
