Project: General

Bug #4412

character handling functions should return 0 for argument range 128-255 in UTF-8 locales

Added by Alexander Pyhalov over 8 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Category: lib - userland libraries
Start date: 2013-12-22
% Done: 100%
Difficulty: Bite-size

Description

Currently, the man page for islower(), isupper(), isalpha() and the other character classification functions specifies that:

    If the argument to any of the character handling macros is not in the domain of the function, the result is undefined.

Byte values 128-255 are not valid UTF-8 characters on their own. As far as I can tell, these functions therefore treat such arguments as outside their domain and fall back on this undefined behavior. This is incorrect and breaks a lot of software: they should return 0 for values 128-255.
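The requested behavior can be sketched as a wrapper around islower(). This is a minimal illustration only and not the libc fix. The names utf8_islower_byte() and locale_is_utf8() are hypothetical. The sketch also assumes that nl_langinfo(CODESET) reports a UTF-8 codeset as exactly "UTF-8".

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE codeset is UTF-8.
 * Assumption: the codeset name is exactly "UTF-8"; other spellings
 * (e.g. "utf8") are not recognized by this sketch. */
int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && strcmp(cs, "UTF-8") == 0;
}

/* Hypothetical wrapper showing the requested semantics: in a UTF-8 locale
 * a lone byte 128-255 is never a complete character, so it is never
 * classified as lowercase. All other values are passed to islower(). */
int utf8_islower_byte(int c, int is_utf8)
{
    /* POSIX domain: a value representable as unsigned char, or EOF. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));

    if (is_utf8 && c >= 0x80 && c <= 0xFF)
        return 0;
    return islower(c);
}
```

In a program, the flag would typically be computed once after setlocale(), for example `utf8_islower_byte(c, locale_is_utf8())`. Passing the flag explicitly lets you test the byte rule even when no UTF-8 locale is installed.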
According to POSIX (http://pubs.opengroup.org/onlinepubs/7999959899/functions/islower.html): "The c argument is an int, the value of which the application shall ensure is a character representable as an unsigned char or equal to the value of the macro EOF. If the argument has any other value, the behavior is undefined". Nothing is said about "a correct character in the current locale", only "a character representable as an unsigned char or equal to the value of the macro EOF". Every value from 0 to 255 is therefore within the domain, so the result for 128-255 is not undefined.
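In practice, this POSIX requirement means an application should convert a plain char to unsigned char before passing it to a <ctype.h> function. Where char is signed, bytes 128-255 would otherwise arrive as negative values, and those really are outside the domain. The sketch below shows the domain check and the usual cast. The names in_ctype_domain() and count_lower_bytes() are illustrative helpers, not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative helper: nonzero if c is in the domain POSIX allows for the
 * <ctype.h> functions, i.e. representable as unsigned char, or EOF. */
int in_ctype_domain(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}

/* Illustrative helper: counts lowercase bytes in a NUL-terminated string.
 * Each char is converted to unsigned char first; on platforms where plain
 * char is signed, bytes 128-255 would otherwise become negative ints. */
size_t count_lower_bytes(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;
        assert(in_ctype_domain(c));
        if (islower(c))
            n++;
    }
    return n;
}
```

In the default "C" locale, only 'a' through 'z' count as lowercase. For that reason, `count_lower_bytes("a\xc3\xa9z")` returns 2 there: the two bytes of the UTF-8 encoding of é are not counted.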

Test case:

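$ cat test_islower.c
#include <stdio.h>
#include <ctype.h>
#include <locale.h>

int main(void)
{
  int i;

  /* If the locale is not installed, setlocale() fails and the program
     would silently keep running in the "C" locale. */
  if (setlocale(LC_ALL, "en_US.UTF-8") == NULL) {
    fprintf(stderr, "en_US.UTF-8 locale not available\n");
    return 1;
  }

  for (i = 0; i < 256; i++) {
    if (islower(i)) {
      printf("lower, %d %c\n", i, i);
    }
  }
  return 0;
}
$ gcc test_islower.c -o test_islower
$ ./test_islower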

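The output shows that islower() considers 170, 181, 186, 223-246 and 248-255 to be lowercase characters. These values match the lowercase letters of the Latin-1 range (for example, 223 is U+00DF ß and 255 is U+00FF ÿ). This suggests that the byte value is being classified as a Unicode code point rather than as a byte of a UTF-8 sequence.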
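A self-checking variant of the test case could serve as a regression check. This is a sketch based on the report's stated expectation that no byte 128-255 is lowercase in a UTF-8 locale. The helpers count_high_lowercase() and assert_no_high_lowercase() are hypothetical, and the caller is assumed to have already selected the locale with setlocale().

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the byte values 128-255 that islower()
 * classifies as lowercase in the current locale. If verbose is nonzero,
 * each one is printed as a number only, since a lone high byte is not
 * valid UTF-8 output. */
int count_high_lowercase(int verbose)
{
    int count = 0;

    for (int c = 128; c <= 255; c++) {
        if (islower(c)) {
            if (verbose)
                printf("lower: %d\n", c);
            count++;
        }
    }
    return count;
}

/* Hypothetical regression check for the requested behavior: aborts
 * (unless NDEBUG is defined) if any byte 128-255 is lowercase. */
void assert_no_high_lowercase(void)
{
    assert(count_high_lowercase(0) == 0);
}
```

For example, suppose setlocale(LC_ALL, "en_US.UTF-8") succeeds on the build described in this report. Then count_high_lowercase(1) would list the 35 values given above, whereas the requested behavior would make it return 0. In the default "C" locale it returns 0 either way.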
