Bug #10045

/bin/tr misuses LC_COLLATE when computing character ranges

Added by Hubert Garavel over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date: 2018-12-06
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

On a freshly installed machine (SunOS 5.11 illumos-7d1eae8c55 i86pc i386 i86pc),
/bin/tr gives incorrect results when LC_COLLATE is set to anything other than "C":

# locale
LANG=fr_FR.UTF-8
LC_CTYPE=fr_FR.ISO8859-1
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE=fr_FR.ISO8859-1
LC_MONETARY=fr_FR.ISO8859-15
LC_MESSAGES=C
LC_ALL=
# echo adresse | tr a-z A-Z
Aðséßßé

This result is wrong. The correct result should be ADRESSE,
as observed on Linux.

The correct result is obtained when setting LC_COLLATE to C:

# echo adresse | LC_COLLATE=C tr a-z A-Z
ADRESSE

or when setting LC_ALL to C:

# LC_ALL=C ; export LC_ALL ; locale
LANG=fr_FR.UTF-8
LC_CTYPE="C"
LC_NUMERIC=C
LC_TIME=C
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES=C
LC_ALL=C
# echo adresse | tr a-z A-Z
ADRESSE
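
Until tr is fixed, a possible workaround for scripts is to pin the locale for the tr invocation only, instead of exporting LC_ALL for the whole session as above. The following is a minimal sketch, assuming only ASCII letters need to be converted; the helper name to_upper_ascii is hypothetical and not part of any system.

```shell
# Hypothetical helper: uppercase the ASCII letters of its first argument.
# LC_ALL=C applies to this single tr process only, so the range a-z is
# interpreted in the C locale and the caller's locale settings are untouched.
to_upper_ascii() {
    printf '%s\n' "$1" | LC_ALL=C tr 'a-z' 'A-Z'
}

to_upper_ascii adresse
```

Because LC_ALL overrides LC_COLLATE, this should behave like the LC_COLLATE=C command shown earlier, but bytes outside a-z (accented letters, for instance) are passed through unchanged.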

Note: a very similar bug was reported for FreeBSD:
https://forums.freebsd.org/threads/tr-and-lc_collate.6875/
A fix may be available in the BSD version of tr.
