Bug #6480

tr seems to sometimes have issues when invoked in certain scripts in UTF-8 locales

Added by Richard PALO almost 4 years ago. Updated almost 4 years ago.

Status:
Rejected
Priority:
Normal
Assignee:
-
Category:
-
Start date:
2015-11-26
Due date:
% Done:

0%

Estimated time:
Difficulty:
Medium
Tags:
needs-triage

Description

I have the following sequence in a script file (extract from pkgsrc verifypc)

#!/bin/sh
#
    log=$1

    [ -f ${log} ] || err "$1 not found" || exit 1

    lines=$(cat ${log} | sort -u | tr ' ' '¬')
    for l in ${lines}; do
        pcname=$(echo ${l} | cut -d '¬' -f 1)
        pcop=$(echo ${l} | cut -d '¬' -f 2)
        pcver=$(echo ${l} | cut -d '¬' -f 3)
        printf "name='%s' op='%s' ver='%s'\n" $pcname $pcop $pcver
    done

    exit 0

With the attached pkg-config.log I get the following (running verbosely via bash -x):

richard@omnis:/home/richard/src/ttr$ bash -x verifypc pkg-config.log 
+ log=pkg-config.log
+ '[' -f pkg-config.log ']'
++ cat pkg-config.log
++ sort -u
++ tr ' ' $'\254'
tr: illegal sequence
+ lines=
+ exit 0

What's funny is that if I run the command manually, it works fine!

Since I have GNU coreutils installed, I also tried 'gtr' in the script, and that works fine as well.

Is it possible that tr has a problem in str.c around this code segment?

        default:
            clen = mbrtowc(&wch, s->str, MB_LEN_MAX, NULL);
            if (clen == (size_t)-1 || clen == (size_t)-2 ||
                clen == 0)
                errx(1, "illegal sequence");
            is_octal = 0;
            s->lastch = wch;
            s->str += clen;
            break;
        }
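
For reference, the byte that reaches tr can be inspected directly. The following is only a minimal sketch: hex_bytes is a hypothetical helper name, not part of tr or of the original script. It shows that octal 254 is the single byte 0xAC, which has the bit pattern 10xxxxxx and so can only appear as a continuation byte in UTF-8, whereas U+00AC ('¬') is encoded in UTF-8 as the two bytes 0xC2 0xAC.

```shell
#!/bin/sh
# hex_bytes (hypothetical helper): print the bytes read from stdin
# as one run of lowercase hex digits, with no separators.
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# The lone byte the script handed to tr (octal 254).
printf '\254' | hex_bytes; echo      # ac

# The UTF-8 encoding of U+00AC (NOT SIGN).
printf '\302\254' | hex_bytes; echo  # c2ac
```

A lone 0xAC cannot start a character in a UTF-8 locale, which would be consistent with mbrtowc() returning (size_t)-1 in the excerpt above.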


Files

pkg-config.log (619 Bytes) pkg-config.log test file Richard PALO, 2015-11-26 06:32 AM

History

#1

Updated by Richard PALO almost 4 years ago

BTW, if I use a different marker such as '|', it does work, so the problem seems to be
related to the marker character itself, '¬' (octal 254, i.e. 0xAC).

#2

Updated by Richard PALO almost 4 years ago

This does seem limited to UTF-8 locales:

richard@omnis:/home/richard/src/ttr$ LANG=C ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.ISO8859-1 ./verifypc pkg-config.log 
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.ISO8859-15 ./verifypc pkg-config.log 
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.UTF-8 ./verifypc pkg-config.log  
tr: illegal sequence
richard@omnis:/home/richard/src/ttr$ LANG=en_US.UTF-8 ./verifypc pkg-config.log  
tr: illegal sequence
richard@omnis:/home/richard/src/ttr$ LANG=en_US.ISO8859-1 ./verifypc pkg-config.log 
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'

#3

Updated by Richard PALO almost 4 years ago

BTW, the failing host (omnis) runs bloody, onu'd to a recent build.

Works on

richard@smicro:~/ttr$ uname -a
SunOS smicro 5.11 oi_151a9 i86pc i386 i86pc Solaris
richard@smicro:~/ttr$ LANG=fr_FR.UTF-8 ./verifypc pkg-config.log 
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@smicro:~/ttr$ LANG=en_US.UTF-8 ./verifypc pkg-config.log 
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'

#4

Updated by Richard PALO almost 4 years ago

  • Subject changed from tr seems to sometimes have issues when invoked in certain scripts to tr seems to sometimes have issues when invoked in certain scripts in UTF-8 locales

#5

Updated by Richard PALO almost 4 years ago

  • Status changed from New to Rejected

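In UTF-8, only 7-bit (ASCII) characters are encoded as a single byte, so a lone '\254' is an invalid sequence; '¬' (U+00AC) needs to be encoded as '\302\254'. The script passed the single byte '\254' to tr (see the bash -x trace), which is why tr rejected it in UTF-8 locales only.
Thanks for the reminder, Yuri.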
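
One way to avoid the encoding question altogether is to not use a marker character at all. The following is only a sketch, not the pkgsrc code: parse_pclog is a hypothetical function name, and it assumes each log line holds three whitespace-separated fields (name, operator, version), as the output above suggests.

```shell
#!/bin/sh
# parse_pclog (hypothetical): print name/op/ver for each unique line of
# the log file given as $1. read splits each line on whitespace, so no
# non-ASCII marker has to pass through tr or cut.
parse_pclog() {
    sort -u "$1" | while read -r pcname pcop pcver; do
        printf "name='%s' op='%s' ver='%s'\n" "$pcname" "$pcop" "$pcver"
    done
}

# Usage (file name taken from the report):
#   parse_pclog pkg-config.log
```

Unlike cut -f 3, read puts everything after the second field into pcver, so a line with more than three fields would behave differently from the original script.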
