Bug #6480
closed
tr seems to sometimes have issues when invoked in certain scripts in UTF-8 locales
Description
I have the following sequence in a script file (extract from pkgsrc verifypc)
#!/bin/sh
#
log=$1
[ -f ${log} ] || err "$1 not found" || exit 1
lines=$(cat ${log} | sort -u | tr ' ' '¬')
for l in ${lines}; do
    pcname=$(echo ${l} | cut -d '¬' -f 1)
    pcop=$(echo ${l} | cut -d '¬' -f 2)
    pcver=$(echo ${l} | cut -d '¬' -f 3)
    printf "name='%s' op='%s' ver='%s'\n" $pcname $pcop $pcver
done
exit 0
with the attached pkg-config.log I get (running verbose via bash):
richard@omnis:/home/richard/src/ttr$ bash -x verifypc pkg-config.log
+ log=pkg-config.log
+ '[' -f pkg-config.log ']'
++ cat pkg-config.log
++ sort -u
++ tr ' ' $'\254'
tr: illegal sequence
+ lines=
+ exit 0
What's funny is that if I run the command manually, it works fine!
Since I have GNU coreutils installed, I also tried 'gtr' in the script, and that works fine as well.
Is it possible that tr has a problem in str.c around this code segment?
default:
    clen = mbrtowc(&wch, s->str, MB_LEN_MAX, NULL);
    if (clen == (size_t)-1 || clen == (size_t)-2 || clen == 0)
        errx(1, "illegal sequence");
    is_octal = 0;
    s->lastch = wch;
    s->str += clen;
    break;
}
Updated by Richard PALO over 6 years ago
BTW, if I use a different marker such as '|', it does work, so I should point out that the problem
seems to be related to the marker character used, '¬' (0xAC, octal 254).
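One way to sidestep the marker character altogether is to let the shell split each line into fields itself. The following is only a sketch, not the pkgsrc script: the function name parse_pc_log is hypothetical, and it assumes each log line holds three space-separated fields (name, operator, version), as the output of the script suggests.

```shell
#!/bin/sh
# Hypothetical variant of the loop: no marker character is needed, so no
# non-ASCII byte is ever passed to tr.
# Assumption: every line has three space-separated fields (name, op, version).
parse_pc_log() {
    sort -u "$1" | while read -r pcname pcop pcver; do
        printf "name='%s' op='%s' ver='%s'\n" "$pcname" "$pcop" "$pcver"
    done
}
```

It would be called as parse_pc_log pkg-config.log. Because read does the splitting, the result no longer depends on how the script file itself is encoded.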
Updated by Richard PALO over 6 years ago
This does seem limited to UTF-8 locales:
richard@omnis:/home/richard/src/ttr$ LANG=C ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.ISO8859-1 ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.ISO8859-15 ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@omnis:/home/richard/src/ttr$ LANG=fr_FR.UTF-8 ./verifypc pkg-config.log
tr: illegal sequence
richard@omnis:/home/richard/src/ttr$ LANG=en_US.UTF-8 ./verifypc pkg-config.log
tr: illegal sequence
richard@omnis:/home/richard/src/ttr$ LANG=en_US.ISO8859-1 ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
Updated by Richard PALO over 6 years ago
BTW, this is on omnis, running bloody onu'd to recent...
Works on
richard@smicro:~/ttr$ uname -a
SunOS smicro 5.11 oi_151a9 i86pc i386 i86pc Solaris
richard@smicro:~/ttr$ LANG=fr_FR.UTF-8 ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
richard@smicro:~/ttr$ LANG=en_US.UTF-8 ./verifypc pkg-config.log
name='gconf-2.0' op='(any)' ver='(null)'
name='gio-2.0' op='(any)' ver='(null)'
name='glib-2.0' op='>=' ver='2.16.0'
name='gnome-desktop-2.0' op='>=' ver='2.9.91'
name='libnautilus-extension' op='(any)' ver='(null)'
name='libnautilus-extension' op='>=' ver='2.21.2'
Updated by Richard PALO over 6 years ago
- Subject changed from tr seems to sometimes have issues when invoked in certain scripts to tr seems to sometimes have issues when invoked in certain scripts in UTF-8 locales
Updated by Richard PALO over 6 years ago
- Status changed from New to Rejected
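In UTF-8, only 7-bit (ASCII) characters are encoded as a single byte. A lone '\254' is therefore an invalid sequence, and '¬' (U+00AC) needs to be encoded as the two bytes '\302\254'. The script file contained the single ISO8859-1 byte, which is why tr rejected it in UTF-8 locales.
Thanks for the reminder, Yuri.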
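The byte values mentioned in the resolution can be checked from the shell. This is a small sketch for illustration only: the helper names bytes_octal and is_valid_utf8 are hypothetical, and the second one assumes an iconv utility is installed that rejects malformed input with a non-zero exit status.

```shell
#!/bin/sh
# Print the octal value of every byte in the argument, space-separated.
bytes_octal() {
    # Unquoted on purpose: word splitting normalizes od's spacing.
    set -- $(printf '%s' "$1" | od -An -to1)
    echo "$*"
}

# Succeed only if the argument is a well-formed UTF-8 byte sequence.
# Assumption: iconv is available and fails on invalid input.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

bytes_octal "$(printf '\302\254')"    # prints: 302 254
is_valid_utf8 "$(printf '\254')" || echo "lone byte 254: not valid UTF-8"
is_valid_utf8 "$(printf '\302\254')" && echo "bytes 302 254: valid UTF-8"
```

Running bytes_octal on the marker as it is stored in the script file shows which of the two forms the file actually contains.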