Project

General

Profile

Bug #132

tr breaks scripts

Added by Russ Price almost 10 years ago. Updated almost 10 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Start date:
2010-09-02
Due date:
% Done:

100%

Estimated time:
Difficulty:
Tags:
Gerrit CR:

Description

After building and booting Illumos, I tried starting up NetBeans, only to be greeted by:


russ@castle:~$ /usr/netbeans/bin/netbeans
expr: syntax error
/usr/netbeans/bin/netbeans[138]: [: argument expected
/usr/netbeans/bin/netbeans[140]: [: argument expected
Invalid maximum heap size: -Xmxm
Could not create the Java virtual machine.
russ@castle:~$

The culprit is the call to tr in the NetBeans startup script. If I use the Solaris or GNU versions of tr, the line:

prtconf | grep Memory | tr -dc '[0-9]'

gives me:

4064

with no spaces or newlines. If I use Illumos tr, I get the output:

: 4064

with a space at the beginning and end of the line and a newline afterwards.

Workaround: create a modified startup script that forces the use of GNU tr.


Files

trtests.sh (3.79 KB) trtests.sh Chris Love, 2010-09-03 08:53 PM

Related issues

Has duplicate illumos gate - Bug #183: tr -dc in the netbeans stat script failsClosed2010-09-10

Actions

History

#1

Updated by Garrett D'Amore almost 10 years ago

  • Priority changed from Normal to High

This incorrect behavior is confirmed; legacy Solaris tr does not behave this way, nor does the version in xpg4. This must be fixed.

#2

Updated by Chris Love almost 10 years ago

It looks like the FreeBSD version of tr isn't correctly interpreting the [0-9] as a range, so the leading '[' and trailing ']' get added to the character set in addition to the digits 0-9 prior to inverting it. Instead it's falling into some code that would handle sequences of characters: [x*n]. The relevant functions are setup(), next() and bracket().

There's a related case which fails with this version of tr but works with legacy Solaris tr (at least for LANG=C): tr -dc '[0123456789]'

#3

Updated by Chris Love almost 10 years ago

Additional testing with the attached script suggests a lot of other issues lurking.

#4

Updated by John Grafton almost 10 years ago

The current version of tr uses a splay tree algorithm to keep track of character modifications. Unfortunately, in porting the FreeBSD version to Illumos, the splay tree algorithm began to behave erratically. I opted to port the significantly simpler NetBSD version of tr and with minor edits, it passes all of the above tests posted by Chris.

I submitted a webrev with the tr modifications.

#5

Updated by Rich Lowe almost 10 years ago

I looked into debugging this this morning, as a user complained about netbeans.

It looks like the precedence of the operations in cset_in_hard was screwed up while gagging the compiler.

(diff from original FreeBSD code):

@@ -153,11 +150,11 @@
     struct csclass *csc;

     for (csc = cs->cs_classes; csc != NULL; csc = csc->csc_next)
-        if (csc->csc_invert ^ iswctype(ch, csc->csc_type) != 0)
+        if ((csc->csc_invert ^ iswctype(ch, csc->csc_type)) != 0)
             return (cs->cs_invert ^ true);
     if (cs->cs_root != NULL) {
         cs->cs_root = cset_splay(cs->cs_root, ch);
-        return (cs->cs_invert ^ cset_rangecmp(cs->cs_root, ch) == 0);
+        return ((cs->cs_invert ^ cset_rangecmp(cs->cs_root, ch)) == 0);
     }
     return (cs->cs_invert ^ false);
 }

If memory serves, == and != bind more tightly than ^, so the compiler silencing parens here change behaviour for the worse. Placing the parens along precedence lines appears to cause this to behave in a way I believe to be correct:

diff -r e019ccb2cb3b usr/src/cmd/tr/cset.c
--- a/usr/src/cmd/tr/cset.c    Fri Sep 03 09:10:51 2010 -0400
+++ b/usr/src/cmd/tr/cset.c    Tue Sep 07 08:27:15 2010 -0400
@@ -150,11 +150,11 @@
     struct csclass *csc;

     for (csc = cs->cs_classes; csc != NULL; csc = csc->csc_next)
-        if ((csc->csc_invert ^ iswctype(ch, csc->csc_type)) != 0)
+        if (csc->csc_invert ^ (iswctype(ch, csc->csc_type) != 0))
             return (cs->cs_invert ^ true);
     if (cs->cs_root != NULL) {
         cs->cs_root = cset_splay(cs->cs_root, ch);
-        return ((cs->cs_invert ^ cset_rangecmp(cs->cs_root, ch)) == 0);
+        return (cs->cs_invert ^ (cset_rangecmp(cs->cs_root, ch) == 0));
     }
     return (cs->cs_invert ^ false);
 }
% prtconf | grep Memory| ./tr -dc '[0123456789]'
5375% 
#6

Updated by Garrett D'Amore almost 10 years ago

So I screwed it up. (This is why I hate using a coding style where you have to recall specific operator precedence -- the ()'s should always have been explicit in the first place, IMO!)

Thanks for figuring this out Rich.

Lets get this integrated asap.

#7

Updated by Rich Lowe almost 10 years ago

case 'g' in trtests echo "[[[[[[[]]]]]]" | tr -d '[=]=]' is a separate bug, in bracket().

When it calls strchr to find the end of the class it finds the content, feels it's looking at [=], and that it isn't an equivalence class.

I think that's it being off by one when trying to skip over stuff:

    case '=':                /* "[=equiv=]" */
        if ((p = strchr(s->str + 2, ']')) == NULL)
            return (0);
        if (*(p - 1) != '=' || p - s->str < 4)
            goto repeat;
        s->str += 2;
        genequiv(s);
        return (1);

s->str will be '[=]=]', so I think we'd want +3 on line 2 here so that we skip the content character, too. This issue seems to be in the original FreeBSD code, also, does it work there?

#8

Updated by Rich Lowe almost 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Rich Lowe

Apparently I'm taking this...

#9

Updated by John Grafton almost 10 years ago

No, the 'g' test fails in FreeBSD also.

#10

Updated by Rich Lowe almost 10 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Should be fixed in r13186 commit:c777be6727c6

Also available in: Atom PDF