Bug #516

libc regex misses \< and \> word delimiters

Added by Garrett D'Amore almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: High
Category: lib - userland libraries
Start date: 2010-12-13
Due date:
% Done: 100%
Estimated time:
Difficulty:
Tags:
Gerrit CR:

Description

When testing the new "sed" program, I discovered something rather surprising.

Expressions containing \< and \> don't work. Legacy sed gets it via a different regex library in libgen. FreeBSD doesn't get it at all.

In fact, POSIX doesn't say anything about these or the concept of "words" at all.

FreeBSD uses an interesting and non-portable solution of [[:<:]] and [[:>:]], which we seem to have "inherited".

Yet legacy Solaris needs \< and \>. Apparently we believe this extension is not in conflict with POSIX. Indeed, GNU sed uses this syntax as well.

We need to support \< and \>.
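
As a rough illustration of what callers expect, here is a minimal sketch using the standard regcomp()/regexec() interface with a basic regular expression. The helper name bre_matches is hypothetical and not part of libc. Whether \< and \> act as word delimiters depends on the libc in use: a regex library without the extension may treat them as literal < and > characters instead.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` contains a match for the
 * basic regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
static int
bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* cflags without REG_EXTENDED selects basic regular expressions */
	if (regcomp(&re, pattern, REG_NOSUB) != 0)
		return (-1);
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return (rc == 0 ? 1 : 0);
}
```

With a libc that implements the extension, bre_matches("\\<cat\\>", "a cat sat") should report a match, while bre_matches("\\<cat\\>", "concatenate") should not, because "cat" there is not delimited by word boundaries.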

Note that fixing this is a prerequisite for FreeBSD sed.

#1

Updated by Garrett D'Amore almost 10 years ago

# HG changeset patch
# User Garrett D'Amore <garrett@nexenta.com>
# Date 1292303192 28800
# Node ID a4778a844d7c71ef098613b26bfe009fb5aea9e2
# Parent  bfcab07df843172eb6cf39fc9413535338b70574
516 libc regex misses \< and \> word delimiters

diff -r bfcab07df843 -r a4778a844d7c usr/src/lib/libc/port/locale/regcomp.c
--- a/usr/src/lib/libc/port/locale/regcomp.c    Mon Dec 13 15:33:46 2010 -0800
+++ b/usr/src/lib/libc/port/locale/regcomp.c    Mon Dec 13 21:06:32 2010 -0800
@@ -392,7 +392,17 @@
        case '\\':
               (void) REQUIRE(MORE(), REG_EESCAPE);
               wc = WGETNEXT();
-              ordinary(p, wc);
+              switch (wc) {
+              case '<':
+                     EMIT(OBOW, 0);
+                     break;
+              case '>':
+                     EMIT(OEOW, 0);
+                     break;
+              default:
+                     ordinary(p, wc);
+                     break;
+              }
               break;
        case '{':            /* okay as ordinary except if digit follows */
               (void) REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
@@ -545,6 +555,12 @@
        case '[':
               p_bracket(p);
               break;
+       case BACKSL|'<':
+              EMIT(OBOW, 0);
+              break;
+       case BACKSL|'>':
+              EMIT(OEOW, 0);
+              break;
        case BACKSL|'{':
               SETERROR(REG_BADRPT);
               break;
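
The patch only emits the OBOW and OEOW operators; the matcher decides what they mean. As a sketch of the usual beginning-of-word and end-of-word tests, assuming a word character is an alphanumeric or an underscore (an assumption here, not something the patch itself states), the checks could look like this. The function names are hypothetical and do not appear in regcomp.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition of a word character: alphanumeric or '_'. */
static int
is_word_char(int c)
{
	return (isalnum((unsigned char)c) || c == '_');
}

/*
 * \< : true when a word starts at offset i, i.e. the character before i
 * (if any) is not a word character and the character at i is one.
 * The caller must ensure i <= strlen(s).
 */
static int
at_bow(const char *s, size_t i)
{
	size_t len = strlen(s);
	int prev = (i > 0) && is_word_char(s[i - 1]);
	int next = (i < len) && is_word_char(s[i]);

	return (!prev && next);
}

/*
 * \> : true when a word ends at offset i, i.e. the character before i
 * is a word character and the character at i (if any) is not.
 * The caller must ensure i <= strlen(s).
 */
static int
at_eow(const char *s, size_t i)
{
	size_t len = strlen(s);
	int prev = (i > 0) && is_word_char(s[i - 1]);
	int next = (i < len) && is_word_char(s[i]);

	return (prev && !next);
}
```

Both tests are zero-width: they inspect the characters on either side of an offset and consume no input, which is why the patch can emit them as standalone operators with no operand.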

#2

Updated by Garrett D'Amore almost 10 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

This was integrated last night with

changeset: 13257:d3807abc6720
user: Garrett D'Amore <>
date: Wed Dec 15 17:12:35 2010 -0800
description:
508 od doesn't set text domain
509 need an open source sed(1) replacement
516 libc regex misses \< and \> word delimiters
Reviewed by:
Reviewed by:
Approved by:
