Bug #11686

Improper regexp matching when both \> and $ are present

Added by Hubert Garavel 10 months ago. Updated 10 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: lib - userland libraries
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

A regular expression of the form

    word\>$

does not match lines containing only "word".
However, such lines are matched by

    word\>

or

    word$

This bug can be observed with grep.

    $ echo "word" | /bin/grep 'word\>'
    word

    $ echo "word" | /bin/grep 'word$'
    word

    $ echo "word" | /bin/grep 'word\>$'
    -- empty output
    -- on Solaris 10 and Linux, the same command outputs "word" 

This bug can also be observed with sed.

    $ echo "word" | /bin/sed 's/word\>/X/'
    X

    $ echo "word" | /bin/sed 's/word$/X/'
    X

    $ echo "word" | /bin/sed 's/word\>$/X/'
    word
    -- on Solaris 10 and Linux, the same command outputs "X" 

Since the bug occurs with both grep and sed, it is probably not specific
to either tool, but rather a deeper problem in a regular expression
library that both of them use.

Interestingly, there is no similar bug with begin-of-line matching:

    $ echo "word" | /bin/grep '^\<word'
    word

    $ echo "word" | /bin/sed 's/^\<word/X/'
    X
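
For comparing platforms, the cases above can be run in one pass. The following is only a minimal sketch: the helper names check_grep and check_sed are hypothetical, and the script uses whichever grep and sed come first in PATH rather than /bin/grep and /bin/sed, so PATH may need adjusting to test a specific implementation. No expected output is shown, since the result of the `word\>$` case is exactly what differs between systems.

```shell
#!/bin/sh
# Hypothetical helpers for re-running the test cases from this report.

# check_grep PATTERN LINE
# Prints "match" if grep selects LINE for the basic regular expression
# PATTERN, and "no match" otherwise (a grep error also counts as "no match").
check_grep() {
    if printf '%s\n' "$2" | grep -e "$1" >/dev/null 2>&1; then
        echo "match"
    else
        echo "no match"
    fi
}

# check_sed PATTERN LINE
# Prints LINE after replacing the first match of PATTERN with "X".
# PATTERN must not contain "/" because it is used as the s command delimiter.
check_sed() {
    printf '%s\n' "$2" | sed "s/$1/X/"
}

# Run every pattern from the report against a line containing only "word".
for pattern in 'word\>' 'word$' 'word\>$' '^\<word'; do
    printf 'grep %-10s %s\n' "$pattern" "$(check_grep "$pattern" word)"
    printf 'sed  %-10s %s\n' "$pattern" "$(check_sed "$pattern" word)"
done
```

On an affected system, the `word\>$` lines would be expected to show "no match" for grep and an unchanged "word" for sed, while the other three patterns match.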

Wendelin Serwe and Hubert Garavel

History

#1

Updated by Yuri Pankov 10 months ago

  • Assignee set to Yuri Pankov
  • Status changed from New to In Progress
  • Category set to lib - userland libraries

I'll look into it.
