Bug #4580

/usr/bin/grep can't handle multibyte characters

Added by Alexander Pyhalov over 5 years ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Start date: 2014-02-06
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags: needs-triage

Description

Grep doesn't work correctly with non-English text.
For example, I can't grep for the Cyrillic character "а":

$ cat ~/tmp/test 
абвгдеёжзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
$ /usr/bin/grep -i а ~/tmp/test
grep: RE error 67: Illegal byte sequence

History

#1

Updated by Yuri Pankov over 5 years ago

Sounds like yet another reason to make /usr/xpg4/bin/grep the default one:

$ /usr/xpg4/bin/grep -i абвгде greptest
абвгдеёжзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
$

#2

Updated by Yuri Pankov about 5 years ago

  • Subject changed from "grep doesn't work correctly with non-english text" to "/usr/bin/grep can't handle multibyte characters"

#3

Updated by Garrett D'Amore about 5 years ago

Agreed. However, note that xpg4 grep uses extended regexes rather than basic ones. This may break some users' scripts. We should discuss.
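To illustrate the kind of breakage the comment above is concerned about, here is a minimal sketch of how the same pattern behaves under basic (BRE) and extended (ERE) regex syntax in a POSIX-conforming grep. The two-line input is an arbitrary example chosen for illustration, and the sketch does not exercise the illumos binaries specifically.

```shell
# Arbitrary sample input: one line with a literal "+", one without.

# BRE (grep's default syntax): "+" is an ordinary character,
# so only the line "a+" matches.
printf 'a+\naa\n' | grep 'a+'

# ERE (grep -E): "+" means "one or more of the preceding item",
# so both "a+" and "aa" match.
printf 'a+\naa\n' | grep -E 'a+'
```

A script written for basic syntax that relies on characters such as `+`, `?`, `|`, `(` and `)` being matched literally would select different lines if the same pattern were interpreted as an extended regex.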

#4

Updated by Electric Monk over 1 year ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

commit  d2d52addd50254d1b7c318c6784172d8d7de20c6
Author: Alexander Pyhalov <apyhalov@gmail.com>
Date:   2018-01-09T19:12:32.000Z

    8858 /usr/bin/grep doesn't support -E option
    4580 /usr/bin/grep can't handle multibyte characters
    8929 8868 tests are not delivered with system/test/utiltest
    8860 Example in grep(1) is incorrect
    Reviewed by: Peter Tribble <peter.tribble@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Yuri Pankov <yuripv@gmx.com>
    Approved by: Robert Mustacchi <rm@joyent.com>