Bug #11741
Closed
regexec: fix processing multibyte strings
Start date:
Due date:
% Done: 100%
Estimated time:
Difficulty: Bite-size
Tags:
Gerrit CR:
External Bug:
Description
From FreeBSD's PR153502:
I'm seeing odd behavior from programs using regex(3) like less(1), vi(1) and sed(1) when using LANG=en_US.UTF-8 and UTF-8 inputs.
Sometimes it seems to work right:
$ echo 'é' | sed -ne '/^.$/p'
é
$ echo 'éé' | sed -ne '/^..$/p'
éé
$ echo 'aéa' | sed -ne '/a.a/p'
aéa
$ echo 'aéa' | sed -ne '/a.*a/p'
aéa
$ echo 'aaéaa' | sed -ne '/aa.aa/p'
aaéaa
$ echo 'aéaéa' | sed -ne '/a.a.a/p'
aéaéa
But not always:
$ echo 'éa' | sed -ne '/.a/p'
$ echo 'aéaa' | sed -ne '/a.aa/p'
$ echo 'éaé' | sed -ne '/.a./p'
It seems that ".*", ".+", ".{0,}" and ".{1,}" work correctly, but ".{0,1}", ".{1,1}" and a lone "." do not always match.
Updated by Electric Monk over 3 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
git commit 695dd8d1c21542efb8ca2e82c6eb63007a6a5212
commit 695dd8d1c21542efb8ca2e82c6eb63007a6a5212
Author: Yuri Pankov <yuri.pankov@nexenta.com>
Date: 2019-10-22T15:10:25.000Z

    11741 regexec: fix processing multibyte strings
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Dan McDonald <danmcd@joyent.com>