Bug #8908

regcomp: reduce size of bitmap for multibyte locales

Added by Alexander Pyhalov almost 3 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: lib - userland libraries
Start date: 2017-12-08
Due date:
% Done: 100%
Estimated time:
Difficulty: Bite-size
Tags:
Gerrit CR:

Description

While running the GNU grep test suite against our xpg4 grep, I got the following core dump:

core 'core' of 11660:   ./grep.xpg4 -i    in
 feede70f pthread_key_create_once_np (fef529f0, fee7d3fb, 0, 0, 0, 0) + f
 fee7d347 tsdalloc (e, 4, fee8e0bc, 0, 0, 0) + 4f
 fee8dd16 uselocale (0, 0, 0, 0, 0, 0) + 43
 fee8e643 mbrtowc  (76480e4, 76481f8, 3, 76480e8, fee69b37, 8343c00) + 1e
 feea14fd wgetnext (8047ac0, e, 7648168, fef52000, 8047ac0, 76481f9) + 56
 feea15d6 p_b_symbol (8047ac0, e, 7648188, fef52000, 8047ac0, 76481fb) + 9e
 feea1a95 p_b_term (8047ac0, 8342330, 76481e0, fef52000, 8047ac0, fef52000) + 1c5
 feea200f p_bracket (8047ac0, b5, 76481e0, fef52000, b5, 76482db) + 15e
 feea1dda bothcases (8047ac0, b5, 7648258, feea1cb9, 83422f0, ff) + 79
 feea1e3f ordinary (8047ac0, b5, 76482c0, fef52000, 8047ac0, fef52000) + 4d
 feea20ba p_bracket (8047ac0, b5, 76482c0, fef52000, b5, 76483bb) + 209
 feea1dda bothcases (8047ac0, b5, 7648338, feea1cb9, 83422b0, ff) + 79
 feea1e3f ordinary (8047ac0, b5, 76483a0, fef52000, 8047ac0, fef52000) + 4d
 feea20ba p_bracket (8047ac0, b5, 76483a0, fef52000, b5, 764849b) + 209
 feea1dda bothcases (8047ac0, b5, 7648418, feea1cb9, 8342270, ff) + 79
....

The test case is the following:

#!/bin/sh
# Check that case folding works even with titlecase and similarly odd chars.

# Copyright 2014-2017 Free Software Foundation, Inc.

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

LC_ALL=en_US.UTF-8
export LC_ALL

a='\302\265'     # U+00B5
b='\316\234'     # U+039C
c='\316\274'     # U+03BC

printf "$a\\n$b\\n$c\\n" >in
pattern="$a" 
pat=$(printf "$pattern\\n")
./grep.xpg4 -i "\\(\\)\\1$pat" in >out-regex
./grep.xpg4 -i "$pat" in >out-dfa

#1

Updated by Yuri Pankov almost 3 years ago

  • Subject changed from /usr/xpg4/bin/grep goes into infinite recursion to regexec(3C) goes into infinite recursion
  • Category set to lib - userland libraries

Apparently this is a regexec(3C) issue, not a grep one.

#2

Updated by Alexander Pyhalov almost 3 years ago

Expected result: out-regex and out-dfa should match.

#3

Updated by Alexander Pyhalov almost 3 years ago

$ cat out-regex.ggrep
µ
Μ
μ
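
For context, GNU grep prints all three lines because the characters are related by Unicode case mapping: the micro sign uppercases to capital mu, which lowercases to small mu. The following Python sketch is not part of the original report; it only decodes the octal byte sequences from the test script and checks those relationships, using Python's built-in Unicode tables as a stand-in for the locale data. The variable names are illustrative.

```python
# The three characters from the test case, written as the same UTF-8 bytes
# that the script emits through printf octal escapes.
raw = {
    "a": b"\302\265",  # U+00B5 MICRO SIGN
    "b": b"\316\234",  # U+039C GREEK CAPITAL LETTER MU
    "c": b"\316\274",  # U+03BC GREEK SMALL LETTER MU
}

chars = {name: data.decode("utf-8") for name, data in raw.items()}
code_points = {name: f"U+{ord(ch):04X}" for name, ch in chars.items()}

# Simple case mappings: micro sign -> capital mu -> small mu.
upper_of_a = chars["a"].upper()
lower_of_b = chars["b"].lower()

# Case folding places all three characters in one equivalence class,
# which is why a case-insensitive match on "a" should also hit "b" and "c".
folded = {ch.casefold() for ch in chars.values()}

# The pattern character U+00B5 (0xb5, the value visible in the stack trace)
# is a wide character in the 128-255 range.
pattern_in_high_range = 128 <= ord(chars["a"]) <= 255

print(code_points, sorted(folded), pattern_in_high_range)
```

This says nothing about where the recursion in libc originates; it only shows why a correct case-insensitive match on the first character is expected to select all three input lines.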

#4

Updated by Yuri Pankov almost 3 years ago

  • Status changed from New to In Progress
  • Assignee set to Yuri Pankov
  • % Done changed from 0 to 30
  • Difficulty changed from Medium to Bite-size
  • Tags deleted (needs-triage)

#5

Updated by Yuri Pankov almost 3 years ago

  • Subject changed from regexec(3C) goes into infinite recursion to regcomp(3C) goes into infinite recursion for wide characters in 128-255 range

#6

Updated by Yuri Pankov about 1 year ago

  • Subject changed from regcomp(3C) goes into infinite recursion for wide characters in 128-255 range to regcomp: reduce size of bitmap for multibyte locales

#7

Updated by Electric Monk about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 30 to 100

git commit 1603eda21695ca85bfde0e5c75a27d94ac4ce4ff

commit  1603eda21695ca85bfde0e5c75a27d94ac4ce4ff
Author: Yuri Pankov <yuri.pankov@nexenta.com>
Date:   2019-10-22T15:10:03.000Z

    8908 regcomp: reduce size of bitmap for multibyte locales
    Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
    Approved by: Dan McDonald <danmcd@joyent.com>
