Project

General

Profile

Feature #13040

grep can leverage bmg in more cases

Added by Robert Mustacchi 8 months ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Category:
cmd - userland programs
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

In the internal implementation, there are basically two engines that are used by grep:

1. regcomp(3C) and regexec(3C)
2. Boyer-Moore-Gosper algorithm (bmg)

Right now the use of the bmg algorithm is limited to fgrep with certain constraints, such as a single pattern, not using the -i or -v flag, etc. However, there are a large number of grep and egrep patterns that don't have any special characters that imply things in a regular expression. As such, if we encounter a pattern that doesn't use any of the special characters, then this dramatically reduces the cost of grep.

For example, if you take a look at some of the cases of #12825, there are a few pessimal cases. While the current bmg algorithm doesn't handle -v, if we use the two examples provided by Jonathan and Periklis, we can see some dramatic improvements. In this case the ./grep invocation is the one that has enabled this.

$ ptime grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 
200000

real        1.921856648
user        1.909033754
sys         0.011928132

$ ptime ./grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 
200000

real        0.046147721
user        0.034433071
sys         0.011271350

This doesn't help all of the cases that #12825 brings up, but it at least improves things a bit.


Related issues

Related to illumos gate - Bug #12825: grep -x is very slowNew

Actions

Also available in: Atom PDF