Feature #13040
grep can leverage bmg in more cases
100%
Description
In the internal implementation, there are basically two engines that are used by grep:
1. regcomp(3C) and regexec(3C)
2. Boyer-Moore-Gosper algorithm (bmg)
Right now the use of the bmg algorithm is limited to fgrep with certain constraints, such as a single pattern, not using the -i
or -v
flag, etc. However, there are a large number of grep and egrep patterns that don't have any special characters that imply things in a regular expression. As such, if we encounter a pattern that doesn't use any of the special characters, then this dramatically reduces the cost of grep.
For example, if you take a look at some of the cases of #12825, there are a few pessimal cases. While the current bmg algorithm doesn't handle -v, if we use the two examples provided by Jonathan and Periklis, we can see some dramatic improvements. In this case the ./grep
invocation is the one that has enabled this.
$ ptime grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 200000 real 1.921856648 user 1.909033754 sys 0.011928132 $ ptime ./grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 200000 real 0.046147721 user 0.034433071 sys 0.011271350
This doesn't help all of the cases that #12825 brings up, but it at least improves things a bit.
Related issues