Project

General

Profile

Actions

Feature #13040

closed

grep can leverage bmg in more cases

Added by Robert Mustacchi 12 months ago. Updated 11 months ago.

Status:
Closed
Priority:
Normal
Category:
cmd - userland programs
Start date:
Due date:
% Done:

100%

Estimated time:
Difficulty:
Medium
Tags:
Gerrit CR:

Description

In the internal implementation, there are basically two engines that are used by grep:

1. regcomp(3C) and regexec(3C)
2. Boyer-Moore-Gosper algorithm (bmg)

Right now the use of the bmg algorithm is limited to fgrep with certain constraints, such as a single pattern, not using the -i or -v flag, etc. However, there are a large number of grep and egrep patterns that don't have any special characters that imply things in a regular expression. As such, if we encounter a pattern that doesn't use any of the special characters, then this dramatically reduces the cost of grep.

For example, if you take a look at some of the cases of #12825, there are a few pessimal cases. While the current bmg algorithm doesn't handle -v, if we use the two examples provided by Jonathan and Periklis, we can see some dramatic improvements. In this case the ./grep invocation is the one that has enabled this.

$ ptime grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 
200000

real        1.921856648
user        1.909033754
sys         0.011928132

$ ptime ./grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile 
200000

real        0.046147721
user        0.034433071
sys         0.011271350

This doesn't help all of the cases that #12825 brings up, but it at least improves things a bit.


Related issues

Related to illumos gate - Bug #12825: grep -x is very slowNew

Actions
Actions #1

Updated by Robert Mustacchi 12 months ago

  • Related to Bug #12825: grep -x is very slow added
Actions #2

Updated by Electric Monk 12 months ago

  • Gerrit CR set to 845
Actions #3

Updated by Robert Mustacchi 12 months ago

I tested this with the updated grep test suite, test run below. I also spot checked the performance of a number of different simple patters as included above. Finally, I ran a wsdiff that compared this workspace being built on an onu'd platform with all of the grep changes to one without. The only differences were limited to the known set related to DOF and a few timestamps that have snuck in over the years.

rm@beowulf:~$ pfexec /opt/util-tests/bin/utiltest 
Test: /opt/util-tests/tests/allowed-ips (run as root)             [00:00] [PASS]
Test: /opt/util-tests/tests/chown_test (run as root)              [00:00] [PASS]
Test: /opt/util-tests/tests/date_test (run as root)               [00:00] [PASS]
Test: /opt/util-tests/tests/find/findtest (run as root)           [00:00] [PASS]
Test: /opt/util-tests/tests/grep_test (run as root)               [00:03] [PASS]
Test: /opt/util-tests/tests/libjedec_test (run as root)           [00:00] [PASS]
Test: /opt/util-tests/tests/libsff/libsff (run as root)           [00:00] [PASS]
Test: /opt/util-tests/tests/make_test (run as root)               [00:00] [PASS]
Test: /opt/util-tests/tests/mdb/mdbtest (run as root)             [00:00] [PASS]
Test: /opt/util-tests/tests/mergeq/mqt (run as root)              [00:00] [PASS]
Test: /opt/util-tests/tests/mergeq/wqt (run as root)              [00:00] [PASS]
Test: /opt/util-tests/tests/printf_test (run as root)             [00:00] [PASS]
Test: /opt/util-tests/tests/set-linkprop (run as root)            [00:00] [PASS]
Test: /opt/util-tests/tests/sleep/sleeptest (run as root)         [00:22] [PASS]
Test: /opt/util-tests/tests/smbios (run as root)                  [00:00] [PASS]
Test: /opt/util-tests/tests/xargs_test (run as root)              [00:00] [PASS]
Test: /opt/util-tests/tests/awk/runtests.sh (run as nobody)       [02:43] [PASS]
Test: /opt/util-tests/tests/ctf/precheck (run as root)            [00:00] [PASS]
Test: /opt/util-tests/tests/ctf/ctftest (run as root)             [00:06] [PASS]
Test: /opt/util-tests/tests/demangle/afl-fast (run as root)       [00:01] [PASS]
Test: /opt/util-tests/tests/demangle/gcc-libstdc++ (run as root)  [00:00] [PASS]
Test: /opt/util-tests/tests/demangle/llvm-stdcxxabi (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libcustr/custr_remove (run as root)   [00:00] [PASS]
Test: /opt/util-tests/tests/libcustr/custr_trunc (run as root)    [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_00_blank (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_01_boolean (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_02_numbers (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_03_empty_arrays (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_04_number_arrays (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_05_strings (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_06_nested (run as root) [00:00] [PASS]
Test: /opt/util-tests/tests/libnvpair_json/json_07_nested_arrays (run as root) [00:00] [PASS]

Results Summary
PASS      32

Running Time:   00:03:21
Percent passed: 100.0%
Log directory:  /var/tmp/test_results/20200819T141521
Actions #4

Updated by Electric Monk 11 months ago

  • Status changed from New to Closed
  • % Done changed from 90 to 100

git commit dbe930bf51e0d7458b24d30e9f25756c5da54ddf

commit  dbe930bf51e0d7458b24d30e9f25756c5da54ddf
Author: Robert Mustacchi <rm@fingolfin.org>
Date:   2020-08-25T23:00:10.000Z

    13040 grep can leverage bmg in more cases
    Reviewed by: Andy Fiddaman <andy@omniosce.org>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Approved by: Gordon Ross <gordon.w.ross@gmail.com>

Actions

Also available in: Atom PDF