Bug #12825

grep -x is very slow

Added by Jonathan Perkin 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:

Description

I noticed that some grep processes were taking an inordinate amount of time (15+ minutes) to run during a pkgsrc bulk build. It appears to be related to the use of the -x flag when matching long lines.

As a contrived example:

$ for i in {1..100000}; do
>     echo "/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos/grep/implementation" >>testfile
> done
$ time /usr/xpg4/bin/grep -vx -e '/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos/grep.*' testfile | wc -l
0

real  0m18.282s
user  0m18.211s
sys   0m0.047s

GNU grep in comparison completes almost instantly:

$ time ggrep -vx -e '/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos/grep.*' testfile | wc -l
0

real  0m0.105s
user  0m0.088s
sys   0m0.029s

Changing the pattern by one character so that it no longer matches ("grp" instead of "grep") makes it fast:

$ time /usr/xpg4/bin/grep -vx -e '/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos/grp.*' testfile | wc -l
100000

real  0m0.163s
user  0m0.134s
sys   0m0.094s
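
For anyone who wants to repeat the comparison, the steps above can be wrapped in a small script. This is only a sketch: the GREP, N and TESTFILE variables and the count_unmatched helper are names made up here for convenience, and GREP defaults to whatever "grep" is first in PATH, so set GREP=/usr/xpg4/bin/grep (or ggrep) to test a specific implementation. The file is generated with awk rather than a shell loop, which should produce the same contents faster.

```shell
#!/bin/sh
# Sketch of a repro helper; variable and function names are hypothetical.
GREP=${GREP:-grep}            # e.g. GREP=/usr/xpg4/bin/grep or GREP=ggrep
N=${N:-100000}                # number of identical lines to generate
TESTFILE=${TESTFILE:-testfile}
LINE=/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos/grep/implementation
PREFIX=/this/is/a/reasonably/long/path/used/to/test/the/slowness/of/the/illumos

# Write N copies of LINE, truncating any existing file.
awk -v n="$N" -v line="$LINE" 'BEGIN { for (i = 0; i < n; i++) print line }' > "$TESTFILE"

# Print how many lines are NOT matched in full by the pattern in $1.
count_unmatched() {
    "$GREP" -vx -e "$1" "$TESTFILE" | wc -l | tr -d ' '
}

# Pattern that matches every line (the slow case in this report).
match_left=$(count_unmatched "$PREFIX/grep.*")
# Pattern that matches no line (the fast case in this report).
nomatch_left=$(count_unmatched "$PREFIX/grp.*")

echo "matching pattern:     $match_left lines left"
echo "non-matching pattern: $nomatch_left lines left"
```

Saved as, say, repro.sh, it can be timed per implementation with "time env GREP=/usr/xpg4/bin/grep sh repro.sh" and "time env GREP=ggrep sh repro.sh". Note that this times both patterns together, so the slow case dominates the result.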

For reference, the pkgsrc command line that took over 15 minutes to complete is shown below. It was run against the output of find(1) across /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir, which returned around 90,000 lines:

argv[0]: /usr/xpg4/bin/grep
argv[1]: -vx
argv[2]: -e
argv[3]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/lib/perl5/5.30.0/x86_64-solaris-thread-multi-64/perllocal.pod
argv[4]: -e
argv[5]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/lib/R/doc/html/packages.html
argv[6]: -e
argv[7]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/lib/R/doc/html/search/index.txt
argv[8]: -e
argv[9]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/pkg/.*
argv[10]: -e
argv[11]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/emul/linux/proc.*
argv[12]: -e
argv[13]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/emul/linux32/proc.*
argv[14]: -e
argv[15]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/pkg.refcount.*
argv[16]: -e
argv[17]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/shared/packages/SmartOS/trunk/x86_64/.*
argv[18]: -e
argv[19]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/shared/distfiles/.*
argv[20]: -e
argv[21]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/var/.*
argv[22]: -e
argv[23]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/share/icons/.*/icon-theme.cache
argv[24]: -e
argv[25]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/lib/charset.alias
argv[26]: -e
argv[27]: /home/pbulk/build/graphics/papirus-icon-theme/work/.destdir/opt/local/share/locale/locale.alias

I haven't looked at the code yet; I'm just logging this now so it doesn't get forgotten.

History

#1

Updated by Periklis Akritidis about 2 months ago

Simpler repro:

for i in {1..100000}; do
    echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa >> testfile
done

time grep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile # Slow
time ggrep -c aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile # Fast
time grep -c baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa testfile # Fast
