Bug #516
Closed
libc regex misses \< and \> word delimiters
100%
Description
When testing the new "sed" program, I discovered something rather surprising.
Expressions containing \< and \> don't work. Legacy sed supports them via a different regex library in libgen. The FreeBSD regex code doesn't support them at all.
In fact, POSIX doesn't say anything about these or the concept of "words" at all.
FreeBSD uses an interesting and non-portable solution of [[:<:]] and [[:>:]], which we seem to have "inherited".
Yet legacy Solaris needs \< and \>. Apparently we believe this extension is not in conflict with POSIX. Indeed, GNU sed uses this syntax as well.
We need to support \< and \>.
Note that fixing this is a prerequisite for FreeBSD sed.
Updated by Garrett D'Amore over 12 years ago
# HG changeset patch
# User Garrett D'Amore <garrett@nexenta.com>
# Date 1292303192 28800
# Node ID a4778a844d7c71ef098613b26bfe009fb5aea9e2
# Parent bfcab07df843172eb6cf39fc9413535338b70574
516 libc regex misses \< and \> word delimiters

diff -r bfcab07df843 -r a4778a844d7c usr/src/lib/libc/port/locale/regcomp.c
--- a/usr/src/lib/libc/port/locale/regcomp.c	Mon Dec 13 15:33:46 2010 -0800
+++ b/usr/src/lib/libc/port/locale/regcomp.c	Mon Dec 13 21:06:32 2010 -0800
@@ -392,7 +392,17 @@
 	case '\\':
 		(void) REQUIRE(MORE(), REG_EESCAPE);
 		wc = WGETNEXT();
-		ordinary(p, wc);
+		switch (wc) {
+		case '<':
+			EMIT(OBOW, 0);
+			break;
+		case '>':
+			EMIT(OEOW, 0);
+			break;
+		default:
+			ordinary(p, wc);
+			break;
+		}
 		break;
 	case '{':		/* okay as ordinary except if digit follows */
 		(void) REQUIRE(!MORE() || !isdigit((uch)PEEK()), REG_BADRPT);
@@ -545,6 +555,12 @@
 	case '[':
 		p_bracket(p);
 		break;
+	case BACKSL|'<':
+		EMIT(OBOW, 0);
+		break;
+	case BACKSL|'>':
+		EMIT(OEOW, 0);
+		break;
 	case BACKSL|'{':
 		SETERROR(REG_BADRPT);
 		break;
Updated by Garrett D'Amore over 12 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
This was integrated last night with
changeset: 13257:d3807abc6720
user: Garrett D'Amore <garrett@nexenta.com>
date: Wed Dec 15 17:12:35 2010 -0800
description:
508 od doesn't set text domain
509 need an open source sed(1) replacement
516 libc regex misses \< and \> word delimiters
Reviewed by: gwr@nexenta.com
Reviewed by: roland.mainz@nexenta.com
Approved by: gwr@nexenta.com