Bug #628

closed

minor perf enhancement for UTF-8

Added by Garrett D'Amore over 11 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Category: lib - userland libraries
Start date: 2011-01-13
Due date:
% Done: 100%
Estimated time:
Difficulty:
Tags:
Gerrit CR:

Description

During code review, I noticed that the UTF-8 code performs some duplicate tests: the check for a plain ASCII character is made once in a fast path and then again when determining the sequence length. This can be costly during conversions between multibyte and wide characters. As this is a very hot code path, we should take any reasonable steps to make it faster.

The diff looks like this:

diff -r a4992c1f6d25 usr/src/lib/libc/port/locale/utf8.c
--- a/usr/src/lib/libc/port/locale/utf8.c    Thu Jan 13 09:05:18 2011 -0800
+++ b/usr/src/lib/libc/port/locale/utf8.c    Thu Jan 13 09:10:36 2011 -0800
@@ -1,5 +1,5 @@
 /*
- * Copyright 2010 Nexenta Systems, Inc.  All rights reserved.
+ * Copyright 2011 Nexenta Systems, Inc.  All rights reserved.
  * Copyright (c) 2002-2004 Tim J. Robbins
  * All rights reserved.
  *
@@ -110,13 +110,6 @@
         /* Incomplete multibyte sequence */
         return ((size_t)-2);

-    if (us->want == 0 && ((ch = (unsigned char)*s) & ~0x7f) == 0) {
-        /* Fast path for plain ASCII characters. */
-        if (pwc != NULL)
-            *pwc = ch;
-        return (ch != '\0' ? 1 : 0);
-    }
-
     if (us->want == 0) {
         /*
          * Determine the number of octets that make up this character
@@ -132,10 +125,12 @@
          */
         ch = (unsigned char)*s;
         if ((ch & 0x80) == 0) {
-            mask = 0x7f;
-            want = 1;
-            lbound = 0;
-        } else if ((ch & 0xe0) == 0xc0) {
+            /* Fast path for plain ASCII characters. */
+            if (pwc != NULL)
+                *pwc = ch;
+            return (ch != '\0' ? 1 : 0);
+        }
+        if ((ch & 0xe0) == 0xc0) {
             mask = 0x1f;
             want = 2;
             lbound = 0x80;
@@ -312,12 +307,6 @@
         /* Reset to initial shift state (no-op) */
         return (1);

-    if ((wc & ~0x7f) == 0) {
-        /* Fast path for plain ASCII characters. */
-        *s = (char)wc;
-        return (1);
-    }
-
     /*
      * Determine the number of octets needed to represent this character.
      * We always output the shortest sequence possible. Also specify the
@@ -325,8 +314,9 @@
      * about the sequence length.
      */
     if ((wc & ~0x7f) == 0) {
-        lead = 0;
-        len = 1;
+        /* Fast path for plain ASCII characters. */
+        *s = (char)wc;
+        return (1);
     } else if ((wc & ~0x7ff) == 0) {
         lead = 0xc0;
         len = 2;
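
To illustrate the shape of the change on the encoding side, here is a minimal, self-contained sketch. It is not the libc code: `sketch_utf8_encode` is a hypothetical name, it uses `uint32_t` instead of `wchar_t`, and it leaves out the conversion state and surrogate checks. The point is only that the ASCII test appears once, as the first branch of the length-determination chain, and returns immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified encoder (not the libc implementation).
 * Writes the UTF-8 form of wc into s and returns the number of bytes
 * written, or 0 if wc is above U+10FFFF. The caller must provide at
 * least 4 bytes. Surrogates are not rejected, to keep the sketch short.
 */
static size_t
sketch_utf8_encode(char *s, uint32_t wc)
{
	unsigned char lead;
	size_t i, len;

	assert(s != NULL);

	if ((wc & ~(uint32_t)0x7f) == 0) {
		/* Plain ASCII: tested exactly once, then done. */
		*s = (char)wc;
		return (1);
	} else if ((wc & ~(uint32_t)0x7ff) == 0) {
		lead = 0xc0;
		len = 2;
	} else if ((wc & ~(uint32_t)0xffff) == 0) {
		lead = 0xe0;
		len = 3;
	} else if (wc <= 0x10ffff) {
		lead = 0xf0;
		len = 4;
	} else {
		return (0);
	}

	/* Fill continuation bytes from the end, six bits at a time. */
	for (i = len - 1; i > 0; i--) {
		s[i] = (char)((wc & 0x3f) | 0x80);
		wc >>= 6;
	}
	s[0] = (char)(wc | lead);
	return (len);
}
```

With a separate fast path in front of this chain, every non-ASCII character would pay for the `0x7f` mask test twice. Folding the fast path into the first branch keeps the early return for ASCII and removes the second test for everything else.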
#1

Updated by Garrett D'Amore over 11 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Fixed in:

garrett@thinkpad{6}> hg outgoing
comparing with ssh:/illumos-gate
searching for changes
changeset: 13265:ff6d445369ca
tag: tip
user: Garrett D'Amore <>
date: Thu Jan 13 21:05:28 2011 -0800
description:
615 remove support legacy 7-bit ASCII
628 minor perf enhancement for UTF-8
Reviewed by:
Approved by: