Project

General

Profile

Bug #5224

snprintf rounding under [default] FE_TONEAREST

Added by Richard PALO about 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
lib - userland libraries
Start date:
2014-10-29
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Difficulty:
Hard
Tags:
needs-triage

Description

In the attached test program, the double value 1.0/3.0 is output with the default (FE_TONEAREST)
as well as the other modes.
It appears that there are anomalies, at least involving 'f' precision 17 and 18
(for the first test series FE_TONEAREST), which seemingly should output
"f precision 17 => +0.3333333333333333*1*" and
"f precision 18 => +0.33333333333333331*5*" respectfully.

(For the 'e' precision, it appears to affect precision 16 and 17.)

FLT_ROUNDS = 1, fegetround()=0
FLT_EVAL_METHOD = 2
DBL_MANT_DIG = 53
DECIMAL_DIG = 21
DBL_DIG = 15
fesetround(FE_TONEAREST)
e precision 15 => +3.333333333333333e-01
f precision 15 => +0.333333333333333
e precision 16 => +3.3333333333333332e-01
f precision 16 => +0.3333333333333333
e precision 17 => +3.33333333333333312e-01
f precision 17 => +0.33333333333333332
e precision 18 => +3.333333333333333148e-01
f precision 18 => +0.333333333333333312
e precision 19 => +3.3333333333333331483e-01
f precision 19 => +0.3333333333333333148
e precision 20 => +3.33333333333333314830e-01
f precision 20 => +0.33333333333333331483
e precision 21 => +3.333333333333333148296e-01
f precision 21 => +0.333333333333333314830
e precision 22 => +3.3333333333333331482962e-01
f precision 22 => +0.3333333333333333148296
fesetround(FE_DOWNWARD)
e precision 15 => +3.333333333333333e-01
f precision 15 => +0.333333333333333
e precision 16 => +3.3333333333333331e-01
f precision 16 => +0.3333333333333333
e precision 17 => +3.33333333333333314e-01
f precision 17 => +0.33333333333333331
e precision 18 => +3.333333333333333148e-01
f precision 18 => +0.333333333333333314
e precision 19 => +3.3333333333333331482e-01
f precision 19 => +0.3333333333333333148
e precision 20 => +3.33333333333333314829e-01
f precision 20 => +0.33333333333333331482
e precision 21 => +3.333333333333333148296e-01
f precision 21 => +0.333333333333333314829
e precision 22 => +3.3333333333333331482961e-01
f precision 22 => +0.3333333333333333148296
fesetround(FE_UPWARD)
e precision 15 => +3.333333333333334e-01
f precision 15 => +0.333333333333334
e precision 16 => +3.3333333333333332e-01
f precision 16 => +0.3333333333333334
e precision 17 => +3.33333333333333315e-01
f precision 17 => +0.33333333333333332
e precision 18 => +3.333333333333333149e-01
f precision 18 => +0.333333333333333315
e precision 19 => +3.3333333333333331483e-01
f precision 19 => +0.3333333333333333149
e precision 20 => +3.33333333333333314830e-01
f precision 20 => +0.33333333333333331483
e precision 21 => +3.333333333333333148297e-01
f precision 21 => +0.333333333333333314830
e precision 22 => +3.3333333333333331482962e-01
f precision 22 => +0.3333333333333333148297
fesetround(FE_TOWARDZERO)
e precision 15 => +3.333333333333333e-01
f precision 15 => +0.333333333333333
e precision 16 => +3.3333333333333331e-01
f precision 16 => +0.3333333333333333
e precision 17 => +3.33333333333333314e-01
f precision 17 => +0.33333333333333331
e precision 18 => +3.333333333333333148e-01
f precision 18 => +0.333333333333333314
e precision 19 => +3.3333333333333331482e-01
f precision 19 => +0.3333333333333333148
e precision 20 => +3.33333333333333314829e-01
f precision 20 => +0.33333333333333331482
e precision 21 => +3.333333333333333148296e-01
f precision 21 => +0.333333333333333314829
e precision 22 => +3.3333333333333331482961e-01
f precision 22 => +0.3333333333333333148296

According to ISO C 7.19.6.2 n°13

For e, E, f, F, g, and G conversions, if the number of significant decimal digits is at most
DECIMAL_DIG, then the result should be correctly rounded. 249) If the number of
significant decimal digits is more than DECIMAL_DIG but the source value is exactly
representable with DECIMAL_DIG digits, then the result should be an exact
representation with trailing zeros. Otherwise, the source value is bounded by two
adjacent decimal strings L < U, both having DECIMAL_DIG significant digits; the value
of the resultant decimal string D should satisfy L ≤ D ≤ U, with the extra stipulation that
the error should have a correct sign for the current rounding direction.

249) For binary-to-decimal conversion, the result format’s values are the numbers representable
with the given format specifier. The number of significant digits is determined by the format
specifier, and in the case of fixed-point conversion by the source value as well.

Interestingly, according to a source, this may be fixed in Or*cle Solaris 11.x...
It would be nice to check.


Files

cflt2.c (1.32 KB) cflt2.c Richard PALO, 2014-10-10 04:13 PM
cflt3.truss.oi (7.84 KB) cflt3.truss.oi snprintf trace extract working on oi_151a9 Richard PALO, 2014-10-19 09:07 AM
cflt3.truss (4.35 KB) cflt3.truss snprintf trace failing on omnios Richard PALO, 2014-10-19 09:07 AM
tmul.c (1.06 KB) tmul.c Richard PALO, 2014-10-19 01:15 PM
cflt4.c (1.12 KB) cflt4.c Richard PALO, 2014-10-22 06:33 AM

Subtasks

Feature #5276: Add Richard PALO's fpround test.ClosedDan McDonald

Actions

History

#1

Updated by Richard PALO about 5 years ago

BTW, this seems to work correctly on oi_151a9

The problem was perceived on Omnios (r151012).

#2

Updated by Richard PALO about 5 years ago

I believe I've isolated the anomaly to '__mul_set' found in the file (lib/libc/i386/fp/_base_il.c)
that is, using gcc -m32...

I'm able to simply reproduce the issue now with the following test program
(where my_mul_set is like above, and my_lmul_set tests using long double):

#include <stdio.h>
#include <float.h>

double
my_mul_set(double x, double y) {
    double z;
    z = x * y;
    return z;
}

long double
my_lmul_set(double x, double y) {
    long double z;
    z = x * y;
    return z;
}
void main()
{
    double onethird = 1.0/3.0;
    const double p17 = 1.0e17;

    double result;
    long double lresult;

    result = my_mul_set(onethird, p17);
    printf("initial = %.*f => result: %f\\n", DECIMAL_DIG, onethird, result);

    lresult = my_lmul_set(onethird, p17);
    printf("initial = %.*f => result: %Lf\\n", DECIMAL_DIG, onethird, lresult);
}

giving:

$ ./a.out
initial = 0.333333333333333314830 => result: 33333333333333332.000000
initial = 0.333333333333333314830 => result: 33333333333333331.482422

As you can imagine, this is insufficient for -m64 where the two results are identical.

#3

Updated by Richard PALO about 5 years ago

What could be the difference in this codepath that allows it to run correctly on oi_151a9 and not on omnios-r151010 or omnios-r151012... Is it possible that it has something to do with the fp setup code?

It would sure be useful to come across a 32-bit libc.so with debug... (or for that matter an omnios system that does work correctly, just in case it is related to hardware)..

#4

Updated by Richard PALO about 5 years ago

It does seem to be related to the fp registers (putsw/getsw, as indicated the differing code paths between the oi_151a9 and omnios traces attached. (indicated by the 'Result may not be exact' branch)

double
__mul_set(double x, double y, int *pe) {
    extern void _putsw(), _getsw();
    int sw;
    double z;

    _putsw(0);
    z = x * y;
    _getsw(&sw);
    if ((sw & 0x3f) == 0) {
        *pe = 0;
    } else {
        /* Result may not be exact. */
        *pe = 1;
    }
    return (z);
}

I'll need to see if I can reproduce this in my test program now.

#5

Updated by Richard PALO about 5 years ago

In the attached updated test program, I call both my local version of __mul_set (called my_mul_set) as well as the libc version...(SOURCEDEBUG build objects _base_il.o,cerror.o,cerror64.o and fpcw.o are copied from lib/libc/i386/pics)

The result is:

ATTENTION: My result may be inexact!
my_mul_set initial = 0.333333333333333314830 => result: 33333333333333332.000000
---------------------------------------
__mul_set initial = 0.333333333333333314830 => result: 33333333333333332.000000

The correct result is to display the ATTENTION: statement.

I can verify that libc's __fpstart is invoked, and the following values are set:

(gdb) p/x _sse_hw
$1 = 0x1
(gdb) p/x _fp_hw
$2 = 0x3
(gdb) p/x __flt_rounds
$3 = 0x1

So it's more than likely that there's more involved to libc debugging than this feable attempt.

The experiment does show that the _getsw should indeed return the indication of "loss of precision".

#6

Updated by Richard PALO about 5 years ago

I just happened to notice a particular difference between cflt3.truss (omnios) and cflt3.truss.oi
in that _ndoprint() invokes localeconv() with:

18075/1@1:              -> libc:uselocale()
18075/1@1:                -> libc:tsdalloc()
18075/1@1:                  -> libc:thr_keycreate_once()
18075/1@1:                    -> libc:lmutex_lock()
18075/1@1:                    <- libc:lmutex_lock() = 0xfef6b380
18075/1@1:                    -> libc:thr_keycreate()
18075/1@1:                      -> libc:lmutex_lock()
18075/1@1:                      <- libc:lmutex_lock() = 0xfef6b380
18075/1@1:                      -> libc:lmutex_unlock()
18075/1@1:                      <- libc:lmutex_unlock() = 1
18075/1@1:                    <- libc:thr_keycreate() = 0
18075/1@1:                    -> libc:membar_producer()
18075/1@1:                    <- libc:membar_producer() = 0
18075/1@1:                    -> libc:lmutex_unlock()
18075/1@1:                    <- libc:lmutex_unlock() = 0
18075/1@1:                    -> libc:membar_consumer()
18075/1@1:                    <- libc:membar_consumer() = 0
18075/1@1:                  <- libc:thr_keycreate_once() = 0
18075/1@1:                  -> libc:pthread_getspecific()
18075/1@1:                  <- libc:pthread_getspecific() = 0
18075/1@1:                  -> libc:lmalloc()
18075/1@1:                    -> libc:getbucketnum()
18075/1@1:                    <- libc:getbucketnum() = 3
18075/1@1:                    -> libc:lmutex_lock()
18075/1@1:                    <- libc:lmutex_lock() = 0xfef6b380
18075/1@1:                    -> libc:lmutex_unlock()
18075/1@1:                    <- libc:lmutex_unlock() = 0
18075/1@1:                  <- libc:lmalloc() = 0xfee01a00
18075/1@1:                  -> libc:thr_setspecific()
18075/1@1:                  <- libc:thr_setspecific() = 0
18075/1@1:                  -> libc:lmalloc()
18075/1@1:                    -> libc:getbucketnum()
18075/1@1:                    <- libc:getbucketnum() = 0
18075/1@1:                    -> libc:lmutex_lock()
18075/1@1:                    <- libc:lmutex_lock() = 0xfef6b380
18075/1@1:                    -> libc:lmutex_unlock()
18075/1@1:                    <- libc:lmutex_unlock() = 0
18075/1@1:                  <- libc:lmalloc() = 0xfee00180
18075/1@1:                <- libc:tsdalloc() = 0xfee00180
18075/1@1:              <- libc:uselocale() = 0xfef6a240

in oi it's just:

26546/1@1:              -> libc:__get_current_monetary_locale()
26546/1@1:              <- libc:__get_current_monetary_locale() = 0xfef686e8
26546/1@1:              -> libc:__get_current_numeric_locale()
26546/1@1:              <- libc:__get_current_numeric_locale() = 0xfef686d8

is it possible that these threading bits are modifying the floating point environment expected in future calls?

As can already be seen, it does seem to affect the '-a' option of truss.

#7

Updated by Richard PALO about 5 years ago

I find suspect where on the oi machine the following:

libc:pthread_setcancelstate(0x0, 0x0, 0x64, 0xfedf0e46)

omnios has the following:

libc:pthread_setcancelstate(0x0, 0x0, 0xfef800b8, 0xfef800b8)

I guess I always disliked fishing.

#8

Updated by Richard PALO about 5 years ago

I made a new test program, which runs correctly on both OpenBSD and NetBSD, and on oi_151a9.
NB: I put in a limit to DBL_DECIMAL_DIG digits, when defined (which is 17 instead of 21).

#9

Updated by Dan McDonald about 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Dan McDonald
  • Priority changed from Normal to High
  • % Done changed from 0 to 70
  • Difficulty changed from Medium to Hard

What happened was that two libc-internal functions: __mul_set() and __div_set() were victims of hyper-aggressive compiler optimizations.

Both of these perform a multiple/divide, but also check the x86/amd64 floating-point exception status afterwards, and set a passed-in boolean accordingly. The code is structured as follows:

          clear_register();  /* i386/amd64-specific call... */
          z = x * y;
          query_register(); /* i386/amd64-specific call... */

What the compiler ends up issuing, however, can (and in this case IS) actually:

          clear_register();  /* i386/amd64-specific call... */
          query_register(); /* i386/amd64-specific call... */
          z = x * y;   /* Legal, but broken, because of query_register()... */

We need a way to preserve the ordering shown in the first block. Suggestions include: making 'z' volatile, which can prevent the compiler from shuffling it; and writing part or all of the functions in assembly, which eliminates compiler meddling altogether.

#10

Updated by Electric Monk about 5 years ago

  • Status changed from In Progress to Closed

git commit 5a172a1e0f9068ec8db35fa123baecf7bd540eb5

commit  5a172a1e0f9068ec8db35fa123baecf7bd540eb5
Author: Dan McDonald <danmcd@omniti.com>
Date:   2014-10-30T17:35:53.000Z

    5224 snprintf rounding under [default] FE_TONEAREST
    Reviewed by: Keith Wesolowski <keith.wesolowski@joyent.com>
    Approved by: Robert Mustacchi <rm@joyent.com>

Also available in: Atom PDF