Project

General

Profile

Bug #166

CR6901979 error in xdr_float.c not fixed

Added by Udo Grabowski about 10 years ago. Updated over 9 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
lib - userland libraries
Start date:
2010-09-07
Due date:
% Done:

100%

Estimated time:
0.10 h
Difficulty:
Tags:
Gerrit CR:

Description

/usr/src/lib/libnsl/rpc/xdr_float.c lines 122/123 and 197/198 refer to old preprocessor symbols
amd64 etc., which do no longer exist. An oversight in CR 6493074 caused CR 6901979,
which is still not fixed in Opensolaris because the responsible engineer Gerald Thornbrugh
(Oracle) has no time. This has been fixed in Solaris 10 and is available as IDR35 for the
Opensolaris 2009.06 SU7 support repository.
This bug breaks important commercial software used in science, e.g., IDL (from Ittvis) .

Both lines should read (double underscores):

#if defined(_mc68000) || defined(_sparc) || defined(_u3b2) || \
defined(
_u3b15) || defined(_i386) || defined(_amd64)


Files

xdrtest.c (645 Bytes) xdrtest.c Jason King, 2011-01-19 03:58 PM
test.c (4.65 KB) test.c Jason King, 2011-01-26 04:41 PM
test.c (5.25 KB) test.c Jason King, 2011-01-26 06:56 PM
test.c (1.68 KB) test.c Jason King, 2011-01-30 01:46 PM
#1

Updated by Garrett D'Amore about 10 years ago

  • Project changed from site to illumos gate
#2

Updated by Udo Grabowski about 10 years ago

There are more lines to fix here:
Lines 71,112,122,123,197,198,201,289,329,229,349,424,434,517.
Question: are __mc68000, __u3b2, __u3b15 and __vax are still supported
targets ? If not, these definitions can be just eliminated.

#3

Updated by Udo Grabowski about 10 years ago

Found Fasttrack PSARC/2010/211 <http://www.mail-archive.com/opensolaris-arc@opensolaris.org/msg00968.html>
which has been approved, so __mc68000, __vax, __u3b2 and __u3b15 are
no longer supported.

#4

Updated by Gordon Ross about 10 years ago

There might be better symbols available to determine which is the native float format, etc.
Have a look at sys/isa_defs.h and _IEEE_754, for example.

Many cases in xdr_float.c should be able to say basically:

#ifdef _IEEE_754
    /*
     * OtW format is IEEE so just copy.  Includes CPUs:
     * amd64, i386, sparc, sparcv9, m68k, u3b2, etc.
     */
    return (XDR_PUTINT32(XDRS, (int *)fp));  /* or whatever... */
#else
    /* vax and whatever - do we care anymore? */
#endif

#5

Updated by Jason King almost 10 years ago

This seems to be working in current versions of illumos, can someone confirm?
Did perhaps the SS12 update redefine one of the symbols?

I've attached a test case based on the os.o bug description..

#6

Updated by Jason King almost 10 years ago

Regardless, the function could be cleaned up.

http://cr.illumos.org/view/krrvaz21/ has a proposed cleanup.

#7

Updated by Jason King almost 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Jason King

Updated webrev http://cr.illumos.org/view/ltia0hr8/ removed the vax bits

#8

Updated by Jason King almost 10 years ago

Further investigation and testing shows that
  1. The IEEE 754 fp representation is the same on all platforms (regardless of the byteorder of ints)
  2. The XDR OTW format is IEEE 754 for 32-bit (float) and 64-bit (double) floating point values
  3. The XDR_{GET,PUT}INT32 routines do conversion (if needed) to BE format.

Thus the suggested workaround from the bugs.os.o ticket is incorrect. It seems likely this code has been broken on x86/amd64 platforms for as long as Solaris has supported them (verified as far back as Solaris 9).

An updated webrev of the correct fix is at http://cr.illumos.org/view/hool5vqu/

I've also attached a more thorough test case. This passes (with the fix) on x86, amd64, sparcv8, and sparcv9, but fails on x86 & amd64.

#9

Updated by Jason King almost 10 years ago

Slightly updated test case to include Inf, -Inf, Nan, and -NaN

#10

Updated by Garrett D'Amore almost 10 years ago

  • Category set to lib - userland libraries
  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Fixed in:

changeset: 13275:229035fe864a
tag: tip
user: Jason King <>
date: Wed Jan 26 19:18:35 2011 -0800
description:
166 CR6901979 error in xdr_float.c not fixed
Reviewed by:
Reviewed by:
Approved by:

#11

Updated by Udo Grabowski almost 10 years ago

It seems that this issue may not be fixed. I tested the test.c program
on 134b (unfixed), and it fails as expected, and then tested on
Osol 2009-SU7 (amd64) with the IDR35 fix delivered by Sun that gives
the correct representation (that is, IDL works correctly). But the
test.c program still completely fails on the fixed platform. This
means that the test program must be wrong, and so the fix is probably
also wrong.

To my understanding the XDR format is always big_endian, and the
xdr_float.c should convert this to little_endian on import from a
XDR file (or stream/socket), or to big_endian when exporting to a
XDR file on little_endian platforms only.

Since IDL treats XDR the same on all platforms, a working IDL is a hint for
a correct implementation, which means that IDR35 delivered a correct fix,
with which the test program should work on OSOL 2009.06 + IDR35.
So the fix should follow what the original code did with the correct compiler
definitions (__amd64 instead of amd64 etc.)

#12

Updated by Jason King almost 10 years ago

Please provide the output of the test program. It should generate identical XDR byte streams on x86, amd64, sparc, and sparcv9 (I verified this before putback).

I strongly suspect the IDR is wrong. Integer values are encoded big-endian, thus need to be byteswapped when read in on x86 platforms. However floating point values are stored using the IEEE 754 representation (this is straight from RFC 1831). IEEE 754 which uses the same byte order on all platforms. There is no conversion required. The suggested fix in the Sun ticket is plain wrong (and probably has been wrong for as long as Solaris has supported x86 -- it's been verified as far back as Solaris 9, but couldn't find any older machines). The suggested fix will byteswap floating point values (which is incorrect).

#13

Updated by Jason King almost 10 years ago

You might also want to contact the vendor of the software --- I have a feeling it's been compensating for this error on x86 for a while (it would be good to confirm that).
The loss of the #ifdef made the code fall through to the platform-independent code which should convert close to the actual value, so was less ideal but mostly correct.

It is essential that the actual raw hex values be examined of the resulting XDR stream (as well as the original values) to understand the transformation.

#14

Updated by Udo Grabowski almost 10 years ago

Of course, it swaps:
Double test
Test value: 0x60217895 0x775db1e2 = 9.574971e+266
Encode test: FAILED
Expected: 0xfeedface 0x60217895 0x775db1e2 0xfeedface
Actual: 0xfeedface 0xe2b15d77 0x95782160 0xfeedface
Decode test: FAILED
Expected: 0x60217895 0x775db1e2 = 9.574971e+266
Actual: 0xe2b15d77 0x95782160 = -3.006403e-205

Interestingly, the test.c works on sparc 5.10 (which is the fixed standard Solaris XDR code !)

I don't know where you got the information that IEEE754 representation MUST be BE on
all platforms, I think this assumption is simply wrong (and would be totally inefficient for
x86/x64), Wikipedia says this:

From en.wikipedia.org/wiki/Endianness
Floating-point and endianness

On some machines, while integers are represented in little-endian form, floating point numbers are represented in big-endian form.[3] Because there are many floating point formats, and because there is an absence of a "network" standard representation for them, no endianness standard for transferring floating point values between heterogeneous systems has become universally accepted. This means that floating point data written on one machine may not be readable on another, and this is the case even if both use IEEE 754 floating point arithmetic since the endianness of the memory representation is not part of the IEEE specification.[4] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^

#15

Updated by Jason King almost 10 years ago

  • Status changed from Resolved to In Progress

After more research (and sleep), the original fix is wrong (Dr. Grabowski is correct).

A correct fix (which is closer to the original code, but with all the cruft removed) is available for review at

http://cr.illumos.org/view/ka87apne/

#16

Updated by Jason King almost 10 years ago

This is a much better test case.

The bytestream was generated on a S10 sparc box, and validated that both S10 sparc and x86 generate the same bytestream, and can read it back.

#17

Updated by Gordon Ross over 9 years ago

  • Status changed from In Progress to Resolved

changeset: 13327:b9e92167922a

Also available in: Atom PDF