Bug #3150

closed

poll times out immediately in 32bit kernels

Added by Richard PALO about 11 years ago. Updated almost 11 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: kernel
Start date: 2012-09-05
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:
Gerrit CR:
External Bug:

Description

Just tried update from a5 to a6... Here is an extract from messages:

Sep  5 06:19:03 shuttleX genunix: [ID 544858 kern.notice] ^MOpenIndiana Build oi_151a6 32-bit (illumos 13793:10c3656ccf76)
Sep  5 06:19:03 shuttleX genunix: [ID 107366 kern.notice] SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
Sep  5 06:19:03 shuttleX genunix: [ID 864463 kern.notice] All rights reserved. Use is subject to license terms.
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: lgpg
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: tsc
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: msr
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mtrr
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pge
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: de
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cmov
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mmx
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mca
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pae
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cv8
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pat
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sep
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sse
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cpuid
Sep  5 06:19:03 shuttleX unix: [ID 168242 kern.info] mem = 1965632K (0x77f90000)
Sep  5 06:19:03 shuttleX acpica: [ID 567683 kern.notice] ACPI: RSDP f6e60 00014 (v0 Nvidia)
Sep  5 06:19:03 shuttleX acpica: [ID 409769 kern.notice] ACPI: RSDT 77ff3000 0002C (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep  5 06:19:03 shuttleX acpica: [ID 309273 kern.notice] ACPI: FACP 77ff3040 00074 (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep  5 06:19:03 shuttleX acpica: [ID 478335 kern.notice] ACPI: DSDT 77ff30c0 04139 (v1 NVIDIA AWRDACPI 00001000 MSFT 0100000D)
Sep  5 06:19:03 shuttleX acpica: [ID 516688 kern.notice] ACPI: FACS 77ff0000 00040
Sep  5 06:19:03 shuttleX acpica: [ID 908020 kern.notice] ACPI: APIC 77ff7200 0005A (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
...
Sep  5 06:20:00 shuttleX svc.startd[10]: [ID 652011 daemon.warning] svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Sep  5 06:20:00 shuttleX svc.startd[10]: [ID 748625 daemon.error] system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
...
Sep  5 06:20:36 shuttleX fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
Sep  5 06:20:36 shuttleX EVENT-TIME: Wed Sep  5 06:20:33 CEST 2012
Sep  5 06:20:36 shuttleX PLATFORM: i86pc, CSN: -, HOSTNAME: shuttleX
Sep  5 06:20:36 shuttleX SOURCE: software-diagnosis, REV: 0.1
Sep  5 06:20:36 shuttleX EVENT-ID: 77dc2002-a576-ec22-9fe8-8d0a8932ef37
Sep  5 06:20:36 shuttleX DESC: A service failed - a start, stop or refresh method failed.
Sep  5 06:20:36 shuttleX   Refer to http://illumos.org/msg/SMF-8000-YX for more information.
Sep  5 06:20:36 shuttleX AUTO-RESPONSE: The service has been placed into the maintenance state.
Sep  5 06:20:36 shuttleX IMPACT: svc:/system/hal:default is unavailable.
Sep  5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.

Is it possibly related to this changeset from 8 weeks ago:

Milan Jurik <milan.jurik@xylab.cz> [Sun, 08 Jul 2012 13:53:30 -0500] rev 13801
2950 sync hald-runner with upstream
Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Richard Lowe <richlowe@richlowe.net>


Files

ecpu-a5.txt (96.6 KB): elfedit /usr/lib/hal/hald-addon-cpufreq (a5). Richard PALO, 2012-09-06 05:48 AM
ecpu-a6.txt (96.6 KB): elfedit /usr/lib/hal/hald-addon-cpufreq.sav (a6). Richard PALO, 2012-09-06 05:48 AM
test.c (608 Bytes): test case, broken in 32-bit kernel mode. Milan Jurik, 2012-09-29 06:30 PM

Related issues

Related to illumos gate - Bug #3115: poll(2) returns prematurely in presence of spurious wakeups (Resolved, Christopher Siden, 2012-08-22)

Actions #1

Updated by Radek R about 11 years ago

I have the same issue on my Thinkpad T42. Please help resolve this bug. I tried updating the drivers (update_drv, devfsadm), but without any result.

Actions #2

Updated by Richard PALO about 11 years ago

The workaround that worked for me was to grab the following two files from oi_151a5 (I used beadm mount oi_151a.1.5 which is the name in use for prestable5).

/usr/lib/hal/hald-runner
/usr/lib/hal/hald-addon-cpufreq
(I determined these file names using diff.)

I did this from console mode, so a simple 'pfexec svcadm clear hal' was enough and I was back in action.

Actions #3

Updated by Richard PALO about 11 years ago

After brief research, I believe the developers should check this out, since apparently hald-addon-cpufreq has not been changed in years (according to opengrok) but the binary is different.
Perhaps there is a corner case problem here in code generation, or something of the like.

Actions #4

Updated by Richard PALO about 11 years ago

elfedit output for, e.g., meld comparison.

Actions #5

Updated by Milan Jurik about 11 years ago

  • Status changed from New to Feedback

I would start with the simple step suggested by SMF:

Sep 5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.

Additionally, it would be great to know the name of the processor from prtdiag.

Why do you think hald-addon-cpufreq is involved? Yes, the binary is slightly different; maybe a slightly different optimization or build environment was used.

Actions #6

Updated by Brendan Oakley about 11 years ago

I am also affected by this issue. Output of 'svcs -xv svc:/system/hal:default':

svc:/system/hal:default (Hardware Abstraction Layer daemon)
State: maintenance since September 12, 2012 12:09:59 PM PDT
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://illumos.org/msg/SMF-8000-KS
See: man -M /usr/man -s 1M hald
See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
svc:/system/filesystem/rmvolmgr:default
svc:/application/graphical-login/gdm:default

Relevant portion of /var/svc/log/system-hal:default.log:

[ Sep 12 12:07:46 Enabled. ]
[ Sep 12 12:09:20 Enabled. ]
[ Sep 12 12:09:56 Executing start method ("/lib/svc/method/svc-hal start"). ]
hal failed to start: error 2
[ Sep 12 12:09:59 Method "start" exited with status 95. ]

In my case the name of the processor from prtdiag is:
Intel(R) Pentium(R) 4 processor J2E1

Actions #7

Updated by Dmitry Kozhinov about 11 years ago

I am also affected. ASUS A7N8X-X motherboard, AMD Athlon XP processor. Please fix the issue ASAP.

Actions #8

Updated by Milan Jurik about 11 years ago

Whenever you ask for help, please provide the requested data. Currently I am in contact with Brendan Oakley only.

Actions #9

Updated by Milan Jurik about 11 years ago

For additional info, run as root:

/usr/lib/hal/hald --daemon=no

and then

/usr/lib/hal/hald --daemon=yes

and add the output here.

Actions #10

Updated by Richard PALO about 11 years ago

Didn't work for me either after reboot. Went back to a5 for that reason.

After following the suggestion, no difference really. That is, no output in particular.
I did take another look at the svcs -xv output:

svc:/system/hal:default (Hardware Abstraction Layer daemon)
 State: maintenance since September 14, 2012 13:12:19 CEST
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/man -s 1M hald
   See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
        svc:/system/filesystem/rmvolmgr:default
        svc:/application/graphical-login/gdm:default

It seems that a6 is toxic for a number of 32-bit systems.

Milan Jurik wrote:

For additional info, run as root:

/usr/lib/hal/hald --daemon=no

and then

/usr/lib/hal/hald --daemon=yes

and add the output here.

Actions #11

Updated by Milan Jurik about 11 years ago

  • Status changed from Feedback to In Progress

Yes, 32-bit env seems to be toxic, I can reproduce it on my system with forced 32-bit environment. I am investigating it.

Actions #12

Updated by Milan Jurik almost 11 years ago

select() is broken in 32-bit kernel mode.

Actions #13

Updated by Rich Lowe almost 11 years ago

  • Project changed from OpenIndiana Distribution to illumos gate
Actions #14

Updated by Milan Jurik almost 11 years ago

In the test case (test.c), select() is called with a timeout of 25 seconds. But on a 32-bit kernel on the latest OI it returns immediately with 0. That is not correct behavior.

hald, in parent_wait_for_child(), waits for its child for up to 250 seconds. But on a 32-bit kernel it does not wait, and kills its own child, thinking the timeout has passed.
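
For readers without access to the attachment, a check of this kind might look roughly like the sketch below. It is not the attached test.c: the helper name timed_select() and the choice to pass no file descriptors at all are assumptions made for illustration. It simply measures how long select() blocks when nothing can become ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/*
 * Hypothetical helper (not taken from test.c): call select() with no
 * descriptors and the given timeout, and report how long the call
 * actually blocked. Returns select()'s return value; 0 means timeout.
 */
static int
timed_select(long timeout_sec, double *elapsed_sec)
{
	struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
	struct timespec t0, t1;
	int rc;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	/* No fd sets: select() can only return via timeout or a signal. */
	rc = select(0, NULL, NULL, NULL, &tv);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*elapsed_sec = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return (rc);
}
```

On a healthy system, timed_select(25, &elapsed) should return 0 with elapsed close to 25. The behavior reported here would show up as a return value of 0 with elapsed near zero.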

Actions #15

Updated by Rich Lowe almost 11 years ago

  • Subject changed from update to prestable6: svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95 to poll times out immediately in 32bit kernels
  • Priority changed from Normal to Urgent
Actions #16

Updated by Rich Lowe almost 11 years ago

  • Category set to kernel
  • Assignee set to Rich Lowe
Actions #17

Updated by Rich Lowe almost 11 years ago

The comparison to nsec_per_tick is getting botched, and we're always sleeping for 1 tick. Presumably we're being treated as signed by MAX or something like that.
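
As an illustration only, here is one way a "wait at least one tick" clamp can collapse to exactly one tick when the timeout is computed in 32-bit arithmetic. This is not the illumos code and not necessarily the defect fixed for this bug: the function names, the local MAX macro, and the 10 ms tick length used in the usage note are all assumptions. The 32-bit wraparound is emulated with unsigned arithmetic so that the sketch behaves the same on any host.

```c
#include <stdint.h>

#define	NANOSEC	1000000000LL
#define	MAX(a, b)	((a) > (b) ? (a) : (b))

/* Hypothetical: timeout computed in 64 bits, then clamped to one tick. */
static int64_t
deadline_wide(int32_t sec, int32_t nsec, int64_t nsec_per_tick)
{
	int64_t d = (int64_t)sec * NANOSEC + nsec;

	return (MAX(d, nsec_per_tick));
}

/*
 * Hypothetical: the same computation carried out in 32 bits, as could
 * happen where long is 32 bits wide. Unsigned arithmetic wraps modulo
 * 2^32 (defined behavior); the result is then reinterpreted as a signed
 * 32-bit value, which is what a two's complement machine would see.
 */
static int64_t
deadline_narrow(int32_t sec, int32_t nsec, int64_t nsec_per_tick)
{
	uint32_t wrapped = (uint32_t)sec * 1000000000u + (uint32_t)nsec;
	int64_t d = (wrapped <= (uint32_t)INT32_MAX) ?
	    (int64_t)wrapped : (int64_t)wrapped - 4294967296LL;

	return (MAX(d, nsec_per_tick));
}
```

With a 25 second timeout and an assumed tick of 10,000,000 ns, deadline_wide() keeps the full 25,000,000,000 ns, while deadline_narrow() wraps to a negative value and the clamp then substitutes a single tick, which matches the symptom of always sleeping for one tick.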

Actions #18

Updated by Rich Lowe almost 11 years ago

  • % Done changed from 20 to 80
  • Tags deleted (needs-triage)
Actions #19

Updated by Rich Lowe almost 11 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Resolved in r13846 commit:e00baf913d93

Actions #20

Updated by Richard PALO almost 11 years ago

Confirmed, successfully upgraded to a7 and, given the need for nvidia legacy, downgraded to
NVIDIA-Solaris-x86-96.43.21 as well.
Thanks

Actions #21

Updated by Dmitry Kozhinov almost 11 years ago

Rich Lowe wrote:

Resolved in r13846 commit:e00baf913d93

Thank you! I am a happy user again.
