Bug #3150

poll times out immediately in 32bit kernels

Added by Richard PALO almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: kernel
Start date: 2012-09-05
Due date:
% Done: 100%
Estimated time:
Difficulty: Medium
Tags:

Description

I just tried to update from a5 to a6. Here is an extract from messages:

Sep  5 06:19:03 shuttleX genunix: [ID 544858 kern.notice] ^MOpenIndiana Build oi_151a6 32-bit (illumos 13793:10c3656ccf76)
Sep  5 06:19:03 shuttleX genunix: [ID 107366 kern.notice] SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
Sep  5 06:19:03 shuttleX genunix: [ID 864463 kern.notice] All rights reserved. Use is subject to license terms.
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: lgpg
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: tsc
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: msr
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mtrr
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pge
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: de
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cmov
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mmx
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mca
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pae
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cv8
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pat
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sep
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sse
Sep  5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cpuid
Sep  5 06:19:03 shuttleX unix: [ID 168242 kern.info] mem = 1965632K (0x77f90000)
Sep  5 06:19:03 shuttleX acpica: [ID 567683 kern.notice] ACPI: RSDP f6e60 00014 (v0 Nvidia)
Sep  5 06:19:03 shuttleX acpica: [ID 409769 kern.notice] ACPI: RSDT 77ff3000 0002C (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep  5 06:19:03 shuttleX acpica: [ID 309273 kern.notice] ACPI: FACP 77ff3040 00074 (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep  5 06:19:03 shuttleX acpica: [ID 478335 kern.notice] ACPI: DSDT 77ff30c0 04139 (v1 NVIDIA AWRDACPI 00001000 MSFT 0100000D)
Sep  5 06:19:03 shuttleX acpica: [ID 516688 kern.notice] ACPI: FACS 77ff0000 00040
Sep  5 06:19:03 shuttleX acpica: [ID 908020 kern.notice] ACPI: APIC 77ff7200 0005A (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
...
Sep  5 06:20:00 shuttleX svc.startd[10]: [ID 652011 daemon.warning] svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Sep  5 06:20:00 shuttleX svc.startd[10]: [ID 748625 daemon.error] system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
...
Sep  5 06:20:36 shuttleX fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
Sep  5 06:20:36 shuttleX EVENT-TIME: Wed Sep  5 06:20:33 CEST 2012
Sep  5 06:20:36 shuttleX PLATFORM: i86pc, CSN: -, HOSTNAME: shuttleX
Sep  5 06:20:36 shuttleX SOURCE: software-diagnosis, REV: 0.1
Sep  5 06:20:36 shuttleX EVENT-ID: 77dc2002-a576-ec22-9fe8-8d0a8932ef37
Sep  5 06:20:36 shuttleX DESC: A service failed - a start, stop or refresh method failed.
Sep  5 06:20:36 shuttleX   Refer to http://illumos.org/msg/SMF-8000-YX for more information.
Sep  5 06:20:36 shuttleX AUTO-RESPONSE: The service has been placed into the maintenance state.
Sep  5 06:20:36 shuttleX IMPACT: svc:/system/hal:default is unavailable.
Sep  5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.

Is it possibly related to:

changeset rev 13801 (8 weeks ago)
Milan Jurik <milan.jurik@xylab.cz> [Sun, 08 Jul 2012 13:53:30 -0500]
2950 sync hald-runner with upstream
Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Richard Lowe <richlowe@richlowe.net>


Files

ecpu-a5.txt (96.6 KB): elfedit /usr/lib/hal/hald-addon-cpufreq (a5). Richard PALO, 2012-09-06 05:48 AM
ecpu-a6.txt (96.6 KB): elfedit /usr/lib/hal/hald-addon-cpufreq.sav (a6). Richard PALO, 2012-09-06 05:48 AM
test.c (608 Bytes): test case, broken in 32-bit kernel mode. Milan Jurik, 2012-09-29 06:30 PM

Related issues

Related to illumos gate - Bug #3115: poll(2) returns prematurely in presence of spurious wakeups (Resolved, 2012-08-22)

History

#1

Updated by Radek R almost 7 years ago

I have the same issue on my Thinkpad T42. Please help resolve this bug. I tried updating the drivers (update_drv, devfsadm), but without any result.

#2

Updated by Richard PALO almost 7 years ago

The workaround that worked for me was to grab the following two files from oi_151a5 (I used beadm mount oi_151a.1.5 which is the name in use for prestable5).

/usr/lib/hal/hald-runner
/usr/lib/hal/hald-addon-cpufreq
(I determined these file names using diff.)

I did this from console mode, so a simple 'pfexec svcadm clear hal' was enough and I was back in action.

#3

Updated by Richard PALO almost 7 years ago

After some brief research, I believe the developers should look into this: hald-addon-cpufreq apparently has not changed in years (according to opengrok), yet the binary is different.
Perhaps there is a corner-case problem in code generation, or something similar.

#4

Updated by Richard PALO almost 7 years ago

Attached the elfedit output, e.g. for comparison with meld.

#5

Updated by Milan Jurik almost 7 years ago

  • Status changed from New to Feedback

I would start with the simple step suggested by SMF:

Sep 5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.

Additionally, it would be great to know the name of the processor from prtdiag.

Why do you think hald-addon-cpufreq is involved? Yes, the binary is slightly different, but maybe a slightly different optimization level or build environment was used.

#6

Updated by Brendan Oakley almost 7 years ago

I am also affected by this issue. Output of 'svcs -xv svc:/system/hal:default':

svc:/system/hal:default (Hardware Abstraction Layer daemon)
State: maintenance since September 12, 2012 12:09:59 PM PDT
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://illumos.org/msg/SMF-8000-KS
See: man -M /usr/man -s 1M hald
See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
svc:/system/filesystem/rmvolmgr:default
svc:/application/graphical-login/gdm:default

Relevant portion of /var/svc/log/system-hal:default.log:

[ Sep 12 12:07:46 Enabled. ]
[ Sep 12 12:09:20 Enabled. ]
[ Sep 12 12:09:56 Executing start method ("/lib/svc/method/svc-hal start"). ]
hal failed to start: error 2
[ Sep 12 12:09:59 Method "start" exited with status 95. ]

In my case the name of the processor from prtdiag is:
Intel(R) Pentium(R) 4 processor J2E1

#7

Updated by Dmitry Kozhinov almost 7 years ago

I am also affected. ASUS A7N8X-X motherboard, AMD Athlon XP processor. Please fix the issue ASAP.

#8

Updated by Milan Jurik almost 7 years ago

If you are asking for help, please always provide the requested data. Currently I am in contact with Brendan Oakley only.

#9

Updated by Milan Jurik almost 7 years ago

For additional info, run as root:

/usr/lib/hal/hald --daemon=no

and then

/usr/lib/hal/hald --daemon=yes

and add the output here.

#10

Updated by Richard PALO almost 7 years ago

It didn't work for me either after a reboot, so I went back to a5.

Following the suggestion made no real difference; there was no output in particular.
I did look again at the svcs -xv output:

svc:/system/hal:default (Hardware Abstraction Layer daemon)
 State: maintenance since September 14, 2012 13:12:19 CEST
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://illumos.org/msg/SMF-8000-KS
   See: man -M /usr/man -s 1M hald
   See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
        svc:/system/filesystem/rmvolmgr:default
        svc:/application/graphical-login/gdm:default

It seems that a6 is toxic for a number of 32-bit systems.

Milan Jurik wrote:

For additional info, run as root:

/usr/lib/hal/hald --daemon=no

and then

/usr/lib/hal/hald --daemon=yes

and add the output here.

#11

Updated by Milan Jurik almost 7 years ago

  • Status changed from Feedback to In Progress

Yes, the 32-bit environment seems to be toxic. I can reproduce it on my system with a forced 32-bit environment. I am investigating it.

#12

Updated by Milan Jurik almost 7 years ago

select() is broken in 32-bit kernel mode.

#13

Updated by Rich Lowe almost 7 years ago

  • Project changed from OpenIndiana Distribution to illumos gate

#14

Updated by Milan Jurik almost 7 years ago

In the test case (test.c), select() is called with a timeout of 25 seconds. But on a 32-bit kernel on the latest OI it returns immediately with 0. That is not correct behavior.

hald, in parent_wait_for_child(), waits for its child for 250 seconds. But on a 32-bit kernel it does not wait and kills its own child, thinking the timeout has passed.
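The symptom described above can be checked from user space with a small timing test. The following is only a minimal sketch in Python, not the attached test.c; the helper name `timed_select` and the use of an idle socket pair are assumptions made for illustration. It calls select() on a descriptor that never becomes readable and measures how long the call actually blocks.

```python
import select
import socket
import time


def timed_select(timeout_s):
    """Call select() on an idle socket pair.

    Returns (number_of_ready_descriptors, elapsed_seconds).
    Hypothetical helper for illustration; not taken from test.c.
    """
    a, b = socket.socketpair()
    try:
        start = time.monotonic()
        # Nothing is ever written to b, so a never becomes readable
        # and select() should block for the full timeout.
        readable, _, _ = select.select([a], [], [], timeout_s)
        elapsed = time.monotonic() - start
    finally:
        a.close()
        b.close()
    return len(readable), elapsed
```

On a healthy system, `timed_select(25.0)` should return 0 ready descriptors after roughly 25 seconds. A return of 0 after only a fraction of that time would match the behavior reported here for 32-bit kernels.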

#15

Updated by Rich Lowe almost 7 years ago

  • Subject changed from 'update to prestable6: svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95' to 'poll times out immediately in 32bit kernels'
  • Priority changed from Normal to Urgent

#16

Updated by Rich Lowe almost 7 years ago

  • Category set to kernel
  • Assignee set to Rich Lowe

#17

Updated by Rich Lowe almost 7 years ago

The comparison to nsec_per_tick is getting botched, and we're always sleeping for 1 tick. Presumably we're being treated as signed by MAX or something like that.
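To illustrate the kind of failure suspected here, the sketch below shows one hypothetical way a "sleep at least one tick" clamp can go wrong when a nanosecond timeout passes through a signed 32-bit value. It is not the kernel code and not necessarily the actual cause: the 100 Hz tick, the helper names `as_int32` and `clamp_to_tick`, and the place where the truncation happens are all assumptions.

```python
NSEC_PER_SEC = 1_000_000_000
# Assumed 100 Hz clock, i.e. 10 ms per tick; illustrative value only.
NSEC_PER_TICK = NSEC_PER_SEC // 100


def as_int32(value):
    """Wrap an integer the way a store into a signed 32-bit variable would."""
    return ((value + 2**31) % 2**32) - 2**31


def clamp_to_tick(timeout_ns):
    """Hypothetical lower bound: never sleep for less than one tick."""
    return max(timeout_ns, NSEC_PER_TICK)


timeout_ns = 25 * NSEC_PER_SEC                 # the 25 s timeout from the test case
wide = clamp_to_tick(timeout_ns)               # full-width arithmetic keeps 25 s
narrow = clamp_to_tick(as_int32(timeout_ns))   # wrapped value is negative
```

With full-width arithmetic the clamp leaves the 25-second timeout alone. Once the value wraps to a negative 32-bit number, the clamp picks `NSEC_PER_TICK` instead, which would look like a sleep of exactly one tick.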

#18

Updated by Rich Lowe almost 7 years ago

  • % Done changed from 20 to 80
  • Tags deleted (needs-triage)

#19

Updated by Rich Lowe almost 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Resolved in r13846 commit:e00baf913d93

#20

Updated by Richard PALO almost 7 years ago

Confirmed: I successfully upgraded to a7 and, given the need for the NVIDIA legacy driver, downgraded to NVIDIA-Solaris-x86-96.43.21 as well.
Thanks

#21

Updated by Dmitry Kozhinov almost 7 years ago

Rich Lowe wrote:

Resolved in r13846 commit:e00baf913d93

Thank you! I am a happy user again.
