Bug #3150
poll times out immediately in 32bit kernels
100%
Description
I just tried updating from a5 to a6. Here is an extract from messages:
Sep 5 06:19:03 shuttleX genunix: [ID 544858 kern.notice] OpenIndiana Build oi_151a6 32-bit (illumos 13793:10c3656ccf76)
Sep 5 06:19:03 shuttleX genunix: [ID 107366 kern.notice] SunOS Release 5.11 - Copyright 1983-2010 Oracle and/or its affiliates.
Sep 5 06:19:03 shuttleX genunix: [ID 864463 kern.notice] All rights reserved. Use is subject to license terms.
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: lgpg
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: tsc
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: msr
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mtrr
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pge
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: de
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cmov
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mmx
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: mca
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pae
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cv8
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: pat
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sep
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: sse
Sep 5 06:19:03 shuttleX unix: [ID 223955 kern.info] x86_feature: cpuid
Sep 5 06:19:03 shuttleX unix: [ID 168242 kern.info] mem = 1965632K (0x77f90000)
Sep 5 06:19:03 shuttleX acpica: [ID 567683 kern.notice] ACPI: RSDP f6e60 00014 (v0 Nvidia)
Sep 5 06:19:03 shuttleX acpica: [ID 409769 kern.notice] ACPI: RSDT 77ff3000 0002C (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep 5 06:19:03 shuttleX acpica: [ID 309273 kern.notice] ACPI: FACP 77ff3040 00074 (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
Sep 5 06:19:03 shuttleX acpica: [ID 478335 kern.notice] ACPI: DSDT 77ff30c0 04139 (v1 NVIDIA AWRDACPI 00001000 MSFT 0100000D)
Sep 5 06:19:03 shuttleX acpica: [ID 516688 kern.notice] ACPI: FACS 77ff0000 00040
Sep 5 06:19:03 shuttleX acpica: [ID 908020 kern.notice] ACPI: APIC 77ff7200 0005A (v1 Nvidia AWRDACPI 42302E31 AWRD 00000000)
...
Sep 5 06:20:00 shuttleX svc.startd[10]: [ID 652011 daemon.warning] svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Sep 5 06:20:00 shuttleX svc.startd[10]: [ID 748625 daemon.error] system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
...
Sep 5 06:20:36 shuttleX fmd: [ID 377184 daemon.error] SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
Sep 5 06:20:36 shuttleX EVENT-TIME: Wed Sep 5 06:20:33 CEST 2012
Sep 5 06:20:36 shuttleX PLATFORM: i86pc, CSN: -, HOSTNAME: shuttleX
Sep 5 06:20:36 shuttleX SOURCE: software-diagnosis, REV: 0.1
Sep 5 06:20:36 shuttleX EVENT-ID: 77dc2002-a576-ec22-9fe8-8d0a8932ef37
Sep 5 06:20:36 shuttleX DESC: A service failed - a start, stop or refresh method failed.
Sep 5 06:20:36 shuttleX Refer to http://illumos.org/msg/SMF-8000-YX for more information.
Sep 5 06:20:36 shuttleX AUTO-RESPONSE: The service has been placed into the maintenance state.
Sep 5 06:20:36 shuttleX IMPACT: svc:/system/hal:default is unavailable.
Sep 5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.
Is it possibly related to:
8 weeks ago: 2950 sync hald-runner with upstream
changeset: Milan Jurik <milan.jurik@xylab.cz> [Sun, 08 Jul 2012 13:53:30 -0500] rev 13801
Reviewed by: Andrew Stormont <Andrew.Stormont@nexenta.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Richard Lowe <richlowe@richlowe.net>
Updated by Radek R over 8 years ago
I have the same issue on my ThinkPad T42. Please help resolve this bug. I tried updating drivers (update_drv, devfsadm), but without any result.
Updated by Richard PALO over 8 years ago
The workaround that worked for me was to copy the following two files from oi_151a5 (I used 'beadm mount oi_151a.1.5'; oi_151a.1.5 is the boot environment name in use for prestable5):
/usr/lib/hal/hald-runner
/usr/lib/hal/hald-addon-cpufreq
(I determined these file names using diff.)
I did this from console mode, so a simple 'pfexec svcadm clear hal' was enough to get me back in action.
Updated by Richard PALO over 8 years ago
After some brief research, I believe the developers should look into this: according to OpenGrok, the hald-addon-cpufreq source has not changed in years, yet the binary is different.
Perhaps there is a corner-case problem in code generation, or something similar.
Updated by Richard PALO over 8 years ago
- File ecpu-a5.txt added
- File ecpu-a6.txt added
elfedit output of both binaries, e.g. for comparison in meld.
Updated by Milan Jurik over 8 years ago
- Status changed from New to Feedback
I would start with the simple step suggested by SMF:
Sep 5 06:20:36 shuttleX REC-ACTION: Run 'svcs -xv svc:/system/hal:default' to determine the generic reason why the service failed, the location of any logfiles, and a list of other services impacted.
Additionally, it would be great to know the processor name reported by prtdiag.
Why do you think hald-addon-cpufreq is involved? Yes, the binary is slightly different, but that may simply be due to slightly different optimization or a different build environment.
Updated by Brendan Oakley over 8 years ago
I am also affected by this issue. Output of 'svcs -xv svc:/system/hal:default':
svc:/system/hal:default (Hardware Abstraction Layer daemon)
State: maintenance since September 12, 2012 12:09:59 PM PDT
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://illumos.org/msg/SMF-8000-KS
See: man -M /usr/man -s 1M hald
See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
svc:/system/filesystem/rmvolmgr:default
svc:/application/graphical-login/gdm:default
Relevant portion of /var/svc/log/system-hal:default.log:
[ Sep 12 12:07:46 Enabled. ]
[ Sep 12 12:09:20 Enabled. ]
[ Sep 12 12:09:56 Executing start method ("/lib/svc/method/svc-hal start"). ]
hal failed to start: error 2
[ Sep 12 12:09:59 Method "start" exited with status 95. ]
In my case, the processor name reported by prtdiag is:
Intel(R) Pentium(R) 4 processor J2E1
Updated by Dmitry Kozhinov over 8 years ago
I am also affected. ASUS A7N8X-X motherboard, AMD Athlon XP processor. Please fix the issue ASAP.
Updated by Milan Jurik over 8 years ago
If you are asking for help, please always provide the requested data. Currently I am in contact only with Brendan Oakley.
Updated by Milan Jurik over 8 years ago
For additional info, please run as root:
/usr/lib/hal/hald --daemon=no
and then
/usr/lib/hal/hald --daemon=yes
and post the output here.
Updated by Richard PALO over 8 years ago
It didn't work for me either, even after a reboot, so I went back to a5.
Following the suggestion made no real difference; that is, there was no output in particular.
I took another look at the svcs -xv output:
svc:/system/hal:default (Hardware Abstraction Layer daemon)
State: maintenance since September 14, 2012 13:12:19 CEST
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://illumos.org/msg/SMF-8000-KS
See: man -M /usr/man -s 1M hald
See: /var/svc/log/system-hal:default.log
Impact: 2 dependent services are not running:
svc:/system/filesystem/rmvolmgr:default
svc:/application/graphical-login/gdm:default
It seems that a6 is toxic for a number of 32-bit systems.
Milan Jurik wrote:
Additional info is to run as root:
/usr/lib/hal/hald --daemon=no
and then
/usr/lib/hal/hald --daemon=yes
And add info about output here.
Updated by Milan Jurik over 8 years ago
- Status changed from Feedback to In Progress
Yes, the 32-bit environment does seem to be toxic: I can reproduce the problem on my system by forcing a 32-bit environment. I am investigating it.
Updated by Milan Jurik over 8 years ago
select() is broken in 32-bit kernel mode.
Updated by Rich Lowe over 8 years ago
- Project changed from OpenIndiana Distribution to illumos gate
Updated by Milan Jurik over 8 years ago
In the test case (test.c), select() is called with a 25-second timeout. On a 32-bit kernel with the latest OI, however, it returns 0 immediately. That is not correct behavior.
In parent_wait_for_child(), hald waits up to 250 seconds for its child. On a 32-bit kernel, however, it does not wait at all and kills its own child, believing the timeout has expired.
Updated by Rich Lowe over 8 years ago
- Subject changed from update to prestable6: svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95 to poll times out immediately in 32bit kernels
- Priority changed from Normal to Urgent
Updated by Rich Lowe over 8 years ago
The comparison to nsec_per_tick is getting botched, and we're always sleeping for 1 tick. Presumably we're being treated as signed by MAX or something like that.
Updated by Rich Lowe over 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
Resolved in r13846 commit:e00baf913d93
Updated by Richard PALO over 8 years ago
Confirmed: I successfully upgraded to a7 and, since I need the legacy NVIDIA driver, also downgraded to NVIDIA-Solaris-x86-96.43.21.
Thanks
Updated by Dmitry Kozhinov over 8 years ago
Rich Lowe wrote:
Resolved in r13846 commit:e00baf913d93
Thank you! I am a happy user again.