Bug #4376
opensystem crash if run as a guest in KVM with passthrough
Description
If I create an illumos VM under Linux KVM, it runs smoothly. But when I pass through any PCI device, say a NIC or an HBA, it crashes on startup.
This is reproducible on any OpenSolaris-based distro.
It works in ESXi, but ESXi sucks.
Fixing this would make illumos waaay more widely used as a storage solution (virtualised storage appliance? yay!).
Updated by Igor Kozhukhov over 9 years ago
DilOS has been tested on Debian KVM as well, with virtio + virtnet drivers. dilos-xen-dom0 has also been configured with PV and HVM guests - no crashes.
Updated by Gleb Panov over 9 years ago
Pure VM works.
Try to pass through any device into the VM. I tried LSI 9207, 9211, and Intel X520 DA -- all crash.
I repeat, the problem occurs only if I try to plug a physical device into the VM. VirtIO is not a solution for me.
Updated by Alek Pinchuk over 9 years ago
If there is a crash dump, it would be useful to have it.
Updated by Ivan Vecera over 9 years ago
Hello,
I have the same problem... I'm now running DilOS 1.3.5 as PVHVM under Debian Jessie with the Xen 4.3 64-bit hypervisor. First, apix_enable=0 is needed to boot successfully even without passthrough. When I try to pass through a network device, the system crashes during boot...
Trace:
.....
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
Reading Intel IOMMU boot options
npe0 at root: space 0 offset 0
npe0 is /pci@0,0
panic[cpu0]/thread=fffffffffbc2ee00: hat_devload: loading a mapping to free page ffffff0006cbcdd0
Warning - stack not written to the dump buffer
fffffffffbc70fa0 unix:_allsyscalls_size+1e42b ()
fffffffffbc71000 rootnex:rootnex_map_regspec+1d6 ()
fffffffffbc710d0 rootnex:rootnex_map+1c7 ()
fffffffffbc71110 genunix:ddi_map+2f ()
fffffffffbc711f0 npe:npe_bus_map+3d1 ()
fffffffffbc71230 genunix:ddi_map+2f ()
fffffffffbc712e0 genunix:ddi_regs_map_setup+e5 ()
fffffffffbc71330 genunix:pci_config_setup+4f ()
fffffffffbc713d0 npe:npe_initchild+e0 ()
fffffffffbc71460 npe:npe_ctlops+298 ()
fffffffffbc714c0 genunix:init_node+80 ()
fffffffffbc71510 genunix:i_ndi_config_node+d5 ()
fffffffffbc71540 genunix:i_ddi_attachchild+3f ()
fffffffffbc71590 genunix:i_ddi_attach_node_hierarchy+7b ()
fffffffffbc715d0 genunix:attach_driver_nodes+a2 ()
fffffffffbc715f0 genunix:i_ddi_attach_hw_nodes+1a ()
fffffffffbc71600 unix:configure+44 ()
fffffffffbc71620 unix:startup_end+79 ()
fffffffffbc71630 unix:startup+4f ()
fffffffffbc71670 genunix:main+36 ()
fffffffffbc71680 unix:_locore_start+90 ()
panic: entering debugger (no dump device, continue to reboot)
Loaded modules: [ scsi_vhci mac uppc specfs pcplusmp ]
kmdb: target stopped at:
kmdb_enter+0xb: movq %rax,%rdi
[0]>
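For anyone trying to reproduce this: the apix_enable=0 setting mentioned above is commonly made persistent through /etc/system. The comment does not say how the tunable was applied in this case, so the fragment below is only a sketch under that assumption. It covers booting without passthrough and does nothing about the hat_devload panic itself.

```
* /etc/system fragment (assumed mechanism; lines starting with * are comments)
* Disable the apix PSM so the kernel uses pcplusmp instead.
set apix_enable=0
```

A reboot is needed for the setting to take effect.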
Updated by Bayard Bell about 9 years ago
Gleb Panov wrote:
Pure VM works.
Try to pass through any device into the VM. I tried LSI 9207, 9211, and Intel X520 DA -- all crash.
I repeat, the problem occurs only if I try to plug a physical device into the VM. VirtIO is not a solution for me.
Gleb,
Do you have kit still in place for repro? Are you able to provide access to a previous crash, following Alek's question?
Updated by Gleb Panov about 9 years ago
Sadly, no.
I might rebuild it in a week or so, but will it be of any use?
I mean, there's definitely a bug, but no one seems to be interested in fixing it. We're leaning towards Linux.
Updated by Bayard Bell about 9 years ago
- Assignee set to Bayard Bell
- Tags deleted (needs-triage)
Gleb Panov wrote:
Sadly, no.
I might rebuild it in a week or so, but will it be of any use?
I mean, there's definitely a bug, but no one seems to be interested in fixing it. We're leaning towards Linux.
Gleb,
I've recently got kit that I'm planning to deploy this way (E5-2600v2 workstation running Debian jessie with an LSI 9207 for passthrough under KVM), and, as I'm an illumos developer, I'll sort through it one way or the other. Was hoping to have a look at a core dump sooner rather than later to see what I could do about this bug, but I expect I'll have finished the build within a week to be able to see this myself.
Could you provide a bit more info about the system on which you saw this previously?
Updated by Gleb Panov about 9 years ago
Um... There were two: a Supermicro AMD G34-based server with a 16-core Opteron, and an Intel (OEM mb) Xeon 5645. The OS was Debian 7; I cannot recall the exact kernel... 3.4, I believe.
I've found posts about the same behaviour, but not here.
I also tested the Proxmox custom kernel as a stable-as-jurassic-cr%# fallback option, with the same result. The system crashes during initial boot, at the same point. It happens if any HW is passed through into the Solaris-based OS. I tested OpenIndiana, Nexenta and OmniOS (the most recent builds as of that date). Basically, I did all I could to keep osol in production. Tough luck.
I tested on AMD, mostly, since it's our production base brick. Then repeated with some HBAs on Intel - all the same.
I quit that company, so I cannot reproduce it easily or exactly. I'm afraid you'll see it faster yourself.
But I'm still waiting for the fix, it would be very useful.