scsi/mpt: Disconnected command timeout for Target X
i have 3 sun servers running OSOL snv_111b. each of them has an LSI 1068E controller with one j4200 disk enclosure attached. two of the servers i upgraded to openindiana 151a; on the third server i installed a fresh copy of OI 151a.
the two upgraded servers run fine, whereas the freshly installed server hangs as soon as there's some load on the sas controller. sometimes after a zfs send of just 1gb of data, sometimes only after 200gb, i'd see "Disconnected command timeout for Target X" in the message log. the symptoms are the same as reported on http://wesunsolve.net/bugid/id/6900767 . after a while the zpool is marked as degraded and taken offline, but the errors keep being logged. when this happens, the only way to recover is a hard reset of the server.
there seem to be quite a few other people having the same problem with older OSOL/Solaris11 Express setups. but what strikes me as pretty odd is that the two upgraded servers work just fine. there were quite a few differences in which drivers were loaded, but even after changing drivers on the newly installed server i would still encounter the error.
none of the other recommended workarounds (like disabling MSI) worked. but after installing Solaris 11 (per the comment at http://webcache.googleusercontent.com/search?q=cache:dA3SsRssN70J:www.nexenta.org/issues/214) i haven't seen the error, even after copying >2tb of data. so it seems confirmed that oracle fixed the problem in the latest solaris version.
if someone could guide me through what data to gather (and how) to find the differences between the broken fresh installation, the working upgraded installation and solaris11, i'd be willing to reinstall the server once more. i'd really love to get rid of the solaris11 installation ;-)
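as a starting point for the driver comparison, here is a rough python sketch of how the loaded-module lists from two installs could be diffed. it assumes the output of `modinfo` from each box has been saved to a text file, and that each data row starts with a numeric id and ends with the module name followed by a description in parentheses. the function names are made up for illustration, and this only shows which modules differ, not why the timeouts happen.

```python
def parse_modinfo(text):
    """Map module name -> description from saved `modinfo` output.

    Assumed row layout (not verified against every release):
      <id> <loadaddr> <size> <info> <rev> <name> (<description>)
    """
    modules = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the header row; data rows start with a numeric id.
        if not line or not line[0].isdigit():
            continue
        head, sep, desc = line.partition("(")
        fields = head.split()
        if len(fields) < 2:
            continue
        name = fields[-1]  # last token before the description
        if sep and desc.endswith(")"):
            desc = desc[:-1]
        modules[name] = desc.strip()
    return modules


def diff_modules(a, b):
    """Return (only in a, only in b, in both but with a different description)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, changed
```

to use it, read the two saved files (say `fresh.txt` and `upgraded.txt`, both hypothetical names), pass each through `parse_modinfo`, and hand the two dicts to `diff_modules`. modules that show up on only one side, or with a different description string, would be the first things to look at.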