Bug #10939

libdiskstatus trusts disk mode sense data to its death

Added by Robert Mustacchi 10 months ago. Updated 9 months ago.

lib - userland libraries
Start date:
Due date:
% Done:


Estimated time:


We encountered fmd dumping core with the following stack:

fcdfe658`memcpy+0x115(4, 3f, 0, 87a6008, f3, fcdfe730)
fcdfe6c8`scsi_mode_sense+0xd5(8781a28, 3f, 0, 87a6008, f3, fcdfe72c)
fcdfe758`load_modepages+0x80(8781a28, 0, 54, fea86af4)
fcdfe788`ds_scsi_open_common+0x15(87832e8, 8781a28, feb813c8, fdcc51da)
fcdfe7a8`ds_scsi_open_uscsi+0x49(87832e8, 1, 0, fed51f1a, fed6d000, 836aa88)
fcdfe7e8`disk_status_open+0x101(87848c0, fcdfe814, fcdfe818, fdce3aec, 836aa88, 836aa90)
fcdfe838`disk_status+0x121(80d8238, 81fa0d0, 0, 872c5a0, fcdfe93c, 8369ac8)
fcdfe888`topo_method_call+0xa3(81fa0d0, fccb21ae, 0, 872c5a0, fcdfe93c, fcdfe92c)
fcdfe8c8`topo_method_invoke+0x36(81fa0d0, fccb21ae, 0, 872c5a0, fcdfe93c, fcdfe92c)
fcdfe958`dt_analyze_disk+0x11f(80d7f40, 81fa0d0, 83f41a0, fed50d75, 81fa130, fed6d000)
fcdfe998`topo_walk_step+0x9a(872c6a8, 1, 0, 0)
fcdfe9c8`step_child+0xa6(81fa130, 872c6a8, 1, 0, 81fa190, fed6d000)
fcdfea08`topo_walk_step+0xc6(872c6a8, 1, 0, 0)
fcdfea38`step_child+0xa6(81fa190, 872c6a8, 1, 0, 8316120, fed6d000)
fcdfea78`topo_walk_step+0xc6(872c6a8, 1, 1, 0)
fcdfeaa8`step_sibling+0xa9(81fa3d0, 872c6a8, 1, 0, 81fa430, fed6d000)
fcdfeae8`topo_walk_step+0xf8(872c6a8, 1, 0, 0)
fcdfeb18`step_child+0xa6(81fa430, 872c6a8, 1, 0, 81fa490, fed6d000)
fcdfeb58`topo_walk_step+0xc6(872c6a8, 1, 0, 0)
fcdfeb88`step_child+0xa6(81fa490, 872c6a8, 1, 0, 82b35e0, fed6d000)
fcdfebc8`topo_walk_step+0xc6(872c6a8, 1, 2, 0)
fcdfebf8`step_sibling+0xa9(81fa4f0, 872c6a8, 1, 0, 82b35e0, fed6d000)
fcdfec38`topo_walk_step+0xf8(872c6a8, 1, 1, 0)
fcdfec68`step_sibling+0xa9(81fa7f0, 872c6a8, 1, 0, 826d500, fed6d000)
fcdfeca8`topo_walk_step+0xf8(872c6a8, 1, 0, 0)
fcdfecd8`step_sibling+0xa9(81fa8b0, 872c6a8, 1, 0, 826d500, fed6d000)
fcdfed18`topo_walk_step+0xf8(872c6a8, 1, 1, 0)
fcdfed48`step_sibling+0xa9(81fa970, 872c6a8, 1, 0, 8247c60, fed6d000)
fcdfed88`topo_walk_step+0xf8(872c6a8, 1, 0, 0)
fcdfedb8`step_sibling+0xa9(81fad90, 872c6a8, 1, 0, 81fadf0, fed6d000)
fcdfedf8`topo_walk_step+0xf8(872c6a8, 1, 0, 0)
fcdfee28`step_child+0xa6(81fadf0, 872c6a8, 1, 0, 8247f40, fed6d000)
fcdfee68`topo_walk_step+0xc6(872c6a8, 1, 0, fed6d000)
fcdfee98`step_sibling+0xa9(836eb58, 872c6a8, 1, 0, 80bc540, 80d7f40)
fcdfeed8`topo_walk_step+0xf8(872c6a8, 1, fcdfef28, feecb894)
fcdfef28`dt_timeout+0x91(80bc540, 1, 0, 806ffe9)
fcdfef78 fmd_module_dispatch+0xd0(80bc540)
fcdfefa8 fmd_module_start+0xe7(80bc540, 0, fed86aa0, fef45000)
fcdfefc8 fmd_thread_start+0x5b(8265100, 0, 0, 0)
fcdfeff8`_lwp_start(fed86a40, 0, 0, 0, 0, 0)

Let's see where exactly we're coming from:

...`scsi_mode_sense+0xcd:pushl  0x4(%eax)`scsi_mode_sense+0xd0:call   +0x3685  <`uscsi_mode_sense>`scsi_mode_sense+0xd5:movl   %eax,%esi

memcpy doesn't push a frame pointer, which isn't the most helpful thing.
So let's see what we do know based on our current stack pointer:

> <esp,20/nap
0xfcdfe4c4:     0xfcdfe4e4      
0xfcdfe4c8:     0xfcdfe4e0      
0xfcdfe4d0:     0x87a6008       
0xfcdfe4d4:     0xfcdfe4e4      
0xfcdfe4d8:     0xfffffffd      
0xfcdfe4e0:     0x3800          
0xfcdfe4e4:     0               
0xfcdfe4e8:     0xa01           
0xfcdfe4ec:     0               
0xfcdfe4f0:     0               
0xfcdfe4f4:     0xffff0a05      
0xfcdfe4f8:     0x23f10         
0xfcdfe4fc:     0xff3f          
0xfcdfe500:     0xa08           
0xfcdfe504:     0               
0xfcdfe508:     0               
0xfcdfe50c:     0x1000a1b       
0xfcdfe510:     0               
0xfcdfe514:     0               
0xfcdfe518:     0               
0xfcdfe51c:     0               
0xfcdfe520:     0               
0xfcdfe524:     0               
0xfcdfe528:     0               
0xfcdfe52c:     0               
0xfcdfe530:     0               
0xfcdfe534:     0               
0xfcdfe538:     0               
0xfcdfe53c:     0               
0xfcdfe540:     0              

While mdb doesn't know the argument's to our elided friend due to the
lack of a pushed frame pointer, we can reconstruct it based on the most
recent frame pointer. This will tell us the arguments to

> fcdfe658,10/nap
0xfcdfe658:     0xfcdfe6c8      
0xfcdfe660:     4               
0xfcdfe664:     0x3f            
0xfcdfe668:     0               
0xfcdfe66c:     0x87a6008       
0xfcdfe670:     0xf3            
0xfcdfe674:     0xfcdfe730      
0xfcdfe678:     0xfcdfe69c      
0xfcdfe67c:     0xfcdfe698     

This translate into, uscsi_mode_sense(4, 0x3f, 0, 0x87a6008, 0xf3,
0xfcdfe730, 0xfcdfe69c, 0xfcdfe698). In this case, based on running
pfiles on the core file, we know that disk 4 is:


This also suggests that the arguments to memcpy are the following:

> <esp,10/nap                         
0xfcdfe4c4:     0xfcdfe4e4      
0xfcdfe4c8:     0xfcdfe4e0      
0xfcdfe4d0:     0x87a6008       
0xfcdfe4d4:     0xfcdfe4e4      
0xfcdfe4d8:     0xfffffffd      
0xfcdfe4e0:     0x3800          
0xfcdfe4e4:     0               
0xfcdfe4e8:     0xa01           
0xfcdfe4ec:     0               
0xfcdfe4f0:     0               
0xfcdfe4f4:     0xffff0a05      
0xfcdfe4f8:     0x23f10         
0xfcdfe4fc:     0xff3f          
0xfcdfe500:     0xa08          

In this case, this looks like: memcpy(0x87a6008, 0xfcdfe4e4,

That suggests a length of -3, a subtraction, which is probably wrong.
This likely means that the data in the header that we received back were
questionable perhaps. So, if we look at this, in theory we thing the
second argument should be a struct mode page. Let's look at that data:

> 0xfcdfe4e4::whatis
fcdfe4e4 is in [ stack tid=0xf ]
> 0xfcdfe4e4::print struct mode_page
    code = 0
    ps = 0
    length = 0

OK, so there's not much there. Now, the question is can we figure out
where our subtraction went wrong. We'll need to see if we can track down
where we did the alloca in that function, never a great sign. The alloca
buffer will be based on the struct scsi_ms_header size and the page_size
argument, in this case, the struct is 0xc bytes, and the argument is
0xf3 bytes long. This gives us a good old fashion 0xff bytes. Next
question is can we find that.

The function will have already written the mdoe header out to the output
argument. Let's see what that is and if that matches the value that we
found for the element 'hdr'.

> 0xfcdfe730::print struct scsi_ms_header
    ms_header = {
        length = 0
        medium_type = 0x38
        device_specific = 0
        bdesc_length = 0
    ms_descriptor = {
        density_code = 0
        blks_hi = 0
        blks_mid = 0
        blks_lo = 0
        reserved = 0
        blksize_hi = 0
        blksize_mid = 0
        blksize_lo = 0
> uscsi_mode_sense::dis ! less
> <ebp-0x30/K
0xfcdfe628:     fcdfe4e0        
> fcdfe4e0::whatis
fcdfe4e0 is in [ stack tid=0xf ]
> fcdfe4e0::print struct mode_header
    length = 0
    medium_type = 0x38
    device_specific = 0
    bdesc_length = 0

Well, these match, that's good. So it seems like what's happened here
is now somewhat straightforward. If we put in the values we use for the
calculation for the sizeof:

1262                 (void) memcpy(page_data, (caddr_t)pg,
1263                     (hdr->length + sizeof (header->ms_header.length)) -
1264                     (MODE_HEADER_LENGTH + hdr->bdesc_length));

This becomes:

1262                 (void) memcpy(page_data, (caddr_t)pg,
1263                     (0 + 1) -
1264                     (4 + 0));

In other words, -3. So, given that, it makes sense that we ended up
blowing up in memcpy, since we really can't copy all of those bytes.
With that in mind, we now have several more questions to go ask

1. A length of zero is rather suspicious, as that means that we
shouldn't even have the medium type set. (Seems unlikely). So why did we
end up getting it?

2. What's the right way to handle this kind of condition? In this case,
we should error out. We should basically sanity check the values against
our buffer sizes before we do anything else there. Probably a header
returning a length of zero should be thought of as bad.

At this time, I've gone ahead and implemented something that looks at the length of the header we get back and also make sure we won't copy a negative amount of bytes.

To test this fix, we applied it on the system that was generating these events and confirmed that fmd stays up. Secondly, I went through and tested an fmtopo -V on a system before and after the fix and verified that the only changes were due to timestamps and sensor readings that had changed. Everything else was identical, which means that we likely didn't regress other aspects here.



Updated by Electric Monk 9 months ago

  • Status changed from New to Closed

git commit 0e35cc03de180a2fdef6c639e43b43824f85c65b

commit  0e35cc03de180a2fdef6c639e43b43824f85c65b
Author: Robert Mustacchi <>
Date:   2019-06-05T00:47:31.000Z

    10939 libdiskstatus trusts disk mode sense data to its death
    Reviewed by: Jerry Jelinek <>
    Reviewed by: Rob Johnston <>
    Reviewed by: Toomas Soome <>
    Reviewed by: Gergő Doma <>
    Reviewed by: Andy Fiddaman <>
    Approved by: Dan McDonald <>

Also available in: Atom PDF