Bug #1673

Panic in conn_ip_output()

Added by Robert Mustacchi over 9 years ago. Updated over 9 years ago.

Start date:
Due date:
% Done:


Estimated time:
Gerrit CR:


panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff00b87477a0 addr=28 occurred in module "ip" due to a NULL pointer dereference

ffffff00b8747900 conn_ip_output+0x4b(ffffff2a49e71ce0, ffffff1c528efb00)
ffffff00b8747930 tcp_send_data+0x84(ffffff1c528eeb80, ffffff2a49e71ce0)
ffffff00b8747960 tcp_ack_timer+0x125(ffffff1c528ee880)
ffffff00b87479a0 tcp_timer_handler+0x3b(ffffff1c528ee880, ffffff1e76ce0d78, ffffff1992140c40, 0)
ffffff00b8747bc0 squeue_drain+0x1f8(ffffff1992140c40, 2, 638481dc554a0)
ffffff00b8747c20 squeue_worker+0x132(ffffff1992140c40)
ffffff00b8747c30 thread_start+8()

In conn_ip_output we're getting a null pointer here:

> <rip::dis 
conn_ip_output+0x20:            movq   (%r14),%r8
conn_ip_output+0x23:            movq   %r8,-0x50(%rbp)
conn_ip_output+0x27:            movq   0x140(%r14),%rbx
conn_ip_output+0x2e:            movq   0x20(%r14),%r15
conn_ip_output+0x32:            testq  $0x1,%r8
conn_ip_output+0x39:            movq   %rdi,%r13
conn_ip_output+0x3c:            je     +0x9     <conn_ip_output+0x47>
conn_ip_output+0x3e:            xorl   %eax,%eax
conn_ip_output+0x40:            movl   %eax,0x104(%r15)
conn_ip_output+0x47:            movl   0x28(%r14),%eax
conn_ip_output+0x4b:            cmpl   %eax,0x28(%r15)

> ffffff1c528efb00::print ip_xmit_attr_t ixa_ire
ixa_ire = 0

Which corresponds to:

 190         if (ire->ire_generation != ixa->ixa_ire_generation) {

Because the value of ire is null. There is an assertion that the ire != NULL earlier on, but since we're not on debug bits we didn't hit that and instead continued on.


Updated by Dan McDonald over 9 years ago

Here's what I can glean thus far:

  • The socket with the relevant TCP session would report ECONNRESET to any process. (chase from tcp_t to conn_t to conn_upper_handle/sonode to so_error).
  • The TCP state is TCPS_BOUND...
  • I thought that the TCP received a RST from the peer, and some cleanup was performed. The obvious candidate is tcp_clean_death(), but the conn_t that panics hasn't been unbound. There IS a running conn_ixa_cleanup from an "ifconfig <foo> unplumb" in the thread list, but that shouldn't affect this connection.

The IP transmission attributes (ixa/ip_xmit_attr_t) indeed has its IRE set to NULL, but it also has its IRE generation number set to 1, aka. IRE_GENERATION_VERIFY. The ixa_cleanup() function does this, but from what I can tell, nobody should call this on an active TCP connection's ixa w/o doing other things (like unlinking it from fanouts).

I'm wondering if ip_xmit_attr() needs to relax the (ixa_ire != NULL) restriction and replace it with (ixa_ire != NULL || ixa_ire_generation == IRE_GENERATION_VERIFY)? A few well-placed changes here would cause the ip_verify_ire() path to run and reset ixa_ire & friends. On the other hand, such a solution may be papering over the real problem with ixa resets without fuller connection teardown.


Updated by Dan McDonald over 9 years ago

  • Assignee set to Dan McDonald
  • % Done changed from 0 to 10

I've found out that there are some TIME_WAIT cases that may cause this panic, but again, the state is in TCPS_BOUND, which is an unnatural state for a closing connection.


Updated by Dan McDonald over 9 years ago

Another user has reported a TCP panic while using kmem_flags=0xf that indicates use-of-freed-data. Use of freed data MAY explain what you're seeing here as well. It's going to be tough going to figure it out, however. Stay tuned...


Updated by Bryan Cantrill over 9 years ago

We have seen this panic now twice in the field, both times from a machine known to be under heavy networking load.

For the panic mentioned here, it seems that tcp_clean_death() has been called for the tcp_t in question (the connection is condemned). Further, from the code there are clearly conditions in which the tcp_t is left in the TCPS_BOUND state (though it is not immediately clear the path that this particular connection took). The question then becomes: why are we doing deferred ACK processing on an otherwise condemned connection? As part of the connection condemnation process, tcp_timers_stop() is called to remove any active times; how can the tcp_ack_timer() be called after this action has been performed? As it turns out, the semantics of tcp_timeout_cancel() are that the timer can fire after the cancel has returned if the system-level timeout (that is, the callout) has already fired and the tcp_timer_t has already been enqueued on the squeue. In this case, tcp_timeout_cancel() returns 1 - but essentially all callers of tcp_timeout_cancel() (including tcp_timers_stop()) do not check this return value and explicitly cast it to void.

If this were the case -- if we were seeing the artifact of enqueueing the timer handling -- we would expect a substantial amount of work to be enqueued on the squeue. And indeed, looking up the stack reveals the mblk_t that represents the squeue (0xffffff1e74f0f088); this queue is 42,707 messages long. Moreover, based on the panic_hrtime (1751034600281951), the expiration value passed to squeue_drain (1750732248536224) and the value of squeue_drain_ns (20000000), we have been processing this squeue for 302 seconds -- over five minutes!

It seems that the fix for this is potentially straightforward: tcp_timeout_cancel() could set tcpt_tid to a timeout value that is known to be invalid; this value could then be checked in tcp_timer_handler() before proceeding with timeout processing.


Updated by Rich Lowe over 9 years ago

  • Status changed from New to Resolved
  • Assignee changed from Dan McDonald to Bryan Cantrill
  • % Done changed from 10 to 100

Resolved in r13551 commit:022a137cd76d

Also available in: Atom PDF