Bug #5026
closedintra-node/inter-zone networking doesn't always deliver SIGPOLL
100%
Description
NOTE: This was seen with OmniOS r151006, which is about a year old. I will update this if I can reproduce it on more modern Illumos code.
Consider a global-zone process using SIGPOLL handling to detect IO on a network socket. ntpd is a good example.
Consider a sending process on a same-machine non-global zone. It is possible, at the very least where both global and non-global zones have vnics over the same aggregation, that the SIGPOLL is never delivered to the process in the global (receiving) zone.
Some verbose annotated DTrace output:
ip`ire_recv_local_v4+0x132 ip`ill_input_short_v4+0x4d6 ip`ip_input_common_v4+0x372 ip`ip_input+0x2b dls`i_dls_link_rx+0x1cd mac`mac_rx_deliver+0x37 mac`mac_rx_soft_ring_process+0x19a mac`mac_rx_srs_proto_fanout+0x29a mac`mac_rx_srs_drain+0x363 mac`mac_rx_srs_process+0x3ce /* RIGHT HERE we leave a non-global zone and enter a global zone! */ mac`mac_tx_send+0x431 mac`mac_tx_soft_ring_process+0x79 mac`mac_tx_aggr_mode+0x7c mac`mac_tx+0xda dld`str_mdata_fastpath_put+0x53 ip`ip_xmit+0x94f ip`ire_send_wire_v4+0x3e9 ip`conn_ip_output+0x190 ip`udp_output_connected+0x139 ip`udp_send+0x7b2 4 | ip_input_local_v4:entry 4 -> ip_csum_hdr 4 <- ip_csum_hdr Returns 0x0 4 -> ip_fanout_v4 4 -> ip_input_cksum_v4 4 -> ip_input_sw_cksum_v4 4 -> ip_input_cksum_pseudo_v4 4 <- ip_input_cksum_pseudo_v4 Returns 0x145d8 4 -> ip_cksum 4 -> ip_ocsum 4 <- ip_ocsum Returns 0xffff 4 <- ip_cksum Returns 0xffff 4 <- ip_input_sw_cksum_v4 Returns 0x1 4 <- ip_input_cksum_v4 Returns 0x1 4 -> ipcl_classify_v4 4 <- ipcl_classify_v4 Returns 0xffffff32cde34540 4 -> udp_input 4 -> ip_find_hdr_v4 4 <- ip_find_hdr_v4 Returns 0x0 4 -> conn_recvancillary_size 4 <- conn_recvancillary_size Returns 0x28 4 -> allocb 4 -> kmem_cache_alloc 4 <- kmem_cache_alloc Returns 0xffffff7530dd1440 4 <- allocb Returns 0xffffffb7db752280 4 -> conn_recvancillary_add 4 -> gethrestime 4 -> pc_gethrestime 4 -> gethrtime 4 -> tsc_gethrtime 4 <- tsc_gethrtime Returns 0x6f4ee913f76957 4 <- gethrtime Returns 0x6f4ee913f76957 4 <- pc_gethrestime Returns 0x75609f00 4 <- gethrestime Returns 0x75609f00 4 <- conn_recvancillary_add Returns 0x28 4 -> udp_ulp_recv 4 -> so_queue_msg 4 -> so_queue_msg_impl /* Message is enqueed, but ... */ 4 -> so_enqueue_msg 4 <- so_enqueue_msg Returns 0xffffffdef28b2060 4 -> so_notify_data 4 -> socket_sendsig 4 -> prfind 4 -> getzoneid 4 <- getzoneid Returns 0x14 /* This function... */ 4 -> prfind_zone 4 -> pid_lookup 4 <- pid_lookup Returns 0xffffff32a5c98420 4 <- prfind_zone Returns 0x0 /* FAILED!!! Because the receiver's pid's zone isn't the sender's. */ 4 <- prfind Returns 0x0 4 <- socket_sendsig Returns 0x0 4 <- so_notify_data Returns 0x0 4 <- so_queue_msg_impl Returns 0xdfb8 4 <- so_queue_msg Returns 0xdfb8 4 <- udp_ulp_recv Returns 0xdfb8 4 <- udp_input Returns 0xdfb8 4 -> cv_broadcast 4 <- cv_broadcast Returns 0x0 4 <- ip_fanout_v4 Returns 0x0 4 <- ip_input_local_v4 Returns 0x0
Files
Updated by Robert Mustacchi almost 9 years ago
- Assignee set to Dan McDonald
- Tags deleted (
needs-triage)
In this case, we have a global zone process/process group that's been set up to receive signals on data. Here, we have a zone on that same machine sending data and it's thread gets co-opted in MAC to turn around and do some data processing for a MAC device that ends up sending packets up to another netstack that isn't the one we're currently in. In general this pattern should be fine; however, the problem is how socket_sendsig tries to look up the proc_t to send something to. Here it's using prfind. prfind implicitly treats the lookup as running in the zone of curproc. And that's exactly the bug.
socket_sendsig cannot operate under the assumption that it's curthread is the correct context to use for authorization. While socket upcalls are processed in the context of a netstack there's no indication that the current thread that is doing the processing in the kernel is in fact, in that zone/netstack.
The sonode actually has a notion of what zone it is in -- the so_zoneid. If we look at all assignments to the so_pgrp, we'll see that we validate that it's assigned in a context where we can actually make this assignment. Therefore, instead of using prfind, we should use prfind_zone and use the zone id in so_zoneid. In general it might actually be okay to use ALL_ZONES, but given that so_zoneid should always properly scope this, even if a global zone socket has ended up in a non-global zone, we should be good to go.
Updated by Dan McDonald almost 9 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 50
- Difficulty changed from Expert to Hard
Reproducing this bug is less complicated than I'd first imagined.
Steps to reproduce:
1.) Create two vnics. Give both a VLAN ID (using -v in "dladm create-vnic").
2.) Assign one to the global zone, the other to a non-global zone.
3.) Run the above test program in the global zone.
4.) Use netcat or something else to send a packet from the non-global-zone to the listener in the global zone.
The global zone process will not see a signal for packets sent from the non-global zone. Use an off-link peer as well sending different datagram sizes, and the program will show how it will only read the non-global-zone packets after it's been interrupted by off-link traffic.
Updated by Dan McDonald almost 9 years ago
Using Robert's analysis, the fix is straightforward. Using my reproduction, the testing and verification is also straightforward.
diff --git a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c b/usr/src/uts/common index 00ab44e..957c8f9 100644 --- a/usr/src/uts/common/fs/sockfs/sockcommon_subr.c +++ b/usr/src/uts/common/fs/sockfs/sockcommon_subr.c @@ -22,6 +22,9 @@ /* * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved. */ +/* + * Copyright 2014, OmniTI Computer Consulting, Inc. All rights reserved. + */ #include <sys/types.h> #include <sys/param.h> @@ -396,7 +399,12 @@ socket_sendsig(struct sonode *so, int event) * the proc is active. */ mutex_enter(&pidlock); - proc = prfind(so->so_pgrp); + /* + * Even if the thread started in another zone, we're receiving + * on behalf of this socket's zone, so find the proc using the + * socket's zone ID. + */ + proc = prfind_zone(so->so_pgrp, so->so_zoneid); if (proc == NULL) { mutex_exit(&pidlock); return; @@ -413,7 +421,12 @@ socket_sendsig(struct sonode *so, int event) pid_t pgrp = -so->so_pgrp; mutex_enter(&pidlock); - proc = pgfind(pgrp); + /* + * Even if the thread started in another zone, we're receiving + * on behalf of this socket's zone, so find the pgrp using the + * socket's zone ID. + */ + proc = pgfind_zone(pgrp, so->so_zoneid); while (proc != NULL) { mutex_enter(&proc->p_lock); socket_sigproc(proc, event);
Updated by Electric Monk almost 9 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100
git commit d690b62cf13675007342e6bb43d4816ec57a46dd
commit d690b62cf13675007342e6bb43d4816ec57a46dd Author: Dan McDonald <danmcd@omniti.com> Date: 2014-07-24T19:04:32.000Z 5026 intra-node/inter-zone networking doesn't always deliver SIGPOLL Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com> Reviewed by: Gordon Ross <gordon.ross@nexenta.com> Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com> Reviewed by: Robert Mustacchi <rm@joyent.com> Approved by: Garrett D'Amore <garrett@damore.org>