Bug #13149

elide squeue wake-ups when prudent

Added by Patrick Mooney 10 days ago. Updated 8 days ago.

In Progress


Upstreaming OS-6344 from SmartOS:

As part of the testing for OS-6150, I looked into the circumstances that resulted in heavy squeue wake-ups once the arbitrary sq_wait delay was removed. During a localhost-targeted ab(1) test, it turns out that a great many of the wake-ups are incurred during the squeue_synch_exit() call at the end of a TCP connect operation. The wake-up is issued to process a response packet that arrived as part of the connect.

A DTrace script can confirm this:

fbt::squeue_synch_exit:entry
{
        self->c = (conn_t *)arg0;
        this->s = self->c->conn_sqp;

        if (this->s->sq_first != NULL) {
                /* conn is stored in b_prev, proc in b_queue */
                this->conn = (conn_t *)this->s->sq_first->b_prev;
                this->proc = (void *)this->s->sq_first->b_queue;

                if (self->c == this->conn) {
                        if (this->s->sq_count == 1) {
                                @elide[this->proc] = count();
                        } else {
                                @optimize[this->proc, this->s->sq_count] =
                                    count();
                        }
                } else {
                        @wake[this->s->sq_count] = count();
                }
        } else {
                @nowake = count();
        }

        self->c = 0;
}

END
{
        printa("No wake-up needed: %@u\n", @nowake);
        printa("Possible to elide wake-up:\n %a %@u\n", @elide);

        printf("Possible to optimize wake-up:\n");
        printa("%a %u %@u\n", @optimize);

        printf("Forced wake-up:\n");
        printa("%u %@u\n", @wake);
}

Yielding the following results during an ab test run:

No wake-up needed: 168853
Possible to elide wake-up:
 ip`tcp_input_data 161273
Possible to optimize wake-up:
ip`tcp_input_data 3 2
ip`tcp_input_data 2 894
Forced wake-up:
4 8
3 168
1 3345
2 4720

The noted "possibilities" indicate conditions where the first item on the squeue to be processed during the wake-up belongs to the same connection which is synchronously exiting the squeue. In that case, it seems cheaper to immediately process that single unit of data on our way out of the squeue.

Benchmarking data should be able to confirm any delta in cost and/or performance.

Further investigation revealed that the primary worker wake-up culprits during the ab(1) test are tcp_connect() and tcp_close(). The former incurs a wake-up during squeue_synch_exit(), since the SYN/ACK response for a localhost connection setup is often rapid enough to make it onto the squeue by the time the connecting thread goes to exit the synchronous context. The latter suffers a similar problem: a quickly queued response (to the FIN rather than the SYN) arrives during the non-synchronous SQUEUE_ENTER_ONE() call, which specifies SQ_NODRAIN.
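The check-and-drain logic can be sketched in C. This is a simplified user-land model, not the actual illumos change: the struct layouts are pared down, the function name squeue_exit_try_elide() is hypothetical, and in the real squeue implementation this would run under sq_lock on the paths out of squeue_synch_exit() and tcp_close()'s SQUEUE_ENTER_ONE().

```c
#include <assert.h>
#include <stddef.h>

/* Pared-down stand-ins for the kernel structures (hypothetical layout). */
typedef struct conn conn_t;

typedef struct mblk {
        struct mblk *b_next;
        void        *b_prev;    /* conn_t owning this queued request */
        void        *b_queue;   /* processing function for this request */
} mblk_t;

typedef void (*sqproc_t)(conn_t *, mblk_t *);

typedef struct squeue {
        mblk_t  *sq_first;
        int      sq_count;
        int      sq_worker_wakeups;     /* stat: wake-ups actually issued */
} squeue_t;

struct conn {
        squeue_t *conn_sqp;
};

/*
 * On the way out of a synchronous squeue context: if exactly one request
 * is queued and it belongs to the departing connection, process it inline
 * rather than waking the worker thread.  Returns 1 if the wake-up was
 * elided, 0 if the worker must still be woken to drain the queue.
 */
static int
squeue_exit_try_elide(conn_t *connp)
{
        squeue_t *sqp = connp->conn_sqp;
        mblk_t *mp = sqp->sq_first;

        if (mp != NULL && sqp->sq_count == 1 &&
            (conn_t *)mp->b_prev == connp) {
                /* Unlink and run the single matching request inline. */
                sqp->sq_first = mp->b_next;
                sqp->sq_count = 0;
                ((sqproc_t)mp->b_queue)(connp, mp);
                return (1);
        }
        if (mp != NULL)
                sqp->sq_worker_wakeups++;       /* worker must drain it */
        return (0);
}
```

The key design point matches the DTrace data above: the elision is only safe when the single queued request belongs to the connection already holding the synchronous context, so no other connection's work is delayed and the worker thread stays asleep.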

In both cases, we can attempt to drain a single queued request if it matches the connection being acted upon. With an initial draft of that logic in place, I measured the sq_count depth at the points where wake-up elision was potentially attempted:

        sq_count     count
        5                1
        4                8
        3             1954
        2            57862
        1          1857500
        0          3940144

With the noted logic in place, performance in the ab(1) test jumps from ~15500 req/s to ~18000 req/s.

This was tested using the same methodology as OS-6150: ab(1) measurement to track the decrease in latency for localhost TCP connections as well as high throughput (via iperf) to check for undesirable side effects. The results were the same, save for the increased req/s as noted above.

Related issues

Related to illumos gate - Bug #13148: do not sq_wait unnecessarily (In Progress)




Updated by Patrick Mooney 10 days ago

  • Related to Bug #13148: do not sq_wait unnecessarily added

Updated by Electric Monk 8 days ago

  • Gerrit CR set to 927
