Bug #7863
closedAIO read/write requests return 0 bytes read/written
100%
Description
The following hardware configuration was used:
- 30 hard drives
- SMP system (32 cpu)
- x86_64 arch
On this target im running an application which makes an asynchronous IO (read or write) requests to the hard drives. I'm running one dedicated thread for each drive. In general the thread is doing something like this:
1. Got a new request.
2. Use aio_read() or aio_write() call to make it. The /dev/rdsk/* devices are used to perform the read/write operations. The port functionality is used to get the notification. It is important to use the port notification, otherwise issue will not reproduce.
3. In the handler call aio_error() if there are no errors (0 is returned).
Call aio_return and check its value.
The stress test was made to reproduce the problem: make lots of simultaneous requests to all 30 hard drives at once. After running it for the 30-60 seconds got the error below:
- aio_error() call returns 0
- aio_return() call returns 0, but the expected size is 1024 (the size of my read/write buffer)
This problem is reproduced pretty stable only if i run more than 20 simultaneous requests at once. I couldn't reproduce this issue on virtual machine (VMware, qemu) or on machine with 4 CPUs.
Updated by Dima Stepanov almost 7 years ago
[Investigation]
There are 30 threads which in general making the following steps to handle incoming IO requests:
1. Get the event from the port (port_get()).
2. If it is a read or write request, then use aio_read() or aio_write() calls to handle it.
When AIO request is competed the port is notified, as a result the aio_port_callback() routine is called, the backtrace is something like this:
port_getn()
port_copy_event()
aio_port_callback()
This aio_port_callback() (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/aio.c#3806) routine performs some clean up changes and then calls aio_copyout_result_port() to store the error and return value. When the aio_copyout_result_port() (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/aio_subr.c#1138) routine is executed. It was found out that sometimes the error and return value is really 0 in case of stress test:
errno = 0;
retval = iov->iov_len - bp->b_resid;
Both variables iov->iov_len and bp->b_resid are set to 0. That is correct for b_resid, but not for the iov_len field.
The issue for it is the race between the threads:
1. THREAD1:
1.1. AIO request completed
1.2. Notify thread using port
1.3. Call the aio_port_callback
1.4. mutex_enter(&aiop->aio_mutex);
1.5. Make some cleaning. Call the aio_req_free_port() routine (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/aio_subr.c#573). Which performs the following action:
aiop->aio_free = reqp;
1.6. mutex_exit(&aiop->aio_mutex);
1.7. Call the aio_copyout_result_port() routine to store error/return values from the reqp variable.
2. THREAD2:
2.1. Get a new AIO request.
2.2. Call aio_read or aio_write.
2.3. Waiting on aiop->aio_mutex.
2.4. As soon as it is free get it. Sometimes it is the mutex before calling the aio_req_alloc() routine (http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/os/aio.c#2493). The following lines of code are executed:
if ((reqp = aiop->aio_free) != NULL) {
aiop->aio_free = reqp->aio_req_next;
bzero(reqp, sizeof (*reqp));
} else {
Since aio_free is not NULL (see p1.5) the bzero() call will be made.
So here we got the race condition between storing values from the reqp structure (p1.7) and set to 0 by calling bzero() (p2.5).
Updated by Dima Stepanov almost 7 years ago
[Design]
The idea of the fix is pretty straight forward:
Call the aio_copyout_result_port() routine before the aio_req_free_port() call.
The aio_copyout_result_port() function is called in one place which we are going to fix.
The aio_copyout_result() routine is called from several places (aio.c and aio_subr.c), but it is always called before the aio_req_free() call. So only one place should be fixed.
New rb request is created for review: https://www.illumos.org/rb/r/357/
Updated by Electric Monk almost 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
git commit d91fc1a462e1fbca9120dcbb0895574f74826bb3
commit d91fc1a462e1fbca9120dcbb0895574f74826bb3 Author: Dima Stepanov <dstepanov@argotech.io> Date: 2017-02-23T18:02:58.000Z 7863 AIO read/write requests return 0 bytes read/written Reviewed by: Igor Kozhukhov <igork@argotech.io> Reviewed by: Marcel Telka <marcel@telka.sk> Reviewed by: Dan McDonald <danmcd@omniti.com> Approved by: Robert Mustacchi <rm@joyent.com>