Writes sometimes out of order, not in sequence #1359
Comments
Are you chaining writes across submission boundaries? It seems like you are in the example, but this is not supported: https://github.com/axboe/liburing/blob/master/man/io_uring_enter.2#L1615
It says right there that “Even if the last SQE in a submission has this flag set, it will still terminate the current chain.” I.e. there’s no semantic difference whether or not we take extra care to leave the link flag unset on the last SQE of each batch. We of course tested that too, just to be sure, and the bug reproduces in that case as well.
OK, sure. And to be clear, I can't test this today, so I'm just going from reading the code: given that you write a small amount at a time, it's very unlikely for the pipe to be full, so you almost never actually queue those SQEs onto a background kernel thread (which is what would cause this). I guess you can get this to trigger earlier with bigger writes? Or maybe a slower consumer?
Assertions are disabled with the CPP macro NDEBUG, not optimization flags, so assert is enabled in these programs. We can't wait for all submissions; that would defy the reason to use io_uring as it would make the program process I/O synchronously. (I.e. any other work going on, like reading some other file descriptor, would be blocked en masse.) Re small amount of data: You can try it yourself by changing
Not in our tests so far. The variance of the failure point is high: sometimes we get an out-of-order write after just a few thousand writes and sometimes it takes many hundreds of millions of writes, independent of the "write target." For example, we ran qemu with a serial port set up to /dev/null (
so I've run it locally. there are at least two bugs in your example. I see this output:
So the chain 1396->1403 was not finished when the chain 1403->1406 was submitted, and the latter ran first. Bug (2) is probably confusing things more: you submit writes on the data array, but you don't wait for them to complete before clobbering the data. I have the following diff which fixes it:
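Separately from the diff mentioned above, the idea behind bug (2) can be illustrated with a minimal sketch: a buffer handed to an asynchronous write must not be overwritten until its completion has been seen. The `slot_pool` type, `NSLOTS`, `SLOT_SIZE`, and both helper functions are hypothetical names invented for this illustration; they are not taken from the test programs or from liburing.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4      /* hypothetical number of buffers in flight at once */
#define SLOT_SIZE 64  /* hypothetical chunk size in bytes */

struct slot_pool {
    char data[NSLOTS][SLOT_SIZE];
    unsigned char in_flight[NSLOTS]; /* 1 while the kernel may still read data[i] */
};

/* Pick a free slot and mark it in flight. Returns -1 if every slot is busy,
 * in which case the caller has to reap a completion before preparing
 * another write. */
static int slot_acquire(struct slot_pool *p)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!p->in_flight[i]) {
            p->in_flight[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Call once the completion for slot i has been observed. Only after this
 * point is it safe to overwrite data[i]. */
static void slot_release(struct slot_pool *p, int i)
{
    assert(i >= 0 && i < NSLOTS && p->in_flight[i]);
    p->in_flight[i] = 0;
}
```

In a real io_uring program, one plausible arrangement (an assumption, not something shown in the thread) is to store the slot index in the SQE's user_data when preparing the write and to call `slot_release` with that index when the matching CQE is reaped.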
After a nearly month-long bug hunt in our codebase we've found that sometimes writes via io_uring complete out of order. This is observed both at the reader end and in io_uring completions on the writer end. We've read the documentation carefully and written many test programs to try to disprove our theory of there being a bug in io_uring.
Attached are two test programs which reproduce the bug:
writeorder.c takes as an optional argument a file path (defaults to /dev/null) and then proceeds to open(path, O_WRONLY) and write chunks of data in an infinite loop. It verifies that completions are sequenced, i.e. that each write() is reported as completing in order relative to other writes.
writeorder-pipe.c works like writeorder.c but uses a pipe() instead of writing to a file. Additionally, it verifies the sequence of written data on the reader end.
Expected behavior: each write completes in the order it was submitted.
Actual behavior: sometimes a later write completes before a prior write. I.e. submissions: A B C D, completions: A C B D. Additionally, the writeorder-pipe.c program will report that read data is out of sequence as well (so it's not just completions that are reported out of order).
This out-of-order event sometimes happens after 100s of millions of writes and sometimes after just a few thousand writes.
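For readers without the attachments, the kind of ordering check described above can be sketched as follows. This is not the code from writeorder.c; `first_out_of_order` is a hypothetical helper, and it assumes each write was tagged with a consecutive sequence number at submission time (for example in the SQE's user_data) that is read back from each completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first completion whose sequence number is not the
 * expected next one, or -1 if all n completions arrived in submission order.
 * seq[i] is the tag read from the i-th completion; first is the tag given to
 * the first submitted write. */
static long first_out_of_order(const uint64_t *seq, size_t n, uint64_t first)
{
    assert(seq != NULL || n == 0);
    uint64_t expected = first;
    for (size_t i = 0; i < n; i++) {
        if (seq[i] != expected)
            return (long)i;
        expected++;
    }
    return -1;
}
```

With submissions A B C D tagged 0 1 2 3, the completion sequence A C B D becomes 0 2 1 3, and the helper returns index 1, the position where C arrived ahead of B.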
Reproduction:
cc -g -O2 -luring writeorder-pipe.c -o writeorder-pipe && ./writeorder-pipe
Tested with: